Parse, Don’t Validate: Love ’em Types
Why you should parse instead of validating

“Our belief is often strongest when it should be weakest. That is the nature of hope.” – Brandon Sanderson, Mistborn

…

Recently, while reading Zero to Production in Rust by Luca Palmieri (a very good book, highly recommended if you want to learn good programming principles for production-level projects), I came across a very interesting software design principle: “Parse, don’t Validate”. The phrase comes from a blog post of the same name by Alexis King.

What exactly is “Parse, Don’t Validate”?

To understand this design principle, let’s first go over how “validation” of data works.

Let’s say we are building a RESTful API that takes a user’s email as input and saves it in our database. We obviously need some checks (or “validation” steps) on the input; we don’t want invalid or junk data piling up in our database.

Before checking whether an email is valid, we need to understand what defines a valid email. RFC 5322 and RFC 6854 define the expected structure of an email address. We could read them and try to write the checks ourselves, but it is simply not worth it. Luckily for us Rustaceans, the validator crate provides validation functions for emails, URLs, etc. We can use that in our API! The logic would look something like this:

    use validator::ValidateEmail;
    /// `add_user` takes an email `String` as input,
    /// validates the email, and calls `insert_email_to_db` if valid.
    /// `insert_email_to_db` takes an email `String` as input
    /// and inserts it into the db.

The Problem

This function to add the email to the database looks fine: we take an email String as input, run the validation function, and add the email to the database. In case of an error, we return an error string.

But there is a major pitfall with this approach! Can insert_email_to_db safely assume that the email passed to it as an argument is valid? It cannot. The email is a primitive type (i.e., a String): it can be empty, or contain an invalid email. The example above relies on the validate_email check to make sure the email is in fact valid.

In a small codebase, this is not such a big problem. But as the codebase grows, we might end up missing the validation step at some point, and invalid data will silently pile up in the database. Other parts of our program cannot reuse the result of the check effectively: they are forced to perform another point-in-time check, leading to a crowded codebase with noisy input checks at every step.

There is another problem: once the validate_email check passes and we add the email to our database, we continue using the email as a primitive type (i.e., a String). If we pass the email to another function, say send_email(email: String), that function has no idea whether the email is valid or not.

We have two choices here: assume the email has already been validated, or run the validate_email check again, which is redundant and adds unnecessary noise to the codebase.

The solution: “Parse, don’t Validate”

We finally get to our point: “Parse, don’t Validate.”

In very simple terms: instead of passing around primitive types for our data and validating them again and again, we create our own custom types (in our case, UserEmail) and parse the data into the new type.

The core idea here is that once an instance of UserEmail has been created, it is by definition impossible for it to be invalid. Instead of cluttering our codebase with validation logic everywhere, we define clear types that guarantee correctness by leveraging Rust’s type system.

The new logic would look something like this:

    use validator::ValidateEmail;
    /// `add_user` takes a `UserEmail` instance

In the above example, UserEmail::parse is the only way to get an instance of UserEmail, and by definition of the type an instance of UserEmail can only exist if the validate_email check passes. We also update our function signatures to take our custom types as input (i.e., UserEmail) instead of primitive types like String. This is called the “newtype pattern” in the Rust community.

What are all the benefits you get from using this approach?

Additionally, we can implement the AsRef trait for our type UserEmail to get the inner value where we actually need to use the email:

    /// `AsRef` trait to get a reference to the inner value of `UserEmail`

This returns a string slice reference to the actual email address. Yay Rust type system and traits!

But wait!!

You still have to validate to begin with: instead of scattering validation logic throughout your codebase, you validate once at the boundary and then work with a type that guarantees correctness.

Moreover, what we are validating here is the “format”. We cannot claim that the email is valid until we actually try sending a mail to it. (Read this reddit post for one such case encountered)

The benefit of using this principle is subtle, and it leads to more robust applications; but the implementation and error handling logic tends to get complicated as the codebase grows.

Conclusion

“Parse, Don’t Validate” is not something new; it has existed as Type-Driven Development for a while, especially in functional programming communities (Haskell, F#, OCaml, etc.). It is a shift in mindset from defensive programming to type-driven design. By capturing invariants within the type system, we make “illegal states unrepresentable.”

Adopting this pattern transforms your messy validation logic into a well-defined type domain. While you still need to handle the initial validation failures gracefully, it eliminates a whole class of bugs where invalid data slips into the business logic, causing massive damage.

Rust’s type system is one of its biggest strengths. Taking advantage of this, you shift the mental burden of correctness to the compiler, allowing it to do the heavy lifting for you.
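To make the pattern from this post concrete, here is a minimal, self-contained sketch of a UserEmail newtype with a parse constructor and an AsRef implementation. It is an illustration under assumptions, not the listing from the post. To keep it dependency-free, a deliberately naive looks_like_email helper stands in for the validator crate. The InvalidEmail error type, the Vec<String> acting as the "database", and the by-reference function signatures are all hypothetical choices made for this example.

```rust
use std::fmt;

/// Hypothetical error type: carries the rejected input.
#[derive(Debug, PartialEq)]
pub struct InvalidEmail(String);

impl fmt::Display for InvalidEmail {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} is not a valid email", self.0)
    }
}

/// Newtype around `String`. The inner field is private, so code outside
/// this module can only obtain a `UserEmail` through `UserEmail::parse`.
#[derive(Debug, Clone, PartialEq)]
pub struct UserEmail(String);

/// ASSUMPTION: a naive format check standing in for the validator crate.
/// It is nowhere near RFC 5322 compliant.
fn looks_like_email(s: &str) -> bool {
    match s.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.contains('@')
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !s.contains(char::is_whitespace)
        }
        None => false,
    }
}

impl UserEmail {
    /// Check the format once, at the boundary, and return a type that
    /// records the fact that the check passed.
    pub fn parse(s: String) -> Result<UserEmail, InvalidEmail> {
        if looks_like_email(&s) {
            Ok(UserEmail(s))
        } else {
            Err(InvalidEmail(s))
        }
    }
}

impl AsRef<str> for UserEmail {
    /// Borrow the inner value where the raw address is actually needed.
    fn as_ref(&self) -> &str {
        &self.0
    }
}

/// Hypothetical stand-in for a database insert: appends to a `Vec`.
fn insert_email_to_db(db: &mut Vec<String>, email: &UserEmail) {
    let raw: &str = email.as_ref();
    db.push(raw.to_owned());
}

/// `add_user` takes a `UserEmail`, so it cannot be handed unchecked input.
fn add_user(db: &mut Vec<String>, email: &UserEmail) {
    insert_email_to_db(db, email);
}

fn main() {
    let mut db = Vec::new();
    for input in ["ursula@example.com", "not-an-email"] {
        match UserEmail::parse(input.to_string()) {
            Ok(email) => add_user(&mut db, &email),
            Err(e) => eprintln!("rejected: {e}"),
        }
    }
    println!("stored {} email(s)", db.len());
}
```

The design choice worth noticing is the private field: because nothing outside the module can write UserEmail(some_string) directly, every function that accepts a UserEmail can rely on the format check having already run, without repeating it.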