Why Use Newtypes? Encoding Domain Knowledge in the Type System
Caroline Morton
March 16, 2026
I have spent a lot of time debugging research pipelines, and the bugs that scare me most are not the ones that crash loudly. They are the ones that produce plausible-looking output and let you carry on for weeks before anyone notices something is wrong. Wrong-order bugs are the worst offender. You have a function that takes several parameters of the same primitive type, the caller passes them in the wrong order, the compiler says nothing, and the output sits just inside the range of values you would expect to see in real data. By the time you find it, if you find it, you have already done a lot of work on incorrect results.
This post is about how to make that class of bug impossible to write. It is a companion to the Serde post, which was about stopping the wrong data getting in; this one is about stopping the right data being used incorrectly once it is inside your pipeline. The mechanism is Rust’s newtype pattern, and throughout I will be using obstetric data as the example - maternal age, gestational age in weeks, duration of stay in days - because those three concepts illustrate the problem and the solution particularly well.
The bug you won’t see

Let’s take a concrete example: you have a function that summarises an episode of care in obstetrics, and it takes three parameters: maternal age in years, gestational age of the foetus in weeks, and duration of hospital stay in days. All of these are positive integers, so you might represent them as u8 in Rust, or just int in Python or R. Now, if someone calls this function with the arguments in the wrong order, say they pass duration of stay where gestational age is expected, and gestational age where maternal age is expected, the compiler will not complain.
// Before: the compiler cannot help you
fn summarise_episode(maternal_age: u8, gestational_age: u8, stay_days: u8) { ... }
// This compiles. It is wrong. Nothing tells you.
summarise_episode(gestational_age, stay_days, maternal_age);
The output of the function might look plausible. If you originally had a 20-year-old with a gestational age of 34 weeks and a stay of 2 days, the wrong-order call would look like a 34-year-old with a gestational age of 2 weeks and a stay of 20 days - all of which are numerically possible, but at the extremes of the distributions you would expect to see in real data. I spend a lot of time thinking about synthetic data generation, and I certainly don’t want to accidentally generate thousands of records with unusual distributions because of a wrong-order bug in my code.
This is not a hypothetical: wrong-order bugs are one of the most common silent failures in research pipelines, precisely because the values are plausible enough to survive manual inspection. The fix is not better tests or more careful code review; it is making the error impossible to express in the first place, and that is where newtypes come in.
Primitives describe shape, not meaning
A primitive type like u8 describes the shape of the data - it is an unsigned 8-bit integer - but it does not capture the meaning of the data. The compiler knows that a u8 is a number between 0 and 255, but it does not know that this number represents a patient’s age in years, or a gestational age in weeks, or a duration of stay in days. All of that domain knowledge is invisible to the compiler, and it is up to the researcher to remember it and apply it correctly at every call site. The human brain is fallible, and when you have dozens of parameters across hundreds of functions, it is easy to make a mistake and not even realise it.
As a result, you end up with code that is technically correct in terms of types, but semantically wrong. The compiler has no way to catch this because it does not understand the domain concepts, it only understands the primitive types. This is the fundamental problem that newtypes solve: they allow you to create distinct types for each concept, even if they have the same underlying representation, so that the compiler can enforce the correct usage and limits of each concept.
Let’s stick with our example because it illustrates the point well, and we can think through what rules we might want to encode in the type system:
- Maternal age is measured in years and has a practical ceiling around 120. Arithmetic operations like addition and subtraction produce numbers that have no clinical meaning - adding two maternal ages together does not produce a number that you can reason about clinically.
- Gestational age is measured in weeks, has a hard biological ceiling of 42 weeks, and maps to clinically significant thresholds like 37 weeks for term.
- Duration of stay is measured in days, has no hard ceiling but is often short, and arithmetic operations like addition and subtraction are meaningful for comparing stays or calculating total stay across an admission.
A primitive type like u8 cannot capture any of this domain knowledge, but a newtype can.
A newtype is a tuple struct with a single private field. For example:
pub struct MaternalAge(u8);
The underlying representation is still a u8, but now we have a distinct type called MaternalAge that the compiler can differentiate from other types. The inner u8 is private by default, which means nothing outside the defining module can reach in and grab the raw integer without going through whatever methods the type decides to offer.
Let’s think about what we might want to offer. If we think about it logically, we want to be able to construct a MaternalAge from a raw u8, but we want to validate that the age is within a reasonable range, which we have said is under 120. We do not want to offer arithmetic operations like addition or subtraction because they do not produce meaningful results. We will probably also want to provide the age in years as a method.
impl MaternalAge {
    pub fn new(age: u8) -> Result<Self, String> {
        if age > 120 {
            Err(format!("Maternal Age of {age} years exceeds practical maximum of 120"))
        } else {
            Ok(MaternalAge(age))
        }
    }

    pub fn years(&self) -> u8 {
        self.0
    }
}
Notice here that the constructor new returns a Result, which means that if you try to create a MaternalAge with an invalid value, you get an error instead of a panic. This is important for data pipeline code, because a bad value in a record is not a bug, it is a data quality issue, and it should be handled explicitly rather than crashing the process. Take a look at my series on error handling for more on this topic, which starts here: Error Handling in Rust: Fundamentals.
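To make this concrete, here is a minimal, self-contained sketch (the type definition is repeated so the snippet runs on its own) of how pipeline code might handle both outcomes of the constructor:

```rust
pub struct MaternalAge(u8);

impl MaternalAge {
    pub fn new(age: u8) -> Result<Self, String> {
        if age > 120 {
            Err(format!("maternal age of {age} years exceeds practical maximum of 120"))
        } else {
            Ok(MaternalAge(age))
        }
    }

    pub fn years(&self) -> u8 {
        self.0
    }
}

fn main() {
    // A valid value constructs cleanly.
    let age = MaternalAge::new(34).expect("34 is within the valid range");
    println!("age in years: {}", age.years());

    // An invalid value is a data-quality problem, not a crash:
    // we get an Err we can log, count, or route to a rejects file.
    match MaternalAge::new(150) {
        Ok(_) => println!("unexpectedly valid"),
        Err(e) => println!("rejected: {e}"),
    }
}
```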
We want to force the caller to construct a MaternalAge through the new() function, which means that validation happens at the point of creation rather than at the point of use. By making the field private, we ensure that direct construction is not possible from outside the module, and the only way to get a MaternalAge is through the constructor that enforces the rules.
let age = MaternalAge(150); // This is not allowed because the field is private
Conveniently, we have also ensured that you cannot get the age as a raw u8 without going through the years() method - i.e. you can’t do this:
let age = MaternalAge::new(30).unwrap();
let raw_age = age.0; // This is not allowed because the field is private
The only way to get the raw age is through the years() method:
let age = MaternalAge::new(30).unwrap();
let raw_age = age.years(); // This is allowed and returns a u8
As MaternalAge is just a struct, we can add any methods we like to it. We might want to check if the patient is paediatric, which we might define as under 18 years old, and we might want to get the decade of life for age-banded analysis.
impl MaternalAge {
    pub fn is_paediatric(&self) -> bool {
        self.0 < 18
    }

    pub fn decade(&self) -> u8 {
        // This returns the decade of life, e.g. 1 for 0-9 years, 2 for 10-19 years, etc.
        (self.0 / 10) + 1
    }
}
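A quick usage sketch of those two methods (a standalone snippet that repeats a trimmed-down MaternalAge, constructing it directly for brevity rather than through new()):

```rust
pub struct MaternalAge(u8);

impl MaternalAge {
    pub fn is_paediatric(&self) -> bool {
        self.0 < 18
    }

    pub fn decade(&self) -> u8 {
        (self.0 / 10) + 1
    }
}

fn main() {
    // Direct construction works here because we are in the defining module;
    // real pipeline code would go through a validating constructor.
    let age = MaternalAge(34);
    assert!(!age.is_paediatric());
    assert_eq!(age.decade(), 4); // 30-39 years is the fourth decade of life
    println!("decade of life: {}", age.decade());
}
```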
As we said, we are just making simple structs, and this also means we can use Rust’s out-of-the-box derive attributes like #[derive(Debug, Clone, Copy, PartialEq, Eq)] to get those traits for free, which is a nice bonus. We also cannot do arithmetic on MaternalAge because we have not implemented any arithmetic operations, and that is intentional: adding two maternal ages together does not produce a number that has any clinical meaning, so the type simply does not offer it. If you tried to subtract one MaternalAge from another, you would get a compile error because that operation is not defined for the type.
error[E0369]: cannot subtract `MaternalAge` from `MaternalAge`
So you can see that we have encoded a lot of useful domain knowledge about maternal age directly into the type system: the valid range, the clinical meaning of certain thresholds, the fact that arithmetic operations do not make sense. The compiler enforces all of this at every call site, so you cannot accidentally misuse a MaternalAge without getting a compile error. This is the fundamental advantage of newtypes: they allow you to encode what you know about your domain in a way that the compiler can understand and enforce, which makes your code safer and more maintainable in the long run.
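This is also the point where the wrong-order bug from the start of the post dies. Here is a sketch of the "after" version of summarise_episode (using direct construction for brevity); the wrong-order call no longer compiles:

```rust
pub struct MaternalAge(u8);
pub struct GestationalAgeInWeeks(u8);
pub struct DurationOfStay(u16);

// After: each parameter is a distinct type, so an order mistake is a type error.
fn summarise_episode(
    maternal_age: MaternalAge,
    gestational_age: GestationalAgeInWeeks,
    stay: DurationOfStay,
) {
    let _ = (maternal_age.0, gestational_age.0, stay.0);
}

fn main() {
    // Direct construction for brevity; real code would use validating constructors.
    let age = MaternalAge(34);
    let weeks = GestationalAgeInWeeks(38);
    let stay = DurationOfStay(2);

    summarise_episode(age, weeks, stay); // correct order: compiles

    // summarise_episode(weeks, stay, age);
    // ^ wrong order: compile error - expected `MaternalAge`, found `GestationalAgeInWeeks`
    println!("ok");
}
```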
Our GestationalAgeInWeeks type would look very similar, but with different validation rules and methods. The valid range is 0 to 42 weeks, and we might want to have a method that returns the trimester based on the gestational age, as well as a method that checks if the pregnancy is term (37 weeks or more). We would not want to offer multiplication or division because those operations do not produce meaningful results for gestational age.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GestationalAgeInWeeks(u8);

pub enum Trimester { First, Second, Third }

impl GestationalAgeInWeeks {
    pub fn new(weeks: u8) -> Result<Self, String> {
        if weeks > 42 {
            Err(format!("gestational age of {weeks} weeks exceeds biological maximum of 42"))
        } else {
            Ok(GestationalAgeInWeeks(weeks))
        }
    }

    pub fn trimester(&self) -> Trimester {
        match self.0 {
            0..=12 => Trimester::First,
            13..=26 => Trimester::Second,
            _ => Trimester::Third,
        }
    }

    pub fn is_term(&self) -> bool {
        self.0 >= 37
    }
}
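A short usage sketch (repeating the type so it runs standalone, with derives added to Trimester so values can be compared in assertions):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GestationalAgeInWeeks(u8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trimester { First, Second, Third }

impl GestationalAgeInWeeks {
    pub fn new(weeks: u8) -> Result<Self, String> {
        if weeks > 42 {
            Err(format!("gestational age of {weeks} weeks exceeds biological maximum of 42"))
        } else {
            Ok(GestationalAgeInWeeks(weeks))
        }
    }

    pub fn trimester(&self) -> Trimester {
        match self.0 {
            0..=12 => Trimester::First,
            13..=26 => Trimester::Second,
            _ => Trimester::Third,
        }
    }

    pub fn is_term(&self) -> bool {
        self.0 >= 37
    }
}

fn main() {
    let ga = GestationalAgeInWeeks::new(38).unwrap();
    assert_eq!(ga.trimester(), Trimester::Third);
    assert!(ga.is_term()); // 38 >= 37: term pregnancy

    // Out-of-range values never become a GestationalAgeInWeeks at all.
    assert!(GestationalAgeInWeeks::new(50).is_err());
    println!("ok");
}
```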
Here you can see that I have encoded the clinical knowledge about trimester thresholds and term pregnancy directly into the methods of the GestationalAgeInWeeks type. In fact, I can return an enum Trimester rather than a raw number, which means the clinical knowledge lives in one place and the compiler ensures that every call site handles all three trimester cases correctly. If trimester returned a raw number, every caller would have to re-implement the thresholds, which is error-prone and violates the DRY principle. I hate repeating myself and want to avoid it at all costs, so encoding this logic in the type is a big win.
Duration of stay is the deliberate contrast here: it has no hard ceiling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DurationOfStay(u16);

impl DurationOfStay {
    pub fn new(days: u16) -> Self {
        DurationOfStay(days)
    }

    pub fn is_same_day(&self) -> bool {
        self.0 == 0
    }

    pub fn days(&self) -> u16 {
        self.0
    }
}
But arithmetic operations like addition and subtraction are meaningful for comparing stays or calculating total stay across an admission, so we can implement the Add trait for DurationOfStay to allow adding two durations together.
impl std::ops::Add for DurationOfStay {
    type Output = DurationOfStay;

    fn add(self, other: DurationOfStay) -> DurationOfStay {
        DurationOfStay(self.0 + other.0)
    }
}

impl std::ops::Sub for DurationOfStay {
    type Output = DurationOfStay;

    fn sub(self, other: DurationOfStay) -> DurationOfStay {
        // Note: this panics in debug builds if `other` is larger than `self`;
        // consider saturating_sub or returning a Result if that can occur in your data.
        DurationOfStay(self.0 - other.0)
    }
}
Multiplication and division probably don’t make sense for duration of stay, so we would not implement those operations. The type simply does not offer them, which means that if you try to multiply two DurationOfStay values together, you would get a compile error because that operation is not defined for the type.
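Here is a standalone sketch of the addition in use (repeating a trimmed-down DurationOfStay), for example totalling the stay across an admission made up of two episodes:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DurationOfStay(u16);

impl DurationOfStay {
    pub fn new(days: u16) -> Self {
        DurationOfStay(days)
    }

    pub fn days(&self) -> u16 {
        self.0
    }
}

impl std::ops::Add for DurationOfStay {
    type Output = DurationOfStay;

    fn add(self, other: DurationOfStay) -> DurationOfStay {
        DurationOfStay(self.0 + other.0)
    }
}

fn main() {
    // Total stay across an admission with two episodes of care.
    let total = DurationOfStay::new(3) + DurationOfStay::new(2);
    assert_eq!(total.days(), 5);
    println!("total stay: {} days", total.days());
}
```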

The way I think about this is that we are giving the compiler the tools it needs to help us avoid mistakes and enforce the rules that otherwise just exist in our heads. The compiler is a collaborator that checks our work and makes sure we are using the right data correctly.
What other languages give you
In my world, most researchers are using Python or R, so I want to talk about what those languages give you in terms of type safety and domain knowledge encoding, and why newtypes in Rust offer something different. I will preface this by saying that most researchers, in my experience, are not taking advantage of the type system features Python already offers to encode domain knowledge, but there are some they could use if they wanted to, and I want to talk about those as well.
At first glance, Python’s class GestationalAgeInWeeks(int) looks like it solves the problem: you can create a new class that inherits from int, which means that isinstance checks work, you can add methods to it, and the code reads clearly. However, there is a problem I think of as arithmetic evaporation: if you add two GestationalAgeInWeeks instances together, you get a plain int back. Adding gestational ages makes little clinical sense anyway, but even where arithmetic is meaningful, the type silently evaporates.
class GestationalAgeInWeeks(int):
    pass

a = GestationalAgeInWeeks(20)
b = GestationalAgeInWeeks(10)
result = a + b
print(type(result))  # <class 'int'> - not GestationalAgeInWeeks
print(isinstance(result, GestationalAgeInWeeks))  # False
Python’s typing module gets us closer by allowing us to add type annotations (of a sort) to our function signatures, and the NewType helper allows us to create distinct types that are checked by type checkers like mypy. We could do this to our function:
from typing import NewType

MaternalAge = NewType('MaternalAge', int)
GestationalAgeInWeeks = NewType('GestationalAgeInWeeks', int)
DurationOfStay = NewType('DurationOfStay', int)

def summarise_episode(maternal_age: MaternalAge, gestational_age: GestationalAgeInWeeks, stay: DurationOfStay):
    pass
However, if you call this function with the arguments in the wrong order, nothing complains: Python has no compile step, and type annotations are not enforced at runtime. NewType creates a distinct type for type checkers, but it does not actually create a new class or enforce any constraints when the code runs. So you could still do this:
summarise_episode(gestational_age, stay, maternal_age) # This will not raise an error at runtime
In fact, you would only find this error if you ran a static type checker like mypy over the code, which is not commonly done in research settings. So while Python’s NewType gets closer to the idea of distinct types, it is a fiction enforced only by the type checker, not the language itself, and in my experience mypy adoption in research code is close to zero, so it does not solve the problem in practice.
R takes a different approach entirely: there are several class systems (S3, S4, R6), but a class label does not constrain the contents, and nothing is enforced at the language level. R users in health research typically think in data frames rather than individual typed values, so the concept of encoding domain knowledge in a single value’s type is largely foreign to the paradigm. This is not a criticism, it is just a different model that does not have the same tools for enforcing constraints on individual values.
How does this fit into the pipeline?

In my last post on Serde, I talked about how to use Serde to handle the structural validation of your data at the point of deserialisation, and I mentioned that it is good to separate that from the domain validation step that checks the business rules afterwards. I maintain that is the case, but where that line sits is more complicated, and ultimately it is a judgement call. Let’s talk through an example to help explain why it is not cut and dried.
In the Serde post, we talked about deserialising data from a CSV (but it could be any data store) into a struct. The struct is a direct mapping of the data structure, so it has fields like patient_age: u8, gestational_age: u8, and stay_days: u16. This struct is what I called a “raw” struct because it is a direct representation of the data as it exists in the source, without any domain validation or meaning attached to it. We saw that we can use Serde’s Deserialize trait to handle the structural validation of the data at the point of deserialisation, which means that if we get a non-integer value for patient_age, or a missing field, we get an error right away.
#[derive(Deserialize)]
struct RawEpisodeRecord {
    patient_age: u8,
    gestational_age: u8,
    stay_days: u16,
}
The next step in the pipeline could be to take this raw struct and construct domain types from it, which is where we would use our MaternalAge, GestationalAgeInWeeks, and DurationOfStay newtypes. This is the point where we would apply the domain validation rules that we have encoded in the constructors of those types, so if we get a patient_age of 150, we would get an error when we try to construct a MaternalAge from it.
struct EpisodeRecord {
    patient_age: MaternalAge,
    gestational_age: GestationalAgeInWeeks,
    stay: DurationOfStay,
}
In fact we could make a TryFrom<RawEpisodeRecord> implementation for EpisodeRecord that tries to construct the domain types from the raw struct, and returns an error if any of the validations fail.
impl TryFrom<RawEpisodeRecord> for EpisodeRecord {
    type Error = String;

    fn try_from(raw: RawEpisodeRecord) -> Result<Self, Self::Error> {
        Ok(EpisodeRecord {
            patient_age: MaternalAge::new(raw.patient_age)?,
            gestational_age: GestationalAgeInWeeks::new(raw.gestational_age)?,
            // DurationOfStay::new is infallible, so no `?` here
            stay: DurationOfStay::new(raw.stay_days),
        })
    }
}
This two-stage pipeline allows us to separate concerns: Serde handles the structural validation at the point of deserialisation, and a separate domain validation step checks the business rules afterwards. Each stage has a single responsibility, and each is independently testable. The raw struct is just a mirror of the data structure, while the domain types are where all the meaning and constraints live.
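To see the whole two-stage shape end to end, here is a condensed, self-contained sketch with just two fields (the types are trimmed down so the snippet runs on its own):

```rust
pub struct MaternalAge(u8);

impl MaternalAge {
    pub fn new(age: u8) -> Result<Self, String> {
        if age > 120 {
            Err(format!("maternal age of {age} exceeds practical maximum of 120"))
        } else {
            Ok(MaternalAge(age))
        }
    }
}

pub struct DurationOfStay(u16);

impl DurationOfStay {
    pub fn new(days: u16) -> Self {
        DurationOfStay(days)
    }
}

// Raw struct: a mirror of the source data, no domain meaning attached.
struct RawEpisodeRecord {
    patient_age: u8,
    stay_days: u16,
}

// Domain struct: every field has passed its validation rules.
struct EpisodeRecord {
    patient_age: MaternalAge,
    stay: DurationOfStay,
}

impl TryFrom<RawEpisodeRecord> for EpisodeRecord {
    type Error = String;

    fn try_from(raw: RawEpisodeRecord) -> Result<Self, Self::Error> {
        Ok(EpisodeRecord {
            patient_age: MaternalAge::new(raw.patient_age)?,
            stay: DurationOfStay::new(raw.stay_days),
        })
    }
}

fn main() {
    let good = RawEpisodeRecord { patient_age: 34, stay_days: 2 };
    assert!(EpisodeRecord::try_from(good).is_ok());

    // An age of 200 survives the structural stage (it fits in a u8)
    // but is rejected at the domain stage.
    let bad = RawEpisodeRecord { patient_age: 200, stay_days: 2 };
    assert!(EpisodeRecord::try_from(bad).is_err());
    println!("both paths behaved as expected");
}
```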
However, this line is not always clear cut. What if we didn’t even want to load the data into a raw struct if it contained invalid values? If we are taking patient data and some ages are 200, and clearly wrong, we might decide we want to know about that at the earliest possible point. In that case, we could implement a custom Deserialize for MaternalAge that validates the age at the point of deserialisation, which means that if we get an invalid age in the data, we get an error right away without even creating a raw struct. This is a perfectly valid approach, and it is really a judgement call about where you want to draw the line between structural validation and domain validation. The important thing is that you have the tools to enforce your rules at the point where they make the most sense for your pipeline and your data.
Serde has a useful attribute, #[serde(transparent)], that tells Serde to treat a newtype as if it were its inner type for the purposes of serialisation and deserialisation. One caveat: deriving Deserialize this way bypasses your constructor, so on its own it will happily accept an age of 200. If you want your validation to run at the point of deserialisation, the companion attribute #[serde(try_from = "u8")] tells Serde to deserialise the raw value and then route it through a TryFrom implementation, which can simply call new() - the same rules, applied one stage earlier. This is a nice way to combine structural and domain validation in one step if that is what you want to do.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
#[serde(try_from = "u8")]
pub struct MaternalAge(u8);

impl TryFrom<u8> for MaternalAge {
    type Error = String;

    fn try_from(age: u8) -> Result<Self, Self::Error> {
        MaternalAge::new(age) // validation runs during deserialisation
    }
}

#[derive(Deserialize)]
pub struct EpisodeRecord {
    patient_age: MaternalAge,
    // ... other fields
}
Where to draw that line can be hard to say. I personally prefer to keep the raw struct as a mirror of the data structure, and then have a separate step where I construct domain types from it, because I like the separation of concerns and the ability to test each stage independently. But I would be strongly tempted to deserialise directly into the domain types if I had a strong reason to want to know about invalid values as early as possible, and I would not consider that bad practice at all. Equally, if I were loading data that I had cleaned and saved myself, and I was confident it was all valid - for example, if earlier in the pipeline we had inserted cleaned data into a database and now we are just loading it back out again - I might skip the raw struct and deserialise directly into the domain types for simplicity. The important thing is that you have the tools to enforce your rules at the point where they make the most sense for your pipeline and your data, and that you are consistent in how you apply those tools across your codebase.
The disadvantages
Newtypes are not free. Let’s talk through the trade-offs.
- Unfamiliarity and ceremony: newtypes are unfamiliar if you are coming from Python or R, and they require a bit more ceremony to set up - the constructor, the private field, the impl block. That is more code than just age: u32, and for a quick analysis script it is probably not worth it. The payoff is at pipeline scale, and at the point where someone else has to maintain your code six months after you wrote it. The investment front-loads complexity so that the rest of the codebase can be simpler.
- Testing requires construction: in Python you pass a raw integer to your test; with newtypes you need to construct a valid GestationalAgeInWeeks first, which means slightly more verbose tests and some thought about test constructors or test-only helpers.
- Repetition: if you have multiple data sources and you are deserialising into raw structs, and then constructing domain structs from them, you might end up with a lot of boilerplate code that looks very similar across different records. This is not a problem with newtypes per se, but it is a common pattern that can lead to a lot of repetitive code if you have many different types of records that are all quite similar. Any time you have more code, you have more to maintain and more surface area for bugs, so it is something to be mindful of. The usual solution is helper functions or macros that reduce the boilerplate, but it is still a consideration.
- Trait implementations do not come for free: a plain u8 gives you Debug, Clone, Copy, PartialEq, and Hash without asking; wrap it in a newtype and you need to explicitly derive or implement each one you need. For basic traits this is one line, but it is easy to forget. External crates are where the friction accumulates: if you want to store a GestationalAgeInWeeks in Postgres via SQLx, you need to implement sqlx::Type, sqlx::Encode, and sqlx::Decode for your newtype - non-trivial boilerplate if you have 20 domain types. The ecosystem’s answer is the same word in both cases: #[serde(transparent)] tells Serde to treat the newtype as its inner type, and #[sqlx(transparent)] does the same for SQLx. Once you know to look for transparent, the solution is usually one attribute away, and the fact that the ecosystem converged on the same word is not a coincidence. This is a good example of how the ecosystem evolves to make common patterns easier, but it does require you to be aware of these features and apply them consistently across your codebase. I have forgotten #[serde(transparent)] exists multiple times.
Traits as a domain taxonomy
If you have ever been to one of my talks on clean code, you will know that I love the book The Programmer’s Brain by Felienne Hermans. One of my key takeaways from that book is the idea of developing a taxonomy of the types of things that exist in your codebase and using common language for naming them. If you have several different age types - maternal age, paternal age, patient age - giving them a shared Age suffix means that when you encounter MaternalAge or PatientAge in the code, you immediately know what category of thing it is before you know anything else about it. That ordering matters for comprehension as it allows us to build a mental model of the code more quickly. We are taking advantage of chunking information in our brains: when we see MaternalAge, we chunk it as an Age type, and then we can focus on the differences between maternal and patient age, rather than having to figure out what kind of thing each one is from scratch.

Rust lets you take this further. You can create a 1:1 mapping between your domain taxonomy and your trait hierarchy: a trait Age that defines the contract all age types must fulfil, implemented separately for each concrete type. This is a bit like surgical inheritance. You get the shared behaviour of a class hierarchy without the rigidity, and without accidentally pulling in functionality that does not belong.
trait Age {
    /// Required: each type provides access to its inner value
    fn years(&self) -> u8;

    /// Default: the decade of life, e.g. 1 for 0-9 years, 2 for 10-19 years, etc.
    fn decade(&self) -> u8 {
        (self.years() / 10) + 1
    }

    /// Abstract: each type defines its own threshold for paediatric age
    fn is_paediatric(&self) -> bool;
}

impl Age for MaternalAge {
    fn years(&self) -> u8 {
        self.0
    }

    fn is_paediatric(&self) -> bool {
        self.years() < 18
    }
}

impl Age for PatientAge {
    fn years(&self) -> u8 {
        self.0
    }

    fn is_paediatric(&self) -> bool {
        self.years() < 16
    }
}
years() is required because the trait cannot reach into the struct’s private field. Each implementor exposes its inner value through this accessor, and the default implementations can build on top of it. decade() has a shared default because the calculation is the same for all ages. is_paediatric() is abstract because the threshold is genuinely different depending on context: obstetric, paediatric, and general medicine all draw that line in different places, and encoding it per type captures that domain knowledge directly.
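One payoff of the shared trait is that generic code works across the whole taxonomy while each type keeps its own threshold. A self-contained sketch (PatientAge here is the illustrative second type from the impls above):

```rust
trait Age {
    fn years(&self) -> u8;

    fn decade(&self) -> u8 {
        (self.years() / 10) + 1
    }

    fn is_paediatric(&self) -> bool;
}

pub struct MaternalAge(u8);

impl Age for MaternalAge {
    fn years(&self) -> u8 {
        self.0
    }

    fn is_paediatric(&self) -> bool {
        self.years() < 18 // obstetric threshold
    }
}

pub struct PatientAge(u8);

impl Age for PatientAge {
    fn years(&self) -> u8 {
        self.0
    }

    fn is_paediatric(&self) -> bool {
        self.years() < 16 // general medicine threshold
    }
}

// One generic helper for every member of the Age taxonomy.
fn count_paediatric<A: Age>(ages: &[A]) -> usize {
    ages.iter().filter(|a| a.is_paediatric()).count()
}

fn main() {
    let maternal = [MaternalAge(17), MaternalAge(25)];
    let patients = [PatientAge(17), PatientAge(25)];

    assert_eq!(count_paediatric(&maternal), 1); // 17 < 18: paediatric
    assert_eq!(count_paediatric(&patients), 0); // 17 >= 16: not paediatric
    println!("ok");
}
```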
The suffix convention and the trait name converge on the same word intentionally. When a reader encounters a type ending in Age, they know it implements Age, and they know where to look for its contract. That is taxonomy doing real work in your codebase, not just a naming convention, but a navigational system.
I am going to write about this in more depth in an upcoming post on writing Rust for readers rather than compilers. The short version is that when your types and traits reflect your domain vocabulary, a reader can understand what something is before they have to understand what it does, and that ordering makes a significant difference to how quickly someone can build a mental model of your code.
Conclusion: The compiler remembers what you know
Ultimately, I hope this post has given you a good overview of why you would want to use newtypes in Rust, and how they allow you to encode domain knowledge directly into the type system. The key takeaway is that newtypes give you a way to make the constraints and rules of your domain visible and enforceable in your code, which makes it safer and more maintainable in the long run.
I worry that domain knowledge in research code lives in the researcher’s head, occasionally in a README, almost never in the code itself - and when the researcher moves on, the knowledge goes with them. Newtypes give domain knowledge somewhere to live where it does something. This is the same argument as the Serde post, one stage later in the pipeline: Serde stops the wrong data getting in; newtypes stop the right data being used wrong. Both are about making the constraints and rules of your domain visible and enforceable in your code, which is what you want for any code that is going to be maintained by someone else in the future. More often than not that someone else is you (!!!) six months down the line, and you will be grateful to your past self for encoding that knowledge in a way that the compiler can understand and enforce, rather than relying on your memory or comments in the code.
Further reading
If this post has piqued your interest, you might also enjoy:
- A Love Letter to Serde, from a Data Nerd - the companion post to this one, on using Serde as a validation boundary for messy health data
- Error Handling in Rust: Fundamentals - part one of my error handling series, covering the basics of Result and Option
- Error Handling in Rust: anyhow and thiserror - part two, on using anyhow and thiserror to build ergonomic error handling in real pipelines