Posts tagged: Data

Representativeness in Synthetic Data: What It Means and How to Measure It

Understanding the concept of representativeness in synthetic data and the methods used to measure it.

Why Rust for Data-Intensive Applications

Explores why Rust matters for research data pipelines - not for performance, but for correctness. Learn how Rust's type system prevents data failures.

Your Errors Are Data Too

How Rust's error handling patterns let you treat errors as structured observations about your data - capturing context, categorising failures, and producing data quality reports as first-class pipeline outputs.

Why Use Newtypes? Encoding Domain Knowledge in the Type System

How Rust's newtype pattern lets you encode domain knowledge - valid ranges, clinical thresholds, meaningful operations - directly into the type system, so the compiler enforces what you already know to be true about your data.

Serde Rust: Data Serialisation for Data Scientists

Practical Rust patterns for building validated data pipelines with Serde. Custom deserialisers, domain-constrained types, streaming CSV processing, and structured error handling for messy real-world data.

How Synthetic Data Is Used in Healthcare, Research and Beyond

Explore real-world use cases for synthetic data in healthcare, clinical trials, finance and more.

Multiple Imputation and Perturbation: Why They're Not Built for Synthetic Data

This blog explores why multiple imputation and perturbation are not suitable for generating synthetic data.

Clinic to Code to Care

This blog came out of a talk Steph Jones and I gave at Women in Data and AI in October 2025. It follows the journey of information from a patient in clinic, through how that information is coded for research, to the statistical and machine learning models it ultimately informs to help improve patient care.

What is Synthetic Data and Why Does it Matter?

This blog is the first in a series exploring synthetic data, its benefits, and its applications in various fields.
