Posts tagged: Data Pipelines
Why Rust for Data-Intensive Applications
Explores why Rust matters for research data pipelines - not for performance, but for correctness - and how Rust's type system helps prevent silent data failures.
Your Errors Are Data Too
How Rust's error handling patterns let you treat errors as structured observations about your data - capturing context, categorising failures, and producing data quality reports as first-class pipeline outputs.
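A minimal sketch of the errors-as-data idea, using only the standard library; the variant and field names here are illustrative, not taken from the post:

```rust
// Sketch: an error type that records which row failed and why, so
// failures become structured observations rather than discarded noise.
// Variant names, field names, and the 0.0..=10.0 range are illustrative.
#[derive(Debug)]
enum RowError {
    MissingField { row: usize, field: &'static str },
    OutOfRange { row: usize, value: f64 },
}

fn check(row: usize, value: Option<f64>) -> Result<f64, RowError> {
    match value {
        None => Err(RowError::MissingField { row, field: "dose" }),
        Some(v) if !(0.0..=10.0).contains(&v) => Err(RowError::OutOfRange { row, value: v }),
        Some(v) => Ok(v),
    }
}

fn main() {
    let raw = [Some(2.5), None, Some(42.0)];
    // Partition clean values from errors instead of aborting on the first one;
    // the error side can feed a data quality report.
    let (ok, errs): (Vec<_>, Vec<_>) = raw
        .iter()
        .enumerate()
        .map(|(i, v)| check(i, *v))
        .partition(Result::is_ok);
    println!("{} clean rows, {} errors: {:?}", ok.len(), errs.len(), errs);
}
```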
Why Use Newtypes? Encoding Domain Knowledge in the Type System
How Rust's newtype pattern lets you encode domain knowledge - valid ranges, clinical thresholds, meaningful operations - directly into the type system, so the compiler enforces what you already know to be true about your data.
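A minimal sketch of the newtype pattern under assumed names and ranges (the `SystolicBp` type and its 50..=300 mmHg bounds are illustrative, not from the post):

```rust
/// Hypothetical newtype for a systolic blood pressure reading in mmHg.
/// The type name and plausibility range are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SystolicBp(u16);

impl SystolicBp {
    /// Reject physiologically implausible values at construction time,
    /// so every `SystolicBp` in the pipeline is known to be in range.
    fn new(mmhg: u16) -> Result<Self, String> {
        if (50..=300).contains(&mmhg) {
            Ok(SystolicBp(mmhg))
        } else {
            Err(format!("implausible systolic reading: {mmhg} mmHg"))
        }
    }
}

fn main() {
    assert!(SystolicBp::new(120).is_ok());
    assert!(SystolicBp::new(900).is_err());
}
```

Because the only way to obtain a `SystolicBp` is through `new`, the compiler guarantees downstream code never sees an unchecked raw number.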
Serde Rust: Data Serialisation for Data Scientists
Practical Rust patterns for building validated data pipelines with Serde. Custom deserialisers, domain-constrained types, streaming CSV processing, and structured error handling for messy real-world data.
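A std-only sketch of the validation idea (the post itself uses Serde): parsing a raw CSV line into a struct whose fields are checked on the way in, which is the same job a custom `Deserialize` impl performs. The `Measurement` struct and its fields are illustrative assumptions:

```rust
// Std-only sketch mirroring a Serde custom deserialiser: validate fields
// while parsing, so invalid rows never become values. Names are illustrative.
use std::str::FromStr;

#[derive(Debug)]
struct Measurement {
    subject_id: String,
    value: f64,
}

impl FromStr for Measurement {
    type Err = String;

    fn from_str(line: &str) -> Result<Self, Self::Err> {
        let mut fields = line.split(',');
        // An empty subject id is rejected, not silently accepted.
        let subject_id = fields
            .next()
            .filter(|s| !s.is_empty())
            .ok_or("missing subject id")?
            .to_string();
        let value: f64 = fields
            .next()
            .ok_or("missing value")?
            .trim()
            .parse()
            .map_err(|e| format!("bad value: {e}"))?;
        if !value.is_finite() {
            return Err("non-finite value".into());
        }
        Ok(Measurement { subject_id, value })
    }
}

fn main() {
    assert!("S001,4.2".parse::<Measurement>().is_ok());
    assert!(",4.2".parse::<Measurement>().is_err()); // empty id rejected
    assert!("S002,abc".parse::<Measurement>().is_err()); // unparseable value
}
```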