<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Caroline Morton</title>
    <link>/</link>
    <description>Caroline Morton&#39;s personal blog</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 23 May 2026 00:00:00 &#43;0100</lastBuildDate>
    <atom:link href="/blog/tags/privacy-metrics/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Logs and tracing in Rust: From Terminal to Grafana</title>
      <link>/blog/rust-tracing-grafana-loki/</link>
      <pubDate>Sat, 23 May 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/rust-tracing-grafana-loki/</guid>
      <description>&lt;p&gt;This is the third part of a series on logging and tracing in Rust. The first part covered &lt;a href=&#34;../rust-tracing-logging-fundamentals&#34;&gt;the fundamentals&lt;/a&gt; and the second part covered &lt;a href=&#34;../rust-tracing-structured-fields-and-spans&#34;&gt;structured fields and spans&lt;/a&gt;. This series accompanies my &lt;a href=&#34;https://www.meetup.com/women-in-rust/events/313506048/?eventOrigin=group_upcoming_events&#34; target=&#34;_blank&#34;&gt;Women in Rust&lt;/a&gt; talk on this topic but should stand alone as a reference.&lt;/p&gt;
&lt;p&gt;In the first two posts, all of our logs went to the terminal. That is fine for development, but in production you need your logs somewhere you can search, filter and alert on them. Alerting in this context might mean pinging a Slack channel or emailing someone. In this post we are going to take everything we have built so far and ship it to Grafana via Loki. By the end, you will be able to query your structured fields and spans in a real dashboard. We are going to do all of this locally with Docker Compose, writing logs to a local file, so we don&amp;rsquo;t have to worry about cloud infrastructure or a full production pipeline. The aim is to show you the bare minimum needed to get your logs into Grafana and start querying them, not how to set up a production-ready logging pipeline. I hope that by the end of this post you have a basic Rust-to-Grafana pipeline running and can see the possibilities of having structured logs in a log aggregation tool. You can then take that knowledge and apply it to your own projects and production pipelines as you see fit.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Logs and tracing in Rust: Structured Fields and Spans</title>
      <link>/blog/rust-tracing-structured-fields-and-spans/</link>
      <pubDate>Fri, 22 May 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/rust-tracing-structured-fields-and-spans/</guid>
      <description>&lt;p&gt;This blog post is the second part of a three-part series on logging and tracing in Rust. The first part is &lt;a href=&#34;../rust-tracing-logging-fundamentals&#34;&gt;here&lt;/a&gt; and the third part is &lt;a href=&#34;../rust-tracing-grafana-loki&#34;&gt;here&lt;/a&gt;. This series accompanies my &lt;a href=&#34;https://www.meetup.com/women-in-rust/events/313506048/?eventOrigin=group_upcoming_events&#34; target=&#34;_blank&#34;&gt;Women in Rust&lt;/a&gt; talk on this topic but should stand alone as a reference. If you are brand new to logging and tracing, I would recommend starting with the first part, but if you are already familiar with the basics of using logging macros in Rust and want to learn about spans and fields, then this post is a good starting point. In this post we are going to cover what spans and fields are, how to use them in your Rust projects, and some best practices for using them effectively.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Logs and tracing in Rust: Fundamentals</title>
      <link>/blog/rust-tracing-logging-fundamentals/</link>
      <pubDate>Thu, 21 May 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/rust-tracing-logging-fundamentals/</guid>
      <description>&lt;p&gt;In this blog post I am going to introduce the concept of logging and how to use tracing in your Rust projects. In this first of three posts, we are going to cover what the purpose of logging is, when we should log, and how to set up a basic tracing subscriber. This series accompanies my &lt;a href=&#34;https://www.meetup.com/women-in-rust/events/313506048/?eventOrigin=group_upcoming_events&#34; target=&#34;_blank&#34;&gt;Women in Rust&lt;/a&gt; talk on this topic but should stand alone as a reference. A lot of the initial concepts I am going to talk about in this first post are not specific to Rust but speak more to the philosophy behind knowing exactly what is happening in your code at any particular time. There are some great references on this topic and I will try to reference them as I go along. The &lt;a href=&#34;../rust-tracing-structured-fields-and-spans&#34;&gt;second post&lt;/a&gt; is about how to use spans and structured fields to add more context to your logs and &lt;a href=&#34;../rust-tracing-grafana-loki&#34;&gt;the third post&lt;/a&gt; is about how to ship your logs to a log aggregation tool like Grafana Loki and query them once they are there.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why Synthetic Data is Good for Open Science</title>
      <link>/blog/synthetic-data-open-science/</link>
      <pubDate>Mon, 18 May 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/synthetic-data-open-science/</guid>
      <description>&lt;p&gt;I regularly give talks on reproducibility and open science at conferences and workshops, and it is one of my most requested topics. The statistics about reproducibility are pretty shocking, with some papers quoting &lt;a href=&#34;https://www.nature.com/articles/s41597-022-01143-6&#34; target=&#34;_blank&#34;&gt;up to 74% of studies being unable to be reproduced&lt;/a&gt;. This is obviously a huge problem for science and it is something that we need to address. I have done a lot of thinking about the barriers and blockers to reproducibility, and a key one is that researchers often do not share their code. This means it is hard to truly understand what they did and to check it. When I ask researchers why they don&amp;rsquo;t share their code, the most common answer I get is that they cannot share their data, so sharing the code feels pointless. Recently I have been thinking a lot about how synthetic data might be a solution to this problem. It can provide a way for researchers to share their code and still allow others to run it and check it, even if they cannot share the real data. In this blog post, I want to explore this idea in more detail and discuss how synthetic data can be a bridge between the need for reproducibility and the need to protect sensitive data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Is your Synthetic Data actually private?</title>
      <link>/blog/synthetic-data-privacy-metrics/</link>
      <pubDate>Mon, 11 May 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/synthetic-data-privacy-metrics/</guid>
      <description>&lt;p&gt;Healthcare organisations face mounting pressure to demonstrate privacy protection in their data applications. This has been partially driven by ambitious new regulatory frameworks that are reshaping the European health data landscape. The &lt;a href=&#34;https://www.european-health-data-space.com/&#34; target=&#34;_blank&#34;&gt;European Health Data Space (EHDS) Regulation&lt;/a&gt;, which entered into force in March 2025, establishes unprecedented standards for health data use across EU member states. Together with the &lt;a href=&#34;https://iapp.org/resources/article/uk-data-protection-reform-an-overview/&#34; target=&#34;_blank&#34;&gt;UK&amp;rsquo;s Data (Use and Access) Act 2025&lt;/a&gt;, these frameworks require organisations to meet &amp;ldquo;&lt;a href=&#34;https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space-regulation-ehds_en&#34; target=&#34;_blank&#34;&gt;the highest standards of privacy and cybersecurity&lt;/a&gt;&amp;rdquo; when working with health data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Applying the Open-Closed Principle to Research Code</title>
      <link>/blog/open-close-principle-research-code/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/open-close-principle-research-code/</guid>
      <description>&lt;p&gt;Let&amp;rsquo;s start with a scenario. You wrote some code for a research project last December. You ran the analysis, reported a number, wrote it up, and submitted the paper to a journal. Peer review came back a few months later with the usual list of points. One reviewer asked you to use a slightly different definition of the exposure variable as a sensitivity check. You opened the function, added an &lt;code&gt;elif&lt;/code&gt; branch for the new definition, tidied up the shared logic above it while you were there, and sent the revised paper back. Months later, your boss asks you to pull a subset of the project together for a conference poster. You rerun the original analysis. The number is different. You cannot tell whether the number in the submitted paper is right, or the number on your screen is right, or both, or neither. You feel the dread of &amp;ldquo;oh no&amp;rdquo;!&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Single Responsibility Principle for Scientists Who Write Code</title>
      <link>/blog/single-responsibility-principle-research-code/</link>
      <pubDate>Wed, 15 Apr 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/single-responsibility-principle-research-code/</guid>
      <description>&lt;p&gt;Most research code starts life as a script. You load a dataset, clean it, fit a model and write out the results. It works, but then your study evolves, and you end up with a function that looks like this.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;analyse_cohort&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;input_path&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; output_path&lt;span style=&#34;color:#1f2328&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Load data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;read_csv&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;input_path&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;columns &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;c&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;lower&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;strip&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;for&lt;/span&gt; c &lt;span style=&#34;color:#0550ae&#34;&gt;in&lt;/span&gt; df&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;columns&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;date_of_birth&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;to_datetime&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;date_of_birth&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;],&lt;/span&gt; dayfirst&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;index_date&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;to_datetime&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;index_date&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;],&lt;/span&gt; dayfirst&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;True&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Derive age&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;age&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;index_date&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;-&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;date_of_birth&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;])&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;dt&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;days &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;365.25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Clean and exclude&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;age&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;18&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;age&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;100&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;dropna&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;subset&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;systolic_bp&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;creatinine&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hba1c&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;systolic_bp&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;creatinine&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;~&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;region&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;isin&lt;span style=&#34;color:#1f2328&#34;&gt;([&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;unknown&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;])]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Create derived variables&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;egfr&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;round&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;175&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;((&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;creatinine&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;88.4&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;**&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;1.154&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;age&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;**&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;0.203&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;loc&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;sex&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;F&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;egfr&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;*=&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.742&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;ckd_stage&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; pd&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;cut&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;egfr&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        bins&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;15&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;45&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;60&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;90&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;float&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;inf&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        labels&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;5&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;4&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;3b&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;3a&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hba1c_mmol&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hba1c&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;2.15&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;0.0915&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;diabetic&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;hba1c_mmol&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Fit model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;outcome&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt; &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; df&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;died_within_year&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;astype&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;int&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    formula &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;outcome ~ age + C(sex) + egfr + systolic_bp + diabetic + C(region)&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    model &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; smf&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;logit&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;formula&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; data&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;fit&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;disp&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#57606a&#34;&gt;# Write everything out&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    summary &lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt; model&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;summary2&lt;span style=&#34;color:#1f2328&#34;&gt;()&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;tables&lt;span style=&#34;color:#1f2328&#34;&gt;[&lt;/span&gt;&lt;span style=&#34;color:#0550ae&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    summary&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;to_csv&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;output_path &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;model_coefficients.csv&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    df&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;to_csv&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;output_path &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;analysis_cohort.csv&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; index&lt;span style=&#34;color:#0550ae&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#cf222e&#34;&gt;False&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#cf222e&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#6639ba&#34;&gt;open&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;output_path &lt;span style=&#34;color:#0550ae&#34;&gt;/&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;exclusions.txt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;,&lt;/span&gt; &lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;w&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt; &lt;span style=&#34;color:#cf222e&#34;&gt;as&lt;/span&gt; f&lt;span style=&#34;color:#1f2328&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        f&lt;span style=&#34;color:#0550ae&#34;&gt;.&lt;/span&gt;write&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Final cohort size: &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;len&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#6639ba&#34;&gt;print&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;&amp;#34;Done. &lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#6639ba&#34;&gt;len&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;(&lt;/span&gt;df&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#0a3069&#34;&gt; patients included.&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#1f2328&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is the sort of function that grows organically. It loads data, cleans it, applies exclusions, creates clinical variables, fits a model, and writes the output. Nobody sets out to write a long and complicated function like this that does six different things. It starts out as a script and then somebody wraps it into a function so that it can be used on different datasets or cuts of the same data, perhaps a small dataset for development and then the full one. As the analysis develops, things get slotted in and suddenly this function is getting quite long and unwieldy.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Synthetic Data: The Complete Series</title>
      <link>/blog/synthetic-data-series/</link>
      <pubDate>Thu, 09 Apr 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/synthetic-data-series/</guid>
      <description>&lt;p&gt;I have spent the last year thinking and writing about synthetic data. The promise is straightforward: generate data that looks and behaves like the real thing, without exposing anything confidential. I write from the perspective of someone working in epidemiology and healthcare research, but the principles extend well beyond this area. Synthetic data is being used across finance, pharma, university research centres, and large corporate enterprises. What I have found drives adoption in almost every case is the same tension: organisations want more data to work with, but the real data is sensitive. In healthcare and finance that sensitivity is about protecting individuals from being identified; after all, what could be more sensitive than your health record or bank statements? In commercial settings it is about protecting data that is valuable to competitors.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Representativeness in Synthetic Data: What It Means and How to Measure It</title>
      <link>/blog/representativeness-in-synthetic-data/</link>
      <pubDate>Tue, 07 Apr 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/representativeness-in-synthetic-data/</guid>
      <description>&lt;p&gt;In &lt;a href=&#34;/blog/what-is-synthetic-data/&#34;



 


&gt;my article, introduction to synthetic data&lt;/a&gt;, I described how GAN-generated records had been used to &lt;a href=&#34;https://www.mdpi.com/2071-1050/15/18/13690&#34;




 target=&#34;_blank&#34;
 


&gt;predict patient length of stay in German hospitals&lt;/a&gt;. It’s a compelling example of how the synthetic data we generate can serve as a credible proxy for real patient records. But how do we know that the synthetic datasets we generate actually represent the patient population they are modelling?&lt;/p&gt;
&lt;p&gt;Consider the German study above. If the data subtly underrepresented elderly patients with multiple comorbidities (a group who typically account for the longest and most resource-intensive stays), the model could look quite accurate on paper, while producing systematically skewed predictions in practice. The difference between ‘it looks like real data to some degree’ and ‘this actually represents the population’ is called &lt;em&gt;representativeness&lt;/em&gt;, and is what this article is all about.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why Rust for Data-Intensive Applications</title>
      <link>/blog/why-rust-for-data-intensive-applications/</link>
      <pubDate>Wed, 01 Apr 2026 00:00:00 &#43;0100</pubDate>
      <guid>/blog/why-rust-for-data-intensive-applications/</guid>
      <description>&lt;p&gt;This is a prologue to a series on Rust for data-intensive research applications - written after the first three parts, which is perhaps the wrong order, but reflects how the thinking actually developed. I wanted to write something that introduces the series as a whole, explains the motivation behind it, and I hope is accessible to researchers who may not be familiar with Rust. It is also, in the spirit of Austin Kleon&amp;rsquo;s &lt;a href=&#34;https://www.amazon.co.uk/Show-Your-Work-Getting-Discovered/dp/076117897X&#34;




 target=&#34;_blank&#34;
 


&gt;Show Your Work!&lt;/a&gt;, an attempt to share the process of thinking rather than just the conclusions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Your Errors Are Data Too</title>
      <link>/blog/rust-errors-data-quality/</link>
      <pubDate>Mon, 23 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/rust-errors-data-quality/</guid>
      <description>&lt;p&gt;This is the third post in the Rust for Data-Intensive Applications series. The &lt;a href=&#34;/blog/rust-serde-data-pipelines/&#34;



 


&gt;Serde post&lt;/a&gt; covered moving the validation boundary to the point of ingestion. The &lt;a href=&#34;/blog/rust-newtypes-domain-knowledge/&#34;



 


&gt;newtypes post&lt;/a&gt; covered encoding domain knowledge in types so the compiler enforces it. This post is about what happens when things go wrong at either of those boundaries, and why capturing that information carefully is as important as the valid records themselves.&lt;/p&gt;
&lt;h2 id=&#34;errors-in-research-code-are-different&#34;&gt;Errors in research code are different&lt;/h2&gt;&lt;p&gt;In application development, an error means something went wrong that needs fixing. Your code threw an exception, your service returned a 500, your database query failed. The goal is to find the error, understand it, and eliminate it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why Use Newtypes? Encoding Domain Knowledge in the Type System</title>
      <link>/blog/rust-newtype-domain-knowledge/</link>
      <pubDate>Mon, 16 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/rust-newtype-domain-knowledge/</guid>
      <description>&lt;p&gt;I have spent a lot of time debugging research pipelines, and the bugs that scare me most are not the ones that crash loudly. They are the ones that produce plausible-looking output and let you carry on for weeks before anyone notices something is wrong. Wrong-order bugs are the worst offender. You have a function that takes several parameters of the same primitive type, the caller passes them in the wrong order, the compiler says nothing, and the output sits just inside the range of values you would expect to see in real data. By the time you find it, if you find it, you have already done a lot of work on incorrect results.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Serde Rust: Data Serialisation for Data Scientists</title>
      <link>/blog/rust-serde-data-pipelines/</link>
      <pubDate>Sun, 08 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/rust-serde-data-pipelines/</guid>
      <description>&lt;p&gt;I have a confession to make: I love &lt;a href=&#34;https://serde.rs/&#34;




 target=&#34;_blank&#34;
 


&gt;Serde&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Serde, for those not in the know, is the Rust ecosystem&amp;rsquo;s workhorse for serialisation and deserialisation, but for data pipelines I find it more helpful to think of it as something slightly different: a schema enforcement mechanism and a validation boundary.&lt;/p&gt;
&lt;p&gt;That distinction matters. In many data pipelines, validation is treated as something that happens later. You ingest the data, clean it, transform it, and only then check whether it actually matches the assumptions your code is making. By the time you discover a problem, you may already have done quite a lot of work on data that is not what you thought it was.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How Synthetic Data Is Used in Healthcare, Research and Beyond</title>
      <link>/blog/synthetic-data-use-cases-healthcare-research/</link>
      <pubDate>Mon, 02 Mar 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/synthetic-data-use-cases-healthcare-research/</guid>
      <description>&lt;p&gt;In a previous post, I introduced &lt;a href=&#34;/blog/what-is-synthetic-data&#34;



 


&gt;what synthetic data is&lt;/a&gt; and why it is generating so much interest in healthcare. In this article I want to talk about the practical use cases, both in healthcare and in other industries, to give a better sense of how synthetic data is being used in the real world. This is not an exhaustive list, but my aim is to show the breadth of applications. I often get asked why I choose to focus on this area, and my hope is that this article will show why it is such an exciting space to be working in.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Accidental Functional Programming in Rust (From an Epidemiologist&#39;s Perspective)</title>
      <link>/blog/functional-programming-rust-scientific-software/</link>
      <pubDate>Mon, 09 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/functional-programming-rust-scientific-software/</guid>
      <description>&lt;p&gt;This post started as a talk I gave at &lt;a href=&#34;https://www.lambda.world/&#34;




 target=&#34;_blank&#34;
 


&gt;Lambda World 2025&lt;/a&gt;. It&amp;rsquo;s the kind of conference where you end up in a two-hour conversation about Scala data streams despite not knowing Scala. If you&amp;rsquo;d rather watch than read, the video is embedded below.&lt;/p&gt;
&lt;iframe
  style=&#34;center&#34;
  width=&#34;850&#34;
  height=&#34;450&#34;
  src=&#34;https://www.youtube.com/embed/e-7yHc7OqcA&#34;
  title=&#34;YouTube video player&#34;
  frameborder=&#34;0&#34;
  allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34;
  referrerpolicy=&#34;strict-origin-when-cross-origin&#34;
  allowfullscreen&gt;
&lt;/iframe&gt;
&lt;p&gt;Side note: ignore the worryingly smiley face in the thumbnail. Someone at the conference decided to run my headshot through an AI filter to increase its pixels and I ended up looking like a Pixar character. I don&amp;rsquo;t know why they thought that was a good idea, but here we are!&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Create a Codelist</title>
      <link>/blog/how-to-make-codelist/</link>
      <pubDate>Mon, 02 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/how-to-make-codelist/</guid>
      <description>&lt;p&gt;This blog post is the second part of a two-part series that accompanies a lecture I am giving on Codelists as part of the &lt;a href=&#34;https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/health-data-in-practice-msc/&#34;




 target=&#34;_blank&#34;
 


&gt;Health Data in Practice MSc&lt;/a&gt; at Queen Mary University of London.&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&#34;/blog/what-is-a-codelist&#34;



 


&gt;previous blog&lt;/a&gt;, we looked at what a codelist is and why they are both important and difficult to create. In this post, we are going to look at some of the methods for creating codelists for your study.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is a Codelist?</title>
      <link>/blog/what-is-a-codelist/</link>
      <pubDate>Sun, 01 Feb 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/what-is-a-codelist/</guid>
      <description>&lt;p&gt;This blog post accompanies a lecture that I am giving on Codelists as part of the &lt;a href=&#34;https://www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/health-data-in-practice-msc/&#34;




 target=&#34;_blank&#34;
 


&gt;Health Data in Practice MSc&lt;/a&gt; at Queen Mary University of London.&lt;/p&gt;
&lt;p&gt;In this post, I am going to introduce what a codelist is, why they are needed and how they are used in health data research. There will be a follow-up post where I will walk through some of the ways that codelists can be created or sourced.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Error Handling in Rust: anyhow and thiserror</title>
      <link>/blog/rust-error-handling-anyhow-thiserror/</link>
      <pubDate>Sun, 25 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/rust-error-handling-anyhow-thiserror/</guid>
      <description>&lt;p&gt;This is part two of my error handling series for Women in Rust (here&amp;rsquo;s &lt;a href=&#34;/blog/rust-error-handling-fundamentals&#34;



 


&gt;part one&lt;/a&gt;). It stands alone as a reference, but we&amp;rsquo;ll build on examples from the previous post.&lt;/p&gt;
&lt;p&gt;Last time, we hit a wall: the &lt;code&gt;?&lt;/code&gt; operator made our code clean but stripped out our context. This post fixes that with two crates - anyhow for quick, contextual errors, and thiserror for when you need callers to handle errors differently.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Error Handling in Rust: Fundamentals</title>
      <link>/blog/rust-error-handling-fundamentals/</link>
      <pubDate>Mon, 19 Jan 2026 00:00:00 &#43;0000</pubDate>
      <guid>/blog/rust-error-handling-fundamentals/</guid>
      <description>&lt;p&gt;Rust&amp;rsquo;s approach to error handling is one of its most distinctive features, and one of the most confusing for newcomers. In this first of two posts, we&amp;rsquo;ll cover the core concepts: recoverable vs unrecoverable errors, when to panic, and how to propagate errors effectively. This series accompanies my Women in Rust talk on the topic, but stands alone as a reference.&lt;/p&gt;
&lt;p&gt;Before we get into the mechanics, we need to understand the fundamental distinction that shapes all error handling in Rust.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Women in Rust 2025</title>
      <link>/blog/women-in-rust-2025/</link>
      <pubDate>Fri, 19 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>/blog/women-in-rust-2025/</guid>
      <description>&lt;p&gt;What a fantastic year 2025 has been for the &lt;a href=&#34;https://www.meetup.com/women-in-rust/&#34;




 target=&#34;_blank&#34;
 


&gt;Women in Rust&lt;/a&gt; community!&lt;/p&gt;
&lt;p&gt;Thank you to everyone who has contributed to making this year so special. This was our first full year of events after launching in Spring 2024, and the enthusiasm and engagement from the amazing women in our community has been truly inspiring. I am going to take us back through the year to highlight some of the wonderful moments we shared together. We had 18 events this year with a mix of online and in-person formats. Here are some of the highlights:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Multiple Imputation and Perturbation: Why They&#39;re Not Built for Synthetic Data</title>
      <link>/blog/multiple-imputation-and-synthetic-data/</link>
      <pubDate>Thu, 18 Dec 2025 00:00:00 &#43;0000</pubDate>
      <guid>/blog/multiple-imputation-and-synthetic-data/</guid>
      <description>&lt;p&gt;Synthetic data is often described as a solution to the limited data problem, and as we have discussed in previous blogs on &lt;a href=&#34;/blog/?tag=synthetic-data&#34;



 


&gt;Synthetic Data&lt;/a&gt;, it can be a powerful tool for creating larger datasets that model real data while protecting privacy. When it comes to generating synthetic data, there are many methods available, each with its own strengths and weaknesses. Some of these methods are designed specifically for generating synthetic data, while others are not.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What are GANs and how can they generate synthetic data?</title>
      <link>/blog/what-are-gans/</link>
      <pubDate>Mon, 24 Nov 2025 00:00:00 &#43;0000</pubDate>
      <guid>/blog/what-are-gans/</guid>
      <description>&lt;p&gt;This blog is going to introduce the concept of Generative Adversarial Networks (GANs) and explore how they can be used to generate synthetic data. This is particularly important in the case of generating synthetic healthcare data as many of the published research papers out there focus on this application.&lt;/p&gt;
&lt;h3 id=&#34;what-are-gans&#34;&gt;What are GANs?&lt;/h3&gt;&lt;p&gt;Generative Adversarial Networks (GANs) are a class of machine learning models that consist of two neural networks competing against each other. The first network, called the generator, creates synthetic data samples, while the second network, called the discriminator, evaluates whether the samples are real (from the training data) or fake (produced by the generator). This competition is inherently adversarial, hence the name. The generator aims to produce data that is indistinguishable from real data, while the discriminator strives to become better at identifying fake data. Through this process, both networks improve over time, leading to the generation of highly realistic synthetic data. Training continues until the generator produces synthetic healthcare data so convincing that the discriminator cannot reliably identify it as artificial. The result is a trained model capable of generating unlimited synthetic patient records that maintain the statistical properties, correlations, and clinical patterns found in real datasets while containing no actual patient information.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Clinic to Code to Care</title>
      <link>/blog/clinic-to-code-to-care/</link>
      <pubDate>Sun, 26 Oct 2025 00:00:00 &#43;0100</pubDate>
      <guid>/blog/clinic-to-code-to-care/</guid>
      <description>&lt;p&gt;This blog is an adaptation of a talk that &lt;a href=&#34;https://www.codingwithsteph.com/about&#34;




 target=&#34;_blank&#34;
 


&gt;Steph Jones&lt;/a&gt; and I gave at Women in Data and AI in October 2025. It explores the journey of information from a patient in clinic to how that information is coded for research and ultimately ends up informing statistical and machine learning models that can help improve patient care. I hope it provides a useful overview of the process and highlights some of the challenges and opportunities along the way.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is Synthetic Data and Why Does it Matter?</title>
      <link>/blog/what-is-synthetic-data/</link>
      <pubDate>Sat, 11 Oct 2025 00:00:00 &#43;0100</pubDate>
      <guid>/blog/what-is-synthetic-data/</guid>
      <description>&lt;p&gt;Healthcare research is facing a fundamental paradox. The demand for comprehensive datasets to drive medical innovation has never been greater, yet access to real patient data remains severely restricted by privacy laws and ethical constraints. In this blog post, I will explore how synthetic data is emerging as a powerful solution to this challenge, enabling researchers to access high-quality datasets without compromising patient privacy.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;../images/lockdown-data.jpg&#34; alt=&#34;Data Privacy&#34; style=&#34;max-width: 100%; height: auto; display: block; margin: 2rem auto;&#34;&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why the Adapter Pattern is King in Health Data</title>
      <link>/blog/adapter-pattern-for-health/</link>
      <pubDate>Sun, 09 Mar 2025 00:00:00 &#43;0000</pubDate>
      <guid>/blog/adapter-pattern-for-health/</guid>
      <description>&lt;p&gt;Healthcare data is messy. If you&amp;rsquo;ve ever worked in a clinical setting, you know the frustration of logging into multiple systems just to piece together a patient&amp;rsquo;s history. Pharmacy records don’t sync with GP notes, hospital systems don’t talk to each other, and critical information gets lost in the gaps. In the worst cases, paper documentation is still part of the process.&lt;/p&gt;
&lt;p&gt;Recently, a close friend went to a midwife appointment, only to be offered a whooping cough and flu vaccine she had already received two weeks earlier at her GP. The midwife’s system didn’t communicate with the GP’s records. If she hadn’t remembered getting the vaccines, she might have been given an unnecessary second dose. This kind of duplication doesn’t just waste resources; it can lead to medical errors. And this is 2025.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Finding Similarity with Vector Search: A Beginner&#39;s Guide</title>
      <link>/blog/vector-search-and-string/</link>
      <pubDate>Fri, 10 Jan 2025 00:00:00 &#43;0000</pubDate>
      <guid>/blog/vector-search-and-string/</guid>
      <description>&lt;p&gt;Have you ever wondered how to find someone or something that’s most like you, whether it’s a roommate, someone who shares your Christmas traditions, or even a celebrity? Vector search is the answer. It’s a modern way to find matches based on multiple preferences at once, and tools like SurrealDB make it incredibly easy to use. Let’s explore what vector search is and how it works, step by step.&lt;/p&gt;
&lt;h2 id=&#34;what-is-vector-search&#34;&gt;What is Vector Search?&lt;/h2&gt;&lt;p&gt;Vector Search is a method to find the most similar items in a dataset by considering multiple dimensions (or preferences) simultaneously. Unlike traditional database queries, which filter data based on specific conditions, vector search calculates the &amp;ldquo;distance&amp;rdquo; between items in a multi-dimensional space to find the closest match. The easiest way to explain this is with an example.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Code Review for Research Code</title>
      <link>/blog/code-review-for-research-code/</link>
      <pubDate>Mon, 23 Sep 2024 00:00:00 &#43;0100</pubDate>
      <guid>/blog/code-review-for-research-code/</guid>
      <description>&lt;p&gt;This blog is a written version of a talk I have given a few times now on how to conduct a code review for research code. I am writing it up to provide a reference for those who have attended the talk and for those who are interested in learning more about code review for research code.&lt;/p&gt;
&lt;p&gt;This guide is intended to be a high-level overview of what a code review is, why you should do it, and how to do it. It does require some knowledge of GitHub, but I will try to explain things as I go along. If you have any questions, please feel free to ask me.&lt;/p&gt;</description>
    </item>
    <item>
      <title>SNOMED and friends</title>
      <link>/blog/what-is-snomed/</link>
      <pubDate>Sat, 31 Aug 2024 00:00:00 &#43;0100</pubDate>
      <guid>/blog/what-is-snomed/</guid>
      <description>&lt;p&gt;This blog is a short introduction to SNOMED CT from my perspective as someone who has interacted with SNOMED in a variety of different ways. I am going to try to reflect a variety of points of view from my time as a clinician, an epidemiologist using electronic health records for research and a software engineer, trying to create realistic synthetic data.&lt;/p&gt;
&lt;h3 id=&#34;what-is-snomed-ct&#34;&gt;What is SNOMED CT?&lt;/h3&gt;&lt;p&gt;SNOMED CT stands for Systematized Nomenclature of Medicine - Clinical Terms, and it is designed to be a comprehensive multi-lingual set of clinical healthcare terminology that can be used to record and exchange clinical health information.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An Introduction to Electronic Health Records</title>
      <link>/blog/what-is-an-ehr/</link>
      <pubDate>Thu, 29 Aug 2024 00:00:00 &#43;0100</pubDate>
      <guid>/blog/what-is-an-ehr/</guid>
      <description>&lt;p&gt;A question that I get asked a lot is &amp;ldquo;What is an Electronic Health Record (EHR)?&amp;rdquo; This is a great question, and I hope to answer it in this blog post. This will be UK-focussed, as that is where I am based and have experience, but the principles are the same in other countries, even if the systems are slightly different. This is a very high-level overview and I will go into more detail in future posts. The aim is to provide a basic understanding of what an EHR is and how it is used in research and clinical practice.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
