PhD

Hey there! 😊 Let’s dive into what my PhD is all about and why it’s super exciting!

I’m working on creating synthetic data for electronic health record research. Now, you might be wondering, ‘What on earth is electronic health record research?’ Well, it is research that studies the data captured during routine healthcare, like when you see your GP or attend hospital. This data is not collected primarily for research purposes, but it can still tell us a lot about health trends, causes, treatments, and outcomes.

If you have seen my other pages or heard me speak, you will know that I am a big believer in open science. I think it is important that we are transparent about our research. In the world of electronic health record research, we write a lot of code to do our research and then write papers about it. But sadly, the code that actually produces the results often stays hidden. To me, that’s a missed opportunity, because seeing the nitty-gritty details of how research is done at the code level is so valuable! It means people can check our work, see how we did it, and reuse our code for their own research, so we aren’t all reinventing the wheel.

So, why are we not sharing the code already? Well, some of us are, but the catch is that the code often isn’t very useful without the data it’s supposed to analyse. And here’s where it gets tricky: we can’t just release people’s medical data. I mean, who would want their GP records shared publicly? Definitely not me!

This is where synthetic data comes into play. My goal is to create high-quality synthetic data that mirrors the structure of real data but is completely made up. This way, we can run our code on the synthetic data and share everything openly alongside the study, with no privacy issues involved!
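
To give a flavour of what ‘mirrors the structure but is completely made up’ means, here is a tiny sketch in Python. It is a toy illustration, not my actual tooling: the field names, the list of diagnoses, and the `make_synthetic_patients` function are all hypothetical, and real synthetic data would need far more realistic relationships between the fields.

```python
import random
from datetime import date, timedelta

# Hypothetical diagnosis labels, invented purely for this illustration.
DIAGNOSES = ["asthma", "type_2_diabetes", "hypertension", "depression"]


def make_synthetic_patients(n, seed=0):
    """Return n made-up patient records that all share one fixed structure."""
    rng = random.Random(seed)  # seeded, so anyone can regenerate the same data
    start = date(2020, 1, 1)
    patients = []
    for patient_id in range(1, n + 1):
        # Every value is drawn at random; nothing comes from a real person.
        consultation = start + timedelta(days=rng.randrange(365))
        patients.append({
            "patient_id": patient_id,
            "year_of_birth": rng.randint(1930, 2010),
            "diagnosis": rng.choice(DIAGNOSES),
            "consultation_date": consultation.isoformat(),
        })
    return patients


patients = make_synthetic_patients(5, seed=42)
for patient in patients:
    print(patient)
```

Because the records have the same columns an analysis script would expect, the script can run end to end on them, and because the generator is seeded, anyone with the code can recreate exactly the same made-up dataset.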

I’m just starting out on this adventure, but there’s so much to look forward to. With advances in technology and the power of Rust (my favourite programming language), it should be possible to create super-complex synthetic datasets. My plan is to build a set of tools using Rust, Python and SurrealDB (a multi-model database) to generate synthetic data that can be used by researchers worldwide. I am delighted to be sponsored by SurrealDB for my PhD, and I can’t wait to see where this journey takes me!