What is a Codelist?

Author

Caroline Morton

Date

February 1, 2026

This blog post accompanies a lecture that I am giving on Codelists as part of the Health Data in Practice MSc at Queen Mary University of London.

In this post, I am going to introduce what a codelist is, why they are needed and how they are used in health data research. There will be a follow-up post where I will walk through some of the ways that codelists can be created or sourced.

Background: Health Data and Coding Systems

When you visit a healthcare professional, whether it’s your GP or a hospital specialist, the information about your health is recorded in electronic health records (EHRs). However, this information is not stored only as free text: much of it is also recorded using standardised coding systems. Specifically in the UK, we use SNOMED CT in primary care (GP practices) and ICD-10 and OPCS-4 in hospitals. I have previously written about SNOMED CT here so I won’t go into detail again but in brief:

All of these coding systems provide a standardised way to represent clinical concepts (e.g. diagnoses, symptoms, procedures) using unique alphanumeric codes, arranged in a tree hierarchy. For example, if we look in SNOMED CT Browser, “Cough” is represented by the code 49727002 and is a child of Findings of respiratory function (finding) (365852007), and is a parent of 40 more specific types of cough (e.g. Dry cough (finding) 11833005). We can imagine that this can be represented as a tree structure:

- Findings of respiratory function (finding) (365852007)
    - Cough (49727002)
        - Dry cough (finding) (11833005)
        - Nocturnal cough (finding) (418470008)
        - Cough with fever (finding) (135883003)
        - ... (37 more child codes)

ICD-10 and OPCS-4 have similar hierarchical structures but different coding schemes.
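
To make the hierarchy concrete, here is a minimal Python sketch that stores the fragment of the tree above as a parent-to-children mapping and walks it to collect every code beneath a given concept. The names `CHILDREN` and `descendants` are chosen for illustration and are not part of any SNOMED CT tooling, and only the codes named in this post are included.

```python
# Fragment of the SNOMED CT hierarchy shown above, stored as parent -> children.
# Only the codes named in this post are included; the real tree is much larger.
CHILDREN = {
    "365852007": ["49727002"],                            # Findings of respiratory function -> Cough
    "49727002": ["11833005", "418470008", "135883003"],   # Cough -> dry, nocturnal, with fever
}


def descendants(code, children=CHILDREN):
    """Return every code below `code` in the hierarchy (excluding `code` itself)."""
    found = []
    stack = list(children.get(code, []))
    while stack:
        current = stack.pop()
        found.append(current)
        # Queue up this code's own children so the whole subtree is visited.
        stack.extend(children.get(current, []))
    return found


# "Cough" plus everything beneath it in this fragment.
cough_and_children = ["49727002"] + descendants("49727002")
print(sorted(cough_and_children))
```

Walking down the tree like this is one common starting point when drafting a codelist, although the result still needs reviewing code by code.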

How are Codes Used in Healthcare?

In the UK, when you visit your GP or a hospital, the healthcare professional records information about your health using these coding systems but they are not usually thinking about the codes themselves, or even aware of them. Instead, they see clinical terms (e.g. “Cough”, “Type 2 Diabetes Mellitus”) in their electronic health record software and select the appropriate term from a list. The software then automatically stores the corresponding code in the background. It works a bit like autocomplete on your phone - as you start typing a word, it suggests possible completions, and you select the one that matches what you want to say. From a healthcare professional’s perspective, they are simply recording your symptoms, diagnoses, and treatments using familiar clinical terms, and this is a bit of a time saver for them too as they can select from a list rather than typing everything out in full.

Let’s take an example. Imagine you have a sore throat and fever, so you visit your GP. You tell your story of fever, sore throat, and difficulty swallowing. Your GP listens to your symptoms, examines your chest and throat, and decides that you likely have a bacterial throat infection, and prescribes a course of penicillin. As they type up their notes, they see terms like “Sore throat”, “Fever”, “Bacterial tonsillitis”, and “Penicillin” in their EHR software.

Your appointment can be summarised like this:

| Symptom/Diagnosis/Treatment | Code |
| --- | --- |
| Sore throat | 267102003 |
| Fever | 386661006 |
| Dry cough | 11833005 |
| Chest clear on auscultation | 48348007 |
| Temperature | 703421000 |
| Exudate on tonsils (finding) | 301791008 |
| Bacterial tonsillitis | 703468005 |
| Phenoxymethylpenicillin | 372725003 |

In this example, your GP has recorded your symptoms, diagnosis, and treatment using standardised codes from SNOMED CT. This coded data is then stored in your electronic health record for future reference.
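
From a data point of view, the appointment above ends up looking something like a handful of rows, one per coded entry. The sketch below is only an illustration of that shape: the `CodedEvent` class, the patient ID and the date are all made up for this example, and real EHR systems store far more detail than this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEvent:
    patient_id: int
    event_date: date
    code: str  # SNOMED CT concept ID, stored automatically by the software
    term: str  # the human-readable term the clinician actually selected


# Hypothetical patient ID and consultation date; codes and terms are from the table above.
visit = date(2026, 1, 15)
appointment = [
    CodedEvent(1, visit, "267102003", "Sore throat"),
    CodedEvent(1, visit, "386661006", "Fever"),
    CodedEvent(1, visit, "11833005", "Dry cough"),
    CodedEvent(1, visit, "48348007", "Chest clear on auscultation"),
    CodedEvent(1, visit, "703421000", "Temperature"),
    CodedEvent(1, visit, "301791008", "Exudate on tonsils (finding)"),
    CodedEvent(1, visit, "703468005", "Bacterial tonsillitis"),
    CodedEvent(1, visit, "372725003", "Phenoxymethylpenicillin"),
]

# Researchers typically work with the codes, not the terms.
codes_recorded = [event.code for event in appointment]
print(codes_recorded)
```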

At this point, it is worth noting that free text is also recorded but this is not standardised and therefore much harder to use for research purposes. See my previous post on Clinic to Code to Care for more information on how this can be used in machine learning projects.

Why is this useful?

Using standardised coding systems has several advantages:

  • It ensures consistency and accuracy in recording clinical information across different healthcare settings and professionals. For example, if records were just free text, finding all the people with “tonsillitis” would be tricky: some GPs might write “bacterial tonsillitis”, others “tonsil infection”, and others “tonsillitis”, never mind all the possible typos and misspellings. Using codes ensures that everyone is on the same page.
  • It facilitates data sharing and interoperability between different healthcare systems and providers. For example, if you visit a new GP practice, they can access your previous records and understand your medical history because the codes are standardised.
  • It enables efficient data analysis and research by providing a structured way to extract and analyse clinical information. We are going to be talking about this in more detail shortly!

My main point here is that it is useful across multiple different areas from research to the logistics of running a healthcare system, where patients might move between different providers and settings.
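
The consistency point is easy to demonstrate with a toy example. In the sketch below, three hypothetical records describe the same diagnosis in different free-text wording (one with a typo), but all carry the bacterial tonsillitis code from the table earlier. The records themselves are invented for illustration.

```python
# Hypothetical records: the same diagnosis written three ways in free text,
# but stored under one code (703468005, bacterial tonsillitis in the table above).
records = [
    {"patient_id": 1, "free_text": "bacterial tonsillitis", "code": "703468005"},
    {"patient_id": 2, "free_text": "tonsil infection", "code": "703468005"},
    {"patient_id": 3, "free_text": "tonsilitis", "code": "703468005"},  # misspelt
]

# A naive text search only finds the record that uses the exact word.
found_by_text = [r["patient_id"] for r in records if "tonsillitis" in r["free_text"].lower()]

# A search on the code finds all three.
found_by_code = [r["patient_id"] for r in records if r["code"] == "703468005"]

print(found_by_text)  # [1]
print(found_by_code)  # [1, 2, 3]
```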

What is a Codelist?

A codelist is a list of codes that represent a specific clinical concept. It is a way of collapsing a complex clinical idea into a set of discrete codes that can be used to represent that a disease state, symptom, or treatment is present in a patient’s record.

For example, if we wanted to find all the patients who had a diagnosis of tonsillitis, we would need to create a codelist that includes all the relevant codes for tonsillitis. This might include codes for bacterial tonsillitis, viral tonsillitis, recurrent tonsillitis, and so on. We don’t know which specific codes a GP might have used when recording tonsillitis, so we need to include all the possible codes that could represent that yes, this patient has tonsillitis.

Without a codelist, we would not be able to reliably identify all the patients with tonsillitis in our dataset.
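
In code, applying a codelist is usually little more than a membership check. The sketch below uses the cough codes mentioned earlier rather than tonsillitis, simply because those are the codes this post has already listed; the patients and events are hypothetical.

```python
# A tiny "cough" codelist built from the codes mentioned earlier in this post.
# A real cough codelist would contain many more codes.
cough_codelist = {"49727002", "11833005", "418470008", "135883003"}

# Hypothetical coded events as (patient_id, code) pairs.
events = [
    (1, "11833005"),   # dry cough
    (1, "386661006"),  # fever
    (2, "386661006"),  # fever only, so no cough recorded
    (3, "418470008"),  # nocturnal cough
]

# A patient "has cough" if any of their events carries a code in the codelist.
patients_with_cough = {patient_id for patient_id, code in events if code in cough_codelist}
print(sorted(patients_with_cough))  # [1, 3]
```

Everything interesting is in the contents of `cough_codelist`: leave a code out and the patients recorded with it silently disappear from the result.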

Why is this harder than it sounds?

A single condition might have dozens or hundreds of relevant codes. For example, “Cough” alone has 40 child codes in SNOMED CT - dry cough, nocturnal cough, cough with fever, and so on. Clinicians choose codes based on what autocomplete offers them, not what researchers would find logical. Historical data contains retired codes that still need capturing. Your codelist decisions directly determine which patients end up in your study.

Clinical Nuance

Creating a codelist is not as simple as just picking a few codes that seem relevant. It typically requires some inbuilt clinical knowledge to understand what is relevant to include. For example, if you were creating a codelist for “Diabetes”, you would need to know that you should include codes for “Type 1 Diabetes Mellitus”, “Type 2 Diabetes Mellitus”, “Gestational Diabetes”, and so on. You would also need to know that you should include codes for complications of diabetes, such as “Diabetic Retinopathy” and “Diabetic Nephropathy”, as these patients also have diabetes. You can imagine if a patient had a diagnosis of diabetes from 30 years ago, and you can only get the last 10 years of their records, you might miss them as diabetic if you only included codes for “Diabetes” and not its complications or routine screening codes. Therefore you would need to have a good understanding of the clinical context to create a comprehensive codelist, and in this example, include things like “Diabetic foot examination” or “HbA1c measurement” codes. This is one reason why codelist creation is often a multidisciplinary effort, involving clinicians, epidemiologists, and data scientists.
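
The lookback problem described above can be sketched in a few lines. Note that the code labels here (`T2DM_DIAGNOSIS` and so on) are readable placeholders, not real SNOMED CT codes, and the patient and dates are invented for illustration.

```python
from datetime import date

# Placeholder labels standing in for real codes; these are NOT SNOMED CT codes.
NARROW = {"T1DM_DIAGNOSIS", "T2DM_DIAGNOSIS"}
BROAD = NARROW | {"DIABETIC_RETINOPATHY", "DIABETIC_FOOT_EXAM"}

# One hypothetical patient, diagnosed long before the data we can access begins.
events = [
    (date(1996, 3, 1), "T2DM_DIAGNOSIS"),
    (date(2018, 6, 12), "DIABETIC_FOOT_EXAM"),
    (date(2021, 9, 30), "DIABETIC_RETINOPATHY"),
]
window_start = date(2016, 1, 1)  # assume only records from this date onwards are available


def has_match(events, codelist, start):
    """True if any event on or after `start` carries a code in the codelist."""
    return any(event_date >= start and code in codelist for event_date, code in events)


print(has_match(events, NARROW, window_start))  # False: the diagnosis code is outside the window
print(has_match(events, BROAD, window_start))   # True: picked up via the complication and review codes
```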

Context of Recording

It also requires understanding of what motivates certain behaviours in recording in certain conditions. For example, for years, recording depression in the GP record triggered certain pathways in QOF (Quality and Outcomes Framework) payments, where certain actions had to be taken to ensure the patient was being treated appropriately. GPs did not always feel that these actions were clinically necessary - for example, maybe the patient didn’t need to be reviewed within the next two weeks, particularly if the depression was mild, situational and the patient was already receiving support elsewhere. However, to ensure that they were meeting QOF requirements, which affected their payments, GPs were incentivised to follow a strict pathway of care that didn’t always align with clinical judgement. Therefore, a lot of GPs would have recorded this patient not as depression but rather as low mood which did not trigger the same QOF requirements. If you were to create a codelist for depression that only included codes for depression, you would miss all these patients who were recorded as low mood but were in fact mildly depressed.

Mapping Between Coding Systems

Another challenge is that we have multiple coding systems in use, and sometimes we need to map between them. We also have older coding systems (such as Read codes). We might take a codelist that someone else created in an older coding system and need to map it across to SNOMED CT. This is not always straightforward, as there may not be a one-to-one mapping between codes in different systems. Some codes may map to multiple codes, or there may be no equivalent code in the other system. This requires careful consideration and sometimes clinical input to ensure that the mapping is accurate and comprehensive. SNOMED CT is very granular whereas other coding systems like ICD-10 are coarser, so you might lose some granularity when mapping from SNOMED CT to ICD-10. We will look at this in more detail in the next post when we work through building a codelist from scratch.
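
As a rough sketch of why mapping needs human review, the example below translates a codelist through a small lookup table and flags every code that does not map one-to-one. The mapping table and all of the codes in it (`OLD_A`, `NEW_1` and so on) are placeholders invented for this example, not real Read or SNOMED CT codes, and `map_codelist` is a hypothetical helper rather than an existing tool.

```python
# Hypothetical mapping from an older coding system to a newer one.
# All codes are placeholders, not real Read or SNOMED CT codes.
MAPPING = {
    "OLD_A": ["NEW_1"],           # clean one-to-one mapping
    "OLD_B": ["NEW_2", "NEW_3"],  # one old code maps to several new codes
    "OLD_C": [],                  # no equivalent in the new system
}


def map_codelist(old_codes, mapping):
    """Map a codelist and flag every code that is not a clean one-to-one match."""
    mapped = set()
    needs_review = []
    for code in old_codes:
        targets = mapping.get(code, [])
        if len(targets) != 1:
            needs_review.append(code)
        mapped.update(targets)
    return mapped, needs_review


new_codelist, review = map_codelist(["OLD_A", "OLD_B", "OLD_C"], MAPPING)
print(sorted(new_codelist))  # ['NEW_1', 'NEW_2', 'NEW_3']
print(review)                # ['OLD_B', 'OLD_C']
```

The codes in `review` are the ones where a clinician or experienced researcher would need to decide what to include.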

Research Implications

As we have just seen, knocking out a codelist is not a trivial task. The choices you make when creating your codelist can have significant implications for your research findings. It is my opinion that codelist choices are scientific decisions that affect reproducibility. Two researchers studying “cough” could get different populations if their codelists differ significantly.
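
A small sketch shows how this plays out. Two hypothetical researchers define “cough” using different subsets of the codes mentioned earlier, then apply their codelists to the same invented set of patients.

```python
# Researcher A uses only the parent code and dry cough; Researcher B uses all four codes.
codelist_a = {"49727002", "11833005"}
codelist_b = {"49727002", "11833005", "418470008", "135883003"}

# Hypothetical patients and the codes found in their records.
patient_codes = {
    1: {"49727002"},               # cough
    2: {"418470008"},              # nocturnal cough
    3: {"135883003", "11833005"},  # cough with fever, dry cough
    4: {"386661006"},              # fever only
}


def select_patients(codelist):
    """Return the IDs of patients with at least one code in the codelist."""
    return {pid for pid, codes in patient_codes.items() if codes & codelist}


population_a = select_patients(codelist_a)
population_b = select_patients(codelist_b)

print(sorted(codelist_b - codelist_a))      # ['135883003', '418470008']
print(sorted(population_b - population_a))  # [2]
```

Patient 2 is in one study and not the other, purely because of a codelist decision.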

We can see that it would be prudent to have a systematic approach to codelist creation, involving clinical experts, and where possible to use existing codelists that have been validated in previous studies. Why spend hours creating a cough codelist from scratch when someone else has already done it and published it from their research? I think it is uncontroversial to say that reusing existing codelists is a good practice, provided they are relevant to your research question and population.

One of the major issues here, though, is trust - how do you know that the codelist that you are sharing with me is valid? This is where we need to think about not only sharing codelists but also sharing the methodology behind how they were created. This includes the original purpose of the codelist, the clinical rationale for including/excluding certain codes, and any validation that was performed. Without this, we are left to take the researcher’s word for it that their codelist is valid, which is not ideal for reproducibility, and inevitably forces researchers to recreate codelists from scratch rather than reusing existing ones.

Good metadata practices

Useful metadata to share alongside codelists includes:

  • Purpose of the codelist (e.g. identifying patients with asthma for a prevalence study in adults aged 18-65 in primary care)
  • Clinical rationale for inclusion/exclusion of certain codes (e.g. we excluded chronic cough codes as we were only interested in acute cough)
  • Who was involved in the codelist creation (e.g. clinicians, epidemiologists, data scientists)
  • Date of codelist creation and any updates
  • Validation performed (e.g. studies that have used this codelist and their findings)
  • Software/tools used to create the codelist (e.g. OpenCodelists)
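
One way to keep this metadata attached to a codelist is to store it as a structured record alongside the codes. The sketch below is just one possible shape: the `CodelistMetadata` class and its field names are illustrative, not a standard, and the example values (including the date) are invented.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class CodelistMetadata:
    purpose: str
    rationale: str
    contributors: List[str]
    created: date
    updated: List[date] = field(default_factory=list)
    validation: str = ""
    tools: List[str] = field(default_factory=list)


# Illustrative values only; the date is made up.
metadata = CodelistMetadata(
    purpose="Identifying patients with acute cough in primary care",
    rationale="Excluded chronic cough codes as we were only interested in acute cough",
    contributors=["clinician", "epidemiologist", "data scientist"],
    created=date(2026, 2, 1),
    tools=["OpenCodelists"],
)

# Converting to a plain dictionary makes it easy to save next to the codelist itself.
record = asdict(metadata)
print(record["purpose"])
```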

Conclusion

In this blog, we have introduced what a codelist is, why they are needed, and how they are used in health data research. We have seen that creating a codelist is not a trivial task and requires clinical knowledge and understanding of the context in which the codes were recorded. The choices made when creating a codelist can have significant implications for research findings, and therefore it is important to have a systematic approach to codelist creation, involving clinical experts, and where possible to use existing validated codelists. Finally, sharing codelists with good metadata practices is essential for reproducibility and trust in research findings.

In the next blog post, I will be walking through some of the ways that codelists can be created. Stay tuned!
