Summary

Identifying cases of disease in large databases is important for surveillance, research, quality improvement and clinical care. Case definitions can be created from a single type of information or from a combination of types, including diagnostic codes, medications, condition-specific service claims and laboratory results.

General practice data are a rich source of information for identifying whether a person has a particular condition. Although general practice data are generally not available in linked data sets, they often contain markers of diabetes status that are commonly found in such data sets. The aim of this report is to explore approaches to case definitions for diabetes mellitus (diabetes) using markers of diabetes status recorded in general practice data, including diabetes-specific prescriptions, pathology tests and Medicare Benefits Schedule (MBS) service items. The identified diabetes case definitions can be used (or refined where necessary) to identify people with diabetes in linked data collections that include these diabetes status markers.

All analyses were conducted using a 10% sample of MedicineInsight, a general practice data set. The diabetes definition from the MedicineInsight condition flag (standard definition) was used as the reference standard, and case definitions using diabetes markers recorded in MedicineInsight were compared against this standard definition. Case definitions were identified for all diabetes, type 1 diabetes and type 2 diabetes. While type 1 diabetes affects people of all ages, the analysis for type 1 diabetes was limited to people aged under 35 to improve the statistical power of identifying type 1 diabetes based on the assessed diabetes markers. Analyses for all diabetes and type 2 diabetes included people of all ages.

The goal was to identify diabetes case definitions that minimise misclassification risk without compromising the predictive power (precision, as measured by the positive predictive value (PPV)) of the diabetes markers for detecting diabetes. Definitions with a very high PPV for identifying diabetes were preferred.
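As a rough illustration of how these measures relate to one another, the sketch below computes sensitivity, PPV and negative predictive value (NPV) from a two-by-two comparison of a case definition against a reference standard. The class name, function name and counts are hypothetical and are not drawn from the MedicineInsight analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Two-by-two comparison of a case definition with a reference standard."""
    tp: int  # meets the definition, has the condition per the reference standard
    fp: int  # meets the definition, does not have the condition
    fn: int  # does not meet the definition, has the condition
    tn: int  # does not meet the definition, does not have the condition


def validity_measures(c: ConfusionCounts) -> dict:
    """Return standard validity measures as proportions between 0 and 1."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases detected
        "specificity": c.tn / (c.tn + c.fp),  # share of non-cases correctly excluded
        "ppv": c.tp / (c.tp + c.fp),          # precision of the definition
        "npv": c.tn / (c.tn + c.fn),          # share of negatives that are true non-cases
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=80, fp=20, fn=20, tn=880)
measures = validity_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A definition can have a high PPV and a modest sensitivity at the same time, because the two measures share only the true positive count: PPV falls as false positives rise, while sensitivity falls as false negatives rise.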

This information will help in understanding criteria for identifying people with diagnosed diabetes that can be applied to other data sets with similar diabetes markers, particularly linked data.

Key findings from the validation of algorithms for diabetes case definition

Findings from this analysis show that approaches using a combination of diabetes markers provide robust definitions for diabetes with very high precision (≥ 90%) and acceptable sensitivity (> 60%). The diabetes case definition with a minimum of one diabetes prescription and at least one HbA1c test, each with a gap of 6 months to another HbA1c test, had the highest precision (PPV 96%), with a sensitivity of 61%. The probability that a person meeting this definition has diabetes (positive probability) is very high at 96% and the probability that a person not meeting this definition has diabetes (negative probability) is low (3%), suggesting that this definition is good for identifying people with diabetes. A sensitivity of 70% and a PPV of 92% were observed for the algorithm with at least one diabetes prescription and 2 or more HbA1c tests recorded at any time during the study period (positive probability 93%, negative probability 2%).
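A minimal sketch of how a combined-marker definition of this kind might be applied to one person's records is shown below. The function name, the input layout (plain lists of dates) and the 183-day approximation of 6 months are assumptions made for illustration. In particular, the gap variant assumes that the requirement is satisfied when at least two HbA1c tests are recorded at least 6 months apart, which is one possible reading of the definition and not a confirmed specification.

```python
from datetime import date, timedelta
from typing import List, Optional


def meets_combined_definition(
    prescription_dates: List[date],
    hba1c_dates: List[date],
    min_gap_days: Optional[int] = None,
) -> bool:
    """Flag a person with >= 1 diabetes prescription and >= 2 HbA1c tests.

    If min_gap_days is given, additionally require two HbA1c tests at least
    that many days apart (an assumed reading of the 6-month gap criterion).
    """
    if len(prescription_dates) < 1 or len(hba1c_dates) < 2:
        return False
    if min_gap_days is None:
        return True
    ordered = sorted(hba1c_dates)
    # Some pair of tests is at least min_gap_days apart exactly when the
    # earliest and latest tests are.
    return (ordered[-1] - ordered[0]) >= timedelta(days=min_gap_days)


# Hypothetical records for one person.
scripts = [date(2019, 1, 1)]
tests_close = [date(2019, 1, 1), date(2019, 3, 1)]
tests_spread = [date(2019, 1, 1), date(2019, 9, 1)]

print(meets_combined_definition(scripts, tests_close))                     # any-time variant
print(meets_combined_definition(scripts, tests_close, min_gap_days=183))   # gap variant
print(meets_combined_definition(scripts, tests_spread, min_gap_days=183))  # gap variant
```

Keeping the gap as an optional parameter lets the same function express both the any-time variant and the stricter gap variant, which makes it straightforward to compare their precision and sensitivity on the same cohort.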

Single markers such as diabetes-specific prescriptions and MBS items had high precision, but sensitivity was very low. This suggests that most people with each single marker had diabetes according to the MedicineInsight standard definition, but of those identified as having diabetes by the standard definition the proportion who had each single marker recorded was small.

Increasing the minimum number of records for each marker during the study period improves precision, but results in very low sensitivity.

Key findings from case definitions for diabetes type ascertainment

For people aged under 35, a minimum of 2 prescriptions for insulin only (without other diabetes medicines) at any time during the study period had the highest precision for identifying type 1 diabetes (91%), with a sensitivity of 67%. The probability that a person not meeting this definition has type 1 diabetes is very low at 0.1%. The definition for type 1 diabetes with the highest sensitivity (79%), with a PPV of 85%, was having at least one prescription for insulin only (without other diabetes medicines) during the study period (negative probability 0.1%).
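The insulin-only rule can be sketched as follows. The function name, the use of simple string labels for medicine classes and the treatment of age as an argument are assumptions for illustration; in the analysis the age limit was a restriction on the study population, and is included here only so that the example is self-contained.

```python
from typing import List


def meets_type1_definition(
    age: int,
    diabetes_medicines: List[str],
    min_insulin_scripts: int = 2,
) -> bool:
    """Flag probable type 1 diabetes from diabetes prescriptions alone.

    diabetes_medicines holds one label per diabetes prescription, for example
    "insulin" or "metformin" (hypothetical labels, not a real coding scheme).
    """
    if age >= 35:
        return False  # outside the age range assessed for type 1 diabetes
    insulin_count = sum(1 for m in diabetes_medicines if m == "insulin")
    other_count = len(diabetes_medicines) - insulin_count
    # Insulin only: no other diabetes medicines, and enough insulin scripts.
    return other_count == 0 and insulin_count >= min_insulin_scripts


print(meets_type1_definition(20, ["insulin", "insulin"]))
print(meets_type1_definition(20, ["insulin", "insulin", "metformin"]))
print(meets_type1_definition(20, ["insulin"], min_insulin_scripts=1))
```

Lowering min_insulin_scripts from 2 to 1 corresponds to the trade-off described above: more people with type 1 diabetes are captured, at the cost of some precision.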

In the study population containing all age groups, a minimum of one diabetes prescription (excluding people prescribed insulin only) and one or more HbA1c tests, each with a 6-month gap to another HbA1c test, had the highest precision (91%) for identifying type 2 diabetes, with a sensitivity of 61%. The probability that a person meeting this definition has type 2 diabetes is 91% and the probability that a person not meeting this definition has type 2 diabetes is 3%. The definition algorithm for type 2 diabetes with the highest sensitivity (76%), with a PPV of 82%, was having at least one diabetes prescription (excluding people with only insulin prescriptions) and at least one HbA1c test (positive probability 84%, negative probability 1%).

Conclusion

Findings from this analysis indicate that approaches using a combination of markers of diabetes status capture people with diagnosed diabetes better than approaches using a single marker.

We have identified potential case definition algorithms that can be used to identify people with diabetes in a cohort of people attending primary care. While the definitions for diabetes and type 2 diabetes included people of all ages, case definitions for type 1 diabetes were limited to people aged under 35. The findings for type 1 diabetes should therefore be interpreted with this age restriction in mind.

It is important to note that the diabetes case definitions and validity estimates observed in the current analysis might differ from those obtained from other data sets, because the diabetes markers in the data used here differ in some respects from those in other administrative data sets such as the Pharmaceutical Benefits Scheme (PBS) and MBS. Moreover, performance characteristics such as precision are influenced by the prevalence of the condition in the study population, which might limit the generalisability of the findings to study populations with different prevalence estimates. Using the same case definition, PPV could decrease and negative predictive value (NPV) could increase in a setting where the prevalence of diabetes is lower than that observed in this study.
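The dependence of PPV and NPV on prevalence follows from Bayes' theorem, and can be illustrated with a short sketch. The function name and all input values below (sensitivity, specificity and both prevalence figures) are hypothetical and were chosen only to show the direction of the effect; they are not estimates from this analysis.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a definition applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical definition applied in two settings.
ppv_high, npv_high = predictive_values(0.60, 0.99, prevalence=0.10)
ppv_low, npv_low = predictive_values(0.60, 0.99, prevalence=0.02)

print(f"prevalence 10%: PPV {ppv_high:.3f}, NPV {npv_high:.3f}")
print(f"prevalence  2%: PPV {ppv_low:.3f}, NPV {npv_low:.3f}")
```

With sensitivity and specificity held fixed, lowering the prevalence leaves fewer true cases among those who meet the definition, so PPV falls, and fewer missed cases among those who do not, so NPV rises.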

In this analysis, the diabetes marker definition algorithms were compared with the MedicineInsight standard definition (the diabetes definition from the MedicineInsight condition flags). This reference standard may have limitations if diabetes is incompletely recorded in the MedicineInsight data fields used for the standard definition. A reference standard with limitations can introduce measurement error into the analysis, and the measured performance of the definition algorithms depends on the quality of the reference standard.

Nevertheless, the algorithms in this report provide approaches for diabetes case definition that could be used (or refined where necessary) to identify people with diabetes in data collections that include these diabetes status markers. These insights can supplement the existing data sources, namely the National Diabetes Services Scheme (NDSS), in identifying people with diagnosed diabetes, particularly in linked data collections, thus enabling better estimation of diabetes prevalence and further monitoring. This is important for implementing prevention and management policies and for allocating resources appropriately.