Repurposing Billing And Administrative Terminologies As Instruments Of Public Health: Lessons From The COVID-19 Pandemic

A clinical terminology is a taxonomy of terms used to standardize the storage, retrieval, and exchange of electronic medical data. Many widely used terminologies such as ICD-10 (International Classification of Diseases, Tenth Revision, a set of diagnosis codes used in the US as the Clinical Modification, ICD-10-CM), CPT (Current Procedural Terminology, a set of procedure codes), and SNOMED (Systematized Nomenclature of Medicine Clinical Terms, a comprehensive set of medical concepts) are used for purposes such as administrative coding, procedural billing, and standardizing clinical documentation. Many of these terminologies have also played important behind-the-scenes roles in the COVID-19 pandemic. Beyond their original everyday administrative and clinical purposes, they have taken on additional roles in enabling research and surveillance and in providing incentives for new ways of delivering care in response to the pandemic. In some ways, the pandemic has revealed how the design and accuracy of clinical terminologies are a public health consideration: Administrative and billing codes are the foundation upon which needed real-world data are passively generated and often used to inform public health decisions.

Despite the importance of these data sources, COVID-19 has also exposed how their limited accuracy and timeliness hinder our ability to generate public health insights. In fact, improving the accuracy and timeliness of terminology-based data may further strengthen the value they offer to public health efforts. In this piece, we describe the role these terminologies played during the COVID-19 pandemic and, more broadly, how they may serve as an instrument for public health going forward. We also identify key limitations of clinical terminology-based data that the pandemic exposed. Finally, we discuss possible approaches that policy makers, health information technology (IT) vendors, health systems, and terminology oversight bodies might take to improve these real-world structured data sources and the infrastructure supporting them. Applying these lessons from the COVID-19 pandemic will ensure that clinical terminologies can be mobilized more quickly to tackle future public health challenges.


Generating Observational Data

Many widely used diagnosis and procedural terminologies have introduced new codes throughout the pandemic to more accurately capture new conditions and new care activities. One positive consequence of these new codes is an abundance of new observational data on COVID-19 that are now available for researchers to study. In early 2020, ICD-10, CPT, and many other clinical terminologies were quickly amended to include new COVID-19-specific codes that aid in billing and documenting COVID-19 patient care. For example, U07.1 (COVID-19, virus identified) was added to ICD-10 shortly after the start of the pandemic. Beyond documentation and billing, these new codes have also served as the backbone for many extremely important research and public health surveillance activities; new COVID-19 codes were tremendously helpful in powering observational studies, characterizing the illness’s phenotype, tracking disease prevalence, and answering other important epidemiological questions. As the US begins to face the aftereffects of the pandemic, many of these terminologies have announced plans to introduce additional timely codes that reflect current challenges including post-COVID-19 long-hauler care and vaccine administration. This suggests that clinical terminologies will continue to help collect relevant observational data from which important public health insights can be generated.
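
As a rough illustration of how a single new code can anchor cohort identification in claims data, the following Python sketch flags patients with at least one claim carrying U07.1. The record layout (`patient_id`, `diagnosis_codes`) and the sample claims are assumptions made for this example, not a real claims schema or real data.

```python
# Minimal sketch: identify patients with a COVID-19 diagnosis code in claims.
# The claim structure below is hypothetical and exists only for illustration.

COVID_CODE = "U07.1"  # ICD-10 code for COVID-19, virus identified


def normalize(code):
    """Trim whitespace and upper-case a code so 'u07.1 ' matches 'U07.1'."""
    return code.strip().upper()


def covid_patients(claims):
    """Return the set of patient ids with at least one claim coded U07.1."""
    return {
        claim["patient_id"]
        for claim in claims
        if COVID_CODE in {normalize(c) for c in claim["diagnosis_codes"]}
    }


claims = [
    {"patient_id": "p1", "diagnosis_codes": ["U07.1", "J12.89"]},
    {"patient_id": "p2", "diagnosis_codes": ["I10"]},
    {"patient_id": "p1", "diagnosis_codes": ["u07.1 "]},
    {"patient_id": "p3", "diagnosis_codes": ["U07.1"]},
]

cohort = covid_patients(claims)
```

A cohort built this way inherits whatever inaccuracy and delay exist in the underlying coding.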

Application of clinical terminologies at the point of care may, in fact, be among the most efficient ways to generate near-real-time, machine-readable real-world data at scale. Alternatives such as manual extraction of data from electronic health records (EHRs) would require a tremendous investment of time, funding, and human capital. In addition, post-hoc use of natural language processing algorithms that can “read” unstructured data in patient charts is likely to miss important contextual information from the patient encounter.

Enabling New Health Care Delivery Behaviors

The COVID-19 pandemic was a catalyst for telehealth adoption: Many visits that were once in-person became virtual out of necessity for patient and provider safety. Prior to the COVID-19 pandemic, health care services delivered via telehealth were not reimbursed at the same level as in-person care. Soon after the start of the pandemic, the Centers for Medicare and Medicaid Services (CMS) recognized the need to incentivize the shift toward virtual care. As part of its strategy to make this shift, CMS used billing terminologies to capture relevant data on virtual care activities. For several activities not previously captured through reimbursable codes, such as virtual visits and vaccine administration, the fee schedule was revised so that they could be reimbursed if coded correctly. Ensuring these new care delivery models could be coded and billed was an important step in ensuring provider and payer alignment during the early stages of the pandemic.

Applying Variant-Specific Research And Surveillance

As the COVID-19 pandemic evolves, surveillance and early detection of isolated variant strain cases will remain important to help contain them before they escalate into widespread outbreaks. Ongoing variant-specific research will also be important to help characterize the phenotype, illness severity, and transmissibility of each variant. Ongoing variant surveillance in the US faces many challenges, including the availability of federal funding, current genomic sequencing capacity, and public health infrastructure; establishing a common data model is also an important consideration.

Some terminologies are designed for automated documentation and reporting and thus may enable highly specific variant-level surveillance, tracking, and research. LOINC, a widely used terminology created by the Regenstrief Institute that includes laboratory test results, has pre-released new codes that correspond to PCR laboratory results for specific COVID-19 variants. The advantage of using LOINC for variant reporting is that its codes can be automatically generated and reported by a laboratory information system (LIS) processing high volumes of test results. While these new variant-specific codes would still require one-time mapping to unstructured laboratory data and incorporation into the LIS by IT support personnel, this approach would otherwise require minimal investment of human capital and time to implement.
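
The one-time mapping step described above can be sketched roughly as a lookup table that a laboratory information system applies to each result before reporting. This is a minimal sketch under stated assumptions: the local codes, the target codes, and the record fields are all placeholders invented for illustration and are not real LOINC codes or a real LIS interface.

```python
# Minimal sketch: apply a one-time local-to-standard code mapping to lab results.
# Both sides of the mapping are placeholders, not real LOINC codes.
LOCAL_TO_STANDARD = {
    "LOCAL-VARIANT-A": "PLACEHOLDER-CODE-A",
    "LOCAL-VARIANT-B": "PLACEHOLDER-CODE-B",
}


def build_reports(results):
    """Split raw results into reportable records and unmapped local codes."""
    reports, unmapped = [], []
    for result in results:
        standard = LOCAL_TO_STANDARD.get(result["local_code"])
        if standard is None:
            # No mapping yet: surface the code so IT staff can add it once.
            unmapped.append(result["local_code"])
            continue
        reports.append(
            {
                "specimen_id": result["specimen_id"],
                "code": standard,
                "value": result["value"],
            }
        )
    return reports, unmapped


results = [
    {"specimen_id": "s1", "local_code": "LOCAL-VARIANT-A", "value": "detected"},
    {"specimen_id": "s2", "local_code": "LOCAL-VARIANT-C", "value": "detected"},
]

reports, unmapped = build_reports(results)
```

Once the table is in place, every mapped result is reported without further manual work; only unmapped local codes need attention.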

Challenges: Accuracy And Timeliness

The value of administrative claims data is limited by inaccuracy and delayed code entry. During the pandemic, there were significant delays in the entry of ICD-10 codes used to identify COVID-19 patients, limiting the real-time accuracy of administrative claims data for this purpose. The same delays hinder effective data collection for many other clinical conditions. Even after accounting for delayed entry of codes, these data still often exhibit some missingness.
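
To make these two problems concrete, the following Python sketch computes the lag between the date of service and the date a code was entered, along with the share of encounters that never received a code. The field names (`service_date`, `coded_on`) and the sample records are assumptions made for illustration, not real data or a real claims schema.

```python
# Minimal sketch: measure code-entry delay and missingness in encounter records.
# Field names and sample records are hypothetical.
from datetime import date
from statistics import median


def entry_lags(records):
    """Days between service and code entry, for records that were coded."""
    return [
        (r["coded_on"] - r["service_date"]).days
        for r in records
        if r["coded_on"] is not None
    ]


def missing_fraction(records):
    """Share of records that never received a code."""
    return sum(r["coded_on"] is None for r in records) / len(records)


records = [
    {"service_date": date(2020, 4, 1), "coded_on": date(2020, 4, 3)},
    {"service_date": date(2020, 4, 1), "coded_on": date(2020, 4, 15)},
    {"service_date": date(2020, 4, 2), "coded_on": date(2020, 4, 9)},
    {"service_date": date(2020, 4, 5), "coded_on": None},
]

lags = entry_lags(records)
median_lag = median(lags)
uncoded = missing_fraction(records)
```

Tracking both measures over time would show whether a change in policy or workflow is improving timeliness, completeness, or neither.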

The accuracy of these data is likely poor for a number of reasons. One potential explanation is that coding and documentation guidelines are constantly changing. It is difficult for health systems, providers, and EHR vendors to keep up with all these changes, even when health information management professionals in health systems are diligent in ensuring compliance. During the pandemic it was particularly difficult for health systems to keep up with ever-changing diagnostic and procedural coding recommendations from CMS and others. Another potential reason for the poor clinical fidelity of these data is that providers’ documentation can be influenced by their clinical suspicion (even in cases where a diagnosis is not confirmed) and also financial incentives such as insurance coverage or reimbursement policies.

An additional consideration affecting the accuracy of these data is the granularity of the clinical terminology itself: Claims data research can only be as specific as the terminology allows it to be. Unfortunately, it is not practical for all diagnoses to be assigned unique and explicit codes. The absence of a unique code for a specific condition limits our ability to identify it in structured clinical data, track it, and study it.

Potential Solutions: Next Steps For Researchers, Policy Makers, And Others

There are many possible ways to improve the accuracy and timeliness of clinical terminology-based data. Reimbursement policies have the power to shift coding accuracy: Assigning reimbursement value to specific health care activities and their associated code sets can be leveraged to encourage health systems to capture these codes more accurately and in a more timely fashion. Another force driving better claims data is a shift toward transparency. As patients gain increasing control over their own medical data, health systems may be incentivized to maintain billing data as accurately as possible given that these data may be scrutinized by parties that the patient chooses to share their data with.

Consolidation of claims data sources across payers and providers may provide a more complete view. When a patient’s medical data are fragmented across multiple institutions, no single institution has a clinically accurate picture of that patient. This was a challenge during the pandemic, as codes for COVID-19 PCR test results and vaccine administration were siloed across different health systems, making it very difficult for institutions such as the Veterans Health Administration (VHA) to accurately measure infection status or vaccination status among veterans who received testing, vaccination, or other care outside the VHA network. Based on these experiences, the Association of Public Health Laboratories recently called for the consolidation of single-institution data and the creation of a national, publicly accessible database of COVID-19 case metadata to inform future research and public health decisions.

These data may also benefit from having consistent, streamlined processes for updating clinical terminologies themselves. While most widely used terminologies such as ICD-10-CM, CPT, and SNOMED have procedures in place to update their codes one to three times per year, their priorities and methods for considering which codes to add or remove all differ, resulting in content disparities across terminologies. To add further complexity, these terminologies are often updated at different times of the year, forcing health systems, health IT vendors, and providers to readjust each time one terminology is updated (for example, diagnosis codes for COVID-19 were added at different times). Organizations involved in medical terminology should strive toward one consolidated, universal update process at regular intervals.

Finally, clinical terminology-based data can also be used to accurately identify conditions without explicit codes. In cases where no unique code exists for a condition of interest, some research groups have developed alternative methods to identify them in administrative claims data. These studies have combined existing claims data with machine learning-based inference techniques to identify conditions without a specific code. A proof of concept for this approach has been demonstrated in other specialties and has also been applied to diagnosing severe phenotypes of COVID-19. These methods could be applied to future “un-codeable” conditions of interest to public health researchers as well.


Throughout the COVID-19 pandemic, many clinical terminologies, often designed originally for documentation and billing, have also played important roles in informing and enforcing public health efforts by passively generating observational data, enabling incentives for new modes of health care delivery, and supporting variant-level surveillance and research. The pandemic has also highlighted the ways in which different terminologies complement one another. Some terminologies dependent on provider documentation such as ICD-10 and CPT are most useful if kept broad, while others such as LOINC are well-suited for a higher level of granularity. The limited accuracy and timeliness of data gathered using these terminologies are challenges; some ways to address these challenges include the strategic use of reimbursement policies to incentivize timely coding, increasing the transparency of billing data to promote accountability, consolidation of terminology-based data across institutions, streamlined terminology update procedures, and the use of unique approaches to identify conditions that cannot be explicitly coded.

Despite these limitations, the COVID-19 pandemic has highlighted how clinical terminologies can be an important instrument for public health. Lessons from their utility during the pandemic should be used to address other public health challenges as well. Furthermore, there is a pressing need for policy makers, researchers, public health agencies, health systems, and governing bodies of these clinical terminologies to address their limitations using the strategies described in this piece so that their benefits can be realized more quickly for future public health issues.

Authors’ Note

Jayson S. Marwaha is supported by T15LM007092 from the US National Library of Medicine/National Institutes of Health, and the Biomedical Informatics and Data Science Research Training (BIRT) Program of Harvard University. William J. Gordon reports consulting income from the Office of the National Coordinator for Health IT and Novocardia, Inc., both outside the scope of this work.
