Identifying and Characterizing a Chronic Cough Cohort Through Electronic Health Records

Michael Weiner, Paul R. Dexter, Kim Heithoff, Anna R. Roberts, Ziyue Liu, Ashley Griffith, Siu Hui, Jonathan Schelfhout, Peter Dicpinigaitis, Ishita Doshi, Jessica P. Weaver

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


Background: Chronic cough (CC) of 8 weeks or more affects about 10% of adults and may lead to expensive treatments and reduced quality of life. Incomplete diagnostic coding complicates identifying CC in electronic health records (EHRs). Natural language processing (NLP) of EHR text could improve detection.

Research Question: Can NLP be used to identify cough in EHRs, and to characterize adults and encounters with CC?

Study Design and Methods: A Midwestern EHR system identified patients aged 18 to 85 years during 2005 to 2015. NLP was used to evaluate text notes, except prescriptions and instructions, for mentions of cough. Two physicians and a biostatistician reviewed 12 sets of 50 encounters each, with iterative refinements, until the positive predictive value for cough encounters exceeded 90%. NLP, International Classification of Diseases, 10th revision, or medication was used to identify cough. Three encounters spanning 56 to 120 days defined CC. Descriptive statistics summarized patients and encounters, including referrals.

Results: Optimizing NLP required identifying and eliminating cough denials, instructions, and historical references. Of 235,457 cough encounters, 23% had a relevant diagnostic code or medication. Applying chronicity to cough encounters identified 23,371 patients (61% women) with CC. NLP alone identified 74% of these patients; diagnoses or medications alone identified 15%. The positive predictive value of NLP in the reviewed sample was 97%. Referrals for cough occurred for 3.0% of patients; pulmonary medicine was most common initially (64% of referrals).

Limitations: Some patients with diagnosis codes for cough, encounters at intervals greater than 4 months, or multiple acute cough episodes may have been misclassified.

Interpretation: NLP successfully identified a large cohort with CC. Most patients were identified through NLP alone, rather than diagnoses or medications. NLP improved detection of patients nearly sevenfold, addressing the gap in ability to identify and characterize CC disease burden. Nearly all cases appeared to be managed in primary care. Identifying these patients is important for characterizing treatment and unmet needs.
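The chronicity rule stated in the methods (three encounters spanning 56 to 120 days) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `has_chronic_cough`, the choice to count same-day encounters once, and the reading of "spanning" as the gap between the first and last of any three cough encounters are all assumptions made for illustration.

```python
from datetime import date

# Thresholds taken from the abstract: three encounters spanning 56 to 120 days.
MIN_SPAN_DAYS = 56
MAX_SPAN_DAYS = 120


def has_chronic_cough(encounter_dates):
    """Hypothetical check: do any three cough encounters span 56 to 120 days?

    Assumptions (not from the source): same-day encounters count once, and the
    span is measured from the first to the last of the three encounters.
    """
    dates = sorted(set(encounter_dates))
    for i, first in enumerate(dates):
        # Starting at i + 2 guarantees at least one encounter in between.
        for k in range(i + 2, len(dates)):
            span = (dates[k] - first).days
            if span > MAX_SPAN_DAYS:
                break  # later dates only widen the span
            if span >= MIN_SPAN_DAYS:
                return True
    return False


# Three cough encounters over 59 days satisfy the assumed rule.
patient = [date(2010, 1, 1), date(2010, 2, 1), date(2010, 3, 1)]
print(has_chronic_cough(patient))
```

Under this reading, a patient with cough encounters more than 120 days apart would not qualify, which is consistent with the limitation noted for encounters at intervals greater than 4 months.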

Original language: English (US)
Pages (from-to): 2346-2355
Number of pages: 10
Issue number: 6
State: Published - Jun 2021


  • chronic cough
  • electronic health records
  • natural language processing
  • structured data
  • unstructured data

ASJC Scopus subject areas

  • Pulmonary and Respiratory Medicine
  • Critical Care and Intensive Care Medicine
  • Cardiology and Cardiovascular Medicine

