Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records

Li C. Cheung, Qing Pan, Noorie Hyun, Mark Schiffman, Barbara Fetterman, Philip E. Castle, Thomas Lorey, Hormuzd A. Katki

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers that use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and that there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits is actually prevalent disease that was initially undiagnosed. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan–Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence–incidence models. Parameters for parametric prevalence–incidence models, such as the logistic regression and Weibull survival (logistic–Weibull) model, are estimated by direct likelihood maximization or by the EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan–Meier, logistic–Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan–Meier provided poor estimates, while the logistic–Weibull estimates closely matched the non-parametric estimates. Our findings support our use of logistic–Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
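
The mixture structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes no covariates, so the logistic part reduces to a single prevalence probability, and it assumes a shape/scale Weibull parameterization for incident disease. The function names, the `(left, right)` interval encoding, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def weibull_cdf(t, shape, scale):
    """Weibull CDF for time to incident disease; 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def cumulative_risk(t, prevalence, shape, scale):
    """Cumulative risk under a prevalence-incidence mixture.

    Prevalent disease is a point mass at t = 0; incident disease among
    those without prevalent disease follows a Weibull distribution.
    """
    if t < 0:
        return 0.0
    return prevalence + (1.0 - prevalence) * weibull_cdf(t, shape, scale)


def log_likelihood(intervals, prevalence, shape, scale):
    """Interval-censored log-likelihood (assumed encoding, see below).

    Each observation is a pair (left, right):
      left  = last time the person was known disease-free, or None if
              disease was never ruled out (prevalent disease possible);
      right = first time disease was found, or math.inf if never found.
    Each person contributes F(right) - F(left), with F(None) taken as 0.
    """
    total = 0.0
    for left, right in intervals:
        if left is None:
            lower = 0.0
        else:
            lower = cumulative_risk(left, prevalence, shape, scale)
        if math.isinf(right):
            upper = 1.0
        else:
            upper = cumulative_risk(right, prevalence, shape, scale)
        total += math.log(upper - lower)
    return total


# Hypothetical data: times in years since the baseline visit.
example = [
    (None, 0.0),       # disease found at baseline (prevalent)
    (None, 1.2),       # never ruled out, found at 1.2 years
    (1.0, 3.1),        # disease-free at 1.0, found at 3.1
    (5.0, math.inf),   # disease-free at 5.0, never found afterwards
]
print(log_likelihood(example, prevalence=0.05, shape=1.5, scale=40.0))
```

Under this encoding, a person whose disease is first found at a later visit without an earlier negative result contributes the full cumulative risk at that visit, which includes the point mass at time zero. This is how the sketch lets a later diagnosis be either undiagnosed prevalent disease or incident disease. Maximizing `log_likelihood` over the three parameters with a general-purpose optimizer would correspond to the direct likelihood maximization mentioned in the abstract, under the same assumptions.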

Original language: English (US)
Pages (from-to): 3583-3595
Number of pages: 13
Journal: Statistics in Medicine
Volume: 36
Issue number: 22
State: Published - Sep 30 2017

Keywords

  • HPV
  • Kaplan–Meier
  • cervical cancer
  • cumulative risk estimation
  • prevalence–incidence models

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
