Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients

P. Fu, A. Panneerselvam, B. Clifford, A. Dowlati, P. C. Ma, G. Zeng, B. Halmos, R. S. Leidner

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


It is well known that non-small cell lung cancer (NSCLC) is a heterogeneous group of diseases. Previous studies have demonstrated genetic variation among different ethnic groups in the epidermal growth factor receptor (EGFR) in NSCLC. Research by our group and others has recently shown a lower frequency of EGFR mutations in African Americans with NSCLC than in their White counterparts. In this study, we use our original study data on EGFR pathway genetics in African American NSCLC as an example to illustrate that univariate analyses based on aggregation versus partition of data lead to contradictory results, and thereby to emphasize the importance of controlling for statistical confounding. We further investigate analytic approaches in logistic regression for data with separation, as is the case in our example data set, and apply appropriate methods to identify predictors of EGFR mutation. Our simulation shows that with separated or nearly separated data, penalized maximum likelihood (PML) produces estimates with the smallest bias and approximately maintains the nominal value, with statistical power equal to or better than that from maximum likelihood and exact conditional likelihood methods. Application of the PML method to our example data set shows that race and EGFR-FISH are independently significant predictors of EGFR mutation.
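The contradiction between aggregated and partitioned analyses described above can be illustrated with a minimal Python sketch. The counts, the group names (`group_a`, `group_b`), and the stratum names are hypothetical placeholders invented for illustration; they are not the study's data, and the helper functions are not taken from the paper.

```python
# Hypothetical (events, total) counts, invented purely for illustration.
# They are NOT the study's data; group and stratum names are placeholders.
counts = {
    "stratum_1": {"group_a": (9, 10), "group_b": (80, 100)},
    "stratum_2": {"group_a": (30, 100), "group_b": (2, 10)},
}


def stratum_rates(counts):
    """Event rate per group within each stratum (the partitioned view)."""
    return {
        stratum: {group: events / total for group, (events, total) in groups.items()}
        for stratum, groups in counts.items()
    }


def pooled_rates(counts):
    """Event rate per group after summing over strata (the aggregated view)."""
    totals = {}
    for groups in counts.values():
        for group, (events, total) in groups.items():
            e, n = totals.get(group, (0, 0))
            totals[group] = (e + events, n + total)
    return {group: e / n for group, (e, n) in totals.items()}


by_stratum = stratum_rates(counts)
pooled = pooled_rates(counts)

for stratum, rates in by_stratum.items():
    print(stratum, {g: round(r, 3) for g, r in rates.items()})
print("pooled", {g: round(r, 3) for g, r in pooled.items()})
```

With these made-up counts, `group_a` has the higher rate within each stratum (0.90 vs 0.80 and 0.30 vs 0.20), yet the lower rate once the strata are pooled (39/110 vs 82/110). The reversal arises because the two groups are distributed very unevenly across strata, which is the kind of confounding that a stratified or multivariable analysis is meant to control.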

Original language: English (US)
Pages (from-to): 937-948
Number of pages: 12
Journal: Statistical Methods in Medical Research
Issue number: 6
State: Published - Dec 1 2015
Externally published: Yes


Keywords

  • Simpson's paradox
  • data with separation
  • exact logistic regression
  • penalized likelihood
  • targeted therapy

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

