Handling missing data by deleting completely observed records

Myunghee Cho Paik, Cuiling Wang

Research output: Contribution to journalArticlepeer-review

4 Scopus citations


When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches in handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computation and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under wide range of practical situations especially when the missingness proportion is large.

Original languageEnglish (US)
Pages (from-to)2341-2350
Number of pages10
JournalJournal of Statistical Planning and Inference
Issue number7
StatePublished - Jul 1 2009


  • Deletion method
  • Inverse probability weighting
  • Missing data

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Applied Mathematics


Dive into the research topics of 'Handling missing data by deleting completely observed records'. Together they form a unique fingerprint.

Cite this