Genotype copy number variations using Gaussian mixture models: Theory and algorithms

Chang Yun Lin; Yungtai Lo; Kenny Q. Ye

doi:10.1515/1544-6115.1725

Epidemiology & Population Health

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Copy number variations (CNVs) are important in disease association studies and are targeted by most recent microarray platforms developed for genome-wide association studies (GWAS). However, probes targeting the same CNV region can vary greatly in performance, with some probes carrying little more information than pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate the copy numbers of individuals within the framework of the Gaussian mixture model (GMM). First, we show that, under two regularity conditions and assuming all parameters except the mixing proportions are known, optimal weights can be obtained such that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights, and uses them for classification. The algorithm performs well on simulated data and two sets of real data, showing a clear advantage over classification based on the equally weighted average.
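
The equivalence claimed in the abstract can be illustrated numerically with a minimal Python sketch. The abstract does not state the two regularity conditions, so the setup below is an assumption made for illustration only: all copy-number classes share one covariance matrix, and the class means lie on a line (`base + copy_number * delta`). Under those assumed conditions, the weights `w = inv(cov) @ delta` make the univariate GMM on the weighted sum reproduce the multivariate posteriors. All names and parameter values are hypothetical, and this is not the authors' iterative algorithm, which also estimates the parameters.

```python
import numpy as np

def gmm_posteriors(log_dens, pis):
    """Turn per-class log densities (n, K) into posterior probabilities."""
    logp = log_dens + np.log(pis)[None, :]
    logp -= logp.max(axis=1, keepdims=True)  # stabilise the exponentials
    post = np.exp(logp)
    return post / post.sum(axis=1, keepdims=True)

def multivariate_posteriors(X, means, cov, pis):
    """Posteriors under a multivariate GMM with a shared covariance.

    The Gaussian normalising constant is identical for every class,
    so it cancels and is omitted.
    """
    prec = np.linalg.inv(cov)
    diffs = X[:, None, :] - means[None, :, :]            # (n, K, p)
    maha = np.einsum("nkp,pq,nkq->nk", diffs, prec, diffs)
    return gmm_posteriors(-0.5 * maha, pis)

def univariate_posteriors(z, m, var, pis):
    """Posteriors under a univariate GMM with a shared variance."""
    return gmm_posteriors(-0.5 * (z[:, None] - m[None, :]) ** 2 / var, pis)

# Hypothetical parameters: 4 probes, 3 copy-number classes.
rng = np.random.default_rng(0)
copy_numbers = np.array([0.0, 1.0, 2.0])
base = np.array([0.2, -0.1, 0.0, 0.3])
delta = np.array([1.0, 0.6, 0.3, 0.0])     # last probe carries no signal
cov = np.diag([0.05, 0.10, 0.20, 0.30])    # assumed shared covariance
pis = np.array([0.2, 0.5, 0.3])            # mixing proportions

means = base[None, :] + copy_numbers[:, None] * delta[None, :]
labels = rng.choice(3, size=500, p=pis)
X = means[labels] + rng.multivariate_normal(np.zeros(4), cov, size=500)

# Weights under the assumed conditions; the noise probe gets weight zero.
w = np.linalg.solve(cov, delta)
post_mv = multivariate_posteriors(X, means, cov, pis)
post_uv = univariate_posteriors(X @ w, means @ w, w @ cov @ w, pis)

# Equal weights for comparison.
u = np.full(4, 0.25)
post_eq = univariate_posteriors(X @ u, means @ u, u @ cov @ u, pis)

print("max posterior gap:", np.abs(post_mv - post_uv).max())
print("accuracy, optimal weights:", (post_uv.argmax(axis=1) == labels).mean())
print("accuracy, equal weights:  ", (post_eq.argmax(axis=1) == labels).mean())
```

In this sketch the probe with `delta = 0` receives zero weight, which matches the abstract's remark that some probes carry little more than noise; the equal-weight average still mixes that probe in.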

Original language	English (US)
Article number	5
Journal	Statistical Applications in Genetics and Molecular Biology
Volume	11
Issue number	5
DOIs	https://doi.org/10.1515/1544-6115.1725
State	Published - Sep 2012

Keywords

Common copy number variants
EM algorithm
Microarray
Rate of correct classification

ASJC Scopus subject areas

Statistics and Probability
Molecular Biology
Genetics
Computational Mathematics

Cite this

@article{5f367f9f3c8d4c448aef6f56c41ee082,

title = "Genotype copy number variations using Gaussian mixture models: Theory and algorithms",

abstract = "Copy number variations (CNVs) are important in disease association studies and are targeted by most recent microarray platforms developed for genome-wide association studies (GWAS). However, probes targeting the same CNV region can vary greatly in performance, with some probes carrying little more information than pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate the copy numbers of individuals within the framework of the Gaussian mixture model (GMM). First, we show that, under two regularity conditions and assuming all parameters except the mixing proportions are known, optimal weights can be obtained such that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights, and uses them for classification. The algorithm performs well on simulated data and two sets of real data, showing a clear advantage over classification based on the equally weighted average.",

keywords = "Common copy number variants, EM algorithm, Microarray, Rate of correct classification",

author = "Lin, {Chang Yun} and Yungtai Lo and Ye, {Kenny Q.}",

note = "Funding Information: The research of CYL is supported by NIH P41 HG004222-01. The research of KY is supported in part by NIH P41 HG004222-01 and the Simons Foundation. The authors would like to thank Drs. Dan Levy and Yunha Lee of the Wigler Lab at Cold Spring Harbor Laboratory for their help in sharing the NimbleGen HD2 data. We would also like to thank Dr. Kith Pradhan and Dr. Tao Wang for useful discussions. For all correspondence, please contact Dr. Kenny Q. Ye.",

year = "2012",

month = sep,

doi = "10.1515/1544-6115.1725",

language = "English (US)",

volume = "11",

journal = "Statistical Applications in Genetics and Molecular Biology",

issn = "1544-6115",

publisher = "Berkeley Electronic Press",

number = "5",

}

TY - JOUR

T1 - Genotype copy number variations using Gaussian mixture models

T2 - Theory and algorithms

AU - Lin, Chang Yun

AU - Lo, Yungtai

AU - Ye, Kenny Q.

N1 - Funding Information: The research of CYL is supported by NIH P41 HG004222-01. The research of KY is supported in part by NIH P41 HG004222-01 and the Simons Foundation. The authors would like to thank Drs. Dan Levy and Yunha Lee of the Wigler Lab at Cold Spring Harbor Laboratory for their help in sharing the NimbleGen HD2 data. We would also like to thank Dr. Kith Pradhan and Dr. Tao Wang for useful discussions. For all correspondence, please contact Dr. Kenny Q. Ye.

PY - 2012/9

Y1 - 2012/9

N2 - Copy number variations (CNVs) are important in disease association studies and are targeted by most recent microarray platforms developed for genome-wide association studies (GWAS). However, probes targeting the same CNV region can vary greatly in performance, with some probes carrying little more information than pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate the copy numbers of individuals within the framework of the Gaussian mixture model (GMM). First, we show that, under two regularity conditions and assuming all parameters except the mixing proportions are known, optimal weights can be obtained such that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights, and uses them for classification. The algorithm performs well on simulated data and two sets of real data, showing a clear advantage over classification based on the equally weighted average.

AB - Copy number variations (CNVs) are important in disease association studies and are targeted by most recent microarray platforms developed for genome-wide association studies (GWAS). However, probes targeting the same CNV region can vary greatly in performance, with some probes carrying little more information than pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate the copy numbers of individuals within the framework of the Gaussian mixture model (GMM). First, we show that, under two regularity conditions and assuming all parameters except the mixing proportions are known, optimal weights can be obtained such that the univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights, and uses them for classification. The algorithm performs well on simulated data and two sets of real data, showing a clear advantage over classification based on the equally weighted average.

KW - Common copy number variants

KW - EM algorithm

KW - Microarray

KW - Rate of correct classification

UR - http://www.scopus.com/inward/record.url?scp=84874971703&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874971703&partnerID=8YFLogxK

U2 - 10.1515/1544-6115.1725

DO - 10.1515/1544-6115.1725

M3 - Article

C2 - 23079517

AN - SCOPUS:84874971703

SN - 1544-6115

VL - 11

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

IS - 5

M1 - 5

ER -
