PopCluster: An algorithm to identify genetic variants with ethnicity-dependent effects

Anastasia Gurinovich; Harold Bae; John J. Farrell; Stacy L. Andersen; Stefano Monti; Annibale Puca; Gil Atzmon; Nir Barzilai; Thomas T. Perls; Paola Sebastiani

doi:10.1093/bioinformatics/btz017

Medicine

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
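The bottom-up tree parsing step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the cluster tree has already been built (the abstract says PopCluster derives it from principal component analysis and hierarchical clustering of genotype data), and it swaps in a simple z-test on log odds ratios from 2x2 allele-count tables where the paper uses logistic regression. The names `Node`, `log_odds_ratio`, `effects_differ`, and `parse_tree`, the 1.96 threshold, and all counts are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (case_alt, case_ref, control_alt, control_ref) allele counts; made-up layout
Counts = Tuple[int, int, int, int]
Group = Tuple[List[str], Counts]


@dataclass
class Node:
    """Node of a hypothetical cluster tree; only leaves carry counts."""
    name: str
    counts: Optional[Counts] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def log_odds_ratio(counts: Counts) -> Tuple[float, float]:
    """Log odds ratio and its approximate variance (0.5 added to each cell)."""
    a, b, c, d = (x + 0.5 for x in counts)
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def effects_differ(c1: Counts, c2: Counts, z_crit: float = 1.96) -> bool:
    """Stand-in test: do two groups show different variant effects?"""
    lor1, var1 = log_odds_ratio(c1)
    lor2, var2 = log_odds_ratio(c2)
    z = (lor1 - lor2) / math.sqrt(var1 + var2)
    return abs(z) > z_crit


def parse_tree(node: Node) -> List[Group]:
    """Walk the tree bottom-up, merging sibling groups with similar effects."""
    if node.counts is not None:  # leaf cluster
        return [([node.name], node.counts)]
    left = parse_tree(node.left)
    right = parse_tree(node.right)
    # Merge only when each subtree collapsed to one group and the effects agree.
    if len(left) == 1 and len(right) == 1:
        (names_l, counts_l), (names_r, counts_r) = left[0], right[0]
        if not effects_differ(counts_l, counts_r):
            merged = tuple(x + y for x, y in zip(counts_l, counts_r))
            return [(names_l + names_r, merged)]
    return left + right


# Made-up example: clusters A and B show almost no effect, cluster C a strong one.
tree = Node(
    "root",
    left=Node(
        "AB",
        left=Node("A", counts=(50, 150, 50, 150)),
        right=Node("B", counts=(60, 140, 55, 145)),
    ),
    right=Node("C", counts=(120, 80, 50, 150)),
)
groups = parse_tree(tree)
for names, counts in groups:
    lor, _ = log_odds_ratio(counts)
    print(names, counts, round(math.exp(lor), 2))
```

With these invented counts, clusters A and B are merged into one group while C stays separate, which is the kind of subset with a statistically different effect that the abstract describes. A pooled analysis of all three clusters would report a single averaged effect and hide that difference.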

Original language	English (US)
Pages (from-to)	3046-3054
Number of pages	9
Journal	Bioinformatics
Volume	35
Issue number	17
DOIs	https://doi.org/10.1093/bioinformatics/btz017
State	Published - Sep 1 2019

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{53d26b63406140688fea5e4938e1052b,

title = "PopCluster: An algorithm to identify genetic variants with ethnicity-dependent effects",

abstract = "Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.",

author = "Anastasia Gurinovich and Harold Bae and Farrell, {John J.} and Andersen, {Stacy L.} and Stefano Monti and Annibale Puca and Gil Atzmon and Nir Barzilai and Perls, {Thomas T.} and Paola Sebastiani",

note = "Publisher Copyright: {\textcopyright} 2019 The Author(s).",

year = "2019",

month = sep,

day = "1",

doi = "10.1093/bioinformatics/btz017",

language = "English (US)",

volume = "35",

pages = "3046--3054",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "17",

}

TY - JOUR

T1 - PopCluster

T2 - An algorithm to identify genetic variants with ethnicity-dependent effects

AU - Gurinovich, Anastasia

AU - Bae, Harold

AU - Farrell, John J.

AU - Andersen, Stacy L.

AU - Monti, Stefano

AU - Puca, Annibale

AU - Atzmon, Gil

AU - Barzilai, Nir

AU - Perls, Thomas T.

AU - Sebastiani, Paola

PY - 2019/9/1

Y1 - 2019/9/1

N2 - Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.

AB - Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.

UR - http://www.scopus.com/inward/record.url?scp=85072058253&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072058253&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz017

DO - 10.1093/bioinformatics/btz017

M3 - Article

C2 - 30624692

AN - SCOPUS:85072058253

SN - 1367-4803

VL - 35

SP - 3046

EP - 3054

JO - Bioinformatics

JF - Bioinformatics

IS - 17

ER -
