A computational approach for identifying pseudogenes in the ENCODE regions.

Deyou Zheng; Mark B. Gerstein

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions).

RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.

CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.

Original language	English (US)
Pages (from-to)	S13.1-10
Journal	Genome Biology
Volume	7 Suppl 1
State	Published - 2006
Externally published	Yes

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Genetics
Cell Biology

Cite this

@article{a38a7638e5de4b6595ae275089d1e01b,

title = "A computational approach for identifying pseudogenes in the ENCODE regions.",

abstract = "BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions). RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications. CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.",

author = "Zheng, Deyou and Gerstein, {Mark B.}",

year = "2006",

language = "English (US)",

volume = "7 Suppl 1",

pages = "S13.1--10",

journal = "Genome Biology",

issn = "1474-7596",

publisher = "BioMed Central",

}

TY - JOUR

T1 - A computational approach for identifying pseudogenes in the ENCODE regions.

AU - Zheng, Deyou

AU - Gerstein, Mark B.

PY - 2006

Y1 - 2006

N2 - BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions). RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications. CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.

UR - http://www.scopus.com/inward/record.url?scp=33748664359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748664359&partnerID=8YFLogxK

M3 - Article

C2 - 16925835

AN - SCOPUS:33748664359

SN - 1474-7596

VL - 7 Suppl 1

SP - S13.1-10

JO - Genome Biology

JF - Genome Biology

ER -
