Improving functional annotation of non-synonymous SNPs with information theory

R. Karchin; L. Kelly; A. Sali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automated functional annotation of non-synonymous SNPs (nsSNPs) requires that amino-acid residue changes be represented by a set of descriptive features, such as evolutionary conservation, side-chain volume change, effect on ligand binding, and residue structural rigidity. Identifying the most informative combinations of features is critical to the success of a computational prediction method. We rank 32 features according to their mutual information with functional effects of amino-acid substitutions, as measured by in vivo assays. In addition, we use a greedy algorithm to identify a subset of highly informative features [1]. The method is simple to implement and provides a quantitative measure for selecting the best predictive features given a set of features that a human expert believes to be informative. We demonstrate the usefulness of the selected highly informative features by cross-validated tests of a computational classifier, a support vector machine (SVM). The SVM's classification accuracy is highly correlated with the ranking of the input features by their mutual information. Two features describing the solvent accessibility of "wild-type" and "mutant" amino-acid residues and one evolutionary feature based on superfamily-level multiple alignments produce comparable overall accuracy and 6% fewer false positives than a 32-feature set that considers physicochemical properties of amino acids, protein electrostatics, amino-acid residue flexibility, and binding interactions.
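
The ranking and greedy selection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the greedy criterion from reference [1], so the sketch assumes one common variant (forward selection that maximizes the joint mutual information of the chosen subset with the label). The feature names, the toy values, and the function names `mutual_information`, `rank_features`, and `greedy_select` are all hypothetical, and the features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, labels):
    """Sort feature names by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def greedy_select(features, labels, k):
    """Assumed greedy variant: repeatedly add the feature that maximizes the
    joint mutual information of the selected subset with the labels."""
    selected = []
    joint = [()] * len(labels)  # joint value of the selected features per sample
    for _ in range(k):
        best_name, best_score, best_joint = None, -1.0, None
        for name, col in features.items():
            if name in selected:
                continue
            candidate = [j + (v,) for j, v in zip(joint, col)]
            score = mutual_information(candidate, labels)
            if score > best_score:
                best_name, best_score, best_joint = name, score, candidate
        if best_name is None:  # no features left to add
            break
        selected.append(best_name)
        joint = best_joint
    return selected


# Toy, made-up data: 1 = substitution affects function, 0 = no effect.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "conservation": [0, 0, 0, 1, 1, 1, 1, 1],
    "conservation_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant feature
    "accessibility": [1, 1, 1, 0, 1, 1, 1, 1],
}

ranked, scores = rank_features(features, labels)
print("ranking by individual MI:", ranked)
print("greedy subset of size 2:", greedy_select(features, labels, 2))
```

In this toy data the redundant copy ranks high on its own but adds nothing once the original is selected, whereas the individually weak feature resolves the remaining ambiguity. That difference between per-feature ranking and subset selection is the reason to run both steps.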

Original language	English (US)
Title of host publication	Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005
Pages	397-408
Number of pages	12
State	Published - 2005
Externally published	Yes
Event	10th Pacific Symposium on Biocomputing, PSB 2005 - Big Island of Hawaii, United States; Duration: Jan 4 2005 → Jan 8 2005

Publication series

Name	Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005

Other

Other	10th Pacific Symposium on Biocomputing, PSB 2005
Country/Territory	United States
City	Big Island of Hawaii
Period	1/4/05 → 1/8/05

ASJC Scopus subject areas

Computational Theory and Mathematics
Biomedical Engineering

Cite this

Karchin, R, Kelly, L & Sali, A 2005, Improving functional annotation of non-synonymous SNPs with information theory. in Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005. Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005, pp. 397-408, 10th Pacific Symposium on Biocomputing, PSB 2005, Big Island of Hawaii, United States, 1/4/05.

@inproceedings{27bf0422208547599acfdea2889770aa,
  title = "Improving functional annotation of non-synonymous {SNPs} with information theory",
  abstract = "Automated functional annotation of non-synonymous SNPs (nsSNPs) requires that amino-acid residue changes be represented by a set of descriptive features, such as evolutionary conservation, side-chain volume change, effect on ligand binding, and residue structural rigidity. Identifying the most informative combinations of features is critical to the success of a computational prediction method. We rank 32 features according to their mutual information with functional effects of amino-acid substitutions, as measured by in vivo assays. In addition, we use a greedy algorithm to identify a subset of highly informative features [1]. The method is simple to implement and provides a quantitative measure for selecting the best predictive features given a set of features that a human expert believes to be informative. We demonstrate the usefulness of the selected highly informative features by cross-validated tests of a computational classifier, a support vector machine (SVM). The SVM's classification accuracy is highly correlated with the ranking of the input features by their mutual information. Two features describing the solvent accessibility of {"}wild-type{"} and {"}mutant{"} amino-acid residues and one evolutionary feature based on superfamily-level multiple alignments produce comparable overall accuracy and 6\% fewer false positives than a 32-feature set that considers physicochemical properties of amino acids, protein electrostatics, amino-acid residue flexibility, and binding interactions.",
  author = "R. Karchin and L. Kelly and A. Sali",
  year = "2005",
  language = "English (US)",
  isbn = "9812560467",
  series = "Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005",
  pages = "397--408",
  booktitle = "Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005",
  note = "10th Pacific Symposium on Biocomputing, PSB 2005; Conference date: 4 January 2005 through 8 January 2005",
}

TY - GEN

T1 - Improving functional annotation of non-synonymous SNPs with information theory

AU - Karchin, R.

AU - Kelly, L.

AU - Sali, A.

PY - 2005

Y1 - 2005

AB - Automated functional annotation of non-synonymous SNPs (nsSNPs) requires that amino-acid residue changes be represented by a set of descriptive features, such as evolutionary conservation, side-chain volume change, effect on ligand binding, and residue structural rigidity. Identifying the most informative combinations of features is critical to the success of a computational prediction method. We rank 32 features according to their mutual information with functional effects of amino-acid substitutions, as measured by in vivo assays. In addition, we use a greedy algorithm to identify a subset of highly informative features [1]. The method is simple to implement and provides a quantitative measure for selecting the best predictive features given a set of features that a human expert believes to be informative. We demonstrate the usefulness of the selected highly informative features by cross-validated tests of a computational classifier, a support vector machine (SVM). The SVM's classification accuracy is highly correlated with the ranking of the input features by their mutual information. Two features describing the solvent accessibility of "wild-type" and "mutant" amino-acid residues and one evolutionary feature based on superfamily-level multiple alignments produce comparable overall accuracy and 6% fewer false positives than a 32-feature set that considers physicochemical properties of amino acids, protein electrostatics, amino-acid residue flexibility, and binding interactions.

UR - http://www.scopus.com/inward/record.url?scp=15944417881&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=15944417881&partnerID=8YFLogxK

M3 - Conference contribution

C2 - 15759645

AN - SCOPUS:15944417881

SN - 9812560467

SN - 9789812560469

T3 - Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005

SP - 397

EP - 408

BT - Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005

T2 - 10th Pacific Symposium on Biocomputing, PSB 2005

Y2 - 4 January 2005 through 8 January 2005

ER -
