Protein (multi-)location prediction: Utilizing interdependencies via a generative model

Ramanuja Simha, Sebastian Briesemeister, Oliver Kohlbacher, Hagit Shatkay

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Motivation: Proteins are responsible for a multitude of vital tasks in all living organisms. Given that a protein's function and role are strongly related to its subcellular location, protein location prediction is an important research area. While proteins move from one location to another and can localize to multiple locations, most existing location prediction systems assign only a single location per protein. A few recent systems attempt to predict multiple locations for proteins; however, their performance leaves much room for improvement. Moreover, such systems do not capture dependencies among locations and usually treat locations as independent. We hypothesize that a multi-location predictor that captures location inter-dependencies can improve location predictions for proteins.

Results: We introduce a probabilistic generative model for protein localization and develop a system based on it, which we call MDLoc, that utilizes inter-dependencies among locations to predict multiple locations for proteins. The model captures location inter-dependencies using Bayesian networks and represents the dependency between features and locations using a mixture model. We use iterative processes for learning model parameters and for estimating protein locations. We evaluate our classifier, MDLoc, on a dataset of single- and multi-localized proteins derived from the DBMLoc dataset, which is the most comprehensive protein multi-localization dataset currently available. Our results, obtained by using MDLoc, significantly improve upon results obtained by an initial simpler classifier, as well as on results reported by other top systems.
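
To make the idea of exploiting location inter-dependencies concrete, the following is a minimal toy sketch and not the MDLoc model itself. It assumes just two locations linked by a single hypothetical Bayesian-network edge (cytoplasm -> nucleus) and one binary feature. All names and probability values are invented for illustration, and the abstract's mixture model and iterative learning are not reproduced here. The sketch scores every joint location assignment and returns the most probable set of locations.

```python
from itertools import product

LOCATIONS = ("cytoplasm", "nucleus")

# Hypothetical Bayesian network over location indicators: cytoplasm -> nucleus.
# All numbers are made up for illustration; they are not from the paper.
P_CYT = 0.6                          # P(cytoplasm = 1)
P_NUC_GIVEN_CYT = {1: 0.7, 0: 0.2}   # P(nucleus = 1 | cytoplasm)

# Hypothetical P(feature = 1 | cytoplasm, nucleus) for one binary feature.
P_FEATURE_GIVEN_LOCS = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def location_prior(cyt, nuc):
    """Joint prior P(cyt, nuc) = P(cyt) * P(nuc | cyt) from the toy network."""
    p_cyt = P_CYT if cyt else 1 - P_CYT
    p_nuc_one = P_NUC_GIVEN_CYT[cyt]
    p_nuc = p_nuc_one if nuc else 1 - p_nuc_one
    return p_cyt * p_nuc


def posterior(feature):
    """Posterior over all joint location assignments given the feature value."""
    scores = {}
    for cyt, nuc in product((0, 1), repeat=2):
        p_one = P_FEATURE_GIVEN_LOCS[(cyt, nuc)]
        likelihood = p_one if feature else 1 - p_one
        scores[(cyt, nuc)] = location_prior(cyt, nuc) * likelihood
    total = sum(scores.values())
    return {assignment: s / total for assignment, s in scores.items()}


def predict_locations(feature):
    """Return the set of locations in the most probable joint assignment."""
    post = posterior(feature)
    best = max(post, key=post.get)
    return {name for name, flag in zip(LOCATIONS, best) if flag}
```

With these toy numbers, `predict_locations(1)` returns both locations and `predict_locations(0)` returns the empty set. Because the prediction is made over joint assignments rather than one location at a time, the dependency between the two locations directly influences which combination is selected.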

Original language: English (US)
Pages (from-to): i365-i374
Journal: Bioinformatics
Volume: 31
Issue number: 12
State: Published - Jun 15 2015
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
