Multiclass disease classification from microbial whole-community metagenomes

Saad Khan, Libusha Kelly

Research output: Contribution to journal › Conference article › peer-review

13 Scopus citations


The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false positives), while the random forest makes fewer Type-II errors (false negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
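The top-3 accuracy quoted in the abstract counts a prediction as correct when the true class is among the three highest-scoring classes. The snippet below is a minimal sketch of that metric, not the authors' evaluation code: the function name `top_k_accuracy`, the toy score matrix, and the toy labels are all illustrative assumptions, and the data are invented purely to show the calculation.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical model output).
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Invented toy data: 4 samples, 5 classes (not taken from the paper).
toy_scores = [
    [0.10, 0.50, 0.20, 0.15, 0.05],  # true class 1 is ranked first
    [0.30, 0.05, 0.25, 0.35, 0.05],  # true class 2 is ranked third
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 0 is ranked last
    [0.40, 0.30, 0.10, 0.10, 0.10],  # true class 1 is ranked second
]
toy_labels = [1, 2, 0, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

On this toy input the top-3 figure is higher than the top-1 figure because two samples have their true class ranked second or third, which is the kind of gap the abstract describes between 75% accuracy and over 90% top-3 accuracy.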

Original language: English (US)
Pages (from-to): 55-66
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Issue number: 2020
State: Published - 2020
Event: 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States
Duration: Jan 3 2020 to Jan 7 2020


  • Machine learning
  • Metagenomics
  • Microbiome

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics

