TY - GEN
T1 - Classifying osteosarcoma patients using machine learning approaches
AU - Li, Zhi
AU - Soroushmehr, S. M. Reza
AU - Hua, Yingqi
AU - Mao, Min
AU - Qiu, Yunping
AU - Najarian, Kayvan
N1 - Funding Information:
Work by the authors was supported by the University of Michigan and Shanghai Jiao Tong University Joint Institute.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/9/13
Y1 - 2017/9/13
N2 - Metabolomic data analysis presents a unique opportunity to advance our understanding of osteosarcoma, a common bone malignancy for which genomic and proteomic studies have had limited success. One of the major goals of metabolomic studies is to classify osteosarcoma at an early stage, which is required for metastasectomy treatment. In this paper, we subject metabolomic data on osteosarcoma patients, collected by the SJTU team, to three classification methods: logistic regression, support vector machine (SVM), and random forest (RF). Their performance is evaluated and compared using receiver operating characteristic (ROC) curves. All three classifiers successfully distinguish between healthy control and tumor cases, with random forest outperforming the other two in cross-validation on the training set (accuracy rates for logistic regression, SVM, and random forest are 88%, 90%, and 97%, respectively). Random forest achieved an overall accuracy of 95% with an AUC of 0.99 on the testing set.
AB - Metabolomic data analysis presents a unique opportunity to advance our understanding of osteosarcoma, a common bone malignancy for which genomic and proteomic studies have had limited success. One of the major goals of metabolomic studies is to classify osteosarcoma at an early stage, which is required for metastasectomy treatment. In this paper, we subject metabolomic data on osteosarcoma patients, collected by the SJTU team, to three classification methods: logistic regression, support vector machine (SVM), and random forest (RF). Their performance is evaluated and compared using receiver operating characteristic (ROC) curves. All three classifiers successfully distinguish between healthy control and tumor cases, with random forest outperforming the other two in cross-validation on the training set (accuracy rates for logistic regression, SVM, and random forest are 88%, 90%, and 97%, respectively). Random forest achieved an overall accuracy of 95% with an AUC of 0.99 on the testing set.
KW - Cancer
KW - Machine Learning
KW - Osteosarcoma
KW - Random Forest
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85032212731&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032212731&partnerID=8YFLogxK
U2 - 10.1109/EMBC.2017.8036768
DO - 10.1109/EMBC.2017.8036768
M3 - Conference contribution
C2 - 29059816
AN - SCOPUS:85032212731
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 82
EP - 85
BT - 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2017
Y2 - 11 July 2017 through 15 July 2017
ER -