TY - JOUR
T1 - Classification of protein–protein association rates based on biophysical informatics
AU - Dhusia, Kalyani
AU - Wu, Yinghao
N1 - Funding Information:
Publication costs are funded by the National Institutes of Health under Grant Nos. R01GM120238 and R01GM122804. The work is also partially supported by a start-up Grant from Albert Einstein College of Medicine. Computational support was provided by Albert Einstein College of Medicine High Performance Computing Center.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Background: Proteins form various complexes to carry out their versatile functions in cells. The dynamic properties of protein complex formation are mainly characterized by the association rates which measures how fast these complexes can be formed. It was experimentally observed that the association rates span an extremely wide range with over ten orders of magnitudes. Identification of association rates within this spectrum for specific protein complexes is therefore essential for us to understand their functional roles. Results: To tackle this problem, we integrate physics-based coarse-grained simulations into a neural-network-based classification model to estimate the range of association rates for protein complexes in a large-scale benchmark set. The cross-validation results show that, when an optimal threshold was selected, we can reach the best performance with specificity, precision, sensitivity and overall accuracy all higher than 70%. The quality of our cross-validation data has also been testified by further statistical analysis. Additionally, given an independent testing set, we can successfully predict the group of association rates for eight protein complexes out of ten. Finally, the analysis of failed cases suggests the future implementation of conformational dynamics into simulation can further improve model. Conclusions: In summary, this study demonstrated that a new modeling framework that combines biophysical simulations with bioinformatics approaches is able to identify protein–protein interactions with low association rates from those with higher association rates. This method thereby can serve as a useful addition to a collection of existing experimental approaches that measure biomolecular recognition.
AB - Background: Proteins form various complexes to carry out their versatile functions in cells. The dynamic properties of protein complex formation are mainly characterized by the association rates which measures how fast these complexes can be formed. It was experimentally observed that the association rates span an extremely wide range with over ten orders of magnitudes. Identification of association rates within this spectrum for specific protein complexes is therefore essential for us to understand their functional roles. Results: To tackle this problem, we integrate physics-based coarse-grained simulations into a neural-network-based classification model to estimate the range of association rates for protein complexes in a large-scale benchmark set. The cross-validation results show that, when an optimal threshold was selected, we can reach the best performance with specificity, precision, sensitivity and overall accuracy all higher than 70%. The quality of our cross-validation data has also been testified by further statistical analysis. Additionally, given an independent testing set, we can successfully predict the group of association rates for eight protein complexes out of ten. Finally, the analysis of failed cases suggests the future implementation of conformational dynamics into simulation can further improve model. Conclusions: In summary, this study demonstrated that a new modeling framework that combines biophysical simulations with bioinformatics approaches is able to identify protein–protein interactions with low association rates from those with higher association rates. This method thereby can serve as a useful addition to a collection of existing experimental approaches that measure biomolecular recognition.
KW - Kinetic Monte-Carlo simulation
KW - Neural network model
KW - Protein–protein association
UR - http://www.scopus.com/inward/record.url?scp=85112741959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112741959&partnerID=8YFLogxK
U2 - 10.1186/s12859-021-04323-0
DO - 10.1186/s12859-021-04323-0
M3 - Article
C2 - 34404340
AN - SCOPUS:85112741959
SN - 1471-2105
VL - 22
JO - BMC bioinformatics
JF - BMC bioinformatics
IS - 1
M1 - 408
ER -