XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Mohammad Hamim Zajuli Al Faroby; Mohammad Isa Irawan; Ni Nyoman Tri Puspaningsih

doi:10.22219/kinetik.v5i4.1076

Issue

Vol. 5, No. 4, November 2020

Issue Published : Nov 30, 2020

Mohammad Hamim Zajuli Al Faroby

Department of Mathematics, Faculty Science and Data Analytics, Institut Teknologi Sepuluh Nopember.

Mohammad Isa Irawan

Department of Mathematics, Faculty Science and Data Analytics, Institut Teknologi Sepuluh Nopember

https://orcid.org/0000-0001-5496-599X

Ni Nyoman Tri Puspaningsih

Department of Chemistry, Faculty Science and Technology, Universitas Airlangga

https://orcid.org/0000-0001-5835-1653

Corresponding Author(s) : Mohammad Hamim Zajuli Al Faroby

hamim.18061@mhs.its.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 5, No. 4, November 2020
Article Published : Nov 22, 2020

Abstract

Protein-protein interaction (PPI) analysis can be used to identify proteins that support the function of a main protein, especially during synthesis. Insulin is synthesized by proteins that share a molecular function but play different, mutually supportive roles. To identify this function, Gene Ontology (GO) annotations assign specific characteristics to each protein. This study aims to predict proteins that interact with insulin, using centrality measures as a feature extractor and extreme gradient boosting (XGBoost) as the classification algorithm. The centrality measures produce features that describe how central each protein is in the interaction network. Classification results are evaluated using accuracy, precision, recall, and ROC scores. Optimizing the model by searching for suitable parameters produces an accuracy of [value missing] and a ROC score of [value missing]. The prediction model produced by XGBoost performs above the average of the other machine learning methods compared.
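As a rough illustration of the pipeline the abstract describes, the sketch below computes two common centrality measures (degree and closeness) on a toy undirected network and then evaluates a set of toy classifier scores with precision, recall, and ROC AUC. Everything in it is an assumption made for illustration: the node names, the edges, the labels, the scores, and the choice of these two centralities are hypothetical and are not taken from the paper. The classifier itself is omitted; in the study, features of this kind are passed to XGBoost.

```python
from collections import deque

# Hypothetical toy PPI network as an undirected adjacency map (not real data).
adj = {
    "INS": {"A", "B", "C"},
    "A": {"INS"},
    "B": {"INS"},
    "C": {"INS", "D"},
    "D": {"C"},
}

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One feature row per protein: [degree, closeness].
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
features = {v: [degree[v], closeness[v]] for v in adj}

# Hypothetical labels and classifier scores, thresholded at 0.5.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.8, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(features)
print(precision, recall, auc)
```

The `features` mapping is the kind of table that would be handed to a gradient-boosted classifier, and the metric functions mirror the evaluation measures named in the abstract.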

Keywords

Extreme Gradient Boosting; Centrality; Insulin; Machine Learning; Gene Ontology; Protein-protein Interaction

Al Faroby, M. H. Z., Irawan, M. I., & Puspaningsih, N. N. T. (2020). XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 5(4), 253-262. https://doi.org/10.22219/kinetik.v5i4.1076

Author Biographies

Mohammad Hamim Zajuli Al Faroby, Department of Mathematics, Faculty Science and Data Analytics, Institut Teknologi Sepuluh Nopember.

Master of Science, Department of Mathematics, ITS. Research interests in bioinformatics, data analytics, computing, and big data. Computer science lab assistant. Graduate degree in Mathematics, FMKSD ITS, with a specialization in Computer Science.

Mohammad Isa Irawan, Department of Mathematics, Faculty Science and Data Analytics, Institut Teknologi Sepuluh Nopember

Professor of Computer Science and Mathematics, Department of Mathematics, ITS. Research interests in bioinformatics, machine learning, and data mining. Head of the computer science lab. Holds a master's degree in computer science from ITB and a doctoral degree from Vienna University of Technology.

Ni Nyoman Tri Puspaningsih, Department of Chemistry, Faculty Science and Technology, Universitas Airlangga

Professor of Biochemistry, Department of Chemistry, Universitas Airlangga. Research interests in biochemistry and computational biochemistry. Holds a master's degree from ITB and a doctoral degree from Institut Pertanian Bogor.
