XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions
Corresponding Author(s) : Mohammad Hamim Zajuli Al Faroby
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 5, No. 4, November 2020
References
J. Calles-Escandon and M. Cipolla, “Diabetes and endothelial dysfunction: A clinical perspective,” Endocr. Rev., vol. 22, no. 1, pp. 36–52, 2001, doi: 10.1210/edrv.22.1.0417.
P. Sun et al., “Protein Function Prediction Using Function Associations in Protein-Protein Interaction Network,” IEEE Access, vol. 6, pp. 30892–30902, 2018, doi: 10.1109/ACCESS.2018.2806478.
W. Xiong, L. Xie, S. Zhou, and J. Guan, “Active learning for protein function prediction in protein-protein interaction networks,” Neurocomputing, vol. 145, pp. 44–52, 2014, doi: 10.1016/j.neucom.2014.05.075.
G. S. Oliveira and A. R. Santos, “Using the gene ontology tool to produce de novo protein-protein interaction networks with IS_A relationship,” Genet. Mol. Res., vol. 15, no. 4, 2016, doi: 10.4238/gmr15049273.
P. W. Lord, R. D. Stevens, A. Brass, and C. A. Goble, “Investigating semantic similarity measures across the gene ontology: The relationship between sequence and annotation,” Bioinformatics, vol. 19, no. 10, pp. 1275–1283, 2003, doi: 10.1093/bioinformatics/btg153.
G. D. Montañez and Y. R. Cho, “Assessing reliability of protein-protein interactions by gene ontology integration,” in 2012 IEEE Symposium on Computational Intelligence and Computational Biology, CIBCB 2012, 2012, pp. 21–27, doi: 10.1109/CIBCB.2012.6217206.
G. Iván and V. Grolmusz, “When the web meets the cell: Using personalized PageRank for analyzing protein interaction networks,” Bioinformatics, vol. 27, no. 3, pp. 405–407, 2011, doi: 10.1093/bioinformatics/btq680.
S. Iyer, T. Killingback, B. Sundaram, and Z. Wang, “Attack Robustness and Centrality of Complex Networks,” PLoS One, vol. 8, no. 4, 2013, doi: 10.1371/journal.pone.0059613.
J. Zhong, J. Wang, W. Peng, Z. Zhang, and M. Li, “A feature selection method for prediction essential protein,” Tsinghua Sci. Technol., vol. 20, no. 5, pp. 491–499, 2015, doi: 10.1109/TST.2015.7297748.
S. Mei and H. Zhu, “A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks,” Sci. Rep., vol. 5, p. 8034, 2015, doi: 10.1038/srep08034.
C. Pizzuti and S. E. Rombo, “Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods,” Bioinformatics, vol. 30, no. 10, pp. 1343–1352, 2014, doi: 10.1093/bioinformatics/btu034.
R. Vyas, S. Bapat, E. Jain, M. Karthikeyan, S. Tambe, and B. D. Kulkarni, “Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis,” Comput. Biol. Chem., vol. 65, pp. 37–44, 2016, doi: 10.1016/j.compbiolchem.2016.09.011.
H. Zhou et al., “Improving neural protein-protein interaction extraction with knowledge selection,” Comput. Biol. Chem., vol. 83, no. May, p. 107146, 2019, doi: 10.1016/j.compbiolchem.2019.107146.
T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, vol. 13-17-Augu, pp. 785–794, doi: 10.1145/2939672.2939785.
A. Gupta, K. Gusain, and B. Popli, “Verifying the Value and Veracity of eXtreme Gradient Boosted Decision Trees on a Variety of Dataset,” in 2016 11th International Conference on Industrial and Information Systems (ICIIS), 2016, pp. 457–462, doi: 10.1109/ICIINFS.2016.8262984.
I. Babajide Mustapha and F. Saeed, “Bioactive Molecule Prediction Using Extreme Gradient Boosting,” Molecules, vol. 21, no. 8, pp. 1–11, 2016, doi: 10.3390/molecules21080983.
T. W. Valente, K. Coronges, C. Lakon, and E. Costenbader, “How Correlated Are Network Centrality Measures?,” Connect. (Tor)., vol. 28, no. 1, pp. 16–26, 2008, [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/20505784, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC2875682.
E. Cohen, D. Delling, T. Pajor, and R. F. Werneck, “Computing classic closeness centrality, at scale,” in COSN 2014 - Proceedings of the 2014 ACM Conference on Online Social Networks, 2014, pp. 37–49, doi: 10.1145/2660460.2660465.
S. Oldham, B. Fulcher, L. Parkes, A. Arnatkeviciūtė, C. Suo, and A. Fornito, “Consistency and differences between centrality measures across distinct classes of networks,” PLoS One, vol. 14, no. 7, pp. 1–23, 2019, doi: 10.1371/journal.pone.0220061.
J. Zhong, Y. Sun, W. Peng, M. Xie, J. Yang, and X. Tang, “XGBFEMF: An XGBoost-Based framework for essential protein prediction,” IEEE Trans. Nanobioscience, vol. 17, no. 3, pp. 243–250, 2018, doi: 10.1109/TNB.2018.2842219.
J. H. Friedman, “Stochastic gradient boosting,” Comput. Stat. Data Anal., vol. 38, no. 4, pp. 367–378, 2002, doi: 10.1016/S0167-9473(01)00065-2.
T. Saito and M. Rehmsmeier, “The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,” PLoS One, vol. 10, no. 3, pp. 1–21, 2015, doi: 10.1371/journal.pone.0118432.
C. Marzban, “The ROC curve and the area under it as performance measures,” Weather Forecast., vol. 19, no. 6, pp. 1106–1114, 2004, doi: 10.1175/825.1.
X. Ying, “An Overview of Overfitting and its Solutions,” J. Phys. Conf. Ser., vol. 1168, no. 2, 2019, doi: 10.1088/1742-6596/1168/2/022022.
M. Sokolova, S. Szpakowicz, and N. Japkowicz, “Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation,” AI 2006 Adv. Artif. Intell., vol. 4304, no. 1, pp. 1015–1021, 2006, doi: 10.1007/11941439.