This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Rule-based Disease Classification using Text Mining on Symptoms Extraction from Electronic Medical Records in Indonesian
Corresponding Author(s): Alfonsus Haryo Sangaji
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 7, No. 1, February 2022
Abstract
Electronic medical records (EMRs) have recently become a source of many insights for clinicians and hospital management. An EMR stores important information and new knowledge that can give hospitals and clinicians a competitive advantage. It is valuable not only for mining patterns in patient symptoms, medication, and treatment, but also as a repository from which new strategies and future trends in medicine can be derived. However, EMRs remain a challenge for many clinicians because much of their content is unstructured. Information extraction helps find valuable information in unstructured data. This study focuses on disease symptom information recorded as text. Only the diseases with the highest prevalence rates in Indonesia, such as tuberculosis, malignant neoplasm, diabetes mellitus, hypertension, and renal failure, are analyzed. Pre-processing techniques such as data cleansing and correction play a significant role in obtaining the features. Because the class distribution of the data is imbalanced, the SMOTE technique is applied to address this. Symptoms are extracted from the EMR data using a rule-based algorithm. Two algorithms, SVM and Random Forest (RF), were implemented to classify the disease based on the features. The results showed that rule-based symptom extraction works well in extracting valuable information from unstructured EMRs. Classification accuracy was 78% for SVM and 89% for RF.
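As an illustration of what rule-based symptom extraction from Indonesian free text can look like, the following is a minimal sketch and not the authors' actual rule set. The lexicon entries, the negation cues, and the names `SYMPTOM_LEXICON`, `NEGATION_RE`, `normalize`, `extract_symptoms`, and `to_feature_vector` are all hypothetical and chosen only for this example. The sketch cleans the note, matches known symptom phrases, skips a phrase when a negation cue directly precedes it, and turns the result into a binary feature vector of the kind a classifier such as SVM or Random Forest could consume.

```python
import re

# Hypothetical lexicon: Indonesian surface phrase -> normalized symptom label.
# These entries are assumptions for illustration, not the paper's rules.
SYMPTOM_LEXICON = {
    "batuk": "cough",
    "demam": "fever",
    "sesak napas": "shortness_of_breath",
    "sesak nafas": "shortness_of_breath",  # common spelling variant
    "nyeri dada": "chest_pain",
    "mual": "nausea",
}

# Hypothetical negation cues; a symptom is skipped when one of these
# ends the text immediately before the matched phrase.
NEGATION_RE = re.compile(r"(?:^|\s)(?:tidak ada|tidak|tanpa|tdk)$")

# Fixed feature order used when building vectors.
FEATURES = sorted(set(SYMPTOM_LEXICON.values()))


def normalize(text):
    """Lowercase, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_symptoms(note):
    """Return the sorted list of non-negated symptom labels found in a note."""
    text = normalize(note)
    found = set()
    for phrase, label in SYMPTOM_LEXICON.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text):
            preceding = text[:match.start()].rstrip()
            if NEGATION_RE.search(preceding):
                continue  # e.g. "tidak ada sesak napas" is not counted
            found.add(label)
    return sorted(found)


def to_feature_vector(labels):
    """Binary vector over FEATURES: 1 if the symptom was extracted, else 0."""
    present = set(labels)
    return [1 if name in present else 0 for name in FEATURES]


if __name__ == "__main__":
    note = "Pasien mengeluh batuk dan demam, tidak ada sesak napas."
    symptoms = extract_symptoms(note)
    print(symptoms)
    print(dict(zip(FEATURES, to_feature_vector(symptoms))))
```

A dictionary-plus-negation rule of this kind is easy to audit and extend, but it only handles a negation cue that sits directly before the symptom phrase; real clinical notes would likely need spelling correction and a wider negation scope.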