Issue

Vol. 7, No. 1, February 2022

Issue Published : Feb 28, 2022
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Rule-based Disease Classification using Text Mining on Symptoms Extraction from Electronic Medical Records in Indonesian

https://doi.org/10.22219/kinetik.v7i1.1377
Alfonsus Haryo Sangaji
Institut Teknologi Sepuluh Nopember
Yuri Pamungkas
Institut Teknologi Sepuluh Nopember
Supeno Mardi Susiki Nugroho
Institut Teknologi Sepuluh Nopember
Adhi Dharma Wibawa
Institut Teknologi Sepuluh Nopember

Corresponding Author(s) : Alfonsus Haryo Sangaji

haryo.alfon@gmail.com

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 7, No. 1, February 2022
Article Published : Feb 28, 2022

Abstract

Electronic medical records (EMRs) have recently become a source of many insights for clinicians and hospital management. An EMR stores important information and new knowledge on many aspects that can give hospitals and clinicians a competitive advantage. It is valuable not only for mining the patterns it holds on patient symptoms, medication, and treatment, but also as a repository of new strategies and future trends in the medical world. However, EMRs remain a challenge for many clinicians because of their unstructured form. Information extraction helps find valuable information in unstructured data. This study focuses on information about disease symptoms recorded as text. Only diseases with the highest prevalence rates in Indonesia, such as tuberculosis, malignant neoplasm, diabetes mellitus, hypertension, and renal failure, are analyzed. Pre-processing techniques such as data cleansing and correction play a significant role in obtaining the features. Because the data are imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to compensate. Symptoms are extracted from the EMR data with a rule-based algorithm. Two algorithms, Support Vector Machine (SVM) and Random Forest (RF), were implemented to classify the disease from the extracted features. The results show that rule-based symptom extraction works well in extracting valuable information from unstructured EMRs. Classification accuracy was 78% for SVM and 89% for RF.
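
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symptom lexicon, the negation cues, the helper names (`extract_symptoms`, `to_features`, `smote`), and the sample notes are all assumptions made for illustration. The extraction step applies a dictionary rule plus one negation rule, and the oversampling step follows the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) using only the Python standard library.

```python
import random
import re

# Hypothetical symptom lexicon (Indonesian term -> English label); not from the paper.
SYMPTOMS = {
    "batuk": "cough",
    "demam": "fever",
    "sesak": "shortness_of_breath",
    "mual": "nausea",
    "nyeri dada": "chest_pain",
}
# Hypothetical negation cues: "tidak ada" (there is no), "tidak" (not), "tanpa" (without).
NEGATION = r"\b(?:tidak ada|tidak|tanpa)"


def extract_symptoms(note):
    """Return the sorted labels of symptoms that are mentioned and not negated."""
    text = re.sub(r"[^a-z\s]", " ", note.lower())  # basic cleansing: keep letters only
    text = re.sub(r"\s+", " ", text).strip()       # collapse repeated whitespace
    found = set()
    for term, label in SYMPTOMS.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            # Rule: ignore the mention if a negation cue directly precedes it.
            if not re.search(NEGATION + r" $", text[:match.start()]):
                found.add(label)
    return sorted(found)


def to_features(labels):
    """Encode extracted labels as a binary vector in lexicon order."""
    return [1.0 if label in labels else 0.0 for label in SYMPTOMS.values()]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating toward near minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Illustrative notes (invented): the second one negates "demam" with "tanpa".
notes = ["Pasien batuk dan demam, tidak ada sesak.", "Mual tanpa demam, nyeri dada."]
vectors = [to_features(extract_symptoms(note)) for note in notes]
extra = smote(vectors, n_new=3, k=1)
```

In a fuller experiment, the balanced feature vectors would then be passed to the two classifiers named in the abstract; scikit-learn's `SVC` and `RandomForestClassifier`, together with imbalanced-learn's `SMOTE`, are one common choice, though the abstract does not say which libraries were used.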

Keywords

Electronic Medical Record; Symptoms Extraction; Rule-based feature extraction; Random Forests; Support Vector Machine; SMOTE
Sangaji, A. H., Pamungkas, Y., Nugroho, S. M. S., & Wibawa, A. D. (2022). Rule-based Disease Classification using Text Mining on Symptoms Extraction from Electronic Medical Records in Indonesian. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 7(1), 69-80. https://doi.org/10.22219/kinetik.v7i1.1377
KINETIK: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
eISSN : 2503-2267
pISSN : 2503-2259



© 2020 KINETIK, All rights reserved. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License