
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Exploiting Vulnerabilities of Machine Learning Models on Medical Text via Generative Adversarial Attacks
Corresponding Author(s): Setio Basuki
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 10, No. 3, August 2025
Abstract
Artificial Intelligence (AI) technology has advanced significantly, driving its adoption across many fields. AI, and Machine Learning (ML) in particular, is also widely adopted in the medical field, which requires highly precise decisions. Challenges arise when AI-based predictive models prove vulnerable to adversarial attacks. These attacks use perturbed (modified) data whose changes are unnoticeable to humans but can significantly alter prediction results. This paper applies the TextFooler technique to generate adversarial attacks that deceive predictive models on medical text. The experiment shows that three ML models built with popular approaches, namely a transformer-based model using Bidirectional Encoder Representations from Transformers (BERT), a stack classifier, and traditional machine learning algorithms, all achieved the same classification accuracy of 99.98%. Among the three models, BERT is the most affected by adversarial attacks, with an attack success rate of 76.8%, followed by the traditional machine learning methods and the stack classifier, with success rates of 28.73% and 5.21%, respectively. These findings indicate that although the advanced ML model BERT performs well on clean data, it is highly vulnerable to adversarial attacks. There is therefore an urgent need to develop predictive models that are robust and secure against such attacks.
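As a rough illustration of the two ideas the abstract relies on, word-level perturbation and attack success rate, the following minimal Python sketch runs a greedy synonym-substitution attack against a toy keyword classifier. It is not TextFooler and not the models evaluated in this paper: the synonym table, the trigger words, the function names, and the example sentences are all hypothetical. The success rate is assumed here to be the number of successful attacks divided by the number of correctly classified inputs that were attacked, which may differ from the definition used in the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym table. TextFooler draws candidates from word
# embeddings and filters them; this sketch uses a fixed lookup instead.
SYNONYMS: Dict[str, List[str]] = {
    "severe": ["serious"],
    "infection": ["illness"],
    "pneumonia": [],  # no substitute available in this toy table
}

# Hypothetical trigger words for the stand-in classifier.
TRIGGERS: Set[str] = {"severe", "infection", "pneumonia"}


def toy_score(text: str) -> int:
    """Count trigger words; a stand-in for a model's confidence."""
    return sum(word in TRIGGERS for word in text.lower().split())


def toy_predict(text: str) -> int:
    """Predict class 1 if at least one trigger word is present, else 0."""
    return int(toy_score(text) >= 1)


def attack(text: str) -> Optional[str]:
    """Greedily swap words for synonyms that lower the score.

    Returns the perturbed text once the predicted label flips, or None
    if no flip is found. This toy attack only lowers the score, so it
    can only push class-1 texts toward class 0.
    """
    original_label = toy_predict(text)
    words = text.split()
    for i in range(len(words)):
        best_words, best_score = words, toy_score(" ".join(words))
        for candidate in SYNONYMS.get(words[i].lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            score = toy_score(" ".join(trial))
            if score < best_score:
                best_words, best_score = trial, score
        words = best_words  # keep the best swap at this position
        if toy_predict(" ".join(words)) != original_label:
            return " ".join(words)
    return None


def attack_success_rate(examples: List[Tuple[str, int]]) -> float:
    """Successful attacks divided by correctly classified inputs attacked."""
    attacked = [text for text, label in examples if toy_predict(text) == label]
    if not attacked:
        return 0.0
    successes = sum(attack(text) is not None for text in attacked)
    return successes / len(attacked)


# Hypothetical labelled examples (1 = abnormal finding).
EXAMPLES: List[Tuple[str, int]] = [
    ("patient shows severe infection", 1),
    ("chest scan indicates pneumonia", 1),
    ("mild cough no fever", 1),  # misclassified by the toy model, so skipped
    ("severe cough", 1),
]

if __name__ == "__main__":
    print(attack("patient shows severe infection"))
    print(attack_success_rate(EXAMPLES))
```

In this sketch an input that the model already misclassifies is excluded from the denominator, so the rate reflects only predictions that the attack actually overturned.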