Development of Lung Cancer Risk Screening Tool with Causal Discovery Model Evaluation Approach

Sandi Wibowo; Jatniko Nur Mutaqin; Ari Apriansyah; Muhamad Komiyatu; Gusti Ayu Putri Saptawati Soekidjo

doi:10.22219/kinetik.v10i2.2188

Issue

Vol. 10, No. 2, May 2025

Issue Published : May 31, 2025

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Sandi Wibowo

Bandung Institute of Technology

Jatniko Nur Mutaqin

Bandung Institute of Technology

Ari Apriansyah

Bandung Institute of Technology

Muhamad Komiyatu

Bandung Institute of Technology

Gusti Ayu Putri Saptawati Soekidjo

Bandung Institute of Technology

Corresponding Author(s) : Sandi Wibowo

sandihex@gmail.com

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 10, No. 2, May 2025
Article Published : May 31, 2025

Abstract

Causal graph discovery approaches for detecting high-risk diseases have become more widely applied in healthcare over the last decade. The main challenge of causal graph discovery on healthcare data is the complexity of big data, which requires appropriate algorithms to reveal causal relationships between variables. This study evaluates the performance of seven causal discovery models, including Peter-Clark (PC), Greedy Equivalence Search (GES), Direct LiNGAM, Directed Acyclic Graph-Graph Neural Network (DAG-GNN), Greedy Sparsest Permutation (GraSP), and Recursive Causal Discovery (RCD), on open-source healthcare datasets. Model performance was evaluated using the Structural Intervention Distance (SID), Structural Hamming Distance (SHD), Matthews Correlation Coefficient (MCC), and Frobenius Norm (FN) metrics. The results show that the GES model performs best on low-complexity datasets, while the DAG-GNN model offers consistent performance on high-complexity data, with MCC values ranging from 0.77 to 0.88. Applying the GES model to lung cancer risk screening, based on users' responses to screening questions, proved effective as measured by the MCC, SID, and SHD scores between the reference adjacency matrix and the adjacency matrix produced by the screening.
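As an illustration of how three of the metrics named above compare a reference graph with a discovered graph, the following is a minimal sketch and not the authors' implementation. The function names (`shd`, `mcc`, `frobenius`) and the example matrices are hypothetical. It assumes binary adjacency matrices with an empty diagonal, counts a reversed edge once in SHD (one common convention), and computes MCC over the off-diagonal entries. SID is omitted because it requires reasoning about adjustment sets rather than an entry-wise comparison.

```python
import numpy as np


def shd(a_true, a_est):
    """Structural Hamming Distance between two binary adjacency matrices.

    Assumption: a reversed edge (i->j vs j->i) counts as one error.
    """
    a_true = np.asarray(a_true, dtype=int)
    a_est = np.asarray(a_est, dtype=int)
    diff = np.abs(a_true - a_est)
    # Merge both directions of each node pair so a reversal is counted once.
    diff = diff + diff.T
    diff[diff > 1] = 1
    return int(np.triu(diff).sum())


def mcc(a_true, a_est):
    """Matthews Correlation Coefficient over off-diagonal entries.

    Assumption: each ordered node pair is one binary prediction
    (edge present or absent).
    """
    a_true = np.asarray(a_true, dtype=bool)
    a_est = np.asarray(a_est, dtype=bool)
    mask = ~np.eye(a_true.shape[0], dtype=bool)
    t, e = a_true[mask], a_est[mask]
    tp = float(np.sum(t & e))
    tn = float(np.sum(~t & ~e))
    fp = float(np.sum(~t & e))
    fn = float(np.sum(t & ~e))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def frobenius(a_true, a_est):
    """Frobenius norm of the difference between the two matrices."""
    delta = np.asarray(a_true, dtype=float) - np.asarray(a_est, dtype=float)
    return float(np.linalg.norm(delta, ord="fro"))


# Hypothetical 3-node example: reference 0->1->2, estimate 0->1<-2.
reference = np.array([[0, 1, 0],
                      [0, 0, 1],
                      [0, 0, 0]])
estimate = np.array([[0, 1, 0],
                     [0, 0, 0],
                     [0, 1, 0]])

print("SHD:", shd(reference, estimate))
print("MCC:", mcc(reference, estimate))
print("FN: ", frobenius(reference, estimate))
```

Lower SHD and Frobenius norm values indicate a closer match to the reference graph, while MCC is higher when the match is better and equals 1 for identical graphs.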

Keywords

Structural Intervention Distance; Structural Hamming Distance; Matthews Correlation Coefficient; Frobenius Norm; Screening

Wibowo, S., Mutaqin, J. N., Apriansyah, A., Komiyatu, M., & Soekidjo, G. A. P. S. (2025). Development of Lung Cancer Risk Screening Tool with Causal Discovery Model Evaluation Approach. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 10(2). https://doi.org/10.22219/kinetik.v10i2.2188
