This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Development of Lung Cancer Risk Screening Tool with Causal Discovery Model Evaluation Approach
Corresponding Author(s) : Sandi Wibowo
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 10, No. 2, May 2025
Abstract
Causal graph discovery approaches for detecting high-risk diseases have become more widely applied in healthcare over the last decade. The main challenge of causal graph discovery on healthcare data is the complexity of big data, which requires appropriate algorithms to reveal causal relationships between variables. This study evaluates the performance of seven causal discovery models, including Peter-Clark (PC), Greedy Equivalence Search (GES), Direct LiNGAM, Directed Acyclic Graph-Graph Neural Network (DAG-GNN), Greedy Sparsest Permutation (GraSP), and Recursive Causal Discovery (RCD), on open-source healthcare datasets. Model performance was evaluated using the Structural Intervention Distance (SID), Structural Hamming Distance (SHD), Matthews Correlation Coefficient (MCC), and Frobenius Norm (FN) metrics. The evaluation results show that the GES model performs best on low-complexity datasets, while the DAG-GNN model offers consistent performance on high-complexity data, with MCC values ranging from 0.77 to 0.88. The GES model was then applied to lung cancer risk screening based on user responses to questions, and its effectiveness was assessed by measuring the MCC, SID, and SHD scores between the reference adjacency matrices and the resulting screening matrices.
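As a rough illustration of how three of the metrics named in the abstract compare a reference graph with an estimated one, the sketch below computes SHD, MCC, and the Frobenius norm on binary adjacency matrices with NumPy. It is not the authors' implementation: the function names, the 3-node example graph, and the convention of counting a reversed edge as a single SHD error are assumptions made for illustration. SID is omitted because it requires reasoning about interventional adjustment sets rather than a simple entry-wise comparison.

```python
import numpy as np


def shd(ref, est):
    """Structural Hamming Distance (assumed convention: a reversed edge counts once)."""
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)
    diff = ref != est
    # A reversed edge shows up as two differing entries, (i, j) and (j, i).
    reversed_edges = ref & ~est & est.T & ~ref.T
    return int(diff.sum() - reversed_edges.sum())


def mcc(ref, est):
    """Matthews Correlation Coefficient over all adjacency-matrix entries."""
    ref = np.asarray(ref, dtype=bool).ravel()
    est = np.asarray(est, dtype=bool).ravel()
    tp = int(np.sum(ref & est))
    tn = int(np.sum(~ref & ~est))
    fp = int(np.sum(~ref & est))
    fn = int(np.sum(ref & ~est))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Return 0.0 when the coefficient is undefined (a zero marginal count).
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0


def frobenius(ref, est):
    """Frobenius norm of the difference between the two adjacency matrices."""
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.linalg.norm(ref - est, ord="fro"))


# Hypothetical 3-node example: the reference graph is 0 -> 1 -> 2,
# and the estimated graph reverses the second edge (2 -> 1).
reference = np.array([[0, 1, 0],
                      [0, 0, 1],
                      [0, 0, 0]])
estimated = np.array([[0, 1, 0],
                      [0, 0, 0],
                      [0, 1, 0]])

print("SHD:", shd(reference, estimated))
print("MCC:", round(mcc(reference, estimated), 3))
print("Frobenius norm:", round(frobenius(reference, estimated), 3))
```

Lower SHD and Frobenius norm values indicate a closer match to the reference graph, whereas MCC approaches 1 as agreement improves. Some SHD implementations count a reversed edge as two errors, so scores are only comparable under the same convention.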