Machine Learning-Based Analysis of Cancer Incidence in Jordan (2020–2021): A Decision Tree Approach


Abstract


This study aimed to predict cancer types across the regions of Jordan using a machine learning-based Decision Tree Algorithm (DTA) model. The model used patients' demographic information, specifically gender, age, and region of residence, as independent variables (IVs) to assess their association with cancer type as the dependent variable (DV). The objective was to determine the predictive relationship between these demographic factors and cancer type in order to support regional cancer profiling and inform targeted public health planning. The research used secondary data from the Ministry of Health, the Directorate of Non-Communicable Diseases, and the Jordan Cancer Registry for the years 2020 to 2021. A total of 9,547 cancer cases were analyzed with the DTA model, which identified significant incidence patterns; the central region of Jordan accounted for the largest number of cases (n = 6,815; 71.4%). The model classified the cases into 25 distinct cancer types based on demographic attributes, with breast cancer the most prevalent, particularly among middle-aged females residing in the central region. The DTA model proved effective at handling and stratifying large-scale medical data, predicting how demographic factors relate to cancer type, categorizing and labeling records, and suggesting where categories could be merged. These findings have important implications for the development of focused cancer prevention strategies and the efficient allocation of healthcare resources. However, a key limitation of the study is the incomplete characterization of cancer patient attributes across all Jordanian regions.
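The abstract describes predicting a categorical outcome (cancer type) from three demographic predictors (gender, age, region). A minimal sketch of that idea in plain Python follows. It is an illustration under stated assumptions, not the study's model: the toy records, the age bands, and the helper names (`gini`, `split_impurity`, `build_tree`, `predict`, `share_percent`) are hypothetical, and the splitting rule is Gini impurity (a CART-style criterion), which may differ from the criterion the authors used. The last helper only reproduces the central-region share from the counts reported above.

```python
from collections import Counter

# HYPOTHETICAL toy records (gender, age band, region -> cancer type).
# They are invented for illustration and are NOT data from the study.
RECORDS = [
    ({"gender": "F", "age_band": "40-59", "region": "Central"}, "Breast"),
    ({"gender": "F", "age_band": "40-59", "region": "North"}, "Breast"),
    ({"gender": "F", "age_band": "60+", "region": "Central"}, "Breast"),
    ({"gender": "M", "age_band": "60+", "region": "Central"}, "Colorectal"),
    ({"gender": "M", "age_band": "40-59", "region": "South"}, "Lung"),
    ({"gender": "M", "age_band": "60+", "region": "North"}, "Lung"),
    ({"gender": "F", "age_band": "20-39", "region": "Central"}, "Thyroid"),
    ({"gender": "M", "age_band": "60+", "region": "South"}, "Lung"),
]
FEATURES = ("gender", "age_band", "region")


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(records, feature):
    """Weighted Gini impurity after a multiway split on one categorical feature."""
    groups = {}
    for attrs, label in records:
        groups.setdefault(attrs[feature], []).append(label)
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())


def build_tree(records, features, depth=0, max_depth=3, min_size=2):
    """Recursively grow a tree whose leaves are cancer-type labels."""
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop if the node is pure, too small, too deep, or out of features.
    if (len(set(labels)) == 1 or len(records) < min_size
            or depth >= max_depth or not features):
        return majority
    best = min(features, key=lambda f: split_impurity(records, f))
    if split_impurity(records, best) >= gini(labels):
        return majority  # no split reduces impurity
    remaining = tuple(f for f in features if f != best)
    branches = {}
    for value in sorted({attrs[best] for attrs, _ in records}):
        subset = [(a, lab) for a, lab in records if a[best] == value]
        branches[value] = build_tree(subset, remaining, depth + 1, max_depth, min_size)
    # "default" handles category values not seen during training.
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, attrs):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(attrs[tree["feature"]], tree["default"])
    return tree


def share_percent(part, total):
    """Percentage share, e.g. central-region cases out of all cases."""
    return 100 * part / total


if __name__ == "__main__":
    tree = build_tree(RECORDS, FEATURES)
    print(tree["feature"])  # root split chosen on the toy data
    print(predict(tree, {"gender": "F", "age_band": "40-59", "region": "South"}))
    print(f"{share_percent(6815, 9547):.1f}%")  # 71.4%, as reported
```

On the toy data the root split falls on gender, and each branch then splits on age band or region. Trees grown with CHAID, which chooses splits with chi-square tests, can additionally merge predictor categories that do not differ significantly, which is one way a tree can suggest category mergers; this sketch does not perform any merging.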


Keywords: Cancer incidence; Machine learning; Risk factors; Predictive modeling; Jordan



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.