Machine Learning-Based Analysis of Cancer Incidence in Jordan (2020–2021): A Decision Tree Approach
Abstract
This study aimed to predict cancer types across the regions of Jordan using a machine learning-based Decision Tree Algorithm (DTA) model. The model used three patient demographic attributes (gender, age, and region of residence) as independent variables (IVs) and cancer type as the dependent variable (DV). The objective was to determine the predictive relationship between these demographic factors and cancer types, in order to support regional cancer profiling and inform targeted public health planning. The research used secondary data from the Ministry of Health, the Directorate of Non-Communicable Diseases, and the Jordan Cancer Registry for 2020 and 2021. A total of 9,547 cancer cases were analyzed with the DTA model, which identified significant incidence patterns; the central region of Jordan accounted for the highest number of cases (n = 6,815; 71.4%). The model classified cancer into 25 distinct types based on demographic attributes, with breast cancer being the most prevalent, particularly among middle-aged females residing in the central region. The DTA model proved effective at handling and stratifying large-scale medical data, predicting cancer type interactions, categorizing and labeling datasets, and suggesting potential category mergers. These findings have important implications for developing focused cancer prevention strategies and allocating healthcare resources efficiently. A key limitation of the study is the incomplete characterization of cancer patient attributes across all Jordanian regions.
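To illustrate how a decision tree can relate gender, age, and region to cancer type, the following is a minimal sketch only. It is not the study's implementation: the records are hypothetical toy data (not Jordan Cancer Registry cases), the age bands and the names `RECORDS`, `build_tree`, and `predict` are assumptions made for this example, and the Gini impurity criterion used here may differ from the splitting rule applied in the study.

```python
from collections import Counter

# Hypothetical toy records, NOT registry data:
# (gender, age_group, region, cancer_type)
RECORDS = [
    ("F", "40-59", "Central", "Breast"),
    ("F", "40-59", "Central", "Breast"),
    ("F", "60+",   "Central", "Colorectal"),
    ("F", "40-59", "North",   "Breast"),
    ("M", "60+",   "Central", "Lung"),
    ("M", "60+",   "North",   "Lung"),
    ("M", "40-59", "Central", "Colorectal"),
    ("M", "60+",   "South",   "Lung"),
    ("F", "60+",   "South",   "Breast"),
    ("M", "40-59", "North",   "Colorectal"),
]
FEATURES = {"gender": 0, "age_group": 1, "region": 2}  # column index per IV
LABEL = 3  # column index of the DV (cancer type)


def gini(rows):
    """Gini impurity of the cancer-type labels in rows."""
    counts = Counter(r[LABEL] for r in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split(rows, idx):
    """Group rows by the category found in column idx (multiway split)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[idx], []).append(r)
    return groups


def weighted_gini(rows, idx):
    """Size-weighted impurity of the groups produced by splitting on idx."""
    return sum(len(g) / len(rows) * gini(g) for g in split(rows, idx).values())


def build_tree(rows, features, depth=0, max_depth=2):
    """Recursively choose the feature whose split lowers impurity the most."""
    majority = Counter(r[LABEL] for r in rows).most_common(1)[0][0]
    leaf = {"leaf": majority, "n": len(rows)}
    if depth == max_depth or not features or gini(rows) == 0.0:
        return leaf
    best = min(features, key=lambda f: weighted_gini(rows, features[f]))
    if weighted_gini(rows, features[best]) >= gini(rows):
        return leaf  # no split improves purity
    remaining = {f: i for f, i in features.items() if f != best}
    children = {
        value: build_tree(group, remaining, depth + 1, max_depth)
        for value, group in split(rows, features[best]).items()
    }
    return {"split": best, "children": children, "default": majority}


def predict(tree, record):
    """Walk the tree for a (gender, age_group, region) record."""
    while "leaf" not in tree:
        child = tree["children"].get(record[FEATURES[tree["split"]]])
        if child is None:  # category not seen at this node
            return tree["default"]
        tree = child
    return tree["leaf"]


tree = build_tree(RECORDS, FEATURES)
print(tree["split"])
print(predict(tree, ("F", "40-59", "Central")))
```

On this toy data the first split is on gender and the second on age group, which mirrors the kind of demographic stratification the abstract describes. With real registry data, a study would also need held-out evaluation and handling of sparse categories across 25 cancer types, neither of which this sketch attempts.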