Choosing a machine learning model for breast cancer detection
Keywords:
Breast Cancer, Classification, Decision-Making Theory, Machine Learning, Supervised Learning
Abstract
Machine Learning comprises a wide range of models for solving real-life problems, using supervised and unsupervised algorithms capable of finding even subtle causal relationships and correlations among the phenomena portrayed in data. Given current software capabilities, this tool can be exploited in practically any field. One example is Oncology, the medical speciality focused on cancer treatment, where these models can support a more accurate diagnosis in breast cancer detection. In this article we examine a catalogue of Machine Learning models and evaluate their effectiveness against specific criteria in order to choose the most suitable one for this problem. The Analytic Hierarchy Process yielded conclusive results, assigning the Random Forest the highest score in every analysis employed, over 10% above the Logistic Regression, the second-highest model in the overall analysis. The models were developed with data describing features of breast tumour cell nuclei; results may therefore differ for other types of data.
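As a rough illustration of how the Analytic Hierarchy Process turns pairwise judgments into model scores, the Python sketch below computes a priority vector from a reciprocal pairwise comparison matrix. It uses power iteration to approximate the principal eigenvector. It then computes Saaty's consistency ratio and the weighted synthesis of local priorities across criteria. The alternatives ("Model A", "Model B", "Model C"), the criteria weights, the judgment values and the function names are hypothetical placeholders chosen for illustration. They are not the article's data or code.

```python
"""Minimal AHP sketch. All judgments below are hypothetical placeholders."""

# Saaty's Random Index (RI) by matrix size, used in the consistency ratio.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def priority_vector(matrix, iterations=100):
    """Approximate the principal eigenvector of a reciprocal pairwise
    comparison matrix by power iteration, normalised to sum to 1."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


def consistency_ratio(matrix, weights):
    """Saaty's CR = CI / RI, where CI = (lambda_max - n) / (n - 1).
    A CR below 0.10 is conventionally treated as acceptably consistent."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def synthesize(criteria_weights, local_weights):
    """Overall score per alternative: sum over criteria of
    (criterion weight) x (alternative's local weight under that criterion)."""
    n_alternatives = len(local_weights[0])
    return [sum(c * local[a] for c, local in zip(criteria_weights, local_weights))
            for a in range(n_alternatives)]


if __name__ == "__main__":
    models = ["Model A", "Model B", "Model C"]  # hypothetical alternatives
    # Hypothetical judgments under one criterion: A is 2x as preferred as B, 6x C.
    judgments = [[1, 2, 6],
                 [1 / 2, 1, 3],
                 [1 / 6, 1 / 3, 1]]
    w = priority_vector(judgments)
    print([round(x, 3) for x in w])  # [0.6, 0.3, 0.1]
    print("consistent:", consistency_ratio(judgments, w) < 0.10)  # consistent: True
    # Hypothetical second criterion and criteria weights (0.7 and 0.3).
    scores = synthesize([0.7, 0.3], [w, [0.2, 0.5, 0.3]])
    ranking = sorted(zip(models, scores), key=lambda pair: pair[1], reverse=True)
    print(ranking[0][0])  # Model A
```

The power iteration was chosen over a linear-algebra library so that the sketch stays dependency-free. The geometric-mean (row) method is a common alternative approximation. It gives the same weights for a perfectly consistent matrix such as the one in the example.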