Choosing a machine learning model for breast cancer detection

Authors

  • Ricardo Avila Hernandez, La Salle University in Mexico City
  • Kevin Ricardo Rossell Mendoza, Universidad La Salle México
  • Josue Alejandro Soto Mora, La Salle University in Mexico City

Keywords:

Breast Cancer, Classification, Decision-Making Theory, Machine Learning, Supervised Learning

Abstract

Machine Learning comprises a wide range of models aimed at solving real-life problems using supervised and unsupervised algorithms capable of finding even the finest causalities and correlations between the phenomena portrayed in data. Given the extraordinary capabilities of current software, we can exploit this tool in practically any field. Oncology, for instance, the medical speciality that focuses on cancer treatment, can make use of these models to provide a more accurate diagnosis in breast cancer detection. In this article we delve into a catalogue of Machine Learning models and discuss their effectiveness against specific criteria in order to choose the most suitable one for this problem. The Analytic Hierarchy Process gave conclusive results, assigning the Random Forest the highest score in each of the analyses employed, over 10% better than the Logistic Regression, the second-highest-rated model in the overall analysis. The models were developed with data describing features of breast tumour nuclei; results may therefore differ for other types of data.
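As an illustration of how an Analytic Hierarchy Process ranking of this kind can be computed, the following minimal Python sketch derives criterion weights from a pairwise comparison matrix and combines them with per-criterion scores. It is not the authors' implementation: the three-criterion matrix, the alternatives `model_a` and `model_b`, and all numeric values are hypothetical, and the row geometric-mean method is used here as a common approximation to the principal-eigenvector weights.

```python
from math import prod


def ahp_priorities(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI (supported here for 3 to 5 criteria)."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random index values
    return consistency_index / random_index[n]


def overall_scores(criteria_weights, local_scores):
    """Weighted sum of each alternative's per-criterion scores."""
    return {
        name: sum(w * s for w, s in zip(criteria_weights, scores))
        for name, scores in local_scores.items()
    }


# Hypothetical pairwise comparisons between three criteria
# (entry [i][j] states how strongly criterion i is preferred over criterion j).
CRITERIA_MATRIX = [
    [1.0, 1 / 2, 3.0],
    [2.0, 1.0, 4.0],
    [1 / 3, 1 / 4, 1.0],
]

# Hypothetical normalised scores of two alternatives on each criterion.
LOCAL_SCORES = {
    "model_a": [0.6, 0.7, 0.4],
    "model_b": [0.4, 0.3, 0.6],
}

if __name__ == "__main__":
    weights = ahp_priorities(CRITERIA_MATRIX)
    print("criteria weights:", [round(w, 3) for w in weights])
    print("consistency ratio:", round(consistency_ratio(CRITERIA_MATRIX, weights), 3))
    print("overall scores:", overall_scores(weights, LOCAL_SCORES))
```

A consistency ratio below 0.1 is the threshold conventionally used to accept a set of pairwise judgements; above it, the comparisons are usually revisited before the weights are trusted.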

Published

2020-12-17

How to Cite

Hernandez, R. A., Rossell Mendoza, K. R., & Soto Mora, J. A. (2020). Choosing a machine learning model for breast cancer detection. Revista Latinoamericana De Investigación Social, 3(3), 19–35. Retrieved from https://revistasinvestigacion.lasalle.mx/index.php/relais/article/view/2668

Section

Research article
