Choosing a machine learning model for breast cancer detection

Ricardo Avila Hernandez; Kevin Ricardo Rossell Mendoza; Josue Alejandro Soto Mora

Ricardo Avila Hernandez, Universidad La Salle México
Kevin Ricardo Rossell Mendoza, Universidad La Salle México
Josue Alejandro Soto Mora, Universidad La Salle México

Keywords: Breast Cancer, Classification, Decision Theory, Machine Learning, Supervised Learning

Abstract

Machine Learning comprises a wide range of models that aim to solve problems through supervised and unsupervised algorithms, which can find causal relationships and correlations that may go unnoticed by other methods. Given technological advances, specifically in software, these tools can be applied to many disciplines, Oncology among them. This medical specialty focuses on cancer and can benefit from using these models for breast cancer detection. In this article, we explore a catalog of supervised Machine Learning models and evaluate their performance under several criteria in order to find the one best suited to this problem. The Analytic Hierarchy Process gave clear results: it ranked Random Forest as the best model in all three analyses carried out, with a score more than 10% higher than that of the second-best model, Logistic Regression. The models were trained on data describing cells from breast tumors, so results may vary with different data.
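To make the ranking step concrete, the following is a minimal sketch of how the Analytic Hierarchy Process can turn pairwise comparisons of criteria into weights and then into an overall score per model. It is not the authors' implementation. The criterion names, the pairwise comparison matrix, the per-criterion model scores, and the placeholder "Model C" are all hypothetical values chosen for illustration, and the priority vector is approximated with the row geometric-mean method rather than an exact eigenvector computation.

```python
import math

# Saaty's random consistency index for comparison matrices of size 1 to 5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_priorities(matrix):
    """Approximate AHP priority weights using row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Estimate lambda_max from A*w and return the consistency ratio CI / RI."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    ri = RANDOM_INDEX[n]
    return ci / ri if ri else 0.0


# Hypothetical criteria (not taken from the article).
criteria = ["accuracy", "sensitivity", "training time"]

# Hypothetical pairwise comparisons: entry [i][j] says how much more
# important criterion i is than criterion j on Saaty's 1-9 scale.
criteria_matrix = [
    [1.0, 1 / 2, 5.0],
    [2.0, 1.0, 7.0],
    [1 / 5, 1 / 7, 1.0],
]

weights = ahp_priorities(criteria_matrix)
cr = consistency_ratio(criteria_matrix, weights)

# Hypothetical normalized scores of each model under each criterion
# (each column sums to 1). These are NOT results from the article.
model_scores = {
    "Random Forest": [0.40, 0.40, 0.20],
    "Logistic Regression": [0.35, 0.35, 0.45],
    "Model C": [0.25, 0.25, 0.35],
}

# Overall score of a model: weighted sum of its per-criterion scores.
overall = {
    name: sum(w * s for w, s in zip(weights, scores))
    for name, scores in model_scores.items()
}
ranking = sorted(overall.items(), key=lambda item: item[1], reverse=True)

print("criteria weights:", dict(zip(criteria, weights)))
print("consistency ratio:", cr)
for name, score in ranking:
    print(name, round(score, 4))
```

A consistency ratio below 0.1 is the usual threshold for accepting a set of pairwise judgments; above it, the comparisons are normally revised before the weights are used.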
