Searching for the best machine learning model for breast cancer detection
Keywords:
Breast Cancer, Classification, Decision Theory, Machine Learning, Supervised Learning
Abstract
Machine Learning comprises a wide range of models that aim to solve problems through Supervised and Unsupervised algorithms, which can find causal relationships and correlations that other methods may overlook. Given recent technological advances, particularly in software, these tools can be applied to many disciplines, including Oncology, the medical specialty that focuses on Cancer, which can benefit from using these models for Breast Cancer detection. In this article, we explore a catalog of Supervised Machine Learning models and study their performance under different criteria in order to find the most suitable one for this problem. The Analytic Hierarchy Process method gave clear results: it ranked Random Forest as the best model in all three analyses that were carried out, with a score more than 10% higher than that of the second-best model, Logistic Regression. These models were trained on data describing different cells from breast tumors, so the results may vary with different data.
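As a rough illustration of how the Analytic Hierarchy Process turns pairwise judgments into criterion weights, here is a minimal sketch in plain Python. The three criteria, the comparison values, and the function names `ahp_priorities` and `consistency_ratio` are assumptions made for this example and are not taken from the article. The sketch approximates the principal eigenvector by power iteration and then computes Saaty's consistency ratio.

```python
# Hypothetical pairwise comparison matrix for three evaluation criteria
# (for example accuracy, sensitivity, specificity). Entry [i][j] states how
# much more important criterion i is than criterion j on Saaty's 1-9 scale.
# These judgments are illustrative only and do not come from the article.
comparisons = [
    [1.0, 1.0 / 2.0, 2.0],
    [2.0, 1.0, 3.0],
    [1.0 / 2.0, 1.0 / 3.0, 1.0],
]


def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector (priority weights) and the
    principal eigenvalue of a positive reciprocal matrix by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]  # keep weights summing to 1
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """Consistency ratio CR = CI / RI, using Saaty's random index for small n."""
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / random_index[n]


weights, lambda_max = ahp_priorities(comparisons)
cr = consistency_ratio(lambda_max, len(comparisons))
print("criterion weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

A consistency ratio below 0.1 is the usual threshold for accepting a set of judgments. In a full analysis the candidate models would also be compared pairwise under each criterion, and their local priorities combined with these weights to obtain an overall score per model.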