Real time data analysis and visualization for the breast cancer disease

Rawan Ahmed Sanyour, Manal Abdullah

Abstract


The amount of data collected digitally in the healthcare sector is tremendous and expanding rapidly. These data are inherently geospatial and temporal, ranging from individual families to whole states and from minutes to decades, so they need sophisticated data management and analysis to be transformed into valuable knowledge. Healthcare professionals face several challenges in extracting knowledge from this massive amount of data to support the decision-making process. To take advantage of healthcare big data, big data analytics must be used to uncover and understand the patterns and associations within these data and thereby support the right decisions. In this research, an interactive data analysis and visualization tool is proposed to visually compare the performance of three machine learning (ML) algorithms on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The proposed model consists of two phases: an input phase and an analysis/visualization phase. It aims to let the user interactively compare three ML algorithms, k-nearest neighbors (KNN), support vector machine (SVM) and naive Bayes (NB), in terms of accuracy, sensitivity and error rate in a user-friendly way. In this comparison, the SVM classifier achieved the highest accuracy of the three and is concluded to be the best classifier.
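The three comparison metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' Shiny implementation: the names `ConfusionCounts`, `count_outcomes`, `accuracy`, `sensitivity` and `error_rate` are hypothetical, the label lists are made-up toy data rather than WDBC results, and malignant ("M") is assumed to be the positive class.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def count_outcomes(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "M") -> ConfusionCounts:
    """Tally the four confusion-matrix cells; 'M' (malignant) is assumed positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that were classified correctly."""
    if c.total == 0:
        raise ValueError("no cases to evaluate")
    return (c.tp + c.tn) / c.total


def sensitivity(c: ConfusionCounts) -> float:
    """Share of truly positive cases that were detected (recall)."""
    if c.tp + c.fn == 0:
        raise ValueError("no positive cases to evaluate")
    return c.tp / (c.tp + c.fn)


def error_rate(c: ConfusionCounts) -> float:
    """Share of all cases that were misclassified; equals 1 - accuracy."""
    if c.total == 0:
        raise ValueError("no cases to evaluate")
    return (c.fp + c.fn) / c.total


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = ["M", "M", "M", "B", "B", "B", "B", "M"]
    y_pred = ["M", "M", "B", "B", "B", "M", "B", "M"]
    counts = count_outcomes(y_true, y_pred)
    print(counts)
    print("accuracy:", accuracy(counts))
    print("sensitivity:", sensitivity(counts))
    print("error rate:", error_rate(counts))
```

Running the same three functions on the predictions of each classifier would give the per-algorithm figures that a tool of this kind displays side by side. Sensitivity is reported separately from accuracy because a missed malignant case (a false negative) is usually the costlier error in a diagnostic setting.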

Keywords


Data visualization; Interactive data visualization; Shiny app; Breast cancer; Prediction; Classification

DOI: http://dx.doi.org/10.21533/pen.v7i1.421


Copyright (c) 2019 Rawan Ahmed Sanyour, Manal Abdullah

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2303-4521

Digital Object Identifier DOI: 10.21533/pen
