Performance Analysis of Classification Models for Prediction of Benign and Malignant Mammographic Masses
Keywords:
Benign, Malignant, Mammographic Mass, Classification, Prediction, Confusion Matrix, Principal Component Analysis, Breast Cancer, Mammography, Mammograms
Abstract
About 1.7 million new breast cancer cases were diagnosed worldwide in 2012. As of 2018, nearly 12.4% of women in the US are expected to develop invasive breast cancer over their lifetime. Mammography remains the most effective technique for breast cancer screening. However, the low positive predictive value of breast biopsies resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To address this problem, supervised machine learning classification algorithms can be applied to develop models that predict the severity (benign or malignant) of a mammographic mass from its BI-RADS attributes and the patient's age.
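Reading the approximately 70% figure as the share of biopsies that turn out benign, the implied positive predictive value (PPV) of mammography-driven biopsy is only about 30%. A short worked sketch, where N is a hypothetical total number of biopsies, TP counts biopsies with a malignant outcome and FP those with a benign outcome:

```latex
\mathrm{PPV} = \frac{TP}{TP + FP}
  \approx \frac{0.30\,N}{0.30\,N + 0.70\,N} = 0.30
```

In other words, under this reading roughly seven of every ten biopsies would be avoidable if benign masses could be identified beforehand.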
The dataset contains 830 records of mammographic masses, each described by 6 attributes. The study investigates six classification models: Logistic Regression, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest and Artificial Neural Network. Each model is evaluated using its confusion matrix and the standard metrics of accuracy, precision, recall and F-measure.
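All four evaluation metrics can be derived from a binary confusion matrix. The following is a minimal Python sketch, not the study's implementation. It assumes that malignant (label 1) is the positive class and that "F-measure" denotes the balanced F1 score (the harmonic mean of precision and recall). The names `ConfusionMatrix` and `confusion_from_labels` are hypothetical helpers, and the labels in the usage lines are toy values, not results from the dataset.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix; 'malignant' is assumed to be the positive class."""
    tp: int  # malignant masses predicted malignant
    fp: int  # benign masses predicted malignant
    fn: int  # malignant masses predicted benign
    tn: int  # benign masses predicted benign


def confusion_from_labels(y_true, y_pred, positive=1):
    """Count TP/FP/FN/TN from paired true and predicted labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def accuracy(cm):
    # Fraction of all masses classified correctly.
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return (cm.tp + cm.tn) / total if total else 0.0


def precision(cm):
    # Of the masses predicted malignant, the fraction that really are malignant.
    predicted_pos = cm.tp + cm.fp
    return cm.tp / predicted_pos if predicted_pos else 0.0


def recall(cm):
    # Of the truly malignant masses, the fraction the model detects.
    actual_pos = cm.tp + cm.fn
    return cm.tp / actual_pos if actual_pos else 0.0


def f_measure(cm):
    # Balanced F1: harmonic mean of precision and recall.
    p, r = precision(cm), recall(cm)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only (1 = malignant, 0 = benign).
    cm = confusion_from_labels([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
    print(cm)                          # ConfusionMatrix(tp=3, fp=2, fn=1, tn=2)
    print(f"{accuracy(cm):.3f}")       # 0.625
    print(f"{precision(cm):.3f}")      # 0.600
    print(f"{recall(cm):.3f}")         # 0.750
    print(f"{f_measure(cm):.3f}")      # 0.667
```

Recall is the metric most directly tied to missed cancers, while precision reflects how many predicted-malignant masses would still lead to unnecessary biopsies. This is why the two are reported together alongside accuracy.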
This work aims to assess the performance of these classification algorithms in designing a predictive model for breast cancer identification, using data obtained from full-field digital mammograms.