DEVELOPMENT OF A NOVEL DATA MINING ALGORITHM FOR THE PREDICTION OF RECURRENCE AND SURVIVABILITY OF BREAST CANCER PATIENTS
Abstract
Globally, breast cancer is currently the most common cancer, accounting for one-eighth of all
new annual cancer cases, and it is one of the leading causes of cancer-related death in women,
second only to lung cancer. Predicting the recurrence and survivability of breast
cancer patients is important because it helps patients understand their likely recurrence and
survivability patterns and encourages them to seek medical attention promptly, so that more lives can be
saved. This study developed an ensemble learning model, ANN-SVM, that can predict breast
cancer patients' recurrence and survivability. A dataset of 2,469 breast cancer patients
was obtained from the Barau-Dikko Teaching Hospital (BDTH), Kaduna, Cancer Registry
Department. The results showed that the conventional machine learning (ML) models, namely Support
Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbour (KNN), and
the proposed ANN-SVM model predicted the recurrence of breast cancer with accuracies of
82.29%, 94.84%, 90.49%, and 95.65%, respectively, and the survivability of breast
cancer patients with accuracies of 63.29%, 90.46%, 81.93%, and 91.47%, respectively, on the tested
dataset. The ANN-SVM model outperformed the conventional ML models in both recurrence
and survivability prediction. In this study, family history and chemotherapy,
respectively, turned out to be the most important features for recurrence and survivability of
breast cancer patients. The proposed model's strong precision, recall and F1 scores
indicate that it predicts both classes accurately in each task: “yes” and
“no” for recurrence, and “alive” and “dead” for survivability. Both
conventional ML models and the proposed ensemble learning model predict the recurrence of
breast cancer and the survivability of breast cancer patients with high accuracy.
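
As an illustration of the evaluation measures named above, the following minimal Python sketch computes accuracy together with per-class precision, recall and F1 score for a two-class task such as recurrence ("yes"/"no"). It is not the study's implementation: the function name per_class_metrics and the toy labels are hypothetical and were made up for this example, and none of the figures it produces relate to the BDTH dataset.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels):
    """Return (accuracy, report), where report maps each label to its
    precision, recall and F1 score, treating that label as the positive class.

    Hypothetical helper for illustration; assumes `labels` lists every class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count every (actual, predicted) pair, i.e. the confusion-matrix cells.
    pairs = Counter(zip(y_true, y_pred))

    correct = sum(pairs[(c, c)] for c in labels)
    accuracy = correct / len(y_true) if y_true else 0.0

    report = {}
    for c in labels:
        tp = pairs[(c, c)]                                      # true positives
        fp = sum(pairs[(o, c)] for o in labels if o != c)       # predicted c, actually other
        fn = sum(pairs[(c, o)] for o in labels if o != c)       # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Toy recurrence labels, invented for this example only.
y_true = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
y_pred = ["yes", "yes", "no", "no", "no", "no", "no", "yes"]

accuracy, report = per_class_metrics(y_true, y_pred, ["yes", "no"])
print(f"accuracy = {accuracy:.2f}")
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

Reporting the measures per class, rather than accuracy alone, shows whether a model performs well on both outcomes or only on the more frequent one, which is the distinction the abstract draws when it refers to both “yes” and “no” and both “alive” and “dead”.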