A Gaussian Naive Bayes and SMOTE-Based Approach for Predicting Breast Cancer Aggressiveness in Imbalanced Datasets
Abstract
Breast cancer remains one of the leading causes of death among women worldwide, making early and accurate detection essential to improving patient outcomes. This study aims to develop a predictive model for breast cancer aggressiveness using the Gaussian Naive Bayes algorithm on the Breast Cancer Wisconsin Diagnostic Dataset. The dataset contains 569 instances with 30 numerical features representing various cell characteristics. Preprocessing steps included data cleaning, label encoding, and Min-Max normalization. The model was evaluated using accuracy, precision, recall, F1-score, and a confusion matrix. Initially, the model achieved an accuracy of 78.88%; however, the recall for malignant cases was relatively low at 45.5%, highlighting a critical limitation in detecting aggressive cancer. To address class imbalance and improve model sensitivity, the Synthetic Minority Oversampling Technique (SMOTE) was applied. While detailed post-SMOTE metrics were not reported in this version, the approach is expected to enhance recall and F1-score for the malignant class. This research demonstrates the potential of Gaussian Naive Bayes, combined with data balancing techniques, as a fast and interpretable tool for early breast cancer diagnosis. Future work will focus on model comparison, cross-validation, and statistical evaluation to improve robustness and reliability.
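The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn's bundled copy of the Breast Cancer Wisconsin Diagnostic Dataset, an 80/20 stratified split, and oversampling applied to the training portion only; none of these details are stated in the abstract. The helper `smote_like_oversample` is a hypothetical, simplified stand-in for SMOTE (interpolation between a minority sample and one of its nearest minority neighbours), not the reference algorithm from a dedicated library, and the printed metrics are not expected to match the figures reported above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler


def smote_like_oversample(X, y, minority_label, k=5, seed=0):
    """Hypothetical simplified SMOTE-style oversampling for a binary problem.

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours. Points are added until the
    minority class matches the size of the other class.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label) - len(X_min))
    if n_new <= 0:
        return X, y
    # Pairwise Euclidean distances inside the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority_label)])
    return X_out, y_out


data = load_breast_cancer()
X = data.data
# Label encoding (assumption): 1 = malignant, 0 = benign.
# scikit-learn stores malignant as 0, so the labels are flipped here.
y = (data.target == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Min-Max normalization, fitted on the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Balance the training split; the test split is left untouched.
X_bal, y_bal = smote_like_oversample(X_train_s, y_train, minority_label=1)

baseline = GaussianNB().fit(X_train_s, y_train)
balanced = GaussianNB().fit(X_bal, y_bal)

for name, model in [("baseline", baseline), ("oversampled", balanced)]:
    pred = model.predict(X_test_s)
    print(name, "malignant recall:", recall_score(y_test, pred, pos_label=1))
    print(confusion_matrix(y_test, pred))
```

Oversampling only the training split keeps synthetic points out of the evaluation data, so recall and F1-score for the malignant class are measured on real cases.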
IJIIS: International Journal of Informatics and Information Systems
ISSN: 2579-7069 (Online)
Organized by: Department of Information System, Universitas Amikom Purwokerto, Indonesia; Faculty of Computing and Information Science, Ain Shams University, Cairo, Egypt
Website: www.ijiis.org
Email: husniteja@uinjkt.ac.id (publication issues); taqwa@amikompurwokerto.ac.id (managing editor); contact@ijiis.org (technical & paper handling issues)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0