Improving Performance for Diabetic Nephropathy Detection Using Adaptive Synthetic Sampling Data in Ensemble Method of Machine Learning Algorithms

Lailil Muflikhah, Fitra A. Bachtiar, Dian Eka Ratnawati, Riski Darmawan

Abstract

Nephropathy is a severe complication of diabetes that affects the kidneys and poses a substantial risk to patients. It often progresses to renal failure and other critical health problems. Early and accurate prediction of nephropathy is therefore essential for effective intervention, patient well-being, and efficient use of healthcare resources. This research used a dataset of 500 medical records of diabetic patients with imbalanced classes. The main goal of the study is to obtain high-performance predictive models for nephropathy. To that end, it proposes handling the class imbalance that commonly arises in nephropathy prediction by generating additional minority-class data through adaptive synthetic sampling (ADASYN). The technique is particularly relevant to ensemble machine-learning methods such as Random Forest, AdaBoost, and bagging (Adabag). By increasing the number of minority-class instances, it aims to reduce the bias introduced by imbalanced datasets, which should lead to more accurate and robust predictive models. The experimental results show a 4% improvement in performance metrics such as precision, recall, accuracy, and F1-score, especially for the ensemble methods. The research makes two contributions: first, the use of adaptive synthetic sampling to improve the balance and diversity of the training dataset; second, the incorporation of ensemble methods within machine learning algorithms to enhance the accuracy and robustness of diabetic nephropathy detection.
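The oversampling idea described in the abstract can be illustrated with a minimal pure-Python sketch of ADASYN-style sampling. It is not the authors' implementation: the function name `adasyn`, its parameters (`k`, `beta`, `seed`), and the two-feature Gaussian toy data are assumptions made for illustration only, and no patient data is involved. The sketch weights each minority sample by the share of majority points among its nearest neighbours, then interpolates new points between that sample and its minority neighbours.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (illustrative ADASYN-style sketch)."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_min = len(minority_pts)
    n_maj = len(X) - n_min
    total = int(round((n_maj - n_min) * beta))  # synthetic samples to create
    if total <= 0 or n_min < 2:
        return []

    def nearest(point, pool):
        # Indices of the k entries in `pool` closest to `point`, excluding itself.
        order = sorted(range(len(pool)), key=lambda j: math.dist(point, pool[j]))
        return [j for j in order if pool[j] is not point][:k]

    # Step 1: difficulty ratio = share of majority points among the neighbours.
    ratios = []
    for x in minority_pts:
        neigh = nearest(x, X)
        ratios.append(sum(1 for j in neigh if y[j] != minority) / len(neigh))
    norm = sum(ratios)
    # Fall back to uniform weights if no minority point has majority neighbours.
    weights = [r / norm for r in ratios] if norm > 0 else [1 / n_min] * n_min

    # Step 2: harder samples (larger weight) receive more synthetic neighbours.
    synthetic = []
    for x, w in zip(minority_pts, weights):
        neigh = nearest(x, minority_pts)
        for _ in range(int(round(w * total))):
            z = minority_pts[rng.choice(neigh)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy imbalanced data (hypothetical): 90 majority vs 10 minority points.
data_rng = random.Random(42)
maj_pts = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
min_pts = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
X = maj_pts + min_pts
y = [0] * 90 + [1] * 10

new_points = adasyn(X, y)
print(len(min_pts), "minority samples before,", len(min_pts) + len(new_points), "after")
```

In a full pipeline, the synthetic points would be appended to the training split only (never to the test split) before fitting an ensemble classifier such as Random Forest or AdaBoost. Because per-sample counts are rounded, the number of generated points can differ slightly from the exact class gap.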

Keywords

Nephropathy; Oversampling; ADASYN; Bagging; Boosting; Machine learning

DOI: http://dx.doi.org/10.26555/jiteki.v10i1.28107