Early Detection of Diabetes Using a Machine Learning Model Based on Laboratory Data

Arief Rahman Hakim, Yuni Franciska Br Tarigan, Khairul Fadhli Margolang

Abstract


Diabetes mellitus is a chronic disease whose prevalence continues to rise worldwide, with the number of people affected projected to reach 643 million by 2030. Early detection is crucial to prevent serious complications such as cardiovascular disease, kidney failure, and nerve damage. This study compares the performance of four machine learning algorithms (Random Forest, Support Vector Machine, Logistic Regression, and K-Nearest Neighbors) in detecting diabetes from clinical parameters and identifies the most significant predictor variables. The study uses the Pima Indians Diabetes dataset, which consists of 768 samples with 8 predictor variables (number of pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age). The data are divided into a training set (70%) and a testing set (30%) using stratified sampling. Preprocessing includes handling missing values, feature scaling with StandardScaler, and correcting class imbalance with the SMOTE technique. Performance is evaluated using accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). The Random Forest model achieves the best performance, with an accuracy of 81.8%, precision of 79.2%, recall of 78.5%, F1-score of 78.8%, and AUC of 0.88. Support Vector Machine achieves an accuracy of 78.0%, Logistic Regression 76.0%, and K-Nearest Neighbors 74.5%. Feature importance analysis identifies glucose (28.5%), BMI (19.8%), and age (16.5%) as the most significant predictors of diabetes. The Random Forest model produces 17 false negatives and 12 false positives on the 231 testing samples. The study concludes that, of the four algorithms compared, Random Forest is the most effective for early diabetes detection, combining good accuracy with interpretability through feature importance.
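As an illustration of how the count-based evaluation metrics named in the abstract relate to a confusion matrix, the following minimal sketch computes accuracy, precision, recall, and F1-score from true/false positive and negative counts. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's confusion matrix. AUC-ROC is not included because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives,
                    false negatives, true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=60, fp=15, fn=20, tn=136)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, false negatives (missed cases) are usually the costlier error, so recall is often read alongside accuracy rather than after it.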

Keywords


Diabetes Mellitus; Machine Learning; Random Forest; Support Vector Machine; Early Detection; Pima Indians Diabetes Dataset; Feature Importance.

DOI: https://doi.org/10.30743/infotekjar.v10i1.12810


Copyright (c) 2026 Arief Rahman Hakim, Yuni Franciska Br Tarigan, Khairul Fadhli Margolang

This work is licensed under a Creative Commons Attribution 4.0 International License.