| Interpretable machine learning for diabetes risk prediction: a large-scale analysis of Indian national survey data |
| Authors: |
Bhavana Barman, Hari K. Choudhury & Babita Jajodia |
| Source: |
Discover Public Health, Volume 22, article number 832 |
| Topic(s): |
Adult health; Data models; Diabetes
|
| Country: |
Asia
India
|
| Published: |
DEC 2025 |
| Abstract: |
BACKGROUND: Diabetes is a growing public-health challenge in India, yet most machine learning (ML) studies rely on small clinical datasets and offer limited interpretability. Interpretable ML models have rarely been applied to nationally representative data to inform policy.
OBJECTIVE: To develop and interpret ML models for diabetes risk prediction using the fifth National Family Health Survey (NFHS-5) dataset, and to validate the model-derived risk factors against a traditional regression approach.
METHODS: The study trained tree-based ML models on NFHS-5 data, analysing records of 1,087,006 respondents for diabetes prevalence. Socio-demographic, behavioural, and anthropometric features, selected on the basis of the existing literature, were included in the models, and systematic hyperparameter tuning was performed to optimize performance.
RESULTS: The Random Forest model outperformed the alternative models. SHAP (SHapley Additive exPlanations) analysis identified age, hypertension, and arm circumference as the major contributors to diabetes prediction, with wealth index and urban residence also contributing significantly. The estimated logistic regression coefficients and AUC values were consistent with the direction and magnitude of the SHAP results.
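The abstract reports AUC values and logistic regression coefficients. The following is a minimal Python sketch of how these two quantities are usually read, and it is not the authors' code. AUC is computed here as the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case, with ties counted as one half. A logistic regression coefficient, which is on the log-odds scale, is converted to an odds ratio with exp(beta). The function names `auc_rank` and `odds_ratio` are hypothetical, and the labels and scores are toy values invented for illustration. None of it comes from NFHS-5.

```python
import math
from typing import Sequence


def auc_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as P(score of random positive > score of random negative).

    Ties count as 0.5. This pairwise O(n_pos * n_neg) version is for
    illustration only; a dataset of ~1 million records would use a
    rank-based (Mann-Whitney) formula or a library routine instead.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def odds_ratio(beta: float) -> float:
    """Hypothetical helper: convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(beta)


if __name__ == "__main__":
    # Toy labels (1 = diabetic, 0 = not) and predicted risk scores, invented for illustration.
    y = [1, 1, 0, 0, 1, 0]
    p = [0.9, 0.6, 0.6, 0.2, 0.8, 0.4]
    print(round(auc_rank(y, p), 3))
    print(round(odds_ratio(0.5), 3))
```

A positive coefficient gives an odds ratio above 1, which raises predicted risk. This is the sense in which a regression "direction" can be checked against the sign of a feature's SHAP contribution.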
CONCLUSION: Interpretable ML applied to nationally representative survey data yields transparent risk profiles for diabetes that link socio-economic and clinical factors. Policy-relevant actions include maintaining the screening age of 30 years while prioritizing older and high-risk adults, integrating diabetes checks into hypertension programs, and using arm circumference as a community triage tool. These findings support scalable, data-driven primary-care strategies in India. |
| Web: |
https://doi.org/10.1186/s12982-025-01236-8 |
|