IDENTIFYING KEY PREDICTORS OF GESTATIONAL DIABETES MELLITUS USING PENALIZED LOGISTIC REGRESSION
Abstract
Gestational diabetes mellitus (GDM) remains a major public health concern in Nigeria,
with rising prevalence and substantial maternal and neonatal complications. Several studies
have identified potential risk factors; however, most were limited to descriptive statistics
and bivariate analyses, offering little insight into variable selection in settings with many
correlated predictors. This study applies penalized logistic regression to identify key
determinants of GDM symptoms using a real-world antenatal dataset from Nwose et al.
(2023). Extensive data cleaning yielded 17 complete cases and 17
predictors. Both ridge and lasso logistic regression models were fitted to address
multicollinearity and reduce overfitting. Model comparison using the Akaike Information
Criterion (AIC) indicated that the lasso model (AIC = 12.16) outperformed the ridge model
(AIC = 34). Lasso penalization further enabled variable selection, identifying gestational
week, family history of type 2 diabetes mellitus, and polycystic ovary syndrome as the
most influential predictors of GDM symptoms. These results highlight the importance of
familial metabolic risk and reproductive health factors in GDM screening.
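The ridge-versus-lasso comparison above can be illustrated with a minimal, dependency-free sketch. It fits logistic regression with an unpenalized intercept by proximal gradient descent, using a squared (ridge) or absolute-value (lasso) penalty, and then compares the two fits by AIC. Several assumptions apply. The data are synthetic, not the Nwose et al. (2023) dataset; only the 17 x 17 dimensions mirror the study. The penalty strengths are arbitrary illustrative values rather than tuned ones. The AIC counts the intercept plus the nonzero coefficients as model degrees of freedom, a common convention for lasso that is not necessarily the computation used in the study. All function names (`fit_penalized_logistic`, `aic`, `soft_threshold`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrinks v toward zero, and sets it exactly to zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_penalized_logistic(X, y, lam, penalty, lr=0.1, n_iter=2000):
    """Hypothetical helper: proximal gradient descent, unpenalized intercept.

    Objective: mean negative log-likelihood + lam * P(beta), where
    P(beta) = 0.5 * sum(beta_j**2) for "ridge" and sum(|beta_j|) for "lasso".
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean negative log-likelihood.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        if penalty == "ridge":
            # Plain gradient step, including the ridge term lam * beta_j.
            beta = [bj - lr * (gj + lam * bj) for bj, gj in zip(beta, g)]
        elif penalty == "lasso":
            # Gradient step on the likelihood, then soft-thresholding (can yield exact zeros).
            beta = [soft_threshold(bj - lr * gj, lr * lam) for bj, gj in zip(beta, g)]
        else:
            raise ValueError("penalty must be 'ridge' or 'lasso'")
    return b0, beta

def aic(X, y, b0, beta):
    # AIC = -2 * log-likelihood + 2k, assuming k = intercept + number of nonzero coefficients.
    ll = 0.0
    for xi, yi in zip(X, y):
        prob = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi)))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += yi * math.log(prob) + (1 - yi) * math.log(1.0 - prob)
    k = 1 + sum(1 for bj in beta if bj != 0.0)
    return -2.0 * ll + 2.0 * k

# Synthetic example: 17 cases, 17 predictors, outcome driven by the first three only.
rng = random.Random(0)
n, p = 17, 17
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if 1.5 * (xi[0] + xi[1] - xi[2]) + rng.gauss(0.0, 0.5) > 0 else 0 for xi in X]

ridge = fit_penalized_logistic(X, y, lam=0.1, penalty="ridge")
lasso = fit_penalized_logistic(X, y, lam=0.2, penalty="lasso")
for name, (b0, beta) in [("ridge", ridge), ("lasso", lasso)]:
    kept = [j for j, bj in enumerate(beta) if bj != 0.0]
    print(name, "AIC =", round(aic(X, y, b0, beta), 2), "nonzero predictors:", kept)
```

The design choice the sketch illustrates is the difference between the two penalties. Ridge shrinks every coefficient but keeps all of them in the model. Lasso's soft-thresholding step sets weak coefficients exactly to zero, which performs variable selection and lowers the degrees-of-freedom term in AIC. In practice the penalty strength would be chosen by cross-validation, for example with an established library such as glmnet or scikit-learn, rather than fixed by hand.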