Impact of ignoring sampling design in the prediction of binary health outcomes through logistic regression: evidence from Malawi demographic and health survey under-five mortality data; 2000-2016
Tsirizani M. Kaombe and Gracious A. Hamuza
BMC Public Health, Volume 23, Article number 1674; DOI: https://doi.org/10.1186/s12889-023-16544-4
The birth and death rates of a population are among the crucial vital statistics for socio-economic policy planning in any country. Because the under-five mortality rate is one of the indicators for monitoring the health of a population, it requires regular and accurate estimation. National demographic and health survey data, which are readily available to the public, have become a means of answering most health-related questions among African populations using relevant statistical methods. However, many such applications ignore the survey design effect in estimation, despite the availability of statistical tools that support design-based analysis. Little is known about how much inaccuracy this introduces when predicting under-five mortality rates.
This study estimates and compares the bias encountered when applying unweighted and weighted logistic regression methods to predict the under-five mortality rate in Malawi using nationwide survey data. The Malawi demographic and health survey data of 2004, 2010, and 2015-16 were used to determine the bias. The analyses were carried out in R version 3.6.3 and Stata version 12.0. A logistic regression model that included various bio- and socio-demographic factors concerning the child, mother, and household was used to estimate the under-five mortality rate.
The results showed that the accuracy of predicting the national under-five mortality rate hinges on cluster-weighting of the overall predicted probability of child deaths, regardless of whether the model was weighted or not. Weighting the model caused small positive and negative changes in various fixed-effect estimates, which diffused the effect of weighting in the fitted probabilities of death. As a result, there was no difference between the overall predicted mortality rate obtained using the weighted model and that obtained using the unweighted model.
We recommend applying survey cluster-weights when computing the overall predicted probability of events for a binary health outcome. When the aim of model fitting is to predict the population parameter, this can be done without applying the weights during model fitting.
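The recommendation can be illustrated with a minimal sketch in Python on simulated data. It is not the authors' R or Stata code, and it uses no Malawi DHS records. The covariates (`rural`, `x2`), the coefficients, the weight values, and the helper names `fit_logistic`, `predict_prob`, and `overall_rate` are all hypothetical, chosen only to show the mechanics. The sketch fits the same logistic model with and without weights, then aggregates the fitted probabilities with and without weights.

```python
import numpy as np


def fit_logistic(X, y, w=None, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; w holds optional weights."""
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - mu))                        # weighted score
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_prob(X, beta):
    """Fitted probability of the event for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def overall_rate(p, w=None):
    """Overall predicted rate: plain mean, or weighted mean if w is given."""
    if w is None:
        return float(np.mean(p))
    return float(np.sum(w * p) / np.sum(w))


# Simulated data (assumption: purely illustrative, not survey data).
rng = np.random.default_rng(0)
n = 5000
rural = rng.binomial(1, 0.5, n)   # hypothetical binary covariate
x2 = rng.normal(size=n)           # hypothetical standardized covariate
X = np.column_stack([np.ones(n), rural, x2])
true_beta = np.array([-2.5, 0.8, -0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

# Hypothetical sampling weights: rural records stand for more children.
w = np.where(rural == 1, 3.0, 1.0) * rng.uniform(0.8, 1.2, n)

beta_unw = fit_logistic(X, y)        # model fitted without weights
beta_w = fit_logistic(X, y, w=w)     # model fitted with weights

p_unw = predict_prob(X, beta_unw)
p_w = predict_prob(X, beta_w)

rate_unw_model_unweighted = overall_rate(p_unw)
rate_unw_model_weighted = overall_rate(p_unw, w)
rate_w_model_weighted = overall_rate(p_w, w)

print("unweighted model, unweighted mean:", rate_unw_model_unweighted)
print("unweighted model, weighted mean:  ", rate_unw_model_weighted)
print("weighted model, weighted mean:    ", rate_w_model_weighted)
```

In this simulated setting the two weighted means come out close to each other, while the unweighted mean differs from both. The agreement is helped by the fact that the simulated weights depend mainly on a covariate already in the model, so it should not be read as a general guarantee.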