As part of a continuous effort to maintain and improve the quality of data in DHS surveys, this report examines whether variation in 25 indicators of data quality, across 15 recent DHS surveys, can be attributed to interviewers and their characteristics. The analysis is based on interviewer ID codes that appear at several points in DHS data files and on information about the interviewers obtained in a Fieldworker Survey that is now a standard component of all DHS surveys. All of the data files are publicly available.
The 25 indicators fall into three broad categories: nonresponse and refusals; reported age at death of young children; and ages and dates. The third category includes five subgroups or domains: incompleteness of age, which usually takes the form of a missing month of birth; inconsistency between age in the household survey and age in the individual surveys of women or men; heaping on ages ending in 0 or 5; displacement of age across eligibility boundaries; and a new indirect indicator of over-dispersion of children’s age derived from the flagging of height-for-age and weight-for-age scores. All indicators are defined at the level of the individual, with outcome “1” for a problematic or potentially problematic response, and otherwise either “0” or “Not Applicable”. Because the outcomes are binary, they can be readily analyzed with logit regression and related generalized linear models. Combinations of indicators and surveys are judged to be problematic if the prevalence of the outcome “1” is far from an acceptable level and there is highly significant variation in the outcome across interviewers. Many such combinations are identified, and several examples are investigated systematically and in depth. It is found that when a data quality indicator varies substantially across interviewers, the bulk of that variation can often be traced to a handful of interviewers, whether on the same team or on different teams.
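The test for interviewer-level variation described above can be illustrated with a likelihood-ratio comparison: a logit model with interviewer dummies, fitted to a binary indicator, reduces to comparing per-interviewer response rates against a single common rate. The sketch below uses hypothetical counts (the interviewer IDs and numbers are invented for illustration, not taken from the report):

```python
import math

def loglik(successes, trials):
    """Binomial log-likelihood at the MLE p = successes/trials (0*log 0 := 0)."""
    if trials == 0:
        return 0.0
    p = successes / trials
    ll = 0.0
    if successes:
        ll += successes * math.log(p)
    if trials - successes:
        ll += (trials - successes) * math.log(1 - p)
    return ll

def interviewer_lr_test(counts):
    """Likelihood-ratio statistic for variation in a binary quality
    indicator across interviewers.

    counts: dict mapping interviewer ID -> (n_problematic, n_applicable).
    Returns (G2, df); under the null of a common rate, G2 is
    chi-squared with df = number of interviewers - 1.
    """
    tot_s = sum(s for s, n in counts.values())
    tot_n = sum(n for s, n in counts.values())
    ll_null = loglik(tot_s, tot_n)                            # one common rate
    ll_full = sum(loglik(s, n) for s, n in counts.values())   # per-interviewer rates
    return 2 * (ll_full - ll_null), len(counts) - 1

# Hypothetical counts: interviewer "07" produces far more problematic
# responses than the two peers, driving the bulk of the variation.
counts = {"01": (5, 100), "02": (6, 100), "07": (30, 100)}
g2, df = interviewer_lr_test(counts)
```

With these counts, G2 is roughly 32 on 2 degrees of freedom, far beyond the 0.1% chi-squared critical value of about 13.8, mirroring the pattern in which one or two interviewers account for most of the variation.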
To investigate the potential effects of the covariates in the Fieldworker Survey, similar indicators are pooled, as are all the surveys. There are exceptions, but it is generally found that interviewers who are older and better educated have lower levels of problematic outcomes. Prior experience with a DHS survey or with other surveys is often statistically significant, and often, but not always, in the direction of better quality data. It is a concern that previous experience may sometimes lead to worse, rather than better, data.
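The direction-of-effect checks for a covariate such as education can be sketched with a closed form: in a logit regression with a single binary covariate, the fitted coefficient equals the log odds ratio from the 2x2 table of covariate by outcome. The counts below are hypothetical, not taken from the report's pooled data:

```python
import math

def logit_coef_binary(a, b, c, d):
    """Logit coefficient for one binary covariate = log odds ratio.

    a: problematic outcomes, covariate = 1 (e.g. better-educated interviewer)
    b: unproblematic outcomes, covariate = 1
    c: problematic outcomes, covariate = 0
    d: unproblematic outcomes, covariate = 0
    """
    return math.log((a / b) / (c / d))

# Hypothetical pooled counts: better-educated interviewers record 40
# problematic responses out of 1000 applicable cases, the rest record 90.
coef = logit_coef_binary(40, 960, 90, 910)
# coef < 0: higher education is associated with fewer problematic outcomes
```

A negative coefficient corresponds to the pattern reported for older and better-educated interviewers; the full analysis fits logit models with many covariates at once, for which no such closed form exists.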
The most important limitation is that interviewers are almost always assigned to just one or two geographic regions within a country, so the quality of the data they collect is confounded with relevant characteristics of those regions and with the composition of the respondents. For example, respondents’ level of education is associated with the accuracy of their stated age, and interviewers assigned to a region with a low level of education cannot be expected to obtain responses of the same quality as interviewers assigned to other regions.
Further analysis is planned that will include characteristics of the respondents along with those of the interviewer, and possible statistical interactions that reflect the social distance between interviewers and respondents. The methods and findings of this study are relevant to ongoing efforts to improve the training of interviewers and the monitoring of fieldwork.