The DHS Program is authorized to distribute, at no cost, unrestricted survey data files for legitimate academic research. Registration is required for access to data.
Guide to Using Datasets
The DHS Program produces many different types of datasets. The types available vary by survey and depend on the data collected and the file formats used for distribution. Dataset types are organized into three distribution categories: Survey Data, HIV Test Results, and Geographic Data. Survey Data can comprise many different types of data depending on individual survey design. HIV Test Results and Geographic Data are available for most surveys conducted in recent years. Note that GPS and HIV dataset requests carry an additional requirement.
Survey Data and HIV Data from DHS surveys are generally distributed in recode format, but may occasionally be distributed in raw format. A raw data file includes the data as they were collected, without any structural changes. A recode data file uses a standardized data definition in order to facilitate comparisons across surveys, and is distributed in several different file formats for use with statistical software packages. Geographic data are distributed in a format designed for use with GIS software packages.
To facilitate data analysis, DHS developed the concept of recode files. Recode files have standard data definitions across countries and across DHS phases. Because questionnaires change between DHS phases, there is a different recode definition for each phase. However, variables that are common across phases keep their names, and the names of variables removed from a phase are not reused unless the variable is reinstated in another phase. Recode definitions are available for the DHS, AIS and MIS surveys. Work is currently under way on a recode definition for SPA surveys. DHS questionnaires allow different units of analysis (households, household members, women, children, etc.), and these are ultimately translated into datasets. The types of datasets generated for each survey vary by survey design; however, there are seven common types of recode data files associated with the core questionnaires. The datasets are available in the standard recode file formats for SPSS, SAS, Stata and CSPro; only completed questionnaires are included in these files.
Standard Recode Files:
Household Data - Household Recode (HR)
This dataset has one record for each household. It includes the household members' roster, but no information from the individual women's or men's questionnaires is present in this file. The unit of analysis (case) in this file is the household.
Household Listing Data - Household Member Recode (PR)
This dataset has one record for every household member. It includes variables like sex, age, education, orphanhood, height and weight measurement, hemoglobin, etc. It also includes the characteristics of the households where the individual lives or was visiting. The unit of analysis (case) in this file is the household member.
Individual Women's Data - Individual Recode (IR)
This dataset has one record for every eligible woman as defined by the household schedule. It contains all the data collected in the women's questionnaire plus some variables from the household. Up to 20 births in the birth history, and up to 6 children under age 5, for whom pregnancy and postnatal care as well as immunization and health data were collected, can be found in this file. The fertility and mortality programs distributed by DHS use this file for data input. The unit of analysis (case) in this file is the woman.
Men's Data - Male Recode (MR)
This dataset has one record for every eligible man as defined by the household schedule. It contains all the data collected in the men's questionnaire plus some variables from the household. The unit of analysis (case) in this file is the man.
Couple's Data - Couple's Recode (CR)
This dataset has one record for every couple. It contains data for men and women, married or living together, who each declared the other as their partner and who both completed individual interviews (questionnaires). Essentially, the file is the result of linking the two files previously described (IR and MR) based on whom each respondent declared as a partner. The unit of analysis (case) in this file is the couple in which both partners were interviewed.
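The linking logic behind a couple file can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the field names (`cluster`, `household`, `line`, `partner_line`) are hypothetical and are not DHS recode variable names, and each respondent is assumed to name at most one partner.

```python
# Hypothetical, simplified records; field names are illustrative only.
women = [
    {"cluster": 1, "household": 10, "line": 2, "partner_line": 1, "age": 28},
    {"cluster": 1, "household": 11, "line": 2, "partner_line": 1, "age": 35},
    {"cluster": 2, "household": 5, "line": 3, "partner_line": None, "age": 19},
]
men = [
    {"cluster": 1, "household": 10, "line": 1, "partner_line": 2, "age": 31},
    # Interviewed, but did not name the woman on line 2 as his partner.
    {"cluster": 1, "household": 11, "line": 1, "partner_line": None, "age": 40},
]


def build_couples(women, men):
    """Return one record per couple in which both partners named each other."""
    men_by_key = {(m["cluster"], m["household"], m["line"]): m for m in men}
    couples = []
    for w in women:
        if w["partner_line"] is None:
            continue
        m = men_by_key.get((w["cluster"], w["household"], w["partner_line"]))
        # Keep the pair only if the man was interviewed and named her in return.
        if m is not None and m["partner_line"] == w["line"]:
            couples.append({
                "cluster": w["cluster"],
                "household": w["household"],
                "woman_line": w["line"],
                "man_line": m["line"],
                "woman_age": w["age"],
                "man_age": m["age"],
            })
    return couples


couples = build_couples(women, men)
```

Only the pair in household 10 survives, because that is the only case where the declaration is mutual and both interviews exist.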
Children's Data - Children's Recode (KR)
This dataset has one record for every child born to an interviewed woman in the five years preceding the survey. It contains information on the pregnancy and postnatal care for each child, and on the child's immunization and health. Data for the mother of each of these children are included. This file is used to look at child health indicators such as immunization coverage, vitamin A supplementation, recent occurrences of diarrhea, fever, and cough in young children, and treatment of childhood diseases. The unit of analysis (case) in this file is the child born to an interviewed woman in the last 5 years (0-59 months).
Births' Data - Births Recode (BR)
This dataset has one record for every child ever born to interviewed women. Essentially, it is the full birth history of all women interviewed, including information on pregnancy and postnatal care as well as immunization and health for children born in the last 5 years. Data for the mother of each of these children are also included. This file can be used to calculate health indicators as well as fertility and mortality rates. The unit of analysis (case) in this file is the child ever born to an eligible woman.
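The relationship between the woman-level, birth-level and child-level files can be sketched as a reshape followed by a filter. The example below is a minimal illustration on made-up records. The variable names (`caseid`, `v008` for the interview date, `b3_01` for the date of birth of the first listed birth, `b4_01` for its sex) imitate recode naming conventions but are assumptions here and should be checked against the recode documentation for the survey in use. Dates are assumed to be month counts, so subtracting them gives an age in months.

```python
MAX_BIRTHS = 20  # the birth history holds up to 20 births per woman

# Made-up woman-level records in a "wide" layout: one set of columns per birth.
women = [
    {"caseid": "A", "v008": 1440, "b3_01": 1430, "b4_01": 1,
     "b3_02": 1350, "b4_02": 2},
    {"caseid": "B", "v008": 1441, "b3_01": 1382, "b4_01": 2},
    {"caseid": "C", "v008": 1442},  # no births reported
]


def births_from_women(women):
    """Reshape wide birth-history columns into one record per birth."""
    births = []
    for w in women:
        for i in range(1, MAX_BIRTHS + 1):
            suffix = f"_{i:02d}"
            birth_date = w.get("b3" + suffix)
            if birth_date is None:
                continue  # this birth slot is empty
            births.append({
                "caseid": w["caseid"],
                "bidx": i,
                "b3": birth_date,
                "b4": w.get("b4" + suffix),
                "age_months": w["v008"] - birth_date,
            })
    return births


def children_under_five(births):
    """Keep births from the 0-59 months before the interview."""
    return [b for b in births if 0 <= b["age_months"] <= 59]


births = births_from_women(women)
children = children_under_five(births)
```

Here `births` plays the role of a births-style file (every child ever born) and `children` the role of a children-style file (the subset born in the last five years). The distributed files carry many more variables than this sketch.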
Associated Recode Files:
Additionally, there are a number of files that can be associated with the files previously described but are distributed separately.
Wealth Index data (WI)
This dataset has one record for every household. Wealth Index analysis was introduced to DHS around the end of the 1990s. When the decision was made to include the wealth index as part of DHS, standard variables were added to the recode definition for both the household and individual questionnaires (HV270 and HV271 for households; V190 and V191 for women; and MV190 and MV191 for men). For surveys conducted prior to the change in the recode file definition, a file was created containing the score and the quintile variables. Wealth index files were created for all DHS surveys except those carried out as part of the first DHS phase. This file can be linked to any of the files described above.
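Linking a separately distributed wealth index file to a household file amounts to a lookup on a shared household identifier. The sketch below uses made-up records and hypothetical field names (`hhid`, `score`, `quintile`); the actual identifier and variable names in a given wealth index file should be taken from its documentation.

```python
# Made-up household records and a separately distributed wealth file.
households = [
    {"hhid": "0001-010", "members": 4},
    {"hhid": "0001-011", "members": 6},
    {"hhid": "0002-005", "members": 2},
]
wealth = [
    {"hhid": "0001-010", "score": -0.85, "quintile": 1},
    {"hhid": "0001-011", "score": 1.20, "quintile": 5},
]


def attach_wealth(households, wealth):
    """Left-join wealth score and quintile onto each household record."""
    wealth_by_hh = {w["hhid"]: w for w in wealth}
    merged = []
    for hh in households:
        w = wealth_by_hh.get(hh["hhid"])
        merged.append({
            **hh,
            # None marks households with no matching wealth record.
            "wealth_score": w["score"] if w else None,
            "wealth_quintile": w["quintile"] if w else None,
        })
    return merged


merged = attach_wealth(households, wealth)
```

A left join is used so that unmatched households stay visible rather than silently dropping out, which makes identifier mismatches easier to spot.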
Height and Weight data (HW)
This dataset has one record for every child measured for height and weight. In 2007 new child growth standards were introduced by WHO; in the past DHS used the NCHS/CDC/WHO reference. After the decision was made to adopt the new WHO standards, standard recode variables HC70 to HC73 and HW70 to HW73 were added to the recode definition to store the standard deviation scores (z-scores) based on the new WHO child growth standards. All files using the DHS V or VI recode structure have these variables. For surveys prior to DHS phase V, a file was created containing the new z-scores. In early DHS phases, only children of eligible women were measured. From DHS phase III onwards, all children under five listed in interviewed households have been measured. This file can be linked to the household members (PR), the children (KR) or the births (BR) files described above if height and weight were taken for all children in the households. For early DHS phases in which only children of eligible women were measured, the file can be linked only to the children (KR) or births (BR) files.
Raw Data Files:
Other standard types of survey datasets include:
HIV Test data - AIDS Recode (AR)
This dataset has one record for every individual from whom blood was drawn for HIV testing. DHS began collecting blood for HIV testing in 2004. Because of the sensitivity of the data, the test results are not merged into the individual records; instead, they are distributed in a separate file. This file can be linked to the household members (PR), the women (IR) or the men (MR) files.
Other Biomarkers data (OB)
This dataset has one record for every individual from whom samples were taken for biomarker testing. This type of file includes test results for health conditions such as syphilis, tuberculosis, hepatitis B, etc., and in general for any test other than HIV that requires the data to be anonymous. The same protocol used to request HIV data applies to requests for other biomarkers. This file can be linked to the household members (PR), the women (IR) or the men (MR) files.
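Because test results are distributed separately, using them means joining at the individual level. The following sketch assumes a person is identified by cluster, household and line number. The field names and values are made up for illustration and are not the variable names used in the distributed files.

```python
# Made-up household member records and a separate test-results file.
members = [
    {"cluster": 1, "household": 10, "line": 1},
    {"cluster": 1, "household": 10, "line": 2},
    {"cluster": 2, "household": 5, "line": 1},
]
tests = [
    {"cluster": 1, "household": 10, "line": 2, "result": "negative"},
    {"cluster": 2, "household": 5, "line": 1, "result": "positive"},
]


def person_key(record):
    """Build the composite key that identifies one individual."""
    return (record["cluster"], record["household"], record["line"])


def attach_tests(members, tests):
    """Left-join test results onto member records; untested members get None."""
    result_by_person = {person_key(t): t["result"] for t in tests}
    return [
        {**m, "test_result": result_by_person.get(person_key(m))}
        for m in members
    ]


linked = attach_tests(members, tests)
tested = [p for p in linked if p["test_result"] is not None]
```

Not every household member has a test record, since the file holds only individuals from whom a sample was taken. That is why the join keeps unmatched members with a `None` result instead of discarding them.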
Other standard types of datasets include:
Geographic information is collected in the DHS and AIS surveys. All survey data are presented both nationally and by sub-national reporting area. These reporting areas are often, but not always, provinces or groups of provinces, and are included in all recoded datasets. It is important to note that there is an additional requirement for GPS dataset requests.
Geographic data (GE)
This dataset has one record for every cluster in which the survey was conducted. This type of file includes the latitude and longitude of the center of the sample cluster.
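Since the geographic file is keyed by cluster, survey records are typically summarized to the cluster level before coordinates are attached. The sketch below illustrates this with made-up data. The field names and coordinate values are hypothetical, and real geographic files are intended for GIS software rather than plain Python lists.

```python
from collections import Counter

# Made-up cluster coordinates (one record per cluster) and household records.
clusters = [
    {"cluster": 1, "latitude": 10.0, "longitude": 20.0},
    {"cluster": 2, "latitude": 11.5, "longitude": 21.5},
    {"cluster": 3, "latitude": 12.0, "longitude": 22.0},
]
households = [
    {"cluster": 1, "hhid": "a"},
    {"cluster": 1, "hhid": "b"},
    {"cluster": 2, "hhid": "c"},
]


def households_per_cluster(households, clusters):
    """Count households in each cluster and attach the cluster coordinates."""
    counts = Counter(h["cluster"] for h in households)
    return [
        {**c, "n_households": counts.get(c["cluster"], 0)}
        for c in clusters
    ]


summary = households_per_cluster(households, clusters)
```

The result has one row per cluster, matching the unit of the geographic file, and could then be passed to mapping software.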