1.6. Data Requirements

Micro-simulation is typically associated with high data requirements. This view stems from the predominant use of micro-simulation for modeling highly complex systems, e.g., the operations of social insurance systems in the context of social and demographic change. In contrast to such models, which must depict individual life-courses in great detail and include educational choices, employment, earnings, family dynamics, savings, health, and retirement decisions, the data requirements for population projection models are very modest and, for most countries, the necessary data are readily available.

DYNAMIS-POP requires two types of data:

  • Population projection data as used in macro-projection models (and available online for most countries). This includes age-specific fertility patterns, projected total fertility rates, a standard life table, and projected life expectancies by period and sex.
  • Four micro-data files for parameter estimation and the creation of the starting population. These files can typically be created from a population census and a household survey such as UNICEF’s Multiple Indicator Cluster Surveys (MICS) or USAID/ICF Macro’s Demographic and Health Surveys (DHS).

A file of current residents, typically compiled from a population census dataset. The production of partially or fully synthetic data as input to the model is an option worth considering. This approach, for which a specialized R package (simPop) exists and which we document in a separate report, offers the advantage of producing anonymous datasets and provides a way to integrate data from multiple sources and to address quality issues in existing data.

M_ID         Person ID (0,1,..)
M_HHID       Household ID (0,1,..)
M_WEIGHT     Sample weight (123.456)
M_AGE        Age (in years, 16.789)
M_MALE       Sex (female 0, male 1)
M_DOB        District of birth (0..m, m = abroad)
M_DOR        District of residence (0..n)
M_PDIST      District 12 months ago (0..m, m = abroad)
M_EDUC       Primary education (0 none, 1 incomplete, 2 completed)
M_PARITY     Parity (0, 1..)
M_BIR12      Number of births in the past 12 months (0, 1, 2)
M_AGEMAR     Age at first marriage (in years, 16.789, 999 never married)
M_AGEBIR     Age at most recent birth (in years, 16.789, 999 childless)
M_ROB        Region of birth (0..b, b = abroad)
M_ROR        Region of residence (0..a)
M_PREG       Region 12 months ago (0..b, b = abroad)
M_ETHNO      Ethnicity (0..y)
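
A quick consistency check of the resident file can catch coding problems before the file is used. The sketch below is not part of DYNAMIS-POP. It assumes a CSV file with a header row that uses the variable names listed above; the function name `check_residents` and the sample row are hypothetical. It verifies that all columns are present, that weights are strictly positive, and that sex and primary education use the listed codes.

```python
import csv
import io

# Column names taken from the variable list above.
RESIDENT_COLUMNS = [
    "M_ID", "M_HHID", "M_WEIGHT", "M_AGE", "M_MALE", "M_DOB", "M_DOR",
    "M_PDIST", "M_EDUC", "M_PARITY", "M_BIR12", "M_AGEMAR", "M_AGEBIR",
    "M_ROB", "M_ROR", "M_PREG", "M_ETHNO",
]


def check_residents(handle):
    """Return a list of problems found in a resident CSV file (hypothetical helper)."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in RESIDENT_COLUMNS if name not in present]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if float(row["M_WEIGHT"]) <= 0:
            problems.append(f"line {line_no}: M_WEIGHT must be strictly positive")
        if float(row["M_AGE"]) < 0:
            problems.append(f"line {line_no}: M_AGE must not be negative")
        if row["M_MALE"].strip() not in ("0", "1"):
            problems.append(f"line {line_no}: M_MALE must be 0 or 1")
        if row["M_EDUC"].strip() not in ("0", "1", "2"):
            problems.append(f"line {line_no}: M_EDUC must be 0, 1 or 2")
    return problems


# Illustrative in-memory file with one made-up record.
sample = io.StringIO(
    ",".join(RESIDENT_COLUMNS) + "\n"
    "0,0,1.0,16.789,0,3,3,3,2,0,0,999,999,1,1,1,0\n"
)
print(check_residents(sample))
```

Range checks on the geographic codes are left out because the upper bounds (m, n, a, b) are country-specific.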

A file of recent emigrants (people who emigrated in the past 12 months). As a proxy, this file is typically compiled from census information on household members living abroad.

M_WEIGHT     Sample weight (123.456)
M_PDIST      District of residence 12 months ago (0..n)
M_PREG       Region of residence 12 months ago (0..x)
M_AGE        Age (in years, 18.901)
M_MALE       Sex (0 female, 1 male)

A file of all child history records - births, deaths, vaccination - reported by women. This information is available in MICS as well as in DHS surveys:

M_WEIGHT     Record Weight
M_INTERV     Date of interview (months since 1900)
M_REGION     Region (0,1..)
M_BIRTH      Date of birth (months since 1900)
M_DEATH      Date of death (months since 1900)
M_MALE       Male (0/1)
M_AGEMO      Mother's age at birth of child (months)
M_EDUCMO     Primary education of mother (0 none / 1 some / 2 graduate)
M_ETHNO      Ethnicity (0,1..)
M_VACC       Child is vaccinated (0/1 one year old only; 999 others)
M_PCARE      Mother received prenatal care (0/1 one year old only; 999 others)

A file of women recording all birth events. This information is available in MICS as well as in DHS surveys:

M_B01        Month of 1st birth (number of months since 1900; 9999 if none)
...
M_B14        Month of 14th birth (number of months since 1900)
M_WEIGHT     Sample weight (123.456)
M_BIRTH      Birth (number of months since 1900)
M_EDUC       Primary education (0 none, 1 incomplete, 2 completed)
M_REG        Region of residence (0..n)
M_INTERV     Month of interview (number of months since 1900)
M_MAR        Month of first marriage (number of months since 1900; 9999 never married)
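
Converting calendar dates to the "months since 1900" coding used in this file and in the child history file is simple arithmetic. The following sketch assumes that January 1900 is coded 0, which is one reading of "number of months since 1900"; DHS century month codes count January 1900 as 1, so an offset of one month may be needed when working from DHS data. The function names are hypothetical and not part of DYNAMIS-POP.

```python
NO_EVENT = 9999  # code listed above for "no such birth" / "never married"


def months_since_1900(year, month):
    """Months elapsed between January 1900 and the given calendar month.

    Assumption: January 1900 is coded 0 (add 1 for a 1-based convention).
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (year - 1900) * 12 + (month - 1)


def to_year_month(code):
    """Inverse conversion: return (year, month), or None for the 9999 code."""
    if code == NO_EVENT:
        return None
    years, month_index = divmod(code, 12)
    return 1900 + years, month_index + 1


code = months_since_1900(2014, 7)
print(code, to_year_month(code))
```

Whatever convention is chosen, it should be the same for all date variables in a file, since the model only uses differences and orderings of these values consistently if they share one origin.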

Population projection data can be copied directly into the corresponding model parameter tables, or be produced from CSV files by the R analysis scripts provided.

Notes on variable construction:

  • The terms “region” and “district” must be understood as “geographic area at first level” and “geographic area at second level”. This could be Region and District in some countries, State and Provinces in others, etc.
  • The codes of regions and districts must be consistent over time (e.g., the codes for district of birth or previous residence must be fully compatible with the codes used for the current residence). If the administrative divisions of the country have changed over time, this must be addressed during data preparation.
  • M_WEIGHT represents the sample weight or “weighting coefficient”. The value is 1 for all observations when the data file is an exhaustive census. The sample weights can be calculated and calibrated based on published population tables. All weights must be strictly positive (but do not have to be integers).
  • The M_AGE, M_AGEMAR, and M_AGEBIR variables correspond to exact age in years. In most survey and census datasets, the information is provided as age in completed years. In such cases, to obtain an “exact” age, a random value between 0 and 1 should be added to the completed age. A good option is to add a value with three decimals.
  • All variables described as a “Date of …” or “Month of …” represent the number of months between January 1, 1900 and the occurrence of the event.
  • The default format of the data files is a comma-separated values (.csv) text file. If data are provided in a different format (e.g., Stata .dta or R .RData), the country-specific setup script for data analysis has to be adapted.
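
The conversion from age in completed years to "exact" age described in the notes can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the provided scripts: the function name `exact_age` is hypothetical, the fraction is drawn uniformly from 0.000 to 0.999, and the 999 code (never married, childless) is passed through unchanged.

```python
import random

NOT_APPLICABLE = 999  # code used by M_AGEMAR / M_AGEBIR for never married / childless


def exact_age(completed_years, rng):
    """Add a uniform random fraction with three decimals to a completed age."""
    if completed_years == NOT_APPLICABLE:
        return NOT_APPLICABLE
    # randrange(1000) / 1000 yields 0.000 .. 0.999, so the result never
    # rolls over into the next completed year.
    fraction = rng.randrange(1000) / 1000
    return round(completed_years + fraction, 3)


rng = random.Random(2024)  # fixed seed so the prepared file is reproducible
print([exact_age(age, rng) for age in (0, 16, 45, 999)])
```

When fractions for M_AGEMAR or M_AGEBIR are drawn independently of M_AGE, it may be worth checking afterwards that the age at an event does not exceed the current age, which can happen when both fall in the same completed year.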

Figure: Population projection parameters