1.4. The Dynamic Micro-simulation Approach for Population Projection (and Beyond)

1.4.1. What is Dynamic Micro-simulation?

Micro-simulation in the context of socioeconomic applications can be perceived as an experiment with a virtual society of thousands, or millions, of individuals. Micro-simulation models can be static or dynamic. Central to dynamic micro-simulation is the explicit modeling of the time dimension: rather than performing a before-and-after comparison, it follows people and their families or households over time. Dynamic modeling lends itself naturally to the modeling of policies with a longitudinal component, e.g., educational investments. This is especially true in a context of rapid social, economic, and demographic change, which makes it difficult to assess the contribution of individual policies to overall trends without tracking and comparing the lives of the individuals who form a society.

1.4.1.1. Advantages

Micro-simulation is attractive from both a theoretical and a practical point of view, as it supports research embedded in modern paradigms, such as the life-course perspective, while also providing a tool for what-if analysis of high policy relevance. Typical application areas include tax-benefit analysis, analysis of pension system adequacy and sustainability, and health and health insurance. A more recent application is demographic projection. Although this has been discussed in the literature for two decades (e.g., Imhoff & Post 1998), larger-scale implementations have appeared only recently. Statistics Canada was the first statistical office to produce official population projections using micro-simulation. Its model, Demosim, is implemented in Modgen, a micro-simulation programming technology developed by Statistics Canada that is freely available and shared worldwide. Variants of Demosim are currently being developed for several European countries as well as Australia (Marois et al. 2017).

In the social sciences and economics, there is an increasing emphasis on processes rather than static structures. This is where dynamic micro-simulation is most effective, as it can deal simultaneously with distributional and dynamic issues (e.g., demographic change) and add a longitudinal dimension to distributional analysis. The same shift is visible from a policy perspective. It is reflected, for example, in the modern way of assessing poverty as a multidimensional and dynamic phenomenon, to be addressed by policies that go beyond the static redistribution of resources, enable people to leave poverty, and reduce poverty risks permanently. Micro-simulation can improve understanding of the complex dynamics that result from many simultaneous processes, which are often studied only in isolation. In this context, dynamic micro-simulation is a key component in the evolution from analysis to synthesis (Willekens 2001), as it can integrate analyses of single processes into computer simulations of societies. Such models can then be used for what-if analysis, assessing how the future is shaped by today's decisions, policies, and actions.

In principle, whenever a system is made up of small-scale units, micro-simulation is a possible simulation approach. Micro-simulation has the potential to be especially powerful in addressing population heterogeneity, the aggregation of behavioral relations, and individual histories (Spielauer 2011).

1.4.1.1.1. Population heterogeneity

Micro-simulation is the preferred modeling choice when population heterogeneity matters and there are too many possible combinations of characteristics to split the population into a manageable number of groups.

Using a large sample of individuals to represent a population is an intuitive way to capture the population's diversity and allows for distributional analysis of the effects of changes and reforms. Creating such a representation in a database of individuals is a typical step in micro-simulation development. Frequently, this step combines and integrates data from various sources, which ultimately makes the data more relevant for policy analysis. It also allows the analysis of policies targeted at very specific segments of the population.

From a longitudinal perspective, micro-simulation can capture the variety and heterogeneity of life-course experiences and careers. This adds a whole new dimension to distributional analysis through lifetime measures and by capturing the distributional impacts of policies over the individual life cycle and between cohorts and generations. Besides its typical use for distributional analysis, micro-simulation can also track different ethnic groups or minorities, which often display persistent behavioral differences over time, e.g., in demographic behavior. Modeling such differences explicitly can improve the accuracy of projections. Besides persistent differences, another frequently observed or theorized phenomenon (e.g., in modernization theory) is that behavioral changes are adopted by different groups at different times, with certain sectors of society leading the development and others following. The ability to model such processes can improve the theoretical foundation of projections.

1.4.1.1.2. The problem of aggregation

Micro-simulation is an adequate modeling choice if behaviors are complex at the macro level but better understood at the micro level.

Unlike macro models, micro-simulation does not require behaviors to be aggregated; instead, it aggregates the outcomes of individual behaviors. It is not bound by the restrictive assumptions needed to represent society by a representative agent or a small group of agents. From a static accounting perspective, tax and social security regulations tie rules in a non-linear way to individual and family characteristics, which impedes the aggregation of their operations. To calculate total tax revenues or the costs of means-tested policies, we need to know the composition of the population by income (progressive taxes), family characteristics (dependent children and spouses), and all other characteristics that affect the calculation of individual liability or eligibility.
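The aggregation problem can be illustrated with a minimal sketch. The two-bracket tax schedule, its threshold and rate, and the five incomes are all hypothetical and chosen only for illustration. Because the schedule is non-linear (convex), applying it to a representative agent with mean income gives a different total than applying it person by person and summing the results.

```python
# Hypothetical two-bracket progressive tax schedule (illustrative numbers only):
# 0% on income up to the threshold, 30% on the part above it.
THRESHOLD = 20_000
RATE = 0.30


def tax(income: float) -> float:
    """Individual tax liability under the hypothetical schedule."""
    return RATE * max(0.0, income - THRESHOLD)


# Hypothetical incomes of five individuals.
incomes = [10_000, 15_000, 20_000, 35_000, 70_000]

# Micro approach: apply the rule to each person, then aggregate.
micro_total = sum(tax(y) for y in incomes)

# Representative-agent approach: aggregate first (mean income), then apply the rule.
mean_income = sum(incomes) / len(incomes)
macro_total = tax(mean_income) * len(incomes)

print(f"micro total: {micro_total:,.0f}")                 # micro total: 19,500
print(f"representative-agent total: {macro_total:,.0f}")  # representative-agent total: 15,000
```

The gap between 19,500 and 15,000 comes entirely from the non-linearity of the rule; with a linear (flat) tax, both approaches would agree.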

In dynamic systems, many behaviors are modeled much more easily at the micro level, as this is where decisions are made. In many cases, behaviors are also more stable at the micro level, where there is no interference from composition effects. Even complete stability at the micro level does not automatically correspond to stability at the macro level. For example, educational attainments of the current school age population might be stable for given geographical, ethnic, and parental characteristics, but the composition of the population changes over time and may be further affected by migration and other population changes.
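A minimal sketch of such a composition effect, with hypothetical group labels, completion rates, and population shares: the group-specific rates never change, yet the population-wide rate falls as the group with lower attainment grows.

```python
# Hypothetical group-specific shares completing tertiary education,
# held constant over time (perfect stability at the micro level).
completion_rate = {"group_a": 0.60, "group_b": 0.20}

# Hypothetical population composition at two dates: group_b grows,
# e.g., through differential fertility or migration.
composition = {
    2020: {"group_a": 0.70, "group_b": 0.30},
    2040: {"group_a": 0.50, "group_b": 0.50},
}


def aggregate_rate(shares: dict[str, float]) -> float:
    """Population-wide completion rate as a composition-weighted average."""
    return sum(shares[g] * completion_rate[g] for g in shares)


for year, shares in composition.items():
    print(year, round(aggregate_rate(shares), 2))
```

The aggregate rate drops from 0.48 to 0.40 although behavior within each group is unchanged, which is why projecting the aggregate rate directly can be misleading.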

Based on (and producing) micro-data, micro-simulation allows flexible aggregation, as the information may be cross-tabulated in any form. As aggregation schemes do not have to be determined a priori, micro-simulation can develop and apply a broad range of output measures. This directly benefits the measurement of complex issues like income adequacy in old age, poverty as a highly multidimensional phenomenon, or the wide variety of measures developed in the literature.

1.4.1.1.3. Individual and linked histories

Dynamic micro-simulation is the only modeling choice if individual histories matter, i.e., when processes have memory.

Individual histories can become important factors in policy analysis, as they influence behaviors (e.g., a cash transfer may enable families to send their children to school, which generates an education history affecting their lives in many dimensions); affect risks (e.g., mortality by smoking history); and are the foundation of many accounting issues (e.g., pensions that depend on individual contribution histories). Keeping memory allows measures like durations in states (e.g., healthy life, time worked, time spent in care institutions) and tallies of experiences (e.g., visits to hospitals, death of a child).
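To make the idea of memory concrete, the following sketch derives durations in states from a simple event history. The representation (a sorted list of entry times and state labels) and the example life course are hypothetical; a real model would typically accumulate such durations as events occur.

```python
from collections import defaultdict


def durations_by_state(history: list[tuple[float, str]], end_time: float) -> dict[str, float]:
    """Total time spent in each state.

    history  : chronologically sorted (time, state) pairs; each pair marks
               the moment a person enters that state
    end_time : time at which observation ends (e.g., death or simulation end)
    """
    totals: dict[str, float] = defaultdict(float)
    # Pair each state entry with the next entry (or the end of observation).
    for (start, state), (next_start, _) in zip(history, history[1:] + [(end_time, "")]):
        totals[state] += next_start - start
    return dict(totals)


# Hypothetical life course: ages at which a person enters each state.
history = [(0, "school"), (18, "work"), (40, "care"), (42, "work"), (65, "retired")]
print(durations_by_state(history, end_time=83))
# {'school': 18.0, 'work': 45.0, 'care': 2.0, 'retired': 18.0}
```

Note that the two separate work episodes are summed, which is only possible because the full history, not just the current state, is kept.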

Keeping individual histories of income, taxes, and benefits can be useful, specifically for cost-benefit analysis, as they make it possible to distinguish between private and social returns on investments such as education. As micro-simulation can link actors to families or households, it can extend histories over generations and help to better assess policies with long-term downstream effects. For example, enabling a person to attain higher education will not only lead to higher individual wages; it could also affect expected tax payments and benefits received over the life course, as well as family formation, the number of children, and the children's mortality, education, and poverty risks, potentially stopping the intergenerational transmission of poverty.

1.4.1.2. Drawbacks

Limitations and drawbacks of micro-simulation can be classified into two categories: those that are intrinsic to all modeling and to all efforts to make statements about the future, especially the trade-off between detail and prediction power; and those that are transitory, as they can be expected to keep decreasing over time, such as hardware costs, technical requirements, and data availability and quality issues.

1.4.1.2.1. Detail versus prediction power

The central limitation of micro-simulation is that the degree of model detail does not go hand in hand with overall prediction power. Providing more detailed models, something at which micro-simulation excels, does not necessarily make the models “better.” The ability to produce distributions comes at the price of losing predictive power in projecting means, and the ability to make very accurate statements in the short run does not necessarily lead to models that are useful for long-term projections. Weather forecasting offers an analogy: detailed models of tomorrow's weather on a fine geographical scale are of no use for projecting global climate change over the coming centuries. The same applies to socioeconomic models: the longer the time horizon and the more important the mean, the more the focus should be on the main driving forces and on a solid theoretical foundation of these mechanisms. The reason lies in what is called randomness, caused by accumulated errors and biases in variable values (for a discussion of randomness in micro-simulation, see Imhoff & Post 1998). In static models, this primarily involves the population database, which typically has to be constructed by combining information from various, and not necessarily consistent, data sources. In dynamic models, randomness is further increased by the stochastic nature of micro-simulation models and by the fact that all right-hand-side variables used in equations for future behaviors have to be simulated as well.

The randomness resulting from the stochastic nature of dynamic micro-simulation is called Monte Carlo variability. Micro-simulation does not produce expected values but random variables distributed around the expected values, so every simulation experiment produces different aggregate results. While this was cumbersome in the past, when computing power was limited, repeated experiments and/or the simulation of large populations can reduce this randomness and deliver valuable information on both point estimates and the distribution of results.
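Monte Carlo variability can be demonstrated with a minimal sketch: the same hypothetical annual birth probability (0.08) is applied to two population sizes, and each experiment is repeated 20 times with different random draws. Population sizes, probability, number of runs, and seed are illustrative assumptions, and each woman is simulated as a single Bernoulli draw rather than with a full life course.

```python
import random
import statistics


def simulate_births(n_women: int, prob: float, rng: random.Random) -> int:
    """One stochastic run: count women who experience a birth this year."""
    return sum(1 for _ in range(n_women) if rng.random() < prob)


def replicate(n_women: int, prob: float, runs: int, seed: int = 1) -> tuple[float, float]:
    """Repeat the experiment and summarize the Monte Carlo distribution."""
    rng = random.Random(seed)
    results = [simulate_births(n_women, prob, rng) for _ in range(runs)]
    return statistics.mean(results), statistics.stdev(results)


summary = {}
for n in (1_000, 100_000):
    mean, sd = replicate(n, prob=0.08, runs=20)
    summary[n] = (mean, sd)
    # Relative spread (coefficient of variation) shrinks roughly with 1/sqrt(n).
    print(f"n={n:>7}: mean births {mean:,.1f}, relative SD {sd / mean:.4f}")
```

With a hundred times more simulated women, the relative spread across runs should fall by roughly a factor of ten, while the spread across runs itself provides an estimate of the Monte Carlo error of the point estimate.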

A more fundamental problem lies in the trade-off between the additional randomness introduced by additional variables and the misspecification errors of models that are too simplified. The large number of variables that models can include, the very feature that makes micro-simulation especially attractive, comes at the price of more randomness and a decrease in prediction power as the number of variables increases. Modelers should be aware that this creates a trade-off between good aggregate predictions and good predictions of distributional issues in the long run. This trade-off is not specific to micro-simulation, but as micro-simulation is frequently employed for detailed projections, the scope for randomness is correspondingly large.

There are basically two ways of dealing with this trade-off. The first is to keep models simple; the second is to combine the strengths of different modeling approaches. Not surprisingly, in many large-scale micro-simulation models, some outcomes are aligned or calibrated to aggregate numbers or projections obtained by external means. For example, micro-simulation models may be powerful in modeling the effects of unemployment on individual lives but will typically use aggregate unemployment rates taken from other projections or scenarios. Technically, micro-simulation models can be separate, using results from macro models or projection scenarios as input parameters, or they can be linked to macro models, allowing feedback in both directions. The literature is particularly rich in examples of linking static micro-simulation models with computable general equilibrium (CGE) models.
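Alignment can take many forms. The sketch below shows one simple variant, assumed here for illustration rather than taken from any particular model: individual event probabilities are scaled by a common factor so that their sum (the expected number of events) matches an externally given target, with each probability capped at 1 and the factor re-estimated until the target is met. The probabilities and the target are hypothetical.

```python
def align_probabilities(probs: list[float], target: float,
                        tol: float = 1e-9, max_iter: int = 100) -> list[float]:
    """Scale individual event probabilities so that their sum (the expected
    number of events) matches an external target, capping each at 1."""
    if target > len(probs):
        raise ValueError("target exceeds the number of individuals")
    factor = 1.0
    aligned = list(probs)
    for _ in range(max_iter):
        aligned = [min(1.0, p * factor) for p in probs]
        total = sum(aligned)
        if abs(total - target) < tol:
            break
        # Capped probabilities cannot grow, so re-scale the factor and repeat.
        factor *= target / total
    return aligned


# Hypothetical annual birth probabilities for six women (e.g., from a model
# with relative risks) and a hypothetical external target of 2 births.
probs = [0.10, 0.25, 0.40, 0.05, 0.30, 0.20]
aligned = align_probabilities(probs, target=2.0)
print([round(p, 3) for p in aligned], round(sum(aligned), 6))
```

Because all probabilities are scaled by the same factor, the relative differences between individuals estimated from micro data are preserved while the aggregate follows the external projection.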

The effort to keep models simple often leads modelers to bypass micro-simulation in favor of macro models, a choice often justified by the higher development costs of micro-simulation. This choice ignores the fact that micro-simulation can often reproduce the results of macro models if needed (and at comparable costs), while also allowing for step-wise refinements and the removal of simplifying assumptions inherent to macro models. This is best illustrated by population projection models, which to date are almost exclusively based on the cohort-component method and limited to very few variables, a number probably too small to be justified from a theoretical point of view.

1.4.1.2.2. Transitory limitations

An often-stated drawback of micro-simulation is that such models have high data demands, with costs typically arising from acquiring and compiling such data. It can be noted, however, that such costs are not intrinsic to micro-simulation itself but represent the price to be paid for research in general and for informed policy making in particular. Recent advances in data availability in its various forms, from administrative data sets being made more accessible for research to internationally standardized survey data, could turn this argument around: micro-simulation can make available data more policy relevant, as it complements traditional data analysis and combines such analysis with a what-if projection tool. In the case of population projections, the required data are readily available for many countries, and the model can be very generic, as its inputs and outputs (i.e., requirements and purpose) are very much the same across countries.

Historically, micro-simulation models have required large investments in both manpower and hardware. These costs can be expected to decrease over time, however, as hardware prices fall and more powerful and efficient programming languages become available. Development costs can be cut dramatically by technologies that do not require models to be built and programmed from scratch, and advances in such programming technologies have already demonstrated dramatic efficiency gains. Modgen, which was used for the application developed in this study, has made model implementation a straightforward process for the next generation of social scientists, in the same way that statistical software has put a range of statistical analyses into the social scientist's toolbox.

1.4.1.3. Types of dynamic micro-simulation models

Micro-simulation models come in many types and flavors. In scope and complexity, they range from models that address specific research questions to multi-purpose models covering a multitude of life-course domains, e.g., education, work, family life, income, saving, retirement, health, and eventually death, together with detailed accounting routines depicting tax-benefit systems and social insurance. While some research questions require complex models—e.g., pension analysis, which requires knowledge of detailed individual life-courses—other applications specialize in specific behaviors. All models have a demographic core, which itself can be designed as a specialized application for population projections as well as a foundation for applications added step-wise in a modular way. The latter is the development strategy used here.

A second distinction concerns the timeframe: dynamic models can operate in discrete time steps, such as years, or in continuous time, allowing events to happen at any moment. Discrete-time models are the more conventional approach, but they come with serious drawbacks: when states are updated on a yearly basis, information on when, in which order, and how often events happened is lost. For example, a person may have experienced several episodes of unemployment during a year but be employed at each of the captured time points. We chose a continuous timeframe for our application, which can be implemented very efficiently using Modgen. It is the more flexible approach: developers choose how to model behaviors and when to update states. While some behaviors are modeled in continuous time, other updates and calculations can still be made in yearly steps.
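The unemployment example can be sketched in continuous time by drawing waiting times to the next event from exponential distributions, i.e., assuming constant hazards. The job-loss and job-finding rates below are hypothetical, and this is a generic illustration, not Modgen code. The sketch counts people who are employed at both yearly snapshots yet experienced at least one unemployment spell in between, which a yearly discrete-time model would not record.

```python
import random


def simulate_year(rng: random.Random, job_loss_rate: float = 1.0,
                  job_finding_rate: float = 4.0) -> tuple[list[float], bool]:
    """Simulate one initially employed person over one year in continuous time.

    Waiting times to the next event are drawn from exponential distributions
    with hypothetical constant hazards (events per year). Returns the times of
    all job losses during the year and whether the person is employed at year end.
    """
    t, employed, job_losses = 0.0, True, []
    while True:
        rate = job_loss_rate if employed else job_finding_rate
        t += rng.expovariate(rate)  # time until the next transition
        if t >= 1.0:                # next event falls after year end: stop
            break
        if employed:
            job_losses.append(t)
        employed = not employed
    return job_losses, employed


rng = random.Random(42)
people = [simulate_year(rng) for _ in range(10_000)]

# Employed at both yearly snapshots (start and end) but with at least one
# unemployment spell in between: invisible to a yearly discrete-time model.
hidden = sum(1 for losses, employed_at_end in people if losses and employed_at_end)
print(f"{hidden} of {len(people)} people had unemployment hidden between snapshots")
```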

A third distinction concerns the model's execution: simulating one person or group of persons (e.g., a family) at a time, or the whole population at once. The first approach is called case-based. It allows easy parallelization of the model's execution and the simulation of huge populations, because the whole population does not have to be kept in memory at each moment of time, as it does in time-based models. In contrast, time-based models allow modeling interactions between all actors, not just within a case; for example, persons can search for spouses within the population. Time-based models also allow aggregation on the fly, which is useful for policies that depend on outcomes (e.g., adjusting tax rates to balance the books) or for aligning aggregated outputs to given targets (e.g., adjusting fertility risks to produce a target number of births). Modgen supports both approaches, and a model started as case-based can easily be switched to a time-based approach, as demonstrated by the application developed in this report.

1.4.2. Population Projections by Micro-simulation

Most countries and international agencies produce population projections using the simple cohort-component method. For example, the widely used model DemProj (Stover & Kirmeyer 2001), which forms the starting point of the micro-simulation model developed in this report, requires:

  • A base population table by age and sex for the base year
  • Fertility data, including the total fertility rate for the base year and future years, and the age distribution of women giving birth
  • A model life table and life expectancy at birth for the base year and assumptions for the future
  • Assumptions on net international migration rates
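The four inputs listed above can be combined into a projection step. The following is a minimal, simplified sketch of one year of a cohort-component projection, not DemProj's actual algorithm. It assumes single-year age groups with an open-ended last group, births computed from the female population at the start of the year, no infant mortality in the birth year, net migration added at year end, and a hypothetical sex ratio at birth of 1.05.

```python
def age_forward(pop: list[float], surv: list[float]) -> list[float]:
    """Move survivors up one year of age; the last age group is open-ended."""
    n = len(pop)
    new = [0.0] * n
    for x in range(n - 1):
        new[x + 1] += pop[x] * surv[x]
    new[n - 1] += pop[n - 1] * surv[n - 1]  # survivors staying in the open group
    return new


def project_one_year(female, male, surv_f, surv_m, tfr, fert_dist,
                     net_mig_f, net_mig_m, srb=1.05):
    """One simplified cohort-component step with single-year age groups.

    female, male   : base population by age (last group open-ended)
    surv_f, surv_m : probabilities of surviving from age x to x+1 (life table)
    tfr            : total fertility rate assumed for the projection year
    fert_dist      : share of the TFR falling at each age (sums to 1)
    net_mig_f/_m   : net migrants by age, added at year end
    srb            : hypothetical sex ratio at birth (males per female)
    """
    # Age-specific fertility rates: the TFR spread over ages by fert_dist.
    asfr = [tfr * share for share in fert_dist]
    # Simplification: births computed from the start-of-year female population.
    births = sum(w * f for w, f in zip(female, asfr))
    new_f = age_forward(female, surv_f)
    new_m = age_forward(male, surv_m)
    # Simplification: newborns enter age 0 without infant mortality adjustment.
    new_f[0] = births / (1 + srb)
    new_m[0] = births * srb / (1 + srb)
    new_f = [p + m for p, m in zip(new_f, net_mig_f)]
    new_m = [p + m for p, m in zip(new_m, net_mig_m)]
    return new_f, new_m


# Toy example with ages 0, 1, 2, 3+ (all numbers hypothetical).
female = [100.0, 100.0, 100.0, 100.0]
male = [100.0, 100.0, 100.0, 100.0]
surv = [0.9, 0.9, 0.9, 0.5]
f1, m1 = project_one_year(female, male, surv, surv, tfr=2.0,
                          fert_dist=[0.0, 0.0, 1.0, 0.0],
                          net_mig_f=[0.0] * 4, net_mig_m=[0.0] * 4)
print([round(p, 2) for p in f1], [round(p, 2) for p in m1])
```

Every quantity here is a cell of an age-by-sex table, which is what limits the method to very few variables: adding a characteristic means multiplying the number of cells and the number of rates to be supplied.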

Micro-simulation is a powerful alternative to this approach, as it can overcome technical limitations of macro models. Micro-simulation can not only fully replace the cohort-component method, producing identical results, but also can accommodate step-wise model extensions for added detail. Most importantly, micro-simulation can handle more variables, has no restriction on variable types, possesses memory of individual histories, and allows communication and linkage between people. This makes population projections more useful as they project more characteristics, and can also improve the overall quality of projections. An example is the incorporation of known and very persistent differences in demographic behaviors by specific population groups (e.g., by ethnic affiliation, religion, or income or education level).

A micro-simulation model reproducing a cohort-component model can start from the same distributional table of the current population, with each person in its starting population sampling its initial characteristics from that table. But as characteristics are added, this approach quickly becomes infeasible, as the number of cells in the table grows exponentially. Micro-simulation overcomes this problem by reading in a micro-population file (micro-data) as its starting population and allowing it to evolve over time by generating events such as births, marriages, deaths, and migrations.
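Both points can be illustrated with a short sketch. The age-sex table, the sample size, and the list of additional characteristics with their numbers of categories are hypothetical. The first part draws a starting population whose expected composition matches the table; the second shows how quickly the number of cells grows when characteristics are cross-classified.

```python
import math
import random

# Hypothetical base-year table: population counts by (age group, sex).
table = {
    ("0-14", "F"): 1_200, ("0-14", "M"): 1_260,
    ("15-64", "F"): 3_100, ("15-64", "M"): 3_050,
    ("65+", "F"): 800, ("65+", "M"): 590,
}


def sample_population(table: dict, size: int, rng: random.Random) -> list[tuple[str, str]]:
    """Draw a starting population whose expected composition matches the
    cell shares of the distribution table."""
    cells = list(table)
    weights = [table[c] for c in cells]
    return rng.choices(cells, weights=weights, k=size)


rng = random.Random(7)
persons = sample_population(table, 10_000, rng)

# Each added characteristic multiplies the number of table cells
# (hypothetical numbers of categories).
levels = {"age (single years)": 101, "sex": 2, "education": 4,
          "region": 10, "ethnic group": 5, "parity": 6}
print(math.prod(levels.values()))  # 242400
```

With six characteristics, the table already has 242,400 cells, many of which would be empty or based on very few observations; a micro-data file avoids building the table at all by carrying the joint distribution implicitly in its records.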

For the starting population file, one would ideally use the latest population census micro-data or a sample of it (e.g., the 5 or 10 percent subset that many countries make available for public use). An alternative is to generate a synthetic population, a virtual population that closely resembles the actual population without containing real records, thereby avoiding confidentiality issues.

For fertility, mortality, and migration, the micro-simulation approach also needs data and assumptions. The same information used for the cohort-component approach is sufficient, but we can also include key determinants of demographic events, like education, parity, or geographical context. This would typically use proportional factors, such as relative risks derived from Cox or other proportional hazards models, or odds ratios from logistic regression. The required data are typically available in countries that conduct censuses and demographic household surveys like MICS or DHS.
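A minimal sketch of how such proportional factors might be applied, assuming a proportional hazards specification: an individual's hazard is the baseline hazard of the reference group multiplied by the relative risks of that person's characteristics. All baseline hazards, categories, and relative risks below are hypothetical. The conversion to an annual probability, 1 - exp(-h), assumes the hazard is constant within the year. Odds ratios from a logistic regression would instead multiply the odds p / (1 - p), not the probability or the hazard.

```python
import math

# Hypothetical baseline annual fertility hazards by age group for the
# reference category (here: women with primary education and one child).
baseline_hazard = {"20-24": 0.15, "25-29": 0.18, "30-34": 0.12}

# Hypothetical relative risks, e.g., as estimated by a proportional hazards
# (Cox) model; reference categories have a relative risk of 1.
rr_education = {"primary": 1.0, "secondary": 0.8, "tertiary": 0.6}
rr_parity = {0: 1.3, 1: 1.0, 2: 0.5}


def birth_hazard(age_group: str, education: str, parity: int) -> float:
    """Proportional hazards: baseline hazard times the product of relative risks."""
    return baseline_hazard[age_group] * rr_education[education] * rr_parity[parity]


def annual_probability(hazard: float) -> float:
    """Probability of at least one event within one year under a constant hazard."""
    return 1.0 - math.exp(-hazard)


h = birth_hazard("25-29", "tertiary", 0)
print(round(h, 4), round(annual_probability(h), 4))
```

Since the baseline hazards can be taken from the same age-specific rates a cohort-component model uses, the relative risks add detail without changing the data requirements of the demographic core.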

Including such determinants adds complexity, but it comes with major advantages:

  • Parameters can be specific to population groups
  • The use of available information is maximized
  • Micro-data allow all kinds of disaggregation, including for small populations, e.g., by ethnic affiliation
  • Variance can be assessed, as the model has a random component
  • The impact of determinants can be simulated by running different scenarios
  • Additional modules (e.g., for specific diseases) can be plugged into the model

A leading application example is Canada's Demosim model, which projects Canadian society, including variables like visible minority, Aboriginal identity, education, labor force participation, and a fine-grained geography (Caron-Malenfant & Coulombe 2015). Such projections can use knowledge of the behavioral differences typically found between ethnic groups and separate behavioral changes from composition effects. The added detail of such projections can provide valuable input for planning purposes. For example, geographical detail allows schools and health institutions to be planned at the regional level, and specific population groups, such as Aboriginal peoples, can be projected separately. Demosim was also used to assess the impact of potential educational improvements on the future labor force participation of the Aboriginal population (Spielauer 2014). This application provides a simple but powerful example of using micro-simulation for what-if analysis: it assesses the impact of policy-induced changes in one behavior (educational choices) on the outcome of another (labor force participation), the resulting changes in the size and educational composition of the Aboriginal labor force, and the timeline of these changes. Models inspired by Demosim are currently being developed for a series of developed countries (Marois et al. 2017). For European countries, the use of micro-simulation for standardized, European Union-wide models was also explored and demonstrated in the MicMac project (NIDI 2009).

Micro-simulation’s ability to produce detailed population projections is also expected to be of high relevance for applications in developing countries, as they typically experience fast, intertwined demographic and social changes. Demographic behaviors and events (e.g., fertility, child mortality) are often closely linked to policy interventions, e.g., access to higher education or the provision of health care, especially in the developing world.

Although they are the backbone of most dynamic micro-simulation models, demographic modules typically are only one component of micro-simulation models. Models can be very specialized (e.g., modeling specific health trajectories or a specific population group) or, following a modular approach, grow into “multi-purpose” models representing society in many aspects and thus providing a tool for policy-relevant analysis and projections in a wide range of domains.