1.11. Implementation

Dynamic micro-simulation models can be implemented in three ways: from scratch, using general-purpose programming languages like C++; by using (and extending) statistical packages (e.g., R), alone or in combination with other tools; or by using specialized micro-simulation packages. In this report, we advocate and use the last approach, specifically Modgen, a generic micro-simulation technology and language developed at Statistics Canada. Compared with the other approaches, Modgen offers considerable efficiency gains: it avoids re-inventing the wheel (still a common practice in micro-simulation), enables communication within a large and growing global user group, and, as an industrial-strength product, avoids many programming problems and bugs because many components are ready-made and well tested. Because models are compiled, Modgen is also very fast, so large-scale models with millions of actors can be run on standard PCs or computer networks.

Modgen (Model generator) is a generic micro-simulation programming language that supports the creation, maintenance, and documentation of dynamic micro-simulation models. Virtually all types of dynamic socioeconomic and sociodemographic micro-simulation models can be accommodated, from small and specialized to large and multi-purpose, in continuous or discrete time, with interacting or non-interacting populations. Furthermore, a model developer does not need advanced programming skills to use Modgen, because Modgen hides underlying mechanisms, e.g., event queuing, and creates a stand-alone model-executable program with a complete visual interface and detailed model documentation. Model developers can therefore concentrate on model-specific code: the declaration of parameters, simulated actors, and events. Output tables are equally compact to program, typically requiring only a few lines of code. Tabulation is done in continuous time and includes a mechanism for estimating the Monte Carlo variation for any cell of any table.

Modgen was designed to remove obstacles to micro-simulation model creation, including interface programming, simulation engine programming, documentation, and multi-language support. It provides the interface and the simulation engine ready-made, and it facilitates documentation by automatically generating hypertext model documentation from the programming code and from the labels and comments that document the code. Recent developments include the ability to publish and run models on the web.

Modgen was developed by micro-simulation modelers and grew by accommodating concrete modeler demands in various application fields covering population projection, health models, and large-scale socio-economic models, e.g., pension analysis. Modgen is available for free and is used both internally at Statistics Canada and by modelers around the globe, including governments, academia, individual researchers, and international organizations like WHO and OECD (for a list of models using Modgen, consult www.statcan.gc.ca/eng/microsimulation). Because of this wide range of users, Modgen is tested extensively in a variety of application fields, which contributes to its reliability and stability. In this section, we introduce the Modgen technology from the developers’ perspective.

1.11.1. Key Modgen Programming Concepts

This chapter gives a flavor of Modgen programming; more detail is given in the step-by-step implementation guide. A detailed technical developer’s guide is also available for the Modgen language; it is integrated into Modgen’s help option and available as a PDF at http://www.statcan.gc.ca/microsimulation.

Modgen requires and integrates into the Microsoft Visual Studio C++ package, thus making use of one of the most popular programming interfaces and its powerful debugging tools. Modgen translates Modgen code into C++ code, which is then compiled into a C++ application, combining the strengths of the generic C++ language with specialized micro-simulation language concepts and functions. Because Modgen hides underlying mechanisms like event queuing and automatically creates a stand-alone model-executable program with a complete visual interface and detailed documentation, developers can concentrate on model-specific code: the declaration of parameters, simulated actors, states, events, and table output.

1.11.1.1. Parameter Dimensions and Tables

All parameters in Modgen are organized in tables, which can range from a single on/off checkbox to a multidimensional table. Data types can be logical, integer, or real numbers. To define parameter dimensions, Modgen has three key concepts: classifications (for categorical variables), partitions (to split a continuous variable or time into pieces), and ranges (a set of consecutive integer values). Parameters are declared in a parameters {}; block.

classification  SEX                                         //EN Sex
{
    FEMALE,                                                 //EN Female
    MALE                                                    //EN Male
};
partition       AGE15       { 15, 30, 45, 60, 75 };         //EN 15 year age groups
range           PROJ_YEARS  { 2015, 2060 };                 //EN Projected years range

parameters
{
    double      EmigrationRate[SEX][AGE15][PROJ_YEARS];    //EN Emigration Rate
};

Note that the code comments introduced with //EN (where “EN” stands for English; Modgen supports multilingual models) are used to automatically label the resulting tables in the user interface. The code above creates a parameter table that is displayed in the user interface and is accessible in the program.

../_images/ImpPara.jpg

Figure: The Parameter Coded Above

The parameter values are stored in .dat text files; parameters can be grouped into several files or kept together in a single file. The syntax is identical to that of the parameter declaration, except that values are provided:

double      EmigrationRate[SEX][AGE15][PROJ_YEARS] = { value1, value2,...};
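To make the dimension concepts concrete, the following C++ sketch mimics how a value might be mapped to a partition interval and to a range position, and how many values the EmigrationRate initializer then needs. The helper names partitionIndex, rangePosition, and cellCount are hypothetical and are not part of Modgen. The sketch assumes that a partition with n break points defines n + 1 intervals, each closed on the left, and that a range includes both of its end points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a partition lookup: returns the zero-based index
// of the interval containing `value`, given ascending break points.
// With breaks {15, 30, 45, 60, 75} the assumed intervals are
// (-inf,15), [15,30), [30,45), [45,60), [60,75), [75,+inf).
std::size_t partitionIndex(double value, const std::vector<double>& breaks)
{
    std::size_t index = 0;
    while (index < breaks.size() && value >= breaks[index]) {
        ++index;
    }
    return index;
}

// Hypothetical stand-in for a range lookup: zero-based position of `value`
// within the inclusive integer range [rangeMin, rangeMax].
int rangePosition(int value, int rangeMin, int rangeMax)
{
    assert(value >= rangeMin && value <= rangeMax);
    return value - rangeMin;
}

// Number of cells in a parameter with dimensions
// [classification][partition][range], i.e. the number of values expected.
std::size_t cellCount(std::size_t categories, const std::vector<double>& breaks,
                      int rangeMin, int rangeMax)
{
    return categories * (breaks.size() + 1)
           * static_cast<std::size_t>(rangeMax - rangeMin + 1);
}
```

Under these assumptions, the parameter declared above has 2 x 6 x 46 = 552 cells (two sexes, six age groups, and the 46 years from 2015 to 2060), so the .dat file would have to supply 552 values.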

1.11.1.2. Actors, States, Functions, and Event Declarations

An actor is an entity whose life is simulated in a Modgen model. There can be different types of actors in a model, and Person is typically the most important. Actors are characterized by state variables as well as by the functions and events that change those states. Some states are numerical (e.g., age, income) and others are categorical (e.g., gender, marital status). In Modgen, simulation takes place through the execution of events. Each event consists of two functions: a time function that determines the time of the next occurrence of the event, and an implementation function that determines the consequences when the event happens.

Modgen supports three types of state variables:

  • “Simple” states, which are updated in events
  • “Derived” states, which depend on other states and update themselves automatically, like a formula in a spreadsheet cell
  • “Self-scheduling” states, which update themselves following a pre-specified time schedule

Two state variables, time and age, are supplied and maintained automatically. States and events belong to a specific actor and are declared in an actor block:

actor Person
{
    SEX         sex = { FEMALE };                           //EN Sex
    logical     resident = { TRUE };                        //EN Resident

    int         age_years = self_scheduling_int(age);       //EN Age in full years
    int         age_index = split(age_years, AGE15);        //EN Age group index
    int         calendar_year = self_scheduling_int(time);  //EN Calendar year


    event       timeEmigrationEvent, EmigrationEvent;       //EN Emigration Event
};

In the previous example, we define:

  • sex and resident as simple states, each initialized with a value in {} brackets
  • age_years as a self-scheduling state, which creates its own schedule so that it is updated at exactly the right moment, independent of other events in the model
  • age_index as a derived state, which is updated automatically whenever age_years crosses an age group limit
  • EmigrationEvent as an event, together with its time function timeEmigrationEvent
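The division of each event into a time function and an implementation function can be illustrated with a deliberately simplified C++ sketch. It is not Modgen's actual engine, which is hidden from the developer; the types Person and Event, the function runActor, and the fixed event times in demoEvents are all hypothetical. The loop simply re-evaluates every time function after each event, executes the earliest scheduled event, and stops when no event remains.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Stand-in for "this event is not scheduled".
const double kTimeInfinite = std::numeric_limits<double>::infinity();

// Hypothetical actor with one automatically advanced state (time)
// and two simple states.
struct Person {
    double time = 0.0;
    bool resident = true;
    bool alive = true;
};

// An event pairs a time function with an implementation function.
struct Event {
    std::function<double(const Person&)> timeFunction;  // absolute time of next occurrence
    std::function<void(Person&)> implementFunction;     // state changes when it occurs
};

// Executes events in time order until none is scheduled before endTime.
// Returns the number of events executed.
int runActor(Person& person, const std::vector<Event>& events, double endTime)
{
    assert(endTime >= person.time);
    int executed = 0;
    while (true) {
        const Event* next = nullptr;
        double nextTime = kTimeInfinite;
        for (const Event& e : events) {
            // Re-evaluated after every event, so state changes are reflected.
            double t = e.timeFunction(person);
            if (t < nextTime) {
                nextTime = t;
                next = &e;
            }
        }
        if (next == nullptr || nextTime > endTime) break;
        person.time = nextTime;           // advance the clock to the event
        next->implementFunction(person);  // apply the consequences
        ++executed;
    }
    return executed;
}

// Illustrative events with fixed (made-up) times: emigration at 3.5, death at 80.
std::vector<Event> demoEvents()
{
    Event emigration{
        [](const Person& p) { return p.resident ? 3.5 : kTimeInfinite; },
        [](Person& p) { p.resident = false; }};
    Event death{
        [](const Person& p) { return p.alive ? 80.0 : kTimeInfinite; },
        [](Person& p) { p.alive = false; }};
    return {emigration, death};
}
```

Calling runActor on a default Person with demoEvents() executes the emigration event first and the death event second. A production engine would keep a sorted queue and recompute only the event times affected by a state change, rather than scanning all events on every step.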

1.11.1.3. Event Implementation

Each event consists of two functions: the first returns the time of the event, and the second is called when the event happens. Typically, event times are based on piece-wise constant hazard models: for a given hazard rate, the waiting time is assumed to be exponentially distributed, and the rate is assumed to be constant over specific pieces or episodes of time. For example, the emigration hazard is assumed to stay constant within a given age group and calendar year. When the age group or calendar year changes, Modgen automatically generates a new waiting time based on the newly applicable rate. For a given hazard rate, a random waiting time can be calculated as -ln(RandUniform)/hazard, where RandUniform is a random number uniformly distributed between 0 and 1.

TIME Person::timeEmigrationEvent()
{
    double dHazard = EmigrationRate[sex][age_index][RANGE_POS(PROJ_YEARS, calendar_year)];

    if ( resident && WITHIN(PROJ_YEARS, calendar_year) && dHazard > 0.0)
    {
        return WAIT(-log(RandUniform(29)) / dHazard );
    }
    else return TIME_INFINITE;
}

void Person::EmigrationEvent()
{
    resident = FALSE;
}

For persons at risk of emigration (i.e., residents within the simulated timeframe who face a positive hazard rate), a random waiting time to emigration is calculated based on the given rates. Note that Modgen automatically maintains the event queue and updates waiting times whenever a state affecting the waiting time changes. The second function, EmigrationEvent(), implements emigration itself: the event sets the simple state “resident” to FALSE.
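The waiting-time logic can be reproduced outside Modgen in a short C++ sketch. If U is uniform on (0, 1], then -ln(U)/hazard is exponentially distributed with rate hazard, and because the exponential distribution is memoryless, a fresh draw can be made whenever a new piece with a new rate begins. The function names waitingTime and eventTimePiecewise and the use of std::mt19937 are assumptions made for illustration; Modgen supplies its own random number streams and performs the redraw automatically.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Inverse-transform draw of an exponential waiting time.
// A non-positive hazard means the event never happens.
double waitingTime(double uniformDraw, double hazard)
{
    assert(uniformDraw > 0.0 && uniformDraw <= 1.0);
    if (hazard <= 0.0) return std::numeric_limits<double>::infinity();
    return -std::log(uniformDraw) / hazard;
}

// Event time under a piece-wise constant hazard.
// hazards[i] applies on [pieceStarts[i], pieceStarts[i + 1]); the last hazard
// applies from pieceStarts.back() onward. The actor enters at pieceStarts[0].
double eventTimePiecewise(const std::vector<double>& pieceStarts,
                          const std::vector<double>& hazards,
                          std::mt19937& rng)
{
    assert(!pieceStarts.empty() && pieceStarts.size() == hazards.size());
    const double inf = std::numeric_limits<double>::infinity();
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // draws in [0, 1)
    for (std::size_t i = 0; i < hazards.size(); ++i) {
        double pieceEnd = (i + 1 < pieceStarts.size()) ? pieceStarts[i + 1] : inf;
        double u = 1.0 - uniform(rng);  // shift to (0, 1] so log(u) is finite
        if (u <= 0.0) u = std::numeric_limits<double>::min();  // defensive guard
        double candidate = pieceStarts[i] + waitingTime(u, hazards[i]);
        // The event happens in this piece only if it falls before the piece ends;
        // otherwise a new waiting time is drawn with the next piece's hazard.
        if (candidate < pieceEnd) return candidate;
    }
    return inf;
}
```

For a single piece with hazard 0.1, the average of many simulated event times should be close to 1/0.1 = 10, which offers a simple plausibility check of the formula.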

1.11.1.4. Table Output

Modgen contains a powerful tabling language, and even complex tables usually require only a few lines of code. Tables belong to an actor and consist of a name, a label, and one or more table expressions, i.e., formulas for the calculation of output values. Tables can be divided by an unlimited number of dimensions (e.g., output by age and sex) and can contain a filter which selects who should be included in the table and/or when the table should be created.

For example, the following table calculates the simulated emigration hazards by age group, sex, and calendar year. For sufficiently large simulated populations, the simulated rates should come close to the parameter.

The table filter (second line, within [] brackets) selects the timespan for which the table is produced. The table dimensions are projected calendar years, age group, and sex, as in the parameter table that drives the model. Age groups can be built automatically in the table for a given partition using split(). The + sign after a table dimension indicates that the table should also calculate totals over all categories of a variable.

table Person SimulatedEmigrationRates                               //EN Simulated Emigration Rates
[ WITHIN(PROJ_YEARS, calendar_year)]
{
    sex+ *
    {
        transitions(resident, TRUE,FALSE) / duration(resident,TRUE) //EN Emigration Rate decimals=4
    }
    * split(age_years, AGE15)                                       //EN Age group
    * proj_years
};

Modgen has a long list of functions that can be used in tables. The functions used here are transitions(), which counts the number of transitions of the specified type, and duration(), which calculates the time spent in a certain state. With these functions, rates can easily be calculated as the number of events divided by the time at risk.

../_images/ImpTab.jpg

Figure: An Output Table Displaying Simulated Rates
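The rate calculation behind such a table, events divided by time at risk, can be sketched in plain C++ for a single table cell. The types ResidenceSpell and RateCell and the function tabulate are hypothetical illustrations and do not reflect how Modgen accumulates tables internally. The observation window plays the role of the table filter.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One continuous period of residence of a simulated person.
struct ResidenceSpell {
    double start;             // time at which residence begins
    double end;               // time at which residence ends
    bool endsWithEmigration;  // true if the spell ends with a TRUE -> FALSE transition
};

// Accumulators for one table cell.
struct RateCell {
    int transitions = 0;    // number of emigration events observed
    double duration = 0.0;  // total time spent as resident (time at risk)
    double rate() const { return duration > 0.0 ? transitions / duration : 0.0; }
};

// Counts events and time at risk inside the window [windowStart, windowEnd).
RateCell tabulate(const std::vector<ResidenceSpell>& spells,
                  double windowStart, double windowEnd)
{
    assert(windowEnd > windowStart);
    RateCell cell;
    for (const ResidenceSpell& s : spells) {
        // Only the part of the spell inside the window contributes exposure.
        double from = std::max(s.start, windowStart);
        double to = std::min(s.end, windowEnd);
        if (to > from) cell.duration += to - from;
        // The transition counts only if it happens inside the window.
        if (s.endsWithEmigration && s.end >= windowStart && s.end < windowEnd) {
            ++cell.transitions;
        }
    }
    return cell;
}
```

For example, three spells (0 to 10 ending in emigration, 0 to 30 without emigration, and 5 to 15 ending in emigration) observed over the window from 0 to 20 contribute two transitions and 40 units of time at risk, for a rate of 0.05.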

1.11.1.5. Model Documentation

Modgen automatically produces a hyperlinked help file containing model documentation based on the model code as well as labels and notes within the code. Variable labels and other information within the code are also used for producing a fully labeled user interface, including labeled output tables (e.g., table name, variable names, and description of variable dimensions).

1.11.1.6. Programming Wizards and Model Templates

In addition to the ample functionality provided automatically, with the underlying code hidden from the developer, Modgen also provides code wizards to generate the essential code of new models, add micro-data input modules, and create new “empty” modules. Code produced by the wizards is visible, but usually does not require any changes by developers, or just minor adaptations, like the specification of variable names in the case of micro-data input files.

Alternatively, developers can start from existing models. The step-by-step creation of the population projection model supports this approach: the first steps are very generic, so modelers can take one of these early steps as the starting point for their own models.