Synthetic Data

Combining information from several data sources enhances the policy relevance of existing data. Generating synthetic datasets can make data accessible while avoiding confidentiality issues.

Producing a synthetic (virtual) population dataset may be the first step in using DYNAMIS-POP. When the data required to create the starting population are not available from a single source, or cannot be shared or used for that purpose because of confidentiality issues, one option is to generate a synthetic dataset. Synthetic population datasets are created from one or more data sources. They provide a close representation of the actual population and are anonymized by design. They can integrate variables from multiple sources, and they can also fix issues in the source data.

The creation of synthetic data, along with the steps for cleaning and fixing data issues, can be included in the highly automated data-preparation workflow. For example, datasets can be corrected for age heaping and for the under-reporting of births and young children, both of which are common data issues in developing countries.
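
As an illustration of how age heaping might be detected before any correction is applied, the sketch below computes Whipple's index, a standard demographic diagnostic for the over-reporting of ages ending in 0 or 5. It is not taken from DYNAMIS-POP and does not show how that workflow performs its correction; the function name whipple_index and the sample ages are hypothetical, chosen only for this example.

```python
def whipple_index(ages):
    """Whipple's index over reported ages 23 to 62 inclusive.

    100 means no preference for ages ending in 0 or 5;
    500 means every reported age in the range ends in 0 or 5.
    (Hypothetical helper for illustration, not part of DYNAMIS-POP.)
    """
    in_range = [a for a in ages if 23 <= a <= 62]
    if not in_range:
        raise ValueError("no ages in the 23-62 range")
    # Count reported ages ending in 0 or 5 (multiples of 5).
    heaped = sum(1 for a in in_range if a % 5 == 0)
    # Under uniform reporting, one fifth of ages end in 0 or 5.
    return 100.0 * 5 * heaped / len(in_range)


# Made-up sample: one person at every age 23..62, plus extra reports at 30 and 40.
sample_ages = list(range(23, 63)) + [30] * 5 + [40] * 5
print(round(whipple_index(sample_ages), 1))
```

A value well above 100 would suggest that reported ages cluster on round numbers, in which case the age distribution may need smoothing before it is used to build a starting population.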