A common problem in generating a representative synthetic population is that not all attributes of interest are present in the sample. The purpose is to enrich the synthetic population with additional attributes, after the synthetic population is generated from the original sample.
In many cases, practitioners only have access to aggregated data for important socio-demographic attributes, such as income, education level.
This package treats the problem to enrich an initial synthetic population from an aggregated data provided in the form of a distribution like deciles or quartiles.
flowchart LR
subgraph data_prep [Data preparation]
pop[("Population \n (Eqasim, ..)")] --> read_pop
distribs[("Enrichment source \n (Filosofi, ..)")] --> read_distribs
end
subgraph Assignment
formated_pop(Formated population)
formated_distribs(Formated source)
formated_pop & formated_distribs --> assignment[/Assignment algorithm/]
end
read_pop --> formated_pop
read_distribs --> formated_distribs
assignment --> enriched_pop(Enriched population)
The Bhepop2 package is available on PyPI
pip install bhepop2
See example notebooks here.
The documentation is hosted on ReadtheDocs : https://bhepop2.readthedocs.io/en/latest/
Feedback and contributions to Bhepop2 are very welcome, see CONTRIBUTING.md for more information !
This project is licensed under the CeCILL-B License - see the LICENSE.txt file for details
This package is the product of the joint work of the Gustave Eiffel university and the company Tellae.
It was originally created as an implementation the Bhepop2 (Bayesian Heuristic to Enrich POPulation by EntroPy OPtimization) methodology. See the bhepop2 module doc for more information.