pystatis is a Python wrapper for the GENESIS web service interface (API). It simplifies access to the data of the German Federal Statistical Office (Destatis).
The main features are:
- Simplified access to the API. No more need to write cumbersome API calls.
- Credential management removes the need to manually add credentials.
- Integrated workflow enables an end-to-end process from finding the relevant data to downloading it.
- Pandas support instead of manually parsing results.
- Caching to enable productive work despite strict query limits.
- Starting and handling background jobs for datasets that are too big to be downloaded directly from GENESIS.
To learn more about GENESIS refer to the official documentation here.
You can install the package via
$ pip install pystatis
If everything worked out correctly, you should be able to import pystatis like this:
import pystatis as pystat
print("Version:", pystat.__version__)
To be able to use the web service/API of GENESIS-Online, you have to be a registered user. You can create your user here.
Once you have a registered user, you can use your username and password as credentials for authentication against the GENESIS-Online API.
To avoid entering your credentials each time you use pystatis, store them locally with the init_config() helper function. This function accepts a username and a password argument and stores them in a configuration file named config.ini, located at <user home>/.pystatis/config.ini by default. You can change this location with the optional config_dir argument.
So before you can use pystatis you have to execute the following code once:
from pystatis import init_config
init_config(username="myusername", password="mypassword")
After executing this code you should have a new config.ini file under the <user home>/.pystatis directory.
Each time pystatis communicates with GENESIS-Online via the API, it automatically uses the credentials stored in this config.ini, so you don't have to specify them again. If your credentials change, you can either run init_config() again or update the values directly in the config.ini file.
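To make the idea of a locally stored config.ini more concrete, here is a minimal sketch using only Python's standard library. It is not the implementation used by pystatis: the helper names write_credentials and read_credentials, the section name "GENESIS API", and the key names are assumptions chosen for illustration, and the file is written to a temporary directory instead of <user home>/.pystatis.

```python
import configparser
import tempfile
from pathlib import Path

SECTION = "GENESIS API"  # assumed section name, not taken from pystatis


def write_credentials(config_dir: Path, username: str, password: str) -> Path:
    """Store the credentials in <config_dir>/config.ini and return the file path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    # interpolation=None keeps characters like '%' in passwords intact
    config = configparser.ConfigParser(interpolation=None)
    config[SECTION] = {"username": username, "password": password}
    config_file = config_dir / "config.ini"
    with config_file.open("w", encoding="utf-8") as handle:
        config.write(handle)
    return config_file


def read_credentials(config_file: Path) -> tuple[str, str]:
    """Load the stored username and password from the config file."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_file, encoding="utf-8")
    section = config[SECTION]
    return section["username"], section["password"]


# A temporary directory is used so the sketch never touches a real ~/.pystatis folder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_credentials(Path(tmp) / ".pystatis", "myusername", "mypassword")
    stored = read_credentials(path)

print(stored)
```

Because the file is plain text, anyone with read access to it can see the password, so the directory that holds it should be protected accordingly.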
GENESIS-Online provides a helloworld endpoint that can be used to check your credentials:
from pystatis import logincheck
logincheck()
>>> '{"Status":"Sie wurden erfolgreich an- und abgemeldet!","Username":"ASFJ582LJ"}'
If you can see a response like this, your setup is complete and you can start downloading data.
For more details, please study the provided sample notebook for cache.
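The login check response shown above is a JSON string, so it can also be inspected programmatically. The following sketch assumes that the response always has this shape and that a successful check contains the word "erfolgreich" in its status text; both are assumptions based only on the single example above, and login_succeeded is a hypothetical helper, not part of pystatis.

```python
import json

# Response string copied from the login check example.
raw_response = '{"Status":"Sie wurden erfolgreich an- und abgemeldet!","Username":"ASFJ582LJ"}'


def login_succeeded(raw: str) -> bool:
    """Return True if the status text reports a successful login."""
    payload = json.loads(raw)
    # Assumption: success is signalled by the word "erfolgreich" in the status.
    return "erfolgreich" in payload.get("Status", "")


payload = json.loads(raw_response)
print(payload["Username"])            # ASFJ582LJ
print(login_succeeded(raw_response))  # True
```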
The Genesis data structure consists of multiple elements as summarized in the image below.

This package currently supports retrieving the following data types:
- Cubes: Multi-dimensional data objects
- Tables: Derivatives of cubes that are already packaged into logical units
pystatis offers the Find class to search for any piece of information within GENESIS. Behind the scenes, it uses the find endpoint.
Example:
from pystatis import Find
results = Find("Rohöl") # Initiates object that contains all variables, statistics, tables and cubes
results.run() # Runs the query
results.tables.df # Results for tables
results.tables.get_code([1,2,3]) # Gets the table codes, e.g. for downloading the table
results.tables.get_metadata([1,2]) # Gets the metadata for the table
A complete overview of all use cases is provided in the sample notebook.
Data can be downloaded in two forms: as tables and as cubes. Both interfaces have been aligned to be as close as possible to each other.
Example for downloading a Table:
from pystatis import Table
t = Table(name="21311-0001") # data is not yet downloaded
t.get_data() # Only now is the data fetched from GENESIS or loaded from cache. Data downloaded from GENESIS is also cached, so the next time it is loaded from cache.
t.data # a pandas data frame
Example for downloading a Cube:
from pystatis import Cube
c = Cube(name="22922KJ1141") # data is not yet downloaded
c.get_data() # Only now is the data fetched from GENESIS or loaded from cache. Data downloaded from GENESIS is also cached, so the next time it is loaded from cache.
c.data # a pandas data frame
For more details, please study the provided sample notebook for tables and cubes.
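The "fetch once, then load from cache" behaviour described in the comments above is what keeps repeated work within the query limits. The sketch below illustrates that general pattern with a simple file-based cache. It is not how pystatis implements its cache: get_data, fake_fetch, and the JSON file layout are hypothetical and exist only for this illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_data(name: str, cache_dir: Path, fetch: Callable[[str], dict]) -> dict:
    """Return the data for `name`, calling `fetch` only if it is not cached yet."""
    cache_file = cache_dir / f"{name}.json"
    if cache_file.exists():
        # Cache hit: no request is sent.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = fetch(name)  # cache miss: ask the (hypothetical) remote source
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data


calls = []  # records every simulated API request


def fake_fetch(name: str) -> dict:
    """Stand-in for a real API request; returns placeholder content."""
    calls.append(name)
    return {"name": name, "rows": []}


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    first = get_data("21311-0001", cache_dir, fake_fetch)   # fetched
    second = get_data("21311-0001", cache_dir, fake_fetch)  # served from cache

print(len(calls))  # 1, the second call did not trigger a request
```

Keying the cache by the object name means that a second request for the same table or cube never reaches the API, which is also why cached data is not refreshed automatically.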
When a cube or table is queried, it will be put into cache automatically. The cache can be cleared using the following function:
from pystatis import clear_cache
clear_cache("21311-0001") # only deletes the data for the object with the specified name
clear_cache() # deletes the complete cache
Distributed under the MIT License. See LICENSE.txt for more information.
A few ideas we should implement in the maybe-near future:
- Improve Table parsing. Right now, the parsing is really simple and we should align the cube and table format so that the data frame for tables is more convenient to use.
- Create a source code documentation with Sphinx or similar tools.
- Mechanism to download data that is newer than the cached version. Right now, once data is cached, it is always retrieved from cache, even if there is a newer version online. However, this could be quite challenging because the GENESIS API does not provide a good and consistent field for the last update datetime.
- Improve Table and Cube metadata so the user can look up the variables contained in the dataset and for each variable the values that this variable can have.
- Understand and support time series.
Contributions to this project are highly appreciated! You can either contact the maintainers or directly create a pull request for your proposed changes:
- Fork the Project
- Create your Feature Branch (git checkout -b feature/<descriptive-name>)
- Commit your changes (git commit -m 'Added NewFeature')
- Push to remote (git push origin feature/<descriptive-name>)
- Open a Pull Request to be merged with dev
To contribute to this project, please follow these steps:
- Install poetry. We recommend installing poetry via pipx, which gives you a global poetry command in an isolated virtual environment.
- Clone the repository via git.
- Change into the project root directory.
- Run poetry install to create the virtual environment within poetry's cache folder (run poetry env info to see the details of this new virtual environment). poetry has installed all dependencies for you, as well as the package itself.
- Install pre-commit: poetry run pre-commit install. This will activate the pre-commit hooks that run prior to every commit to ensure code quality.
- Make your changes.
- Run poetry run pytest to see if all existing tests still pass. It is important to use poetry run to call pytest so that poetry uses the created virtual environment and not the system's default Python interpreter. Alternatively, you can run poetry shell to let poetry activate the virtual environment for the current session. Afterwards, you can run pytest as usual without any prefix. You can leave the poetry shell with the exit command.
- Add new tests depending on your changes.
- Run poetry run pytest again to make sure your tests also pass.
- Commit and push your changes.
- Create a PR.
To learn more about poetry, see Dependency Management With Python Poetry by realpython.com.