This project is a learning-by-doing data-model build with dbt-core for an imaginary company selling postcards. The company sells both directly and through resellers in the majority of European countries.
This model is used by my other projects.

The data is organised in three layers:

| Layer | Description |
| --- | --- |
| raw | unrefined input data |
| staging | staging area |
| core | curated data |

The core layer contains the following models:
- `dim_channel`
- `dim_customer`
- `dim_date`
- `dim_geography`
- `dim_sales_agent`
- `fact_sales`
The input data is generated as parquet files by a Python script, `generator/generate.py`, using user-defined assets in `assets.py`. Both may be adjusted as needed.
- Rename `.env.example` to `.env`. This file contains the relative paths for the database file (`datamart.duckdb`) and the parquet input files.
- Rename `shared\db\datamart.duckdb.example` to `shared\db\datamart.duckdb`, or initialise an empty database there with the same name.
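As a sketch, these two setup steps could look like the following, run from the repository root. The snippet works in a scratch directory and creates stand-in example files so it is self-contained, and the `.env` variable names shown are hypothetical (check `.env.example` for the real ones):

```shell
# Scratch directory with stand-in example files; in the real repository
# these files already exist at the same relative paths.
mkdir -p scratch-repo/shared/db && cd scratch-repo
printf 'DUCKDB_PATH=shared/db/datamart.duckdb\nPARQUET_DIR=shared/parquet\n' > .env.example  # hypothetical variable names
touch shared/db/datamart.duckdb.example

# The two setup steps: put the example files into place
# (cp keeps the examples around; use mv for a true rename).
cp .env.example .env
cp shared/db/datamart.duckdb.example shared/db/datamart.duckdb
```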
- Create a Python virtual environment: `python3 -m venv .venv`
- Add the environment variables to the virtual environment: `cat .env >> .venv/bin/activate`
- Activate the venv: `source .venv/bin/activate`
- Change the working directory to `generator`: `cd generator`
- Install the required packages: `pip install -r requirements.txt`
- Generate the data: `python3 generate.py`

The generated data will be written under `shared/parquet`.
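The virtual-environment steps above can be sketched as a self-contained snippet. It runs in a scratch directory with a hypothetical one-line `.env` (the real one comes from renaming `.env.example`) and shows why appending `.env` to the activation script is useful: the project variables are exported on every activation:

```shell
# Scratch directory with a stand-in .env; the variable name is hypothetical.
mkdir -p scratch-venv && cd scratch-venv
printf 'export PARQUET_DIR=shared/parquet\n' > .env

python3 -m venv .venv             # create the virtual environment
cat .env >> .venv/bin/activate    # append the project variables to the activation script
source .venv/bin/activate         # activating now also exports PARQUET_DIR

echo "$PARQUET_DIR"               # → shared/parquet
```

Note that with this scheme the `.env` lines must use `export` for the variables to be visible to child processes such as the generator script.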
- Ensure the virtual environment is activated: `source .venv/bin/activate`
- Change directory to `postcard_company`: `cd postcard_company`
- Run `dbt deps` to install the dependencies
- Run `dbt seed` to import the seed (static) data
- Run `dbt compile` to compile the project
- Run `dbt run` to run the models
- Run `dbt test` to run the tests