Skip to content

nextail/nx-dbt

Repository files navigation

Dagster Nextail Template CircleCI

dagster-in-nextail

In order to make it easier to start the development in Dagster this template is provided.

This template will give us:

  • A local development environment based on docker and docker-compose
  • CI/CD integration with Dagster Cloud
  • Examples of simple and test pipelines

Contents

Name Description
.circleci/ CircleCI workflows
.github/ issue and pull request templates
.platform/ Helm chart configuration for the project's Service Account and Secret Provider Class
.vscode/ Visual Studio Code custom configurations for debugging
dagster_template/ python package of your repository (it will change its name once you launch the update
docker/ definition of containers on which we will develop
images/ images for the documentation
scripts/ Utils for makefile
tests/ python tests for your package
Makefile Automating software building procedure
pyproject.toml This file contains requirements, which are used by pdm to build the package
pdm.lock Dependencies and sub-dependencies resolved properly by pdm from pyproject.toml
README.md A description and guide for this code repository

Makefile

Action Description
help show this help
dev-deps test if the dependencies needed to run this Makefile are installed
update This script updates all references to dagster-template
test pytest
create-env create .env file
start-dev start the docker environment in the background
shell start the docker environment in the background with shell
start-dev-nocache start the docker environment in the background without cache on build
stop-dev stop the docker environment
dev-clean clean all the project containers
dev-clean-full clean all the project containers and their data
clean remove all build, test, coverage and Python artifacts
clean-packages remove build packages
clean-pyc remove Python pyc files
clean-test remove test and coverage artifacts
pdm-lock rebuild the pdm lock file
lint-check test linter without making changes

0. Template

Create the project

⚠️ Important: User that creates the Github repository should be the same one that will later configure the project in CircleCI

To start, create a new repository and reference this template to make a copy: Repo-for-template

Then run:

make update

This script replaces all references to dagster-template (in folder names and file text content) by the name of your project.

Typically it performs this operation in:

  • the platform chart folder and chart name
  • the project's main module name (and textual references to it, typically in imports)

Once the update is complete, commit and push your changes to main.

Set up Dockerhub repository for your service

File a Request in Platform Squad Slack Channel #squad-platform asking for a new dockerhub repository to host your Docker images. Remember to include the name of the DockerHub repository you want. This is a manual process, since appropriate permissions must be set for developers and bots.

Set up your project in CI

Configure the project in CircleCI. Pipelines are configured in the default folder .circleci:

More Info

1. Requirements

2. Scenarios

  1. Development
  2. Cloud

2.1. Development

Include:

  • Postgres 11
  • Dagster Daemon
  • Dagster WebServer

Dagster Daemon and Dagster Webserver have your package folder as a docker volume.

The dagster_template.dagster module contains the code for your Dagster Definitions, the object that contains all the definitions defined within a code location. Definitions include assets, jobs, resources, schedules, and sensors.

2.1.1 Start environment

To start the development environment:

make start-dev
make-start-dev.mov

Navigate to http://127.0.0.0:3000 in your web browser. Go to the Launchpad tab. In this tab you will be able to edit the job configuration. On the bottom right you will find the "Launch Run" execution button.

launch-job.mov

2.1.2 Shell

If you want to start a shell with pdm installed, ready to interact with the source projects:

make shell

where workdir is: /opt/dagster-template with folders:

  • dagster: project
  • scripts: utils
make-shell.mov

To download and mount project dependencies within __pypackages__, run

pdm install --dev

Then you can run your tests with pytest:

pytest
test-from-shell.mov

2.1.3 Test

To run the full test suite non-interactively:

make test
make-test.mp4

As you create Dagster ops and graphs, add tests in tests/dagster to check that your code behaves as desired and does not break over time.

For hints on how to write tests for ops and graphs, See the documentation tutorial of Testing in Dagster

2.1.4 Pre-commit

Continuous Integration will run code formatting checks like black, ruff, isort and more using pre-commit hooks Any warnings from these checks will cause the Continuous Integration to fail; therefore, it is helpful to run the check yourself before submitting code. This can be done by using pre-commit in our docker shell (make shell) or using our docker lint-check (make lint-check).

You can install locally pre-commit:

pip install pre-commit

and then running:

pre-commit install

Now all of the styling checks will be run each time you commit changes without your needing to run each one manually. In addition, using pre-commit will also allow you to more easily remain up-to-date with our code checks as they change.

Note that if needed, you can skip these checks with git commit --no-verify.

If you don’t want to use pre-commit as part of your workflow, you can still use it to run its checks with:

pre-commit run --files <files you have modified>

without needing to have done pre-commit install beforehand.

2.1.5 Debug with vscode

One of the great things in Visual Studio Code is debugging support. Set breakpoints, step-in, inspect variables and more. The template is prepared to use this utility. Within the run and debug menu you can select webserver:localhost or daemon:localhost to start your debug.

debug-with-vscode

2.1.6 Access to Amazon S3 from local env

  • If you haven't set your AWS Profile yet, please follow this guide to configure it for developerProd

  • Secondly, when you configure your access in the job to s3 remember to select as profile: developerProd

profile_name: developerProd
region_name: eu-west-1

2.1.7 Deploying Branch Deployments

The branch_deployments will be executed by default from the branches that start with deploy/ and meet the following situation:

  • Pull Request OPEN: will create a branch_deployment with the name of the branch
  • Pull Request CLOSE or MERGED: will mark the branch_deployment as closed and Dagster Cloud will remove it after a certain amount of time.

The triggers that generate the Pull Request OPEN, Pull Request CLOSE or Pull Request MERGED events are generated from Github Actions and sent to CircleCI where the pipelines for the creation of the branch deployment and the deployment of the code location are executed.

⚠️ The ENVVAR "DAGSTER_CLOUD_DEPLOYMENT_NAME" will be the deployment ID. It's recommended to use the ENVVAR "NX_ENVIRONMENT" which will be sandbox.

Recommended reading: Branch Deployments in Dagster Cloud

2.1.8 Deveoping with DevContainer or Github Codespace

There is the possibility of developing the project from a DevContainer or GitHub Codespace.

You can read more about how to configure and launch these projects in the operations-workspace repository documentation

2.2. Cloud

We have two operating environments: sandbox and production.

CI/CD Integration with CircleCI Orb.

2.2.0 Requirements

  • Request Docker image on Docker Hub with the same name as your repository to #squad-platform
  • Configure the project in CircleCI. Pipelines are configured in the default folder .circleci

2.2.1 Environments

  • Nextail Cloud: Link
    • Workspaces:

2.2.2 Orb

The Circleci workflow lets you automatically update Dagster Cloud code locations when pipeline code is updated. The workflows builds a Docker image, pushes it to a Docker Hub repository, and uses the Dagster Cloud CLI to tell your agent to add the built image to your workspace.

More info

2.2.3 Default AWS permissions for Sandbox

To help kickstart development, a new data pipeline project can run its jobs using a default application role with common permissions. The default application role is materialized through two artifacts

  • an IAM role {{env}}-dagster holding permissions to interact with AWS services
  • a k8s service account user-cloud-dagster-cloud-agent which references the IAM role

Through these two artifacts, the default application role brings the following permissions to data pipeline jobs

  • Amazon S3 (granted through the IAM role)

    The IAM role has read/write access to the following buckets and paths

    • evo pipelines

      • SANDBOX

        s3_bucket: nextail-dev-evo
        s3_prefix: env-sandbox/{{tenant}}/dagster/{{your_path}}
      • PRODUCTION

        s3_bucket: nextail-{{tenant}}-evo
        s3_prefix: dagster/{{your_path}}
    • non-evo pipelines

      • SANDBOX

        s3_bucket: nextail-dev
        s3_prefix: env-sandbox/{{tenant}}/dagster/{{your_path}}
      • PRODUCTION

        s3_bucket: nextail-{{tenant}}
        s3_prefix: dagster/{{your_path}}
  • Amazon Secrets Manager (granted through the IAM role)

    Read access to Secrets containing the following tags

    • "scope-dagster": "true"
    • "environment": "${environment}" (where environment could be sandbox or production)
  • K8s (granted through the service account)

    Orchestrate jobs in the Kubernetes cluster where the dagster job is running.

Once initial development has been kickstarted, the pipeline project should move on to configuring specific permissions before being promoted to production, as explained in the next section.

2.2.4 Specific AWS permissions for Production

To decouple functional pipelines from the underlying platform's runtime, the functional pipelines must be granted specific permissions. To grant these permissions, you will need to have the following artifacts

  • an IAM role, which you can create following this platform guide. By default, the role will only grant your application permission to interact with AWS Secrets Manager. If the application needs additional permissions to access other AWS services like S3, follow this platform guide to grant them
  • a service account referencing the IAM role, deployed as a platform resource of the data pipeline repository via helm, as provided in this template repository

For general guidance about how to work with secrets in your pipeline project, check this platform guide

In addition, we added a property to the service account policies that allows you to have the same permissions that are given by default in Dagster. In this way you can migrate to your custom service account without losing functionality. The property is enable_dagster. You can set this property in the repository nextail/aws-infrastructure.

An example:

name                          = "dagster-poc"
enable_secrets_manager_access = true
enable_dagster                = true
custom_policies               = [ ]
Step 1: Configure the creation of Service Account and Service Provider

Configure the file .platform/charts/{{your_repository}}/values.yaml:

  1. Change serviceAccount.create to true
  2. Set your envVars and AWS keys to map your secrets into app environment ()

It is mandatory to have at least one secret mapped so that the deployment does not fail!!!

# Kubernetes Service Account
# By default it will be false and a default dagster service account will be used.
# In case of create: true, the circleci pipeline must also be configured setting the property custom_service_account: true.
serviceAccount:
  create: true

# envVar: is the name of the environment variable that will be exposed in your application container.
# key: refers to the key inside your AWS Secrets Manager secret.
envFromSecretsManager:
  - envVar: ENV_NAME
    key: secret_key
Step 2: Set our Custom Service Account into our Dagster Project

The last step will be to modify in the file .circleci/config.yml the parameter custom_service_account in the line 9 default: false to default: true.

version: 2.1

orbs:
  dagster-pipelines-orb: nextail/[email protected]

parameters:
  custom_service_account:
    type: boolean
    default: true
    description: "We use this parameter to define if our project uses its own service account (true) or by default (false)."