Add Lakeflow template #2959
Merged
Changes from 4 commits
Commits (13), all by fjakobs:
442f13c Add Lakeflow template
6cb3fc0 Can't use quotes in file names when embedding
b036f53 address PR feedback
2bcd816 always generate jobs
1ee4ede Add acceptance tests
2959c84 update acceptance tests
501e864 format
9ba1f48 PR feedback
9a4595f Merge branch 'main' into lakeflow-template
8cfd360 Fix python acceptance test
0745405 fix formatting
ad7897b replace table_suffix with project_name
6b61fd3 Merge branch 'main' into lakeflow-template
2 changes: 1 addition & 1 deletion
libs/template/templates/experimental-jobs-as-code/template/__preamble.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Lakeflow Pipelines | ||
|
|
||
| Default template for Lakeflow Declarative Pipelines |
57 changes: 57 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/databricks_template_schema.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| { | ||
| "welcome_message": "\nWelcome to the template for Lakeflow Declarative Pipelines!", | ||
| "properties": { | ||
| "project_name": { | ||
| "type": "string", | ||
| "default": "my_project", | ||
| "description": "Please provide the following details to tailor the template to your preferences.\n\nUnique name for this project\nproject_name", | ||
| "order": 1, | ||
| "pattern": "^[a-z0-9_]+$", | ||
| "pattern_match_failure_message": "Name must consist of lower case letters, numbers, and underscores." | ||
| }, | ||
| "default_catalog": { | ||
| "type": "string", | ||
| "default": "{{default_catalog}}", | ||
| "pattern": "^\\w*$", | ||
| "pattern_match_failure_message": "Invalid catalog name.", | ||
| "description": "\nInitial catalog.\ndefault_catalog", | ||
| "order": 3 | ||
| }, | ||
| "personal_schemas": { | ||
| "type": "string", | ||
| "description": "\nUse a personal schema for each user working on this project? (e.g., 'catalog.{{short_name}}')\npersonal_schemas", | ||
| "default": "yes", | ||
| "enum": [ | ||
| "yes", | ||
| "no" | ||
| ], | ||
| "order": 4 | ||
| }, | ||
| "shared_schema": { | ||
| "skip_prompt_if": { | ||
| "properties": { | ||
| "personal_schemas": { | ||
| "const": "yes" | ||
| } | ||
| } | ||
| }, | ||
| "type": "string", | ||
| "default": "default", | ||
| "pattern": "^\\w+$", | ||
| "pattern_match_failure_message": "Invalid schema name.", | ||
| "description": "\nInitial schema during development:\ndefault_schema", | ||
| "order": 5 | ||
| }, | ||
| "language": { | ||
| "type": "string", | ||
| "default": "python", | ||
| "description": "\nLanguage for this project:\nlanguage", | ||
| "enum": [ | ||
| "python", | ||
| "sql" | ||
| ], | ||
| "order": 6 | ||
| } | ||
| }, | ||
| "success_message": "\n\nYour new project has been created in the '{{.project_name}}' directory!\n\nRefer to the README.md file for \"getting started\" instructions!" | ||
| } | ||
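Reviewer note: the prompt rules in this schema can be checked outside the CLI. Below is a minimal Python sketch, not the CLI's own validation code. It copies the three `pattern` values from the schema above and assumes that `skip_prompt_if` means "do not ask, use the default" when `personal_schemas` is "yes". The helper names `is_valid` and `should_prompt_shared_schema` are hypothetical. `re.fullmatch` with `re.ASCII` is used as an approximation of the Go regexp semantics.

```python
import re

# Patterns copied verbatim from databricks_template_schema.json above.
PATTERNS = {
    "project_name": r"^[a-z0-9_]+$",
    "default_catalog": r"^\w*$",
    "shared_schema": r"^\w+$",
}


def is_valid(prop: str, value: str) -> bool:
    """Return True if `value` satisfies the schema pattern for `prop`.

    re.ASCII keeps \\w limited to [A-Za-z0-9_]; fullmatch rejects a
    trailing newline that a bare `$` would otherwise tolerate in Python.
    """
    return re.fullmatch(PATTERNS[prop], value, re.ASCII) is not None


def should_prompt_shared_schema(answers: dict) -> bool:
    """Assumed reading of skip_prompt_if: skip when personal_schemas is "yes"."""
    return answers.get("personal_schemas") != "yes"


if __name__ == "__main__":
    print(is_valid("project_name", "my_project"))
    print(is_valid("project_name", "My-Project"))
    print(should_prompt_shared_schema({"personal_schemas": "yes"}))
```

Note that `default_catalog` accepts an empty string (`\w*`), while `shared_schema` does not (`\w+`).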
37 changes: 37 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/library/variables.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| {{- define `table_suffix` -}} | ||
| {{ (regexp `^_+|_+$`).ReplaceAllString ((regexp `_+`).ReplaceAllString .project_name `_`) `` }} | ||
| {{- end }} | ||
|
|
||
| {{- define `pipeline_name` -}} | ||
| {{template `table_suffix` .}}_pipeline | ||
| {{- end }} | ||
|
|
||
| {{- define `job_name` -}} | ||
| {{template `table_suffix` .}}_job | ||
| {{- end }} | ||
|
|
||
| {{- define `static_dev_schema` -}} | ||
| {{- if (regexp "^yes").MatchString .personal_schemas -}} | ||
| {{ short_name }} | ||
| {{- else -}} | ||
| {{ .shared_schema }} | ||
| {{- end}} | ||
| {{- end }} | ||
|
|
||
|
|
||
| {{- define `dev_schema` -}} | ||
| {{- if (regexp "^yes").MatchString .personal_schemas -}} | ||
| ${workspace.current_user.short_name} | ||
| {{- else -}} | ||
| {{ .shared_schema }} | ||
| {{- end}} | ||
| {{- end }} | ||
|
|
||
|
|
||
| {{- define `prod_schema` -}} | ||
| {{- if (regexp "^yes").MatchString .personal_schemas -}} | ||
| default | ||
| {{- else -}} | ||
| {{ .shared_schema }} | ||
| {{- end}} | ||
| {{- end }} | ||
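Reviewer note: the helper templates above are compact, so here is a rough Python equivalent of what they appear to compute. This is a sketch under assumptions: the function names mirror the `define` names but are otherwise hypothetical, and Python's `re` is assumed to behave like Go's `regexp` for these simple patterns.

```python
import re


def table_suffix(project_name: str) -> str:
    """Collapse runs of underscores, then trim leading and trailing ones."""
    collapsed = re.sub(r"_+", "_", project_name)
    return re.sub(r"^_+|_+$", "", collapsed)


def pipeline_name(project_name: str) -> str:
    return f"{table_suffix(project_name)}_pipeline"


def job_name(project_name: str) -> str:
    return f"{table_suffix(project_name)}_job"


def dev_schema(personal_schemas: str, shared_schema: str) -> str:
    """Personal schemas defer to a bundle variable resolved at deploy time."""
    if personal_schemas.startswith("yes"):  # same test as regexp "^yes"
        return "${workspace.current_user.short_name}"
    return shared_schema


def prod_schema(personal_schemas: str, shared_schema: str) -> str:
    """With personal schemas, production falls back to the schema named 'default'."""
    if personal_schemas.startswith("yes"):
        return "default"
    return shared_schema


if __name__ == "__main__":
    print(pipeline_name("__my__project__"))
    print(dev_schema("no", "analytics"))
```

`static_dev_schema` follows the same branching as `dev_schema`, except that it emits the user's short name at template time instead of the `${workspace.current_user.short_name}` placeholder.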
16 changes: 16 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/template/__preamble.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| # Preamble | ||
|
|
||
| This file only contains template directives; it is skipped for the actual output. | ||
|
|
||
| {{skip "__preamble"}} | ||
|
|
||
| {{$isSQL := eq .language "sql"}} | ||
|
|
||
| {{if $isSQL}} | ||
| {{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/utilities/utils.py"}} | ||
| {{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_zones_{{template `table_suffix` .}}.py"}} | ||
| {{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_trips_{{template `table_suffix` .}}.py"}} | ||
| {{else}} | ||
| {{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_zones_{{template `table_suffix` .}}.sql"}} | ||
| {{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_trips_{{template `table_suffix` .}}.sql"}} | ||
| {{end}} | ||
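Reviewer note: the effect of the preamble is easier to see as a list of skipped paths per language. The Python sketch below reproduces the branching above. It is an illustration only: `skipped_paths` is a hypothetical helper, and `suffix` is assumed to be the already-normalized value that the `table_suffix` template would produce.

```python
from typing import List


def skipped_paths(project_name: str, suffix: str, language: str) -> List[str]:
    """Return the paths the preamble would skip for the chosen language."""
    base = f"{project_name}/resources/{suffix}_pipeline"
    skipped = ["__preamble"]  # the preamble always skips itself
    if language == "sql":
        # SQL projects drop the Python utilities and the Python samples.
        skipped.append(f"{base}/utilities/utils.py")
        ext = "py"
    else:
        # Python projects drop the SQL samples only.
        ext = "sql"
    for name in ("sample_zones", "sample_trips"):
        skipped.append(f"{base}/transformations/{name}_{suffix}.{ext}")
    return skipped


if __name__ == "__main__":
    for path in skipped_paths("my_project", "my_project", "sql"):
        print(path)
```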
8 changes: 8 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/template/{{.project_name}}/.gitignore.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| .databricks/ | ||
| build/ | ||
| dist/ | ||
| __pycache__/ | ||
| *.egg-info | ||
| .venv/ | ||
| **/explorations/** | ||
| !**/explorations/README.md |
3 changes: 3 additions & 0 deletions
...template/templates/lakeflow-pipelines/template/{{.project_name}}/.vscode/__builtins__.pyi
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Typings for Pylance in Visual Studio Code | ||
| # see https://github.com/microsoft/pyright/blob/main/docs/builtins.md | ||
| from databricks.sdk.runtime import * |
7 changes: 7 additions & 0 deletions
.../template/templates/lakeflow-pipelines/template/{{.project_name}}/.vscode/extensions.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| { | ||
| "recommendations": [ | ||
| "databricks.databricks", | ||
| "ms-python.vscode-pylance", | ||
| "redhat.vscode-yaml" | ||
| ] | ||
| } |
22 changes: 22 additions & 0 deletions
...mplate/templates/lakeflow-pipelines/template/{{.project_name}}/.vscode/settings.json.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| { | ||
| "python.analysis.stubPath": ".vscode", | ||
| "databricks.python.envFile": "${workspaceFolder}/.env", | ||
| "jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])", | ||
| "jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------", | ||
| "python.testing.pytestArgs": [ | ||
| "." | ||
| ], | ||
| "python.testing.unittestEnabled": false, | ||
| "python.testing.pytestEnabled": true, | ||
| {{- /* Unfortunately extraPaths doesn't support globs!! See: https://github.com/microsoft/pylance-release/issues/973 */}} | ||
| "python.analysis.extraPaths": ["assets/etl_pipeline"], | ||
| "files.exclude": { | ||
| "**/*.egg-info": true, | ||
| "**/__pycache__": true, | ||
| ".pytest_cache": true, | ||
| }, | ||
| "[python]": { | ||
| "editor.defaultFormatter": "ms-python.black-formatter", | ||
| "editor.formatOnSave": true, | ||
| }, | ||
| } | ||
Review comment (Contributor): Should we add "python.analysis.typeCheckingMode": "basic" here? Or, probably better, revisit that question later?
Reply (Contributor, Author): I'd prefer to defer that decision.
41 changes: 41 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/template/{{.project_name}}/README.md.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # {{.project_name}} | ||
|
|
||
| The '{{.project_name}}' project was generated by using the Lakeflow template. | ||
|
|
||
| ## Setup | ||
|
|
||
| 1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html | ||
|
|
||
| 2. Authenticate to your Databricks workspace, if you have not done so already: | ||
| ``` | ||
| $ databricks auth login | ||
| ``` | ||
|
|
||
| 3. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from | ||
| https://docs.databricks.com/dev-tools/vscode-ext.html. Or the PyCharm plugin from | ||
| https://www.databricks.com/blog/announcing-pycharm-integration-databricks. | ||
|
|
||
|
|
||
| ## Deploying resources | ||
|
|
||
| 1. To deploy a development copy of this project, type: | ||
| ``` | ||
| $ databricks bundle deploy --target dev | ||
| ``` | ||
| (Note that "dev" is the default target, so the `--target` parameter | ||
| is optional here.) | ||
|
|
||
| 2. Similarly, to deploy a production copy, type: | ||
| ``` | ||
| $ databricks bundle deploy --target prod | ||
| ``` | ||
|
|
||
| 3. Use the "summary" command to review everything that was deployed: | ||
| ``` | ||
| $ databricks bundle summary | ||
| ``` | ||
|
|
||
| 4. To run a job or pipeline, use the "run" command: | ||
| ``` | ||
| $ databricks bundle run | ||
| ``` | ||
49 changes: 49 additions & 0 deletions
libs/template/templates/lakeflow-pipelines/template/{{.project_name}}/databricks.yml.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # This is a Databricks asset bundle definition for {{.project_name}}. | ||
| # See https://docs.databricks.com/dev-tools/bundles/index.html for documentation. | ||
| bundle: | ||
| name: {{.project_name}} | ||
| uuid: {{bundle_uuid}} | ||
|
|
||
| include: | ||
| - resources/*.yml | ||
| - resources/*/*.yml | ||
|
|
||
| # Variable declarations. These variables are assigned in the dev/prod targets below. | ||
| variables: | ||
| catalog: | ||
| description: The catalog to use | ||
| schema: | ||
| description: The schema to use | ||
| notifications: | ||
| description: The email addresses to use for failure notifications | ||
|
|
||
| targets: | ||
| dev: | ||
| # The default target uses 'mode: development' to create a development copy. | ||
| # - Deployed resources get prefixed with '[dev my_user_name]' | ||
| # - Any job schedules and triggers are paused by default. | ||
| # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html. | ||
| mode: development | ||
| default: true | ||
| workspace: | ||
| host: {{workspace_host}} | ||
| variables: | ||
| catalog: {{.default_catalog}} | ||
| schema: {{template `dev_schema` .}} | ||
| notifications: [] | ||
|
|
||
| prod: | ||
| mode: production | ||
| workspace: | ||
| host: {{workspace_host}} | ||
| # We explicitly specify /Workspace/Users/{{user_name}} to make sure we only have a single copy. | ||
| root_path: /Workspace/Users/{{user_name}}/.bundle/${bundle.name}/${bundle.target} | ||
| permissions: | ||
| - {{if is_service_principal}}service_principal{{else}}user{{end}}_name: {{user_name}} | ||
| level: CAN_MANAGE | ||
| run_as: | ||
| {{if is_service_principal}}service_principal{{else}}user{{end}}_name: {{user_name}} | ||
| variables: | ||
| catalog: {{.default_catalog}} | ||
| schema: {{template `prod_schema` .}} | ||
| notifications: [{{user_name}}] | ||
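Reviewer note: the `prod` target picks its identity key with an inline `if is_service_principal` in two places. The Python sketch below shows the resulting structure for both cases. It is an illustration, not the CLI's rendering code: `prod_identity` is a hypothetical helper, and the example user name and application ID are made up.

```python
def prod_identity(user_name: str, is_service_principal: bool) -> dict:
    """Build the permissions and run_as entries the prod target would render."""
    # The template emits either `service_principal_name` or `user_name`.
    key = "service_principal_name" if is_service_principal else "user_name"
    return {
        "permissions": [{key: user_name, "level": "CAN_MANAGE"}],
        "run_as": {key: user_name},
    }


if __name__ == "__main__":
    # Hypothetical identities, for illustration only.
    print(prod_identity("someone@example.com", False))
    print(prod_identity("00000000-0000-0000-0000-000000000000", True))
```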
46 changes: 46 additions & 0 deletions
...-pipelines/template/{{.project_name}}/resources/{{.project_name}}_pipeline/README.md.tmpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| {{- if (eq .language "python") -}} | ||
| # {{template `pipeline_name` .}} | ||
|
|
||
| This folder defines all source code for the {{template `pipeline_name` .}} pipeline: | ||
|
|
||
| - `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline. | ||
| - `transformations`: All dataset definitions and transformations. | ||
| - `utilities`: Utility functions and Python modules used in this pipeline. | ||
| - `data_sources` (optional): View definitions describing the source data for this pipeline. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| To get started, go to the `transformations` folder -- most of the relevant source code lives there: | ||
|
|
||
| * By convention, every dataset under `transformations` is in a separate file. | ||
| * Take a look at the sample under "sample_trips_{{template `table_suffix` .}}.py" to get familiar with the syntax. | ||
| Read more about the syntax at https://docs.databricks.com/dlt/python-ref.html. | ||
| * Use `Run file` to run and preview a single transformation. | ||
| * Use `Run pipeline` to run _all_ transformations in the entire pipeline. | ||
| * Use `+ Add` in the file browser to add a new data set definition. | ||
| * Use `Schedule` to run the pipeline on a schedule! | ||
|
|
||
| For more tutorials and reference material, see https://docs.databricks.com/dlt. | ||
| {{- else -}} | ||
| # {{template `pipeline_name` .}} | ||
|
|
||
| This folder defines all source code for the '{{template `pipeline_name` .}}' pipeline: | ||
|
|
||
| - `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline. | ||
| - `transformations`: All dataset definitions and transformations. | ||
| - `data_sources` (optional): View definitions describing the source data for this pipeline. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| To get started, go to the `transformations` folder -- most of the relevant source code lives there: | ||
|
|
||
| * By convention, every dataset under `transformations` is in a separate file. | ||
| * Take a look at the sample under "sample_trips_{{template `table_suffix` .}}.sql" to get familiar with the syntax. | ||
| Read more about the syntax at https://docs.databricks.com/dlt/sql-ref.html. | ||
| * Use `Run file` to run and preview a single transformation. | ||
| * Use `Run pipeline` to run _all_ transformations in the entire pipeline. | ||
| * Use `+ Add` in the file browser to add a new data set definition. | ||
| * Use `Schedule` to run the pipeline on a schedule! | ||
|
|
||
| For more tutorials and reference material, see https://docs.databricks.com/dlt. | ||
| {{- end -}} | ||
Can you file a separate PR for this?