Merged
Changes from 4 commits
8 changes: 8 additions & 0 deletions libs/template/template.go
@@ -26,6 +26,7 @@ type TemplateName string
const (
DefaultPython TemplateName = "default-python"
DefaultSql TemplateName = "default-sql"
LakeflowPipelines TemplateName = "lakeflow-pipelines"
DbtSql TemplateName = "dbt-sql"
MlopsStacks TemplateName = "mlops-stacks"
DefaultPydabs TemplateName = "default-pydabs"
@@ -46,6 +47,13 @@ var databricksTemplates = []Template{
Reader: &builtinReader{name: string(DefaultSql)},
Writer: &writerWithFullTelemetry{defaultWriter: defaultWriter{name: DefaultSql}},
},
{
name: LakeflowPipelines,
hidden: true,
description: "The default template for Lakeflow Declarative Pipelines",
Reader: &builtinReader{name: string(LakeflowPipelines)},
Writer: &writerWithFullTelemetry{defaultWriter: defaultWriter{name: LakeflowPipelines}},
},
{
name: DbtSql,
description: "The dbt SQL template (databricks.com/blog/delivering-cost-effective-data-real-time-dbt-and-databricks)",
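The new entry is registered with `hidden: true`. As a minimal sketch of what that flag could mean for a template picker, the Go program below filters hidden entries out of a listing. The `template` struct and `visibleTemplates` function are hypothetical stand-ins written for illustration, not the CLI's actual types, and the idea that hidden templates stay selectable by name is an assumption.

```go
package main

import "fmt"

// template is a hypothetical, trimmed-down stand-in for the Template
// struct touched by this diff; only the fields needed here are kept.
type template struct {
	name        string
	hidden      bool
	description string
}

// visibleTemplates returns the entries a picker would list.
// Assumption: hidden entries are only left out of the listing.
func visibleTemplates(all []template) []template {
	var out []template
	for _, t := range all {
		if !t.hidden {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	all := []template{
		{name: "default-python"},
		{name: "lakeflow-pipelines", hidden: true},
		{name: "dbt-sql"},
	}
	// Prints default-python and dbt-sql, one per line.
	for _, t := range visibleTemplates(all) {
		fmt.Println(t.name)
	}
}
```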
@@ -1,6 +1,6 @@
# Preamble

-This file only template directives; it is skipped for the actual output.
+This file only contains template directives; it is skipped for the actual output.

Contributor

Can you file a separate PR for this?


{{skip "__preamble"}}

3 changes: 3 additions & 0 deletions libs/template/templates/lakeflow-pipelines/README.md
@@ -0,0 +1,3 @@
# Lakeflow Pipelines

Default template for Lakeflow Declarative Pipelines
@@ -0,0 +1,57 @@
{
"welcome_message": "\nWelcome to the template for Lakeflow Declarative Pipelines!",
"properties": {
"project_name": {
"type": "string",
"default": "my_project",
"description": "Please provide the following details to tailor the template to your preferences.\n\nUnique name for this project\nproject_name",
"order": 1,
"pattern": "^[a-z0-9_]+$",
"pattern_match_failure_message": "Name must consist of lower case letters, numbers, and underscores."
},
"default_catalog": {
"type": "string",
"default": "{{default_catalog}}",
"pattern": "^\\w*$",
"pattern_match_failure_message": "Invalid catalog name.",
"description": "\nInitial catalog.\ndefault_catalog",
"order": 3
},
"personal_schemas": {
"type": "string",
"description": "\nUse a personal schema for each user working on this project? (e.g., 'catalog.{{short_name}}')\npersonal_schemas",
"default": "yes",
"enum": [
"yes",
"no"
],
"order": 4
},
"shared_schema": {
"skip_prompt_if": {
"properties": {
"personal_schemas": {
"const": "yes"
}
}
},
"type": "string",
"default": "default",
"pattern": "^\\w+$",
"pattern_match_failure_message": "Invalid schema name.",
"description": "\nInitial schema during development:\ndefault_schema",
"order": 5
},
"language": {
"type": "string",
"default": "python",
"description": "\nLanguage for this project:\nlanguage",
"enum": [
"python",
"sql"
],
"order": 6
}
},
"success_message": "\n\nYour new project has been created in the '{{.project_name}}' directory!\n\nRefer to the README.md file for \"getting started\" instructions!"
}
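The schema above constrains `project_name` with a regular expression and skips the `shared_schema` prompt when `personal_schemas` is `yes`. A minimal Go sketch of those two rules follows; `validateProjectName` and `shouldPromptSharedSchema` are hypothetical helper names chosen for illustration and are not taken from the CLI.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// projectNamePattern is the "pattern" value of project_name in the schema.
var projectNamePattern = regexp.MustCompile(`^[a-z0-9_]+$`)

// validateProjectName returns the schema's failure message as an error
// when the name does not match the pattern.
func validateProjectName(name string) error {
	if !projectNamePattern.MatchString(name) {
		return errors.New("Name must consist of lower case letters, numbers, and underscores.")
	}
	return nil
}

// shouldPromptSharedSchema mirrors skip_prompt_if: the prompt is skipped
// when personal_schemas has the constant value "yes".
func shouldPromptSharedSchema(personalSchemas string) bool {
	return personalSchemas != "yes"
}

func main() {
	fmt.Println(validateProjectName("my_project") == nil) // true
	fmt.Println(validateProjectName("My-Project") == nil) // false
	fmt.Println(shouldPromptSharedSchema("yes"))          // false
}
```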
37 changes: 37 additions & 0 deletions libs/template/templates/lakeflow-pipelines/library/variables.tmpl
@@ -0,0 +1,37 @@
{{- define `table_suffix` -}}
{{ (regexp `^_+|_+$`).ReplaceAllString ((regexp `_+`).ReplaceAllString .project_name `_`) `` }}
{{- end }}

{{- define `pipeline_name` -}}
{{template `table_suffix` .}}_pipeline
{{- end }}

{{- define `job_name` -}}
{{template `table_suffix` .}}_job
{{- end }}

{{- define `static_dev_schema` -}}
{{- if (regexp "^yes").MatchString .personal_schemas -}}
{{ short_name }}
{{- else -}}
{{ .shared_schema }}
{{- end}}
{{- end }}


{{- define `dev_schema` -}}
{{- if (regexp "^yes").MatchString .personal_schemas -}}
${workspace.current_user.short_name}
{{- else -}}
{{ .shared_schema }}
{{- end}}
{{- end }}


{{- define `prod_schema` -}}
{{- if (regexp "^yes").MatchString .personal_schemas -}}
default
{{- else -}}
{{ .shared_schema }}
{{- end}}
{{- end }}
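The helpers in `variables.tmpl` can be read as plain string functions. The sketch below restates `table_suffix`, `pipeline_name`, `dev_schema`, and `prod_schema` in Go using the standard `regexp` and `strings` packages. The Go function names are illustrative, and the template's `regexp` helper is assumed to behave like Go's `regexp.MustCompile`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	underscoreRun   = regexp.MustCompile(`_+`)
	edgeUnderscores = regexp.MustCompile(`^_+|_+$`)
)

// tableSuffix collapses runs of underscores into one, then trims
// underscores from both ends of the project name.
func tableSuffix(projectName string) string {
	collapsed := underscoreRun.ReplaceAllString(projectName, "_")
	return edgeUnderscores.ReplaceAllString(collapsed, "")
}

// pipelineName appends the fixed "_pipeline" suffix.
func pipelineName(projectName string) string {
	return tableSuffix(projectName) + "_pipeline"
}

// devSchema returns a bundle variable reference (resolved at deploy time)
// for personal schemas, and the shared schema otherwise.
func devSchema(personalSchemas, sharedSchema string) string {
	if strings.HasPrefix(personalSchemas, "yes") { // same as regexp "^yes"
		return "${workspace.current_user.short_name}"
	}
	return sharedSchema
}

// prodSchema falls back to "default" when personal schemas are in use.
func prodSchema(personalSchemas, sharedSchema string) string {
	if strings.HasPrefix(personalSchemas, "yes") {
		return "default"
	}
	return sharedSchema
}

func main() {
	fmt.Println(tableSuffix("__my__project_")) // my_project
	fmt.Println(pipelineName("my_project"))    // my_project_pipeline
	fmt.Println(devSchema("no", "analytics"))  // analytics
	fmt.Println(prodSchema("yes", ""))         // default
}
```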
@@ -0,0 +1,16 @@
# Preamble

This file only contains template directives; it is skipped for the actual output.

{{skip "__preamble"}}

{{$isSQL := eq .language "sql"}}

{{if $isSQL}}
{{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/utilities/utils.py"}}
{{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_zones_{{template `table_suffix` .}}.py"}}
{{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_trips_{{template `table_suffix` .}}.py"}}
{{else}}
{{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_zones_{{template `table_suffix` .}}.sql"}}
{{skip "{{.project_name}}/resources/{{template `pipeline_name` .}}/transformations/sample_trips_{{template `table_suffix` .}}.sql"}}
{{end}}
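The preamble skips the Python sources for SQL projects and the SQL sources otherwise. As a rough illustration of which paths end up skipped, the Go sketch below builds the same lists. `skippedFiles` and `suffixFor` are hypothetical names, and the real `skip` directive may treat its argument as a glob pattern rather than a literal path.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	runOfUnderscores = regexp.MustCompile(`_+`)
	outerUnderscores = regexp.MustCompile(`^_+|_+$`)
)

// suffixFor restates the `table_suffix` template helper.
func suffixFor(projectName string) string {
	collapsed := runOfUnderscores.ReplaceAllString(projectName, "_")
	return outerUnderscores.ReplaceAllString(collapsed, "")
}

// skippedFiles lists the paths the preamble skips for a given language.
func skippedFiles(projectName, language string) []string {
	suffix := suffixFor(projectName)
	base := projectName + "/resources/" + suffix + "_pipeline"
	if language == "sql" {
		// SQL projects drop the Python utilities and samples.
		return []string{
			base + "/utilities/utils.py",
			base + "/transformations/sample_zones_" + suffix + ".py",
			base + "/transformations/sample_trips_" + suffix + ".py",
		}
	}
	// Python projects drop the SQL samples.
	return []string{
		base + "/transformations/sample_zones_" + suffix + ".sql",
		base + "/transformations/sample_trips_" + suffix + ".sql",
	}
}

func main() {
	for _, p := range skippedFiles("my_project", "sql") {
		fmt.Println(p)
	}
}
```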
@@ -0,0 +1,8 @@
.databricks/
build/
dist/
__pycache__/
*.egg-info
.venv/
**/explorations/**
!**/explorations/README.md
@@ -0,0 +1,3 @@
# Typings for Pylance in Visual Studio Code
# see https://github.com/microsoft/pyright/blob/main/docs/builtins.md
from databricks.sdk.runtime import *
@@ -0,0 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"ms-python.vscode-pylance",
"redhat.vscode-yaml"
]
}
@@ -0,0 +1,22 @@
{
"python.analysis.stubPath": ".vscode",
"databricks.python.envFile": "${workspaceFolder}/.env",
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
"python.testing.pytestArgs": [
"."
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
{{- /* Unfortunately extraPaths doesn't support globs!! See: https://github.com/microsoft/pylance-release/issues/973 */}}
"python.analysis.extraPaths": ["assets/etl_pipeline"],
"files.exclude": {
"**/*.egg-info": true,
"**/__pycache__": true,
".pytest_cache": true,
},
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true,
},
}

Contributor

Should we add "python.analysis.typeCheckingMode": "basic" here? Or, probably better, revisit that question later?

Contributor Author

I'd prefer to defer that decision

@@ -0,0 +1,41 @@
# {{.project_name}}

The '{{.project_name}}' project was generated by using the Lakeflow template.

## Setup

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks auth login
```

3. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html or the PyCharm plugin from
https://www.databricks.com/blog/announcing-pycharm-integration-databricks.


## Deploying resources

1. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
(Note that "dev" is the default target, so the `--target` parameter
is optional here.)

2. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```

3. Use the "summary" command to review everything that was deployed:
```
$ databricks bundle summary
```

4. To run a job or pipeline, use the "run" command:
```
$ databricks bundle run
```
@@ -0,0 +1,49 @@
# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
name: {{.project_name}}
uuid: {{bundle_uuid}}

include:
- resources/*.yml
- resources/*/*.yml

# Variable declarations. These variables are assigned in the dev/prod targets below.
variables:
catalog:
description: The catalog to use
schema:
description: The schema to use
notifications:
description: The email addresses to use for failure notifications

targets:
dev:
# The default target uses 'mode: development' to create a development copy.
# - Deployed resources get prefixed with '[dev my_user_name]'
# - Any job schedules and triggers are paused by default.
# See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
mode: development
default: true
workspace:
host: {{workspace_host}}
variables:
catalog: {{.default_catalog}}
schema: {{template `dev_schema` .}}
notifications: []

prod:
mode: production
workspace:
host: {{workspace_host}}
# We explicitly specify /Workspace/Users/{{user_name}} to make sure we only have a single copy.
root_path: /Workspace/Users/{{user_name}}/.bundle/${bundle.name}/${bundle.target}
permissions:
- {{if is_service_principal}}service_principal{{else}}user{{end}}_name: {{user_name}}
level: CAN_MANAGE
run_as:
{{if is_service_principal}}service_principal{{else}}user{{end}}_name: {{user_name}}
variables:
catalog: {{.default_catalog}}
schema: {{template `prod_schema` .}}
notifications: [{{user_name}}]
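The `permissions` and `run_as` entries choose between `service_principal_name` and `user_name` with an inline `if`. The sketch below renders that same line with Go's `text/template`, supplying `is_service_principal` and `user_name` through a `FuncMap`. These are simplified stand-ins; how the CLI actually implements the two helpers is not shown in this diff, and `renderPrincipalLine` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderPrincipalLine renders the key/value line used under permissions
// and run_as, with stand-in implementations of the two template helpers.
func renderPrincipalLine(isServicePrincipal bool, userName string) (string, error) {
	funcs := template.FuncMap{
		"is_service_principal": func() bool { return isServicePrincipal },
		"user_name":            func() string { return userName },
	}
	const line = "{{if is_service_principal}}service_principal{{else}}user{{end}}_name: {{user_name}}"
	tmpl, err := template.New("principal").Funcs(funcs).Parse(line)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, sp := range []bool{false, true} {
		out, err := renderPrincipalLine(sp, "someone@example.com")
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Println(out)
	}
}
```

The two iterations print `user_name: someone@example.com` and `service_principal_name: someone@example.com`. The prefix is chosen by the branch, while `_name` is literal text after `{{end}}`.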
@@ -0,0 +1,46 @@
{{- if (eq .language "python") -}}
# {{template `pipeline_name` .}}

This folder defines all source code for the {{template `pipeline_name` .}} pipeline:

- `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline.
- `transformations`: All dataset definitions and transformations.
- `utilities`: Utility functions and Python modules used in this pipeline.
- `data_sources` (optional): View definitions describing the source data for this pipeline.

## Getting Started

To get started, go to the `transformations` folder -- most of the relevant source code lives there:

* By convention, every dataset under `transformations` is in a separate file.
* Take a look at the sample under "sample_trips_{{template `table_suffix` .}}.py" to get familiar with the syntax.
Read more about the syntax at https://docs.databricks.com/dlt/python-ref.html.
* Use `Run file` to run and preview a single transformation.
* Use `Run pipeline` to run _all_ transformations in the entire pipeline.
* Use `+ Add` in the file browser to add a new data set definition.
* Use `Schedule` to run the pipeline on a schedule!

For more tutorials and reference material, see https://docs.databricks.com/dlt.
{{- else -}}
# {{template `pipeline_name` .}}

This folder defines all source code for the '{{template `pipeline_name` .}}' pipeline:

- `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline.
- `transformations`: All dataset definitions and transformations.
- `data_sources` (optional): View definitions describing the source data for this pipeline.

## Getting Started

To get started, go to the `transformations` folder -- most of the relevant source code lives there:

* By convention, every dataset under `transformations` is in a separate file.
* Take a look at the sample under "sample_trips_{{template `table_suffix` .}}.sql" to get familiar with the syntax.
Read more about the syntax at https://docs.databricks.com/dlt/sql-ref.html.
* Use `Run file` to run and preview a single transformation.
* Use `Run pipeline` to run _all_ transformations in the entire pipeline.
* Use `+ Add` in the file browser to add a new data set definition.
* Use `Schedule` to run the pipeline on a schedule!

For more tutorials and reference material, see https://docs.databricks.com/dlt.
{{- end -}}