align databricks-iris template to work with kedro-databricks #227

JenspederM · 2024-07-14T21:06:36Z

Motivation and Context

This PR aligns the databricks-iris starter with the kedro-databricks plugin.

With these changes, creating a new Kedro project on Databricks should be as easy as:

1. Create a project: kedro new --starter="databricks-iris"
2. Install dependencies: python -m venv .venv && source ./.venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt
3. Initialize the Databricks asset bundle: kedro databricks init
4. Bundle pipelines into asset bundle resources: kedro databricks bundle
5. Deploy the project to Databricks: kedro databricks deploy

How has this been tested?

The above has been tested against a private Databricks workspace.

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Assigned myself to the PR
Added tests to cover my changes

noklam

Left some comments. Thanks for creating the plugin; I think this is a huge improvement over the manual deployment guide.

I suggest separating the necessary changes from the README.md changes; at this stage we are not ready to mention a third-party plugin in our official docs. You can actually create your own custom starter and register it in your plugin so that it can be created with kedro new -s. See the documentation.

databricks-iris/{{ cookiecutter.repo_name }}/README.md

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py

databricks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline.py

JenspederM · 2024-07-17T20:21:14Z

Left some comment. Thanks for creating the plugin I think this is a huge improvement over the manual step deployment guide.

... separating necessary changes vs README.md, at this stage we are not ready ...

Does that mean including the plugin

Signed-off-by: Jens Peder Meldgaard <[email protected]>

JenspederM · 2024-07-18T13:39:04Z

Hi @noklam,

Thank you for the thorough review! I have made changes according to your comments - I see I was a bit overenthusiastic with regard to kedro-databricks ;)

I have now removed all mentions of it and instead simply ensured that the starter works out of the box with the plugin.

The steps to deploy using the plugin are now:

# Create project
kedro new --starter="https://github.com/JenspederM/kedro-starters" --directory="databricks-iris" --checkout="feat/align-databricks-iris-with-kedro-databricks" --name "iris"

# Install dependencies
pip install -r requirements.txt
pip install kedro-databricks

# Initialization
kedro databricks init

# Bundling
kedro databricks bundle

# Deployment
kedro databricks deploy

This is much less invasive than before.
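For readers who want to script the steps above, here is a minimal Python sketch. It is not part of the PR or of kedro-databricks: the helper names `deployment_steps` and `run_steps` are hypothetical, and `project_dir` assumes that `kedro new` creates a folder you already know the name of. The commands themselves are the ones listed in this comment. By default nothing is executed; the commands are only rendered as strings.

```python
import shlex
import subprocess
from typing import List, Optional


def deployment_steps(
    starter: str,
    name: str,
    directory: Optional[str] = None,
    checkout: Optional[str] = None,
) -> List[List[str]]:
    """Build the command sequence from the comment above (hypothetical helper)."""
    new_cmd = ["kedro", "new", f"--starter={starter}", "--name", name]
    if directory:
        new_cmd.append(f"--directory={directory}")
    if checkout:
        new_cmd.append(f"--checkout={checkout}")
    return [
        new_cmd,
        ["pip", "install", "-r", "requirements.txt"],
        ["pip", "install", "kedro-databricks"],
        ["kedro", "databricks", "init"],
        ["kedro", "databricks", "bundle"],
        ["kedro", "databricks", "deploy"],
    ]


def run_steps(
    steps: List[List[str]], project_dir: str, dry_run: bool = True
) -> List[str]:
    """Render each command; execute it only when dry_run is False."""
    rendered = []
    for index, cmd in enumerate(steps):
        # Assumption: `kedro new` runs in the current directory, and every
        # later command runs inside the newly created project folder.
        cwd = None if index == 0 else project_dir
        rendered.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
    return rendered
```

Calling `run_steps(deployment_steps("databricks-iris", "iris"), project_dir="iris")` returns the six command strings without touching the system, which makes it easy to review them before running with `dry_run=False`.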

I also made some changes to the plugin, specifically to its logging, to make it more informative and more closely aligned with other methods.

Please let me know what you think! :)

noklam

Thanks again, I think we are getting close. I added a few comments; I missed something in the last round.

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py

databricks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline.py

Signed-off-by: Jens Peder Meldgaard <[email protected]>

JenspederM · 2024-07-18T15:50:42Z

FYI @noklam

I implemented the changes that you suggested. You're right, no need for the @pandas. :)

I have just tried the steps from my previous comment with this implementation and everything works as expected.

noklam

Left one minor styling comment, thanks for making this change! Exciting to see the Databricks plugin, great work!

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py

datajoely

Amazing work @JenspederM, appreciate the effort. My comments are more for wider discussion, not a firm recommendation!

datajoely · 2024-07-18T16:13:42Z

databricks-iris/{{ cookiecutter.repo_name }}/requirements.txt

@@ -2,7 +2,7 @@ ipython>=8.10
 jupyterlab>=3.0
 notebook
 kedro~={{ cookiecutter.kedro_version }}
-kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0
+kedro-datasets[spark, pandas, spark.SparkDataset, pandas.ParquetDataset]>=1.0


Is there an argument that we should encourage the use of databricks.ManagedTableDataSet (even if it's commented out in the catalog), since it highlights Kedro's compatibility with Unity Catalog + Delta Lake?

Agree 👆

But perhaps better saved for another PR? :)

absolutely, this is a really important and frankly overdue piece of work and I'm just thinking about how to take it further, later!

For reference, +1 on this!

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py

JenspederM · 2024-07-19T07:29:47Z

@datajoely, do you regard your comments as things that should be included in this PR or is it more general discussion for a future PR? 😊

I would like to merge this as I can then make an announcement for the plugin without having to specify a branch for the starter 😊

noklam · 2024-07-19T10:30:20Z

@JenspederM I see the DCO check is failing; can you try to fix it? If you click the "Details" button, it shows you the instructions to fix it.

Signed-off-by: Jens Peder Meldgaard <[email protected]>

JenspederM · 2024-07-19T11:19:15Z

@noklam I just accepted your change here - apparently that doesn't sign the commit.

Should be fixed now :)

JenspederM · 2024-07-19T12:11:51Z

@datajoely, I see you resolved your comments, does that mean you approve of the PR? :)

JenspederM · 2024-07-19T12:24:10Z

@noklam @datajoely any idea why tests aren't running?

noklam · 2024-07-19T12:36:22Z

I triggered it now; it's a settings thing. CI won't run automatically for first-time contributors.

JenspederM · 2024-07-19T12:38:45Z

Ah okay.

Thank you!

JenspederM · 2024-07-19T14:42:32Z

@noklam @datajoely Are we ready to merge? :)

noklam · 2024-07-19T15:19:57Z

@JenspederM congratulations on your first PR!

JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch 3 times, most recently from d903972 to 7c101af Compare July 14, 2024 23:22

JenspederM marked this pull request as ready for review July 14, 2024 23:25

JenspederM marked this pull request as draft July 14, 2024 23:30

JenspederM marked this pull request as ready for review July 14, 2024 23:31

noklam reviewed Jul 16, 2024

JenspederM closed this Jul 17, 2024

JenspederM reopened this Jul 17, 2024

JenspederM added 3 commits July 18, 2024 15:26

improve readme

62f0cb0

Signed-off-by: Jens Peder Meldgaard <[email protected]>

add base dependencies for spark and pandas

8bfe125

Signed-off-by: Jens Peder Meldgaard <[email protected]>

fix dbfs file paths and remove use of MemoryDataset

64928ff

Signed-off-by: Jens Peder Meldgaard <[email protected]>

JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 306f00b to c3a5a08 Compare July 18, 2024 13:28

JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from c3a5a08 to 6cfdebd Compare July 18, 2024 13:41

noklam reviewed Jul 18, 2024

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py Outdated

databricks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline.py Outdated

add option to specify nodes in run

9fcd6c3

Signed-off-by: Jens Peder Meldgaard <[email protected]>

JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 6cfdebd to 9fcd6c3 Compare July 18, 2024 13:50

not doing any transcoding

f0e481a

Signed-off-by: Jens Peder Meldgaard <[email protected]>

noklam approved these changes Jul 18, 2024

...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py Outdated

datajoely reviewed Jul 18, 2024

not also handles the None case

159ec3e

Signed-off-by: Jens Peder Meldgaard <[email protected]>
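The two commits "add option to specify nodes in run" and "not also handles the None case" presumably rely on the Python idiom that `not value` is true for both `None` and an empty value. The diff itself is not shown here, so the following is only a sketch under assumptions: the `--nodes` flag, its comma-separated format, and the helper names `build_parser` and `parse_nodes` are hypothetical and may differ from the actual databricks_run.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical entry-point parser with an optional --nodes argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--nodes",
        default=None,
        help="Comma-separated node names to run (assumed format)",
    )
    return parser


def parse_nodes(raw: Optional[str]) -> Optional[List[str]]:
    """Turn the raw option value into a list of node names, or None."""
    # `not raw` covers both a missing option (None) and an empty string,
    # so no separate `raw is None` check is needed.
    if not raw:
        return None
    return [name.strip() for name in raw.split(",") if name.strip()]
```

A value of `None` would then mean "run everything", while a list could be handed to something like `KedroSession.run(node_names=...)`; whether the starter does exactly this is an assumption.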

JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 042e734 to 159ec3e Compare July 19, 2024 11:18

datajoely approved these changes Jul 19, 2024

noklam self-requested a review July 19, 2024 15:19

noklam approved these changes Jul 19, 2024

noklam merged commit a0fbc12 into kedro-org:main Jul 19, 2024
23 checks passed

merelcht mentioned this pull request Aug 6, 2024

Update databricks deployment docs and remove mentions of custom entrypoint kedro-org/kedro#4067

Merged

7 tasks
