align databricks-iris template to work with kedro-databricks #227
Left some comments. Thanks for creating the plugin; I think this is a huge improvement over the manual step deployment guide.
I suggest separating the necessary changes from the README.md changes: at this stage we are not ready to mention a third-party plugin in our official docs. You can create your own custom starter and register it in your plugin so that it can be created with kedro new -s. See the documentation.
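As a rough illustration of the suggestion above, a plugin can expose a list of starter specs that Kedro discovers through the `kedro.starters` entry point group. This is only a sketch: the alias, repository URL, and directory below are hypothetical, the stand-in class exists purely so the snippet runs without Kedro installed, and the exact import path may differ between Kedro versions.

```python
from typing import Optional

try:
    # Class used by Kedro to describe a starter (assumes Kedro is installed).
    from kedro.framework.cli.starters import KedroStarterSpec
except ImportError:
    # Stand-in with the same three fields, only so this sketch runs without Kedro.
    from dataclasses import dataclass

    @dataclass
    class KedroStarterSpec:
        alias: str
        template_path: str
        directory: Optional[str] = None


# Hypothetical alias and repository, for illustration only.
starters = [
    KedroStarterSpec(
        alias="my-databricks-starter",
        template_path="git+https://github.com/example-org/example-starters",
        directory="my-databricks-starter",
    )
]

# The plugin would then point the "kedro.starters" entry point group at this
# list in its packaging metadata, e.g. "my_plugin.starters:starters"
# (module name hypothetical).
print([spec.alias for spec in starters])
```

With such a plugin installed, the starter should be selectable by its alias through `kedro new -s`, without any change to the official starters repository.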
...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py
databricks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline.py
Does that mean including the plugin
Signed-off-by: Jens Peder Meldgaard <[email protected]>
Hi @noklam, thank you for the thorough review! I have made changes according to your comments - I see I was a bit overenthusiastic. I have removed all mentions of the plugin now and simply ensure that the starter works out of the box with it instead. The steps to deploy using the plugin are now:
This is much less invasive than before. I also made some changes to the plugin, specifically to the logging, to make it more informative and more closely aligned with other methods. Please let me know what you think! :)
Thanks again, I think we are getting close. I added a few comments; I missed something in the last round.
...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py
databricks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/pipeline.py
FYI @noklam, I implemented the changes that you suggested. You're right, no need for the I have just tried the steps from my previous comment with this implementation, and everything works as expected.
Left one minor styling comment. Thanks for making this change! Exciting to see the Databricks plugin, great work!
...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py
Amazing work @JenspederM, I appreciate the effort. My comments are more for wider discussion, not a firm recommendation!
@@ -2,7 +2,7 @@ ipython>=8.10
 jupyterlab>=3.0
 notebook
 kedro~={{ cookiecutter.kedro_version }}
-kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0
+kedro-datasets[spark, pandas, spark.SparkDataset, pandas.ParquetDataset]>=1.0
Is there an argument that we should encourage the use of databricks.ManagedTableDataSet (even if it's commented out in the catalog), since it highlights Kedro's compatibility with Unity Catalog + Delta Lake?
Agree 👆
But perhaps better saved for another PR? :)
Absolutely. This is a really important and frankly overdue piece of work, and I'm just thinking about how to take it further later!
For reference, +1 on this!
...ks-iris/{{ cookiecutter.repo_name }}/src/{{ cookiecutter.python_package }}/databricks_run.py
@datajoely, do you regard your comments as things that should be included in this PR, or are they more of a general discussion for a future PR? 😊 I would like to merge this, as I can then announce the plugin without having to specify a branch for the starter 😊
@JenspederM I see the DCO check is failing, can you try to fix it? If you click the "Details" button, it shows the instructions for fixing it.
@noklam I just accepted your change here - apparently that doesn't sign the commit. It should be fixed now :)
@datajoely, I see you resolved your comments. Does that mean you approve of the PR? :)
@noklam @datajoely any idea why the tests aren't running?
I've triggered it now; it's a settings thing. CI won't run automatically for first-time contributors.
Ah okay. Thank you!
@noklam @datajoely Are we ready to merge? :)
@JenspederM congratulations on your first PR!
Motivation and Context
This PR aligns the databricks-iris starter with the kedro-databricks plugin.
With these changes, creating a new Kedro project on Databricks should be as easy as:
kedro new --starter="databricks-iris"
python -m venv .venv && source ./.venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt
kedro databricks init
kedro databricks bundle
kedro databricks deploy
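For repeated deployments, the three plugin commands above could be wrapped in a small helper. The following is a minimal sketch, not part of the PR: the function name, the `dry_run` flag, and the `my-project` directory are assumptions made for illustration, and the script assumes the project has already been created and its requirements installed.

```python
import shlex
import subprocess

# Commands copied from the PR description; they are run inside the project directory.
DATABRICKS_STEPS = [
    "kedro databricks init",
    "kedro databricks bundle",
    "kedro databricks deploy",
]


def run_steps(project_dir, steps=DATABRICKS_STEPS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in project_dir.

    Returns the parsed argument lists so callers can inspect what would run.
    """
    parsed = [shlex.split(step) for step in steps]
    for argv in parsed:
        print(f"[{project_dir}] $ {' '.join(argv)}")
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, cwd=project_dir, check=True)
    return parsed


# Hypothetical project directory; dry run only, so nothing is executed.
commands = run_steps("my-project", dry_run=True)
```

Passing `dry_run=False` would execute the commands in order, which requires the `kedro` CLI and the kedro-databricks plugin to be available in the active environment.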
How has this been tested?
The above has been tested against a private Databricks workspace.
Checklist