Merge branch 'main' into chore/remove-session-args
ravi-kumar-pilla authored Jul 31, 2024
2 parents e8a1f56 + a0fbc12 commit 4d53672
Showing 12 changed files with 79 additions and 50 deletions.
27 changes: 27 additions & 0 deletions databricks-iris/{{ cookiecutter.repo_name }}/README.md
@@ -6,6 +6,33 @@ This is your new Kedro project, which was generated using `kedro {{ cookiecutter.

Take a look at the [Kedro documentation](https://docs.kedro.org) to get started.

## Getting Started

To create a project based on this starter, ensure you have installed Kedro into a virtual environment. Then use the following command:

```sh
pip install kedro
kedro new --starter=databricks-iris
```

Once the project is created, navigate into the project directory:

```sh
cd <my-project-name> # change directory
```

Install the required dependencies:

```sh
pip install -r requirements.txt
```

Now you can run the project:

```sh
kedro run
```

## Rules and guidelines

In order to get the best out of the template:
29 changes: 13 additions & 16 deletions databricks-iris/{{ cookiecutter.repo_name }}/conf/base/catalog.yml
@@ -44,7 +44,7 @@

example_iris_data:
type: spark.SparkDataset
filepath: /dbfs/FileStore/iris-databricks/data/01_raw/iris.csv
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/01_raw/iris.csv
file_format: csv
load_args:
header: True
@@ -56,48 +56,45 @@ example_iris_data:
# for all SparkDatasets.
X_train@pyspark:
type: spark.SparkDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/X_train.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/X_train.parquet
save_args:
mode: overwrite

X_train@pandas:
type: pandas.ParquetDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/X_train.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/X_train.parquet

X_test@pyspark:
type: spark.SparkDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/X_test.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/X_test.parquet
save_args:
mode: overwrite

X_test@pandas:
type: pandas.ParquetDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/X_test.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/X_test.parquet

y_train@pyspark:
type: spark.SparkDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/y_train.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/y_train.parquet
save_args:
mode: overwrite

y_train@pandas:
type: pandas.ParquetDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/y_train.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/y_train.parquet

y_test@pyspark:
type: spark.SparkDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/y_test.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/y_test.parquet
save_args:
mode: overwrite

y_test@pandas:
type: pandas.ParquetDataset
filepath: /dbfs/FileStore/iris-databricks/data/02_intermediate/y_test.parquet
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/02_intermediate/y_test.parquet

y_pred:
type: pandas.ParquetDataset
filepath: /dbfs/FileStore/{{ cookiecutter.python_package }}/data/03_primary/y_pred.parquet

# This is an example of how to use `MemoryDataset` with Spark objects that aren't `DataFrame`s.
# In particular, the `assign` copy mode ensures that the `MemoryDataset` is assigned
# the Spark object itself rather than a deepcopy of it, since deepcopy doesn't work with
# Spark objects in general.
example_classifier:
type: MemoryDataset
copy_mode: assign
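The `@pyspark` / `@pandas` suffixes in the catalog above are Kedro's transcoding convention: both entries point at the same Parquet file, and the part after `@` selects which dataset implementation reads or writes it. A minimal sketch of how such names split (a simplified stand-in to illustrate the naming convention, not Kedro's internal helper):

```python
# Simplified sketch of transcoded dataset name resolution.
# Assumption: this only mirrors the "@" naming convention shown in
# catalog.yml above; it is not Kedro's actual implementation.
TRANSCODING_SEPARATOR = "@"

def split_transcoded_name(dataset_name: str) -> tuple[str, str]:
    """Split a name like 'X_train@pandas' into ('X_train', 'pandas')."""
    base, _, transcoder = dataset_name.partition(TRANSCODING_SEPARATOR)
    return base, transcoder

print(split_transcoded_name("X_train@pandas"))     # ('X_train', 'pandas')
print(split_transcoded_name("example_iris_data"))  # ('example_iris_data', '')
```

Because `X_train@pyspark` and `X_train@pandas` share one filepath, a node can save the split with Spark and a downstream node can load the same file as a pandas DataFrame.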
@@ -2,7 +2,7 @@ ipython>=8.10
jupyterlab>=3.0
notebook
kedro~={{ cookiecutter.kedro_version }}
kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0
kedro-datasets[spark, pandas, spark.SparkDataset, pandas.ParquetDataset]>=1.0
kedro-telemetry>=0.3.1
numpy~=1.21
pytest-cov~=3.0
@@ -10,19 +10,24 @@ def main():
parser.add_argument("--env", dest="env", type=str)
parser.add_argument("--conf-source", dest="conf_source", type=str)
parser.add_argument("--package-name", dest="package_name", type=str)
parser.add_argument("--nodes", dest="nodes", type=str, default="")

args = parser.parse_args()
env = args.env
conf_source = args.conf_source
package_name = args.package_name
nodes = [node.strip() for node in args.nodes.split(",") if node.strip()]

# https://kb.databricks.com/notebooks/cmd-c-on-object-id-p0.html
logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)
logging.getLogger("py4j.py4j.clientserver").setLevel(logging.ERROR)

configure_project(package_name)
with KedroSession.create(env=env, conf_source=conf_source) as session:
session.run()
if not nodes:
session.run()
else:
session.run(node_names=nodes)


if __name__ == "__main__":
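The new `--nodes` flag takes a comma-separated list of node names, and the session falls back to a full pipeline run when the list is empty. A small, self-contained reproduction of that parsing logic (the `default=""` guard is included so the flag can be omitted safely; the node names are hypothetical examples):

```python
import argparse

# Standalone reproduction of the --nodes parsing used by the Databricks
# entry point. default="" keeps args.nodes a string even when the flag
# is omitted, so .split(",") never raises on None.
parser = argparse.ArgumentParser()
parser.add_argument("--nodes", dest="nodes", type=str, default="")

args = parser.parse_args(["--nodes", "split_data, train_model ,evaluate"])
nodes = [node.strip() for node in args.nodes.split(",") if node.strip()]
print(nodes)  # ['split_data', 'train_model', 'evaluate']

# With the flag omitted, the list is empty, signalling a full pipeline run.
no_flag = parser.parse_args([])
assert [n.strip() for n in no_flag.nodes.split(",") if n.strip()] == []
```

An empty `nodes` list then selects `session.run()`, while a non-empty one is passed as `session.run(node_names=nodes)`.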
@@ -75,7 +75,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -14,15 +14,15 @@ dynamic = ["dependencies", "version"]

[project.optional-dependencies]
docs = [
"docutils<0.18.0",
"sphinx~=3.4.3",
"sphinx_rtd_theme==0.5.1",
"docutils<0.21",
"sphinx>=5.3,<7.3",
"sphinx_rtd_theme==2.0.0",
"nbsphinx==0.8.1",
"sphinx-autodoc-typehints==1.11.1",
"sphinx_copybutton==0.3.1",
"sphinx-autodoc-typehints==1.20.2",
"sphinx_copybutton==0.5.2",
"ipykernel>=5.3, <7.0",
"Jinja2<3.1.0",
"myst-parser~=0.17.2",
"Jinja2<3.2.0",
"myst-parser>=1.0,<2.1"
]

[tool.setuptools.dynamic]
@@ -75,7 +75,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
14 changes: 7 additions & 7 deletions spaceflights-pandas/{{ cookiecutter.repo_name }}/pyproject.toml
@@ -14,15 +14,15 @@ dynamic = ["dependencies", "version"]

[project.optional-dependencies]
docs = [
"docutils<0.18.0",
"sphinx~=3.4.3",
"sphinx_rtd_theme==0.5.1",
"docutils<0.21",
"sphinx>=5.3,<7.3",
"sphinx_rtd_theme==2.0.0",
"nbsphinx==0.8.1",
"sphinx-autodoc-typehints==1.11.1",
"sphinx_copybutton==0.3.1",
"sphinx-autodoc-typehints==1.20.2",
"sphinx_copybutton==0.5.2",
"ipykernel>=5.3, <7.0",
"Jinja2<3.1.0",
"myst-parser~=0.17.2",
"Jinja2<3.2.0",
"myst-parser>=1.0,<2.1"
]

[tool.setuptools.dynamic]
@@ -75,7 +75,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -14,15 +14,15 @@ dynamic = ["dependencies", "version"]

[project.optional-dependencies]
docs = [
"docutils<0.18.0",
"sphinx~=3.4.3",
"sphinx_rtd_theme==0.5.1",
"docutils<0.21",
"sphinx>=5.3,<7.3",
"sphinx_rtd_theme==2.0.0",
"nbsphinx==0.8.1",
"sphinx-autodoc-typehints==1.11.1",
"sphinx_copybutton==0.3.1",
"sphinx-autodoc-typehints==1.20.2",
"sphinx_copybutton==0.5.2",
"ipykernel>=5.3, <7.0",
"Jinja2<3.1.0",
"myst-parser~=0.17.2",
"Jinja2<3.2.0",
"myst-parser>=1.0,<2.1"
]

[tool.setuptools.dynamic]
@@ -75,7 +75,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
14 changes: 7 additions & 7 deletions spaceflights-pyspark/{{ cookiecutter.repo_name }}/pyproject.toml
@@ -14,15 +14,15 @@ dynamic = ["dependencies", "version"]

[project.optional-dependencies]
docs = [
"docutils<0.18.0",
"sphinx~=3.4.3",
"sphinx_rtd_theme==0.5.1",
"docutils<0.21",
"sphinx>=5.3,<7.3",
"sphinx_rtd_theme==2.0.0",
"nbsphinx==0.8.1",
"sphinx-autodoc-typehints==1.11.1",
"sphinx_copybutton==0.3.1",
"sphinx-autodoc-typehints==1.20.2",
"sphinx_copybutton==0.5.2",
"ipykernel>=5.3, <7.0",
"Jinja2<3.1.0",
"myst-parser~=0.17.2",
"Jinja2<3.2.0",
"myst-parser>=1.0,<2.1"
]

[tool.setuptools.dynamic]
