From 38e81fba49702d0c8851ad4dcd488bafd08f1a36 Mon Sep 17 00:00:00 2001 From: Jo Stichbury Date: Wed, 11 Oct 2023 18:15:39 +0100 Subject: [PATCH] Update README for standalone-datacatalog (#156) * Update README for standalone-datacatalog Signed-off-by: Jo Stichbury * Update README.md DataSet -> Dataset change --------- Signed-off-by: Jo Stichbury --- standalone-datacatalog/README.md | 40 +++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/standalone-datacatalog/README.md b/standalone-datacatalog/README.md index 2bbe909d..2c91eb04 100644 --- a/standalone-datacatalog/README.md +++ b/standalone-datacatalog/README.md @@ -1,3 +1,41 @@ # The `standalone-datacatalog` Kedro starter -For more information, see the [Kedro documentation about this starter](https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_as_a_data_registry.html). +This starter, formerly known as `mini-kedro`, sets up a lightweight Kedro project that uses the Kedro [Data Catalog](https://docs.kedro.org/en/stable/data/index.html) as a registry for data without using any of the other features of Kedro. + +The starter comprises a minimal setup to use the traditional [Iris dataset](https://www.kaggle.com/uciml/iris). + +## Usage + +To create a new project based on this starter: + +```bash +kedro new --starter=standalone-datacatalog +``` + +You can call the project any name you choose. When created, the project contains the following: + +* A `conf` directory, which contains an example `DataCatalog` configuration (`catalog.yml`): + + ```yaml +# conf/base/catalog.yml +example_dataset_1: + type: pandas.CSVDataset + filepath: folder/filepath.csv + +example_dataset_2: + type: spark.SparkDataset + filepath: s3a://your_bucket/data/01_raw/example_dataset_2* + credentials: dev_s3 + file_format: csv + save_args: + if_exists: replace +``` + +* A `data` directory, which contains an example dataset identical to the one used by the [`pandas-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris) starter + +* An example Jupyter notebook, which shows how to instantiate the `DataCatalog` and interact with the example dataset: + +```python +df = catalog.load("example_dataset_1") +df_2 = catalog.save("example_dataset_2") +```