Using paths relative to `project_root` in dataset implementation #3149

jasonmhite · 2023-10-09T18:58:07Z

Description

I have a custom dataset implementation that is supposed to fetch some data into a cache folder that you specify in the dataset configuration. If the data isn't in the folder, it fetches it, then loads from disk. Simple enough.

I want to be able to specify a path relative to the project root, e.g. `data/01_raw`, but I can't figure out how to access the `project_root` directory at runtime. Things work fine if I run from the CLI, since it starts in the project root. But if I work with the catalog in, say, `kedro jupyter lab` and create a notebook at `notebooks/my_notebook.ipynb`, my working directory in that notebook is `notebooks`. So when I load my dataset from the catalog, it resolves the cache folder to `notebooks/data/01_raw` and re-downloads all of my datasets.

As far as I can tell, though, there is no good way to get `project_root` inside a dataset implementation, because you already need to know the `project_root` folder to instantiate the config/context.

Context

This could be worked around by using an absolute path, but I want to be able to redistribute my project to share with other users. Being able to specify a path that is always interpreted as relative to project_root would be helpful.

Possible Implementation

Expose project_root in a way that can be accessed from a dataset implementation. Maybe there is already a way, but I can't seem to work it out.
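Until something like that is exposed, a dataset could approximate it on its own. The following is only a minimal sketch, not Kedro API: `find_project_root` and `resolve_cache_dir` are hypothetical helpers, and it assumes the project root can be recognised by a `pyproject.toml` file sitting in it. It walks up from the current working directory until it finds that marker, then interprets a relative cache folder against the directory it found.

```python
from pathlib import Path
from typing import Optional, Union


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` (default: cwd) to the first directory containing `marker`.

    Hypothetical helper: treating `pyproject.toml` as the root marker is an assumption.
    """
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found in {current} or any parent directory")


def resolve_cache_dir(cache_dir: Union[str, Path], project_root: Optional[Path] = None) -> Path:
    """Interpret a relative `cache_dir` against the project root instead of the cwd."""
    path = Path(cache_dir).expanduser()
    if path.is_absolute():
        # Absolute paths are returned unchanged.
        return path
    root = project_root if project_root is not None else find_project_root()
    return root / path
```

With this, a dataset configured with `data/01_raw` would resolve to the same folder whether the process starts in the project root or in `notebooks`, provided the marker file exists somewhere above the working directory.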

Possible Alternatives

The workaround is to specify an absolute path, but then users need to remember to fix the path when they clone my project.

I suppose in my notebooks I could have something like a `%cd context.project_root`, since I would have access to it there, but that doesn't seem like a great solution.
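A scoped variant of the notebook workaround might look like the sketch below. `working_directory` is a hypothetical helper, not part of Kedro or IPython. It changes the working directory only for the duration of a `with` block and restores it afterwards, so the rest of the notebook keeps running from `notebooks`.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def working_directory(path: Union[str, Path]) -> Iterator[Path]:
    """Temporarily switch the process working directory to `path`."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        # Yield the directory we are now in, as reported by the OS.
        yield Path.cwd()
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)
```

In a notebook, the dataset load would then be wrapped in `with working_directory(...)`, passing whatever project root the session exposes. This still relies on the relative path being resolved while the block is active, so it is a stopgap rather than a fix.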

Thanks for your help as always.


astrojuanlu · 2023-10-09T19:51:38Z

Hi @jasonmhite, thanks for flagging this. We're having a related discussion in #2965, but the solution is still not clear. Could you have a look and tell us how it relates to your feature request?

jasonmhite added the "Issue: Feature Request" label (New feature or improvement to existing feature) on Oct 9, 2023

jasonmhite mentioned this issue Oct 10, 2023

Add ability to Specify a "root" for DataCatalog.from_config #2965

Closed

github-actions bot mentioned this issue Nov 1, 2023

Monthly issue metrics report #3256

Closed
