Using paths relative to project_root in dataset implementation
#3149
Labels
Issue: Feature Request
New feature or improvement to existing feature
Description
I have a custom dataset implementation that is supposed to fetch some data into a cache folder that you specify in the dataset configuration. If the data isn't in the folder, it fetches it, then loads from disk. Simple enough.
I want to be able to specify a path relative to the project root, e.g. data/01_raw, however I can't figure out how to access the project_root directory at runtime. Things work fine if I run from the CLI since it starts in the project root, but if I manipulate the catalog in, say, kedro jupyter lab and then make a notebook in, say, notebooks/my_notebook.ipynb, working in that notebook my working directory will be notebooks. Hence if I load my dataset from the catalog it will resolve the cache folder to notebooks/data/01_raw and redownload all of my datasets.
Best I can figure, however, there is not a good way to get the project_root in a dataset implementation, as you need to know the project_root folder to instantiate config/context.
Context
This could be worked around by using an absolute path, but I want to be able to redistribute my project to share with other users. Being able to specify a path that is always interpreted as relative to project_root would be helpful.
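To illustrate the behaviour I'm after, here is a rough sketch of what I could do inside my dataset today. It does not use any Kedro API. It assumes the project root is the nearest parent directory containing a pyproject.toml, and the names find_project_root and resolve_cache_dir are made up for this example.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` (default: cwd) until a directory containing `marker` is found.

    Assumption: the project root is identified by the presence of `marker`.
    """
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in {current} or any parent directory")


def resolve_cache_dir(cache_dir: str, start: Optional[Path] = None) -> Path:
    """Interpret a relative `cache_dir` as relative to the project root, not the cwd."""
    path = Path(cache_dir)
    if path.is_absolute():
        # Absolute paths are returned unchanged.
        return path
    return find_project_root(start) / path
```

With this, calling resolve_cache_dir("data/01_raw") from notebooks/ would point at the data/01_raw folder under the project root rather than notebooks/data/01_raw. It is still guesswork based on a marker file, which is why having project_root exposed properly would be nicer.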
Possible Implementation
Expose project_root in a way that can be accessed from a dataset implementation. Maybe there is already a way, but I can't seem to work it out.
Possible Alternatives
Workaround is to specify an absolute path but then users need to remember to fix the path when they clone my project.
I suppose in my notebooks I could have something like a %cd context.project_root since I would have access to it, but that seems like not a great solution.
Thanks for your help as always.
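P.S. For reference, a slightly less intrusive variant of the %cd workaround would be to change directory only while loading. This is just a sketch: working_directory is a helper name I made up, and it assumes the notebook already has the context and catalog variables available.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def working_directory(path: Union[str, Path]) -> Iterator[Path]:
    """Temporarily switch the process working directory, restoring it on exit."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Hypothetical notebook usage ("my_dataset" is a placeholder catalog entry):
# with working_directory(context.project_root):
#     data = catalog.load("my_dataset")
```

It keeps the notebook's own working directory intact, but every load still has to be wrapped, so it does not really solve the underlying problem.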