[feature request] support different webdataset format #10
Comments
To clarify: Using this feature, the following would be possible:
I.e., the respective folders must contain the same shards, and each shard must contain the same sample keys. The features (here
This would allow, e.g., precomputing features or changing the ground truth without modifying the input data (in this case the image).
A difficulty might be the wizard for preparing datasets: it must allow combining the shards.
Implementation thought:
Hi @lvoegtle, what you described is accurate! FWIW, in webdataset, your idea of a "primary" folder is supported through https://webdataset.github.io/webdataset/column-store/#using-webdataset-as-a-column-store
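For illustration, here is a minimal sketch of the "primary folder plus extra columns" idea using only the Python standard library. It is not energon's or webdataset's API: the names `write_shard`, `read_shard`, and `join_shards` are hypothetical, and it assumes the usual webdataset convention that the sample key is the file name up to the first dot.

```python
import io
import tarfile
from typing import Any, Dict, Iterator


def write_shard(path: str, samples: Dict[str, Dict[str, bytes]]) -> None:
    """Write {key: {extension: payload}} as a tar shard (helper for this example)."""
    with tarfile.open(path, "w") as tar:
        for key, fields in samples.items():
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path: str) -> Dict[str, Dict[str, bytes]]:
    """Group the files of a tar shard by sample key, keeping shard order."""
    samples: Dict[str, Dict[str, bytes]] = {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumed convention: key = path up to the first dot of the file name.
            dirname, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


def join_shards(primary: str, *extras: str) -> Iterator[Dict[str, Any]]:
    """Yield samples in primary-shard order, merged with fields from extra shards."""
    extra_samples = [read_shard(path) for path in extras]
    for key, fields in read_shard(primary).items():
        merged: Dict[str, Any] = {"__key__": key, **fields}
        for path, extra in zip(extras, extra_samples):
            if key not in extra:
                raise KeyError(f"sample {key!r} missing from {path}")
            merged.update(extra[key])
        yield merged
```

Calling `join_shards("images/shard_000.tar", "labels/shard_000.tar")` (hypothetical paths) would yield one dictionary per sample in the image shard, with the label fields merged in. A real implementation would stream the shards instead of loading them fully into memory.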
To my understanding, the energon-compatible format needs everything in the same folder, like this:
However, users might have a different dataset layout, for example with images in one folder and labels in another.
We need to be able to customize the dataset format definition, e.g. through a user-defined function that maps the data to a dictionary.
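To make the request concrete, a rough sketch of what such a user-defined mapping could look like, in plain Python. Nothing here is an existing energon interface: `iter_samples`, `image_and_class`, and the assumed layout (one label file per image, matched by file stem) are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterator

# A user-defined mapping: (image file, label file) -> sample dictionary.
SampleMapper = Callable[[Path, Path], Dict[str, Any]]


def iter_samples(
    image_dir: str, label_dir: str, to_sample: SampleMapper
) -> Iterator[Dict[str, Any]]:
    """Pair files from two folders by file stem and map each pair to a dict."""
    labels = {p.stem: p for p in Path(label_dir).iterdir() if p.is_file()}
    for image_path in sorted(Path(image_dir).iterdir()):
        if not image_path.is_file():
            continue
        label_path = labels.get(image_path.stem)
        if label_path is None:
            raise KeyError(f"no label found for {image_path.name}")
        yield to_sample(image_path, label_path)


def image_and_class(image_path: Path, label_path: Path) -> Dict[str, Any]:
    """Example mapper: raw image bytes plus an integer class read from a text file."""
    return {
        "__key__": image_path.stem,
        "image": image_path.read_bytes(),
        "label": int(label_path.read_text().strip()),
    }
```

With this shape, a user with a different layout would only swap out the mapper, e.g. `iter_samples("images", "labels", image_and_class)`; the pairing logic stays the same.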