ability to merge croissant metadata files #789

Open
csbrown opened this issue Dec 30, 2024 · 0 comments

Backward compatibility with existing formats creates an incentive to maintain a small number of "standard" Croissant metadata types. For example, manually building a COCO-like Croissant for every COCO-format dataset in the wild is wasteful when a single one could serve all of them.

However, Croissant is built with extensibility in mind: the point of using Croissant instead of COCO is that we can add new fields that aren't in the original COCO spec. Unfortunately, this re-raises the problem noted above. The single COCO-Croissant is no longer sufficient, and a proliferation of Croissants that are "mostly COCO" but with minor changes is inevitable. What would make this much cleaner is the ability to simply MERGE a single universal COCO-Croissant with another Croissant containing any minor extensions or deviations. Even the ability to do this strictly internally would be very helpful. That is, I maintain a single "COCO extension" metadata file, and when I publish my data I perform a MERGE operation with the universal COCO Croissant to create the correct Croissant for my particular dataset.
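
To make the idea concrete, here is a rough sketch of what such a MERGE could look like if the two Croissant files are treated as plain JSON-LD dictionaries. This is not an existing Croissant or mlcroissant API: the function names `merge` and `_key` are hypothetical, and the matching rule (list entries are the same node when they share an "@id", falling back to "name") is an assumption that the real semantics would need to settle. The two documents are illustrative fragments, not complete valid Croissant files.

```python
import copy


def _key(node):
    """Identity of a list entry: its "@id", falling back to "name" (assumed rule)."""
    if isinstance(node, dict):
        return node.get("@id") or node.get("name")
    return None


def merge(base, ext):
    """Return a new value with `ext` overlaid on `base`; inputs are not mutated."""
    if isinstance(base, dict) and isinstance(ext, dict):
        out = copy.deepcopy(base)
        for k, v in ext.items():
            # Recurse on shared keys, copy keys that only the extension has.
            out[k] = merge(base[k], v) if k in base else copy.deepcopy(v)
        return out
    if isinstance(base, list) and isinstance(ext, list):
        out = copy.deepcopy(base)
        index = {_key(n): i for i, n in enumerate(out) if _key(n) is not None}
        for node in ext:
            k = _key(node)
            if k is not None and k in index:
                # Same node in both files: merge it in place.
                out[index[k]] = merge(out[index[k]], node)
            elif node not in out:
                # New node (or new scalar): append it.
                out.append(copy.deepcopy(node))
                if k is not None:
                    index[k] = len(out) - 1
        return out
    # Scalars or mismatched types: the extension wins.
    return copy.deepcopy(ext)


# Hypothetical "universal" COCO-like fragment.
base = {
    "@type": "sc:Dataset",
    "name": "coco_universal",
    "recordSet": [{
        "@type": "cr:RecordSet",
        "@id": "images",
        "field": [
            {"@type": "cr:Field", "@id": "images/id", "dataType": "sc:Integer"},
            {"@type": "cr:Field", "@id": "images/file_name", "dataType": "sc:Text"},
        ],
    }],
}

# Hypothetical extension: renames the dataset and adds one field.
extension = {
    "name": "my_coco_dataset",
    "recordSet": [{
        "@id": "images",
        "field": [
            {"@type": "cr:Field", "@id": "images/capture_device", "dataType": "sc:Text"},
        ],
    }],
}

merged = merge(base, extension)
```

In this sketch the extension only needs to state what differs from the universal file. Open questions it does not answer include how to delete or rename a field from the base, and how conflicting "@context" entries should be handled.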

I am willing and able to put some effort toward this, but I am relatively new here. Does anyone have pointers on where to start in the code and the specification?
