
ability to merge croissant metadata files #789

@csbrown


Backward compatibility with existing formats creates an incentive to maintain a small set of "standard" Croissant metadata files. For example, manually building a COCO-like Croissant for every COCO-format dataset out in the wild is wasteful when a single one could serve all of them.

However, Croissant is built with extensibility in mind: the point of using Croissant instead of COCO is that we can add new fields that aren't in the original COCO spec. Unfortunately, this re-raises the problem noted above. A single COCO-Croissant is no longer sufficient, and a proliferation of Croissants that are "mostly COCO" but with minor changes is inevitable. What would make this much cleaner is the ability to simply MERGE a single universal COCO-Croissant with another Croissant containing any minor extensions or deviations. Even the ability to do this strictly internally would be very helpful. That is, I maintain a single "COCO extension" metadata file, and when I publish my data I perform a MERGE operation with the universal COCO Croissant to create the correct Croissant for my particular dataset.
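
To make the MERGE idea concrete, here is a rough sketch of one possible semantics, operating on the Croissant JSON-LD loaded as plain Python dicts. Everything in it is an assumption for illustration: `merge_croissant` is a hypothetical helper, not part of `mlcroissant` or the Croissant spec, and the rule of matching list entries by `@id` (falling back to `name`) is just one choice among several. The `base` and `ext` documents are toy fragments, not complete valid Croissant files.

```python
import copy


def _entry_key(item):
    """Identifier used to match list entries (assumed rule: '@id', else 'name')."""
    if isinstance(item, dict):
        return item.get("@id") or item.get("name")
    return None


def merge_croissant(base, ext):
    """Hypothetical merge: return a new structure where `ext` extends or overrides `base`.

    - dicts are merged key by key, recursively
    - lists are merged entry by entry when entries share an identifier,
      otherwise entries from `ext` are appended
    - any other value in `ext` replaces the value in `base`
    """
    if isinstance(base, dict) and isinstance(ext, dict):
        out = copy.deepcopy(base)
        for key, value in ext.items():
            out[key] = merge_croissant(base[key], value) if key in base else copy.deepcopy(value)
        return out
    if isinstance(base, list) and isinstance(ext, list):
        out = copy.deepcopy(base)
        index = {_entry_key(item): i for i, item in enumerate(out) if _entry_key(item) is not None}
        for item in ext:
            key = _entry_key(item)
            if key is not None and key in index:
                out[index[key]] = merge_croissant(out[index[key]], item)
            else:
                out.append(copy.deepcopy(item))
                if key is not None:
                    index[key] = len(out) - 1
        return out
    return copy.deepcopy(ext)


# Toy "universal COCO" fragment (illustrative only).
base = {
    "name": "coco",
    "recordSet": [
        {"@id": "images", "field": [{"@id": "images/file_name", "dataType": "sc:Text"}]}
    ],
}

# Toy extension: renames the dataset and adds one field to the existing record set.
ext = {
    "name": "my-coco",
    "recordSet": [
        {"@id": "images", "field": [{"@id": "images/weather", "dataType": "sc:Text"}]}
    ],
}

merged = merge_croissant(base, ext)
```

With these inputs, `merged` keeps the original `images/file_name` field, gains `images/weather`, and takes its `name` from the extension, while `base` is left untouched. Open questions a real design would need to settle include how to delete or rename something from the base and how `@context` entries should be reconciled.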

I am willing and able to put some effort toward this, but am relatively new here. Does anyone have pointers on where in the code and specification to begin?
