Croissant support 🥐 #10341

Closed
pdurbin opened this issue Feb 26, 2024 · 1 comment · Fixed by #10533
Labels
Feature: Metadata NIH CAFE Issues related to and/or funded by the NIH CAFE project Size: 50 A percentage of a sprint. 35 hours.

pdurbin commented Feb 26, 2024

We plan to implement support for the Croissant format (video).

The code is implemented as an external exporter:

Originally, we planned to experiment with the format using the earthquake dataset at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UTY03A . This dataset also appears in Kaggle at https://www.kaggle.com/datasets/gustavobmgm/earthquakes-for-ml-prediction . We planned to create the dataset using our dataverse_json.json (native JSON) format. Our initial target was kaggle-1.0.json, which the Kaggle team exported from their copy of the dataset and which is available from https://www.kaggle.com/datasets/gustavobmgm/earthquakes-for-ml-prediction .

However, that dataset on the Dataverse side is just a CSV with no additional metadata, so we have switched to a dataset about cars that Stata provides as an example. See gdcc/dataverse-exporters#4. I've deployed my code so far to https://dev3.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/DZRHUP
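To make the native JSON to Croissant idea above more concrete, here is a minimal Python sketch of the kind of mapping involved. It is not the exporter's actual code. The function name `native_to_croissant`, the sample input, and the abbreviated `@context` are all assumptions made for illustration, and the output leaves out properties that a complete Croissant file would carry, such as `contentUrl` and checksums.

```python
import json


def native_to_croissant(native: dict) -> dict:
    """Map a small subset of Dataverse native JSON to Croissant-style JSON-LD.

    Hypothetical helper for illustration only; the result is not a complete
    or validated Croissant document.
    """
    version = native["datasetVersion"]
    # Citation metadata is a list of {"typeName": ..., "value": ...} entries.
    fields = {
        field["typeName"]: field["value"]
        for field in version["metadataBlocks"]["citation"]["fields"]
    }
    distribution = []
    for entry in version.get("files", []):
        data_file = entry["dataFile"]
        distribution.append(
            {
                "@type": "cr:FileObject",
                "@id": data_file["filename"],
                "name": data_file["filename"],
                "encodingFormat": data_file.get(
                    "contentType", "application/octet-stream"
                ),
            }
        )
    return {
        # Abbreviated context (assumption); real Croissant files use a longer one.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": fields.get("title", ""),
        "distribution": distribution,
    }


# Made-up sample input shaped like a native JSON export (not real data).
sample_native = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {"fields": [{"typeName": "title", "value": "Cars"}]}
        },
        "files": [
            {"dataFile": {"filename": "cars.tab", "contentType": "text/tab-separated-values"}}
        ],
    }
}

croissant = native_to_croissant(sample_native)
print(json.dumps(croissant, indent=2))
```

A dataset with richer native metadata, such as the cars example, gives a mapping like this more to work with than a bare CSV does, which is the motivation for the switch described above.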

Kaggle has advised that we keep an eye on these.

For additional context of how the Croissant project plans to index our Croissant-enabled datasets:

Other related Croissant issues:

Discussion:

@pdurbin pdurbin added Feature: Metadata Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) labels Feb 26, 2024
@pdurbin pdurbin self-assigned this Feb 26, 2024
@pdurbin pdurbin moved this to In Progress 💻 in IQSS Dataverse Project Feb 26, 2024
@pdurbin pdurbin added the NIH CAFE Issues related to and/or funded by the NIH CAFE project label Feb 26, 2024
pdurbin added a commit to gdcc/dataverse-exporters that referenced this issue Feb 26, 2024
pdurbin added a commit that referenced this issue Apr 29, 2024
pdurbin added a commit that referenced this issue May 1, 2024
@cmbz cmbz added Size: 50 A percentage of a sprint. 35 hours. and removed Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) labels May 8, 2024
pdurbin added a commit that referenced this issue May 21, 2024
pdurbin added a commit that referenced this issue May 21, 2024
pdurbin added a commit that referenced this issue May 28, 2024

pdurbin commented May 28, 2024

I'm removing this issue from the board now that I've removed "draft" from both of these PRs:
