Croissant support 🥐 #10341
Labels
Feature: Metadata
NIH CAFE
Issues related to and/or funded by the NIH CAFE project
Size: 50
A percentage of a sprint. 35 hours.
pdurbin added the Feature: Metadata and Size: 30 (a percentage of a sprint; 21 hours; formerly size:33) labels on Feb 26, 2024.
pdurbin added the NIH CAFE label (issues related to and/or funded by the NIH CAFE project) on Feb 26, 2024.
pdurbin added a commit to gdcc/dataverse-exporters that referenced this issue on Feb 26, 2024.
pdurbin added a commit to IQSS/dataverse-sample-data that referenced this issue on Feb 27, 2024.
12 tasks
pdurbin added a commit that referenced this issue on Apr 26, 2024.
pdurbin added three commits that referenced this issue on Apr 30, 2024.
cmbz added the Size: 50 label (a percentage of a sprint; 35 hours) and removed the Size: 30 label (a percentage of a sprint; 21 hours; formerly size:33) on May 8, 2024.
pdurbin added a commit that referenced this issue on May 21, 2024.
pdurbin added two commits that referenced this issue on May 28, 2024.
I'm removing this issue from the board now that I've removed "draft" from both of these PRs:
pdurbin added a commit that referenced this issue on May 28, 2024.
pdurbin added a commit that referenced this issue on May 29, 2024.
pdurbin added a commit that referenced this issue on Jun 4, 2024.
pdurbin added two commits that referenced this issue on Jul 9, 2024.
We plan to implement support for the Croissant format (video).
The code is implemented as an external exporter:
Originally, we planned to use https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UTY03A (a dataset about earthquakes) to experiment with the format. This dataset also appears on Kaggle at https://www.kaggle.com/datasets/gustavobmgm/earthquakes-for-ml-prediction . We planned to create the dataset using our dataverse_json.json (native JSON) format. Our initial target, exported by the Kaggle team from the dataset on the Kaggle side, looks like kaggle-1.0.json, available from https://www.kaggle.com/datasets/gustavobmgm/earthquakes-for-ml-prediction . However, that dataset on the Dataverse side is just a CSV with no additional metadata, so we have switched to a dataset about cars provided by Stata as an example. See gdcc/dataverse-exporters#4. I've deployed my code so far to https://dev3.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/DZRHUP
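For readers unfamiliar with the target format, here is a rough sketch of the kind of mapping an exporter like this performs: dataset-level metadata plus a list of files goes in, and a Croissant-style JSON-LD document comes out. This is not the exporter's actual code. The input dict shape, the `to_croissant` function, the file name, and the example.org URLs are all hypothetical, and the `@context` is heavily abbreviated, so the output has not been validated against the Croissant specification.

```python
import json


def to_croissant(dataset: dict) -> dict:
    """Map a simplified (hypothetical) dataset description to a minimal
    Croissant-style JSON-LD dict. Illustration only, not the real exporter."""
    return {
        # Abbreviated context; a real Croissant document declares many more terms.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
            "conformsTo": "dct:conformsTo",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": dataset["title"],
        "url": dataset["url"],
        # One FileObject per file in the dataset.
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": f["name"],
                "name": f["name"],
                "encodingFormat": f["content_type"],
                "contentUrl": f["url"],
            }
            for f in dataset["files"]
        ],
    }


# Hypothetical input: placeholder values, not the real cars dataset metadata.
example = {
    "title": "Cars (example)",
    "url": "https://example.org/dataset/cars",
    "files": [
        {
            "name": "cars.tab",
            "content_type": "text/tab-separated-values",
            "url": "https://example.org/files/cars.tab",
        }
    ],
}

croissant = to_croissant(example)
print(json.dumps(croissant, indent=2))
```

A dataset that is just a CSV with no additional metadata would produce little more than the `distribution` entry here, which is part of why a richer example dataset is more useful for experimenting.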
Kaggle has advised that we keep an eye on these.
name
mlcommons/croissant#449
For additional context on how the Croissant project plans to index our Croissant-enabled datasets:
Other related Croissant issues:
Discussion: