This is the repository for Data Readiness Cluster to maintain the AI-ready data checklist.

ESIPFed/data-readiness

ESIP Data Readiness Cluster

This is the repository for the Data Readiness Cluster to maintain the AI-ready data checklist. The cluster is a community-driven group focusing on developing recommendations and community standards on AI-ready open environmental data. Although the work currently focuses on environmental data, the product could be extended to data from other domains.

The goal of AI-readiness assessment:

  • For data producers/providers, the purpose is to understand to what extent the data being assessed meets the common research data management practices and principles that are relevant to AI/ML application development. The assessment result can be used to justify targeted improvements for the dataset when resources become available.
  • For projects generating new datasets, the AI-readiness checklist can be used to guide the development of the dataset. For example:
    • What documentation do you want to provide accompanying the dataset?
    • Do you have a proper data quality assessment that will make the development of downstream AI/ML applications efficient?

How to use the checklist for assessment:

The current version of the checklist is available here (last updated 2023-12-20). The checklist will be maintained and updated by the community.

To assist with the assessment, we have created a fillable Google Sheet template. You can make a copy of the Google Sheet for your assessment. Each dataset should be assessed separately, as the checklist is designed for individual datasets. Work is ongoing to address the needs of linked datasets.

If you are in the early stages of developing AI/ML applications with open environmental datasets, we encourage you to assess the input data used for your applications. Although you may not be able to change other people’s datasets, the assessment will help you document the effort spent on preparing the data for your development.
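
If you prefer to keep assessment results alongside your code rather than in a spreadsheet, the one-assessment-per-dataset approach described above could be sketched roughly as follows. This is only an illustration: the `DatasetAssessment` class, the `yes`/`no`/`n/a` answer scale, the item wording, and the dataset name are all hypothetical and are not taken from the checklist or the Google Sheet template.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical answer scale; the actual checklist defines its own response options.
VALID_ANSWERS = ("yes", "no", "n/a")


@dataclass
class DatasetAssessment:
    """Answers recorded for a single dataset (one assessment per dataset)."""

    dataset_name: str
    answers: dict[str, str] = field(default_factory=dict)

    def record(self, item: str, answer: str) -> None:
        """Store the answer for one checklist item after normalizing it."""
        normalized = answer.strip().lower()
        if normalized not in VALID_ANSWERS:
            raise ValueError(f"Unsupported answer {answer!r} for item {item!r}")
        self.answers[item] = normalized

    def summary(self) -> dict[str, int]:
        """Count how many items received each answer."""
        counts = {option: 0 for option in VALID_ANSWERS}
        for answer in self.answers.values():
            counts[answer] += 1
        return counts

    def unmet_items(self) -> list[str]:
        """List the items answered 'no', i.e. candidates for targeted improvement."""
        return [item for item, answer in self.answers.items() if answer == "no"]


# Illustrative items only; replace them with the wording of the real checklist.
assessment = DatasetAssessment("example-dataset")
assessment.record("Documentation accompanies the dataset", "yes")
assessment.record("Data quality assessment is available", "no")

print(assessment.summary())
print(assessment.unmet_items())
```

Keeping one object per dataset mirrors the checklist's design for individual datasets, and the list of unmet items gives a starting point for justifying targeted improvements when resources become available.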

How to provide your feedback:

If you have any questions or suggestions about the checklist or the assessment tool, you can provide feedback in either of two ways:

  • Contact Douglas Rao ([email protected]), cluster chair
  • Open an issue in this GitHub repo.

How to cite the checklist:

ESIP Data Readiness Cluster (2023). Checklist to Examine AI-readiness for Open Environmental Datasets. Version 1.0. Earth Science Information Partners. https://github.com/ESIPFed/data-readiness [date accessed].
