DOE OSTI DOIs for input4MIPs #177

Open
durack1 opened this issue Jan 15, 2025 · 48 comments

Comments

@durack1
Contributor

durack1 commented Jan 15, 2025

Just adding a placeholder issue, so we can centralize information about what the DOI OSTI service requires from authors to get a DOI issued.

We can then update the source_id and institution_id registration info, with the additional fields

ping @jitendra-kumar @sashakames

@durack1
Contributor Author

durack1 commented Jan 16, 2025

FYI to self, 210 DOIs were issued by the CMIP6-era citation service for input4MIPs, so maybe we need to bump up the ~100 number we've discussed - see https://www.wdc-climate.de/ui/statistics?type=cmip6_doi_registration.

Also relevant is the CMIP6 Data Citation and Long-Term Archival wiki - https://redmine.dkrz.de/projects/cmip6-lta-and-data-citation/wiki

@durack1
Contributor Author

durack1 commented Jan 28, 2025

Hi @jitendra-kumar. Just circling on this task, is there any progress to report? We have a project meeting tomorrow, so I was keen to update the data providers about the status and timings

@jitendra-kumar

@durack1

Here's a summary of the fields we need information for to register with OSTI. Much (but not all) of this information exists within the JSONs in this repo, so we can pull it together from the existing JSONs and create a new JSON with everything needed to register a DOI for each dataset.

Product Description:

  • Dataset Title
  • Authors/Contributors
    • First/Last name
    • Email
    • ORCID
    • Affiliation
  • Related DOIs (if any) -- for cross-referencing [OPTIONAL]
  • Originating Research Organization
  • Publication Date
  • Sponsoring Organization
  • Keywords:
  • Geolocation -- [WE CAN ADD THIS IF ALL DATA ARE EXPECTED TO BE GLOBAL]
  • Dataset Description/Abstract

Dataset Location:

  • Landing page URL
  • Dataset file extension. [OPTIONAL -- will be .nc in most/all cases]
  • Dataset size

@durack1
Contributor Author

durack1 commented Jan 29, 2025

@jitendra-kumar, that's great. What is the best/easiest format for this info to be collated, considering this first pass is going to be manual copy-and-paste — text files or another format?

@jitendra-kumar

We should put the information together in a JSON; that would allow us to automate the process at a later date. Even for the short term, I can extract everything needed from that for manual entry.

@sashakames

We will need to create .html landing pages. The .json could be used to render those. We would need then to put together a template. Then push those pages to gh-pages. This could be done with Github Actions.

@znichollscr
Collaborator

znichollscr commented Jan 29, 2025

Do you have any ideas for the schema @jitendra-kumar ? E.g. do certain fields need to be strings/boolean/lists etc.? I think that is the key. Once we have the schema, writing data to match it is relatively trivial.

Even just something like the below

Schema proposal
from attrs import define

@define
class Author:
    first_name: str
    last_name: str
    orcid: str  # would we also validate this, probably a good idea if easy
    affiliation: str
    affiliation_ror: str | None  # optional for anyone whose institute isn't registered
    

@define
class Product:
    dataset_title: str
    authors: list[Author]
    related_dois: list[str]  # should validate that these are DOIs
    originating_research_organisation: str  # I find this field a bit weird, given most things have multiple authors therefore source organisations and the authors have affiliations anyway
    publication_date: str  # YYYY-MM-DD I guess?
    sponsoring_organisation: str  # as above re needing multiple and info already being in author info. Also unclear to me what the difference from the other orgs is so I would suggest making this optional if we can
    keywords: list[str]
    geolocation: tuple[float, float]  # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can
    description: str
    dataset_location: tuple[float, float]  # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can. Or do I misunderstand this field?

@define
class Dataset:
    url: str  # validate this is a URL
    extension: str
    size: float  # in bytes I guess?
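
The comments on `orcid` and `related_dois` in the proposal above ask whether those fields could be validated. A minimal sketch of what that might look like, using only the standard library: the ORCID check relies on the ISO 7064 MOD 11-2 check character that ORCID iDs carry in their final position, while the DOI check is only a loose syntactic pattern (an assumption for illustration, not an official DOI grammar). The function names are hypothetical.

```python
import re

# Loose syntactic pattern for a DOI such as "10.5281/zenodo.14892947".
# This is an illustrative assumption, not an official DOI grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Four hyphen-separated groups of four; the last character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def is_valid_orcid(orcid: str) -> bool:
    """Check the layout and the ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:-1]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[-1] == expected


def looks_like_doi(doi: str) -> bool:
    """Return True if the string has the rough shape of a bare DOI."""
    return bool(DOI_PATTERN.match(doi))
```

Checks like these could run wherever the schema is populated; the ORCID check only confirms that an iD is well formed, not that it belongs to the named author.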

@jitendra-kumar

@znichollscr working on this schema to be consistent with what OSTI wants. Will have something to share soon.

@durack1
Contributor Author

durack1 commented Feb 19, 2025

@jitendra-kumar just FYI, our first CMIP7 final dataset has just been published with no DOI (see here).. So if we have a very quick solution to implement now is the time!

@jitendra-kumar

jitendra-kumar commented Feb 23, 2025

@durack1 I looked at the three published datasets and can get the published metadata from the ESGF catalog. I am also able to extract most of the fields necessary to register the DOI from the input4MIPs CVs (in this repository). However, the published datasets are not reflected in the main branch of this repo. Can the appropriate branches/forks with the CMIP7 mip_era be merged into main?

Two outstanding attributes are needed:

  1. "Title" of the dataset. CVs/input4MIPs_source_id.json does not contain a title. Reviewing past input4MIPs datasets, there is no consistency in the title formats and it would be good to follow some scheme. Two options:
  • A title field could be added to CVs/input4MIPs_source_id.json.
  • The dataset title from the ESGF index, e.g. input4MIPs.CMIP7.CMIP.SOLARIS-HEPPA.SOLARIS-HEPPA-CMIP-4-6.atmos.day.multiple.gn, could potentially be used as the dataset title. I can recreate this from the CVs as well, of course.
  2. "Authors": the contact attribute in CVs/input4MIPs_source_id.json is inconsistent, with some datasets using the "Firstname LastName (email)" format while others have just an email, e.g. all SOLARIS-HEPPA-CMIP-4-* datasets. It would be good to have the contact field follow a format such as:
{
  "contact": [
    {
      "name": "First Author",
      "email": "[email protected]",
      "affiliation": "First Author Organization",
      "orcid": "ORCID"
    },
    {
      "name": "Second Author",
      "email": "[email protected]",
      "affiliation": "Second Author Organization",
      "orcid": "ORCID"
    },
    {
      "name": "Third Author",
      "email": "[email protected]",
      "affiliation": "Third Author Organization",
      "orcid": "ORCID"
    }
  ]
}

Or add affiliation and ORCID to the current format. ORCID can be an optional field:

{
"contact" : "Firstname Lastname (Affiliation | email | ORCID); Firstname Lastname (Affiliation | email | ORCID); "
}

Need something that I can parse..
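
To illustrate that the single-string variant above is parseable, here is a minimal sketch of a parser for "Firstname Lastname (Affiliation | email | ORCID); ..." with the ORCID optional. The `Contact` class and `parse_contacts` function are hypothetical names, and the sketch assumes that names and affiliations never contain parentheses, `|` or `;`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One entry looks like: "Firstname Lastname (Affiliation | email | ORCID)"
ENTRY_PATTERN = re.compile(r"^(?P<name>[^()]+?)\s*\((?P<fields>[^()]*)\)$")


@dataclass
class Contact:
    name: str
    affiliation: str
    email: str
    orcid: Optional[str] = None


def parse_contacts(raw: str) -> list[Contact]:
    """Split a ';'-separated contact string into structured entries."""
    contacts = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate the trailing "; " shown in the proposal
        match = ENTRY_PATTERN.match(entry)
        if match is None:
            raise ValueError(f"Cannot parse contact entry: {entry!r}")
        fields = [field.strip() for field in match.group("fields").split("|")]
        if len(fields) not in (2, 3):
            raise ValueError(f"Expected 2 or 3 '|'-separated fields: {entry!r}")
        # The ORCID is optional, so a two-field entry is accepted.
        orcid = fields[2] if len(fields) == 3 and fields[2] else None
        contacts.append(
            Contact(match.group("name").strip(), fields[0], fields[1], orcid)
        )
    return contacts
```

The structured list-of-objects form avoids these delimiter assumptions entirely, which is one argument for preferring it over the single string.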

So if the CVs can be updated for title and author fields, I will be able to get us a DOI very quick.

Also, as a quick fix for the three datasets published so far, you can send me the Title and Authors information directly via email and I can register DOIs for them and add them as a new doi attribute to CVs/input4MIPs_source_id.json via a PR.

@climate-dude

@jitendra-kumar

jitendra-kumar commented Feb 24, 2025

We also need dataset description, or a way to assemble them from CVs.

This was the description in the past input4MIPs. @durack1 can you provide an updated description to use?

Ex. from https://www.wdc-climate.de/ui/cmip6?input=input4MIPs.CMIP6.RFMIP.UColorado.UColorado-RFMIP-0-4

CMIP6 Forcing Datasets (input4MIPs).
These data include all datasets published for 'input4MIPs.CMIP6.RFMIP.UColorado.UColorado-RFMIP-0-4' with the full Data Reference Syntax following the template 'activity_id.mip_era.target_mip.institution_id.source_id.realm.frequency.variable_id.grid_label'.

The model UColorado-RFMIP-0-4 (UColorado-RFMIP-0-4) was run by the UColorado (UColorado) in native nominal resolutions: unknown.

Project: The forcing datasets (and boundary conditions) needed for CMIP6 experiments are being prepared by a number of different experts. Initially many of these datasets may only be available from those experts, but over time as part of the 'input4MIPs' activity most of them will be archived by PCMDI and served by the Earth System Grid Federation (https://esgf-node.llnl.gov/search/input4mips/ ). More information is available in the living document: http://goo.gl/r8up31 .
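
The Data Reference Syntax template quoted in the CMIP6-era description above can be used to split a full ESGF dataset identifier into named components. A minimal sketch, assuming that no component contains a "." (source IDs in this thread use hyphens instead); the function names are hypothetical.

```python
# Template as quoted in the CMIP6-era description.
DRS_TEMPLATE = (
    "activity_id.mip_era.target_mip.institution_id.source_id."
    "realm.frequency.variable_id.grid_label"
)
DRS_FIELDS = DRS_TEMPLATE.split(".")


def parse_dataset_id(dataset_id: str) -> dict[str, str]:
    """Map each '.'-separated part of a dataset ID to its DRS field name."""
    parts = dataset_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(
            f"Expected {len(DRS_FIELDS)} components, got {len(parts)}: {dataset_id!r}"
        )
    return dict(zip(DRS_FIELDS, parts))


def source_level_id(dataset_id: str) -> str:
    """Keep only the components up to and including source_id."""
    parsed = parse_dataset_id(dataset_id)
    keep = DRS_FIELDS[: DRS_FIELDS.index("source_id") + 1]
    return ".".join(parsed[name] for name in keep)
```

For the ESGF index example mentioned earlier in the thread, `source_level_id` gives a five-component identifier, which is one candidate for a per-source-ID title.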

@znichollscr
Collaborator

znichollscr commented Feb 24, 2025

However, the published datasets are not reflected in the main branch of this repo. Can the appropriate branches/forks with the CMIP7 mip_era be merged into main?

Done now. To set your expectations for future: in general, the lag is 1-2 days. In this case it was a week because I was on leave. We're on the fence about whether this should be a fully automated process or not (see e.g. prototype here, which we decided not to pursue right now #115).

Before continuing, a bit of background on the rationale for this, which will help explain the recommendations below.

Background/justification for the below

Some datasets are already minting their own DOIs (CEDS, GHGs, aerosol simple plumes). Hence, we can't create a workflow that only works if the DOI comes from this process (which is why the proposed quick fix isn't the way to go, we can't put a DOI in for each source ID as some providers mint DOIs at a higher level of resolution than source IDs).

This is a fallback, so I would keep it simple. Let's mint DOIs at the mip-era - source ID level. The combination of MIP era and source IDs should be unique, so this is a totally fine way that will let us keep track of things without having to mint heaps of DOIs. This then makes the title easy to create and easy for a human to read.

I would also keep the description short. We have lots of information already. Let's not duplicate it (which would then force us to think about how to update it). Rather, let's link back to existing information sources that we already have as much as possible.

Re the authors, I had wondered when our unstructured contacts would bite us. Looks like that day has come. I'll fix that now.

Recommendations

  • For the title: make it equal to "[mip_era].[source ID]" e.g. "CMIP7.SOLARIS-HEPPA-1-4-6" or "CMIP6Plus.CR-CMIP-GHG-1-3-0"
  • Authors: I will add a field to the source IDs so you have something to parse (working in Add author field #195)

Dataset description (I would start with this, and update once we know what information users who hit the DOI are actually interested in rather than spending lots of time trying to anticipate this in advance because I think we really don't know what context someone who lands on this DOI page will already have):

Datasets for input4MIPs provided with the source ID: [source-id]. These datasets are in the [dataset-category] category.

This is simply a DOI.
More information about the dataset can be found in the table/database available here: https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP6Plus.html.
If you are a modeller, these docs will likely also be helpful: https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/dataset-overviews/ (from here, you can look at the dataset(s) of interest to you).
The data itself is available from the ESGF at https://aims2.llnl.gov/search?project=input4MIPs&versionType=all&activeFacets=%7B%22source_id%22%3A%22[source-id]%22%7D.
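
The title and description recommendations above could be assembled mechanically. A minimal sketch: the helper names are hypothetical, the dataset-category sentence is left out, and the per-era table URL is assumed to follow the same pattern as the CMIP6Plus link in the template.

```python
import json
from urllib.parse import quote

# Documentation root taken from the URLs in the description template.
DOCS_ROOT = "https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable"


def make_title(mip_era: str, source_id: str) -> str:
    """Build the recommended '[mip_era].[source ID]' title."""
    return f"{mip_era}.{source_id}"


def make_esgf_search_url(source_id: str) -> str:
    """Build the ESGF search link with a percent-encoded source_id facet."""
    facets = json.dumps({"source_id": source_id}, separators=(",", ":"))
    return (
        "https://aims2.llnl.gov/search?project=input4MIPs&versionType=all"
        f"&activeFacets={quote(facets, safe='')}"
    )


def make_description(mip_era: str, source_id: str) -> str:
    """Fill in the short description template for one source ID."""
    # Assumption: each MIP era has a table following the CMIP6Plus URL pattern.
    table_url = f"{DOCS_ROOT}/database-views/input4MIPs_source-id_{mip_era}.html"
    lines = [
        f"Datasets for input4MIPs provided with the source ID: {source_id}.",
        "",
        "This is simply a DOI.",
        "More information about the dataset can be found in the "
        f"table/database available here: {table_url}.",
        "If you are a modeller, these docs will likely also be helpful: "
        f"{DOCS_ROOT}/dataset-overviews/.",
        f"The data itself is available from the ESGF at {make_esgf_search_url(source_id)}.",
    ]
    return "\n".join(lines)
```

Keeping the description this short means only the source ID and MIP era are needed as inputs, so a draft DOI record could be populated before any data is published.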

@jitendra-kumar

@znichollscr : where should I look for the dataset_category for any given source_id ?

@znichollscr
Collaborator

Ah that's a good point: it's not a 1:1 mapping. You could scrape it from the ESGF data and then just put a list of values. Honestly, though, I would probably just drop that sentence from the description.

@jitendra-kumar

Also, is there (or will there be) a CMIP7 table/database, equivalent to the CMIP6Plus URL in your example above?
Expecting this URL for CMIP7: https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP7.html

@znichollscr
Collaborator

Good pick up. I'll add one tomorrow.

@durack1
Contributor Author

durack1 commented Feb 25, 2025

Also, is there (or will there be) a CMIP7 table/database, equivalent to the CMIP6Plus URL in your example above?
Expecting this URL for CMIP7: https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP7.html

@jitendra-kumar just circling on this, we already have one CMIP7 dataset published into ESGF, SOLARIS-HEPPA-CMIP-4-6, and so we've missed being able to assign a DOI within the published files.

We have volcanic, land use, greenhouse gas, anthropogenic emissions, biomass burning, and several other data that are being prepared right now, so what're the chances we can get ~10 DOIs issued now so these can be written to the files pre-publication?

@jitendra-kumar

@znichollscr :

  • I think including dataset_category would be useful, since it maps well to tables such as https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP6Plus.html. I can scrape that information from ESGF, but I would very much like to get the information needed for the DOI registry from the CVs (assuming we may do this before the data is published, when there may not be anything in ESGF to scrape from).
  • Can "activity_id": "input4MIPs" be included in the source_id CVs?
  • For the new dataset SOLARIS-HEPPA-CMIP-4-6 that has been published, the CVs have "mip_era": "CMIP6Plus", but it should be "mip_era": "CMIP7".

@jitendra-kumar

@durack1: I need the few additional pieces of information from my last comment above added/updated in the CVs. Otherwise, send me the list of source_ids that need DOIs minted and I should be able to do that pretty quickly.

I have a DOI minted for SOLARIS-HEPPA-CMIP-4-6 but am waiting on replication/publication of the data to the ORNL node. You can consider adding the DOI (10.25981/ESGF.input4MIPs.CMIP7/2522675) to the NetCDF file and republishing the data.

@znichollscr
Collaborator

znichollscr commented Feb 25, 2025

@jitendra-kumar thinking about this and seeing how it is evolving. Can you tell us what you need? (I.e. give me a schema to fill out, and I'll auto-generate it. Having you parse the CVs isn't going to be efficient I don't think). Including everything in the CVs is going to be a pain for us and likely not helpful. It's going to be much faster for me to just compile what you need than for us to pollute our CVs in the way we're going.

  • For the new dataset SOLARIS-HEPPA-CMIP-4-6 that has been published, the CVs have "mip_era": "CMIP6Plus", but it should be "mip_era": "CMIP7".

Thanks, good pick up.

Otherwise, send me the list of source_ids that need DOIs minted and I should be able to do that pretty quickly

Given we can update the metadata after the DOI is minted, can we just mint the DOIs and we'll update the metadata afterwards?

@jitendra-kumar

You can auto-fill the schema for input4MIPs but we won't be able to do that with other projects. We need to be able to do this for projects other than input4MIPs and will be using CVs as our source of information. CVs already has most of what we need. We prefer to not have to hard code activity_id but I can. If dataset_category is not a part of your CVs, I will simply drop it. So we can do without those fields. Other than that we are good to go and nothing else is needed beyond what you already have.

For minting DOIs, yes we can update the metadata later, but need some minimal information (dataset name, authors -- which can be edited later) to populate the draft. As long as the dataset is in source_id CVs, I can get what I need. For datasets @durack1 noted above, I just need to know the source_id/dataset name that I can look for in input4MIPs_source_id.json.

@znichollscr
Collaborator

You can auto-fill the schema for input4MIPs but we won't be able to do that with other projects. We need to be able to do this for projects other than input4MIPs and will be using CVs as our source of information

Let me make a demo. I think we're on the same page. My point is just more that, we don't need to have this all in the source ID CVs, we can grab it from a few places. Anyway, I'll do a demo then we can see.

For datasets @durack1 noted above, I just need to know the source_id/dataset name that I can look for in input4MIPs_source_id.json

Grab the following:

  • UOEXETER-CMIP-1-3-1
  • UofMD-landState-3-0
  • DRES-CMIP-BB4CMIP7-1-0

They're obviously old, but the information will be mostly correct and applicable for the next version. As long as we can update metadata, then they're the perfect starting point

@znichollscr
Collaborator

I have a DOI minted for SOLARIS-HEPPA-CMIP-4-6 but am waiting on replication/publication of the data to the ORNL node. You can consider adding the DOI (10.25981/ESGF.input4MIPs.CMIP7/2522675) to the NetCDF file and republishing the data.

@jitendra-kumar one other question. Can you make this for SOLARIS-HEPPA-CMIP-4-7 ? We don't republish data as far as I understand (that can create complete confusion so we either publish something new or don't change the data).

@jitendra-kumar

jitendra-kumar commented Feb 25, 2025

@sashakames @climate-dude may be able to comment on best way to retract/republish.

But yes I can change SOLARIS-HEPPA-CMIP-4-6 to SOLARIS-HEPPA-CMIP-4-7 in the DOI metadata which is still in draft form.

@znichollscr
Collaborator

znichollscr commented Feb 25, 2025

@sashakames @climate-dude may be able to comment on best way to retract/republish

Thanks. We're comfortable with the process and work with Sasha a lot already. We just don't do it to avoid confusion. For input4MIPs, it's either: new version or nothing to avoid there being multiple versions of the same thing out there.

@znichollscr
Collaborator

But yes I can change SOLARIS-HEPPA-CMIP-4-6 to SOLARIS-HEPPA-CMIP-4-7 in the DOI metadata which is still in draft form

Great to know, thanks

@znichollscr
Collaborator

So, thinking about it more, this is the process I imagine we'll end up with:

  • start a new MR for a new source ID (e.g. update createDRS.py to catch issues (#37) #129)
  • use that to mint a DOI in draft form
  • pass the DOI to the data creator so they can include it in their file
  • receive created files from the data provider
  • check that the DOI has been entered correctly
  • publish the files on ESGF
  • publish the DOI
  • celebrate
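
The "check that the DOI has been entered correctly" step in the process above could be partly automated. A minimal sketch, assuming the file's global attributes have already been read into a dictionary (for example from `ncdump -h` output or a NetCDF library) and that the DOI is stored under an attribute named `doi`; both the attribute name and the function names are assumptions. DOI names are case-insensitive, so the comparison ignores case and any resolver prefix.

```python
def normalise_doi(value: str) -> str:
    """Strip a resolver or 'doi:' prefix and lower-case the DOI."""
    value = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value.lower()


def doi_entered_correctly(
    global_attributes: dict, expected_doi: str, attribute: str = "doi"
) -> bool:
    """Compare the DOI written in a file with the one that was minted."""
    found = global_attributes.get(attribute)
    if not isinstance(found, str):
        return False  # attribute missing or not a string
    return normalise_doi(found) == normalise_doi(expected_doi)
```

A check like this would sit between receiving the files and publishing them on ESGF, so a mistyped DOI is caught while the DOI record is still in draft form.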

@sashakames

SOLARIS-HEPPA-CMIP-4-7 ?

Is this new data or a replica of existing data? If the files do not change we don't want to cut a new source_id, then you would need to start over, or will that happen at LLNL?
We are still working on staging the data at ORNL for publication, have run into a permissions issue with the destination path.

@znichollscr
Collaborator

znichollscr commented Feb 25, 2025

If the files do not change we don't want to cut a new source_id, then you would need to start over, or will that happen at LLNL?

Sorry @sashakames, shouldn't have pulled you in with no context. Anyway, now you're here: we would only cut a new source ID if we create new files. The question is whether we do that or not, and on that we haven't decided yet (is it worth creating new files, just to get a DOI attribute, we'll probably just ask the data provider, once we've worked out the various issues here).

@sashakames

Thanks. If there are new files, we need to decide whether to do the original publish at ORNL or LLNL. Also, we may want to populate a citation_url with the landing page and an xlink record with .json so Metagrid points at something.
A GH pages site would work great for hosting those.

@znichollscr
Collaborator

Also, we may want to populate a citation_url with the landing page and an xlink record with .json so Metagrid points at something.

Sounds good, but I don't fully follow. Is this something that we do on the ESGF side or that you're suggesting we put in the file?

@znichollscr
Collaborator

Let me make a demo

Trying to keep track of things, the demo is in #200

@durack1
Contributor Author

durack1 commented Feb 25, 2025

  • For the new dataset SOLARIS-HEPPA-CMIP-4-6 that has been published, the CVs have "mip_era": "CMIP6Plus", but it should be "mip_era": "CMIP7".

@jitendra-kumar sorry just catching up, the SOLARIS-HEPPA-CMIP-4-6 is our first mip_era=CMIP7 or "final" dataset that will be used in the production CMIP7 simulations, see

    "SOLARIS-HEPPA-CMIP-4-6":{
        "authors": [
            {
                "name": "Bernd Funke",
                "email": "[email protected]",
                "affiliations": [
                    "Instituto de Astrofísica de Andalucía, CSIC, Granada, Spain"
                ],
                "orcid": "0000-0003-0462-4702"
            }
        ],
        "contact":"[email protected]",
        "further_info_url":"http://solarisheppa.geomar.de/cmip7",
        "institution_id":"SOLARIS-HEPPA",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP7",
        "target_mip":"CMIP",
        "source_version":"4.6"
    },

In the ESGF SOLR index, and in the file:

$ ncdump -h input4MIPs/CMIP7/CMIP/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-6/atmos/mon/multiple/gn/v20250219/multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-6_gn_185001-202312.nc | grep mip_era
		:mip_era = "CMIP7" ;

Previous versions were prototype mip_era=CMIP6Plus data, e.g.,

    "SOLARIS-HEPPA-CMIP-4-5":{
        "authors": [
            {
                "name": "Bernd Funke",
                "email": "[email protected]",
                "affiliations": [
                    "Instituto de Astrofísica de Andalucía, CSIC, Granada, Spain"
                ],
                "orcid": "0000-0003-0462-4702"
            }
        ],
        "contact":"[email protected]",
        "further_info_url":"http://solarisheppa.geomar.de/cmip7",
        "institution_id":"SOLARIS-HEPPA",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP6Plus",
        "target_mip":"CMIP",
        "source_version":"4.5"
    },

EDIT: ah, ok, so there was a metadata issue in our CVs, https://github.com/PCMDI/input4MIPs_CVs/pull/198/files

@jitendra-kumar

jitendra-kumar commented Feb 27, 2025

You can auto-fill the schema for input4MIPs but we won't be able to do that with other projects. We need to be able to do this for projects other than input4MIPs and will be using CVs as our source of information

Let me make a demo. I think we're on the same page. My point is just more that, we don't need to have this all in the source ID CVs, we can grab it from a few places. Anyway, I'll do a demo then we can see.

For datasets @durack1 noted above, I just need to know the source_id/dataset name that I can look for in input4MIPs_source_id.json

Grab the following:

  • UOEXETER-CMIP-1-3-1
  • UofMD-landState-3-0
  • DRES-CMIP-BB4CMIP7-1-0

They're obviously old, but the information will be mostly correct and applicable for the next version. As long as we can update metadata, then they're the perfect starting point

@znichollscr

Here are DOIs for these three datasets. We would need a better way to track these. Create a separate markdown table? I am having to add to the CVs, but these are only placeholder old source IDs.

| title | doi |
| --- | --- |
| input4MIPs.CMIP7.CMIP.SOLARIS-HEPPA.SOLARIS-HEPPA-CMIP-4-6 | 10.25981/ESGF.input4MIPs.CMIP7/2522675 |
| input4MIPs.CMIP7.CMIP.uoexeter.UOEXETER-CMIP-1-3-1 | 10.25981/ESGF.input4MIPs.CMIP7/2522673 |
| input4MIPs.CMIP7.CMIP.UofMD.UofMD-landState-3-0 | 10.25981/ESGF.input4MIPs.CMIP7/2521499 |
| input4MIPs.CMIP7.CMIP.DRES.DRES-CMIP-BB4CMIP7-1-0 | 10.25981/ESGF.input4MIPs.CMIP7/2524040 |
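
On the question of tracking these in a separate markdown table: one option is to keep the title-to-DOI mapping as data and render the table from it. A minimal sketch, with a hypothetical function name, that assumes neither titles nor DOIs contain a `|` character.

```python
def to_markdown_table(dois: dict[str, str]) -> str:
    """Render a title -> DOI mapping as a two-column markdown table."""
    lines = ["| title | doi |", "| --- | --- |"]
    for title, doi in sorted(dois.items()):
        lines.append(f"| {title} | {doi} |")
    return "\n".join(lines)


# Two of the entries listed above, used as sample input.
minted = {
    "input4MIPs.CMIP7.CMIP.UofMD.UofMD-landState-3-0": "10.25981/ESGF.input4MIPs.CMIP7/2521499",
    "input4MIPs.CMIP7.CMIP.DRES.DRES-CMIP-BB4CMIP7-1-0": "10.25981/ESGF.input4MIPs.CMIP7/2524040",
}
print(to_markdown_table(minted))
```

Rows are sorted by title so that regenerating the table after adding a DOI produces a small, predictable diff.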

@znichollscr
Collaborator

We would need a better way to track these. Create a separate markdown table? I am having to add to the CVs, but these are only placeholder old source IDs.

We'll write them in the files ideally, then they'll be captured and appear in various places e.g. https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP7.html and I can also pull them out so they appear in other auto-generated places e.g. https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/dataset-overviews/anthropogenic-slcf-co2-emissions/

They don't need to go in the CVs: we will capture them in other ways.

@jitendra-kumar

Sure, that's fine. Let me know as and when you have more datasets ready for DOI minting, and when there is updated information for these three.

@znichollscr
Collaborator

Cool I fixed this up in #205.

10.25981/ESGF.input4MIPs.CMIP7/2522675 should be updated to

    "SOLARIS-HEPPA-CMIP-4-6":{
        "activity_id": "input4MIPs",
        "authors": [
            {
                "name": "Bernd Funke",
                "email": "[email protected]",
                "affiliations": [
                    "Instituto de Astrofísica de Andalucía, CSIC, Granada, Spain"
                ],
                "orcid": "0000-0003-0462-4702"
            }
        ],
        "contact":"[email protected]",
        "dataset_category":["solar"],
        "further_info_url":"http://solarisheppa.geomar.de/cmip7",
        "institution_id":"SOLARIS-HEPPA",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP7",
        "target_mip":"CMIP",
        "source_version":"4.6"
    },

10.25981/ESGF.input4MIPs.CMIP7/2522673 should be updated to

    "UOEXETER-CMIP-2-0-0":{
        "activity_id": "input4MIPs",
        "authors": [
            {
                "name": "Thomas Aubry",
                "email": "[email protected]",
                "affiliations": [
                    "Faculty of Environment, Science and Economy, University of Exeter, Exeter, EX4 4QF, UK"
                ],
                "orcid": "0000-0002-9275-4293"
            }
        ],
        "contact":"[email protected]",
        "dataset_category":["aerosolProperties", "emissions"],
        "further_info_url":"https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/latest/dataset-overviews/stratospheric-volcanic-so2-emissions-aod/",
        "institution_id":"uoexeter",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP7",
        "source_version":"2.0.0"
    },

Thanks!

@znichollscr
Collaborator

From #197, this will be the entry for 10.25981/ESGF.input4MIPs.CMIP7/2524040

"DRES-CMIP-BB4CMIP7-2-0":{
        "activity_id": "input4MIPs",
        "authors": [
            {
                "name": "Margreet van Marle",
                "email": "[email protected]",
                "affiliations": [
                    "Deltares, Delft, the Netherlands"
                ],
                "orcid": "0000-0001-7473-5550"
            },
            {
                "name": "Guido van der Werf",
                "email": "[email protected]",
                "affiliations": [
                    "Wageningen University and Research, Meteorology and Air Quality, Wageningen, Netherlands"
                ],
                "orcid": "0000-0001-9042-8630"
            }
        ],
        "contact":"[email protected], [email protected]",
        "dataset_category":["emissions"],
        "further_info_url":"https://www.globalfiredata.org",
        "institution_id":"DRES",
        "license_id":"CC BY 4.0",
        "mip_era":"CMIP7",
        "source_version":"2.0"
    },

@durack1
Contributor Author

durack1 commented Mar 5, 2025

Just looping back from an email thread, we currently have the status below (I will update this table to reflect the current status) - this HTML view is the latest here

@jitendra-kumar @znichollscr ping

CMIP7 CMIP/DECK data

| ESGF publish status | title | doi |
| --- | --- | --- |
| published | input4MIPs.CMIP7.CMIP.CR.CR-CMIP-1-0-0 | 10.5281/zenodo.14892947 |
| published | input4MIPs.CMIP7.CMIP.DRES.DRES-CMIP-BB4CMIP7-1-0 | 10.25981/ESGF.input4MIPs.CMIP7/2524040 |
| published | input4MIPs.CMIP7.CMIP.SOLARIS-HEPPA.SOLARIS-HEPPA-CMIP-4-6 | 10.25981/ESGF.input4MIPs.CMIP7/2522675 |
| published | input4MIPs.CMIP7.CMIP.uoexeter.UOEXETER-CMIP-2-0-0 | 10.25981/ESGF.input4MIPs.CMIP7/2522673 |
| expected | CEDS-CMIP-2025-xx-yy, CEDS-CMIP-2025-xx-yy-supplemental | - |
| expected | CEDS-CMIP-xx-provisional + supp | - |
| expected | FZJ-x-y | - |
| expected | ImperialCollege-x-y | - |
| expected | PCMDI-AMIP-1-1-10 | - |
| expected | UofMD-landState-x-y | - |
| expected | somePopulationData-x-y | - |

CMIP7 other non-CMIP data

| ESGF publish status | title | doi |
| --- | --- | --- |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-1 | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-1-constant | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-1-decreasing | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-1-increasing | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-2 | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-2-constant | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-2-decreasing | - |
| published | input4MIPs.CMIP6Plus.AerChemMIP2.UCLA.UCLA-1-0-2-increasing | - |

@jitendra-kumar

@durack1 @znichollscr @climate-dude

I had not realized that the DRES-CMIP-BB4CMIP7-2-0, UofMD-landState-3-1 and UOEXETER-CMIP-2-0-0 were already published to finalize/activate the respective DOIs. Is there a way to be in the loop so I know when there's an action needed on my part? Also, DOIs don't appear to be included in the NetCDF global attributes in the published files.

For minting DOIs for the list below:

  • CEDS-CMIP-2025-xx-yy, CEDS-CMIP-2025-xx-yy-supplemental -- will these need individual DOIs or a single one?
  • CEDS-CMIP-xx-provisional + supp -- there is nothing similar in the existing CVs to pull placeholder information from
  • FZJ-x-y -- not sure if and what dataset in existing CVs I can use to pull placeholder information for this
  • ImperialCollege-x-y -- not sure if and what dataset in existing CVs I can use to pull placeholder information for this
  • PCMDI-AMIP-1-1-10 -- I assume I can use the information from PCMDI-AMIP-1-1-9 in current CVs for placeholder information
  • UofMD-landState-x-y -- this was minted already 10.25981/ESGF.input4MIPs.CMIP7/2521499 and is published but is not reflected in the table https://input4mips-cvs.readthedocs.io/en/latest/database-views/input4MIPs_source-id_CMIP7.html

@durack1
Contributor Author

durack1 commented Mar 13, 2025

Is there a way to be in the loop so I know when there's an action needed

Yep, the easiest would be to watch this repo for "Releases". We're trying to behave ourselves to make sure that any data that is published into ESGF is followed promptly with a webpage/repo update, and a release.

@znichollscr
Collaborator

znichollscr commented Mar 14, 2025

I had not realized that the DRES-CMIP-BB4CMIP7-2-0, UofMD-landState-3-1 and UOEXETER-CMIP-2-0-0 were already published to finalize/activate the respective DOIs. Is there a way to be in the loop so I know when there's an action needed on my part? Also, DOIs don't appear to be included in the NetCDF global attributes in the published files

@jitendra-kumar that's our fault, sorry. We've been so focussed on getting data sets out that minting DOIs has fallen off our plate.

Yep, the easiest would be to watch this repo for "Releases". We're trying to behave ourselves to make sure that any data that is published into ESGF is followed promptly with a webpage/repo update, and a release

@durack1 as a note this isn't ideal, because by the time this happens it's too late to mint a DOI that can actually go in the file.

Summarising what we need next:

  • CEDS-CMIP*: no DOI needed, CEDS mints their own DOIs
  • FZJ-x-y: please mint a DOI based on the information in Add FZJ-1-0 #186
  • ImperialCollege-x-y: forget about this for now, I'll ping you when we are more sure about what this is going to be
  • PCMDI-AMIP-1-1-10: please use the information in Add PCMDI-AMIP-1-1-10 source ID and data #219

Other notes:

this was minted already 10.25981/ESGF.input4MIPs.CMIP7/2521499 and is published but is not reflected in the table input4mips-cvs.readthedocs.io/en/latest/database-views/input4MIPs_source-id_CMIP7.html

Correct, it will not be reflected until #213 is merged (here is the preview https://input4mips-controlled-vocabularies-cvs--213.org.readthedocs.build/en/213/database-views/input4MIPs_source-id_CMIP7.html).

For what it's worth, https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2521499 does not resolve for me so I think there is something wrong

@jitendra-kumar can you please also address the requests made here: #177 (comment)

@jitendra-kumar

@znichollscr

Regarding #177 comment:

  • After we complete the DOI registrations, OSTI auto-generates a page in their OSTI Data Explorer (e.g. https://www.osti.gov/dataexplorer/biblio/dataset/2522675), and a similar entry appears in DataCite Commons (https://commons.datacite.org/doi.org/10.25981/esgf.input4mips.cmip7/2522675). I have checked with OSTI: they want data providers to provide the landing page URLs, and we cannot use the OSTI or DataCite pages as landing pages. We also have no control over those pages; we can only control what's in the description field.
  • OSTI allows HTML tags in the data description, and I have tried to move the data URL to the top of the description. The OSTI auto-generated pages are not rendering it correctly and I have made them aware of that. DataCite renders it better.

For now we will stick with linking to Metagrid; within ESGF we are discussing potential ways to offer better landing pages.

These two DOIs are finalized and will be active shortly.
DRES-CMIP-BB4CMIP7-2-0: https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2524040
UOEXETER-CMIP-2-0-0: https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2522673

UofMD-landState-3-1: https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2521499 is still in draft stage and is not finalized yet since this dataset is not in the CVs yet. I can do that once that appears, I assume with merge of #213 .

@sashakames

Then back to something we were discussing, autogenerate the landing page from the CV info and host in GH pgs. Autogenerate the MG links as part of the page generation using the url syntax via the "Copy Search" feature.

@znichollscr
Collaborator

GH pgs

read the docs/via mkdocs, but yes, I agree

UofMD-landState-3-1: doi.org/10.25981/ESGF.input4MIPs.CMIP7/2521499 is still in draft stage and is not finalized yet since this dataset is not in the CVs yet. I can do that once that appears, I assume with merge of #213 .

Great, thanks

@znichollscr
Collaborator

@jitendra-kumar can you try pointing the DOI entries' landing page to a URL of the form https://input4mips-cvs.readthedocs.io/en/stable/source-id-landing-pages/{source_id}/? e.g. https://input4mips-cvs.readthedocs.io/en/stable/source-id-landing-pages/CR-CMIP-1-0-0/. That should provide us with a landing page that has more metadata than the raw ESGF page.
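
The landing-page URL pattern above is simple enough to generate per source ID. A minimal sketch with a hypothetical function name; it percent-encodes the source ID defensively, although the source IDs in this thread only use characters that pass through unchanged.

```python
from urllib.parse import quote

# URL pattern as given in the comment above.
LANDING_PAGE_TEMPLATE = (
    "https://input4mips-cvs.readthedocs.io/en/stable/"
    "source-id-landing-pages/{source_id}/"
)


def landing_page_url(source_id: str) -> str:
    """Build the proposed landing-page URL for one source ID."""
    return LANDING_PAGE_TEMPLATE.format(source_id=quote(source_id, safe=""))


# Source IDs mentioned in this thread, used as sample input.
for sid in ("CR-CMIP-1-0-0", "UOEXETER-CMIP-2-0-0", "DRES-CMIP-BB4CMIP7-2-0"):
    print(sid, landing_page_url(sid))
```

Whether a page exists at each generated URL depends on the docs build, so the output would still need checking before it goes into a DOI record.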

@znichollscr
Collaborator

@jitendra-kumar as an FYI, UofMD-landState-3-1-1 (#229) used the same DOI as UofMD-landState-3-1 so no need for a new DOI (reusing a DOI isn't ideal, but in this case it's ok because the difference is so small)
