DOE OSTI DOIs for input4MIPs #177
FYI to self, 210 DOIs were issued by the CMIP6-era citation service for input4MIPs, so maybe we need to bump up the ~100 number we've discussed - see https://www.wdc-climate.de/ui/statistics?type=cmip6_doi_registration. Also relevant is the CMIP6 Data Citation and Long-Term Archival wiki - https://redmine.dkrz.de/projects/cmip6-lta-and-data-citation/wiki |
Hi @jitendra-kumar. Just circling back on this task: is there any progress to report? We have a project meeting tomorrow, so I was keen to update the data providers about the status and timing |
Here's a summary of fields we need information for to register with OSTI. Much (but not all) of this information exists within the JSONs in this repo, so we can pull the information together from the existing JSONs and create a new JSON with everything needed to register a DOI for each dataset. Among the fields needed:
Product Description:
Dataset Location:
|
@jitendra-kumar, that's great. What is the best/easiest format for this info to be collated, considering this first pass is going to be manual copy-and-paste — text files or another format? |
We should put the information together in a JSON, and that would allow us to automate the process at a later date. And even for the short term, I can extract everything needed from that for manual entry. |
We will need to create .html landing pages. The .json could be used to render those. We would then need to put together a template, then push those pages to gh-pages. This could be done with GitHub Actions. |
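For what it's worth, a minimal sketch of that rendering step (Python; assumes one JSON file per dataset in a hypothetical doi_metadata/ directory and a stub HTML template, with field names following the schema proposal below):

import json
from pathlib import Path
from string import Template

# Stub template; a real one would live in the repo and be fleshed out properly
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$description</p><p>Authors: $authors</p></body></html>"
)

def render_landing_pages(metadata_dir: Path, output_dir: Path) -> None:
    # One landing page per dataset JSON; a GitHub Actions step would then push output_dir to gh-pages
    output_dir.mkdir(parents=True, exist_ok=True)
    for metadata_file in sorted(metadata_dir.glob("*.json")):
        record = json.loads(metadata_file.read_text())
        page = PAGE.substitute(
            title=record["dataset_title"],
            description=record["description"],
            authors=", ".join(f"{a['first_name']} {a['last_name']}" for a in record["authors"]),
        )
        (output_dir / f"{metadata_file.stem}.html").write_text(page)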
Do you have any ideas for the schema @jitendra-kumar? E.g. do certain fields need to be strings/booleans/lists etc.? I think that is the key. Once we have the schema, writing data to match it is relatively trivial. Even just something like the below.
Schema proposal:
from attrs import define
@define
class Author:
first_name: str
last_name: str
orcid: str # would we also validate this, probably a good idea if easy
affiliation: str
affiliation_ror: str | None # optional for anyone whose institute isn't registered
@define
class Product:
dataset_title: str
authors: list[Author]
related_dois: list[str] # should validate that these are DOIs
originating_research_organisation: str # I find this field a bit weird, given most things have multiple authors therefore source organisations and the authors have affiliations anyway
publication_date: str # YYYY-MM-DD I guess?
sponsoring_organisation: str # as above re needing multiple and info already being in author info. Also unclear to me what the difference from the other orgs is so I would suggest making this optional if we can
keywords: list[str]
geolocation: tuple[float, float] # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can
description: str
dataset_location: tuple[float, float] # what do we put here? Lat/lon co-ords? Would suggest making this optional or dropping if we can. Or do I misunderstand this field?
@define
class Dataset:
url: str # validate this is a URL
extension: str
size: float # in bytes I guess? |
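To make the proposal concrete, here is a sketch of filling the schema and dumping it to JSON (all values are placeholders, not real dataset metadata; attrs.asdict handles the nested Author instances):

import json
from attrs import asdict

product = Product(
    dataset_title="Placeholder input4MIPs dataset",
    authors=[
        Author(
            first_name="Jane",
            last_name="Doe",
            orcid="0000-0000-0000-0000",
            affiliation="Example Institute",
            affiliation_ror=None,
        )
    ],
    related_dois=[],
    originating_research_organisation="Example Institute",
    publication_date="2025-01-01",
    sponsoring_organisation="Example Funder",
    keywords=["input4MIPs", "CMIP7"],
    geolocation=(0.0, 0.0),
    description="Short description that links back to existing documentation.",
    dataset_location=(0.0, 0.0),
)
# Serialise for whatever the OSTI registration step ends up needing
print(json.dumps(asdict(product), indent=2))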
@znichollscr working on this schema to be consistent with what OSTI wants. Will have something to share soon. |
@jitendra-kumar just FYI, our first CMIP7 final dataset has just been published with no DOI (see here). So if we have a very quick solution to implement, now is the time! |
@durack1 I looked at the three published datasets and can get the published metadata from the ESGF catalog. I am also able to extract most of the fields necessary to register the DOI from the input4MIPs CVs (in this repository). However, the published datasets are not reflected in the main branch of this repository; can the appropriate branches/forks with the CMIP7 mip_era be merged into main? Two outstanding attributes are still needed --
Or add affiliation and ORCID to the current format. ORCID can be an optional field:
Need something that I can parse. So if the CVs can be updated for title and author fields, I will be able to get us a DOI very quickly. Also, as a quick fix for these three datasets published so far, you can send me Title and Authors information directly via email and I can register DOIs for them and add them as a new |
We also need a dataset description, or a way to assemble one from the CVs. This was the description used for past input4MIPs datasets. @durack1 can you provide an updated description to use? E.g. from https://www.wdc-climate.de/ui/cmip6?input=input4MIPs.CMIP6.RFMIP.UColorado.UColorado-RFMIP-0-4
|
Done now. To set your expectations for the future: in general, the lag is 1-2 days. In this case it was a week because I was on leave. We're on the fence about whether this should be a fully automated process or not (see e.g. the prototype here, which we decided not to pursue right now #115). Before continuing, a bit of background on the rationale for this, which will help explain the recommendations below.
Background/justification for the below
Some datasets are already minting their own DOIs (CEDS, GHGs, aerosol simple plumes). Hence, we can't create a workflow that only works if the DOI comes from this process (which is why the proposed quick fix isn't the way to go: we can't put a DOI in for each source ID, as some providers mint DOIs at a higher level of resolution than source IDs). This is a fallback, so I would keep it simple. Let's mint DOIs at the mip-era / source-ID level. The combination of MIP era and source ID should be unique, so this is a totally fine way that will let us keep track of things without having to mint heaps of DOIs. This then makes the title easy to create and easy for a human to read. I would also keep the description short. We have lots of information already. Let's not duplicate it (which would then force us to think about how to update it). Rather, let's link back to existing information sources that we already have as much as possible. Re the authors, I had wondered when our unstructured contacts would bite us. Looks like that day has come. I'll fix that now.
Recommendations
Dataset description (I would start with this and update it once we know what information users who hit the DOI are actually interested in, rather than spending lots of time trying to anticipate this in advance, because I think we really don't know what context someone who lands on this DOI page will already have):
|
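As a sketch of how simple the title becomes at the mip-era / source-ID level (the exact wording here is only an assumption, not an agreed format):

def doi_title(mip_era: str, source_id: str) -> str:
    # Unique as long as the (mip_era, source_id) combination is unique
    return f"input4MIPs {mip_era} forcing dataset: {source_id}"

print(doi_title("CMIP7", "SOLARIS-HEPPA-CMIP-4-6"))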
@znichollscr : where should I look for the |
Ah, that's a good point: it's not a 1:1 mapping. You could scrape it from the ESGF data and then just put in a list of values. Honestly, I would probably just drop that sentence from the description. |
Also, is/will there be a CMIP table/database for CMIP7, equivalent to the CMIP6Plus URL in your example above? |
Good pick up. I'll add one tomorrow. |
@jitendra-kumar just circling back on this, we already have one CMIP7 dataset published into ESGF, SOLARIS-HEPPA-CMIP-4-6, and so we've missed being able to assign a DOI within the published files. We have volcanic, land use, greenhouse gas, anthropogenic emissions, biomass burning, and several other datasets being prepared right now, so what're the chances we can get ~10 DOIs issued now so these can be written to the files pre-publication? |
|
@durack1 : I need a few additional pieces of information added/updated in the CVs, per my last comment above. Else, send me the list of
I have a DOI minted for
@jitendra-kumar thinking about this and seeing how it is evolving: can you tell us what you need? (I.e. give me a schema to fill out, and I'll auto-generate it. Having you parse the CVs isn't going to be efficient, I don't think.) Including everything in the CVs is going to be a pain for us and likely not helpful. It's going to be much faster for me to just compile what you need than for us to pollute our CVs the way we're going.
Thanks, good pick up.
Given we can update the metadata after the DOI is minted, can we just mint the DOIs and we'll update the metadata afterwards? |
You can auto-fill the schema for input4MIPs but we won't be able to do that with other projects. We need to be able to do this for projects other than input4MIPs and will be using the CVs as our source of information. The CVs already have most of what we need. We prefer not to have to hard code
For minting DOIs, yes, we can update the metadata later, but we need some minimal information (dataset name, authors -- which can be edited later) to populate the draft. As long as the dataset is in |
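A rough sketch of pulling the minimal fields from the source-ID CVs, assuming a flat mapping from source ID to metadata with the key names shown in the entries later in this thread (the file path and title wording are assumptions):

import json
from pathlib import Path

def doi_draft_fields(cvs_path: Path, source_id: str) -> dict:
    # Extract the subset of CV metadata needed to populate an OSTI draft record
    entries = json.loads(cvs_path.read_text())
    entry = entries[source_id]
    return {
        "title": f"input4MIPs {entry['mip_era']} forcing dataset: {source_id}",
        "authors": entry.get("authors", []),
        "contact": entry.get("contact"),
        "further_info_url": entry.get("further_info_url"),
        "license": entry.get("license_id"),
    }

fields = doi_draft_fields(Path("CVs/input4MIPs_source_id.json"), "SOLARIS-HEPPA-CMIP-4-6")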
Let me make a demo. I think we're on the same page. My point is just that we don't need to have all of this in the source ID CVs; we can grab it from a few places. Anyway, I'll do a demo and then we can see.
Grab the following:
They're obviously old, but the information will be mostly correct and applicable for the next version. As long as we can update metadata, then they're the perfect starting point |
@jitendra-kumar one other question. Can you make this for SOLARIS-HEPPA-CMIP-4-7 ? We don't republish data as far as I understand (that can create complete confusion so we either publish something new or don't change the data). |
@sashakames @climate-dude may be able to comment on best way to retract/republish. But yes I can change SOLARIS-HEPPA-CMIP-4-6 to SOLARIS-HEPPA-CMIP-4-7 in the DOI metadata which is still in draft form. |
Thanks. We're comfortable with the process and work with Sasha a lot already. We just don't do it, to avoid confusion. For input4MIPs, it's either a new version or nothing, to avoid there being multiple versions of the same thing out there. |
Great to know, thanks |
So, thinking about it more, this is the process I imagine we'll end up with:
|
Is this new data or a replica of existing data? If the files do not change we don't want to cut a new source_id; otherwise you would need to start over -- or will that happen at LLNL? |
Sorry @sashakames, shouldn't have pulled you in with no context. Anyway, now you're here: we would only cut a new source ID if we create new files. The question is whether we do that or not, and on that we haven't decided yet (is it worth creating new files just to get a DOI attribute? We'll probably just ask the data provider once we've worked out the various issues here). |
Thanks. If there are new files, we need to decide whether to do the original publish at ORNL vs LLNL. Also, we may want to populate a citation_url with the landing page and an xlink record with the .json so Metagrid points at something. |
Sounds good, but I don't fully follow. Is this something that we do on the ESGF side or that you're suggesting we put in the file? |
Trying to keep track of things, the demo is in #200 |
@jitendra-kumar sorry, just catching up: SOLARIS-HEPPA-CMIP-4-6 is our first mip_era=CMIP7 or "final" dataset that will be used in the production CMIP7 simulations, see input4MIPs_CVs/CVs/input4MIPs_source_id.json lines 486 to 504 in 2a1439f.
In the ESGF SOLR index, and in the file:
$ ncdump -h input4MIPs/CMIP7/CMIP/SOLARIS-HEPPA/SOLARIS-HEPPA-CMIP-4-6/atmos/mon/multiple/gn/v20250219/multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-6_gn_185001-202312.nc | grep mip_era
:mip_era = "CMIP7" ;
Previous versions were prototype mip_era=CMIP6Plus data, e.g., input4MIPs_CVs/CVs/input4MIPs_source_id.json lines 467 to 485 in 2a1439f.
EDIT: ah, ok, so there was a metadata issue in our CVs, https://github.com/PCMDI/input4MIPs_CVs/pull/198/files |
Here are DOIs for these three datasets. We would need a better way to track these. Create a separate markdown table? I am having to add them to the CVs, but these are only placeholders with old source IDs.
|
We'll write them in the files ideally, then they'll be captured and appear in various places, e.g. https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/database-views/input4MIPs_source-id_CMIP7.html, and I can also pull them out so they appear in other auto-generated places, e.g. https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/stable/dataset-overviews/anthropogenic-slcf-co2-emissions/. They don't need to go in the CVs: we will capture them in other ways. |
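If we do end up writing the DOI into the files before publication, a minimal sketch with netCDF4 (the global attribute name "doi" is an assumption here, not a confirmed input4MIPs convention):

import netCDF4

def add_doi_attribute(path: str, doi: str) -> None:
    # Append a DOI global attribute in place, before the file is published to ESGF
    with netCDF4.Dataset(path, mode="a") as ds:
        ds.setncattr("doi", doi)

add_doi_attribute(
    "multiple_input4MIPs_solar_CMIP_SOLARIS-HEPPA-CMIP-4-6_gn_185001-202312.nc",
    "10.25981/ESGF.input4MIPs.CMIP7/2522675",
)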
Sure, that's fine. Let me know as and when you have more datasets ready for minting DOIs, and when there is updated information for these three. |
Cool I fixed this up in #205.
10.25981/ESGF.input4MIPs.CMIP7/2522675 should be updated to:
"SOLARIS-HEPPA-CMIP-4-6":{
"activity_id": "input4MIPs",
"authors": [
{
"name": "Bernd Funke",
"email": "[email protected]",
"affiliations": [
"Instituto de Astrofísica de Andalucía, CSIC, Granada, Spain"
],
"orcid": "0000-0003-0462-4702"
}
],
"contact":"[email protected]",
"dataset_category":["solar"],
"further_info_url":"http://solarisheppa.geomar.de/cmip7",
"institution_id":"SOLARIS-HEPPA",
"license_id":"CC BY 4.0",
"mip_era":"CMIP7",
"target_mip":"CMIP",
"source_version":"4.6"
},
10.25981/ESGF.input4MIPs.CMIP7/2522673 should be updated to:
"UOEXETER-CMIP-2-0-0":{
"activity_id": "input4MIPs",
"authors": [
{
"name": "Thomas Aubry",
"email": "[email protected]",
"affiliations": [
"Faculty of Environment, Science and Economy, University of Exeter, Exeter, EX4 4QF, UK"
],
"orcid": "0000-0002-9275-4293"
}
],
"contact":"[email protected]",
"dataset_category":["aerosolProperties", "emissions"],
"further_info_url":"https://input4mips-controlled-vocabularies-cvs.readthedocs.io/en/latest/dataset-overviews/stratospheric-volcanic-so2-emissions-aod/",
"institution_id":"uoexeter",
"license_id":"CC BY 4.0",
"mip_era":"CMIP7",
"source_version":"2.0.0"
},
Thanks! |
From #197, this will be the entry for 10.25981/ESGF.input4MIPs.CMIP7/2524040:
"DRES-CMIP-BB4CMIP7-2-0":{
"activity_id": "input4MIPs",
"authors": [
{
"name": "Margreet van Marle",
"email": "[email protected]",
"affiliations": [
"Deltares, Delft, the Netherlands"
],
"orcid": "0000-0001-7473-5550"
},
{
"name": "Guido van der Werf",
"email": "[email protected]",
"affiliations": [
"Wageningen University and Research, Meteorology and Air Quality, Wageningen, Netherlands"
],
"orcid": "0000-0001-9042-8630"
}
],
"contact":"[email protected], [email protected]",
"dataset_category":["emissions"],
"further_info_url":"https://www.globalfiredata.org",
"institution_id":"DRES",
"license_id":"CC BY 4.0",
"mip_era":"CMIP7",
"source_version":"2.0"
}, |
Just looping back from an email thread, we currently have the status below (I will update this table to reflect the current status); this HTML view is the latest one here. @jitendra-kumar @znichollscr ping
|
@durack1 @znichollscr @climate-dude I had not realized that the For minting DOIs for list below..
|
Yep, the easiest would be to watch this repo for "Releases". We're trying to behave ourselves to make sure that any data that is published into ESGF is followed promptly by a webpage/repo update and a release. |
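If watching for releases, a small sketch of polling the GitHub releases API for this repo (a webhook or a scheduled Action would work just as well):

import requests

def latest_release_tag(repo: str = "PCMDI/input4MIPs_CVs") -> str:
    # The most recent release marks a good point to mint or refresh DOI metadata
    response = requests.get(f"https://api.github.com/repos/{repo}/releases/latest", timeout=30)
    response.raise_for_status()
    return response.json()["tag_name"]

print(latest_release_tag())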
@jitendra-kumar that's our fault, sorry. We've been so focussed on getting data sets out that minting DOIs has fallen off our plate.
@durack1 as a note, this isn't ideal, because by the time this happens it's too late to mint a DOI that can actually go in the file. Summarising what we need next:
Other notes:
Correct, it will not be reflected until #213 is merged (here is the preview https://input4mips-controlled-vocabularies-cvs--213.org.readthedocs.build/en/213/database-views/input4MIPs_source-id_CMIP7.html). For what it's worth, https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2521499 does not resolve for me, so I think there is something wrong. @jitendra-kumar can you please also address the requests made here: #177 (comment) |
Regarding #177 comment:
For now we will stick to linking to Metagrid, and within ESGF we are discussing potential ways to offer better landing pages. These two DOIs are finalized and will be active shortly. UofMD-landState-3-1: https://doi.org/10.25981/ESGF.input4MIPs.CMIP7/2521499 is still in draft stage and is not finalized yet, since this dataset is not in the CVs yet. I can do that once it appears, I assume with the merge of #213. |
Then, back to something we were discussing: autogenerate the landing page from the CV info and host it in GitHub Pages. Autogenerate the Metagrid links as part of the page generation using the URL syntax via the "Copy Search" feature. |
Read the Docs / via mkdocs, but yes, I agree.
Great, thanks |
@jitendra-kumar can you try pointing the DOI entries' landing page to a URL of the form https://input4mips-cvs.readthedocs.io/en/stable/source-id-landing-pages/{source_id}/? e.g. https://input4mips-cvs.readthedocs.io/en/stable/source-id-landing-pages/CR-CMIP-1-0-0/. That should provide us with a landing page that has more metadata than the raw ESGF page. |
@jitendra-kumar as an FYI, UofMD-landState-3-1-1 (#229) used the same DOI as UofMD-landState-3-1 so no need for a new DOI (reusing a DOI isn't ideal, but in this case it's ok because the difference is so small) |
Just adding a placeholder issue, so we can centralize information about what the DOI OSTI service requires from authors to get a DOI issued.
We can then update the source_id and institution_id registration info, with the additional fields
ping @jitendra-kumar @sashakames