CMIP7 Historical Forcings are part of the DECK #222

Open
johndunne13 opened this issue Mar 14, 2025 · 19 comments
Labels
dataset-issue An issue with a dataset


@johndunne13

Dataset source ID(s)


Describe the issue

This documentation page https://input4mips-cvs.readthedocs.io/en/latest/dataset-overviews/ currently incorrectly refers to the historical forcings as part of "CMIP7 AR7 Fast Track". As discussed in the CMIP7 description paper, the CMIP Panel highlights the importance of the historical simulation for CMIP7 by including it in the mandatory DECK set of experiments rather than in any optional set of experiments. The page should be corrected to reflect this distinction and avoid confusion. Thanks! John

Expected data

Screenshots

Additional context

@znichollscr
Collaborator

znichollscr commented Mar 14, 2025

Hi @johndunne13, just to make sure we're all on the same page. You're saying that there is only one DECK. So, unlike other MIPs, which could have a fast track phase and a later phase, for the DECK simulations there is only one phase. Key implication: the DECK forcings that are being released now are the only DECK forcings that will be released throughout the entirety of CMIP7 (at least under the title of the DECK).

If yes, that's good to understand. Can you please also confirm that this does not need to go through a CMIP Panel meeting and that I should just update the docs now?

@johndunne13
Author

johndunne13 commented Mar 14, 2025 via email

@johndunne13
Author

johndunne13 commented Mar 14, 2025 via email

@durack1
Contributor

durack1 commented Mar 14, 2025

Just flagging that even for CMIP6, the DECK was identified as "continuous", as in the graphic below. We need to be a little careful about conflating the "entry card" (CMIP6 terminology), which relates to what a modelling group needs to do to contribute to CMIP7, with what forcing data is used to generate the simulations. In all likelihood there will be updates to forcing data over the lifetime of CMIP7, which will likely start with first data available in 2025 and, if precedents are a guide, extend well past 2030.

@durack1
Contributor

durack1 commented Mar 14, 2025

A key point to note is that in CMIP6 any simulations that met the CMIP6 data specifications were allowed to publish their data into the CMIP6 ESGF project. We have no technical way to validate what forcing data was used for a piControl or historical simulation (or their esm* variants), and consequently we will get what we get in the CMIP7 archive; this is an "ensemble of opportunity" after all!

Most, but not all, modeling groups used the CMIP6-provided forcings (e.g. Lurton et al., 2020), but some groups did not (e.g. Danabasoglu et al., 2020 note that the VolcanEESM volcanic forcing was used, not the Luo et al. data provided in input4MIPs). Both these simulations are published side by side in CMIP6, and it is likely the same will happen in CMIP7.

@znichollscr
Collaborator

there will be updates to forcing data over the lifetime of CMIP7

My understanding is that the communications made to date are saying the opposite of this. We are freezing the forcings for the historical. There will be no updates as part of the DECK (i.e. if you want to do updates, do them in a MIP).

@durack1 are you arguing for us to put back in the distinction between CMIP7/fast track/whatever comes now? If not, I'm a bit confused about the point being made; I thought we had clarity about the next steps (drop fast track), but now I'm not sure.

@vnaik60
Collaborator

vnaik60 commented Mar 14, 2025

  1. There will be "updates to forcing data over the lifetime of CMIP7", but most modeling centers will not rerun DECK simulations for the CMIP7 phase. New simulations could be run as part of CMIP7Plus or CMIP8. Running the DECK is costly.
  2. There will be extensions to forcings during CMIP7, and most modeling centers could extend their historical (hist-ext) simulations if they choose to. This would entail minimal cost.
    My 2 cents.

@znichollscr
Copy link
Collaborator

  1. there will be "updates to forcing data over the lifetime of CMIP7" but most modeling centers will not rerun DECK simulations for the CMIP7 phase. New simulations could be run as part of CMIP7Plus or CMIP8. Running DECK is costly.

Ok, so surely we put those updates under a mip_era of "CMIP7Plus" to make clear that we do not expect modelling centres to re-run?

2. there will be extensions to forcings during CMIP7 and most modeling centers could extend their historical (hist-ext) simulations if they choose to. This would entail minimum cost.

To clarify, if they're "historical-ext" simulations then they're not DECK simulations anyway, right?

@durack1
Contributor

durack1 commented Mar 14, 2025

we put those updates under a mip_era of "CMIP7Plus"

For the forcing data and input4MIPs ESGF project, this makes sense to me. I'm not sure it makes sense to others. For example, would Steve be happy for his "provisional" data to be marked CMIP7Plus? As a single vote, I'd be more than happy for my PCMDI-AMIP-2-x-y data to be marked CMIP7Plus if that makes comms cleaner.

I would note that however we identify these forcings, they are likely to be used in simulations published to the CMIP7 ESGF project, so I am just noting that nuance.

@znichollscr
Copy link
Collaborator

For example, would Steve be happy for his "provisional" data to be marked CMIP7Plus?

If we put this under CMIP7, it'll be super confusing (are we saying that this should be used for the DECK or not?). Maybe the other way around it would be to put it under mip_era="CMIP7" but target_mip="SomeNewMIPOnProvisionalData". Either way, it wouldn't be under mip_era="CMIP7" and target_mip="CMIP", so we can differentiate.
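To make the mip_era/target_mip distinction concrete, here is a minimal sketch of how a consumer could filter dataset records so that only mip_era="CMIP7" with target_mip="CMIP" counts as DECK data. The `ForcingDataset` class and its three fields are hypothetical stand-ins, not the input4MIPs database schema, and the source IDs are placeholders taken from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForcingDataset:
    """Stand-in for an input4MIPs dataset record (hypothetical, trimmed-down fields)."""

    source_id: str
    mip_era: str
    target_mip: str


def is_deck_candidate(ds: ForcingDataset) -> bool:
    # Only the combination mip_era="CMIP7" and target_mip="CMIP" is treated
    # as intended for the DECK; any other combination is kept separate.
    return ds.mip_era == "CMIP7" and ds.target_mip == "CMIP"


# Illustrative records only; the source IDs are placeholders from this thread.
datasets = [
    ForcingDataset("CEDS-CMIP-2025-02-01", "CMIP7", "CMIP"),
    ForcingDataset("CEDS-CMIP-2025-02-01-provisional", "CMIP7", "SomeNewMIPOnProvisionalData"),
    ForcingDataset("PCMDI-AMIP-2-x-y", "CMIP7Plus", "CMIP"),
]

deck_source_ids = [ds.source_id for ds in datasets if is_deck_candidate(ds)]
print(deck_source_ids)  # ['CEDS-CMIP-2025-02-01']
```

Either labelling route (a different mip_era or a different target_mip) drops out of the DECK set under this check, which is the property being argued for.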

@znichollscr
Copy link
Collaborator

I've done a suggested update of the docs in #223. A preview of how they would look with this update is here (only this page is relevant, fast track is not mentioned anywhere else): https://input4mips-controlled-vocabularies-cvs--223.org.readthedocs.build/en/223/dataset-overviews/

@durack1
Copy link
Contributor

durack1 commented Mar 14, 2025

@vnaik60 can you chime in on labelling the forcing data "CMIP7" for the small set needed specifically to get DECK sims started now, and "CMIP7Plus" for the organically growing set?

From a modelling group perspective, does that make our comms clear, alongside the aspiration to collate and develop "sustained" forcings that extend temporally and could be completely overhauled in the coming years (during the CMIP7 project window)?

@vnaik60
Copy link
Collaborator

vnaik60 commented Mar 14, 2025

Isn't provisional only extending the timeseries?
Let's keep this simple and keep our focus on CMIP7 (which includes forcings for fast track and communityMIPs). We will come to CMIP7Plus when the whole timeseries for all forcing datasets is updated (similar to what we did for CMIP6Plus).

@durack1
Contributor

durack1 commented Mar 14, 2025

Isn't provisional only extending the timeseries?

The point of the "provisional" identity (note this is my placeholder naming, not sure what Steve will do) was to capture data that will likely change. As this will change, it should not be used for "production" CMIP7 simulations.

So 1850-2021 CEDS data is "CEDS-CMIP-2025-02-01", for example, whereas the 2022-2023 CEDS data, which will be made available soon after or at the same time, is not identified the same way (some provisional marker, say "CEDS-CMIP-2025-02-01-provisional"). In 2026 this "provisional" data will likely be revised markedly to extend the "CEDS-CMIP-2025-02-01" data from the 1850-2021 period to the 1850-2023 period, ensuring the transition between the 2021-12 and 2022-01 timesteps is smooth. This new data will have an identity of "CEDS-CMIP-2026-02-01" (or similar).
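As a sketch of how tooling could act on this naming, the snippet below parses the placeholder pattern `<provider>-CMIP-<YYYY>-<MM>-<DD>[-provisional]` and rejects provisional data for production runs. The pattern is an assumption built from the placeholder names above (which are themselves not final), and `parse_source_id`/`usable_for_production` are hypothetical helpers.

```python
import re
from datetime import date
from typing import NamedTuple

# Hypothetical pattern for the placeholder naming in this thread:
# <provider>-CMIP-<YYYY>-<MM>-<DD>[-provisional]
SOURCE_ID_RE = re.compile(
    r"^(?P<provider>[A-Za-z0-9]+)-CMIP-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})"
    r"(?P<prov>-provisional)?$"
)


class ParsedSourceId(NamedTuple):
    provider: str
    release: date
    provisional: bool


def parse_source_id(source_id: str) -> ParsedSourceId:
    m = SOURCE_ID_RE.match(source_id)
    if m is None:
        raise ValueError(f"unrecognised source_id: {source_id!r}")
    return ParsedSourceId(
        provider=m["provider"],
        release=date(int(m["y"]), int(m["m"]), int(m["d"])),
        provisional=m["prov"] is not None,
    )


def usable_for_production(source_id: str) -> bool:
    # Provisional data is expected to change, so it is excluded from
    # "production" CMIP7 simulations.
    return not parse_source_id(source_id).provisional


print(usable_for_production("CEDS-CMIP-2025-02-01"))              # True
print(usable_for_production("CEDS-CMIP-2025-02-01-provisional"))  # False
```

Encoding the provisional status in the identifier itself means a check like this needs no external lookup. The trade-off is that the naming convention has to be fixed before data are published.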

CMIP7 (which includes forcings for fast track, communityMIPs)

The point of this thread and all this recent chatter over the last days is to come up with very explicit guidance. "Fast Track" is a vague term. What we are trying to do here is match forcing data to experiments, and specifically the experimental protocol. If we change the data, we change the protocol, we change the simulations that result. So for the DECK experiments (1pctCO2, abrupt-4xCO2, amip, piControl, esm-piControl, historical, esm-hist, piClim-control, piClim-anthro, piClim-4xCO2), with the historical that covers the 1850-2021 period, what forcing (identified by source_ids) are we recommending? Once we recommend these, the protocol is set, and anything not using these forcings is not a CMIP7:CMIP:historical simulation (although, as I have noted, the CMIP6 precedent was looser than what we are discussing now; see #222 (comment)).

@znichollscr
Copy link
Collaborator

This new data will have an identity of "CEDS-CMIP-2026-02-01" (or similar)

My two cents: this either goes in CMIP7Plus or it goes in a different target_mip (ExtensionMIP for example).

What we are trying to do here is match forcing data to experiments, and specifically the experimental protocol...Once we recommend these, the protocol is set, and anything not using these forcings is not a CMIP7:CMIP:historical simulation

I agree with this. I'd actually go further. If we're trying to match forcings to experiments, that's a job for the experiment definitions. The experiment definition should say: for this experiment, use these source IDs for your forcing data. We as forcings providers don't provide the recommendation, the experiment protocol provides the recommendation (which then allows for DAMIP, AerChemMIP etc. to use the forcings data without us having to predict such uses in advance, which is impossible).
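One way to picture "the experiment definition provides the recommendation" is a protocol table owned by the experiment definitions, which maps each experiment to the forcing source IDs it requires. The sketch below assumes such a table exists. `PROTOCOL`, the forcing names and the source IDs are hypothetical placeholders, not recommendations.

```python
# Hypothetical protocol table owned by the experiment definitions:
# experiment_id -> {forcing name: forcing source_id}.
# Forcing names and source IDs are placeholders, not recommendations.
PROTOCOL: dict[str, dict[str, str]] = {
    "historical": {"anthropogenic-emissions": "CEDS-CMIP-2025-02-01"},
    "amip": {"sst-sea-ice": "PCMDI-AMIP-2-x-y"},
}


def required_source_ids(experiment_id: str) -> dict[str, str]:
    """Return a copy of the forcing source IDs the protocol names for an experiment."""
    if experiment_id not in PROTOCOL:
        raise KeyError(f"no protocol entry for experiment {experiment_id!r}")
    return dict(PROTOCOL[experiment_id])


def conforms(experiment_id: str, used: dict[str, str]) -> bool:
    # A run conforms only if every forcing the protocol names was used with
    # exactly that source_id; extra forcings in `used` are not checked here.
    required = required_source_ids(experiment_id)
    return all(used.get(name) == sid for name, sid in required.items())


print(conforms("historical", {"anthropogenic-emissions": "CEDS-CMIP-2025-02-01"}))  # True
print(conforms("historical", {"anthropogenic-emissions": "CEDS-CMIP-2026-02-01"}))  # False
```

Because the mapping lives with the experiments rather than with the forcing providers, a MIP such as DAMIP or AerChemMIP could add its own entries without anyone having to anticipate that use in advance.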

although I have noted, the CMIP6 precedent was looser than what we are discussing now

I think it's good we're moving away from this; I feel we can improve our communication.

@durack1
Copy link
Contributor

durack1 commented Mar 16, 2025

If we're trying to match forcings to experiments, that's a job for the experiment definitions. The experiment definition should say: for this experiment, use these source IDs for your forcing data.

I agree with everything immediately above in #222 (comment), and would extend it one step further. We will need to keep a running table of the data that are required to meet the CMIP7:CMIP:piControl/historical (and esm-* variants) protocol, as these are likely to evolve from what we receive first, which is exactly what these doc pages that we're discussing are targeted at. For context, it's useful to consider the CMIP6 experience, which straddled almost a year of finding issues, fixing them and restarting (see below; this is all captured in gory detail in the CMIP6 Forcing Datasets Summary Google doc).

| Date | Collection version | Why |
| --- | --- | --- |
| Tue 20th Dec 2016 | 6.0.0 | Happy Xmas 2016, let's get going with CMIP6 DECK |
| Wed 17th May 2017 | 6.1.0 | Sorry! Issues with CEDS CO2, CH4 emissions resolved; snuck in a PCMDI-AMIP-1-1-2 temporal extension (2016-06 to 2016-12) |
| Mon 22nd May 2017 | 6.1.1 | Sorry!! Further issues with CEDS CO2, CH4 emissions resolved; MACv2SP published |
| Mon 11th Sep 2017 | 6.2.0 | Sorry!!! CEDS aircraft SO2 corrected; IACETH volcanic SAOD v3 data release (replaced v2) |
| Fri 6th Oct 2017 | 6.2.1 | Sorry!!!! CEDS aircraft SO2 corrected again; CMIP6 DECK proper, here we come |
| Oct 2017 through Jun 2022 | many | Lots of subsequent trickle updates, mostly for non-DECK data; PCMDI-AMIP was extended 5 times |
| Wed 22nd Jun 2022 | 6.2.43 | Updated PCMDI-AMIP-1-1-8 temporal extension |

If it wasn't obvious, I was recommending that we follow a similar 7.0.0, 7.0.1, 7.1.0, 7.1.1, 7.2.0 or, hopefully, a considerably simpler versioning experience for CMIP7. First number = mip_era, second = DECK collection version (changes when a DECK-required dataset changes), third = a marker that changes if ANYTHING changes (MIP data publication, docs, data, anything).
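The three-number scheme can be read mechanically: compare two collection versions and report which position moved. The sketch below is only an illustration of that reading. `CollectionVersion` and `change_kind` are hypothetical names, and the scheme itself is a proposal in this thread, not an adopted convention.

```python
from typing import NamedTuple


class CollectionVersion(NamedTuple):
    mip_era: int  # first number, e.g. 7 for CMIP7
    deck: int     # second number: bumps when a DECK-required dataset changes
    other: int    # third number: bumps when anything at all changes (data, docs, ...)

    @classmethod
    def parse(cls, text: str) -> "CollectionVersion":
        mip_era, deck, other = (int(part) for part in text.split("."))
        return cls(mip_era, deck, other)


def change_kind(old: str, new: str) -> str:
    """Describe what a move between two collection versions implies."""
    a, b = CollectionVersion.parse(old), CollectionVersion.parse(new)
    if a.mip_era != b.mip_era:
        return "new mip_era"
    if a.deck != b.deck:
        return "DECK-required dataset changed"
    if a.other != b.other:
        return "something else changed"
    return "no change"


print(change_kind("7.0.1", "7.1.0"))  # DECK-required dataset changed
print(change_kind("7.1.0", "7.1.1"))  # something else changed
```

Note what the result cannot say: a bump in the second number signals that some DECK dataset changed, but not which DECK experiment is affected.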

Once we have the forcing data clearly and uniquely identified, it's then over to modelling groups to start to use them, and once sims are complete, to document their forcing use alongside the simulations that are being published. A question: do we want this info (considering we have unique source_ids) to be embedded in netCDF file metadata? The best template we have to document forcing data use is Lurton et al., 2020, but that was the anomaly in the CMIP6 model doc suite, rather than the rule across modelling groups. We also had groups that used the rWiXpYfZ variant label (in particular the forcing index Z) judiciously to indicate changes in forcing across simulations/ensemble members.

@znichollscr I hope this syncs with what you're thinking, and @vnaik60 I hope this meets your expectations too, following our CMIP6 experience.

@znichollscr I’d also note this dovetails with the “documenting the CMIP experimental protocol” discussion from months back, which could be folded into the CV TT remit. ES-Doc attempted to do this in CMIP6 but never synced with the forcing info, as that was more loosely identified than it is now; we now have a source_id-managed repo here.

@znichollscr
Collaborator

We will need to keep a running table of the data that are required to meet the CMIP7:CMIP:piControl/historical (and esm-* variants)

Do you have any sense of whose remit this is? Is it a data request thing, a forcings thing, or is there an experiment protocol task team? (Maybe the last comment answers this: "roll into the CVs task team" and make it a CVs and experiment definition task team.)

For context, it's useful to consider the CMIP6 experience, which straddled almost a year of find

I very much hope we are not doing this for a year...

If it wasn't obvious, I was recommending that we follow a similar 7.0.0, 7.0.1, 7.1.0, 7.1.1, 7.2.0 or, hopefully, a considerably simpler versioning experience for CMIP7. First number = mip_era, second = DECK collection version (changes when a DECK-required dataset changes), third = a marker that changes if ANYTHING changes (MIP data publication, docs, data, anything).

I think the suggestion is fine. However, I don't think it actually solves the problem. People want to know a) what data they should use for what experiment and b) if they need to restart. The labelling suggested is too broad for this (it can't distinguish between 'restart historical' vs. 'restart piControl' for example). I think we can get what we want, but it needs something that links experiments and source IDs (i.e. an answer to the first question posed above).
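The point that a single collection number cannot distinguish "restart historical" from "restart piControl" becomes concrete once experiments are linked to source IDs. Given a set of changed source IDs, the affected experiments can be read straight off the table. This is a minimal sketch under that assumption. The experiment-to-source-ID table is invented, and every source ID in it is a made-up placeholder.

```python
# Hypothetical protocol table: experiment_id -> forcing source IDs it uses.
# The source IDs are made-up placeholders for illustration.
PROTOCOL: dict[str, set[str]] = {
    "piControl": {"solar-PI-v1", "ozone-PI-v1"},
    "historical": {"solar-hist-v1", "ozone-hist-v1", "emissions-hist-v1"},
    "esm-hist": {"solar-hist-v1", "emissions-hist-v1"},
}


def experiments_to_restart(protocol: dict[str, set[str]], changed: set[str]) -> list[str]:
    """List the experiments whose forcing set includes any changed source ID."""
    return sorted(exp for exp, used in protocol.items() if used & changed)


print(experiments_to_restart(PROTOCOL, {"ozone-hist-v1"}))  # ['historical']
print(experiments_to_restart(PROTOCOL, {"solar-hist-v1"}))  # ['esm-hist', 'historical']
```

In this example, a change to a historical-only dataset leaves piControl untouched, which is exactly the distinction a version bump alone does not convey.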

Once we have the forcing data clearly and uniquely identified, it's then over to modelling groups to start to use them

I would say that it's already over to modelling groups to be using forcing data, we already have the data identified (yes, we are missing the experiment connection piece but we will get that and modelling groups shouldn't wait in the meantime).

A question, do we want this info (considering we have unique source_id's) to be embedded in netcdf file metadata?

Ideally, yes. In practice, I think we should assume that this won't happen and build a system that allows us to update this information after the files have been published (so we're not republishing TBs of data just to add missing forcing IDs).
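A system that can be updated after publication could keep the forcing information in a small sidecar record per simulation, outside the netCDF files. The sketch below assumes a JSON record keyed by model, experiment and variant label. `ForcingRecord` and its fields are hypothetical, not an existing ESGF or CMIP schema, and "MODEL-X" is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ForcingRecord:
    """Hypothetical sidecar record of the forcings behind one published simulation.

    Stored outside the netCDF files, so it can be filled in or corrected after
    publication without republishing any model output.
    """

    model_source_id: str
    experiment_id: str
    variant_label: str
    forcings: dict[str, str] = field(default_factory=dict)  # forcing name -> source_id

    def set_forcing(self, name: str, forcing_source_id: str) -> None:
        self.forcings[name] = forcing_source_id

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# "MODEL-X" and the forcing name are placeholders.
record = ForcingRecord("MODEL-X", "historical", "r1i1p1f1")
record.set_forcing("anthropogenic-emissions", "CEDS-CMIP-2025-02-01")
print(record.to_json())
```

Correcting a missing or wrong forcing ID then means rewriting one small JSON document rather than terabytes of model output.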

@vnaik60
Collaborator

vnaik60 commented Mar 18, 2025

Summarizing what we agree on and discussing open questions (as I lost track of things in the thread):

  • Documentation of CMIP7 forcings: we agree that the CMIP7 forcings be documented as such and not be separated into a fast track phase. This is addressed by John's confirmation in #222 (comment) and has been implemented in #223 (Remove mention of CMIP7 AR7 fast track). Thanks!

  • How to handle provisional data that may come as the forcings are extended (annually)? Here is my understanding as a modeler. Each CMIP phase has a unique DECK (https://wcrp-cmip.org/cmip-phases/cmip7/). DECK simulations are needed for "Diagnostic, Evaluation and Characterization" of models simulating climate. For CMIP7, the DECK experiments are:
    historical
    esm-hist (for ESMs only)
    piClim-control
    piClim-anthro
    piClim-4xCO2
    Typically every new phase of CMIP is characterized by model updates, which then require starting from a DECK to characterize the model responses. The communityMIPs, fast tracks and any other crisisMIPs (falling in the time period when one version of a model is frozen and the next is not ready yet) are built on this DECK, especially the piControl and historical experiments. The historical simulation in a CMIP phase can be extended without having to rerun the full timeseries as long as the forcings are also extended and not updated (a full dataset update would require re-running the whole DECK, which from my perspective is a new CMIP phase, but please correct me if I am wrong).

    Let's use the example of emissions. We start with a frozen set of CMIP7 emissions (1850-2021, say version 7.0.2 after a bit of iteration if issues are found between now and when the simulations are started; this versioning is not exactly the same as Paul describes in #222 (comment), more in %% below) to run the DECK. Additionally, Steve provides a "provisional" dataset for years 2022-2023. Models can pick this up and extend their historical simulation to 2023 if they choose to, and upload these years tagged as "historical-ext" or "esm-hist-ext" (note that this extension experiment is not a part of the CMIP7 DECK at the moment, see @@ below). In 2026, Steve may update the provisional years based on upstream data updates and produce emissions for 2022-2024 (say version 7.01). Models can rerun the overlapping 2022-2023 period plus 2024, using the restart files they saved for 2021 in their historical simulations. If model output is uploaded to ESGF, it would still be historical-ext or esm-hist-ext, with a different version. It could be labelled as "CMIP7Plus", but I would not recommend it given the precedent of a completely different forcing dataset in CMIP6Plus. I see CMIP#Plus as a transition phase towards CMIP#+1, where new forcings are tested using models from the CMIP# phase.

    Now, at some point in 2028-2029, there may be a need to update the full emission timeseries (and likewise all the other forcings). This may be the start of CMIP7Plus/CMIP8.

    I am totally happy to tweak my thinking if there are documents that lay out this "continuous" CMIP phases plan (#222 (comment)) or if there are plans coming out of the WIP.

  • How to document the forcing datasets used for specific model simulations? I think this is where Paul was heading with:

Once we have the forcing data clearly and uniquely identified, it's then over to modelling groups to start to use them, and once sims are complete, to document their forcing use alongside the simulations that are being published. A question, do we want this info (considering we have unique source_id's) to be embedded in netcdf file metadata? The best template we have to document forcing data use is Lurton et al., 2020, but that was the anomaly in the CMIP6 model doc suite, rather than the rule across modelling groups. We also had groups that used the rWiXpYfZ judiciously to indicate changes in forcing across simulation/ensemble members.

Once we have a stable dataset (no bugs found, no further discussions, etc.) and time is ticking to get the models started, then we will use what is available here https://input4mips-cvs.readthedocs.io/en/latest/database-views/input4MIPs_source-id_CMIP7.html. If game-changing updates come in later, those who can restart their simulations will, and those who cannot will not (at least this is what happened in CMIP6). And since all the datasets are versioned, modelers will have a way of keeping track of which forcings they have used. Now the question is how modelers can share the information on which versions of forcings have been used. I personally did not find the rWiXpYfZ convention helpful (I remember having conversations with Balaji on this). fZ was a label for all forcings, so when a forcing changed, Z changed, but there was no way to tell which forcing Z was referring to.
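The limitation of the f index is easy to demonstrate. Parsing two variant labels only reveals that the forcing index differs, whereas comparing per-forcing records of source IDs names the forcing that changed. The sketch below assumes the standard r/i/p/f label layout. `changed_forcings` and the per-forcing records, with their placeholder IDs, are hypothetical.

```python
import re
from typing import NamedTuple

# Variant labels have the form r<W>i<X>p<Y>f<Z>; f<Z> is the forcing index.
VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


class VariantLabel(NamedTuple):
    realization: int
    initialization: int
    physics: int
    forcing: int


def parse_variant_label(label: str) -> VariantLabel:
    m = VARIANT_RE.match(label)
    if m is None:
        raise ValueError(f"not a variant label: {label!r}")
    return VariantLabel(*(int(g) for g in m.groups()))


def changed_forcings(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Forcing names whose source_id differs between two per-forcing records."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


a, b = parse_variant_label("r1i1p1f1"), parse_variant_label("r1i1p1f2")
print(a.forcing != b.forcing)  # True, but the label cannot say which forcing changed

# Hypothetical per-forcing records (placeholder source IDs) do say which one changed.
print(changed_forcings({"ozone": "oz-v1", "solar": "sol-v1"},
                       {"ozone": "oz-v2", "solar": "sol-v1"}))  # {'ozone'}
```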

On,

A question, do we want this info (considering we have unique source_id's) to be embedded in netcdf file metadata?

To clarify, are you thinking about embedding the metadata for all the forcings in the netCDF file generated from a simulation that used those forcings? If so, I don't think this is a great idea, as this would increase the size of the metadata (and the file). As of now, there are a maximum of 10 forcing datasets used by the current generation of models (in various combinations, e.g., prescribed ozone versus emission-driven ozone), and having their source_id and some additional information included in the metadata of each file is overkill, in my humble opinion. Instead, I agree with this:

The best template we have to document forcing data use is Lurton et al., 2020, but that was the anomaly in the CMIP6 model doc suite, rather than the rule across modelling groups.

If a journal paper is not feasible, modeling centers could be given the option to put together a technical document summarizing the source_ids for the forcing datasets used in the simulations they have uploaded on ESGF. Something like this https://data.giss.nasa.gov/modelE/cmip6/ but with specific forcing dataset versions. As much as we would like information on the forcing datasets used for specific model simulations to be automated, I don't think there is a viable way to do this other than a manually filled document or spreadsheet (at least that's what we did at GFDL, though in hindsight Thibaut's way was better).

%%

If it wasn't obvious, I was recommending that we follow a similar 7.0.0, 7.0.1, 7.1.0, 7.1.1, 7.2.0 or, hopefully, a considerably simpler versioning experience for CMIP7. First number = mip_era, second = DECK collection version (changes when a DECK-required dataset changes), third = a marker that changes if ANYTHING changes (MIP data publication, docs, data, anything).

For forcings, this numbering may get complicated. Our CMIP7 forcing collection already does not follow this numbering. Data providers have their own versioning systems, and it is probably convenient that way. The important thing is that we should have information on which version of each forcing dataset is used in a particular model experiment, consistent with Zeb's thinking: "People want to know a) what data they should use for what experiment and b) if they need to restart."

@@ In CMIP6, it was probably added later; per Eyring et al., 2016: "To distinguish between the portion of the historical period when all models will use the same forcing data sets (i.e. 1850–2014) from the extended period where different data sets might be used, the experiment for 1850–2014 will be labelled historical (esm-hist in the case of the emission-driven run) and the period from 2015 through near-present will likely be labelled historical-ext (esm-hist-ext)."
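The distinction drawn above between extending and updating a forcing can be sketched as a small decision rule. If any year inside the frozen historical period changes, the whole historical run is affected. If only years after it change or are added, a group can restart from the saved end-of-frozen-period state and rerun historical-ext. This is an illustrative sketch of that reasoning, not an agreed CMIP7 rule. `plan_for_new_version` is hypothetical, and the annual values are made up.

```python
from typing import Optional

Series = dict[int, float]  # hypothetical annual forcing values keyed by year


def plan_for_new_version(
    old: Series, new: Series, frozen_end: int
) -> tuple[str, Optional[tuple[int, int]]]:
    """Decide how a modelling group might respond to a new forcing version.

    frozen_end is the last year of the frozen historical period (2021 in the
    example discussed in this thread).
    """
    years = old.keys() | new.keys()
    if any(old.get(y) != new.get(y) for y in years if y <= frozen_end):
        # The frozen period itself changed: this is an update, not an extension.
        return "rerun full historical", None
    changed_ext = [y for y in years if y > frozen_end and old.get(y) != new.get(y)]
    if not changed_ext:
        return "no action", None
    # Only post-frozen years differ or were added: restart from the saved
    # end-of-frozen-period state and rerun the extension to the last new year.
    return "rerun historical-ext", (frozen_end + 1, max(new))


old = {2020: 1.0, 2021: 1.1, 2022: 1.2, 2023: 1.3}  # made-up values
new = {**old, 2022: 1.25, 2024: 1.4}                # revised 2022, added 2024
print(plan_for_new_version(old, new, frozen_end=2021))  # ('rerun historical-ext', (2022, 2024))
```

Exact comparison of values is used here for simplicity. A real check would more likely compare dataset versions or checksums than individual numbers.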

@znichollscr
Copy link
Collaborator

Thanks @vnaik60, helpful. My quick replies:

How to handle provisional data that may come as the forcings are extended (annually)?

A good question. I don't think anyone has an answer for this and I don't think it is urgent yet so I would suggest leaving this to one side for now.

If a journal paper is not feasible, modeling centers could be given the option to put together a technical document summarizing the source_ids for the forcing dataset used in the simualtions they have uploaded on ESGF

I'd agree with this (although, to be honest, I think a journal paper is never feasible, they're way too slow to write and publish so I would go for a much more lightweight solution as compulsory, with a journal paper being optional).

For forcings, this numbering may get complicated

I agree and don't think the forcing collection version number will provide the information we want.

My takeaway: "People want to know a) what data they should use for what experiment and b) if they need to restart" is still the thing we need to do next.
