update target data and requirements
LucieContamin committed Apr 5, 2024
1 parent d7d0737 commit 0d409b8
Showing 7 changed files with 1,958 additions and 1,848 deletions.
39 changes: 23 additions & 16 deletions README.md
@@ -237,22 +237,23 @@ documentation associated with the
In this round, the required target for trajectories will be **weekly incident
infections, cases, and deaths in California and North Carolina for a set of
specified racial/ethnic groups.** Trajectories will need to be paired across
racial/ethnic groups (i.e., for a given model, location, scenario and horizon,
all race/ethnicity data for simulation 1 corresponds to the sum of
racial/ethnic groups and horizons (i.e., for a given model, location, scenario
and horizon, the `"overall"` data for simulation 1 corresponds to the sum of
race/ethnicity-specific estimates for simulation 1).
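
A minimal sketch of this pairing rule in R follows. The helpers are hypothetical (not part of the hub's tooling) and assume a long-format data frame for one model, location, scenario, target, and horizon that contains only the columns `stochastic_run`, `race_ethnicity`, and `value`. `add_overall()` derives the `"overall"` trajectory of each simulation as the sum of that simulation's race/ethnicity-specific values, and `check_overall()` tests existing `"overall"` rows against that sum.

```r
# Hypothetical helpers illustrating paired trajectories. Assumes `df` holds a
# single model/location/scenario/target/horizon and only the columns
# `stochastic_run`, `race_ethnicity`, and `value`.

# Derive "overall" for each simulation as the sum of its race-specific values
add_overall <- function(df) {
  groups <- df[df$race_ethnicity != "overall", ]
  overall <- aggregate(value ~ stochastic_run, data = groups, FUN = sum)
  overall$race_ethnicity <- "overall"
  rbind(groups, overall[, names(groups)])
}

# TRUE if every existing "overall" value equals the sum for its simulation
check_overall <- function(df, tol = 1e-8) {
  groups <- df[df$race_ethnicity != "overall", ]
  sums <- tapply(groups$value, groups$stochastic_run, sum)
  overall <- df[df$race_ethnicity == "overall", ]
  expected <- sums[as.character(overall$stochastic_run)]
  all(!is.na(expected) & abs(overall$value - expected) < tol)
}

# Example: two simulations, three racial/ethnic groups each
df <- data.frame(stochastic_run = rep(1:2, each = 3),
                 race_ethnicity = rep(c("black", "white", "asian"), 2),
                 value = c(1, 2, 3, 4, 5, 6))
paired <- add_overall(df)
check_overall(paired)  # TRUE
```

Deriving `"overall"` from the group-level values of the same simulation, rather than sampling it separately, is what keeps the trajectories paired.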

In California, required racial/ethnic groups are `"latino"`, `"black"`,
`"white"`, `"asian"`, and `"other"`, where `"other"` represents American Indian or
Alaska Native and Native Hawaiian or Pacific Islander.
`"white"`, `"asian"`, `"other"`, and `"overall"`.

In North Carolina, required racial/ethnic groups are `"white"`, `"black"`, `"asian"`,
and `"other"`, where `"other"` represents the sum of other and Hispanic/Latino.
`"other"`, and `"overall"`.

Given the missingness in demographic disease data and limited data available on
case reporting rates by race/ethnicity, infections and cases will not be evaluated.
The definitions of race/ethnicity can differ across various datasets. For more
information, please consult the [target data README](./target-data/README.md).

Given the missingness in demographic disease data and limited data available on case
reporting rates by race/ethnicity, it will be optional for teams to submit cases.
Infections and deaths will be required and only weekly death targets will be evaluated.

Teams will submit cases and infections for the purpose of model comparison, and
weekly death targets will be evaluated.

### Additional Information

@@ -296,15 +297,21 @@ The folder contains multiple sub-folders:
Saturday, April 3, 2021 (20 week projection period). Weeks follow epi-weeks
(Sun-Sat) dated by the last day of the week.

- Weekly targets: Weekly incident infections, cases, and deaths by location
- Weekly targets: Weekly incident infections and deaths by location
and major racial/ethnic group. We require the following racial/ethnic groups
by state:
- California: `"latino"`, `"black"`, `"white"`, `"asian"`, and `"other"`.
- North Carolina: `"black"`, `"white"`, `"asian"`, and `"other"`.

- 100-300 individual trajectories for each target. Trajectories should be sampled
in such a way that they will be most likely to produce the uncertainty of the
simulated process.
- California: `"latino"`, `"black"`, `"white"`, `"asian"`, `"other"`, and `"overall"`.
- North Carolina: `"black"`, `"white"`, `"asian"`, `"other"`, and `"overall"`.

- Optional: teams may also provide weekly incident cases by location
and major racial/ethnic group (as stated above).

- We require 100-300 individual trajectories for each target. Trajectories
should be sampled in such a way that they will be most likely to produce
the uncertainty of the simulated process. Projection quantiles are optional.
- For teams who wish to submit quantiles, the format follows that of
prior SMH rounds. We ask for the following quantiles:
0.01, 0.025, 0.05 to 0.95 in steps of 0.05, 0.975, and 0.99.

- Metadata: We will require brief metadata from all teams.
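
The epiweek convention stated above (Sunday through Saturday, dated by the last day of the week) can be sketched in base R. These helpers are hypothetical and illustrative only: `epiweek_end()` returns the Saturday that closes the epiweek containing a date, and `projection_weeks()` lists the week-ending dates of an `n_weeks` projection period that ends on a given Saturday, such as April 3, 2021.

```r
# Hypothetical helpers for epiweeks (Sunday-Saturday, dated by the Saturday).

# Saturday that ends the epiweek containing each date
epiweek_end <- function(dates) {
  dates <- as.Date(dates)
  wday <- as.POSIXlt(dates)$wday  # 0 = Sunday, ..., 6 = Saturday
  dates + (6 - wday)
}

# Week-ending Saturdays of an n-week projection period ending on `last_day`
projection_weeks <- function(last_day, n_weeks = 20) {
  last_sat <- epiweek_end(last_day)
  # Step back one week at a time from the final Saturday, then sort ascending
  rev(last_sat - 7 * (seq_len(n_weeks) - 1))
}

weeks <- projection_weeks("2021-04-03", n_weeks = 20)
weeks[c(1, length(weeks))]  # first and last week-ending Saturdays
```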

32 changes: 26 additions & 6 deletions hub-config/tasks.json
@@ -20,12 +20,12 @@
"optional": null
},
"race_ethnicity": {
"required": ["latino", "asian", "black", "white", "other"],
"required": ["latino", "asian", "black", "white", "other", "overall"],
"optional": null
},
"target": {
"required": ["inc death", "inc inf", "inc case"],
"optional": null
"required": ["inc death", "inc inf"],
"optional": ["inc case"]
},
"horizon": {
"required": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
@@ -43,6 +43,16 @@
"type": "double",
"minimum": 0
}
},
"quantile":{
"type_id": {
"required": [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99],
"optional": [0, 1]
},
"value" : {
"type": "numeric",
"minimum": 0
}
}
},
"target_metadata": [
@@ -99,12 +109,12 @@
"optional": null
},
"race_ethnicity": {
"required": ["asian", "black", "white", "other"],
"required": ["asian", "black", "white", "other", "overall"],
"optional": null
},
"target": {
"required": ["inc death", "inc inf", "inc case"],
"optional": null
"required": ["inc death", "inc inf"],
"optional": ["inc case"]
},
"horizon": {
"required": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
@@ -122,6 +132,16 @@
"type": "double",
"minimum": 0
}
},
"quantile": {
"type_id": {
"required": [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99],
"optional": [0, 1]
},
"value" : {
"type": "numeric",
"minimum": 0
}
}
},
"target_metadata": [
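
For teams submitting the optional `quantile` output type, the 23 required `type_id` levels in the `tasks.json` changes above can be computed from sampled trajectories. The sketch below is a hypothetical base-R helper, not hub tooling; it assumes `values` holds one target's trajectory values for a single week, and it uses R's default quantile estimator (`type = 7`), which is an arbitrary choice here.

```r
# Hypothetical helper: summarize trajectories into the quantile levels
# listed under "quantile" -> "type_id" -> "required" in tasks.json.
required_levels <- c(0.01, 0.025,
                     round(seq(0.05, 0.95, by = 0.05), 2),
                     0.975, 0.99)

trajectories_to_quantiles <- function(values, levels = required_levels) {
  # R's default estimator (type 7); other types are equally defensible
  q <- quantile(values, probs = levels, names = FALSE, type = 7)
  data.frame(output_type = "quantile",
             output_type_id = levels,
             value = q)
}

# Example with 200 simulated trajectory values for one week
set.seed(1)
qs <- trajectories_to_quantiles(rpois(200, lambda = 50))
head(qs)
```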
48 changes: 21 additions & 27 deletions model-output/README.md
@@ -80,7 +80,7 @@ The "arrow" library can be used to read/write the files in
Other tools are also accessible, for example [parquet-tools](https://github.com/hangxie/parquet-tools)

For example, in R:
```
```R
# To write "parquet" file format:
filename <- "path/YYYY-MM-DD-team_model.parquet"
arrow::write_parquet(df, filename)
@@ -174,24 +174,20 @@ The submission can contain multiple output type information:
please consult the [quantile](./data-processed#quantile)
section.

The requested targets are (for "sample" type output):
The requested targets are:

- weekly incident infections
- weekly incident cases
- weekly incident infections
- weekly incident deaths

Optional target (for "quantile" type output):
Optional target:

- quantile:
- weekly cumulative hospitalizations
- weekly incident hospitalizations
- peak size hospitalizations
- weekly incident cases

Values in the `target` column must be one of the following character strings:

- `"inc inf"`
- `"inc case"`
- `"inc death"`
- `"inc case"`


#### inc inf
@@ -206,21 +202,6 @@ Saturday, inclusive).

Projections of infections will be used to compare outputs between
models but will not be evaluated against observations.


#### inc case

This target is the incident (weekly) number of cases
predicted by the model during the week that is N weeks after
`origin_date`.

A week-ahead scenario should represent the total number of new
cases reported during a given epiweek (from Sunday through Saturday,
inclusive).

Projections of infections will be used to compare outputs between
models but will not be evaluated against observations.


#### inc death

@@ -239,6 +220,20 @@ available in the
folder.


#### inc case

This target is the incident (weekly) number of cases
predicted by the model during the week that is N weeks after
`origin_date`.

A week-ahead scenario should represent the total number of new
cases reported during a given epiweek (from Sunday through Saturday,
inclusive).

Projections of cases will be used to compare outputs between
models but will not be evaluated against observations.


### `location`

Values in the `location` column must be: `"06"` for California
@@ -326,8 +321,6 @@ Teams should provide the following 23 quantiles:
0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
```

An optional `0` and `1` value can also be provided.

For example:

|origin_date|scenario_id|location|target|horizon|age_group|output_type|output_type_id|run_grouping|stochastic_run|value|
@@ -352,6 +345,7 @@ Accepted values in the `race_ethnicity` column are:
- "latino" (accepted only for California ("06"))
- "other"
- "black"
- "overall"
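
The accepted values listed above, together with this commit's `tasks.json` changes (`"overall"` added, `"inc case"` made optional), suggest a simple row-level check. This is a hypothetical sketch, not the hub's validator: the `"06"` code for California comes from this README, while `"37"` for North Carolina is assumed here (it is North Carolina's state FIPS code) and should be confirmed against the hub configuration.

```r
# Hypothetical validation mirroring the tasks.json changes in this commit.
# "37" (North Carolina) is an assumed location code.
required_race <- list(
  "06" = c("latino", "asian", "black", "white", "other", "overall"),
  "37" = c("asian", "black", "white", "other", "overall")
)
required_targets <- c("inc death", "inc inf")
optional_targets <- c("inc case")

# One TRUE/FALSE per row: race/ethnicity valid for the location, target known
check_rows <- function(df) {
  ok_race <- mapply(function(loc, race) race %in% required_race[[loc]],
                    as.character(df$location),
                    as.character(df$race_ethnicity),
                    USE.NAMES = FALSE)
  ok_target <- df$target %in% c(required_targets, optional_targets)
  ok_race & ok_target
}

# Required targets absent from a submission ("inc case" may be omitted)
missing_required_targets <- function(df) {
  setdiff(required_targets, unique(as.character(df$target)))
}
```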

----

95 changes: 92 additions & 3 deletions target-data/README.md
@@ -29,12 +29,10 @@ up until 4/3/2021.

Race/ethnicity grouping in North Carolina
includes `"black"`, `"white"`, `"asian"`, and `"other"`.
Here `"other"` represents the sum of other and Hispanic/Latino.

Race/ethnicity
grouping in California includes `"latino"`, `"black"`, `"white"`,
`"asian"`, and `"other"`. All non-Latino groups are non-Hispanic and
`"latino"` represents both Latino and Hispanic.
`"asian"`, and `"other"`.

Case demographic data is sourced from the
[COVID-19 Race-Ethnicity Timeseries](https://data.chhs.ca.gov/dataset/covid-19-equity-metrics/resource/ef29f30e-320c-46cf-86cd-37a36663616d) from
@@ -44,10 +42,101 @@ from the North Carolina Department of Public Health.
Death demographic data is sourced from
[The National Center for Health Statistics](https://wonder.cdc.gov/mcd-icd10-provisional.html)

- [`source/nc_case_demographics_county.csv`](./source/nc_case_demographics_county.csv):
raw county-level estimates of cases by race in North Carolina with more
detailed information on suppression from
[NC COVID-19 Dashboard Data](https://covid19.ncdhhs.gov/dashboard/data-behind-dashboards)
from the North Carolina Department of Public Health.


### Overall population data

- [cases_overall_jhu.csv](./cases_overall_jhu.csv): Overall cases reported
to Johns Hopkins Center for Systems Science and Engineering (JHU-CSSE)
in California and North Carolina. This gives the total number of reported cases, whereas
the public health department case data has some missingness by
race/ethnicity.
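
Because the JHU-CSSE series is complete while the race/ethnicity-specific case data have missingness, one hypothetical way to quantify the gap for a single state is the ratio of summed race/ethnicity-specific cases to the overall count on each date. The sketch assumes both inputs have `date` and `value` columns; these names are illustrative, so check the actual CSV headers before reusing it.

```r
# Hypothetical sketch: share of overall (JHU-CSSE) cases captured by the
# race/ethnicity-specific counts, per date, for one state.
# Column names `date` and `value` are assumptions.
race_completeness <- function(race_cases, overall_cases) {
  # Sum race/ethnicity-specific cases on each date
  by_date <- aggregate(value ~ date, data = race_cases, FUN = sum)
  merged <- merge(by_date, overall_cases, by = "date",
                  suffixes = c("_race", "_overall"))
  merged$completeness <- merged$value_race / merged$value_overall
  merged
}
```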


### Definition

The definitions of race/ethnicity can differ across datasets. Racial/ethnic
groups are reported consistently across death, serology, and census data:
if an individual is Hispanic or Latino, their primary racial/ethnic group is
`"latino"`, and all other racial subgroups are interpreted as non-Hispanic.
Note that serology is subject to self-reporting whereas death certificate and
census data are not, so although the classifications are the same, the groups
represented may still differ slightly. The tables below map how each data
source reports race/ethnicity to the groups used in the target data.

Serology and census data are available in the
[covid19-smh-research_resources](https://github.com/midas-network/covid19-smh-research_resources)
GitHub repository, in the
[disparities folder](https://github.com/midas-network/covid19-smh-research_resources/tree/main/disparities).


#### Serology and census data

|State|Race_ethnicity|Target Data|
|:---|:----|:----|
|California|Hispanic or Latino|latino|
|California|non-Hispanic White|white|
|California|non-Hispanic Black|black|
|California|non-Hispanic Asian|asian|
|California|non-Hispanic Other|other|
|North Carolina|Hispanic or Latino|other|
|North Carolina|non-Hispanic White|white|
|North Carolina|non-Hispanic Black|black|
|North Carolina|non-Hispanic Asian|asian|
|North Carolina|non-Hispanic Other|other|
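
The serology and census mapping above reduces to a lookup plus one state-specific rule (Hispanic or Latino becomes `"other"` in North Carolina). A minimal hypothetical R version, using the labels exactly as they appear in the table:

```r
# Hypothetical lookup reproducing the serology/census mapping table.
map_serology_group <- function(state, race_ethnicity) {
  lookup <- c("Hispanic or Latino" = "latino",
              "non-Hispanic White" = "white",
              "non-Hispanic Black" = "black",
              "non-Hispanic Asian" = "asian",
              "non-Hispanic Other" = "other")
  out <- unname(lookup[as.character(race_ethnicity)])
  # North Carolina has no separate "latino" group in the target data
  out[state == "North Carolina" & out %in% "latino"] <- "other"
  out
}

map_serology_group(c("California", "North Carolina"),
                   c("Hispanic or Latino", "Hispanic or Latino"))
# "latino" "other"
```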

#### NCHS Death data

In the target death data, the same population groups are broken down in more detail.
This dataset is also subject to a small proportion of known suppression, which has been
distributed uniformly across the suppressed groups. This is noted in the `"suppressed"` column
of the target data files. Teams will be evaluated on the sum of the `"value"`
column in the target death data.

|State|Race_ethnicity|Target Data|
|:---|:----|:----|
|California|Hispanic or Latino|latino|
|California|non-Hispanic White|white|
|California|non-Hispanic Black|black|
|California|non-Hispanic Asian|asian|
|California|non-Hispanic American Indian or Alaska Native|other|
|California|non-Hispanic Native Hawaiian or other Pacific Islander|other|
|California|non-Hispanic more than one race|other|
|North Carolina|Hispanic or Latino|other|
|North Carolina|non-Hispanic White|white|
|North Carolina|non-Hispanic Black|black|
|North Carolina|non-Hispanic Asian|asian|
|North Carolina|non-Hispanic Other|other|
|North Carolina|non-Hispanic American Indian or Alaska Native|other|
|North Carolina|non-Hispanic Native Hawaiian or other Pacific Islander|other|
|North Carolina|non-Hispanic more than one race|other|
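
Since evaluation uses the summed `value` column, the detailed NCHS groups can be collapsed into the target groups and summed. A hypothetical base-R sketch, assuming a data frame with `state`, `race_ethnicity`, and `value` columns (illustrative names): White, Black, and Asian map to their own groups, Hispanic or Latino maps to `"latino"` in California and `"other"` in North Carolina, and every remaining detailed group maps to `"other"`, as in the table.

```r
# Hypothetical sketch: collapse detailed NCHS groups into target-data groups
# and sum `value` per state and group. Column names are assumptions.
aggregate_nchs <- function(deaths) {
  labels <- as.character(deaths$race_ethnicity)
  core <- c("non-Hispanic White" = "white",
            "non-Hispanic Black" = "black",
            "non-Hispanic Asian" = "asian")
  group <- unname(core[labels])
  hispanic <- labels == "Hispanic or Latino"
  group[hispanic] <- ifelse(deaths$state[hispanic] == "California",
                            "latino", "other")
  # Every remaining detailed group (AIAN, NHPI, multiracial, other) -> "other"
  group[is.na(group)] <- "other"
  deaths$target_group <- group
  aggregate(value ~ state + target_group, data = deaths, FUN = sum)
}
```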

#### Case and Vaccination data

Case and vaccination data are derived from state public health department data.
California reports racial/ethnic grouping consistently with other data sources
but North Carolina reports race and ethnicity separately. We choose to use race data
in North Carolina given higher completeness and consistency with other data sources.
Cases and vaccination data are mapped in the following format:


|State|Race_ethnicity|Target Data|
|:---|:----|:----|
|California|Latino|latino|
|California|White|white|
|California|African American|black|
|California|Asian American|asian|
|California|American Indian Native Hawaiian or other Pacific Islander|other|
|California|Multiracial|other|
|California|non-Hispanic more than one race|other|
|North Carolina|White|white|
|North Carolina|Black or African American|black|
|North Carolina|American Indian or Alaska Native|other|
|North Carolina|Asian or other Pacific Islander|other|
|North Carolina|Additional Groups|other|

Binary file not shown.