BRIGHT

BRIGHT is the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. The queries are collected from diverse domains (StackExchange, LeetCode, and math competitions), all sourced from realistic human data. Experiments show that existing retrieval models perform poorly on BRIGHT, where the highest score is only 21 measured by nDCG@10. BRIGHT provides a good testbed for future retrieval research in more realistic and challenging settings.

Dataset Link

Data Card Author(s)

Hongjin Su, HKU: Owner
Howard Yen, Princeton: Owner
Mengzhou Xia, Princeton: Owner
Weijia Shi, UW: Contributor
Niklas Muennighoff: Contributor
Han-yu Wang, HKU: Contributor
Haisu Liu, HKU: Contributor
Quan Shi, Princeton: Contributor
Zachary S. Siegel, Princeton: Contributor
Michael Tang, Princeton: Contributor
Ruoxi Sun, Google: Contributor
Jinsung Yoon, Google: Contributor
Sercan Ö. Arik, Google: Contributor
Danqi Chen, Princeton: Contributor
Tao Yu, HKU: Contributor

Authorship

Publishers

Publishing Organization(s)

The University of Hong Kong

Industry Type(s)

Academic - Tech

Contact Detail(s)

Publishing POC: N/A
Affiliation: N/A
Contact: N/A
Mailing List: N/A
Website: N/A

Dataset Owners

Team(s)

Hongjin Su, Howard Yen and Mengzhou Xia

Contact Detail(s)

Dataset Owner(s): Hongjin Su, Howard Yen and Mengzhou Xia
Affiliation: The University of Hong Kong and Princeton University
Contact: hjsu@cs.hku.hk, {hyen,mengzhou}@cs.princeton.edu
Group Email: N/A
Website: N/A

Author(s)

Hongjin Su, PhD student, The University of Hong Kong
Howard Yen, Masters student, Princeton University
Mengzhou Xia, PhD student, Princeton University

Funding Sources

Institution(s)

Princeton University
Google cloud AI research

Funding or Grant Summary(ies)

Non-Sensitive Data about people
Public data accessible to everyone

Dataset Snapshot

Category	Data
Size of Dataset	607 MB
Number of Instances	1322
Number of Fields	6
Domains	11

Above: We collect 1322 diverse queries from realistics human data. Each example is annotated with the gold documents and the reasning traces to fine them.

Content Description

The datasets are collected from StackExchange, TheoremQA, LeetCode and Math competition.

Descriptive Statistics

Dataset	# Q	# D	# D+	Q.L.	D.L.
Biology	103	57,364	3.6	83.6	115.2
Earth Science	118	122,388	7.7	132.4	113.3
Economics	103	50,221	8.0	120.2	181.5
Psychology	101	52,841	7.3	118.2	149.6
Robotics	101	62,198	5.5	120.6	818.9
Stack Overflow	117	101,100	7.0	704.5	478.3
Sustainable Living	108	60,732	5.6	108.0	148.5
LeetCode	142	413,932	1.8	483.1	497.5
Pony	112	7,894	22.5	98.3	102.6
AoPS	111	188,177	4.7	89.0	250.5
TheoremQA	206	188,177	3.2	117.1	250.5

Data statistics of BRIGHT For each dataset, we show the number of queries (# Q) and documents (# D), the average number of positive documents (# D+) per example, the average length of queries (Q.L.) and documents (D.L., measured by the GPT-2 tokenizer)

Sensitivity of Data

Sensitivity Type(s)

None

Field(s) with Sensitive Data

None

Intentional Collected Sensitive Data

None

Unintentionally Collected Sensitive Data

None

Security and Privacy Handling

We select academia-oriented domains and remove all user information in StackExchange data.

Risk Type(s)

No Known Risks

Supplemental Link(s)

None

Risk(s) and Mitigation(s)

N/A

Dataset Version and Maintenance

Maintenance Status

Actively Maintained - No new versions will be made available, but this dataset will be actively maintained, including but not limited to updates to the data.

Version Details

Current Version: 1.0

Last Updated: 06/2024

Release Date: 06/2024

Maintenance Plan

We will mainly use Github issues and huggingface communities to address any issue the users encounter in using the BRIGHT data.

Versioning: If new versions are released, it will become 1.1 or 2.0 depending on the update.

Updates: There may be updates in the future.

Errors: We will address the error users encounter

Feedback: Either by email, Github issue, huggingface community, we welcome all fedback to make this benchmark better

Next Planned Update(s)

Version affected: N/A

Next data update: N/A

Next version: N/A

Next version update: N/A

Expected Change(s)

Updates to Data: N/A

Updates to Dataset: N/A

Additional Notes: Add here

Example of Data Points

Primary Data Modality

Text Data

Sampling of Data Points

Typical Data Points Link

Typical Data Point

Summarize here. Include any criteria for typicality of data point.

{
  "query": "Claim in article about why insects are attracted to light\nIn this article they are addressing the reason insects are attracted to light when they say\nHeat radiation as an attractive component is refuted by the effect of LED lighting, which supplies negligible infrared radiation yet still entraps vast numbers of insects.\nI don't see why attraction to LEDs shows they're not seeking heat. Could they for example be evolutionarily programmed to associate light with heat? So that even though they don't encounter heat near/on the LEDs they still \"expect\" to?",
  "reasoning": "The question probes why insects are drawn to low-heat LED lights, challenging the idea that their attraction to light is heat-based. The document helps distinguish between heat attraction and evolved behaviors, shedding light on why insects might be attracted to LEDs despite their minimal heat.",
  "id": "0",
  "excluded_ids": [
    "N/A"
  ],
  "gold_ids_long": [
    "insects_attracted_to_light/Proximate_and_ultimate_causation.txt",
    "insects_attracted_to_light/Phototaxis.txt"
  ],
  "gold_ids": [
    "insects_attracted_to_light/Phototaxis_3.txt",
    "insects_attracted_to_light/Proximate_and_ultimate_causation_0.txt",
    "insects_attracted_to_light/Phototaxis_4.txt",
    "insects_attracted_to_light/Proximate_and_ultimate_causation_1.txt",
    "insects_attracted_to_light/Phototaxis_0.txt"
  ]
}

Motivations & Intentions

Motivations

Purpose(s)

Research

Domain(s) of Application

retrieval

Motivating Factor(s)

Existing retrieval benchmarks can be solved by lexical or semantic match
Many realistic scenarios cannot be solved by such simple match
To bridge this gap, we introduce BRIGHT to evaluate retrieval models in realistic settings where intensive reasoning is required

Intended Use

Dataset Use(s)

Evaluate retrieval systems in realistic scenarios

Suitable Use Case(s)

Suitable Use Case: Evaluate retrieval models

Unsuitable Use Case(s)

Unsuitable Use Case: Train retrieval models

Research and Problem Space(s)

We investigate new directions of retrieval, where the relevance between queries and documents go beyond lexical and semantic similarities.

Citation Guidelines

Guidelines & Steps: Include citation when using BRIGHT

BiBTeX:

@inproceedings{BRIGHT,
  title={BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},
  author={Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O and Chen, Danqi and Yu, Tao},
  year={2024},
}

Access, Rentention, & Wipeout

Access

Access Type

External - Open Access

Documentation Link(s)

Dataset Website URL: https://huggingface.co/datasets/xlangai/BRIGHT
GitHub URL: https://github.com/xlang-ai/BRIGHT

Prerequisite(s)

N/A

Policy Link(s)

Direct download URL: https://huggingface.co/datasets/xlangai/BRIGHT

Code to download data:

from datasets import load_dataset
data = load_dataset('xlangai/BRIGHT', 'examples')['biology']

Access Control List(s)

N/A

Retention

Free retention

Wipeout and Deletion

We are not currently considering wiping out or deleting the data

Provenance

Collection

Method(s) Used

Data are collected by authors

Methodology Detail(s)

Collection Type

Source: StackExchange, TheoremQA, LeetCode and Math competition.

Platform: N/A

Is this source considered sensitive or high-risk? No

Dates of Collection: 2024.03~2024.05

Primary modality of collection data:

Usage Note: Select one for this collection type.

Text Data

Update Frequency for collected data:

Usage Note: Select one for this collection type.

Static

Additional Links for this collection:

N/A

Source Description(s)

Source: StackExchange is a popular question-answering platform where users ask questions and receive answers from the community. One example is

How good is it to reuse water from plant pots?

I'm living in an apartment, and after I water my plants the water goes to plates below the pots. The pots are in a metallic structure above the plates, so I can take the plates to reuse the water (throwing it at the plants again).

This reuse seems beneficial, because I think I can get rid of mosquitoes that would reproduce in the stagnated water. And also some nutrients of the soil (as well as earthworms) can return to the vase.

Is there some negative points in doing that?

EDIT: I think I must add that I'm at 3 degrees of latitude, in a hot and humid tropical rainforest, where the precipitation used to be around 1700 mm. So I use lots of water everyday, more than once a day sometimes, so the reused water is a small fraction of the water used.

waterreuseplants
Share
Improve this question
Follow
edited Mar 17, 2016 at 15:27
asked Sep 3, 2015 at 18:39
Rodrigo's user avatar
Rodrigo
16311 silver badge66 bronze badges
i think you mean "pots" if they have dirt in them. "vases" hold water and cur flowers. – 
Kate Gregory
 Mar 17, 2016 at 14:53
Yes, @KateGregory, you're absolutely right. That's because in Portuguese we call them "vasos" :) – 
Rodrigo
 Mar 17, 2016 at 15:25
Add a comment
2 Answers
Sorted by:

Highest score (default)
7

In my experience plants suffer in the long term from accumulation of salts in the soil, so fresh water would be better than reusing the water. Even better would be to get hold of fresh rain water (tricky in an apartment though, unless perhaps you have a balcony that gets rained on) for watering them, as that won't contain the salts that tap water does.

More detail here.

Share
Improve this answer
Follow

Source: LeetCode is a popular coding platform for programmers to practice. One example is:

5. Longest Palindromic Substring
Medium
Topics
Companies
Hint
Given a string s, return the longest 
palindromic
 
substring
 in s.

 

Example 1:

Input: s = "babad"
Output: "bab"
Explanation: "aba" is also a valid answer.
Example 2:

Input: s = "cbbd"
Output: "bb"
 

Constraints:

1 <= s.length <= 1000
s consist of only digits and English letters.

Source: AoPS contains math competition questions. One example is:

Problem 1
What is the ones digit of 222,22 -222,222, -2,222,-222222 \
A. 0
B. 2
C. 4
D. 6
E. 8      
        

Solution 1
We can rewrite the expression as\[222,222-(22,222+2,222+222+22+2).\]
We note that the units digit of the addition is $0$ because all the units digits of the five numbers are $2$ and $5*2=10$, which has a units digit of $0$.
Now, we have something with a units digit of $0$ subtracted from $222,222$. The units digit of this expression is obviously $2$, and we get $\boxed{B}$ as our answer.

Collection Cadence

Static: Data was collected once from single or multiple sources.

Data Integration

Source

Included Fields

Data fields that were collected and are included in the dataset.

Field Name	Description
Post	The content of post where users ask questions

Additional Notes: Add here

Excluded Fields

Data fields that were collected but are excluded from the dataset.

Field Name	Description
Answer	Community answers
Votes	The votes for the post of answers

Data Processing

All the data collection and processing are done manually or with the help of python scripts.

Collection Criteria

Data Selection

StackExchange: We select posts that have links in answers receiving user accept or more than 5 votes
Math and Code: We select questions that require a theorems of syntax documentation.

Data Inclusion

We include data from diverse domains including psychology, robotics, etc.

Data Exclusion

We exclude examples that do not require reasoning in retrieval or do not use theorems.

Relationship to Source

Use & Utility(ies)

StackExchange: We use the post and linked web pages in answers
Math & Code: We use the questions and tags in websites.

Benefit and Value(s)

Using this method, we collect retrieval instances that require intensive reasoning to retrieve documents

Limitation(s) and Trade-Off(s)

The judgement of relevance can be subjective, leading to non-perfect human performance.

Version and Maintenance

First Version

Release date: 06/2024
Link to dataset: BRIGHT 1.0: https://huggingface.co/datasets/xlangai/BRIGHT
Status: Actively Maintained
Size of Dataset: 607 MB
Number of Instances: 1322

Note(s) and Caveat(s)

None

Cadence

Daily

Last and Next Update(s)

We have not updated the datasets since release.

Changes on Update(s)

N/A

Human and Other Sensitive Attributes

Sensitive Human Attribute(s)

None

Intentionality

Intentionally Collected Attributes

Extended Use

Use with Other Data

Safety Level

Safe to use with other data

Known Safe Dataset(s) or Data Type(s)

The data in BRIGHT benchmark focus on academia-oriented domains, and they should be safe.

Best Practices

Evaluate retrieval systems on BRIGHT.

Known Unsafe Dataset(s) or Data Type(s)

None

Limitation(s) and Recommendation(s)

The judgement of relevance between queries and documents can be subjective, so marginal difference between model evaluation could be ignored, while significant difference gives good signals of model capabilities.

Forking & Sampling

Safety Level

Safe to form and/or sample

Acceptable Sampling Method(s)

Cluster Sampling
Haphazard Sampling
Multi-stage sampling
Random Sampling
Retrospective Sampling
Systematic Sampling
Weighted Sampling
Unsampled

Best Practice(s)

Although sampling is possible, we recommend not to do it because the size of BRIGHT is not very large.

Risk(s) and Mitigation(s)

N/A

Limitation(s) and Recommendation(s)

N/A

Use in ML or AI Systems

Dataset Use(s)

Evaluation

Notable Feature(s)

The intensive reasoning required to retrieve documents.

Usage Guideline(s)

Usage Guidelines: Follow the tutorial to evaluate retrieval systems.

Approval Steps: Steps are here.

Reviewer: We authors review the dataset for publication.

Distribution(s)

The BRIGHT benchmark is for the purpose of evaluation, i.e., all data are in test set.

Known Correlation(s)

query, gold_ids, gold_ids_long

Description: The documents corresponding to gold_ids or gold_ids_long are relevant to queries.

Impact on dataset use: It can help evaluate retrieval systems in more realistic scenarios.

Risks from correlation: The judgement of correlation is by real users, and can be subjective.

Split Statistics

Dataset	# Q	# D	# D+	Q.L.	D.L.
Biology	103	57,364	3.6	83.6	115.2
Earth Science	118	122,388	7.7	132.4	113.3
Economics	103	50,221	8.0	120.2	181.5
Psychology	101	52,841	7.3	118.2	149.6
Robotics	101	62,198	5.5	120.6	818.9
Stack Overflow	117	101,100	7.0	704.5	478.3
Sustainable Living	108	60,732	5.6	108.0	148.5
LeetCode	142	413,932	1.8	483.1	497.5
Pony	112	7,894	22.5	98.3	102.6
AoPS	111	188,177	4.7	89.0	250.5
TheoremQA	206	188,177	3.2	117.1	250.5

Data statistics of BRIGHT For each dataset, we show the number of queries (# Q) and documents (# D), the average number of positive documents (# D+) per example, the average length of queries (Q.L.) and documents (D.L., measured by the GPT-2 tokenizer)

Transformations

Synopsis

Transformation(s) Applied

Data Aggregation

Field(s) Transformed

Transformation Type

Field Name	Source & Target
gold_ids	links: gold_ids
gold_ids_long	links: gold_ids_long

Library(ies) and Method(s) Used

Transformation Type

Method: We follow the links or tags to find relevant documents.

Platforms, tools, or libraries: We do not leverage other platforms or tools in transformation

Transformation Results: We collect 1322 examples that can be used for evaluating retrievers.

Breakdown of Transformations

We find documents for all instances following the procedure above

Residual & Other Risk(s)

The risk is that the relevance judgement is subjective.

Human Oversight Measure(s)

We require human annotators to write down the judgement for relevance and reasoning steps.

Additional Considerations

None

Cleaning Mismatched Value(s)

We select high-quality data instance from websites, so there is no further cleaning.

Method(s) Used

We follow links or tags in the websites.

Comparative Summary

We do not use incorrect or mismatched values.

Residual & Other Risk(s)

M/A

Human Oversight Measure(s)

The data and notes written down by annotators are reviewed

Additional Considerations

None

Anomalies

We select data from websites, so no anomaly or outlier is excluded.

Method(s) Used

N/A

Residual & Other Risk(s)

N/A

Human Oversight Measure(s)

N/A

Additional Considerations

N/A

Annotations & Labeling

Annotation Workforce Type

Human Annotations (Expert)
Human Annotations (Non-Expert)

Annotation Characteristic(s)

Annotation Type	Number
Total number of annotations	1322

Annotation Description(s)

Description: Description of annotations (labels, ratings) produced. Include how this was created or authored.

We follow links/tags to find relevant documents

Link: N/A

Platforms, tools, or libraries: N/A

Annotation Distribution(s)

Dataset	number
Biology	103
Earth Science	118
Economics	103
Psychology	101
Robotics	101
Stack Overflow	117
Sustainable Living	108
LeetCode	142
Pony	112
AoPS	111
TheoremQA	206

Distribution of data splits in each domain

Annotation Task(s)

(Task Type)

Task description & instructions: In this section, we describe the instructions for annotators to collect data in BRIGHT.

StackExchange

Browse posts from the newest to the oldest.
Discard posts without an answer accepted by the user or obtains more than 5 votes
Discard answers of posts without URL links.
For each link in the answer, write down the answers to: (1). why are the document and the post relevant; (2). what is the reasoning required to understand the relevance between the post and the document. If there answers are not possible, discard the link.
Use LLMs (e.g., ChatGPT, Claude, etc.) to generate post key words, or use the post title to search for web pages with large keyword or semantic overlap in Google. Search for at most 5 negative web pages per query.
Split every web page into small passages either by two newline symbols, "#" in markdonw files or fixed-length tokens

TheoremQA

In TheoremQA, the main task for the annotator is to check if the GPT-4 rewritten questions are valid. The specific instructions are as follows:

Read the rewritten question and determine if it is solvable.
If it is solvable, read the original question and solution, and determine if the rewritten question is consistent with the original question. That is, the same reasoning steps and the final answer should hold.
If it is also consistent, mark the question as valid, and make any minor edits to the problem statement (e.g., to improve grammar or fluency) as you see fit.
If it is not solvable or not consistent, read the original question and solution, and correct the rewritten question if possible. If not, then discard the problem.

AoPS In AoPS, annotators are tasked to find questions from the AoPS Wiki and record the problems:

Browse through the AoPS Wiki and find topic/category pages (example 1, example 2).
Look through each page and find pages specific theorems or techniques that can be used to solve problems. The page should link to at least two competition problems (example 1, example 2).
Record the links of both the theorem/technique as well as the problem pages. The annotators are assigned a category to look for theorems in to avoid overlaps, and the categories are {algebra, geometry, calculus, probability, number theory, other}. After all links are collected, we use a web scraper to collect the problem statement and solutions, and we manually check the quality of the scraped data.

LeetCode In LeetCode, annotators determine whether a question is grounded in real-world concepts. We give a similar instruction to the annotator as to GPT-4:

Read the problem statement carefully.
Categorize the question into one of three categories: • 0: The question is not grounded in any real-world concepts. The description only uses coding-specific terms, such as "linked list", "binary search", "palindrome", "sorting", etc.. • 1: The question is not grounded in any real-world concepts or real-world concepts that are commonly used in the context of coding, such as needle in a haystack, strings/words, or a spiral matrix.

• 2: The question is grounded in real-world concepts that are not commonly used in the context of coding, such as building height, planting trees, or games. It may still uses some code-specific terms to specify the data structure involved.

Methods used: Basically we follow links/tags to find documents

Inter-rater adjudication policy: Reviewers annotate where the pairing of queries and documents are not convincing.

Golden questions: N/A

Human Annotators

Annotator Description(s)

(Annotation Type)

Task type: Annotate StackExchange data

Number of unique annotators: 3

Expertise of annotators: Both experts and non-experts

Description of annotators: PhD students in computer science, biology, environment, etc.

Language distribution of annotators: They all speak fluent English

Geographic distribution of annotators: They come from Asia

Summary of annotation instructions: Follow links to find documents with filtering

Summary of gold questions: N/A

Annotation platforms: Google sheets

Additional Notes: N/A

Annotator Task(s)

(Task Type)

Task description: Annotate math and code data

Task instructions: Follow tags to find similar problems/questions

Methods used: Follow tags annotated by websites

Inter-rater adjudication policy: The data is reviewed

Golden questions: N/A

Additional notes: N/A

Language(s)

(Annotation Type)

100% English

Location(s)

(Annotation Type)

Asia [50 %]
US [50 %]

Gender(s)

(Annotation Type)

Male [80 %]
Female [20 %]

Validation Types

Method(s)

Code/cross-reference Validation

Breakdown(s)

(Validation Type)

Number of Data Points Validated: 1322

Fields Validated

All fields in data are validated

Description(s)

(Validation Type)

Method: Describe the validation method here. Include links where necessary.

We require annotators to write the logic to determine the relevance between queries and documents. The reviewers not only check the data, but also annotators' notes.

Validation Results:

Over 90% of annotation passes peer review, and we discard the the rest part.

Description of Human Validators

Characteristic(s)

(Validation Type)

Unique validators: 8
Number of examples per validator: 300
Average cost/task/validator: N/A
Training provided: N
Expertise required: N

Description(s)

(Validation Type)

Validator description: Validators are domain experts, e.g., PhD students from the corresponding domains.

Training provided: We do not provide training, but verify that the annotators, reviewers are qualified

Validator selection criteria: We have a test containing verified examples. An annotator is qualified if they can work out these examples.

Training provided: N/A

Language(s)

(Validation Type)

English [100 %]

Location(s)

(Validation Type)

Asia [60 %]
US [40 %]

Gender(s)

(Validation Type)

Male [70 %]
Female [30 %]

Sampling Methods

Method(s) Used

Unsampled

Characteristic(s)

N/A

Sampling Criteria

N/A

Known Applications & Benchmarks

ML Application(s)

Retrieval evaluation

Evaluation Result(s)

SFR-Embedding-Mistral 17.8

Model Card: https://huggingface.co/Salesforce/SFR-Embedding-Mistral/tree/main

Evaluation Results

nDCG@10: 17.8

Evaluation Process(es)

We write python scripts to run retrieval models on BRIGHT.

Description(s) and Statistic(s)

SFR-Embedding-Mistral

Model Card: https://huggingface.co/Salesforce/SFR-Embedding-Mistral/tree/main

Model Description: The best-class retrieval model trained from mistral-7b

Model Size: 7.11B
Model Weights: 7.11B
Model Layers 32
Latency: 2s

Expected Performance and Known Caveats

Claude-3 + BM25

Expected Performance: surpasses results obtained without using LLMs

Known Caveats: The inference of LLMs can be expensive

Terms of Art

Concepts and Definitions referenced in this Data Card

BRIGHT

Definition: The name of this benchmark

Source: https://huggingface.co/datasets/xlangai/BRIGHT

Interpretation: N/A

Reflections on Data

We believe that BRIGHT paves the way for future research on retrieval20 systems in more realistic and challenging settings.

Files

Dataset_documentation.md

Latest commit

History

Dataset_documentation.md

File metadata and controls

BRIGHT

Dataset Link

Data Card Author(s)

Authorship

Publishers

Publishing Organization(s)

Industry Type(s)

Contact Detail(s)

Dataset Owners

Team(s)

Contact Detail(s)

Author(s)

Funding Sources

Institution(s)

Funding or Grant Summary(ies)

Dataset Snapshot

Content Description

Descriptive Statistics

Sensitivity of Data

Sensitivity Type(s)

Field(s) with Sensitive Data

Security and Privacy Handling

Risk Type(s)

Supplemental Link(s)

Risk(s) and Mitigation(s)

Dataset Version and Maintenance

Maintenance Status

Version Details

Maintenance Plan

Next Planned Update(s)

Expected Change(s)

Example of Data Points

Primary Data Modality

Sampling of Data Points

Typical Data Point

Motivations & Intentions

Motivations

Purpose(s)

Domain(s) of Application

Motivating Factor(s)

Intended Use

Dataset Use(s)

Suitable Use Case(s)

Unsuitable Use Case(s)

Research and Problem Space(s)

Citation Guidelines

Access, Rentention, & Wipeout

Access

Access Type

Documentation Link(s)

Prerequisite(s)

Policy Link(s)

Access Control List(s)

Retention

Wipeout and Deletion

Provenance

Collection

Method(s) Used

Methodology Detail(s)

Source Description(s)

Collection Cadence

Data Integration

Data Processing

Collection Criteria

Data Selection

Data Inclusion

Data Exclusion

Relationship to Source

Use & Utility(ies)

Benefit and Value(s)

Limitation(s) and Trade-Off(s)

Version and Maintenance

First Version

Note(s) and Caveat(s)