# WebSRC

## Paper

Title: WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Abstract: https://arxiv.org/abs/2101.09465

Homepage: https://x-lance.github.io/WebSRC/#

WebSRC is a dataset for web-based structural reading comprehension.
Its full train/dev/test split contains over 400k questions across 6.4k webpages.
This version of the dataset does not include OCR output or the original HTML; it simply treats WebSRC as an image-and-text multimodal Q&A benchmark over webpage screenshots.

## Citation

```bibtex
@inproceedings{chen2021websrc,
  title={WebSRC: A Dataset for Web-Based Structural Reading Comprehension},
  author={Chen, Xingyu and Zhao, Zihan and Chen, Lu and Ji, Jiabao and Zhang, Danyang and Luo, Ao and Xiong, Yuxuan and Yu, Kai},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={4173--4185},
  year={2021}
}
```

## Groups & Tasks

### Groups

- `websrc`: Evaluates `websrc-val` and generates a submission file for `websrc-test`.

### Tasks

- `websrc-val`: Given a question and a web page, predict the answer.
- `websrc-test`: Given a question and a web page, predict the answer. Ground truth is not provided for this task.

## Metrics

This task uses SQuAD-style evaluation metrics; of these, token-level F1 is the one reported.
The original paper also reports an Exact Match (EM) score, but it is not implemented here, as that metric is better suited to encoder-only extractive models.

### F1 Score

F1 score is the harmonic mean of precision and recall.
We compute precision and recall at the token level, then combine them into an F1 score in the usual way, as sketched below.

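For reference, here is a minimal sketch of a SQuAD-style token-level F1, assuming SQuAD's usual answer normalization (lowercasing, stripping punctuation and articles). The function names and normalization details are illustrative and not necessarily identical to this task's implementation.

```python
import re
import string
from collections import Counter

def normalize_text(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize_text(prediction).split()
    gold_tokens = normalize_text(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either answer is empty after normalization, F1 is 1 only if both are empty.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partial token overlap yields a partial score (precision 0.5, recall 1.0 -> F1 ~0.667).
print(token_f1("the Eiffel Tower in Paris", "Eiffel Tower"))
```
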
### Test Submission

When evaluating on the test split, a prediction JSON is compiled instead of computing metrics.
Instructions for submission are available on the [WebSRC homepage](https://x-lance.github.io/WebSRC/#) and in the original [GitHub repo](https://github.com/X-LANCE/WebSRC-Baseline#obtain-test-result).

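As a rough illustration only, predictions could be collected into a single JSON file keyed by question id before submission. The ids, answer strings, and output filename below are hypothetical; the exact schema expected by the leaderboard is described at the links above.

```python
import json

# Hypothetical predictions: question id -> predicted answer string.
# This is only an illustrative sketch of compiling a prediction JSON,
# not the exact format required by the WebSRC leaderboard.
predictions = {
    "00000001": "yes",
    "00000002": "$1,299",
}

with open("websrc_test_predictions.json", "w") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```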
