Adding four tasks from RULER #28

rbiswasfc · 2024-06-27T11:39:34Z

This PR adds four tasks from the RULER benchmark:

QA2 (HotpotQA with distracting information added)
Multi-hop Tracing: Variable Tracking (VT)
Aggregation: Common Words (CWE)
Multi-keys Needle-in-a-haystack (NIAH)

Currently, each task uses a 4k context length, which can be adjusted as needed.


rbiswasfc · 2024-06-27T11:50:40Z

Running with the default settings and the Qwen2-1.5B-Instruct model (e.g. python eval.py --tasks rulercwe --checkpoint_path checkpoints/Qwen/Qwen2-1.5B-Instruct/model.pth), I got these scores:

rulerqa: StringMatch_score: 38.8
rulerniah: StringMatch_score: 89.00
rulervt: StringMatch_score: 77.96
rulercwe: StringMatch_score: 31.08
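
For readers unfamiliar with the metric, here is a minimal sketch of how a substring-based string-match score can be computed. It assumes StringMatch follows the case-insensitive substring matching used by RULER-style evaluation; the function names string_match_all and string_match_part and the sample data are illustrative assumptions, not necessarily what this repo implements.

```python
from typing import List


def string_match_all(preds: List[str], refs: List[List[str]]) -> float:
    """Average fraction of reference strings found in each prediction (0-100).

    Assumes every reference list is non-empty. Matching is a
    case-insensitive substring check.
    """
    if not preds:
        return 0.0
    total = 0.0
    for pred, ref in zip(preds, refs):
        hits = sum(1 for r in ref if r.lower() in pred.lower())
        total += hits / len(ref)
    return round(100 * total / len(preds), 2)


def string_match_part(preds: List[str], refs: List[List[str]]) -> float:
    """Credit a prediction fully if any one reference string appears (0-100)."""
    if not preds:
        return 0.0
    total = 0.0
    for pred, ref in zip(preds, refs):
        total += 1.0 if any(r.lower() in pred.lower() for r in ref) else 0.0
    return round(100 * total / len(preds), 2)


# Hypothetical predictions and references, for illustration only.
preds = ["The answer is Paris.", "apple and banana"]
refs = [["paris"], ["apple", "cherry"]]

score_all = string_match_all(preds, refs)    # second item matches 1 of 2 refs
score_part = string_match_part(preds, refs)  # both items match at least one ref
print(score_all, score_part)
```

Under this sketch, the "all" variant suits tasks with several expected answers (such as common-words or multi-key retrieval), while the "part" variant suits QA, where any one acceptable answer should count.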

rbiswasfc added 4 commits June 27, 2024 15:33

adding ruler/qa2 4k

d9cf5ab

Merge branch 'main' of https://github.com/AnswerDotAI/context-compres…

7eaf591

…sion into rb/ruler

added 4 tasks from ruler

17a6f66

ruff

cc8c039

rbiswasfc requested review from fladhak and griff4692 June 27, 2024 11:39

griff4692 approved these changes Jun 27, 2024

View reviewed changes

griff4692 merged commit 6a02bef into main Jun 27, 2024

griff4692 deleted the rb/ruler branch June 27, 2024 13:36
