Add MS MARCO (V2) regressions for docTTTTTquery expansions #1735

Closed
lintool opened this issue Jan 13, 2022 · 2 comments · Fixed by #1744

@lintool
Member

lintool commented Jan 13, 2022

Building on #1721, #1730, and castorini/pyserini#939:

We are now ready to build regressions for MS MARCO (V2) {doc, passage} with docTTTTTquery expansions.

@lintool
Member Author

lintool commented Jan 16, 2022

For doc, @ronakice has prepared the two expanded corpora on orca at:

/store/scratch/rpradeep/msmarco-v2/collections/msmarco_v2_doc_segmented_d2q-t5_10
/store/scratch/rpradeep/msmarco-v2/collections/msmarco_v2_doc_d2q-t5

Since these corpora have already been broken into smaller files and individually gzipped, I've simply copied them into their final locations on orca under /store/collections/msmarco:

$ du -h msmarco_v2_doc_d2q-t5/ msmarco_v2_doc_segmented_d2q-t5/
46G	msmarco_v2_doc_d2q-t5/
73G	msmarco_v2_doc_segmented_d2q-t5/

I've also created tarballs at /store/collections/msmarco/tarballs:

$ md5sum msmarco_v2_doc_d2q-t5.tar msmarco_v2_doc_segmented_d2q-t5.tar
cdd8e4823b237d9d4d6e05f7c02c4f26  msmarco_v2_doc_d2q-t5.tar
3eb16c3efc19e834b7ca8c62b9c0ddcc  msmarco_v2_doc_segmented_d2q-t5.tar
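
Anyone copying these tarballs elsewhere can confirm the transfer against the checksums above. A minimal sketch, assuming GNU coreutils `md5sum` is on the path; the helper name `verify_md5` is hypothetical and not part of Anserini:

```shell
# Hypothetical helper: succeed only if the MD5 digest of file $1 equals $2.
verify_md5() {
    # Read the file from stdin so the output is always "<digest>  -".
    actual=$(md5sum < "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual, expected $2)" >&2
        return 1
    fi
}
```

For example, `verify_md5 msmarco_v2_doc_d2q-t5.tar cdd8e4823b237d9d4d6e05f7c02c4f26`, run from the directory that holds the tarball.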

@lintool
Member Author

lintool commented Jan 19, 2022

For passage, @ronakice has prepared two expanded corpora on orca at:

/store/scratch/rpradeep/msmarco-v2/collections/msmarco_v2_passage_augmented_d2q-t5_20
/store/scratch/rpradeep/msmarco-v2/collections/msmarco_v2_passage_d2q-t5_20

Since these corpora have already been broken into smaller files and individually gzipped, I've simply copied them into their final locations on orca under /store/collections/msmarco:

$ du -h msmarco_v2_passage_d2q-t5 msmarco_v2_passage_augmented_d2q-t5
47G	msmarco_v2_passage_d2q-t5
66G	msmarco_v2_passage_augmented_d2q-t5
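
Because each corpus is a directory of individually gzipped files, a copy can be sanity-checked by testing every archive. A rough sketch, assuming the files end in `.gz` and sit directly inside the corpus directory (the thread does not state the file layout); `check_gz_dir` is a hypothetical helper:

```shell
# Hypothetical helper: run `gzip -t` on every *.gz file directly inside
# directory $1. Reports each corrupt file and returns non-zero if any is found.
check_gz_dir() {
    bad=0
    for f in "$1"/*.gz; do
        [ -e "$f" ] || continue   # glob matched nothing: no .gz files here
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT: $f" >&2
            bad=1
        fi
    done
    return "$bad"
}
```

For example, `check_gz_dir /store/collections/msmarco/msmarco_v2_passage_d2q-t5`. The check does not recurse into subdirectories, so it would need adjusting if the corpus is nested.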

I've also created tarballs at /store/collections/msmarco/tarballs:

$ md5sum msmarco_v2_passage_d2q-t5.tar msmarco_v2_passage_augmented_d2q-t5.tar
61632bdb3313dc5631c563d650acf6d2  msmarco_v2_passage_d2q-t5.tar
7ce979309caeeb0a28dd1d79f24851b6  msmarco_v2_passage_augmented_d2q-t5.tar
