README.md (+12 -8)
@@ -93,18 +93,22 @@ We provide our pre-filled argparse run configs for all experiments under `code/r
 The few-shots for all tasks are also available in `assets/{task}/few-shot/corpus-task-32.jsonl`, so you can start running/reproducing our experiments.
 
 #### Step 0: Download required files and set up the directory structure
-* Embedding database: Download our embedding database from [here](http://data.cis.lmu.de/data/craft/embeddings.h5) and place it under `datasets/embeddings.h5`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* Embedding database: Download our embedding database from the link below and place it under `datasets/embeddings.h5`
+*http://data.cis.lmu.de/data/craft/embeddings.h5
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
 * C4: Download the 305GB `en` version of C4 from [Hugging Face](https://huggingface.co/datasets/allenai/c4). We used the Git download version. It is not mentioned there that you have to run `git lfs checkout` after everything is downloaded so that the lazy files are actually linked to the downloaded files.
-* Wikipedia: Download our cleaned Wikipedia corpus samples from [here](http://data.cis.lmu.de/data/craft/wikipedia_cleaned.tar.gz) and place them under `datasets/wikipedia/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* Wikipedia: Download our cleaned Wikipedia corpus samples from the link below and place them under `datasets/wikipedia/cleaned/`
+*http://data.cis.lmu.de/data/craft/wikipedia_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
-* WikiHow: Download our cleaned WikiHow corpus samples from [here](http://data.cis.lmu.de/data/craft/wikihow_cleaned.tar.gz) and place them under `datasets/wikihow/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* WikiHow: Download our cleaned WikiHow corpus samples from the link below and place them under `datasets/wikihow/cleaned/`
+*http://data.cis.lmu.de/data/craft/wikihow_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
-* StackExchange: Download our cleaned StackExchange corpus samples from [here](http://data.cis.lmu.de/data/craft/stackexchange_cleaned.tar.gz) and place them under `datasets/stackexchange/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* StackExchange: Download our cleaned StackExchange corpus samples from the link below and place them under `datasets/stackexchange/cleaned/`
+*http://data.cis.lmu.de/data/craft/stackexchange_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
 * Make sure that each task folder under `assets/` has the following subfolder available: `assets/{task}/corpus_samples/`, `assets/{task}/outputs/`, `assets/{task}/results/`, `assets/{task}/task_samples/`
 * Create a `model_ckpts` directory. All LoRA adapters will be saved here
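
The Step 0 instructions in this hunk boil down to two mechanical tasks: creating the expected folders and checking each download against its sha256 checksum. The sketch below is one possible way to script both and is not part of the repository. The names `sha256_of`, `ensure_layout`, and `TASK_SUBFOLDERS` are hypothetical, the task names are supplied by the caller, and placing `model_ckpts` at the repository root is an assumption, since the README does not say where it belongs.

```python
import hashlib
from pathlib import Path

# Subfolders the README lists for every assets/{task}/ directory.
TASK_SUBFOLDERS = ("corpus_samples", "outputs", "results", "task_samples")


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file.

    The file is read in chunks so that large downloads such as
    embeddings.h5 never have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_layout(repo_root, tasks):
    """Create the Step 0 directories and return them as a list of Paths.

    Directories that already exist are left untouched.
    """
    root = Path(repo_root)
    wanted = []
    for task in tasks:
        for sub in TASK_SUBFOLDERS:
            wanted.append(root / "assets" / task / sub)
    # Target folders for the cleaned corpus samples named in the README.
    for corpus in ("wikipedia", "wikihow", "stackexchange"):
        wanted.append(root / "datasets" / corpus / "cleaned")
    # Assumption: model_ckpts sits directly under the repository root.
    wanted.append(root / "model_ckpts")
    for directory in wanted:
        directory.mkdir(parents=True, exist_ok=True)
    return wanted
```

A typical use would be to call `ensure_layout(".", [...])` with your own task names, then compare `sha256_of("datasets/embeddings.h5")` by hand with the value stored in `checksum`. The format of that file is not described in this diff, so the sketch deliberately does not try to parse it.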