Commit 875063a

update README.md
1 parent 239c2e8 commit 875063a

1 file changed, +12 -8 lines changed

README.md

@@ -93,18 +93,22 @@ We provide our pre-filled argparse run configs for all experiments under `code/r
 The few-shots for all tasks are also available in `assets/{task}/few-shot/corpus-task-32.jsonl`, so you can start running/reproducing our experiments.

 #### Step 0: Download required files and set up the directory structure
-* Embedding database: Download our embedding database from [here](http://data.cis.lmu.de/data/craft/embeddings.h5) and place it under `datasets/embeddings.h5`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* Embedding database: Download our embedding database from the link below and place it under `datasets/embeddings.h5`
+* http://data.cis.lmu.de/data/craft/embeddings.h5
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
 * C4: Download the 305GB `en` version of C4 from [Hugging Face](https://huggingface.co/datasets/allenai/c4). We used the Git download version. It is not mentioned there that you have to run `git lfs checkout` after everything is downloaded so that the lazy files are actually linked to the downloaded files.
-* Wikipedia: Download our cleaned Wikipedia corpus samples from [here](http://data.cis.lmu.de/data/craft/wikipedia_cleaned.tar.gz) and place them under `datasets/wikipedia/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* Wikipedia: Download our cleaned Wikipedia corpus samples from the link below and place them under `datasets/wikipedia/cleaned/`
+* http://data.cis.lmu.de/data/craft/wikipedia_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
-* WikiHow: Download our cleaned WikiHow corpus samples from [here](http://data.cis.lmu.de/data/craft/wikihow_cleaned.tar.gz) and place them under `datasets/wikihow/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* WikiHow: Download our cleaned WikiHow corpus samples from the link below and place them under `datasets/wikihow/cleaned/`
+* http://data.cis.lmu.de/data/craft/wikihow_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
-* StackExchange: Download our cleaned StackExchange corpus samples from [here](http://data.cis.lmu.de/data/craft/stackexchange_cleaned.tar.gz) and place them under `datasets/stackexchange/cleaned/`
-* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link.
+* StackExchange: Download our cleaned StackExchange corpus samples from the link below and place them under `datasets/stackexchange/cleaned/`
+* http://data.cis.lmu.de/data/craft/stackexchange_cleaned.tar.gz
+* Please note that we host the files on an `http` address. If your browser autocompletes to `https`, you may need to manually adjust the link. Sometimes, you may also have to paste the address into the search bar directly.
 * We provide the sha256 checksum for the file in this repository in `checksum`
 * Make sure that each task folder under `assets/` has the following subfolder available: `assets/{task}/corpus_samples/`, `assets/{task}/outputs/`, `assets/{task}/results/`, `assets/{task}/task_samples/`
 * Create a `model_ckpts` directory. All LoRA adapters will be saved here
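The manual download steps in the README's Step 0 (fetch each file over plain `http`, compare its sha256 digest with the value from the repository's `checksum` file, unpack the corpus tarballs into their target folders) can be sketched with the Python standard library. This is a minimal, hypothetical sketch, not the authors' tooling: the helper names (`download`, `sha256_of`, `verify`, `extract`) and the `CORPORA` mapping are illustrative, and it assumes each tarball can be unpacked directly into its target folder and that you copy the expected hex digest out of `checksum` by hand, since neither the archive layout nor the `checksum` file format is shown here.

```python
import hashlib
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Host taken from the README links; it is plain http, not https.
BASE_URL = "http://data.cis.lmu.de/data/craft"

# Hypothetical mapping: archive name -> folder the README says to place it in.
CORPORA = {
    "wikipedia_cleaned.tar.gz": Path("datasets/wikipedia/cleaned"),
    "wikihow_cleaned.tar.gz": Path("datasets/wikihow/cleaned"),
    "stackexchange_cleaned.tar.gz": Path("datasets/stackexchange/cleaned"),
}


def download(url: str, dest: Path) -> Path:
    """Stream `url` to `dest`, creating parent folders as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex sha256 digest of a file, read in chunks so it need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's sha256 differs from the expected digest."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")


def extract(archive: Path, target_dir: Path) -> None:
    """Unpack a .tar.gz archive into target_dir (assumes its contents belong there directly)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, links escaping target_dir, etc.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
```

For the embedding database, `verify(download(f"{BASE_URL}/embeddings.h5", Path("datasets/embeddings.h5")), expected)` would be enough, as it is a single file rather than an archive. C4 is deliberately left out: it comes from Hugging Face via Git, where `git lfs checkout` still has to be run after the download.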
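The last two items of Step 0 (the four subfolders under every `assets/{task}/` folder, plus a `model_ckpts` directory for the LoRA adapters) can be checked and created in one pass. This is a hypothetical sketch under two assumptions the README does not state: every immediate subdirectory of `assets/` is a task folder, and `model_ckpts` lives at the repository root. The function name `ensure_layout` is illustrative only.

```python
from __future__ import annotations

from pathlib import Path

# Subfolders the README expects under every assets/{task}/ folder.
TASK_SUBFOLDERS = ("corpus_samples", "outputs", "results", "task_samples")


def ensure_layout(repo_root: Path) -> list[Path]:
    """Create missing assets/{task}/ subfolders and model_ckpts/; return the paths created."""
    created: list[Path] = []
    assets = repo_root / "assets"
    # Assumption: every directory directly under assets/ is a task folder.
    task_dirs = sorted(p for p in assets.iterdir() if p.is_dir()) if assets.is_dir() else []
    for task_dir in task_dirs:
        for name in TASK_SUBFOLDERS:
            sub = task_dir / name
            if not sub.exists():
                sub.mkdir(parents=True)
                created.append(sub)
    # Assumption: model_ckpts sits at the repository root (LoRA adapters are saved here).
    ckpts = repo_root / "model_ckpts"
    if not ckpts.exists():
        ckpts.mkdir()
        created.append(ckpts)
    return created
```

Running it a second time returns an empty list, so it is safe to call before every experiment run.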
