From 29fed188871b3bf3d9d7bf4534ad08826a305988 Mon Sep 17 00:00:00 2001
From: Akshita Bhagia
Date: Thu, 25 Jan 2024 21:00:46 -0800
Subject: [PATCH 1/5] update readme with inference section

---
 README.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/README.md b/README.md
index 3e63e51dc..f239a0e0a 100644
--- a/README.md
+++ b/README.md
@@ -46,3 +46,37 @@ torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
 ```

 Note: passing CLI overrides like `--reset_trainer_state` is only necessary if you didn't update those fields in your config.
+
+
+## Inference
+
+To run inference on the OLMo checkpoints:
+
+```python
+from hf_olmo import *
+
+from transformers import AutoModelForCausalLM, AutoTokenizer
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
+message = ["Language modeling is "]
+
+inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
+print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
+```
+
+Alternatively, with the HuggingFace pipeline abstraction:
+
+```python
+from transformers import pipeline
+olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
+print(olmo_pipe("Language modeling is"))
+```
+
+### Quantization
+
+```python
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
+```
+
+The quantized model is more sensitive to input types and CUDA handling, so it is recommended to pass the inputs explicitly as `inputs.input_ids.to('cuda')` to avoid potential issues.

From 761fa94d66e9d3c243656cc4dbe9cdd1d98c78a1 Mon Sep 17 00:00:00 2001
From: Akshita Bhagia
Date: Thu, 25 Jan 2024 21:06:36 -0800
Subject: [PATCH 2/5] add subsection about hf conversion

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index f239a0e0a..3da6ca978 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,15 @@ olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
 print(olmo_pipe("Language modeling is"))
 ```

+
+### Inference on finetuned checkpoints
+
+If you finetune the model using the code above, you can use the conversion script to convert a native olmo checkpoint to a huggingface-compatible checkpoint
+
+```bash
+python hf_olmo/convert_olmo_to_hf.py --checkpoint-dir /path/to/checkpoint
+```
+
 ### Quantization

 ```python
 olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True) # requires bitsandbytes
 ```

 The quantized model is more sensitive to input types and CUDA handling, so it is recommended to pass the inputs explicitly as `inputs.input_ids.to('cuda')` to avoid potential issues.

From 35fcf30dd0fc1cd0b59c5bd33ebf423bf4f7363b Mon Sep 17 00:00:00 2001
From: Akshita Bhagia
Date: Fri, 26 Jan 2024 11:46:04 -0800
Subject: [PATCH 3/5] Update README.md

Co-authored-by: Pete
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3da6ca978..c5bf876b7 100644
--- a/README.md
+++ b/README.md
@@ -54,12 +54,12 @@ To run inference on the OLMo checkpoints:

 ```python
 from hf_olmo import *
-
 from transformers import AutoModelForCausalLM, AutoTokenizer
+
 olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
 tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
-message = ["Language modeling is "]

+message = ["Language modeling is "]
 inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
 response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
 print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

From 99d5090cb5df6eceabacbbb14ea93f4dec72ba3f Mon Sep 17 00:00:00 2001
From: Akshita Bhagia
Date: Fri, 26 Jan 2024 11:46:19 -0800
Subject: [PATCH 4/5] Update README.md

Co-authored-by: Pete
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c5bf876b7..e126f1d66 100644
--- a/README.md
+++ b/README.md
@@ -76,7 +76,7 @@ print(olmo_pipe("Language modeling is"))
 ```

 ### Inference on finetuned checkpoints

-If you finetune the model using the code above, you can use the conversion script to convert a native olmo checkpoint to a huggingface-compatible checkpoint
+If you finetune the model using the code above, you can use the conversion script to convert a native OLMo checkpoint to a HuggingFace-compatible checkpoint.

 ```bash
 python hf_olmo/convert_olmo_to_hf.py --checkpoint-dir /path/to/checkpoint
 ```

From 780e38627d5335ef887012e62026560b9f8a22a3 Mon Sep 17 00:00:00 2001
From: Akshita Bhagia
Date: Tue, 30 Jan 2024 11:25:10 -0800
Subject: [PATCH 5/5] update content

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a96976b26..d84f5a660 100644
--- a/README.md
+++ b/README.md
@@ -71,10 +71,11 @@ Note: passing CLI overrides like `--reset_trainer_state` is only necessary if yo

 ## Inference

-To run inference on the OLMo checkpoints:
+You can utilize our HuggingFace integration to run inference on the OLMo checkpoints:

 ```python
-from hf_olmo import *
+from hf_olmo import * # registers the Auto* classes
+
 from transformers import AutoModelForCausalLM, AutoTokenizer

 olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
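
A minimal end-to-end sketch of the quantized inference path described in the patches above, assuming a CUDA-capable GPU and the `bitsandbytes` package are available:

```python
import torch
from hf_olmo import *  # registers the OLMo Auto* classes

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint in 8-bit as in the Quantization section (requires bitsandbytes).
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)

# The quantized model is sensitive to input types and device placement, so pass the
# token ids to CUDA explicitly instead of forwarding the whole tokenizer output.
response = olmo.generate(input_ids=inputs.input_ids.to('cuda'), max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```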