Merge pull request #415 from allenai/readme-inf
Update readme with inference section
AkshitaB authored Jan 30, 2024
2 parents 2587240 + 780e386 commit 141cc94
Showing 1 changed file with 45 additions and 1 deletion.
README.md: 46 changes (45 additions & 1 deletion)
@@ -69,6 +69,50 @@ torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
Note: passing CLI overrides like `--reset_trainer_state` is only necessary if you didn't update those fields in your config.


## Inference

You can use our Hugging Face integration to run inference on the OLMo checkpoints:

```python
from hf_olmo import * # registers the Auto* classes

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

Alternatively, with the Hugging Face pipeline abstraction:

```python
from transformers import pipeline
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
print(olmo_pipe("Language modeling is"))
```


### Inference on finetuned checkpoints

If you finetune the model using the code above, you can use the conversion script to convert a native OLMo checkpoint to a Hugging Face-compatible checkpoint:

```bash
python hf_olmo/convert_olmo_to_hf.py --checkpoint-dir /path/to/checkpoint
```
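You can then load the converted checkpoint like any other Hugging Face model. A minimal sketch, assuming the conversion writes the Hugging Face files into the same checkpoint directory:

```python
from hf_olmo import *  # registers the Auto* classes

from transformers import AutoModelForCausalLM, AutoTokenizer

# Point at the local directory produced by the conversion script (path is illustrative).
olmo = AutoModelForCausalLM.from_pretrained("/path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("/path/to/checkpoint")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=50)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```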

### Quantization

```python
import torch  # needed for torch.float16

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True)  # requires bitsandbytes
```

The quantized model is more sensitive to data types and CUDA placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
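
For example, a minimal sketch of generating from the 8-bit model along those lines (assuming a CUDA device and the `bitsandbytes` package are available):

```python
import torch

from hf_olmo import *  # registers the Auto* classes
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True
)  # requires bitsandbytes
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# Pass input_ids explicitly and move them to the GPU, as recommended above.
response = olmo.generate(input_ids=inputs.input_ids.to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```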


## Evaluation

Additional tools for evaluating OLMo models are available at the [OLMo Eval](https://github.com/allenai/ai2-olmo-eval) repo.

