
Commit 2c8984a

[ENH] add example for LLama 3 vllm (#381)
1 parent 32c8c0d

File tree

2 files changed: +47 −0 lines changed
@@ -0,0 +1,31 @@
+<|im_start|>system
+You are a helpful assistant that ranks models by the quality of their answers.
+<|im_end|>
+<|im_start|>user
+I want you to create a leaderboard of different large language models. To do so, I will give you the instructions (prompts) given to the models, and the responses of two models. Please rank the models based on which responses would be preferred by humans. All inputs and outputs should be Python dictionaries.
+
+Here is the prompt:
+{
+    "instruction": """{instruction}""",
+}
+
+Here are the outputs of the models:
+[
+    {
+        "model": "model_1",
+        "answer": """{output_1}"""
+    },
+    {
+        "model": "model_2",
+        "answer": """{output_2}"""
+    }
+]
+
+Now please rank the models by the quality of their answers, so that the model with rank 1 has the best output. Then return a list of the model names and ranks, i.e., produce the following output:
+[
+    {'model': <model-name>, 'rank': <model-rank>},
+    {'model': <model-name>, 'rank': <model-rank>}
+]
+
+Your response must be a valid Python dictionary and should contain nothing else because we will directly execute it in Python. Please provide the ranking that the majority of humans would give.
+<|im_end|>
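As a side note (not part of the commit), here is a minimal sketch of how the `{instruction}`, `{output_1}`, and `{output_2}` placeholders in a template like the one above could be filled. Because the template also contains literal `{`/`}` braces for the JSON-style examples, plain `str.format` would raise a `KeyError`, so substituting each placeholder individually is safer. The `fill_template` helper is hypothetical, not from the repository.

```python
def fill_template(template: str, instruction: str, output_1: str, output_2: str) -> str:
    """Fill the named placeholders of a ranking prompt template.

    Hypothetical helper: str.format would choke on the literal { } braces
    in the template body, so each placeholder is replaced individually.
    """
    return (
        template
        .replace("{instruction}", instruction)
        .replace("{output_1}", output_1)
        .replace("{output_2}", output_2)
    )


# Example with a shortened template in the same style as the file above.
template = 'Q: """{instruction}"""\nA1: """{output_1}"""\nA2: """{output_2}"""'
filled = fill_template(template, "Say hi", "Hi there", "Hello")
```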
@@ -0,0 +1,16 @@
+alpaca_eval_vllm_llama3_70b_fn:
+  prompt_template: "alpaca_eval_vllm_llama3_70b_fn/alpaca_eval_fn.txt"
+  fn_completions: "vllm_local_completions"
+  completions_kwargs:
+    model_name: "/home/shared/Meta-Llama-3-70B-Instruct" # TODO: replace with path to the model
+    model_kwargs:
+      tokenizer_mode: "auto"
+      trust_remote_code: True
+      max_model_len: 7000
+      tensor_parallel_size: 2 # 2 GPUs
+    is_chatml_prompt: true
+    max_new_tokens: 100
+    temperature: 0.0
+    top_p: 1.0
+    batch_size: 128
+  fn_completion_parser: "ranking_parser"
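The config names `"ranking_parser"` as the `fn_completion_parser`. A rough sketch of what such a parser could do — the actual implementation in the library may differ — is to read the model's Python-literal response and map each model name to its rank:

```python
import ast


def parse_ranking(completion: str) -> dict:
    """Sketch of a ranking parser (the real ranking_parser may differ).

    Expects a completion shaped like the prompt's requested output, e.g.
    "[{'model': 'model_1', 'rank': 1}, {'model': 'model_2', 'rank': 2}]",
    and returns a {model_name: rank} dict. ast.literal_eval is used
    instead of eval so only Python literals are accepted.
    """
    ranking = ast.literal_eval(completion.strip())
    return {entry["model"]: entry["rank"] for entry in ranking}
```

Using `ast.literal_eval` rather than `eval` matters here because the completion comes from a model and should never be executed as arbitrary code.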
