Inference script:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained("../Qwen2-Math-7B-Instruct")

# Default decoding hyperparameters for Qwen2-Math-7B-Instruct;
# max_tokens caps the generation length.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=512)

# Input the model name or path. Can be a GPTQ or AWQ model.
llm = LLM(model="../Qwen2-Math-7B-Instruct", enforce_eager=True)

# Prepare the prompt
prompt = "Tell me something about large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate the outputs
outputs = llm.generate([text], sampling_params)

# Print the outputs
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
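As a side note, recent vLLM releases also expose an `LLM.chat()` helper that applies the model's chat template internally, so the manual `apply_chat_template` step can be dropped. A minimal sketch, assuming the installed vLLM version (0.6.4.post1 here) provides `LLM.chat`:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="../Qwen2-Math-7B-Instruct", enforce_eager=True)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=512)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Tell me something about large language models."},
]

# chat() renders the messages with the tokenizer's chat template before generating,
# so no separate AutoTokenizer call is needed
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```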
System info:
- GPU: L20
- CUDA: 12.2
- PyTorch: 2.5.1
- GPU driver version: 535.161.08
- vLLM version: 0.6.4.post1
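For reproducibility, most of the versions above can be collected from inside the same Python environment; a minimal sketch using standard `torch` and `vllm` attributes (the driver version itself comes from `nvidia-smi`, not from PyTorch):

```python
import torch
import vllm

# Library versions relevant to this issue
print("pytorch:", torch.__version__)            # e.g. 2.5.1
print("cuda (torch build):", torch.version.cuda)  # CUDA toolkit torch was built against;
                                                  # may differ from the driver's 12.2
print("vllm:", vllm.__version__)                # e.g. 0.6.4.post1
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))  # e.g. NVIDIA L20
```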