
Conversation

@sangioai sangioai commented Sep 5, 2025

Clone of this PR on 🤗

  • Add input_ids parsing to the generate function, as commonly used by 🤗 pipelines (see the sketch below)
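
A minimal sketch of the idea (not the actual FastVLM remote code; the helper name and signature below are illustrative assumptions): the custom generate accepts the standard input_ids keyword that tokenizers and pipelines produce and treats it as an alias for the model's own prompt-token argument.

from typing import Optional
import torch

def parse_generate_inputs(
    inputs: Optional[torch.Tensor] = None,
    input_ids: Optional[torch.Tensor] = None,
    **kwargs,
):
    # Accept `input_ids` (the keyword emitted by apply_chat_template and
    # 🤗 pipelines) as an alias for the model's own `inputs` argument.
    if inputs is None and input_ids is not None:
        inputs = input_ids
    elif inputs is not None and input_ids is not None:
        raise ValueError("Pass either `inputs` or `input_ids`, not both.")
    return inputs, kwargs

# Quick check of the aliasing behaviour
ids = torch.tensor([[1, 2, 3]])
prompt, extra = parse_generate_inputs(input_ids=ids, max_new_tokens=8)
print(prompt.shape, extra)  # torch.Size([1, 3]) {'max_new_tokens': 8}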

Example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("apple/FastVLM-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "apple/FastVLM-7B",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    trust_remote_code=True,
)

# Build prompt
messages = [
    {"role": "user", "content": "Describe San Francisco"}
]

# Tokenize
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate  
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
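
Since pipelines pass input_ids to generate under the hood, the same checkpoint should then also be usable through the high-level pipeline API. A hedged sketch only (the task name and behaviour with this multimodal checkpoint are assumptions, reusing the model and tokenizer loaded above):

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Describe San Francisco", max_new_tokens=128)[0]["generated_text"])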
