westonbrown
Add LlamaCpp Model Provider Tutorial

Issue #, if available: Related to strands-agents/sdk-python#585

Summary

This PR adds the first comprehensive tutorial for the new LlamaCppModel provider class (merged in #585), demonstrating how to run quantized function-calling models on-device with the Strands Agents SDK. It fills a gap in our documentation by showing developers how to deploy AI agents locally, using efficient quantized models that run on resource-constrained hardware.

Value to the Repository

This tutorial is essential because it:

  1. Demonstrates the LlamaCppModel class - First official tutorial for the newly added model provider
  2. Enables offline AI - Shows how to build agents that work without internet connectivity

Key Features Demonstrated

  • Local Model Deployment: Run quantized GGUF models (4-bit, 8-bit) locally
  • Multimodal Processing: Audio transcription/translation and image analysis
  • Grammar Constraints: GBNF grammar for guaranteed output formats
  • Performance Optimization: Benchmarking and optimization strategies
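For the performance bullet, throughput can be measured with a tiny tokens-per-second timer along these lines. This is an illustrative sketch rather than the tutorial's actual `benchmark_utils.py`; `fake_generate` is a stand-in for a real model call.

```python
import time

def tokens_per_second(generate, prompt, runs=3):
    """Average decode throughput over a few runs.

    `generate` must return a sized sequence of tokens; here it is a
    caller-supplied stand-in for a real model call.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

# Stub generator simulating ~50 tokens of output with a small delay.
def fake_generate(prompt):
    time.sleep(0.01)  # pretend decode latency
    return ["tok"] * 50

rate = tokens_per_second(fake_generate, "hello")  # a few thousand tok/s for this stub
```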

Tutorial Structure

03-llamacpp-model/
├── README.md                 # Setup guide and overview
├── llamacpp_demo.ipynb      # Interactive tutorial notebook
├── requirements.txt         # Python dependencies
└── utils/                   # Helper modules
    ├── audio_recorder.py    # Speech recognition interface
    ├── image_utils.py       # Image processing utilities
    ├── grammar_utils.py     # Grammar constraint demos
    └── benchmark_utils.py   # Performance testing tools

What Users Learn

  1. LlamaCppModel Setup - Configure and use the new strands.models.llamacpp.LlamaCppModel class
  2. Quantized Models - Download and run GGUF models (Qwen, Llama, Mistral, etc.)
  3. Grammar Constraints - Use GBNF grammars for controlled generation
  4. Multimodal Agents - Build agents that process audio, images, and text
  5. Performance Tuning - Optimize inference speed and memory usage
  6. Tool Integration - Add custom functions to local agents
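To make the grammar-constraints item concrete: GBNF grammars are plain strings of production rules in llama.cpp's grammar format, where `root` is the start symbol, literals are double-quoted, and `|` separates alternatives. The two grammars below are illustrative only; the rule names are arbitrary and neither is taken from the tutorial itself.

```python
# Constrain output to a bare yes/no answer.
YES_NO_GRAMMAR = r'''
root   ::= answer
answer ::= "yes" | "no"
'''

# Constrain output to a tiny JSON object such as {"ok": true}.
JSON_FLAG_GRAMMAR = r'''
root ::= "{" ws "\"ok\":" ws bool ws "}"
bool ::= "true" | "false"
ws   ::= " "?
'''
```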

Example Code Snippet

```python
from strands import Agent
from strands.models.llamacpp import LlamaCppModel

# Create local model instance
model = LlamaCppModel(
    base_url="http://localhost:8080",
    params={"temperature": 0.7, "max_tokens": 300}
)

# Use grammar constraints
model.use_grammar_constraint('root ::= "yes" | "no"')

# Create agent with local model
agent = Agent(model=model)
response = agent("Is Python a compiled language?")  # Returns: "no"
```

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

westonbrown commented Sep 15, 2025

Requesting review for the following tutorial. The base PR for this feature has been merged into the latest Strands release.

```python
clean_base_url = base_url.rstrip('/').replace('/v1', '')
model = LlamaCppModel(
    base_url=clean_base_url,
    params={**params, "max_tokens": 100}
)
```
Collaborator

When the model generates text and reaches the max_tokens limit of 100, I get a MaxTokensReachedException instead of the generated text. This appears to be an SDK issue. As a workaround, could you please increase max_tokens to 500? Let's also file this as a potential SDK bug.
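If it helps, the workaround can also be wrapped in a small retry that bumps max_tokens whenever the limit is hit. This is only a sketch: the exception class below is a local stand-in because I haven't confirmed the SDK's import path, and `fake_invoke` mimics a call that keeps hitting the limit.

```python
class MaxTokensReachedException(Exception):
    """Local stand-in for the SDK's exception (real import path unverified)."""

def call_with_retry(invoke, params, max_attempts=3):
    """Retry the call, doubling max_tokens each time the limit is hit."""
    tokens = params.get("max_tokens", 100)
    for _ in range(max_attempts):
        try:
            return invoke({**params, "max_tokens": tokens})
        except MaxTokensReachedException:
            tokens *= 2  # e.g. 100 -> 200 -> 400
    raise RuntimeError("still hitting the token limit after retries")

# Stub that fails until max_tokens reaches 400, mimicking a long generation.
def fake_invoke(p):
    if p["max_tokens"] < 400:
        raise MaxTokensReachedException()
    return "done"
```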

```python
    params={"temperature": temperature, "max_tokens": max_tokens}
)

model.use_grammar_constraint(grammar)
```
Collaborator

I'm getting the following error here: AttributeError: 'LlamaCppModel' object has no attribute 'use_grammar_constraint'. Could the llama.cpp library have updated or removed this method?
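In case the method really was removed: assuming the tutorial talks to llama.cpp's bundled HTTP server, its /completion endpoint accepts a `grammar` field per request, so the constraint could be applied server-side instead. The sketch below only builds the payload; whether LlamaCppModel forwards extra request params like this is an assumption worth verifying.

```python
import json

# Payload for llama.cpp's /completion endpoint; `grammar` constrains
# decoding server-side, so no SDK-level method is needed.
payload = {
    "prompt": "Is Python a compiled language? Answer yes or no.",
    "n_predict": 8,
    "grammar": 'root ::= "yes" | "no"',
}

# POST this to the local server, e.g. with requests:
#   requests.post("http://localhost:8080/completion", json=payload)
print(json.dumps(payload, indent=2))
```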

### 4. Run the Tutorial

```bash
jupyter notebook llamacpp_demo.ipynb
```
Collaborator
Wrong file name.


## Additional Examples

The `examples/` directory contains standalone Python scripts demonstrating specific features.
Collaborator
The `examples/` directory doesn't exist. Could you please remove this section from the README?
