westonbrown
Add LlamaCpp Model Provider Tutorial

Issue #, if available: Related to strands-agents/sdk-python#585

Summary

This PR adds the first comprehensive tutorial for the new LlamaCppModel provider class (merged in #585), demonstrating how to run quantized function-calling models on-device with the Strands Agents SDK. It fills a gap in our documentation by showing developers how to deploy AI agents locally, using efficient quantized models that run on resource-constrained hardware.

Value to the Repository

This tutorial is essential because it:

  1. Demonstrates the LlamaCppModel class - First official tutorial for the newly added model provider
  2. Enables offline AI - Shows how to build agents that work without internet connectivity

Key Features Demonstrated

  • Local Model Deployment: Run quantized GGUF models (4-bit, 8-bit) locally
  • Multimodal Processing: Audio transcription/translation and image analysis
  • Grammar Constraints: GBNF grammar for guaranteed output formats
  • Performance Optimization: Benchmarking and optimization strategies
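For the performance bullet, throughput can be measured with a tiny tokens-per-second timer along these lines. This is an illustrative sketch rather than the tutorial's actual `benchmark_utils.py`; `fake_generate` is a stand-in for a real model call.

```python
import time

def tokens_per_second(generate, prompt, runs=3):
    """Average decode throughput over a few runs.

    `generate` must return a sized sequence of tokens; here it is a
    caller-supplied stand-in for a real model call.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

# Stub generator simulating ~50 tokens of output with a small delay.
def fake_generate(prompt):
    time.sleep(0.01)  # pretend decode latency
    return ["tok"] * 50

rate = tokens_per_second(fake_generate, "hello")  # a few thousand tok/s for this stub
```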

Tutorial Structure

03-llamacpp-model/
├── README.md                 # Setup guide and overview
├── llamacpp_demo.ipynb      # Interactive tutorial notebook
├── requirements.txt         # Python dependencies
└── utils/                   # Helper modules
    ├── audio_recorder.py    # Speech recognition interface
    ├── image_utils.py       # Image processing utilities
    ├── grammar_utils.py     # Grammar constraint demos
    └── benchmark_utils.py   # Performance testing tools

What Users Learn

  1. LlamaCppModel Setup - Configure and use the new strands.models.llamacpp.LlamaCppModel class
  2. Quantized Models - Download and run GGUF models (Qwen, Llama, Mistral, etc.)
  3. Grammar Constraints - Use GBNF grammars for controlled generation
  4. Multimodal Agents - Build agents that process audio, images, and text
  5. Performance Tuning - Optimize inference speed and memory usage
  6. Tool Integration - Add custom functions to local agents
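To make the grammar-constraints item concrete: GBNF grammars are plain strings of production rules in llama.cpp's grammar format, where `root` is the start symbol, literals are double-quoted, and `|` separates alternatives. The two grammars below are illustrative only; the rule names are arbitrary and neither is taken from the tutorial itself.

```python
# Constrain output to a bare yes/no answer.
YES_NO_GRAMMAR = r'''
root   ::= answer
answer ::= "yes" | "no"
'''

# Constrain output to a tiny JSON object such as {"ok": true}.
JSON_FLAG_GRAMMAR = r'''
root ::= "{" ws "\"ok\":" ws bool ws "}"
bool ::= "true" | "false"
ws   ::= " "?
'''
```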

Example Code Snippet

```python
from strands import Agent
from strands.models.llamacpp import LlamaCppModel

# Create local model instance
model = LlamaCppModel(
    base_url="http://localhost:8080",
    params={"temperature": 0.7, "max_tokens": 300}
)

# Use grammar constraints
model.use_grammar_constraint('root ::= "yes" | "no"')

# Create agent with local model
agent = Agent(model=model)
response = agent("Is Python a compiled language?")  # Returns: "no"
```

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

westonbrown commented Sep 15, 2025

Requesting review for the following tutorial. The base PR for this feature has been merged into the latest Strands release.

```python
clean_base_url = base_url.rstrip('/').replace('/v1', '')
model = LlamaCppModel(
    base_url=clean_base_url,
    params={**params, "max_tokens": 100}
)
```
Collaborator

When the model generates text and reaches the max_tokens limit of 100, I get a MaxTokensReachedException instead of the generated text. This appears to be an SDK issue. As a workaround, could you please increase max_tokens to 500? Let's also file this as a potential SDK bug.
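If it helps, the workaround can also be wrapped in a small retry that bumps max_tokens whenever the limit is hit. This is only a sketch: the exception class below is a local stand-in because I haven't confirmed the SDK's import path, and `fake_invoke` mimics a call that keeps hitting the limit.

```python
class MaxTokensReachedException(Exception):
    """Local stand-in for the SDK's exception (real import path unverified)."""

def call_with_retry(invoke, params, max_attempts=3):
    """Retry the call, doubling max_tokens each time the limit is hit."""
    tokens = params.get("max_tokens", 100)
    for _ in range(max_attempts):
        try:
            return invoke({**params, "max_tokens": tokens})
        except MaxTokensReachedException:
            tokens *= 2  # e.g. 100 -> 200 -> 400
    raise RuntimeError("still hitting the token limit after retries")

# Stub that fails until max_tokens reaches 400, mimicking a long generation.
def fake_invoke(p):
    if p["max_tokens"] < 400:
        raise MaxTokensReachedException()
    return "done"
```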

```python
    params={"temperature": temperature, "max_tokens": max_tokens}
)

model.use_grammar_constraint(grammar)
```
Collaborator

I'm getting the following error here: AttributeError: 'LlamaCppModel' object has no attribute 'use_grammar_constraint'. Could the llama.cpp library have updated or removed this method?
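In case the method really was removed: assuming the tutorial talks to llama.cpp's bundled HTTP server, its /completion endpoint accepts a `grammar` field per request, so the constraint could be applied server-side instead. The sketch below only builds the payload; whether LlamaCppModel forwards extra request params like this is an assumption worth verifying.

```python
import json

# Payload for llama.cpp's /completion endpoint; `grammar` constrains
# decoding server-side, so no SDK-level method is needed.
payload = {
    "prompt": "Is Python a compiled language? Answer yes or no.",
    "n_predict": 8,
    "grammar": 'root ::= "yes" | "no"',
}

# POST this to the local server, e.g. with requests:
#   requests.post("http://localhost:8080/completion", json=payload)
print(json.dumps(payload, indent=2))
```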

### 4. Run the Tutorial

```bash
jupyter notebook llamacpp_demo.ipynb
```
Collaborator
Wrong file name.


## Additional Examples

The `examples/` directory contains standalone Python scripts demonstrating specific features.
Collaborator
The `examples/` directory doesn't exist. Could you please remove this section from the README?
