llamacpp-model-tutorial; #139
Conversation
Requesting review for the following tutorial. The base PR for this feature has been merged into the latest Strands release.
clean_base_url = base_url.rstrip('/').replace('/v1', '')
model = LlamaCppModel(
    base_url=clean_base_url,
    params={**params, "max_tokens": 100}
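The URL cleanup in the diff above can be exercised in isolation. A minimal sketch follows; the `normalize_base_url` name is mine for illustration, not part of the tutorial:

```python
def normalize_base_url(base_url: str) -> str:
    # Drop a trailing slash, then strip the "/v1" suffix so the
    # provider can append its own API paths. Note that replace()
    # removes "/v1" anywhere in the string, not only at the end.
    return base_url.rstrip('/').replace('/v1', '')

print(normalize_base_url("http://localhost:8080/v1/"))  # → http://localhost:8080
```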
When the model generates text and reaches the max_tokens limit of 100, I'm getting a MaxTokensReachedException instead of the generated text. This appears to be an issue with the SDK. As a workaround, could you please increase max_tokens to 500? Let's also report this as a potential SDK bug.
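The suggested workaround amounts to a dict-merge override, where the right-hand key wins. A minimal sketch, in which the 500 value is the reviewer's proposal and the merge pattern comes from the diff above:

```python
params = {"temperature": 0.7, "max_tokens": 100}
# The right-hand "max_tokens" overrides the one already in params,
# giving the model a larger completion budget so generation can
# finish before hitting the token limit.
merged_params = {**params, "max_tokens": 500}
print(merged_params["max_tokens"])  # → 500
```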
    params={"temperature": temperature, "max_tokens": max_tokens}
)

model.use_grammar_constraint(grammar)
I'm getting the following error here: `AttributeError: 'LlamaCppModel' object has no attribute 'use_grammar_constraint'`. Could it be that the llama.cpp library has updated or removed this method?
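If the helper has been removed or renamed, one possible fallback is routing the grammar through the request params. This assumes `LlamaCppModel` forwards extra params to the llama.cpp server, which is unverified here, though the server's completion endpoint does accept a GBNF `grammar` field:

```python
# Hypothetical sketch: pass the GBNF grammar via params instead of
# the missing use_grammar_constraint() method. Whether the SDK
# forwards this key to llama.cpp is an assumption, not a confirmed API.
gbnf = 'root ::= "yes" | "no"'
params = {"temperature": 0.0, "grammar": gbnf}
print("grammar" in params)  # → True
```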
### 4. Run the Tutorial

```bash
jupyter notebook llamacpp_demo.ipynb
```
wrong file name
## Additional Examples

The `examples/` directory contains standalone Python scripts demonstrating specific features.
The `examples/` directory doesn't exist. Could you please remove this section from the README?
Add LlamaCpp Model Provider Tutorial
Issue #, if available: Related to strands-agents/sdk-python#585
Summary
This PR adds the first comprehensive tutorial showcasing the new `LlamaCppModel` provider class (merged in #585), demonstrating how to run on-device quantized function-calling models with the Strands Agents SDK. This tutorial fills a gap in our documentation by showing developers how to deploy AI agents locally, using efficient quantized models that run on resource-constrained hardware.

Value to the Repository
This tutorial is essential because it:
Key Features Demonstrated
Tutorial Structure
What Users Learn
`strands.models.llamacpp.LlamaCppModel` class

Example Code Snippet
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.