Skip to content

Enable bedrock prompt cache#6463

Closed
fbalicchia wants to merge 103 commits into
aaif-goose:mainfrom
fbalicchia:enable-bedrock-prompt-cache
Closed

Enable bedrock prompt cache#6463
fbalicchia wants to merge 103 commits into
aaif-goose:mainfrom
fbalicchia:enable-bedrock-prompt-cache

Conversation

@fbalicchia
Copy link
Copy Markdown
Contributor

Summary

Implemented prompt caching for Anthropic Claude models on AWS Bedrock to reduce costs
Introduced an intelligent cache point placement strategy that complies with AWS Bedrock’s four cache point limitation.
Added the BEDROCK_ENABLE_CACHING configuration parameter

Testing

  • Verify caching is enabled for Claude models by default
  • Test that cache points are correctly placed in system prompts
  • Test that cache points are correctly placed in user/assistant messages
  • Verify cache point limit (4 max) is respected

After making a bit of a mess with the branches, I reopened this PR starting from this one.
6151

@fbalicchia fbalicchia requested a review from a team as a code owner January 13, 2026 10:31
@fbalicchia
Copy link
Copy Markdown
Contributor Author

Hi @DOsinga, just a gentle reminder about this PR.

The initial review comments were addressed a few weeks ago, so whenever you have time, I’d appreciate another look.
Thanks!,

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements prompt caching for Anthropic Claude models running on AWS Bedrock to reduce costs and improve latency. The feature is controlled by a new BEDROCK_ENABLE_CACHING configuration parameter that defaults to false (opt-in).

Changes:

  • Adds BEDROCK_ENABLE_CACHING configuration parameter (defaults to false) to enable prompt caching for Claude models
  • Implements intelligent cache point placement for system prompts and messages with a 4 cache point limit
  • Adds comprehensive test coverage for cache point allocation logic and message conversion

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
documentation/docs/getting-started/providers.md Documents the new BEDROCK_ENABLE_CACHING parameter in the providers table
crates/goose/src/providers/formats/bedrock.rs Adds to_bedrock_message_with_caching function to insert cache points in messages and handles CachePoint blocks in responses
crates/goose/src/providers/bedrock.rs Implements cache point allocation strategy, adds configuration check, and includes extensive test coverage

Comment thread documentation/docs/getting-started/providers.md Outdated
Comment thread crates/goose/src/providers/bedrock.rs Outdated
Comment thread crates/goose/src/providers/formats/bedrock.rs Outdated
Comment thread crates/goose/src/providers/bedrock.rs
@blackgirlbytes
Copy link
Copy Markdown
Collaborator

/goose review this PR please

@github-actions
Copy link
Copy Markdown
Contributor

Summary: This PR adds prompt caching support for Anthropic Claude models on AWS Bedrock via a new BEDROCK_ENABLE_CACHING config parameter. The implementation places cache points on the system prompt and recent messages. However, there's a significant bug where the code claims to cache tools but doesn't, leading to suboptimal cache point allocation.

🔴 Blocking Issues

  1. Tools cache point claim is incorrect (crates/goose/src/providers/bedrock.rs:176-183)

    The comment claims tools get a cache point, but to_bedrock_tool_config() has no cache point code:

    // Determine which messages should have cache points
    // AWS Bedrock allows max 4 cache points. Strategy:
    // - 1 system prompt (always cached if caching enabled)
    // - 1 tools (if tools exist)  // <-- THIS IS FALSE
    // - Remaining budget for messages (2-3 depending on tools)
    let message_cache_budget = if !tools.is_empty() { 2 } else { 3 };

    Looking at to_bedrock_tool_config() in formats/bedrock.rs:

    pub fn to_bedrock_tool_config(tools: &[Tool]) -> Result<bedrock::ToolConfiguration> {
        Ok(bedrock::ToolConfiguration::builder()
            .set_tools(Some(
                tools.iter().map(to_bedrock_tool).collect::<Result<_>>()?,
            ))
            .build()?)
    }

    No cache point is added to tools. This means when tools are present, you're only using 3 cache points (1 system + 2 messages) instead of the possible 4. Either:

    • Add cache point support to to_bedrock_tool_config(), OR
    • Fix the comment and change message_cache_budget to 3 regardless of tools

🟡 Warnings

  1. Tests only verify disabled caching behavior (crates/goose/src/providers/bedrock.rs:354-501)

    All the new tests check behavior when caching is disabled (the default). For example:

    #[test]
    fn test_cache_point_allocation_without_tools() {
        // ...
        // Since caching is disabled by default, no cache points should be allocated
        assert_eq!(cache_point_indices, Vec::<usize>::new());
    }

    These tests duplicate the cache point calculation logic instead of testing the actual converse() method with caching enabled. Consider adding integration-style tests that verify cache points are correctly placed when BEDROCK_ENABLE_CACHING=true.

  2. CachePoint handling in from_bedrock_content_block will fail if Bedrock echoes cache points (crates/goose/src/providers/formats/bedrock.rs:347-352)

    bedrock::ContentBlock::CachePoint(_) => {
        // CachePoint is metadata used by AWS for prompt caching optimization.
        // It doesn't represent actual content, so we skip it when converting back.
        // This is safe because cache points are only used for request optimization,
        // not for conveying information to the user.
        bail!("CachePoint blocks should be filtered out during message processing")
    }

    The comment says "skip it" and "should be filtered out" but the code bail!s instead. If Bedrock ever echoes CachePoint blocks in responses, from_bedrock_message would fail because it uses .collect::<Result<Vec<_>>>()?. Consider filtering with .filter_map() instead of collecting into a Result.

🟢 Suggestions

  1. Consider aligning caching strategy with Anthropic provider

    The Anthropic provider in formats/anthropic.rs caches:

    • System prompt (always)
    • Last tool spec (always, if tools present)
    • Last 2 user messages

    This Bedrock implementation caches the last N messages regardless of role. Both approaches are valid, but the difference may lead to different cost/performance characteristics. Just something to be aware of.

✅ Highlights

  • Good use of config-based opt-in with BEDROCK_ENABLE_CACHING defaulting to false
  • Appropriate model check to only enable for Claude models (model_name.contains("anthropic.claude"))
  • Documentation updated for the new parameter
  • The to_bedrock_message_with_caching function cleanly separates caching logic

Review generated by goose

fbalicchia and others added 22 commits January 23, 2026 10:11
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
…6352)

Signed-off-by: rabi <ramishra@redhat.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: Trae Robrock <trobrock@gmail.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
…goose#6370)

Signed-off-by: rabi <ramishra@redhat.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
…aaif-goose#6235)

Co-authored-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: David Katz <dkatz@squareup.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
jamadeo and others added 22 commits January 23, 2026 10:12
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: Valerii Kot <valerii.kot@rimthan.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
…aif-goose#6624)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: Zane Staggs <zane@squareup.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Michael Neale <michael.neale@gmail.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
…-goose#6623)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: Bodhi Silberling <bodhirussellsilberling@yahoo.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zane Staggs <zane@squareup.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: sheikhlimon <sheikhlimon404@gmail.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
@fbalicchia fbalicchia force-pushed the enable-bedrock-prompt-cache branch from 928ecba to 26b8b07 Compare January 23, 2026 09:12
@fbalicchia
Copy link
Copy Markdown
Contributor Author

/goose review this PR please

@jh-block
Copy link
Copy Markdown
Collaborator

Hey @fbalicchia thanks for the PR! There seem to be a lot of unrelated changes in here, did maybe a merge or rebase go wrong?

@fbalicchia
Copy link
Copy Markdown
Contributor Author

fbalicchia commented Jan 25, 2026

@jh-block, @blackgirlbytes You’re right — the PR ended up including some unrelated changes. The branch was quite old and during the rebase/merge things got mixed in.
I’m happy to clean this up (e.g. split it into smaller PRs or open a fresh one with only the relevant changes). Let me know what you’d prefer 👍

@jh-block
Copy link
Copy Markdown
Collaborator

I think a fresh PR with only the relevant changes would be fine -- thank you!

@fbalicchia
Copy link
Copy Markdown
Contributor Author

Closed in favor of 6710

@fbalicchia fbalicchia closed this Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.