Add prompt optimization telemetry (prompt cache hit rate tracking) to sidebar and also kevin's glm prompt #4

Closed
arihantchoudhary wants to merge 16 commits into dev from sidebar-prompt-caching-stats

Conversation

@arihantchoudhary
Contributor

@arihantchoudhary arihantchoudhary commented Dec 16, 2025

Summary

This PR adds real-time prompt cache hit rate monitoring to the session sidebar with visual trend tracking.

Features added:

  • Current cache hit rate percentage with token breakdown (e.g., "82.5% (1650/2000 tokens)")
  • Average hit rate calculation across all requests in the session
  • Visual sparkline graph showing cache efficiency trend over the last 20 requests
  • Automatic updates after each API response
  • Only displays when cache data is available

Implementation:

  • Uses existing cachedInputTokens field from assistant messages (line 1236 in openai-responses-language-model.ts)
  • Sparkline visualization using Unicode block characters (▁▂▃▄▅▆▇█)
  • Reactive Solid.js createMemo for efficient updates
  • No additional API calls required
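
As a rough illustration of the calculation described above, here is a minimal sketch. Only the `cachedInputTokens` field name comes from this PR; `inputTokens`, the `RequestUsage` type, and all function names are assumptions made for the example, and the average is assumed to be an unweighted mean of per-request rates.

```typescript
// Hypothetical usage shape: only cachedInputTokens is named in the PR description.
interface RequestUsage {
  inputTokens: number;
  cachedInputTokens?: number;
}

// Hit rate for one request as a percentage, or undefined when no cache data exists.
function cacheHitRate(usage: RequestUsage): number | undefined {
  if (usage.cachedInputTokens === undefined || usage.inputTokens <= 0) return undefined;
  return (usage.cachedInputTokens * 100) / usage.inputTokens;
}

// Unweighted mean of per-request hit rates, skipping requests without cache data.
function averageHitRate(history: RequestUsage[]): number | undefined {
  const rates = history
    .map((usage) => cacheHitRate(usage))
    .filter((rate): rate is number => rate !== undefined);
  if (rates.length === 0) return undefined;
  return rates.reduce((sum, rate) => sum + rate, 0) / rates.length;
}

// Formats the "Current" line in the style of the example output.
function formatCurrent(usage: RequestUsage): string | undefined {
  const rate = cacheHitRate(usage);
  if (rate === undefined) return undefined;
  return `Current: ${rate.toFixed(1)}% (${usage.cachedInputTokens}/${usage.inputTokens} tokens)`;
}
```

Returning `undefined` when no cache data is present lets the UI hide the whole section rather than show a misleading 0%.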

Example output:

Cache Hit Rate
Current: 82.5% (1650/2000 tokens)
Average: 75.3%
Trend: ▃▄▅▆▇▇█▇
← Last 8 requests

The sparkline allows users to visually track whether cache efficiency is improving or declining throughout their session, helping optimize prompt engineering for better cache utilization.
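
One way the sparkline could be rendered is sketched below. This is an assumption-laden example, not the code in this PR: the `sparkline` function name is hypothetical, and it maps each rate onto a fixed 0 to 100 scale rather than normalizing against the session's own minimum and maximum.

```typescript
// The eight Unicode block characters listed in the PR description, lowest to highest.
const BLOCKS = ["▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"];

// Maps hit rates (percentages) to block characters, keeping only the
// most recent `window` values (20 by default, per the PR description).
function sparkline(rates: number[], window = 20): string {
  if (window <= 0) return "";
  return rates
    .slice(-window)
    .map((rate) => {
      // Clamp so out-of-range input cannot index outside BLOCKS.
      const clamped = Math.min(100, Math.max(0, rate));
      const index = Math.min(BLOCKS.length - 1, Math.floor((clamped / 100) * BLOCKS.length));
      return BLOCKS[index];
    })
    .join("");
}

// sparkline([0, 50, 100]) returns "▁▅█"
```

A fixed scale keeps bars comparable across sessions; a min-max scale would exaggerate small fluctuations in a session whose hit rate is already stable.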

Test plan

  • Verify cache metrics display correctly in sidebar
  • Confirm sparkline graph updates after each request
  • Check that display is hidden when no cache data available
  • Test with various cache hit rates (0%, 50%, 100%)

🤖 Generated with Claude Code

arihantchoudhary and others added 16 commits December 15, 2025 11:48
Complete overhaul of README to document all changes made to the project:
- Added comprehensive changelog organized by date
- Documented all 10 commits with detailed explanations
- Included file-level changes and impact analysis
- Added summary statistics (237 files, 22,700+ additions)
- Documented Python SDK implementation (229 new files)
- Explained PKCE authentication implementation
- Detailed provider architecture refactoring
- Listed all technical improvements and their business impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Organized all of Kevin's commits into four strategic pillars:
- Idea Stack (6 commits): Architecture, infrastructure, CI/CD
- TUI Features (2 commits): UI/UX improvements, metrics display
- SDK Level Features (4 commits): Auth, retry logic, Python SDK
- Experiment and Reporting (1 commit): Usage metrics foundation

Each commit includes:
- Commit ID and date
- Files modified with line counts
- Detailed changes description
- Business/technical impact
- Clear categorization for future development focus

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Complete technical documentation of all changes needed to get cache
information from Cerebras API to sidebar display:

Architecture Overview:
- Problem: API format mismatch (Chat Completions vs Responses API)
- Solution: Support both formats with nullish coalescing
- Result: Cache data flows API → SDK → DB → UI

5 Changes Documented:
1. Schema update to accept both API formats (prompt_tokens & input_tokens)
2. Non-streaming parser to extract cache from both field names
3. Streaming parser to extract cache from both field names
4. Sidebar calculation logic for cache hit rate
5. Sidebar UI display with color-coded feedback

Includes:
- Before/after code comparisons for each change
- Complete data flow diagrams
- Testing procedures
- Architecture diagrams
- Related commits and next steps

Files modified: 2 files, 5 locations
Result: Zero breaking changes, backward compatible
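
The nullish-coalescing approach described in this commit could look roughly like the sketch below. `prompt_tokens` and `input_tokens` are the field names given above; the nested `*_tokens_details.cached_tokens` fields are an assumption modelled on OpenAI's public Chat Completions and Responses usage shapes, and `normalizeUsage` is a hypothetical name.

```typescript
// Usage payload that may arrive in either format (all fields optional).
interface RawUsage {
  // Chat Completions style
  prompt_tokens?: number | null;
  prompt_tokens_details?: { cached_tokens?: number | null } | null;
  // Responses API style
  input_tokens?: number | null;
  input_tokens_details?: { cached_tokens?: number | null } | null;
}

interface NormalizedUsage {
  inputTokens: number | undefined;
  cachedInputTokens: number | undefined;
}

// Prefers the Responses API fields and falls back to Chat Completions fields.
// `??` only falls through on null/undefined, so a legitimate 0 is preserved.
function normalizeUsage(usage: RawUsage): NormalizedUsage {
  return {
    inputTokens: usage.input_tokens ?? usage.prompt_tokens ?? undefined,
    cachedInputTokens:
      usage.input_tokens_details?.cached_tokens ??
      usage.prompt_tokens_details?.cached_tokens ??
      undefined,
  };
}
```

Using `??` instead of `||` matters here: a request with zero cached tokens should report a 0% hit rate, not fall through to the other format or be treated as missing data.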

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Added "Commits Categorized by Strategic Pillars" section to README
- Organized all 10 commits into 4 pillars:
  * Pillar 1: Idea Stack (6 commits)
  * Pillar 2: TUI Features (2 commits)
  * Pillar 3: SDK Level Features (4 commits)
  * Pillar 4: Experiment and Reporting (1 commit)
- Removed COMMIT_CATEGORIZATION.md (content now in README)

Result: Single source of truth in README with clearer organization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
1. Moved "Commits Categorized by Strategic Pillars" to top of README
   - Now appears right after Collaborators section
   - Added warning banner to update with each commit

2. Created README update enforcement system:
   - Added .github/README_UPDATE_RULE.md with complete guidelines
   - Updated .husky/pre-push hook to check if README was updated
   - Hook prompts user if README not modified

3. Updated README structure:
   - Removed duplicate pillars section at bottom
   - Added focus descriptions for each pillar
   - Added this commit to Pillar 1

Result: Self-enforcing documentation system that keeps README current

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added major cache optimizations section with detailed explanations of various improvements and changes to caching strategies.

Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
…v1.0.140

Co-authored-by: Isaac Tai <Isaac.Tai@cerebras.net>
…ttribution

- Create dedicated glm.txt prompt file for GLM 4.6 model
- Update system.ts to route GLM models to the new prompt
- Add Cerebras Code branding and cloud.cerebras.ai references
- Auto-include Isaac Tai as co-author on all GLM-generated commits
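
The routing change might be sketched as follows. Only `glm.txt` is named in this commit; `default.txt`, the `promptFileFor` function, and the substring match on the model ID are all assumptions made for illustration and may not reflect how `system.ts` actually selects a prompt.

```typescript
// Hypothetical fallback prompt file; only glm.txt appears in the commit message.
const DEFAULT_PROMPT = "default.txt";
const GLM_PROMPT = "glm.txt";

// Picks the prompt file for a model ID, routing any GLM model to the GLM prompt.
function promptFileFor(modelId: string): string {
  return modelId.toLowerCase().includes("glm") ? GLM_PROMPT : DEFAULT_PROMPT;
}
```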

🤖 Generated with Cerebras Code cloud.cerebras.ai

Co-Authored-By: Isaac Tai <Isaac.Tai@cerebras.net>
This commit adds real-time cache hit rate monitoring to the session sidebar,
displaying current and average cache efficiency with a sparkline visualization.

Features:
- Current cache hit rate percentage with token breakdown
- Average hit rate across all requests in session
- Visual sparkline graph showing trend over last 20 requests
- Automatic updates after each API response
- Only displays when cache data is available

The sparkline uses Unicode block characters (▁▂▃▄▅▆▇█) to show whether
cache efficiency is improving or declining throughout the session.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@arihantchoudhary arihantchoudhary changed the title Add prompt cache hit rate tracking to sidebar Add prompt optimization telemetry (prompt cache hit rate tracking) to sidebar and also kevin's glm prompt Dec 16, 2025
@kevint-cerebras
Owner

Can we remove some of the AI-generated markdown files?

@kevint-cerebras
Owner

Also, since my other PR for the system prompt is open and a WIP, can we handle the prompting integration there in that branch and not try to add it to this branch?

@kevint-cerebras kevint-cerebras force-pushed the dev branch 2 times, most recently from 5a6ce89 to 04459cc on December 18, 2025 19:07
kassieclaire added a commit to kassieclaire/cerebras-code-cli that referenced this pull request Feb 27, 2026
Phase 01: Exploration & Architecture Validation

Changes:
- 01-01: Added sidebar split pattern task, combined documentation, added ARCH verification
- 01-02: Split from old 01-02 (keyboard infrastructure only, 3 tasks)
- 01-03: NEW - keybind audit + documentation (split from old 01-02)
- 01-04: Renumbered from 01-03, fixed must_haves for store-agnostic design

Issues addressed:
- Blocker kevint-cerebras#1: Added sidebar split pattern task to 01-01 (Success Criteria kevint-cerebras#3)
- Blocker kevint-cerebras#2: Split 01-02 (5 tasks) into 01-02 (3) + 01-03 (2)
- Blocker kevint-cerebras#3: Fixed must_haves/key_links for store-agnostic prototype
- Warning kevint-cerebras#4: Fixed wave dependencies (01-02 wave 2, 01-03 wave 3)
- Warning kevint-cerebras#5: Added RESEARCH.md to <files> for exploration tasks
- Warning kevint-cerebras#6: Added test file to 01-04 files_modified
- Warning kevint-cerebras#8: Added ARCH constraint verification to 01-01 and 01-04

Ready for re-verification

2 participants