
Conversation

ryanolson (Contributor) commented Jul 17, 2025

Summary

Adds comprehensive token counting and logprob analysis infrastructure to Dynamo for LLM streaming responses.

🔄 Updated: Merged the main branch (32 commits) while preserving all token counting features

Key Features

Token Counting Analysis

  • Stream analysis with per-choice token counting and validation
  • Error detection (tokens after finish, missing choices, incomplete streams)
  • Optional tokenizer integration for accurate counts
  • Token generation rate calculations

Logprob Analysis

  • Greedy vs non-greedy decoding pattern detection
  • Token selection confidence metrics using logprob distributions
  • Sensitivity analysis for close probability alternatives
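
As an illustration of how greedy detection and sensitivity can be interpreted (a sketch only, not the module's actual code; the representation of logprobs here is an assumption): a position is treated as greedy when the emitted token carries the highest logprob among the returned alternatives, and as sensitive when some alternative falls within a small logprob margin of it.

// Illustrative sketch only; the real analysis in perf/logprobs.rs uses its own types.
fn classify_position(chosen_logprob: f64, alternative_logprobs: &[f64], margin: f64) -> (bool, bool) {
    // Greedy: no alternative has a higher logprob than the emitted token.
    let greedy = alternative_logprobs.iter().all(|&lp| chosen_logprob >= lp);
    // Sensitive: at least one alternative lies within `margin` of the emitted token.
    let sensitive = alternative_logprobs
        .iter()
        .any(|&lp| (chosen_logprob - lp).abs() <= margin);
    (greedy, sensitive)
}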

GGUF Tokenizer Support

  • Native GGUF model tokenizer integration
  • Performance optimized with ahash
  • Supports Llama, GPT-2, and other GGUF models

Merge Highlights

Main Branch Integration (32 commits)

  • Integrated ModelExpress client, prefix matcher utilities, CUDA fatbin support
  • Enhanced backend health checks and monitoring
  • Upgraded to Rust 1.90 with clean clippy/fmt
  • Fixed blake3 API deprecation and utils module conflicts
  • All 323 tests passing

Usage

// Token analysis
let analysis = analyze_token_counting(recorded_stream, Some(&tokenizer), "request-id")?;
println!("Tokens: {}, Rate: {:.2}/sec", analysis.total_tokens, analysis.token_rate);

// Logprob analysis  
let analysis = analyze_logprobs(recorded_stream, 0.1)?;
println!("Greedy: {}, Confidence: {:.3}", analysis.greedy_count, analysis.average_confidence);

Files Changed

  • lib/llm/src/perf/tokens.rs - Core token counting analysis
  • lib/llm/src/perf/logprobs.rs - Logprob analysis utilities
  • lib/llm/src/gguf/ - GGUF tokenizer integration
  • lib/llm/src/utils/gzip.rs - Compressed model utilities
  • Test coverage with 323 passing tests

Ready for production use with full main branch compatibility.

copy-pr-bot (bot) commented Jul 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai (bot, Contributor) commented Jul 17, 2025

Walkthrough

The changes introduce new utilities for handling bzip2-compressed files, including extraction, validation, and automatic cleanup, along with a builder-based API. Token counting and analysis for streaming responses are implemented, supporting multiple data sources and validation. Dependency configurations are updated, and new modules for utilities and token operations are added. Tests and data files are also updated accordingly.

Changes

File(s) and change summary:

  • .gitattributes - Added Git LFS tracking for a specific bzip2-compressed tokenizer file.
  • Cargo.toml, lib/llm/Cargo.toml - Added bzip2 as a workspace dependency; updated the blake3 dependency to use the workspace version.
  • lib/llm/src/lib.rs, lib/llm/src/utils.rs - Added and exported a new utils module.
  • lib/llm/src/utils/bzip2.rs - New module: bzip2 extraction utilities with a builder pattern, validation, and automatic cleanup.
  • lib/llm/src/perf.rs - Added the tokens submodule; new utilities for recording and reading annotated data streams; new public re-exports.
  • lib/llm/src/perf/tokens.rs - New module: robust token counting and analysis for streaming chat responses with multiple data sources and validation.
  • lib/llm/src/perf/logprobs.rs - Updated tests to use the new annotated stream reading utility, simplifying stream analysis test logic.
  • lib/llm/src/tokenizers.rs - Added a TokenCounter trait for token counting; implemented it for the relevant types.
  • lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer.b3sum - Added a BLAKE3 checksum file for tokenizer integrity validation.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Code
    participant Bzip2ExtractorBuilder
    participant Bzip2Extractor
    participant Bzip2Extraction

    Test->>Bzip2ExtractorBuilder: new().source_path(...).target_filename(...).extract()
    Bzip2ExtractorBuilder->>Bzip2Extractor: build extractor
    Bzip2Extractor->>Bzip2Extractor: perform_extraction()
    Bzip2Extractor->>Bzip2Extraction: return handle to extracted file
    Test->>Bzip2Extraction: read_to_string()/validate_blake3_hash()
    Note right of Bzip2Extraction: Temp dir auto-deleted on drop
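A rough usage sketch corresponding to the flow above (method names are taken from the diagram; the module path, exact signatures, and error types are assumptions, and the file paths are illustrative):

// Sketch only; mirrors the Bzip2Extractor sequence diagram above.
use dynamo_llm::utils::bzip2::Bzip2ExtractorBuilder; // assumed module path

fn extract_and_verify(expected_b3: &str) -> anyhow::Result<()> {
    // Build the extractor and decompress into a temporary directory.
    let extraction = Bzip2ExtractorBuilder::new()
        .source_path("tests/data/replays/example/tokenizer.json.bz2") // illustrative path
        .target_filename("tokenizer.json")
        .extract()?;

    // Read the extracted file and verify its BLAKE3 checksum.
    let _contents = extraction.read_to_string()?;
    extraction.validate_blake3_hash(expected_b3)?;

    // The temporary directory (and extracted file) are removed when `extraction` is dropped.
    Ok(())
}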
sequenceDiagram
    participant Perf as perf::record_data_stream
    participant DataStream
    participant RecordedStream

    Perf->>DataStream: consume all items
    DataStream-->>Perf: yield items
    Perf->>Perf: record items with timestamp/sequence
    Perf->>RecordedStream: return completed recording
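In code, the recording step above might look roughly like this (a sketch; the crate path, generic bounds, and return type are assumptions, and the in-memory stream is a stand-in for live engine output):

// Sketch only: consume a data stream and record each item with a timestamp and sequence number.
use futures::stream;

async fn record_example() {
    // A toy in-memory stream standing in for live engine output.
    let data_stream = stream::iter(vec!["a", "b", "c"]);
    // Consume the stream and capture the completed recording.
    let recorded = dynamo_llm::perf::record_data_stream(data_stream).await; // assumed path/signature
    println!("responses recorded: {}", recorded.response_count());
}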
sequenceDiagram
    participant Analyzer as analyze_token_counting
    participant RecordedStream
    participant TokenExtractor
    participant Tokenizer

    Analyzer->>RecordedStream: iterate responses
    loop For each response
        Analyzer->>TokenExtractor: extract_tokens_by_choice(tokenizer)
        TokenExtractor->>Tokenizer: (if needed) count tokens
        Tokenizer-->>TokenExtractor: token count/IDs
        TokenExtractor-->>Analyzer: token data per choice
    end
    Analyzer->>Analyzer: aggregate, validate, summarize
    Analyzer-->>Test: TokenAnalysis result

Poem

In the warren of code, a bzip2 breeze,
Unpacking files with elegant ease.
Tokens are counted, streams are replayed,
With hashes and checks, integrity’s made.
The rabbit hops on, with modules anew,
Tidying temp files—no chores left to do!
🐇✨



coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/llm/src/tokenizers.rs (1)

150-161: Consider removing redundant trait implementation.

This explicit implementation of TokenCounter for Tokenizer appears redundant since:

  1. Tokenizer derefs to Arc<dyn traits::Tokenizer>
  2. traits::Tokenizer requires Encoder
  3. The blanket implementation already provides TokenCounter for all Encoder types

Unless there's a specific reason for this explicit implementation (e.g., avoiding deref complications), consider removing it to reduce code duplication.

lib/llm/src/perf.rs (1)

392-404: Clean implementation with a minor consideration.

The function correctly handles file reading, SSE parsing, and stream conversion. Note that read_to_string loads the entire file into memory, which could be an issue for very large SSE files. For typical SSE replay data this should be fine, but consider documenting this limitation or using streaming file reading for very large files.
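
If very large replay files ever become a concern, one possible alternative is a buffered, line-by-line read using only the standard library (a sketch of the general idea, not a change to the module):

use std::fs::File;
use std::io::{BufRead, BufReader};

// Process an SSE replay file one line at a time instead of reading it all into memory.
fn for_each_line(path: &str, mut handle_line: impl FnMut(&str)) -> std::io::Result<()> {
    let reader = BufReader::new(File::open(path)?);
    for line in reader.lines() {
        handle_line(&line?);
    }
    Ok(())
}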

lib/llm/src/utils/bzip2.rs (1)

476-476: Consider more robust test file path construction.

The hardcoded relative path assumes tests run from the repository root. Consider using env!("CARGO_MANIFEST_DIR") or similar to construct a more robust path:

-let tokenizer_file_path = "dynamo/lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2";
+let tokenizer_file_path = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+    .join("tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2");
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49b7a0d and 19a6abd.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
  • lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2 is excluded by !**/*.bz2
📒 Files selected for processing (11)
  • .gitattributes (1 hunks)
  • Cargo.toml (1 hunks)
  • lib/llm/Cargo.toml (2 hunks)
  • lib/llm/src/lib.rs (1 hunks)
  • lib/llm/src/perf.rs (3 hunks)
  • lib/llm/src/perf/logprobs.rs (2 hunks)
  • lib/llm/src/perf/tokens.rs (1 hunks)
  • lib/llm/src/tokenizers.rs (2 hunks)
  • lib/llm/src/utils.rs (1 hunks)
  • lib/llm/src/utils/bzip2.rs (1 hunks)
  • lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer.b3sum (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
Cargo.toml (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/Cargo.toml (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/utils.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/perf/logprobs.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/perf.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/llm/src/perf/tokens.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
🧬 Code Graph Analysis (4)
lib/llm/src/perf/logprobs.rs (1)
lib/llm/src/perf.rs (2)
  • read_annotated_stream_from_file (392-404)
  • record_stream_with_context (292-301)
lib/llm/src/tokenizers.rs (1)
lib/llm/src/perf/tokens.rs (2)
  • count_tokens (494-497)
  • count_tokens_with_ids (499-507)
lib/llm/src/perf.rs (3)
lib/llm/src/protocols/codec.rs (1)
  • create_message_stream (296-302)
lib/llm/src/protocols.rs (1)
  • convert_sse_stream (51-68)
lib/llm/src/perf/logprobs.rs (3)
  • new (54-101)
  • new (1575-1579)
  • read_annotated_stream_from_file (1453-1453)
lib/llm/src/perf/tokens.rs (3)
lib/llm/src/tokenizers.rs (17)
  • tokenizer (398-400)
  • new (245-254)
  • new (326-333)
  • new (553-561)
  • token_ids (58-63)
  • token_ids (402-404)
  • len (339-341)
  • len (531-533)
  • count_tokens (92-92)
  • count_tokens (100-102)
  • count_tokens (151-153)
  • text (406-411)
  • count_tokens_with_ids (95-95)
  • count_tokens_with_ids (104-109)
  • count_tokens_with_ids (155-160)
  • builder (472-474)
  • from_file (126-128)
lib/llm/src/perf.rs (9)
  • new (51-57)
  • new (105-115)
  • new (578-582)
  • read_annotated_stream_from_file (392-404)
  • record_data_stream (371-389)
  • response_count (118-120)
  • data (60-62)
  • responses (128-130)
  • start_time (133-135)
lib/llm/src/utils/bzip2.rs (1)
  • builder (124-126)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (30)
lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer.b3sum (1)

1-1: LGTM - Checksum file is correctly formatted

The BLAKE3 checksum follows the standard format and will enable integrity verification of the extracted tokenizer.json file from the bzip2 archive.

lib/llm/src/lib.rs (1)

36-36: LGTM - Utils module addition is clean

The new utils module declaration follows the established pattern and appropriately exposes the bzip2 decompression functionality for the crate.

Cargo.toml (1)

53-53: Confirm bzip2 0.6.0 is up-to-date and secure

  • bzip2 = { version = "0.6" } points to 0.6.0 (released June 17, 2025), which is the latest stable version.
  • Transitions to a pure Rust implementation using libbz2-rs-sys by default.
  • Addresses CVE-2023-22895 (DoS via integer overflow in earlier releases); no known security issues in 0.6.0.

No further changes required—this dependency can be approved as-is.

lib/llm/Cargo.toml (2)

54-54: LGTM - Proper workspace dependency usage

The bzip2 workspace dependency addition is correctly configured.


81-81: Good dependency standardization

Converting blake3 to use workspace dependency improves consistency with the overall dependency management approach.

.gitattributes (1)

13-13: LGTM - Proper Git LFS configuration

The Git LFS configuration for the bzip2 tokenizer file is correctly set up with appropriate attributes for handling large binary files.

lib/llm/src/utils.rs (1)

1-5: LGTM!

Clean module structure for utilities. The dedicated utils module with the bzip2 submodule provides good organization for utility functions.

lib/llm/src/perf/logprobs.rs (2)

571-574: Good refactoring to use higher-level utilities.

The import changes align well with the new read_annotated_stream_from_file utility, simplifying the test setup by removing the need for manual SSE stream creation and conversion.


453-459: Excellent simplification of the test setup.

The refactoring effectively replaces the manual SSE parsing and conversion steps with the new read_annotated_stream_from_file utility, making the test more concise and maintainable while preserving the same functionality.

lib/llm/src/tokenizers.rs (2)

89-96: Well-designed trait for token counting.

The TokenCounter trait provides a clean interface for token counting with appropriate thread safety bounds and error handling. The separation between count_tokens (just count) and count_tokens_with_ids (IDs + count) offers good flexibility.
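
For reference, the trait shape being described might look roughly like this (a sketch reconstructed from the signatures quoted in this review; the exact bounds, error type, and TokenIdType definition are assumptions):

use anyhow::Result;

type TokenIdType = u32; // stand-in for the crate's token id type

// Sketch of the TokenCounter trait shape described above.
pub trait TokenCounter: Send + Sync {
    /// Count the tokens produced by encoding `input`.
    fn count_tokens(&self, input: &str) -> Result<usize>;
    /// Return the encoded token IDs together with their count.
    fn count_tokens_with_ids(&self, input: &str) -> Result<(Vec<TokenIdType>, usize)>;
}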


98-110: Good blanket implementation.

The blanket implementation correctly leverages the existing Encoder trait. It encodes the input once, converts the token IDs to a Vec, and returns them alongside their count:

 fn count_tokens_with_ids(&self, input: &str) -> Result<(Vec<TokenIdType>, usize)> {
     let encoding = self.encode(input)?;
     let token_ids = encoding.token_ids().to_vec();
     let count = token_ids.len();
     Ok((token_ids, count))
 }

The implementation is clear as written, and any micro-optimization here would have a negligible effect.

lib/llm/src/perf.rs (3)

10-16: LGTM!

The new imports appropriately support the token analysis module and annotated stream functionality.


29-30: Good API design with descriptive aliases.

The re-exports provide cleaner, more descriptive names for the SSE parsing utilities, making the public API more intuitive.


351-389: Well-implemented stream recording function.

The record_data_stream function provides a clean, focused API for consuming and recording entire streams. Good documentation and proper async handling.

lib/llm/src/utils/bzip2.rs (5)

1-36: Excellent module documentation and organization.

The module documentation clearly explains the purpose, use cases, and provides helpful examples. Good choice of dependencies with anyhow for error handling and tempfile for RAII cleanup.


73-120: Well-implemented builder pattern.

The builder provides a clean, fluent API with proper validation. Good error handling for missing required fields and clear documentation of the extract() method.


132-188: Robust extraction implementation.

The extraction logic handles edge cases well:

  • Pre-validation of source file existence
  • Smart filename derivation with .bz2 stripping
  • Comprehensive error contexts for debugging
  • Explicit flush() to ensure data integrity

190-269: Excellent API design with RAII and validation.

The Bzip2Extraction struct provides a clean API with:

  • Automatic cleanup via RAII pattern
  • Convenient read methods with proper error contexts
  • BLAKE3 hash validation for integrity verification
  • Good encapsulation with private _temp_dir

271-514: Comprehensive test coverage.

Excellent test suite covering:

  • All major functionality paths
  • Error conditions and edge cases
  • RAII cleanup verification
  • Complex filename patterns
  • Integration test with graceful skip if test file is missing

The tests provide confidence in the implementation's correctness and robustness.

lib/llm/src/perf/tokens.rs (11)

1-46: Excellent module documentation!

The documentation clearly explains the module's purpose, provides comprehensive usage examples, and covers both basic and advanced scenarios. The examples are well-structured and demonstrate different token counting methods.


47-59: Good use of type aliases for semantic clarity.

The type aliases TokenCount and ForwardPassDuration improve code readability and make the intent clear.


60-132: Well-designed enum for representing token data sources.

The TokenDataSource enum effectively captures different token counting methods with varying accuracy levels. The implementation methods are clean and the documentation clearly explains the trade-offs of each approach.


133-151: Well-structured data type for choice-specific token information.

The ChoiceTokenData struct comprehensively captures all relevant token information for a single choice, including debugging aids like content storage.


152-220: Robust implementation of TokenExtractor with good error handling.

The implementation correctly handles tokenizer failures by falling back to single-token approximation. The empty content handling is appropriate.

Minor observation: There's a slight inconsistency in token_ids when content is empty - it's Some(Vec::new()) with tokenizer but None without. This might be intentional to distinguish between "no tokens found by tokenizer" vs "no tokenizer available", but consider documenting this distinction if it's important.


221-256: Well-designed structs for token timeline tracking and analysis.

Both TokenEvent and ChoiceTokenAnalysis effectively capture the temporal aspects of token generation and provide comprehensive analysis data per choice.


257-365: Excellent implementation of comprehensive token analysis with robust validation.

The analyze_token_counting function implements thorough validation including:

  • Detection of tokens generated after a choice has finished
  • Tracking of incomplete streams
  • Selection of the most accurate data source
  • Comprehensive timeline tracking

The validation logic is particularly well-designed and will help catch streaming protocol violations.
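
As an illustration of the first check (a sketch with made-up event types, not the module's actual representation): once a choice has reported a finish reason, any later response that still carries tokens for that choice is flagged.

use std::collections::HashSet;

// Illustrative event: one response's contribution to a single choice.
struct ChoiceEvent {
    choice_index: usize,
    token_count: usize,
    finished: bool,
}

// Collect choice indices (one per offending response) that received tokens after finishing.
fn choices_with_tokens_after_finish(events: &[ChoiceEvent]) -> Vec<usize> {
    let mut finished: HashSet<usize> = HashSet::new();
    let mut violations = Vec::new();
    for event in events {
        if finished.contains(&event.choice_index) && event.token_count > 0 {
            violations.push(event.choice_index);
        }
        if event.finished {
            finished.insert(event.choice_index);
        }
    }
    violations
}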


366-478: Comprehensive and user-friendly analysis API.

The TokenAnalysis implementation provides excellent utility methods:

  • print_summary with clear visual indicators
  • token_rate_for_choice correctly excluding empty responses from rate calculation
  • validate_data_source_consistency for detecting potential issues

The API is well-designed for both programmatic access and human-readable output.


479-509: Well-structured test infrastructure with good helper functions.

The test module provides:

  • A simple but effective MockTokenizer implementation
  • Reusable helper functions that reduce test code duplication
  • Clear test data creation patterns

Also applies to: 1114-1208


510-974: Excellent test coverage with comprehensive edge case handling.

The test suite thoroughly covers:

  • All data source types
  • Multiple choice scenarios with varying timelines
  • Validation error detection
  • Edge cases like empty content and missing choices
  • Complex scenarios like choices finishing at different times

The tests are well-structured and clearly document the expected behavior.


975-1112: Excellent integration test demonstrating real-world usage.

This integration test effectively:

  • Tests the full pipeline with real tokenizer and stream data
  • Validates file integrity with BLAKE3 hash verification
  • Demonstrates proper usage of the bzip2 extraction utilities
  • Includes comprehensive assertions on the analysis results

The hard-coded expectation of 32 tokens is appropriate for this specific test case with known input data.

ryanolson requested a review from grahamking on July 17, 2025 20:08
ryanolson enabled auto-merge (squash) on July 19, 2025 08:08
ryanolson added the backlog label (Issues or bugs that will be tracked for long term fixes) on Aug 15, 2025
github-actions (bot) commented:

This PR is stale because it has been open 30 days with no activity. Remove the Stale label or add a comment, or it will be closed in 5 days.

github-actions bot added the Stale label Sep 17, 2025
ryanolson (Contributor, Author) commented:

WIP

github-actions bot removed the Stale label Sep 18, 2025
Resolved merge conflicts while preserving token counting improvements:
- Updated dependencies to latest versions from main
- Kept gzip compression utilities and token counting features
- Resolved import conflicts in logprobs.rs and tokens.rs
- Updated workflow files with latest CI improvements
- Removed deprecated tools.rs file as deleted in main
- Regenerated Cargo.lock files with updated dependencies
- Fixed API changes in blake3 and response structures

Token counting analysis features preserved:
- Token counting infrastructure with multiple data sources
- Performance recording framework for streaming responses
- GzipExtractor utility for compressed file handling
- Mock tokenizer tests replacing deepseek dependency

All tests pass successfully after merge.

Signed-off-by: Ryan Olson <[email protected]>
ryanolson (Contributor, Author) commented:

/ok to test

copy-pr-bot (bot) commented Sep 20, 2025

/ok to test

@ryanolson, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

Merged main branch resolving conflicts and preserving token counting features.

**Conflict Resolutions:**
- .gitattributes: Added *.fatbin binary handling
- .gitignore: Preserved SDK binaries ignore entry
- Cargo.toml: Integrated ModelExpress dependencies, used main's rand 0.9.0
- lib/llm/Cargo.toml: Kept flate2/ahash deps, accepted ModelExpress integration
- Cargo.lock: Regenerated for dependency resolution

**Code Fixes:**
- Fixed blake3 API: Replaced deprecated mmap methods with std::fs::read
- Fixed response struct: Removed .inner field access in TokenExtractor
- Fixed utils module conflict: Merged gzip and prefix_matcher modules
- Fixed test imports: Updated to dynamo_async_openai
- Fixed clippy warnings: Collapsed nested if, removed unnecessary to_string()
- Added missing reasoning_content field to ChatCompletionStreamResponseDelta

All tests passing (323), clippy clean, formatted with Rust 1.90.
ryanolson disabled auto-merge September 26, 2025 08:48

Labels

backlog (Issues or bugs that will be tracked for long term fixes), feat, size/XXL
