feat: perf.rs - token counting analysis #1985
base: main
Conversation
Resolved conflicts between performance recording module implementation and empty file, keeping the full implementation. Also resolved conflicts in HTTP service tests to use proper API base configuration and service readiness checking. -Agent Generated Commit Message
Walkthrough

The changes introduce new utilities for handling bzip2-compressed files, including extraction, validation, and automatic cleanup, along with a builder-based API. Token counting and analysis for streaming responses are implemented, supporting multiple data sources and validation. Dependency configurations are updated, and new modules for utilities and token operations are added. Tests and data files are also updated accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Code
    participant Bzip2ExtractorBuilder
    participant Bzip2Extractor
    participant Bzip2Extraction
    Test->>Bzip2ExtractorBuilder: new().source_path(...).target_filename(...).extract()
    Bzip2ExtractorBuilder->>Bzip2Extractor: build extractor
    Bzip2Extractor->>Bzip2Extractor: perform_extraction()
    Bzip2Extractor->>Bzip2Extraction: return handle to extracted file
    Test->>Bzip2Extraction: read_to_string()/validate_blake3_hash()
    Note right of Bzip2Extraction: Temp dir auto-deleted on drop
```

```mermaid
sequenceDiagram
    participant Perf as perf::record_data_stream
    participant DataStream
    participant RecordedStream
    Perf->>DataStream: consume all items
    DataStream-->>Perf: yield items
    Perf->>Perf: record items with timestamp/sequence
    Perf->>RecordedStream: return completed recording
```

```mermaid
sequenceDiagram
    participant Analyzer as analyze_token_counting
    participant RecordedStream
    participant TokenExtractor
    participant Tokenizer
    Analyzer->>RecordedStream: iterate responses
    loop For each response
        Analyzer->>TokenExtractor: extract_tokens_by_choice(tokenizer)
        TokenExtractor->>Tokenizer: (if needed) count tokens
        Tokenizer-->>TokenExtractor: token count/IDs
        TokenExtractor-->>Analyzer: token data per choice
    end
    Analyzer->>Analyzer: aggregate, validate, summarize
    Analyzer-->>Test: TokenAnalysis result
```
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/llm/src/tokenizers.rs (1)
150-161: Consider removing redundant trait implementation.

This explicit implementation of `TokenCounter` for `Tokenizer` appears redundant since:

- `Tokenizer` derefs to `Arc<dyn traits::Tokenizer>`
- `traits::Tokenizer` requires `Encoder`
- The blanket implementation already provides `TokenCounter` for all `Encoder` types

Unless there's a specific reason for this explicit implementation (e.g., avoiding deref complications), consider removing it to reduce code duplication.
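For context, a minimal sketch of the shape described above, assuming simplified signatures (the trait and type names follow the review, but the bounds, error type, and the collapsing of `traits::Tokenizer` into `Encoder` are illustrative assumptions):

```rust
// Sketch only: shows why an explicit impl can be redundant when a blanket
// impl already covers the wrapped trait object reachable through Deref.
// In the real crate, Tokenizer wraps Arc<dyn traits::Tokenizer> where
// traits::Tokenizer: Encoder; this sketch collapses that to Encoder.
use std::ops::Deref;
use std::sync::Arc;

type TokenIdType = u32;

trait Encoder {
    fn encode(&self, input: &str) -> anyhow::Result<Vec<TokenIdType>>;
}

trait TokenCounter {
    fn count_tokens(&self, input: &str) -> anyhow::Result<usize>;
}

// Blanket implementation: every Encoder (including trait objects) gets
// TokenCounter for free.
impl<T: Encoder + ?Sized> TokenCounter for T {
    fn count_tokens(&self, input: &str) -> anyhow::Result<usize> {
        Ok(self.encode(input)?.len())
    }
}

struct Tokenizer(Arc<dyn Encoder + Send + Sync>);

impl Deref for Tokenizer {
    type Target = dyn Encoder + Send + Sync;
    fn deref(&self) -> &Self::Target {
        self.0.as_ref()
    }
}

// Because method lookup auto-derefs to the inner trait object, callers can
// already write `tokenizer.count_tokens(...)`; a second, explicit
// `impl TokenCounter for Tokenizer` mostly duplicates that path.
```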
lib/llm/src/perf.rs (1)
392-404: Clean implementation with a minor consideration.

The function correctly handles file reading, SSE parsing, and stream conversion. Note that `read_to_string` loads the entire file into memory, which could be an issue for very large SSE files. For typical SSE replay data this should be fine, but consider documenting this limitation or using streaming file reading for very large files.
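If very large replay files ever become a concern, one possible direction is a buffered, line-by-line read instead of `read_to_string`; a rough, generic sketch (not the module's current API, and the SSE-parsing step into response events is elided):

```rust
// Sketch: walk an SSE replay file line by line without loading the whole
// file into memory. Only the buffered-read side is shown; turning payloads
// into annotated responses is left to the existing parsing utilities.
use std::fs::File;
use std::io::{BufRead, BufReader};

fn for_each_sse_payload(path: &str, mut handle: impl FnMut(&str)) -> std::io::Result<()> {
    let reader = BufReader::new(File::open(path)?);
    for line in reader.lines() {
        let line = line?;
        // SSE data lines start with "data:"; everything else is framing.
        if let Some(payload) = line.strip_prefix("data:") {
            handle(payload.trim());
        }
    }
    Ok(())
}
```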
lib/llm/src/utils/bzip2.rs (1)

476-476: Consider more robust test file path construction.

The hardcoded relative path assumes tests run from the repository root. Consider using `env!("CARGO_MANIFEST_DIR")` or similar to construct a more robust path:

```diff
-let tokenizer_file_path = "dynamo/lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2";
+let tokenizer_file_path = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+    .join("tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2");
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `Cargo.lock` is excluded by `!**/*.lock`
- `lib/bindings/python/Cargo.lock` is excluded by `!**/*.lock`
- `lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2` is excluded by `!**/*.bz2`
📒 Files selected for processing (11)
- `.gitattributes` (1 hunks)
- `Cargo.toml` (1 hunks)
- `lib/llm/Cargo.toml` (2 hunks)
- `lib/llm/src/lib.rs` (1 hunks)
- `lib/llm/src/perf.rs` (3 hunks)
- `lib/llm/src/perf/logprobs.rs` (2 hunks)
- `lib/llm/src/perf/tokens.rs` (1 hunks)
- `lib/llm/src/tokenizers.rs` (2 hunks)
- `lib/llm/src/utils.rs` (1 hunks)
- `lib/llm/src/utils/bzip2.rs` (1 hunks)
- `lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer.b3sum` (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
Cargo.toml (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/Cargo.toml (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/utils.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/perf/logprobs.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/perf.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/llm/src/perf/tokens.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
🧬 Code Graph Analysis (4)
lib/llm/src/perf/logprobs.rs (1)
- lib/llm/src/perf.rs (2)
  - `read_annotated_stream_from_file` (392-404)
  - `record_stream_with_context` (292-301)

lib/llm/src/tokenizers.rs (1)
- lib/llm/src/perf/tokens.rs (2)
  - `count_tokens` (494-497)
  - `count_tokens_with_ids` (499-507)

lib/llm/src/perf.rs (3)
- lib/llm/src/protocols/codec.rs (1)
  - `create_message_stream` (296-302)
- lib/llm/src/protocols.rs (1)
  - `convert_sse_stream` (51-68)
- lib/llm/src/perf/logprobs.rs (3)
  - `new` (54-101), `new` (1575-1579), `read_annotated_stream_from_file` (1453-1453)

lib/llm/src/perf/tokens.rs (3)
- lib/llm/src/tokenizers.rs (17)
  - `tokenizer` (398-400), `new` (245-254), `new` (326-333), `new` (553-561), `token_ids` (58-63), `token_ids` (402-404), `len` (339-341), `len` (531-533), `count_tokens` (92-92), `count_tokens` (100-102), `count_tokens` (151-153), `text` (406-411), `count_tokens_with_ids` (95-95), `count_tokens_with_ids` (104-109), `count_tokens_with_ids` (155-160), `builder` (472-474), `from_file` (126-128)
- lib/llm/src/perf.rs (9)
  - `new` (51-57), `new` (105-115), `new` (578-582), `read_annotated_stream_from_file` (392-404), `record_data_stream` (371-389), `response_count` (118-120), `data` (60-62), `responses` (128-130), `start_time` (133-135)
- lib/llm/src/utils/bzip2.rs (1)
  - `builder` (124-126)
⏰ Context from checks skipped due to timeout of 90000ms (4)
You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms).
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (30)
lib/llm/tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer.b3sum (1)
1-1: LGTM - Checksum file is correctly formatted

The BLAKE3 checksum follows the standard format and will enable integrity verification of the extracted tokenizer.json file from the bzip2 archive.
lib/llm/src/lib.rs (1)
36-36: LGTM - Utils module addition is clean

The new utils module declaration follows the established pattern and appropriately exposes the bzip2 decompression functionality for the crate.
Cargo.toml (1)
53-53: Confirm bzip2 0.6.0 is up-to-date and secure
- `bzip2 = { version = "0.6" }` points to 0.6.0 (released June 17, 2025), which is the latest stable version.
- Transitions to a pure Rust implementation using `libbz2-rs-sys` by default.
- Addresses CVE-2023-22895 (DoS via integer overflow in earlier releases); no known security issues in 0.6.0.
No further changes required—this dependency can be approved as-is.
lib/llm/Cargo.toml (2)
54-54: LGTM - Proper workspace dependency usage

The bzip2 workspace dependency addition is correctly configured.
81-81: Good dependency standardization

Converting blake3 to use workspace dependency improves consistency with the overall dependency management approach.
.gitattributes (1)
13-13: LGTM - Proper Git LFS configuration

The Git LFS configuration for the bzip2 tokenizer file is correctly set up with appropriate attributes for handling large binary files.
lib/llm/src/utils.rs (1)
1-5: LGTM!

Clean module structure for utilities. The dedicated `utils` module with the `bzip2` submodule provides good organization for utility functions.

lib/llm/src/perf/logprobs.rs (2)
571-574: Good refactoring to use higher-level utilities.

The import changes align well with the new `read_annotated_stream_from_file` utility, simplifying the test setup by removing the need for manual SSE stream creation and conversion.
453-459: Excellent simplification of the test setup.

The refactoring effectively replaces the manual SSE parsing and conversion steps with the new `read_annotated_stream_from_file` utility, making the test more concise and maintainable while preserving the same functionality.

lib/llm/src/tokenizers.rs (2)
89-96: Well-designed trait for token counting.

The `TokenCounter` trait provides a clean interface for token counting with appropriate thread safety bounds and error handling. The separation between `count_tokens` (just count) and `count_tokens_with_ids` (IDs + count) offers good flexibility.
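As a rough illustration of that split, a sketch of what such a trait could look like (the `count_tokens_with_ids` return type matches the diff quoted in the next comment; everything else, including the `Send + Sync` bounds, is an assumption):

```rust
// Sketch of the two-method split: one call when only a count is needed,
// another when the token IDs should come back as well. Names follow the
// review; exact signatures are assumptions.
use anyhow::Result;

type TokenIdType = u32;

pub trait TokenCounter: Send + Sync {
    /// Count the tokens in `input` without materializing the IDs for the caller.
    fn count_tokens(&self, input: &str) -> Result<usize>;

    /// Return the token IDs for `input` along with their count.
    fn count_tokens_with_ids(&self, input: &str) -> Result<(Vec<TokenIdType>, usize)>;
}
```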
98-110: Good blanket implementation with a minor optimization opportunity.

The blanket implementation correctly leverages the existing `Encoder` trait. One micro-optimization considered for `count_tokens_with_ids`:

```diff
 fn count_tokens_with_ids(&self, input: &str) -> Result<(Vec<TokenIdType>, usize)> {
     let encoding = self.encode(input)?;
     let token_ids = encoding.token_ids().to_vec();
-    let count = token_ids.len();
-    Ok((token_ids, count))
+    Ok((token_ids, encoding.token_ids().len()))
 }
```

On balance, the current implementation is fine as is: it is clear, and the performance difference is negligible.
lib/llm/src/perf.rs (3)
10-16: LGTM!

The new imports appropriately support the token analysis module and annotated stream functionality.
29-30: Good API design with descriptive aliases.

The re-exports provide cleaner, more descriptive names for the SSE parsing utilities, making the public API more intuitive.
351-389: Well-implemented stream recording function.

The `record_data_stream` function provides a clean, focused API for consuming and recording entire streams. Good documentation and proper async handling.
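A hedged usage sketch of that API (the crate path, the stand-in stream, and the exact signatures are assumptions; only the names `record_data_stream`, `response_count`, and `responses` come from this review):

```rust
// Sketch: record a stand-in data stream and inspect the recording. A real
// caller would pass an engine's streaming response instead of stream::iter.
use dynamo_llm::perf; // crate/module path assumed
use futures::stream;

async fn record_example() {
    let data_stream = stream::iter(vec!["a", "b", "c"]);

    // Consume the whole stream, recording each item with timing and
    // sequence metadata as it arrives.
    let recorded = perf::record_data_stream(data_stream).await;

    assert_eq!(recorded.response_count(), 3);
    assert_eq!(recorded.responses().len(), 3);
}
```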
lib/llm/src/utils/bzip2.rs (5)

1-36: Excellent module documentation and organization.

The module documentation clearly explains the purpose, use cases, and provides helpful examples. Good choice of dependencies with `anyhow` for error handling and `tempfile` for RAII cleanup.
73-120: Well-implemented builder pattern.

The builder provides a clean, fluent API with proper validation. Good error handling for missing required fields and clear documentation of the `extract()` method.
132-188: Robust extraction implementation.

The extraction logic handles edge cases well:
- Pre-validation of source file existence
- Smart filename derivation with `.bz2` stripping
- Comprehensive error contexts for debugging
- Explicit `flush()` to ensure data integrity
190-269: Excellent API design with RAII and validation.

The `Bzip2Extraction` struct provides a clean API with:
- Automatic cleanup via RAII pattern
- Convenient read methods with proper error contexts
- BLAKE3 hash validation for integrity verification
- Good encapsulation with private `_temp_dir`
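Putting those pieces together, a hedged usage sketch based on the builder flow in the first sequence diagram (the crate path, the placeholder hash, and whether `validate_blake3_hash` takes the expected hex string directly are assumptions):

```rust
// Sketch: extract a .bz2 file into a temp dir, verify it, read it, and let
// RAII clean up when the handle is dropped. Method names follow the
// sequence diagram earlier in this review.
use dynamo_llm::utils::bzip2::Bzip2ExtractorBuilder; // crate path assumed

fn extract_tokenizer() -> anyhow::Result<String> {
    let extraction = Bzip2ExtractorBuilder::new()
        .source_path("tests/data/replays/deepseek-r1-distill-llama-8b/tokenizer-deepseek-r1-distill-llama-8b.json.bz2")
        .target_filename("tokenizer.json")
        .extract()?;

    // Integrity check against the committed .b3sum value (placeholder here).
    extraction.validate_blake3_hash("<expected-blake3-hex>")?;

    // Read the extracted file; the backing temp dir is removed on drop.
    let contents = extraction.read_to_string()?;
    Ok(contents)
}
```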
271-514: Comprehensive test coverage.

Excellent test suite covering:
- All major functionality paths
- Error conditions and edge cases
- RAII cleanup verification
- Complex filename patterns
- Integration test with graceful skip if test file is missing
The tests provide confidence in the implementation's correctness and robustness.
lib/llm/src/perf/tokens.rs (11)
1-46: Excellent module documentation!

The documentation clearly explains the module's purpose, provides comprehensive usage examples, and covers both basic and advanced scenarios. The examples are well-structured and demonstrate different token counting methods.
47-59: Good use of type aliases for semantic clarity.

The type aliases `TokenCount` and `ForwardPassDuration` improve code readability and make the intent clear.
60-132: Well-designed enum for representing token data sources.

The `TokenDataSource` enum effectively captures different token counting methods with varying accuracy levels. The implementation methods are clean and the documentation clearly explains the trade-offs of each approach.
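The review does not list the variants, but the idea of a source-plus-accuracy enum might look roughly like this (variant and method names are purely illustrative, not the PR's actual definitions, hence the `Sketch` suffix):

```rust
// Illustrative only: one way to model "where did this token count come from,
// and how much should we trust it". The PR's real TokenDataSource may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenDataSourceSketch {
    /// Counted by running the actual tokenizer over the text (most accurate).
    Tokenizer,
    /// Derived from per-token logprob entries in the response.
    Logprobs,
    /// Approximated (e.g., one token per non-empty delta) when nothing better exists.
    Approximation,
}

impl TokenDataSourceSketch {
    /// Lower is better; used to pick the most accurate available source.
    fn accuracy_rank(self) -> u8 {
        match self {
            Self::Tokenizer => 0,
            Self::Logprobs => 1,
            Self::Approximation => 2,
        }
    }
}
```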
133-151: Well-structured data type for choice-specific token information.

The `ChoiceTokenData` struct comprehensively captures all relevant token information for a single choice, including debugging aids like content storage.
152-220: Robust implementation of TokenExtractor with good error handling.

The implementation correctly handles tokenizer failures by falling back to single-token approximation. The empty content handling is appropriate.

Minor observation: There's a slight inconsistency in `token_ids` when content is empty - it's `Some(Vec::new())` with a tokenizer but `None` without. This might be intentional to distinguish between "no tokens found by tokenizer" vs "no tokenizer available", but consider documenting this distinction if it's important.
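A compact sketch of the fallback behavior being described (the helper name and closure parameter are hypothetical; only the fall-back-to-one-token rule comes from the review):

```rust
// Sketch: prefer a real tokenizer count for a content delta, fall back to a
// single-token approximation if tokenization fails, and count nothing for
// empty content.
fn tokens_for_delta(
    content: &str,
    count_tokens: impl Fn(&str) -> anyhow::Result<usize>,
) -> usize {
    if content.is_empty() {
        return 0;
    }
    match count_tokens(content) {
        Ok(n) => n,
        // Tokenizer failed: approximate the delta as a single token.
        Err(_) => 1,
    }
}
```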
221-256: Well-designed structs for token timeline tracking and analysis.

Both `TokenEvent` and `ChoiceTokenAnalysis` effectively capture the temporal aspects of token generation and provide comprehensive analysis data per choice.
257-365: Excellent implementation of comprehensive token analysis with robust validation.

The `analyze_token_counting` function implements thorough validation including:
- Detection of tokens generated after a choice has finished
- Tracking of incomplete streams
- Selection of the most accurate data source
- Comprehensive timeline tracking
The validation logic is particularly well-designed and will help catch streaming protocol violations.
366-478: Comprehensive and user-friendly analysis API.

The `TokenAnalysis` implementation provides excellent utility methods:
- `print_summary` with clear visual indicators
- `token_rate_for_choice` correctly excluding empty responses from rate calculation
- `validate_data_source_consistency` for detecting potential issues

The API is well-designed for both programmatic access and human-readable output.
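Tying the pieces together, a hedged end-to-end sketch of how these names appear to compose (module paths, argument lists, async-ness, and the replay filename are assumptions; only the function and method names come from this review):

```rust
// Sketch: replay a recorded SSE stream, record it with timing metadata,
// analyze token counting per choice, and print a summary. Signatures are
// assumptions based on names referenced in this review.
use dynamo_llm::perf::{self, tokens}; // crate/module paths assumed

async fn analyze_replay() -> anyhow::Result<()> {
    // Rebuild an annotated response stream from an SSE replay file.
    let stream = perf::read_annotated_stream_from_file("tests/data/replays/example.sse").await?;

    // Consume and record every response.
    let recorded = perf::record_data_stream(stream).await;

    // Count tokens per choice, preferring the most accurate data source.
    // The real analyzer may also take a tokenizer for exact counts.
    let analysis = tokens::analyze_token_counting(&recorded)?;

    analysis.print_summary();
    analysis.validate_data_source_consistency()?;
    println!("choice 0: {:?} tokens/sec", analysis.token_rate_for_choice(0));
    Ok(())
}
```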
479-509: Well-structured test infrastructure with good helper functions.

The test module provides:
- A simple but effective `MockTokenizer` implementation
- Reusable helper functions that reduce test code duplication
- Clear test data creation patterns
Also applies to: 1114-1208
510-974: Excellent test coverage with comprehensive edge case handling.

The test suite thoroughly covers:
- All data source types
- Multiple choice scenarios with varying timelines
- Validation error detection
- Edge cases like empty content and missing choices
- Complex scenarios like choices finishing at different times
The tests are well-structured and clearly document the expected behavior.
975-1112: Excellent integration test demonstrating real-world usage.

This integration test effectively:
- Tests the full pipeline with real tokenizer and stream data
- Validates file integrity with BLAKE3 hash verification
- Demonstrates proper usage of the bzip2 extraction utilities
- Includes comprehensive assertions on the analysis results
The hard-coded expectation of 32 tokens is appropriate for this specific test case with known input data.
…ynamo into ryan/token-counting-analysis
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
WIP
Signed-off-by: Ryan Olson <[email protected]>
Resolved merge conflicts while preserving token counting improvements:
- Updated dependencies to latest versions from main
- Kept gzip compression utilities and token counting features
- Resolved import conflicts in logprobs.rs and tokens.rs
- Updated workflow files with latest CI improvements
- Removed deprecated tools.rs file as deleted in main
- Regenerated Cargo.lock files with updated dependencies
- Fixed API changes in blake3 and response structures

Token counting analysis features preserved:
- Token counting infrastructure with multiple data sources
- Performance recording framework for streaming responses
- GzipExtractor utility for compressed file handling
- Mock tokenizer tests replacing deepseek dependency

All tests pass successfully after merge.

Signed-off-by: Ryan Olson <[email protected]>
/ok to test
@ryanolson, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
Merged main branch resolving conflicts and preserving token counting features.

**Conflict Resolutions:**
- .gitattributes: Added *.fatbin binary handling
- .gitignore: Preserved SDK binaries ignore entry
- Cargo.toml: Integrated ModelExpress dependencies, used main's rand 0.9.0
- lib/llm/Cargo.toml: Kept flate2/ahash deps, accepted ModelExpress integration
- Cargo.lock: Regenerated for dependency resolution

**Code Fixes:**
- Fixed blake3 API: Replaced deprecated mmap methods with std::fs::read
- Fixed response struct: Removed .inner field access in TokenExtractor
- Fixed utils module conflict: Merged gzip and prefix_matcher modules
- Fixed test imports: Updated to dynamo_async_openai
- Fixed clippy warnings: Collapsed nested if, removed unnecessary to_string()
- Added missing reasoning_content field to ChatCompletionStreamResponseDelta

All tests passing (323), clippy clean, formatted with Rust 1.90.
Summary
Adds comprehensive token counting analysis infrastructure to Dynamo for LLM streaming responses and logprob analysis.
🔄 Updated: Successfully merged main branch (32 commits) preserving all token counting features
Key Features
Token Counting Analysis
Logprob Analysis
GGUF Tokenizer Support
Merge Highlights
✅ Main Branch Integration (32 commits)
Usage
Files Changed
- `lib/llm/src/perf/tokens.rs` - Core token counting analysis
- `lib/llm/src/perf/logprobs.rs` - Logprob analysis utilities
- `lib/llm/src/gguf/` - GGUF tokenizer integration
- `lib/llm/src/utils/gzip.rs` - Compressed model utilities

Ready for production use with full main branch compatibility.