[model-gateway] Optimize memory usage in HTTP router #14667

Merged
slin1237 merged 1 commit into main from cpu-overhead-grpc-2 on Dec 8, 2025

slin1237 (Collaborator) commented on Dec 8, 2025

Key optimizations:

  1. tree.rs: Fix O(n²) character iteration
    • Add CharIndexedText struct for O(1) character access
    • Replace chars().nth(idx) with a pre-indexed Vec<char>
    • Use shared_prefix_count_indexed() to avoid intermediate strings
  2. pd_router.rs: Reduce allocations
    • Use Arc to share the request across retry attempts (avoids O(retries) clones)
    • Use static string constants for bootstrap JSON keys
    • Optimize logprob merging with std::mem::take to avoid double cloning
    • Pre-allocate merged arrays with exact capacity
  3. router.rs: Reduce string allocations
    • Pre-allocate the Bearer token header string with the required capacity
    • Use a static string constant for the DP rank key
  4. chat.rs: Optimize to_simple_string()
    • Build the string directly without an intermediate Vec allocation
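
For readers unfamiliar with the tree.rs issue: `str::chars().nth(idx)` walks the string from the start on every call, so calling it in a loop is quadratic. The sketch below illustrates the general idea of indexing characters once. Only the names `CharIndexedText` and `shared_prefix_count_indexed` come from this PR; the fields, signatures, and the `shared_prefix_count_slow` helper are assumptions made for illustration and may not match the actual code.

```rust
/// Illustrative sketch only: fields and signatures are assumed, not taken from tree.rs.
/// Stores the text as a Vec<char> so that character lookup by index is O(1).
struct CharIndexedText {
    chars: Vec<char>,
}

impl CharIndexedText {
    /// One O(n) pass to build the index.
    fn new(text: &str) -> Self {
        Self { chars: text.chars().collect() }
    }

    /// O(1) character access, unlike `text.chars().nth(idx)` which is O(idx).
    fn get(&self, idx: usize) -> Option<char> {
        self.chars.get(idx).copied()
    }
}

/// Counts how many leading characters of `text[start..]` match `other`,
/// comparing in place without building an intermediate substring.
fn shared_prefix_count_indexed(text: &CharIndexedText, start: usize, other: &str) -> usize {
    text.chars
        .get(start..)
        .unwrap_or(&[])
        .iter()
        .zip(other.chars())
        .take_while(|(a, b)| **a == *b)
        .count()
}

/// Hypothetical "before" version for contrast: each `nth(i)` rescans from the
/// beginning, so the loop as a whole is O(n²).
fn shared_prefix_count_slow(a: &str, b: &str) -> usize {
    let mut i = 0;
    while let (Some(x), Some(y)) = (a.chars().nth(i), b.chars().nth(i)) {
        if x != y {
            break;
        }
        i += 1;
    }
    i
}
```

The trade-off is one up-front allocation of 4 bytes per character in exchange for constant-time lookups afterwards, which pays off when the same text is probed at many offsets.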
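
The Arc change in pd_router.rs can be pictured roughly as follows. This is a minimal sketch under stated assumptions: `Request`, `send_attempt`, and `send_with_retries` are hypothetical names, and the failure logic is simulated so the example is self-contained. The point is that each retry clones a pointer (a reference-count increment) rather than the whole request body.

```rust
use std::sync::Arc;

/// Hypothetical request type standing in for the router's real request.
struct Request {
    body: String,
}

/// Simulated attempt (assumption): fails on the first two tries, then
/// "succeeds" by returning the body length.
fn send_attempt(req: Arc<Request>, attempt: usize) -> Result<usize, String> {
    if attempt < 2 {
        Err(format!("attempt {attempt} failed"))
    } else {
        Ok(req.body.len())
    }
}

/// Wraps the request in an Arc once; every retry shares the same allocation.
fn send_with_retries(req: Request, max_attempts: usize) -> Result<usize, String> {
    let shared = Arc::new(req);
    let mut last_err = String::from("no attempts made");
    for attempt in 0..max_attempts {
        // Arc::clone copies the pointer, not the request body.
        match send_attempt(Arc::clone(&shared), attempt) {
            Ok(len) => return Ok(len),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

With a deep clone per attempt the cost grows with both the request size and the retry count; with `Arc` the per-retry cost is constant.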
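
The logprob merging item combines two standard-library techniques: `std::mem::take` moves a vector out of a mutable location (leaving an empty one behind) instead of cloning it, and `Vec::with_capacity` sizes the destination up front. The sketch below uses plain `Vec<f32>` and a hypothetical `merge_logprobs` function; the real code presumably operates on JSON values, so treat the types here as an assumption.

```rust
/// Illustrative sketch: merges two logprob lists without cloning either one.
/// The function name and the f32 element type are assumptions.
fn merge_logprobs(prefill: &mut Vec<f32>, decode: &mut Vec<f32>) -> Vec<f32> {
    // Move the contents out; the originals are left as empty Vecs.
    let head = std::mem::take(prefill);
    let tail = std::mem::take(decode);

    // Allocate once for the combined length, so the extends below never reallocate.
    let mut merged = Vec::with_capacity(head.len() + tail.len());
    merged.extend(head);
    merged.extend(tail);
    merged
}
```

After the call, both input vectors are empty, which is acceptable when the caller only needs the merged result.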
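
The router.rs and chat.rs string changes follow the same pattern: write into a single `String` rather than building temporaries. A rough sketch, assuming a hypothetical `bearer_header` helper, a `BEARER_PREFIX` constant, and a `to_simple_string` that joins non-empty parts with a space (the separator and signature in the real chat.rs are not shown in this PR description):

```rust
/// Static prefix (assumed name), so no per-request allocation for the literal.
const BEARER_PREFIX: &str = "Bearer ";

/// Builds the Authorization header value with a single, exactly sized allocation.
fn bearer_header(token: &str) -> String {
    let mut header = String::with_capacity(BEARER_PREFIX.len() + token.len());
    header.push_str(BEARER_PREFIX);
    header.push_str(token);
    header
}

/// Appends parts directly to the output instead of collecting them into a
/// Vec<String> and calling join(). Empty parts are skipped.
fn to_simple_string(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts.iter().filter(|p| !p.is_empty()) {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(part);
    }
    out
}
```

Compared with `format!("Bearer {token}")` or `collect::<Vec<_>>().join(" ")`, this avoids growth reallocations and the intermediate vector, at the cost of slightly more verbose code.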
