Add a concurrency limit to Process.ProxyRequest#123
Conversation
Walkthrough

A concurrency limit feature was introduced to control the number of simultaneous HTTP requests handled by the proxy. This involved adding a configurable `ConcurrencyLimit` field to `ModelConfig`, a semaphore in `Process` sized to that limit (default 10), and rejection of excess requests with HTTP 429 Too Many Requests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Proxy as Proxy (Process)
    participant Backend
    Client->>Proxy: HTTP Request
    alt Concurrency slot available
        Proxy->>Proxy: Acquire semaphore slot
        Proxy->>Backend: Forward request
        Backend-->>Proxy: Response
        Proxy->>Proxy: Release semaphore slot
        Proxy-->>Client: Response
    else Concurrency limit reached
        Proxy-->>Client: HTTP 429 Too Many Requests
    end
```
Actionable comments posted: 0
🧹 Nitpick comments (2)
proxy/process_test.go (1)
344-374: Well-designed test for concurrency limiting

The test effectively validates the concurrency limiting behavior by:
- Setting the limit to 1
- Verifying the semaphore capacity
- Occupying the semaphore with a slow request
- Confirming that additional requests receive a 429 status code
Consider adding an additional test case with a higher concurrency limit (e.g., 2 or 3) to verify that multiple requests can be processed simultaneously up to the limit, and only excess requests are rejected.
proxy/process.go (1)
432-438: Effective implementation of concurrency limiting

The non-blocking send and receive on the semaphore is the correct approach for limiting concurrency. The implementation also:
- Returns the appropriate 429 status code
- Uses proper error messaging
- Ensures the semaphore is released after request processing via defer

Consider adding request metrics for rejected requests (due to concurrency limits) to help with monitoring and capacity planning:

```diff
+// Track metrics for rejected requests
 http.Error(w, "Too many requests", http.StatusTooManyRequests)
+p.proxyLogger.Debugf("<%s> request %s - rejected due to concurrency limit", p.ID, r.RequestURI)
 return
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- proxy/config.go (1 hunks)
- proxy/process.go (3 hunks)
- proxy/process_test.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
proxy/process_test.go (1)
proxy/process.go (1)
NewProcess(65-89)
🔇 Additional comments (4)
proxy/config.go (1)
26-28: Good implementation of configuration for concurrency limiting

The new `ConcurrencyLimit` field added to `ModelConfig` is well-documented and correctly tagged for YAML deserialization. This aligns with the PR objective of making the concurrency limit configurable at the model level.

proxy/process.go (3)
61-62: Good field naming and documentation

The field name clearly describes its purpose for managing concurrency limits.
67-72: Well-implemented default concurrency limit

Setting a reasonable default (10) and providing debug logging when using the default is a good practice. This aligns with the PR objective of introducing a configurable concurrency limit.
86-87: Correct semaphore initialization

The semaphore is properly initialized as a buffered channel with the configured capacity.
Part of the stability and reliability improvements work.
llama-swap does not have a concurrent connection limit or a connection timeout to the upstream. When the upstream does not respond, those connections are kept open, waiting for a response. This PR adds a default concurrency limit of 10: once 10 requests are in flight, additional requests receive a 429 Too Many Requests response.
This can be overridden in the configuration like so:
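The original configuration snippet did not survive extraction; a hypothetical example, assuming the YAML key for `ModelConfig`'s `ConcurrencyLimit` field is `concurrencyLimit` (the actual tag and surrounding keys may differ):

```yaml
models:
  "llama-8b":
    cmd: llama-server --port 9001 -m models/llama-8b.gguf
    proxy: http://127.0.0.1:9001
    # hypothetical key; overrides the default limit of 10
    concurrencyLimit: 20
```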
Note: This PR came out of working with the llama.cpp vscode extension for FIM. After restarting llama-swap, hundreds of connections, some waiting for hours, wound up being dropped.
Summary by CodeRabbit

New Features
- Configurable per-model concurrency limit for proxied requests (default 10); requests beyond the limit receive HTTP 429 Too Many Requests.

Tests
- Added a test verifying that requests over the concurrency limit are rejected with a 429 status.