Closed
Description
What happened?
I'm on OpenAI usage tier 5, which should allow 30,000 RPM (= 500 RPS) for "gpt-4o-mini". However, I struggle to get past 50 RPS.
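To make the gap concrete, here is the rate arithmetic as a small sketch. The 30,000 RPM limit and the ~50 RPS observation are the figures quoted above; `rpm_to_rps` is a hypothetical helper introduced only for illustration.

```python
# Convert a per-minute rate limit into a per-second budget.
# 30,000 RPM is the tier-5 figure quoted in this issue for gpt-4o-mini.
RPM_LIMIT = 30_000


def rpm_to_rps(rpm: float) -> float:
    """Hypothetical helper: requests per minute -> requests per second."""
    return rpm / 60


budget_rps = rpm_to_rps(RPM_LIMIT)  # 500.0
observed_rps = 50                   # throughput reported in this issue

print(f"budget: {budget_rps:.0f} RPS, observed: {observed_rps} RPS, "
      f"utilization: {observed_rps / budget_rps:.0%}")
# prints "budget: 500 RPS, observed: 50 RPS, utilization: 10%"
```

In other words, the client is using roughly a tenth of the request budget the account should have.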
Minimal reproduction:
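import asyncio
from litellm import acompletion

async def main():
    tasks = [acompletion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You're an agent who answers yes or no"},
            {"role": "user", "content": "Is the sky blue?"},
        ],
    ) for _ in range(2000)]
    # Run all 2000 requests concurrently.
    return await asyncio.gather(*tasks)

asyncio.run(main())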
I only get about 50 items/second, compared with ~500 items/second when sending raw HTTP requests.
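One way to reason about a throughput plateau like this is Little's law: throughput ≈ requests in flight / per-request latency. If something in the client path limits how many requests are in flight at once (an assumption; this issue does not identify the bottleneck), throughput tops out near `max_in_flight / latency` no matter how many tasks are queued. The sketch below simulates that with a hypothetical `fake_request` coroutine that sleeps instead of calling litellm or OpenAI, and uses `asyncio.Semaphore` as a stand-in for such a limit.

```python
import asyncio
import time


async def fake_request(latency_s: float) -> str:
    """Hypothetical stand-in for one API call: sleeps instead of doing network I/O."""
    await asyncio.sleep(latency_s)
    return "yes"


async def measure_rps(n: int, max_in_flight: int, latency_s: float) -> float:
    """Run n fake requests with at most max_in_flight concurrent; return items/second."""
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded() -> str:
        # The semaphore plays the role of a (hypothetical) cap on in-flight requests.
        async with sem:
            return await fake_request(latency_s)

    start = time.perf_counter()
    await asyncio.gather(*(bounded() for _ in range(n)))
    return n / (time.perf_counter() - start)


if __name__ == "__main__":
    latency = 0.05
    for cap in (5, 50):
        rps = asyncio.run(measure_rps(n=100, max_in_flight=cap, latency_s=latency))
        # Little's law predicts roughly cap / latency items per second.
        print(f"max_in_flight={cap:>2}: {rps:.0f} items/s (predicted ~{cap / latency:.0f})")
```

With the same simulated latency, raising the in-flight cap tenfold raises measured throughput by roughly the same factor, which is the pattern one would expect if a concurrency limit, rather than the API rate limit, were the constraint.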
Relevant log output
 16%|█████████████████████▌                                                                                                                 | 320/2000 [00:09<00:40, 41.49it/s]
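The progress line can be sanity-checked with a little arithmetic. The values below are read off the tqdm output above; note that tqdm displays elapsed time at one-second resolution, so the average rate is approximate.

```python
# Values read off the tqdm progress line in the log above.
done, total = 320, 2000
elapsed_s = 9          # "00:09" elapsed (whole seconds only)
smoothed_rate = 41.49  # tqdm's smoothed it/s

average_rate = done / elapsed_s         # average over the run so far
eta_s = (total - done) / smoothed_rate  # remaining items at the current rate

print(f"average {average_rate:.1f} it/s, ETA {eta_s:.0f} s")
# prints "average 35.6 it/s, ETA 40 s"
```

The computed ETA matches tqdm's `00:40`, so the run is sustaining roughly 35 to 41 items/second rather than briefly dipping.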