Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
 
Describe the bug
I noticed that rate limit errors are raised differently in the two APIs during streaming. With chat completions, the error is raised on the initial call, before the stream is read, so it is retried by the client itself. With responses, the initial call to responses.create(...) returns a stream object successfully, and an APIError indicating a rate limit is only raised once the stream is read. As a result, the client's built-in retry logic is skipped entirely and users have to implement their own. It is also unclear whether an error can be raised mid-stream, which adds to the complexity of any fix.
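For context, until the client handles this, a user-side workaround has to wrap the stream iteration itself. Below is a minimal sketch, assuming the error only surfaces once iteration begins; stream_with_retries and its parameters are illustrative, not part of the library:

import time

from openai import OpenAI, APIError

client = OpenAI(base_url="your-endpoint-v1", api_key="your-api-key")

def stream_with_retries(max_retries=3, backoff=2.0):
    # Hypothetical helper: re-issues the whole request when the stream
    # fails before producing any events, since the client's own retries
    # never fire in this case.
    for attempt in range(max_retries + 1):
        stream = client.responses.create(
            model="your-model",
            store=False,
            stream=True,
            input="a long enough prompt" * 10000,
        )
        yielded = False
        try:
            for event in stream:
                yielded = True
                yield event
            return
        except APIError:
            # Retrying after events were already delivered would duplicate
            # output, which is why the mid-stream case mentioned above is
            # the harder one to handle.
            if yielded or attempt == max_retries:
                raise
            time.sleep(backoff * 2 ** attempt)  # simple exponential backoff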
After inspecting the client's code, I initially thought this was not a library issue but a misalignment with the model's service. I contacted Microsoft support, since they host our models, but they insisted I raise an issue here.
To Reproduce
To reproduce this you need an Azure-hosted OpenAI model (I have personally tried this with gpt-4o and gpt-5). Lower your token rate limit to a low enough value and run the snippet below.
Code snippets
from openai import OpenAI

client = OpenAI(
    base_url="your-endpoint-v1",
    api_key="your-api-key",
)

response = client.responses.create(
    model="your-model",
    store=False,
    stream=True,
    input="a long enough prompt" * 10000,  # long enough to hit the rate limit
)

for event in response:
    continue  # an APIError is raised here
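For comparison, the equivalent chat completions call raises the rate limit error on the initial call, where the client's built-in retries apply, per the behavior described above (a sketch using the same placeholder credentials; RateLimitError is the typical error type here):

stream = client.chat.completions.create(
    model="your-model",
    stream=True,
    messages=[{"role": "user", "content": "a long enough prompt" * 10000}],
)  # the rate limit error is raised (and retried by the client) here
for event in stream:
    continue  # no rate limit error expected once iteration starts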
OS
Linux, Windows
Python version
Python >=3.11.9
Library version
openai v1.109.1