
flush() after writing to gzip_file#2753

Closed
vin wants to merge 2 commits into Kludex:master from vin:patch-1

Conversation

vin commented Nov 15, 2024

Summary

In order to better support streaming responses where the chunks are smaller than the file buffer size, we flush after writing.

Without the explicit flush, writes are buffered and subsequent reads see an empty self.gzip_buffer until the file flushes automatically, either because (1) the 32KiB write buffer [1] fills or (2) the file is closed when the streaming response completes.

Without flushing, GZipMiddleware doesn't work as expected for streaming responses, and especially not for Server-Sent Events, which clients expect to receive immediately. The code as written appears to intend to send immediately rather than buffer, since it calls await self.send(message) right away, but in practice that message is often empty.
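The buffering and the effect of flush() can be demonstrated with the standard library alone. This is a minimal sketch mirroring the gzip.GzipFile-over-io.BytesIO setup the middleware uses, not the middleware code itself:

```python
import gzip
import io
import zlib

# Mirror the middleware's setup: a GzipFile writing into an in-memory buffer.
buffer = io.BytesIO()
gzip_file = gzip.GzipFile(mode="wb", fileobj=buffer)

# A chunk far smaller than GzipFile's internal write buffer; on Python 3.12+
# it sits in that buffer, leaving the underlying BytesIO essentially empty.
chunk = b'event: ping\ndata: {"i": 0}\n\n'
gzip_file.write(chunk)

# flush() pushes the pending bytes through zlib (Z_SYNC_FLUSH by default),
# so the BytesIO now holds a decodable prefix of the gzip stream.
gzip_file.flush()
compressed = buffer.getvalue()

# A streaming decompressor recovers the chunk without waiting for close().
decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
assert decompressor.decompress(compressed) == chunk
```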

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

Footnotes

  1. https://github.com/python/cpython/blob/main/Lib/gzip.py#L26

Kludex (Owner) commented Nov 15, 2024

Can we add a test to prove your point?

vin (Author) commented Nov 15, 2024

I'll work on that today. I have a small repro case I'll share, but I need to turn it into a test.

vin (Author) commented Nov 16, 2024

I've added a test, but it's a bit complicated. Without the flush, the full contents of the response are still correct; to show that chunks are received iteratively rather than all at once, I use a wrapping middleware that asserts GZipMiddleware isn't sending empty message bodies, which is what it does without the flush.
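The wrapping-middleware idea can be sketched as a plain ASGI callable. The names here (BodySizeRecorder and friends) are hypothetical illustrations, not the actual test code:

```python
# Hypothetical sketch: an outer ASGI middleware that records the size of
# every http.response.body chunk the wrapped app (e.g. GZipMiddleware)
# emits, so a test can assert that none of the bodies are empty.
class BodySizeRecorder:
    def __init__(self, app, sizes):
        self.app = app
        self.sizes = sizes  # list shared with the test

    async def __call__(self, scope, receive, send):
        async def recording_send(message):
            if message["type"] == "http.response.body":
                self.sizes.append(len(message.get("body", b"")))
            await send(message)

        await self.app(scope, receive, recording_send)
```

A test would wrap the app under test, run a request through it, and assert on the recorded sizes.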

vin (Author) commented Nov 25, 2024

@Kludex could you take another look or recommend a good reviewer? Thanks!

Kludex (Owner) commented Dec 5, 2024

Not related.

vin (Author) commented Dec 13, 2024

Any concerns here, or how can we best move this forward?

Kludex (Owner) commented Dec 13, 2024

> Any concerns here, or how can we best move this forward?

The best way to move forward would be to present the problem first, with an MRE, and references to other issues where other people had the same problem.

I think the current behavior is intentional, so I need to gather more references before reviewing this.

vin (Author) commented Dec 16, 2024

Best observed in a browser, but here's a small repro:

```python
import time

from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware
from fastapi.responses import StreamingResponse

app = FastAPI()
app.add_middleware(GZipMiddleware)

@app.get("/")
def streaming_response():
    def generate():
        for i in range(10):
            yield f'event: ping\ndata: {{"i": {i}}}\n\n'
            time.sleep(1)

    return StreamingResponse(generate(), media_type="text/event-stream")
```

Before this PR (first screenshot), all ten events appear together after 10 seconds; after it (second screenshot), a new event appears every second.

vin (Author) commented Dec 16, 2024

If the behavior is intentional, we really need to update the documentation here:
https://fastapi.tiangolo.com/advanced/middleware/#gzipmiddleware

The middleware will handle both standard and streaming responses.

to indicate that this will cause streaming responses to be buffered 32KiB at a time. This was certainly a surprising result for us, and one that caused our users to report that our app appeared broken when realtime status updates stopped working.

Kludex (Owner) commented Jan 23, 2025

This seems a valid PR.

It seems other web frameworks have the same issue. I'm confused as to how no one noticed... It's even hard to find references about people having issues with this. 🤔

Kludex (Owner) commented Jan 24, 2025

Okay. I looked around and, based on what you said, I understand we have a real problem with SSEs.

It seems we should either:

  • Ignore the middleware on SSE.
  • Flush the buffer on every message, otherwise we can't see the ping.

I'm not so sure about applying this to all streaming responses, though.


Reference to my future self: https://www.npmjs.com/package/compression#server-sent-events

vin added 2 commits January 24, 2025 12:55
vin (Author) commented Jan 24, 2025

I think having both GZip + SSE should be supportable. The most correct behavior is to send each message as soon as it becomes available, even if that does not achieve optimal compression. Still, enabling compression for streaming responses (including SSE) can be a reasonable choice: individual messages could theoretically exceed 32KiB and would benefit from compression just as non-streaming responses do.

It seems the current implementation (without the flush) favors optimization of the compression at the expense of timely delivery. Enabling the flush allows the developer to accept the suboptimal compression and get the expected timely delivery, or to choose to trade off timely delivery for potential compression improvements by implementing their own buffering.
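The compression-ratio cost of per-message flushing can be measured directly with zlib. An illustrative sketch, not middleware code:

```python
import zlib

# 100 small SSE-style messages, as in the repro above.
chunks = [f'event: ping\ndata: {{"i": {i}}}\n\n'.encode() for i in range(100)]

# One-shot: buffer everything and flush once at the end (current behavior).
one_shot = zlib.compressobj(wbits=31)  # wbits=31 -> gzip container
buffered = one_shot.compress(b"".join(chunks)) + one_shot.flush()

# Per-message: sync-flush after every chunk so each one is deliverable
# immediately, at the cost of extra flush overhead in the output.
streaming = zlib.compressobj(wbits=31)
per_chunk = b"".join(
    streaming.compress(c) + streaming.flush(zlib.Z_SYNC_FLUSH) for c in chunks
)
per_chunk += streaming.flush()

# Both decompress to the same payload; the per-chunk stream is somewhat
# larger, which is the price of timely delivery.
assert zlib.decompress(buffered, wbits=31) == zlib.decompress(per_chunk, wbits=31)
```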

vin (Author) commented Jan 25, 2025

Similar conversation happening at tuffnatty/zstd-asgi#5. I argue that the least-surprising behavior of the middleware is to deliver small messages immediately rather than holding them in a buffer for an indefinite amount of time. As a user of the middleware, I'd rather read "compression won't be as effective for small messages; buffer them into larger chunks if you wish to maximize compression" vs "small messages will be held in a buffer and delivered 32kB at a time; if you don't want this, don't use this middleware"

Kludex (Owner) commented Jan 25, 2025

Is there any web framework, in any language, that gets this right at the middleware level?

@tuffnatty

As I suggested in tuffnatty/zstd-asgi#5, why not allow both behaviours selecting the desired one with a parameter?

Kludex (Owner) commented Jan 25, 2025

> As I suggested in tuffnatty/zstd-asgi#5, why not allow both behaviours selecting the desired one with a parameter?

Because that seems to take responsibility away from our side and hand it to the developers. I'm still trying to understand the implications of flushing on every message. It seems like a lot.

But for SSE, we need to send the ping messages, so there's no need for the user to configure it.

vin (Author) commented Jan 27, 2025

I agree that adding a parameter should be avoided if possible; I'd rather the middleware figure out the correct behavior and implement it, transparently and consistently.

Do we have any evidence that the current implementation was even intentional? Read-after-write without flushing is a common, easy-to-make programming error. Calling await self.send(message) at line 100, when message["body"] is empty, seems questionable.

Kludex (Owner) commented Feb 14, 2025

I did some research.

It seems we have 2 options:

  1. flush on every chunk
  2. disable compression on text/event-stream

I'll investigate other frameworks.

Kludex (Owner) commented Feb 15, 2025

On tokio-rs/axum#2728, we can see:

> Since the lack of compression is a bit painful when using SSE with htmx (streaming html without compression can use way too much bandwidth) [...]

But it's not supported out of the box, user needs to manually flush.

Kludex (Owner) commented Feb 15, 2025

Conclusions:

  1. I don't think we should force a flush on every chunk. Based on the frameworks above, everybody seems to agree that we shouldn't compress small chunks that often.
  2. This middleware makes Server-Sent Events unusable. We don't offer any API for the user to flush, or to exempt a single endpoint (you could apply the middleware per Route/Router, though), but it's still surprising to hit this issue.
  3. There are cases where SSE can still send big chunks, just not big enough to fill the buffer. As mentioned here.

So... I don't think we should merge this PR; in fact, I think we should write a test to prevent this change in the future.

I think we should start by ignoring the middleware on text/event-stream content. As for point 3 of my conclusions, I don't have an answer yet. Happy to read more thoughts.
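The content-type exemption described here could look roughly like this: a hypothetical helper against the ASGI http.response.start message shape, not the actual patch:

```python
# Hypothetical helper: decide from an ASGI http.response.start message
# whether the response declares itself an SSE stream that should bypass
# compression. ASGI headers are lists of (bytes, bytes) pairs.
def is_event_stream(message: dict) -> bool:
    headers = dict(message.get("headers", []))
    content_type = headers.get(b"content-type", b"")
    # Strip any parameters such as "; charset=utf-8" before comparing.
    return content_type.split(b";")[0].strip() == b"text/event-stream"
```

The middleware could call this on the first response message and fall back to passing messages through untouched when it returns True.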

vin (Author) commented Feb 17, 2025

Thanks for doing the thorough research into the state of other frameworks!
I'm glad to see an alternative that will work for our specific use case. I personally still think the flush is best for the framework, since it gives users the most flexibility:

  1. With large non-streaming responses, it works optimally
  2. With streaming responses, chunks are delivered without unnecessary delay
  3. With streaming responses, it still compresses, though suboptimally
  4. A user can choose to optimize for compression at the expense of latency by adding their own buffering
  5. Without the flush, the framework has chosen to optimize for compression at the expense of latency, with no way of overriding besides disabling compression altogether
  6. The text/event-stream detection streamlines the choice from (5), but still at the expense of disabling compression entirely rather than accepting suboptimal compression. I think it also just makes the middleware more surprising: it behaves one way in some cases and another way in other cases.

All that said, I'm glad to see the framework will support our specific use case without us needing to apply handler-specific workarounds, once #2871 is in. Thanks again for your collaboration here.

Kludex (Owner) commented Feb 22, 2025

Thanks for the help. 🙏

Kludex closed this Feb 22, 2025