Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on aborted requests to properly close the connection #11129

Merged

Conversation

tonybart1337
Copy link
Contributor

What does this PR do?

Fixes this issue

Might also fix this one. I tested it on HTTP/2 as well

Motivation

It breaks caching on our servers and many other things because we receive truncated data

Additional Notes

Client sends a HTTP/1.1 request to the server and downloads a file which is dynamically generated and returned in chunks, but sometimes the connection might abort midway and instead of aborting the connection traefik sends zero length chunk finalizing the request and the client ends up receiving a truncated response with no way of knowing whether it received the full file or not.

@rtribotte
Copy link
Member

Hello @tonybart1337,

Thanks for investigating this and opening this PR!

I don't think raising a panic is the best solution in that case.
I would either return an InternalServerError if the response has not yet been started to be written to the client, or, when the response has indeed already started to be sent to the client (verifiable by wrapping the response writer), grab the connection and close it.

Would you be incline to rework the PR in that direction?
Thanks!

@tonybart1337
Copy link
Contributor Author

tonybart1337 commented Sep 27, 2024

Hi @rtribotte.

Thank you for taking your time on this PR.
I'm not a golang guru, but if you share a few hints I will gladly commit that into the PR.

I think it only lacks a few lines of code and this is a pretty serious bug

@rtribotte
Copy link
Member

Hello @tonybart1337,

It is up to you, but the simplest way would that we push review commits to show the approach I was thinking of, it is always easier to demonstrate directly with code, WDYT?
Notably, wrapping the responseWriter could a bit challenging.

@rtribotte
Copy link
Member

@tonybart1337 I can open a PR on your branch to demonstrate the approach, but the changes must be first rebased on branch v2.11, as this is a bug fix. Thanks!

@rtribotte rtribotte changed the base branch from master to v2.11 September 30, 2024 09:53
@rtribotte rtribotte added this to the 2.11 milestone Sep 30, 2024
@tonybart1337
Copy link
Contributor Author

tonybart1337 commented Sep 30, 2024

I'll share what I come up with in a few hours today

Looks like github button didn't rebase properly, I'll force push the fix as well

@tonybart1337
Copy link
Contributor Author

tonybart1337 commented Sep 30, 2024

@rtribotte rebased + implemented the wrapper

still not sure on how to properly close the connection, doesn't panic already handle it for us?

@rtribotte
Copy link
Member

@tonybart1337 Thanks!

I think panics should be reserved for cases where the state is unexpected or irrecoverable, and here it seems possible to properly handle the response or close the connection.
I'll open a PR on your branch to propose adjustments.

@tonybart1337
Copy link
Contributor Author

tonybart1337 commented Oct 3, 2024

@rtribotte please lmk if I can help somehow, I totally agree that panics should be used as the last resort, but since it gets the job done and closes the connection properly, maybe we should merge it to fix the issue and then you can implement a proper solution without a panic?

this issue can lead to serious consequences like caching invalid data and serving it to the end users, or serving incomplete data while the end users will have no idea why this happened, or even worse if they have no means to check the validity of the data received

in most cases I believe that users will get an immediate error when they try to decode their binary data, like an image, for example, or maybe most users don't use "transfer-encoding: chunked" at all, otherwise I don't know how to explain why this is not fixed yet

@rtribotte
Copy link
Member

rtribotte commented Oct 3, 2024

Hello @tonybart1337,

Actually, I think you are right about reflecting the panic, this let the net/http server handle the response abortion.
I have opened tonybart1337#1 for review.
Could please address it and also add tests?
Thanks!

@rtribotte
Copy link
Member

Thanks @tonybart1337 for merging the PR, we will leave the waiting for fixes label until the unit tests are fixed and a new one is introduced for this new behavior.

@kevinpollet
Copy link
Member

Hello @tonybart1337,

Are you still willing to work on this pull request and address @rtribotte's changes, or do you want us to push some commits?

@tonybart1337
Copy link
Contributor Author

Hi @kevinpollet .

If you could implement the tests yourselves that would be great, I don't have a lot of spare time at the moment 😞

@kevinpollet kevinpollet changed the title panic on aborted requests to properly close the connection Panic on aborted requests to properly close the connection Oct 25, 2024
@kevinpollet kevinpollet force-pushed the bugfix/aborted-requests-recovery branch from c8ef92e to ea78d51 Compare October 25, 2024 08:16
Copy link
Member

@kevinpollet kevinpollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍

Copy link
Member

@rtribotte rtribotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@kevinpollet kevinpollet force-pushed the bugfix/aborted-requests-recovery branch from ffc7f17 to 3e3d1f4 Compare October 25, 2024 13:33
@kevinpollet kevinpollet force-pushed the bugfix/aborted-requests-recovery branch from 3e3d1f4 to 2fdd7a5 Compare October 25, 2024 13:35
@traefiker traefiker merged commit 2794849 into traefik:v2.11 Oct 25, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants