100% cpu in mainthread due to not closing properly? (channel.connected == False) #418

Closed
djay opened this issue Sep 11, 2023 · 97 comments · Fixed by #435

Comments

@djay

djay commented Sep 11, 2023

Following on from debugging in this issue - collective/haufe.requestmonitoring#15

What we see is waitress switching to 100% CPU and staying there. It happens randomly in production (within a week), and we haven't traced it back to a particular request.

Using a sampling profiler on waitress with 2 threads (in prod) we identified the thread using the CPU as the mainthread (top -H) and this is the profile. Note that since this is prod there are other requests so not all activity is related to the looping causing this bug.

 Austin  TUI   Wall Time Profile                                                                                             CPU  99% ▇█▇█▇▇▇▇   MEM 263M ████████    5/5
   _________   Command /app/bin/python3.9 /app/parts/instance/bin/interpreter /app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py /app/parts/instance/etc/wsgi.ini
   ⎝__⎠ ⎝__⎠   Python 3.9.0     PID 3351466     PID:TID 3351466:11          
               Samples 1451365  ⏲️   3'16"       Threshold 0%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                                                                     
  00"    3'16"     0.0%  100.0%  └─ <module> (/app/parts/instance/bin/interpreter:326)                                                                                       ▒
  00"    3'16"     0.0%  100.0%     └─ <module> (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:255)                                                                    ▒
  00"    3'16"     0.0%  100.0%        └─ main (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:251)                                                                     ▒
  00"    3'16"     0.0%  100.0%           └─ run (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:217)                                                                   │
  00"    3'16"     0.0%  100.0%              └─ serve (/app/eggs/Zope-5.6-py3.9.egg/Zope2/Startup/serve.py:203)                                                              │
  00"    3'16"     0.0%  100.0%                 └─ serve (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:942)                       │
  00"    3'16"     0.0%  100.0%                    └─ serve_paste (/app/eggs/plone.recipe.zope2instance-6.11.0-py3.9.egg/plone/recipe/zope2instance/ctl.py:917)              │
  00"    3'16"     0.0%  100.0%                       └─ serve (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/__init__.py:19)                                                  │
  00"    3'16"     0.0%  100.0%                          └─ run (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:322)                                                  │
  05"    3'16"     2.5%   99.9%                             ├─ loop (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:245)                                           │
  36"     44"     18.3%   22.4%                             │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:158)                                        │
  05"     05"      2.4%    2.4%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/server.py:290)                                    │
  01"     02"      0.4%    0.9%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:117)                                    │
  01"     01"      0.4%    0.5%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  ├─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:95)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:514)                    │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:515)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  ├─ poll (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:150)                                  │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.0%    0.0%                             │  │  │     └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.1%                             │  │  ├─ write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:113)                                    │
  00"     00"      0.1%    0.1%                             │  │  │  ├─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:517)                    │
  00"     00"      0.0%    0.0%                             │  │  │  │  └─ handle_write (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:98)                          │
  00"     00"      0.0%    0.0%                             │  │  │  └─ handle_write_event (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/wasyncore.py:509)                    │
  00"     00"      0.1%    0.1%                             │  │  ├─ readable (/app/eggs/waitress-2.1.2-py3.9.egg/waitress/channel.py:154)   

From profiling, it looks like the channel is writable but channel.connected == False.
So it goes into a loop without writing or closing, since it never actually does anything to the socket.
https://github.com/Pylons/waitress/blob/main/src/waitress/channel.py#L98
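A toy model of the loop described above may help. This is a sketch, not waitress code: `StuckChannel`, `toy_loop` and the `pending_close` attribute are hypothetical names, and the "select" step is simulated by assuming every channel that asks for a write is reported ready.

```python
class StuckChannel:
    """Toy stand-in for a channel; not waitress code."""

    def __init__(self):
        self.connected = False     # as if getpeername() had failed
        self.pending_close = True  # something is still waiting to be flushed
        self.noop_writes = 0

    def writable(self):
        # The channel keeps asking to be polled for writing.
        return self.pending_close

    def handle_write(self):
        if not self.connected:
            # Early return: nothing is sent and nothing is closed.
            self.noop_writes += 1
            return
        self.pending_close = False


def toy_loop(channel_map, max_iterations):
    """Run a bounded poll loop and return how many iterations it used."""
    iterations = 0
    while channel_map and iterations < max_iterations:
        iterations += 1
        for fd, channel in list(channel_map.items()):
            # Assumption: select() reports every writable channel as ready,
            # as it would for a socket with free send-buffer space.
            if channel.writable():
                channel.handle_write()
            else:
                del channel_map[fd]
    return iterations


channel = StuckChannel()
used = toy_loop({7: channel}, max_iterations=1000)
print(used, channel.noop_writes)
```

The loop only stops because of the iteration cap; every pass does nothing useful, which matches the profile of a main thread pinned at 100% CPU.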

EDIT: My suspicion is that this was started by a client that (half) shut down very quickly after connecting, before the dispatcher had finished being set up. This causes getpeername to fail with EINVAL, leaving connected = False.

self.connected = False

            try:
                self.addr = sock.getpeername()
            except OSError as err:
                if err.args[0] in (ENOTCONN, EINVAL):
                    # To handle the case where we got an unconnected
                    # socket.
                    self.connected = False
                else:
                    # The socket is broken in some unknown way, alert
                    # the user and remove it from the map (to prevent
                    # polling of broken sockets).
                    self.del_channel(map)
                    raise
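The error-handling pattern in the snippet above can be tried in isolation. The sketch below is not the waitress dispatcher; `probe_peer` is a hypothetical helper, and it only demonstrates the portable ENOTCONN case (a socket that was never connected), not the EINVAL race suspected in this issue.

```python
import errno
import socket


def probe_peer(sock):
    """Return (connected, addr), following the pattern quoted above."""
    try:
        return True, sock.getpeername()
    except OSError as err:
        if err.errno in (errno.ENOTCONN, errno.EINVAL):
            # Unconnected, or already shut down: no peer address available.
            return False, None
        # Anything else is unexpected; let the caller see it.
        raise


unconnected = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    result = probe_peer(unconnected)
finally:
    unconnected.close()
print(result)
```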

Could be same issue as #411 but hard to tell.

One fix is in #419, but there could be better ways?

@djay
Author

djay commented Sep 11, 2023

@d-maurer

The error is where self.connected was set to False.
There, it should have been ensured that the corresponding "fileno"
is removed from socket_map and that it will not be put there again
(as long as self.connected remains False).

Something exceptional must have brought waitress into this state
(otherwise, we would have lots of 100 % CPU usage reports).
I assume that some bad client has used the system call shutdown
to close only part of the socket connection and that waitress does not
anticipate something like that.

Waitress does seem to properly close if shutdown is received (empty data).
see https://github.com/Pylons/waitress/blob/main/src/waitress/wasyncore.py#L449

So I have to keep looking for a way connected can be False while it is still trying to write.
Yes, it is most likely bad actors. We get hit by this a lot in our line of business.

@d-maurer

d-maurer commented Sep 11, 2023 via email

@d-maurer

d-maurer commented Sep 11, 2023 via email

@djay
Author

djay commented Sep 11, 2023

@d-maurer yes it could work to insert a self.del_channel(self) in

@djay
Author

djay commented Sep 11, 2023

@d-maurer I created a PR. It passes the current tests, and they hit that line, but it is hard to know how to write a test for this scenario...

#419

@d-maurer

d-maurer commented Sep 11, 2023 via email

@djay
Author

djay commented Sep 12, 2023

@mcdonc another solution, instead of #419, might be the one below. Is that preferable?

def poll(timeout=0.0, map=None):
    if map is None:  # pragma: no cover
        map = socket_map
    if map:
        r = []
        w = []
        e = []
        for fd, obj in list(map.items()):  # list() call FBO py3
            # prevent getting into a loop for sockets disconnected but not properly closed.
            if obj.check_client_disconnected():
                obj.del_channel()
                continue

Perhaps you have a better idea of how it could have got into this knot, and of the best way to test it?

@djay
Author

djay commented Sep 12, 2023

@mcdonc one code path that could perhaps lead to this is

# the user and remove it from the map (to prevent

Since connecting == False as well, there doesn't seem to be a way for it to write data out or close?

EDIT: one scenario could be that the client half-disconnected very quickly, before the dispatcher was set up, so getpeername fails, but somehow the socket can still be written to?

@djay
Author

djay commented Sep 12, 2023

It looks like it is possible for a connection that's been broken before getpeername to then not report any error in select, at least in the case where there is nothing to read (since a read would result in a close). https://stackoverflow.com/questions/13257047/python-select-does-not-catch-broken-socket. I'm not sure how it has something to write in that case. Maybe a shutdown for read only, done very quickly?

EDIT: https://man7.org/linux/man-pages/man3/getpeername.3p.html

"EINVAL The socket has been shut down." <- so looks like shutdown for read very quickly seems possible to create this tight loop.

@djay
Author

djay commented Sep 12, 2023

Or somehow the getpeername is invalid and that results in an OSError, and there is nothing to read but something to write. But I'm not sure whether that results in EINVAL or not.

@d-maurer

d-maurer commented Sep 12, 2023 via email

@d-maurer

d-maurer commented Sep 12, 2023 via email

@djay
Author

djay commented Sep 12, 2023

@d-maurer I'm fairly sure I have one solid explanation how this could occur.

Outlined in this test - https://github.com/Pylons/waitress/pull/419/files#diff-5938662f28fcbb376792258701d0b6c21ec8a1232dada6ad2ca0ea97d4043d96R775

NOTE: I haven't worked out a way to detect the looping in a test yet. So the assert at the end is not correct.

It is as you say. There is a shutdown of the read side only, but this is a race condition: it has to happen before the dispatcher is created, so right after the connect. I've confirmed this results in getpeername raising OSError EINVAL and thus connected = False, and select still thinks it can write, so the loop will be infinite. Or maybe until the bad actor properly closes the connection; I'm not sure about that one.
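One part of this explanation is easy to check on its own: select keeps reporting an accepted socket as writable even after the client has shut down one direction. The sketch below assumes a loopback TCP connection and a client that calls `shutdown(SHUT_RD)` right after connecting. It does not reproduce the getpeername EINVAL race itself, and `accepted_socket_is_writable` is a hypothetical helper name.

```python
import select
import socket


def accepted_socket_is_writable():
    """Accept a connection whose client already shut down its read side."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(listener.getsockname())
        # The client gives up on reading right after connecting.
        client.shutdown(socket.SHUT_RD)
        conn, _addr = listener.accept()
        try:
            # Ask only about writability, with a short timeout.
            _r, writable, _x = select.select([], [conn], [], 1.0)
            return conn in writable
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()


print(accepted_socket_is_writable())
```

A poll loop that registers such a socket for writing but never writes to it or closes it will be woken up again immediately on every pass.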

In the long run waitress should likely change its "connected" concept.
HTTP is based on TCP which implements bidirectional communication channels.
The shutdown system call allows applications to shut down individual
directions. This is not compatible with a boolean "connected",
instead we have 4 connection states: (fully) disconnected,
read connected, write connected and (fully) connected.

True. But if I'm right about the cause of this, the socket would never have connected=False with most shutdowns, only when the shutdown happens too quickly. That flag is mostly used to indicate not yet connected or in the process of closing.
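The four connection states listed in the quote above could be modelled with a flag enum instead of a boolean. This is only an illustration of the idea; `ConnState` and `after_shutdown` are hypothetical names and nothing like this exists in waitress as far as this thread shows.

```python
import enum


class ConnState(enum.Flag):
    DISCONNECTED = 0
    READ = 1       # we can still read from the peer
    WRITE = 2      # we can still write to the peer
    CONNECTED = 3  # READ | WRITE


def after_shutdown(state, direction):
    """Return the state left once `direction` has been shut down."""
    return ConnState(state.value & ~direction.value)


half_open = after_shutdown(ConnState.CONNECTED, ConnState.READ)
closed = after_shutdown(half_open, ConnState.WRITE)
print(half_open, closed)
```

With this model a half-shutdown leaves the channel in a distinct state (`WRITE` only here) rather than forcing a choice between connected and not connected.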

My favorite workaround (ensure "writable" returns "False" when "not connected")
is so simple that no test is necessary to convince us that
the busy loop is avoided.

Yes, that will also work. I'll switch it to that.
There is a system to remove inactive sockets, so I guess that would get them closed eventually.
I'm not really sure of the pros and cons of leaving sockets open versus just closing them in this case (I tried closing them; it also worked in terms of the tests).
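The workaround discussed here (have "writable" return False when not connected) can be sketched on a simplified stand-in. `ToyChannel` and `pending_bytes` are illustrative names, and the real waitress `writable` checks more conditions than this.

```python
class ToyChannel:
    """Simplified stand-in; attribute names are illustrative only."""

    def __init__(self, connected, pending_bytes):
        self.connected = connected
        self.pending_bytes = pending_bytes

    def writable_unguarded(self):
        # Asks for write polling whenever output is pending.
        return self.pending_bytes > 0

    def writable(self):
        # Guarded variant: a channel that is not connected never asks
        # the poll loop to watch it for writing.
        return self.connected and self.pending_bytes > 0


stuck = ToyChannel(connected=False, pending_bytes=10)
print(stuck.writable_unguarded(), stuck.writable())
```

The guard stops the busy loop but does not close the socket, so in this variant the cleanup would be left to whatever removes inactive channels.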

@djay
Author

djay commented Sep 12, 2023

@d-maurer I pushed new code that uses writable instead.

@d-maurer

d-maurer commented Sep 12, 2023 via email

@djay
Author

djay commented Sep 13, 2023

@d-maurer maybe a core contributor can step in and advise the best solution and test. @digitalresistor @kgaughan ?

@djay djay changed the title 100% cpu in mainthread due to not closing properly? 100% cpu in mainthread due to not closing properly? (channel.connected == False) Sep 13, 2023
@d-maurer

d-maurer commented Sep 13, 2023 via email

@djay
Author

djay commented Sep 13, 2023

@d-maurer that was my initial thought, but as I pointed out in #418 (comment), recv in wasyncore will call handle_close on getting empty data and take the channel out of the map, so I couldn't see any way for no bytes being sent to cause this loop.

    def recv(self, buffer_size):
        try:
            data = self.socket.recv(buffer_size)
            if not data:
                # a closed connection is indicated by signaling
                # a read condition, and having recv() return 0.
                self.handle_close()
                return b""
            else:
                return data
        except OSError as why:
            # winsock sometimes raises ENOTCONN
            if why.args[0] in _DISCONNECTED:
                self.handle_close()
                return b""
            else:
                raise
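The convention the recv code above relies on (an orderly close by the peer shows up as a successful read of zero bytes) can be seen with plain sockets. A minimal sketch over a loopback TCP connection; `read_after_peer_close` is a hypothetical helper.

```python
import socket


def read_after_peer_close():
    """Return what recv() yields after the client closes without sending."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(listener.getsockname())
        conn, _addr = listener.accept()
        try:
            client.close()          # peer hangs up without sending anything
            conn.settimeout(2.0)    # avoid blocking forever if no FIN arrives
            return conn.recv(1024)  # an orderly close reads as empty bytes
        finally:
            conn.close()
    finally:
        client.close()  # closing twice is harmless
        listener.close()


print(read_after_peer_close())
```

This is the path that leads to handle_close in the quoted code, which is why a plain disconnect with nothing to read does not by itself explain the loop.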

@djay
Author

djay commented Sep 13, 2023

Also, when I did some testing, it did seem like select would indicate a write was possible even without the back end producing any data. So no read is needed, just a connect and a very quick shutdown. But I do have to work out a proper test for that.

@d-maurer

d-maurer commented Sep 13, 2023 via email

@d-maurer

d-maurer commented Sep 13, 2023 via email

@d-maurer

d-maurer commented Sep 13, 2023 via email

@djay
Author

djay commented Sep 13, 2023

@d-maurer that would be a different bug in waitress.

My problem is that I run out of CPU on my servers if I don't restart them often, due to these weird requests we are receiving. That no one else in the world seems to get :(

@d-maurer

d-maurer commented Sep 13, 2023 via email

@d-maurer

d-maurer commented Sep 13, 2023 via email

@d-maurer

d-maurer commented Sep 13, 2023 via email

@d-maurer

"EINVAL The socket has been shut down." <- so looks like shutdown for read very quickly seems possible to create this tight loop.

Note that a socket has 2 ends. "The socket has been shut down" might refer to the local (not the remote) end.

@digitalresistor
Member

No, channel_request_lookahead is used to allow waitress to try and read more than the first request. Normally it would read a request, wait for a response to be created/processed, and only then read the next request, so that it wouldn't read a bunch of data/HTTP pipelining requests from the wire without being able to respond, for example if the connection was going to be closed early. This, however, meant there was no good way to let the app know that the client disconnected for long-running requests that wanted to check before they committed data to the database, for example.

That is why waitress has a waitress.client_disconnected callable in the environment that allows the app to check if the remote client has disconnected, but that only works if we continue to read() from the socket and thus get notified when the remote hangs up.
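A WSGI application might use that callable roughly as follows. This is a sketch under assumptions: the environ key is taken to be `waitress.client_disconnected` as named above, the check is only meaningful when channel_request_lookahead is enabled, and `COMMITTED` is a hypothetical stand-in for a database.

```python
COMMITTED = []  # hypothetical stand-in for a database


def app(environ, start_response):
    # The key is absent under other servers, so treat it as optional.
    check = environ.get("waitress.client_disconnected")
    if check is not None and check():
        # The client is gone: skip the commit, nobody will read the reply.
        start_response("500 Internal Server Error", [("Content-Length", "0")])
        return [b""]

    COMMITTED.append(environ.get("PATH_INFO", "/"))
    body = b"saved"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Checking just before the commit keeps the window small, but the client can still disconnect between the check and the response, so this reduces wasted work rather than eliminating it.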

@djay
Author

djay commented Sep 18, 2023

@d-maurer maybe, but it will result in the socket being closed, since it will allow it to read the RST. That's also how it works in the test. I'll try it in production.

But this is still a bug. I've put in two fixes.

    def handle_write(self):
        # Precondition: there's data in the out buffer to be sent, or
        # there's a pending will_close request

        if not self.connected and not self.close_when_flushed:
            # we dont want to close the channel twice
            return

    def __init__(self, server, sock, addr, adj, map=None):
        self.server = server
        self.adj = adj
        self.outbufs = [OverflowableBuffer(adj.outbuf_overflow)]
        self.creation_time = self.last_activity = time.time()
        self.sendbuf_len = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

        # requests_lock used to push/pop requests and modify the request that is
        # currently being created
        self.requests_lock = threading.Lock()
        # outbuf_lock used to access any outbuf (expected to use an RLock)
        self.outbuf_lock = threading.Condition()

        wasyncore.dispatcher.__init__(self, sock, map=map)
        if not self.connected:
            # Sometimes can be closed quickly and getpeername fails.
            self.handle_close()

        # Don't let wasyncore.dispatcher throttle self.addr on us.
        self.addr = addr
        self.requests = []

@djay
Author

djay commented Sep 20, 2023

@d-maurer I've also worked out why maintenance never closes the connection, and put in a fix for that too.
Unfortunately there doesn't seem to be a test for this close-called-twice case, so it is still not clear why the line that caused the loop is there in the first place, and whether it wouldn't be better just to get rid of it.

@djay
Author

djay commented Sep 20, 2023

Well this is at least the commit that put this in
9a7d2e5

@d-maurer

d-maurer commented Sep 20, 2023 via email

@d-maurer

d-maurer commented Sep 20, 2023 via email

@d-maurer

d-maurer commented Sep 20, 2023 via email

@djay
Author

djay commented Oct 20, 2023

The current PR has been running in production for a couple of weeks without the bug recurring, so that at least confirms the source of the issue.
But I will still rewrite the PR with the changes mentioned above.

@digitalresistor
Member

@djay I have not been able to reproduce the original error with the current released version of waitress. It is somewhat frustrating that I can't reproduce it.

@djay
Author

djay commented Feb 5, 2024

@digitalresistor I merged in master, reversed my fixes and uncommented the test case code that reproduces it and it showed that it went into a loop still. How did you try and reproduce it?

@digitalresistor
Member

@djay I have not been using the test code you created. I wanted to reproduce it without trying to set up the conditions to match exactly, or forcing a socket into a particular state. I have been testing with waitress running on a RHEL 8.9 system, and I have been unable to get getpeername() to fail in those conditions, even while trying to do the half-shutdown and stress testing waitress at the same time. I have not manually manipulated SO_LINGER to set up the conditions as specified; I am just using the defaults from the OS.

I also would expect to see far more instances of this error happening out in the wild, waitress is used extensively, and I have yet to see it happen on any of my applications that are running in production.

I don't disagree that there is an issue and we can solve it, but I find it interesting that it is not easily reproducible on test systems I have.

The current PR does the closing much later, I'd much rather see it get handled when getpeername() fails before we even attempt to do anything else with the socket, or start setting up a channel, if we can avoid doing extra work, let's avoid doing extra work.

Then we can rework the logic for writeable as well to make sure that we can't busy loop there.

@d-maurer

d-maurer commented Feb 5, 2024 via email

@djay
Author

djay commented Feb 5, 2024

@digitalresistor The circumstances are very particular. It was something we were getting from pentests only, I believe.
An invalid request is sent and immediately closed by the client before the channel can be created. It would not be easy to reproduce, as you would need the invalid request and a server loaded enough that it is not too fast at creating the channel.

The current PR does the closing much later, I'd much rather see it get handled when getpeername() fails before we even attempt to do anything else with the socket, or start setting up a channel, if we can avoid doing extra work, let's avoid doing extra work.

This part ensures it's closed if getpeername fails:

https://github.com/Pylons/waitress/pull/419/files#diff-5cb215c6142fa7daad673c77fcb7f3bc0a0630e18e3487a5ac0894e90f1b2071R71

And I added some other tests to show you how the other fix prevents any looping if for any other reasons channel.connected would become false and we have an error in the request. Even though that should not happen.

@digitalresistor
Member

@djay I will take another look at this over the next week or so. I appreciate the follow-up!

@djay
Author

djay commented Feb 13, 2024

@digitalresistor I wouldn't think about it too long. Basically, I've given a recipe to DDoS any waitress site: lock up one thread with a single request. This would happen every Sunday with this repeated pentest. Once I deployed the fix, it never happened again. Leaving a bug like this in doesn't seem right to me.

digitalresistor added a commit that referenced this issue Mar 3, 2024
No longer call getpeername() on the remote socket either, as it is not
necessary for any of the places where waitress requires that self.addr
in a subclass of the dispatcher needs it.

This removes a race condition when setting up a HTTPChannel where we
accepted the socket, and know the remote address, yet call getpeername()
again which would have the unintended side effect of potentially setting
self.connected to False because the remote has already shut down part of
the socket.

This issue was uncovered in #418, where the server would go into a hard
loop because self.connected was used in various parts of the code base.
@digitalresistor
Member

I've created a new PR that removes getpeername() entirely. It is not useful in the context of waitress, and cleaned up some code related to self.connected, since calling self.handle_close() multiple times is not an issue.

Could you drop this into your system instead and take a look if you still see the same issue: #435

@djay
Author

djay commented Mar 4, 2024

@digitalresistor I can.
But are you sure that there is no other scenario where self.connected is False, so that it goes into a loop and can't ever get closed down?
i.e. this line is still problematic, I think. https://github.com/Pylons/waitress/pull/419/files#diff-5cb215c6142fa7daad673c77fcb7f3bc0a0630e18e3487a5ac0894e90f1b2071L95

@djay
Author

djay commented Mar 4, 2024

@digitalresistor sorry, I see you removed this as well. OK, I will test this.

@digitalresistor
Member

@djay do you have any feedback with the newly proposed changes?

@djay
Author

djay commented Mar 22, 2024

@digitalresistor sorry for the delay on this. It was deployed earlier this week, but the pentest only happens once a week, on a Sunday, so we should know on Monday.

@djay
Author

djay commented Apr 3, 2024

@digitalresistor seems to be running fine in prod with what we presume are the same pentests being thrown at it

nicopicchio pushed a commit to uktrade/jml that referenced this issue Oct 29, 2024
Bumps [waitress](https://github.com/Pylons/waitress) from 2.1.2 to
3.0.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/Pylons/waitress/releases">waitress's
releases</a>.</em></p>
<blockquote>
<h2>v3.0.0</h2>
<h1>3.0.0 (2024-02-04)</h1>
<ul>
<li>
<p>Rename &quot;master&quot; git branch to &quot;main&quot;</p>
</li>
<li>
<p>Fix a bug that would appear on macOS whereby if we accept() a socket
that is
already gone, setting socket options would fail and take down the
server. See
<a
href="https://github.com/Pylons/waitress/pull/399">Pylons/waitress#399</a></p>
</li>
<li>
<p>Fixed testing of vendored asyncore code to not rely on particular
naming for
errno's. See <a
href="https://github.com/Pylons/waitress/pull/397">Pylons/waitress#397</a></p>
</li>
<li>
<p>HTTP Request methods and versions are now validated to meet the HTTP
standards thereby dropping invalid requests on the floor. See
<a
href="https://github.com/Pylons/waitress/pull/423">Pylons/waitress#423</a></p>
</li>
<li>
<p>No longer close the connection when sending a HEAD request response.
See
<a
href="https://github.com/Pylons/waitress/pull/428">Pylons/waitress#428</a></p>
</li>
<li>
<p>Always attempt to send the Connection: close response header when we
are
going to close the connection to let the remote know in more instances.
<a
href="https://github.com/Pylons/waitress/pull/429">Pylons/waitress#429</a></p>
</li>
<li>
<p>Python 3.7 is no longer supported. Add support for Python 3.11, 3.12
and
PyPy 3.9, 3.10. See <a
href="https://github.com/Pylons/waitress/pull/412">Pylons/waitress#412</a></p>
</li>
<li>
<p>Document that trusted_proxy may be set to a wildcard value to trust
all
proxies. See <a
href="https://github.com/Pylons/waitress/pull/431">Pylons/waitress#431</a></p>
</li>
</ul>
<h2>Updated Defaults</h2>
<ul>
<li>clear_untrusted_proxy_headers is set to True by default. See
<a
href="https://github.com/Pylons/waitress/pull/370">Pylons/waitress#370</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/Pylons/waitress/blob/main/CHANGES.txt">waitress's
changelog</a>.</em></p>
<blockquote>
<h2>3.0.1 (2024-11-28)</h2>
<p>Security</p>
<pre><code>
- Fix a bug that would lead to Waitress busy looping on select() on a
half-open
socket due to a race condition that existed when creating a new
HTTPChannel.
  See Pylons/waitress#435,
  Pylons/waitress#418 and

GHSA-3f84-rpwh-47g6
<p>With thanks to Dylan Jay and Dieter Maurer for their extensive
debugging and<br />
helping track this down.</p>
<ul>
<li>
<p>No longer strip the header values before passing them to the WSGI
environ.<br />
See <a
href="https://github.com/Pylons/waitress/pull/434">Pylons/waitress#434</a>
and<br />
<a
href="https://github.com/Pylons/waitress/issues/432">Pylons/waitress#432</a></p>
</li>
<li>
<p>Fix a race condition in Waitress when
<code>channel_request_lookahead</code> is enabled<br />
that could lead to HTTP request smuggling.</p>
<p>See <a
href="https://github.com/Pylons/waitress/security/advisories/GHSA-9298-4cf8-g4wj">https://github.com/Pylons/waitress/security/advisories/GHSA-9298-4cf8-g4wj</a></p>
</li>
</ul>
<h2>3.0.0 (2024-02-04)</h2>
<ul>
<li>
<p>Rename &quot;master&quot; git branch to &quot;main&quot;</p>
</li>
<li>
<p>Fix a bug that would appear on macOS whereby if we accept() a socket
that is<br />
already gone, setting socket options would fail and take down the
server. See<br />
<a
href="https://github.com/Pylons/waitress/pull/399">Pylons/waitress#399</a></p>
</li>
<li>
<p>Fixed testing of vendored asyncore code to not rely on particular
naming for<br />
errno's. See <a
href="https://github.com/Pylons/waitress/pull/397">Pylons/waitress#397</a></p>
</li>
<li>
<p>HTTP Request methods and versions are now validated to meet the
HTTP<br />
standards thereby dropping invalid requests on the floor. See<br />
<a
href="https://github.com/Pylons/waitress/pull/423">Pylons/waitress#423</a></p>
</li>
<li>
<p>No longer close the connection when sending a HEAD request response.
See<br />
<a
href="https://github.com/Pylons/waitress/pull/428">Pylons/waitress#428</a></p>
</li>
<li>
<p>Always attempt to send the Connection: close response header when we
are<br />
going to close the connection to let the remote know in more
instances.<br />
<a
href="https://github.com/Pylons/waitress/pull/429">Pylons/waitress#429</a></p>
</li>
<li>
<p>Python 3.7 is no longer supported. Add support for Python 3.11, 3.12
and<br />
PyPy 3.9, 3.10. See <a
href="https://github.com/Pylons/waitress/pull/412">Pylons/waitress#412</a></p>
</li>
</ul>
<p>&lt;/tr&gt;&lt;/table&gt;<br />
</code></pre></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/Pylons/waitress/commit/ae949bb428e50cf04152db56460f31c1e6d3a2a9"><code>ae949bb</code></a>
Ready for 3.0.1</li>
<li><a
href="https://github.com/Pylons/waitress/commit/e4359018537af376cf24bd13616d861e2fb76f65"><code>e435901</code></a>
Merge commit from fork</li>
<li><a
href="https://github.com/Pylons/waitress/commit/810a435f9e9e293bd3446a5ce2df86f59c4e7b1b"><code>810a435</code></a>
Add documentation for channel_request_lookahead</li>
<li><a
href="https://github.com/Pylons/waitress/commit/f4ba1c260cf17156b582c6252496213ddc96b591"><code>f4ba1c2</code></a>
Fix a race condition on recv_bytes boundary when request is invalid</li>
<li><a
href="https://github.com/Pylons/waitress/commit/7e7f11e61d358ab1cb853fcadf2b46b1f00f5993"><code>7e7f11e</code></a>
Add a new test to validate the lookahead race condition</li>
<li><a
href="https://github.com/Pylons/waitress/commit/6943dcf556610ece2ff3cddb39e59a05ef110661"><code>6943dcf</code></a>
Make DummySock() look more like an actual socket</li>
<li><a
href="https://github.com/Pylons/waitress/commit/fdd2ecfd325af2f419d91c62b2551e2c3922f686"><code>fdd2ecf</code></a>
Merge pull request <a
href="https://github.com/Pylons/waitress/issues/445">#445</a>
from Pylons/feature/support-py-3-13</li>
<li><a
href="https://github.com/Pylons/waitress/commit/dcd18e7b4b8e78e2abea8f286c23b0b9298bea9b"><code>dcd18e7</code></a>
Update exclude matrix</li>
<li><a
href="https://github.com/Pylons/waitress/commit/4633ea6d69d6b7eff5db91e263ea85f437026db0"><code>4633ea6</code></a>
Drop Python 3.8 and add Python 3.13</li>
<li><a
href="https://github.com/Pylons/waitress/commit/4584936eac5838b6d3b07e84a86874fa586ffe6e"><code>4584936</code></a>
Merge pull request <a
href="https://github.com/Pylons/waitress/issues/440">#440</a>
from Pylons/fix/ci</li>
<li>Additional commits viewable in <a
href="https://github.com/Pylons/waitress/compare/v2.1.2...v3.0.1">compare
view</a></li>
</ul>
</details>
<br />


@merwok

merwok commented Nov 22, 2024

Could this be backported to the 2.x line?
Updating to v3 causes infinite redirect loop on heroku which I can’t debug.

@mmerickel
Member

The only backwards-incompatible change in waitress was changing clear_untrusted_proxy_headers=true as the default. If you haven't set up trusted proxy configs, you might want to just set this back to false, and I suspect that would fix your heroku issue.
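The two options mentioned above could look like this when serving programmatically. This is a hedged sketch: `proxy_settings` and `run` are hypothetical helpers, the wildcard `trusted_proxy` value and the header list are assumptions about a Heroku-style deployment, and the listen address is a placeholder.

```python
def proxy_settings(behind_trusted_proxy):
    """Build keyword arguments intended for waitress.serve()."""
    if behind_trusted_proxy:
        return {
            # Assumption: proxy addresses are not known in advance, so the
            # wildcard mentioned in the 3.0.0 release notes is used.
            "trusted_proxy": "*",
            "trusted_proxy_headers": "x-forwarded-for x-forwarded-proto",
            "clear_untrusted_proxy_headers": True,
        }
    # Restore the pre-3.0 default instead of configuring a trusted proxy.
    return {"clear_untrusted_proxy_headers": False}


def run(app, behind_trusted_proxy=False):
    # Imported lazily so the settings helper works without waitress installed.
    from waitress import serve

    serve(app, listen="127.0.0.1:8080", **proxy_settings(behind_trusted_proxy))
```

Configuring the trusted proxy is the stricter choice, since it lets waitress use the forwarded scheme and address while still clearing headers it was not told to trust.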

@merwok

merwok commented Nov 22, 2024

Ah, thanks for the useful pointer! It seems that that’s an easy thing to configure properly. Then I can go back to fighting over redis ssl connection with self-signed certificate 🥲👍🏽

remdub added a commit to IMIO/buildout.library that referenced this issue Dec 17, 2024
5 participants