Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep-alive connections keep threads occupied infinitely #368

Closed
andor44 opened this issue Mar 11, 2015 · 25 comments
Closed

Keep-alive connections keep threads occupied infinitely #368

andor44 opened this issue Mar 11, 2015 · 25 comments
Assignees
Labels
A-server Area: server. C-bug Category: bug. Something is wrong. This is bad!

Comments

@andor44
Copy link

andor44 commented Mar 11, 2015

As discussed in rustless/rustless#35, suggested by @reem, Hyper seems to be too "literal" about keep-alive, keeping threads in a keep-alive loop, essentially DDoSing a Hyper server. My temporary workaround is to use many threads to delay running out of threads for the time being.

@s-panferov
Copy link
Contributor

I think that Hyper must have something like KeepAliveTimeout in Apache.

@s-panferov
Copy link
Contributor

@andor44 I think that you may try to use nginx as a reverse proxy in front of Rustless/Huper to work-around the issue. Nginx could handle a big amount of connections without any issues and it has usable keep-alive settings. I think nginx is required for blocking servers anyway.

@andor44
Copy link
Author

andor44 commented Mar 11, 2015

@s-panferov I'll give that a shot.

Here's the relevant part of the code. I'm not sure how deep down the rabbit hole this should be handled though. Probably as deep as this though I'm not sure how easily you can specify a timeout in the current Hyper architecture.

@seanmonstar
Copy link
Member

The current IO doesn't provide a way to specify a timeout, which makes this harder.

@s-panferov
Copy link
Contributor

@andor44 have you tried to use a reverse proxy already? Has it helped?

@andor44
Copy link
Author

andor44 commented Mar 15, 2015

Yes. I am using nginx, and it uses HTTP 1.0 for reverse proxying requests, which doesn't have keep-alive as part of the spec, and it works perfectly.

@seanmonstar seanmonstar added C-bug Category: bug. Something is wrong. This is bad! A-server Area: server. labels Mar 17, 2015
@dovahcrow
Copy link

When will this bug be fixed? timeout should be an urgent feature.

@seanmonstar
Copy link
Member

Timeouts won't be supported by TcpStreams until Rust 1.1, which would be
mid June. To support timeouts in hyper earlier, we'd need to explore using
an additional thread, but it's performance may suffer too much.

In the mean time, I suggest putting nginx in front of your server, which
handles this well.

On Sat, Apr 18, 2015, 10:24 AM Young Woo [email protected] wrote:

When will this bug be fixed? timeout should be an urgent feature.


Reply to this email directly or view it on GitHub
#368 (comment).

@dovahcrow
Copy link

All right, thanks. But what about the client? Are there any good ways to
add timeout to a HttpClient?

On 2015年4月19日周日 上午2:15 Sean McArthur [email protected] wrote:

Timeouts won't be supported by TcpStreams until Rust 1.1, which would be
mid June. To support timeouts in hyper earlier, we'd need to explore using
an additional thread, but it's performance may suffer too much.

In the mean time, I suggest putting nginx in front of your server, which
handles this well.

On Sat, Apr 18, 2015, 10:24 AM Young Woo [email protected] wrote:

When will this bug be fixed? timeout should be an urgent feature.


Reply to this email directly or view it on GitHub
#368 (comment).


Reply to this email directly or view it on GitHub
#368 (comment).

@miguelmartin75
Copy link

This error is still apparent in v0.6. My server hangs sometimes and it is quite frequent, I know it's not my code because not even the Handler's handle is called for when I send a HTTP request; it can take up to an order of minutes or not at all for the handle function to be called. It seems to not be an issue when hosting & requesting locally.

I've switched to rust-http and my code works fine. This really needs to get fixed IMHO as this library does have a nice API.

@juanibiapina
Copy link

Is this issue the place to follow for updates on this bug?

juanibiapina added a commit to juanibiapina/zas that referenced this issue Aug 27, 2015
@mikedilger
Copy link
Contributor

FYI, for those who need a workaround for this issue while we wait for async:

  1. Keep a count of how many worker threads are busy or waiting in the keep-alive loop. The handler trait now has hooks (on_connection_start and on_connection_end) allowing you to increment/decrement your count of busy threads. You'll need to use a Mutex or AtomicUsize or other sync primitive where you update the count.
  2. Have your handler function check the count, and if the count is high enough (e.g. maybe all but 4 threads are occupied), set the Connection::close header (which hyper will notice and respect, closing the connection after the response).
  3. I still recommend using the 'timeouts' feature and setting read and write timeouts on your server; but now those timeouts don't need to be super-short anymore.

This way you can still support keep-alive most of the time, but when things get really busy, your server won't "hang". I've tested this and it works well.

@seanmonstar
Copy link
Member

I'm thinking of turning keep-alive off by default for hyper 0.6 servers, which is probably the right thing to do, and let people who have a solution turn it back on. However, that's a logical breaking change, even though code will continue to compile...

Thinking of adding something like this to keep the current behavior:

Server::http(addr).unwrap().keep_alive().listen(handler)

@juanibiapina
Copy link

No solution for the problem itself?

@seanmonstar
Copy link
Member

@juanibiapina the best solution is asyncio, tracked in #395. With synchronous IO, your only options are 1) set a timeout, which can be done currently, but still means the connection could block to the full length of the timeout, or 2) don't try to read on kept-alive connections.

@Ogeon
Copy link
Contributor

Ogeon commented Oct 8, 2015

It feels a bit like sweeping it under the rug, but it's about as good as any other workaround. It may become a gotcha in its own way, though. Probably less obvious, but still...

@reem
Copy link
Contributor

reem commented Oct 8, 2015

In iron at least I can attempt to implement the AtomicUsize-tracked-by-connecton_{start, end} solution, which may eliminate a lot of these issues.

@seanmonstar
Copy link
Member

I'm going to try the way Apache does it: if you want keep-alive, a timeout will be used when trying to read on the kept-alive connection. This is really the only thing that is possible on blocking sockets.

server.keep_alive(Duration::from_secs(5));
server.listen(...)

Of course, this will require activating the timeouts feature in hyper, and requires rustc v1.4 or greater.

@mikedilger
Copy link
Contributor

I concur. I think the number of people who find hyper "locks up" is sufficiently high to justify disabling keep-alive by default for 0.6. It could be considered a DoS security vulnerability as it currently stands. The performance benefit of keep-alive can't really be achieved under blocking I/O anyhow without an enormous number of threads, or being under very low load.

@juanibiapina
Copy link

You could make a new release without the keep-alive, like you suggested, since this is still 0.x. Then you could implement the timeout when 1.4 comes out.

As a user of the lib, I don't mind because I would be actively upgrading a not 1.0 dependency.

@juanibiapina
Copy link

@mikedilger you beat me to it xD

@Ogeon
Copy link
Contributor

Ogeon commented Oct 8, 2015

Doing the connection counting trick will at least let users choose a balance, but tune it the wrong way and your thread pool may be filled up with idle favicon connections and your server basically reverts to single thread mode. I implemented it in Rusful and it prevents lock-ups, but not much more. Disabling keep-alive may even be a better alternative, after all, since it will treat every connection equally. It's at least better than lock-ups.

@mikedilger
Copy link
Contributor

A co-worker of mine wrote a tool to benchmark hyper-based servers, and which is annoying in just the right ways to inspect keep-alive related behaviour: https://github.com/alanstockwell/hyperspank (shout out to @alanstockwell)

seanmonstar added a commit that referenced this issue Oct 8, 2015
Server keep-alive is now **off** by default. In order to turn it on, the
`keep_alive` method must be called on the `Server` object.

Closes #368
@seanmonstar seanmonstar self-assigned this Oct 8, 2015
@seanmonstar
Copy link
Member

Proposed fix is in #661

seanmonstar added a commit that referenced this issue Oct 9, 2015
Server keep-alive is now **off** by default. In order to turn it on, the
`keep_alive` method must be called on the `Server` object.

Closes #368
@seanmonstar
Copy link
Member

On IRC, @reem pointed out to me that it's possible to do a little more in this area. With the timeout solution, a scenario is still possible that all the threads are used to each deal with a connection that is using keep-alive, and even though they aren't blocked by a stalled connection, while they are active no other connections can be handled.

I'm not sure if this scenario should still be considered part of this bug, or more of a general improvement. Also, once non-blocking io is merged, this issue goes away automatically.

A solution here would be to have an acceptor thread, and a work queue. The acceptor would push all TcpStreams into the queue, and the workers would process 1 request/response pair. If keep-alive is true, that Stream could be placed at the back of the work queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-server Area: server. C-bug Category: bug. Something is wrong. This is bad!
Projects
None yet
Development

No branches or pull requests

9 participants