Handle epipe errors #871
N.B. I got a bit lost in the weeds trying to create a reasonable test case for this: the container-management setup in your test suite can't serialize testcontainer.Container, so env.Container is nil by the time you're on the far side of GetTestEnvironment(). Even if you work around that by directly instantiating a new env in the test so that you can call container.Stop(), I couldn't think of a reliable way to assert that we had caught the EPIPE and closed the connection -- but if you have any ideas I'm all ears!
#844 is most likely related!
@jkaflik Could I trouble you for a look at this?
@n-oden I will have a look if we can make it reproducible in tests.
Reproducing it is conceptually simple: create a client that continuously writes to a ClickHouse database, restart the ClickHouse server while the client is writing, and observe the endless flood of broken pipe errors in the logs. In practice, with the integration test framework you have here, I'm not sure how we'd implement that while keeping test times reasonable and results deterministic: you'd want to stop and start the container while writes were in flight, and I'm not sure how you'd do that with any precision. I'll keep poking at it but am very open to suggestions!
@n-oden I looked briefly at the issue. Can you please let me know the version you use? Also, I assume you are using the
I agree we should mark the connection as closed. I will look at how we can achieve the same result for
@jkaflik we're using v2.5.0, having recently updated from v1. And correct, we're using the database/sql interface.
@n-oden while I agree the connection should be marked as closed and the SQL driver on top should receive
having CH server restart just before writing to the socket ( It has the same behavior no matter if we introduce your change or not. My use case is:
I think there is some difference between how you do that and how you are able to reproduce it.
@jkaflik I've been having the exact same issue trying to reproduce the error in the test harness here, which is frustrating, but we have definitely seen the described behavior on our production systems. Happy to do a screenshare or post a recording somewhere if that would help!
@n-oden anything that brings me closer to this issue is welcome. Can you also let me know which library functions you call, and when?
drop me a line -- [email protected]
Presently, if the TCP session to the server closes, we will emit broken pipe errors until either the connection comes back to "idle" state or until we reach ConnMaxLifetime. Since EPIPE means that the TCP session is dead (i.e. we received a RST packet from the server), there is no point in attempting to proceed past that point: set the connection as closed.
@jkaflik updated as we discussed!
Presently, if the TCP session to the server closes remotely (e.g. if the clickhouse-server process restarts), we will emit broken pipe errors, potentially until we reach ConnMaxLifetime.
Since EPIPE means that the TCP session is dead (i.e. we received a RST packet from the server), there is no point in attempting to proceed past that point: set the connection as closed.