Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@tomaka
Copy link
Contributor

@tomaka tomaka commented Nov 30, 2020

I see this situation arising in the logs:

  • The behaviour sends Open to the handler.
  • The handler answers with OpenResultErr.
  • The behaviour receives the OpenResultErr and immediately tries again by sending Open again.
  • However, since the handler has transitioned to the Closed state after returning OpenResultErr, its keep-alive timeout kicks in and shuts down the connection.

While it is questionable to immediately try opening the same connection immediately again it has failed to open, it is clearly not intended that the keep-alive timeout triggers in the millisecond or so during which the handler is very temporarily in the Closed state.

This PR modifies the tolerance level in order to wait for 5 seconds after entering the Closed state, instead of just the first time after opening a connection.

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 30, 2020
@tomaka tomaka requested review from mxinden and romanb November 30, 2020 16:50
Copy link
Contributor

@romanb romanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is mainly an optimisation to avoid closing and re-opening connections or does the right "timing" in this sense actually cause a problem? The latter would be unfortunate, as no choice for the timeout could then guarantee that the problem does not arise, if I understand correctly.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 30, 2020

Yes, this is intended as an optimization.

There is indeed a problem happening, which is why an OpenResultErr is received. But this problem is caused by #7638 on the other side of the connection, and not by the connection being closed.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 30, 2020

bot merge

@ghost
Copy link

ghost commented Nov 30, 2020

Waiting for commit status.

@ghost
Copy link

ghost commented Nov 30, 2020

Checks failed; merge aborted.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 30, 2020

reconnect_after_disconnect stalled on CI. The test passes locally for me. Restarted CI to see whether it's a heisenbug.

@tomaka
Copy link
Contributor Author

tomaka commented Dec 1, 2020

Of course the test passes on CI if I add logging.

@tomaka
Copy link
Contributor Author

tomaka commented Dec 1, 2020

I just ran the test 100 times successfully in a row on my machine.

@gnunicorn gnunicorn added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Mar 29, 2021
@gnunicorn
Copy link
Contributor

is this still being worked on?

@tomaka
Copy link
Contributor Author

tomaka commented Mar 29, 2021

No, I'm completely clueless as to why CI fails.
All tests pass smoothly locally. CI becomes green whenever I add any sort of logging.

@gnunicorn
Copy link
Contributor

closing for a lack of progress.

@gnunicorn gnunicorn closed this May 19, 2021
@tomaka tomaka deleted the keep-alive branch May 19, 2021 14:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review. A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants