Skip to content

Fix channel close procedure when the peer dies or our handler goes down #9103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: maint
Choose a base branch
from

Conversation

yarisx
Copy link
Contributor

@yarisx yarisx commented Nov 22, 2024

Fix proposal for the issue #9102

Copy link
Contributor

github-actions bot commented Nov 22, 2024

CT Test Results

    2 files     29 suites   20m 11s ⏱️
  468 tests   464 ✅  4 💤 0 ❌
1 673 runs  1 649 ✅ 24 💤 0 ❌

Results for commit ca10880.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Nov 25, 2024
@u3s u3s self-assigned this Nov 26, 2024
@u3s u3s added the testing currently being tested, tag is used by OTP internal CI label Nov 27, 2024
@yarisx yarisx marked this pull request as draft November 29, 2024 15:50
@yarisx
Copy link
Contributor Author

yarisx commented Nov 29, 2024

There is possibility to get a crash if the channel handler goes down before receiving 'channel-open-confirmation' from the peer. I have an update to fix that but need some time to make a test.

@u3s u3s removed the testing currently being tested, tag is used by OTP internal CI label Dec 2, 2024
@yarisx yarisx marked this pull request as ready for review December 3, 2024 11:43
@u3s u3s added the waiting waiting for changes/input from author label Jan 8, 2025
@yarisx
Copy link
Contributor Author

yarisx commented Jan 13, 2025

What is expected from me? I don't see any comments, have I missed something?

@u3s
Copy link
Contributor

u3s commented Jan 13, 2025

You wrote:

I have an update to fix that but need some time to make a test.

I thought you want to add some testcase code ...

@yarisx
Copy link
Contributor Author

yarisx commented Jan 13, 2025

The test is updated by the latest commit.

@u3s u3s removed the waiting waiting for changes/input from author label Jan 13, 2025
@u3s u3s added the waiting waiting for changes/input from author label Feb 10, 2025
Account for the case when user channel handler goes down before the channel
opening procedure is completed: if channel open confirmation is received for
such channel - the channel is automatically closed.

Add a test for such scenario.
@yarisx yarisx force-pushed the ymaslenn/ssh_channel_close_proper branch from 557de4f to 704973a Compare February 18, 2025 10:18
If the peer fails to respond to ssh_msg_channel_close the corresponding channel
entry will be removed from cache after the timeout (assuming the connection is
still alive with probably other channels open).
@yarisx yarisx force-pushed the ymaslenn/ssh_channel_close_proper branch from 704973a to ca10880 Compare February 18, 2025 10:49
@IngelaAndin IngelaAndin removed the waiting waiting for changes/input from author label Feb 18, 2025
@u3s u3s linked an issue Apr 1, 2025 that may be closed by this pull request
@u3s u3s added the testing currently being tested, tag is used by OTP internal CI label Apr 1, 2025
Copy link
Contributor

@u3s u3s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR and digging into this problem!
Please check comments inline.

@@ -2052,6 +2090,11 @@ cond_set_idle_timer(D) ->
_ -> {{timeout,idle_time}, infinity, none}
end.

channel_close_timer(D, ChannelId) ->
%{{timeout, {channel_close, ChannelId}}, 3000, none}. %?GET_OPT(idle_time, (D#data.ssh_params)#ssh.opts), none}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove comment. I think it it is not needed ... ?

local_id=Id}, Acc) when U == ChannelPid ->
ssh_client_channel:cache_delete(Cache, Id),
Acc;
%% Here we first collect the list of channel id's handled by the process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls fix indentation for this function. typically emacs formatting is used in our repo.

]
),
{ok, Channel0} = ssh_connection:session_channel(ConnRef, 50000),
{ok, Channel1} = ssh_connection:session_channel(ConnRef, 50000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add _ so compilation warning is avoided.


%% connect to it with a regular Erlang SSH client:
ChannelCloseTimeout = 3000,
{ok, ConnRef} = std_connect(HostPort, Config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix indentation

Parent ! {self(), Result}
end),
try
TestResult = receive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix indentation. Try to avoid more than 100 chars per line.

do_ensure_channels(_ConnRef, NumExpected, NumExpected) ->
ok;
do_ensure_channels(ConnRef, NumExpected, _ChannelListLen) ->
receive after 100 -> ok end,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ct:sleep?

ssh:stop_daemon(Pid)
end.

ensure_channels(ConnRef, Expected) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow ensure_channels/2 and do_ensure_channels/3 functions.

  • why having 2 functions for this? maybe not needed in this simple case ...
  • what is the exit criteria if ChannelList does not reach expected number?

shouldn't we have some counter and do only certain number of iterations?

@@ -1943,6 +1945,134 @@ max_channels_option(Config) when is_list(Config) ->
ssh:close(ConnectionRef),
ssh:stop_daemon(Pid).

handler_down_before_open(Config) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this testcase work?
could be environment issue, but it was passed also when I reverted PR modification done in src folder.

i was expecting it to fail ...
If it fails for you? What is the error and associated stacktrace?

Comment on lines +2024 to +2025
ct:log("~p:~p open incomplete channel done - should not have happened",[?MODULE,?LINE]),
Parent ! {self(), {fail, "Unexpected channel success"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are those 2 lines a dead code? if so, maybe replace them with code comment?

@u3s u3s added the waiting waiting for changes/input from author label Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI waiting waiting for changes/input from author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SSH: incorrect channel cache update when closing the channel client-side
3 participants