Show session and link details for AMQP 1.0 connection #12670

Merged (2 commits) on Nov 7, 2024

@ansd ansd commented Nov 6, 2024

What?

On the connection page in the Management UI, display detailed session and link information including:

  • Link names
  • Link target and source addresses
  • Link flow control state
  • Session flow control state
  • Number of unconfirmed and unacknowledged messages
[Screenshot: sessions]

The next screenshot shows quiver running at full speed with two connections against a single node. On the incoming link, 255 messages are awaiting confirms from the target classic queue, and RabbitMQ prevents the publisher from sending more messages by granting 0 link credit. (In AMQP 0.9.1, the connection would have been in flow state instead.)

[Screenshot: quiver]
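
The behaviour described above follows from AMQP 1.0 link flow control: a sender may only transfer a message while it holds link credit, and the receiver decides when to grant more. The following is a toy model only, not RabbitMQ's implementation; the class `IncomingLink` and its fields are hypothetical, and the value 255 merely mirrors the number mentioned above.

```python
class IncomingLink:
    """Toy model of the receiving end of an AMQP 1.0 link (illustrative names)."""

    def __init__(self) -> None:
        self.credit = 0        # link credit currently granted to the sender
        self.unconfirmed = 0   # messages received but not yet confirmed by the queue

    def grant(self, credit: int) -> None:
        """Receiver grants the sender a new amount of link credit."""
        self.credit = credit

    def transfer(self) -> bool:
        """Sender attempts to send one message; refused when credit is 0."""
        if self.credit == 0:
            return False
        self.credit -= 1
        self.unconfirmed += 1
        return True

    def confirm(self, count: int) -> None:
        """The target queue confirms up to `count` outstanding messages."""
        self.unconfirmed -= min(count, self.unconfirmed)


link = IncomingLink()
link.grant(255)
# The sender tries to push 300 messages, but only 255 fit into the granted credit.
sent = sum(link.transfer() for _ in range(300))
print(sent, link.credit, link.unconfirmed)  # prints: 255 0 255
```

Until the receiver calls `grant` again, every further `transfer` is refused, which is the state the incoming link is in when it shows 0 link credit.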

How?

A new HTTP API endpoint is added:

/connections/:connection_name/sessions
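
As a rough illustration of how a client might address this endpoint: connection names contain spaces, colons and `>` (for example `127.0.0.1:61086 -> 127.0.0.1:5672`), so the name has to be percent-encoded as a single path segment. The sketch below only builds the URL. The `/api` prefix, the base URL and the helper name `sessions_url` are assumptions made for illustration and are not taken from this PR.

```python
from urllib.parse import quote


def sessions_url(base_url: str, connection_name: str) -> str:
    """Build the URL of the sessions endpoint for one connection.

    Assumption: the management HTTP API is served under `/api`.
    """
    # safe="" makes quote() also encode "/" so the name stays one path segment.
    encoded = quote(connection_name, safe="")
    return f"{base_url.rstrip('/')}/api/connections/{encoded}/sessions"


url = sessions_url("http://localhost:15672", "127.0.0.1:61086 -> 127.0.0.1:5672")
print(url)
```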

The HTTP handler first queries the Erlang connection process to find out about all session Pids. The handler then queries each Erlang session process of this connection.

(By default, the table auto-refreshes every 5 seconds. For a single connection with 60 idle sessions of 250 links each, the handler takes ~100 ms.)
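
The two-step lookup performed by the handler can be sketched as follows. This is a simplified model in Python, not the Erlang handler itself: "pids" are plain strings, the two callables stand in for calls to the connection and session processes, and the field names `channel` and `links` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical shape of what one session process reports.
SessionInfo = Dict[str, object]


def collect_sessions(
    connection_pid: str,
    list_session_pids: Callable[[str], List[str]],
    session_info: Callable[[str], SessionInfo],
) -> List[SessionInfo]:
    """Ask the connection for its session pids, then query each session."""
    session_pids = list_session_pids(connection_pid)    # step 1: one call
    return [session_info(pid) for pid in session_pids]  # step 2: one call per session


# Toy data: one connection with two sessions.
connections = {"conn-1": ["sess-1", "sess-2"]}
sessions = {
    "sess-1": {"channel": 0, "links": ["sender-a"]},
    "sess-2": {"channel": 1, "links": ["receiver-b", "receiver-c"]},
}

result = collect_sessions("conn-1", connections.__getitem__, sessions.__getitem__)
print(result)
```

In this model the number of calls grows with the number of sessions on the connection, which is why the timing above is quoted for a specific number of sessions and links.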

For better user experience in the Management UI, this commit also makes the session process store and expose link names as well as source/target addresses.

Mixed version notes

Note that, due to the bugs described in the 2nd commit message of this PR and present on <= v4.0.3, opening the connection page of an AMQP 1.0 connection in a mixed-version cluster on a node >= v4.1.0 while other nodes run <= v4.0.3 results in a 500 error and the following crash logged by the RabbitMQ node:

2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>   crasher:
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>     initial call: cowboy_stream_h:request_process/3
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>     pid: <0.2542.0>
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>     registered_name: []
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>     exception error: no case clause matching
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                      {badrpc,
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                          {'EXIT',
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                              {undef,
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                                  [{rabbit_connection_tracking,lookup,
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                                       [<<"127.0.0.1:61086 -> 127.0.0.1:5672">>,
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                                        ['rabbit-1@ABCDDDEEAA']],
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>                                       []}]}}}
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in function  rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
2024-11-07 11:34:39.163797+01:00 [error] <0.2542.0>       in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)

This seems acceptable. Introducing a feature flag for this rare and harmless case seems like overkill.

Also note that we partially backported this PR to v4.0.x in #12700.
This means that the Management UI will only return error code 500, with RabbitMQ logging:

2024-11-11 13:48:02.148463+01:00 [error] <0.1327.0> internal_server_error
2024-11-11 13:48:02.148463+01:00 [error] <0.1327.0> failed to get sessions for connection <33646.2171.0>: throw {bad_argument,
2024-11-11 13:48:02.148463+01:00 [error] <0.1327.0>                                                              session_pids}

if the AMQP 1.0 connection is opened against a >= 4.0.4 (but < 4.1) node while the connection page is opened in the Management UI on a >= 4.1 node.
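
The mixed-version cases can be summarised as a small decision function. This is one reading of the notes, not code from the PR: it assumes the first note applies when the connection lives on a node running v4.0.3 or older, that the page works when both nodes run 4.1 or newer, and it makes no claim about UI nodes older than 4.1. The function name and the returned strings are illustrative.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def connection_page_outcome(ui_node: Version, connection_node: Version) -> str:
    """Expected result of opening an AMQP 1.0 connection page.

    ui_node: version of the node serving the Management UI request.
    connection_node: version of the node the AMQP 1.0 connection is opened against.
    """
    if ui_node < (4, 1, 0):
        return "not covered by these notes"
    if connection_node >= (4, 1, 0):
        return "ok"  # assumption: no mixed-version issue when both are >= 4.1
    if connection_node >= (4, 0, 4):
        # Partial backport present: handled error, short log entry.
        return "500, logs bad_argument session_pids"
    # No backport: the remote lookup function is undefined on the old node.
    return "500, logs crasher with undef rabbit_connection_tracking:lookup/2"


print(connection_page_outcome((4, 1, 0), (4, 0, 3)))
print(connection_page_outcome((4, 1, 0), (4, 0, 4)))
print(connection_page_outcome((4, 1, 0), (4, 1, 0)))
```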

@ansd ansd added this to the 4.1.0 milestone Nov 6, 2024
@mergify mergify bot added the make label Nov 6, 2024
@ansd ansd force-pushed the amqp-connection-sessions branch 2 times, most recently from b9cf16a to beb6749 Compare November 6, 2024 16:47
@mergify mergify bot added the bazel label Nov 6, 2024
@ansd ansd force-pushed the amqp-connection-sessions branch 9 times, most recently from b8cdf6d to 84d5c4a Compare November 7, 2024 12:19
ansd added 2 commits November 7, 2024 15:11
This commit fixes two different bugs/crashes.

To repro, prior to this commit:
1. Create an AMQP 1.0 connection on node-1.
2. Open the Management UI on node-2 and open the connection page of this
   single AMQP 1.0 connection.

The first crash was the following:
```
[error] <0.1297.0>   crasher:
[error] <0.1297.0>     initial call: cowboy_stream_h:request_process/3
[error] <0.1297.0>     pid: <0.1297.0>
[error] <0.1297.0>     registered_name: []
[error] <0.1297.0>     exception error: no case clause matching
[error] <0.1297.0>                      {badrpc,
[error] <0.1297.0>                          {'EXIT',
[error] <0.1297.0>                              {undef,
[error] <0.1297.0>                                  [{rabbit_connection_tracking,lookup,
[error] <0.1297.0>                                       [<<"[::1]:51729 -> [::1]:5672">>,
[error] <0.1297.0>                                        ['rabbit-1@ABCDDDEEAA']],
[error] <0.1297.0>                                       []}]}}}
[error] <0.1297.0>       in function  rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1297.0>       in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1297.0>       in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1297.0>       in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1297.0>       in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1297.0>       in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1297.0>       in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1297.0>       in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
```

The second crash was the following:
```
[error] <0.1132.0>   crasher:
[error] <0.1132.0>     initial call: cowboy_stream_h:request_process/3
[error] <0.1132.0>     pid: <0.1132.0>
[error] <0.1132.0>     registered_name: []
[error] <0.1132.0>     exception error: no case clause matching
[error] <0.1132.0>                      {tracked_connection,
[error] <0.1132.0>                          {'rabbit-1@ABCDDDEEAA',
[error] <0.1132.0>                              <<"[::1]:65505 -> [::1]:5672">>},
[error] <0.1132.0>                          'rabbit-1@ABCDDDEEAA',<<"/">>,
[error] <0.1132.0>                          <<"[::1]:65505 -> [::1]:5672">>,<13661.1110.0>,
[error] <0.1132.0>                          {1,0},
[error] <0.1132.0>                          network,
[error] <0.1132.0>                          {0,0,0,0,0,0,0,1},
[error] <0.1132.0>                          65505,<<"guest">>,1730908606089}
[error] <0.1132.0>       in function  rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1132.0>       in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1132.0>       in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1132.0>       in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1132.0>       in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1132.0>       in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1132.0>       in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1132.0>       in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
```
@ansd ansd force-pushed the amqp-connection-sessions branch from 84d5c4a to 124ef69 Compare November 7, 2024 14:12
@ansd ansd marked this pull request as ready for review November 7, 2024 14:49

@michaelklishin michaelklishin left a comment

I agree that a feature flag to avoid a logged exception would be an overkill.

If anything, we can focus on adjusting v4.0.x accordingly if really necessary.

@michaelklishin michaelklishin merged commit bcfca2b into main Nov 7, 2024
273 checks passed
@michaelklishin michaelklishin deleted the amqp-connection-sessions branch November 7, 2024 19:39
ansd added a commit that referenced this pull request Nov 11, 2024
This commit is a partial backport of #12670

(cherry picked from commit 124ef69)