AMQP-1.0 session, {badarg, [{erlang, iolist_size}] at high consumption rates #12816

Closed
mkuratczyk opened this issue Nov 26, 2024 · 3 comments · Fixed by #12904

Contributor

mkuratczyk commented Nov 26, 2024

Describe the bug

I've seen this in two situations so far:

  • on main, with max_link_credit set to a higher value (1700 - for no particular reason, other than to see the impact)
  • on the ra-2.16.0 branch without tweaking max_link_credit. However, the main benefit of ra-2.16 is a higher consumption rate with many consumers, and that's exactly when this issue happened.

This is from the environment running main with a high max_link_credit:

reason: {{badarg,
             [{erlang,iolist_size,
                  [[<<112,0,35,30,126>>,
                    [82,-325],
                    <<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
                    <<112,0,35,30,126>>,
                    <<112,0,0,6,164>>,
                    64,64,64,64]],
                  [{error_info,#{module => erl_erts_errors}}]},
              {amqp10_binary_generator,generate1,1,
                  [{file,"amqp10_binary_generator.erl"},{line,141}]},
              {amqp10_binary_generator,generate1,1,
                  [{file,"amqp10_binary_generator.erl"},{line,88}]},
              {amqp10_binary_generator,generate,1,
                  [{file,"amqp10_binary_generator.erl"},{line,79}]},
              {rabbit_amqp_writer,assemble_frame,3,
                  [{file,"rabbit_amqp_writer.erl"},{line,206}]},
              {rabbit_amqp_writer,internal_send_command_async,3,
                  [{file,"rabbit_amqp_writer.erl"},{line,189}]},
              {rabbit_amqp_writer,handle_cast,2,
                  [{file,"rabbit_amqp_writer.erl"},{line,110}]},
              {gen_server,try_handle_cast,3,
                  [{file,"gen_server.erl"},{line,1121}]}]},
         {gen_server,call,
             [<0.1759.0>,
              {send_command,0,
                  {'v1_0.end',
                      {'v1_0.error',
                          {symbol,<<"amqp:session:window-violation">>},
                          {utf8,
                              <<"incoming window violation (tolerated excess tranfers: 400)">>},
                          undefined}}},
              300000]}}

And this is from the ra-2.16 environment with the default 170 credits:

crasher:
  initial call: rabbit_amqp_writer:init/1
  pid: <0.19353.0>
  registered_name: []
  exception error: bad argument
    in function  iolist_size/1
       called as iolist_size([<<112,0,0,23,120>>,
                              [82,-15],
                              <<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
                              <<112,0,0,23,120>>,
                              "Rª",64,64,64,64])
       *** argument 1: not an iodata term
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
    in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
    in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
    in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
    in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
    in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
mkuratczyk added the bug label Nov 26, 2024
mkuratczyk changed the title from "AMQP-1.0 session, {badarg, [{erlang, iolist_size}] with high max_link_credit" to "AMQP-1.0 session, {badarg, [{erlang, iolist_size}] at high consumption rates" Dec 2, 2024
Member

ansd commented Dec 9, 2024

@mkuratczyk what are the steps to repro this issue? How did you invoke the omq CLI to see this error?

Contributor Author

mkuratczyk commented Dec 9, 2024

I have no deterministic repro steps. This is the workload that was running when this happened:
omq amqp -x 50 --amqp-send-settled -r 1000 -y 10 --publish-to /queues/amqp10_consumers_join_late --consume-from /queues/amqp10_consumers_join_late --consumer-credits 1000 --queues quorum --consumer-startup-delay 150s -s 5000

The issue occurred 40 seconds after the consumers "joined" (started).

NOTE: with the latest omq versions, replace --amqp-send-settled with -c 50

ansd added a commit that referenced this issue Dec 9, 2024
Prior to this commit, when the sending client overshot RabbitMQ's incoming-window
(which is allowed in the event of a cluster-wide memory or disk alarm)
and RabbitMQ then sent a FLOW frame to the client, that FLOW frame carried a
negative incoming-window field, causing the following crash in
the writer process:
```
crasher:
  initial call: rabbit_amqp_writer:init/1
  pid: <0.19353.0>
  registered_name: []
  exception error: bad argument
    in function  iolist_size/1
       called as iolist_size([<<112,0,0,23,120>>,
                              [82,-15],
                              <<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
                              <<112,0,0,23,120>>,
                              "Rª",64,64,64,64])
       *** argument 1: not an iodata term
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
    in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
    in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
    in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
    in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
    in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
```

This commit fixes this crash by maintaining a floor of zero for
incoming-window in the FLOW frame.

Fixes #12816
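
For illustration only, here is a minimal sketch of what "maintaining a floor of zero" could look like. The module and function names are hypothetical and are not taken from #12904. The sketch assumes only that the session tracks its incoming-window as an integer that can go negative when the client overshoots.

```erlang
-module(flow_window_sketch).
-export([flow_incoming_window/1]).

%% Hypothetical helper, not the code from the actual fix.
%% Takes the session's internal window counter (which may be negative after
%% an overshoot) and returns the value to advertise in the FLOW frame.
%% incoming-window is a uint on the wire, so it must never be below zero.
flow_incoming_window(Window) when is_integer(Window) ->
    max(0, Window).
```

With this clamp, the two values seen in the crash reports (-325 and -15) would both be advertised as 0, while non-negative windows pass through unchanged.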
Member

ansd commented Dec 9, 2024

I wasn't able to repro this issue with the provided omq commands.
However, #12904 fixes this crash and includes a test case that was able to reproduce this crash.

[<<112,0,35,30,126>>,
                    [82,-325],
                    <<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
                    <<112,0,35,30,126>>,
                    <<112,0,0,6,164>>,
                    64,64,64,64]

is the list of FLOW frame fields, where decimal value 82 is hex value 0x52, which denotes a uint (the smalluint encoding):

<type name="uint" class="primitive">
    <encoding code="0x70" category="fixed" width="4" label="32-bit unsigned integer in network byte order"/>
    <encoding name="smalluint" code="0x52" category="fixed" width="1" label="unsigned integer value in the range 0 to 255 inclusive"/>
    <encoding name="uint0" code="0x43" category="fixed" width="0" label="the uint value 0"/>
</type>

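The -325 is the server's incoming-window. A smalluint payload must fit in a single byte (0 to 255), so a negative value makes the list invalid iodata and iolist_size/1 fails with badarg.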
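
As a rough illustration of the encoding rules quoted above, the sketch below picks a uint encoding by value range and shows that the list from the crash is not valid iodata. `uint_encoding_sketch` and `encode_uint/1` are hypothetical names for this example, not the actual amqp10_binary_generator code.

```erlang
-module(uint_encoding_sketch).
-export([encode_uint/1, demo/0]).

%% Hypothetical encoder following the AMQP 1.0 uint type definition.
encode_uint(0) ->
    <<16#43>>;                        % uint0: constructor only, no payload
encode_uint(V) when is_integer(V), V > 0, V =< 255 ->
    <<16#52, V:8>>;                   % smalluint: one payload byte
encode_uint(V) when is_integer(V), V > 255, V =< 16#FFFFFFFF ->
    <<16#70, V:32/unsigned-big>>;     % uint: four bytes, network byte order
encode_uint(V) ->
    %% Negative or oversized values are rejected instead of being emitted.
    erlang:error({not_a_uint, V}).

demo() ->
    %% 82 =:= 16#52, so [82, -325] is a smalluint constructor followed by an
    %% element outside 0..255. That is not iodata, so iolist_size/1 raises badarg.
    Bad = try iolist_size([82, -325]) catch error:badarg -> badarg end,
    {Bad, encode_uint(0), encode_uint(200), encode_uint(1700)}.
```

In this sketch, `encode_uint(1700)` yields `<<112,0,0,6,164>>`, the same bytes that appear as one of the fields in the first crash report, where max_link_credit was set to 1700.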

ansd added a commit that referenced this issue Dec 9, 2024
ansd added a commit that referenced this issue Dec 10, 2024
mergify bot pushed a commit that referenced this issue Dec 10, 2024
(cherry picked from commit 0d34ef6)

# Conflicts:
#	deps/rabbit/test/amqp_client_SUITE.erl
ansd added a commit that referenced this issue Dec 10, 2024
(cherry picked from commit 0d34ef6)

# Conflicts:
#	deps/rabbit/test/amqp_client_SUITE.erl
michaelklishin added this to the 4.0.5 milestone Dec 10, 2024