-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rabbitmq-server shell script: propagate BEAM VM process exit code #10819
Conversation
fa72c19
to
d24f4ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd rather just turn off set -e
than do logical operands that aren't immediately obvious.
As @dumbbell points out in an internal discussion, the |
#/bin/bash
set -e
(sleep 1; exit 0) &
wait $!
echo first
(sleep 1; exit 1) &
wait $!
echo second |
Could be done the following way if it makes it more obvious: if wait "$rabbitmq_server_pid"; then
exit "$?"
else
exit "$?"
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giner thanks for taking the time to contribute. You're the first person to notice this behavior.
I've brought up the entire rabbitmq-server
script and am very wary of changing anything in this script, especially given the comments that @dumbbell has made in the script itself.
The only way I would accept a change is with the following:
- An explanation of when you've seen RabbitMQ exit with a non-zero code that was not returned when this script exited.
- A "simplified" version of the
rabbitmq-server
script that can be run to demonstrate the expected behavior. That way we can run it using a variety ofsh
-compatible shells, likedash
Sorry to be so demanding but as you can imagine we must be very wary here.
For the first one we've noticed the container not being restarted after it is stopped due to time skew which is frequently happening with docker on macOS. For the second one, I'm not sure exactly what you want me to do. |
Another option: exit_code=0
wait "$rabbitmq_server_pid" || exit_code=$?
exit "$exit_code" # To be replaced by return if inside a function |
@giner an alternative The hesitancy comes from the fact that such changes are very subtle and very difficult to test across even a majority of environments where RabbitMQ can run: no single team or company can realistically cover them all. |
Have you tried the workaround that @michaelklishin provided? #8940 (comment) |
@giner something like this: https://github.com/lukebakken/rabbitmq-server-10819 At some point, it would be great to convert the above into a test, somehow. |
d24f4ec
to
5d19b2e
Compare
Updated the PR with a less invasive change. Now it doesn't force return or exit and the only thing it does is propagating the exit code instead of shadowing it. |
5d19b2e
to
03e7428
Compare
Here is a way to demonstrate the effect of this change Test# Start a VM and get inside
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh
# Install docker
sudo apt update
sudo apt install docker.io
newgrp docker
# Prepare changes
mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .
# Run the original image
docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq
# Run the updated image
docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script
# Make RabbitMQ exit
sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
# Check the status
sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a
# Result
# *** rabbitmq-orig ***
# 2024-03-22 01:09:54.975564+00:00 [info] <0.248.0> Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
# User defined signal 2
#
# *** rabbitmq-test-script ***
# 2024-03-22 01:10:05.521885+00:00 [info] <0.248.0> Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
# User defined signal 2
# 2024-03-22 01:10:11.554601+00:00 [info] <0.248.0> Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
#
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# 2bf315e5164e rabbitmq-test-script "docker-entrypoint.s…" 22 seconds ago Up 15 seconds 4369/tcp, 5671-5672/tcp, 15691-15692/tcp, 25672/tcp rabbitmq-test-script
# d89867f6985b rabbitmq "docker-entrypoint.s…" 32 seconds ago Exited (0) 15 seconds ago rabbitmq-orig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the very detailed test process. I ran it and it demonstrated your change correctly. I also ran it with my suggested change from this review. I don't think the subshell is necessary.
03e7428
to
7bc79f1
Compare
b00d644
to
64fd1cf
Compare
bc71a34
to
f875455
Compare
f875455
to
5227229
Compare
5227229
to
2ea5943
Compare
It would be great to wait until the OCI build failure is resolved. I don't think this change should affect Selenium tests but as things stand right now, it somehow does, even after rebasing on top of |
Exit code is useful for monitoring and process supervisors when it comes to deciding on what to do when the process exits, for example we may want to restart it or send a report. The current implementation of `rabbitmq-server` script does not propagate the exit code in a general case which makes it impossible to know whether the exit was clean and, for example, use restart policy `on-failure` in docker. This change makes the exit code to be propagated.
Co-authored-by: Luke Bakken <[email protected]>
Co-authored-by: Luke Bakken <[email protected]>
2ea5943
to
0263b4c
Compare
Follow-up to #10819 @dumbbell made a note that the `|| exit "$?"` code is probably redundant. @lukebakken tested this patch using the RMQ docker image and `dash`, and the test procedure provided by @giner here: #10819 (comment) ```sh mkdir ~/ubuntu_24.04 vagrant init ubuntu/noble64 cd ~/ubuntu_24.04 vagrant up vagrant ssh sudo apt update sudo apt install docker.io newgrp docker mkdir -p /tmp/rabbitmq cd /tmp/rabbitmq cat > Dockerfile << 'EOF' from "rabbitmq" RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)" EOF docker build -t rabbitmq-test-script . docker rm -f rabbitmq-orig 2>/dev/null docker run --restart on-failure -d --name rabbitmq-orig rabbitmq docker rm -f rabbitmq-test-script 2>/dev/null docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script sleep 5 docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)' docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)' sleep 5 echo echo "*** rabbitmq-orig ***" docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo echo "*** rabbitmq-test-script ***" docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo docker ps -a ```
rabbitmq-server shell script: propagate BEAM VM process exit code (backport #10819)
Follow-up to #10819 @dumbbell made a note that the `|| exit "$?"` code is probably redundant. @lukebakken tested this patch using the RMQ docker image and `dash`, and the test procedure provided by @giner here: #10819 (comment) ```sh mkdir ~/ubuntu_24.04 vagrant init ubuntu/noble64 cd ~/ubuntu_24.04 vagrant up vagrant ssh sudo apt update sudo apt install docker.io newgrp docker mkdir -p /tmp/rabbitmq cd /tmp/rabbitmq cat > Dockerfile << 'EOF' from "rabbitmq" RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)" EOF docker build -t rabbitmq-test-script . docker rm -f rabbitmq-orig 2>/dev/null docker run --restart on-failure -d --name rabbitmq-orig rabbitmq docker rm -f rabbitmq-test-script 2>/dev/null docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script sleep 5 docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)' docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)' sleep 5 echo echo "*** rabbitmq-orig ***" docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo echo "*** rabbitmq-test-script ***" docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo docker ps -a ```
Follow-up to #10819 @dumbbell made a note that the `|| exit "$?"` code is probably redundant. @lukebakken tested this patch using the RMQ docker image and `dash`, and the test procedure provided by @giner here: #10819 (comment) ```sh mkdir ~/ubuntu_24.04 vagrant init ubuntu/noble64 cd ~/ubuntu_24.04 vagrant up vagrant ssh sudo apt update sudo apt install docker.io newgrp docker mkdir -p /tmp/rabbitmq cd /tmp/rabbitmq cat > Dockerfile << 'EOF' from "rabbitmq" RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)" EOF docker build -t rabbitmq-test-script . docker rm -f rabbitmq-orig 2>/dev/null docker run --restart on-failure -d --name rabbitmq-orig rabbitmq docker rm -f rabbitmq-test-script 2>/dev/null docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script sleep 5 docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)' docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)' sleep 5 echo echo "*** rabbitmq-orig ***" docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo echo "*** rabbitmq-test-script ***" docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo docker ps -a ```
Follow-up to #10819 @dumbbell made a note that the `|| exit "$?"` code is probably redundant. @lukebakken tested this patch using the RMQ docker image and `dash`, and the test procedure provided by @giner here: #10819 (comment) ```sh mkdir ~/ubuntu_24.04 vagrant init ubuntu/noble64 cd ~/ubuntu_24.04 vagrant up vagrant ssh sudo apt update sudo apt install docker.io newgrp docker mkdir -p /tmp/rabbitmq cd /tmp/rabbitmq cat > Dockerfile << 'EOF' from "rabbitmq" RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)" EOF docker build -t rabbitmq-test-script . docker rm -f rabbitmq-orig 2>/dev/null docker run --restart on-failure -d --name rabbitmq-orig rabbitmq docker rm -f rabbitmq-test-script 2>/dev/null docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script sleep 5 docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)' docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)' sleep 5 echo echo "*** rabbitmq-orig ***" docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo echo "*** rabbitmq-test-script ***" docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ" echo docker ps -a ``` (cherry picked from commit 8ba0562)
Proposed Changes
Exit code is useful for monitoring and process supervisors when it comes to deciding on what to do when the process exits, for example we may want to restart it or send a report. The current implementation of
rabbitmq-server
script does not propagate the exit code in a general case which makes it impossible to know whether the exit was clean and, for example, use restart policyon-failure
in docker.This change makes the exit code to be propagated.
Types of Changes
What types of changes does your code introduce to this project?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply.You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md
documentFurther Comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.