Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabbitmq-server shell script: propagate BEAM VM process exit code #10819

Merged
merged 3 commits into from
Apr 9, 2024

Conversation

giner
Copy link
Contributor

@giner giner commented Mar 21, 2024

Proposed Changes

Exit code is useful for monitoring and process supervisors when it comes to deciding on what to do when the process exits, for example we may want to restart it or send a report. The current implementation of rabbitmq-server script does not propagate the exit code in a general case which makes it impossible to know whether the exit was clean and, for example, use restart policy on-failure in docker.

This change makes the exit code to be propagated.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

Further Comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.

@giner giner force-pushed the stas/propagate_exit_code branch from fa72c19 to d24f4ec Compare March 21, 2024 05:47
Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd rather just turn off set -e than do logical operands that aren't immediately obvious.

deps/rabbit/scripts/rabbitmq-server Outdated Show resolved Hide resolved
@michaelklishin
Copy link
Member

As @dumbbell points out in an internal discussion, the || true part was necessary to work around a dash-specific behavior. It's a good question whether that behavior is still present in the versions/distributions we care about.

@giner
Copy link
Contributor Author

giner commented Mar 21, 2024

As @dumbbell points out in an internal discussion, the || true part was necessary to work around a dash-specific behavior. It's a good question whether that behavior is still present in the versions/distributions we care about.

I'm not sure why the comment says that it's dash specific. wait returns exit code in bash as well, the following will show only "first": I've read the comment again and realized that the example below is irrelevant

#/bin/bash

set -e

(sleep 1; exit 0) &
wait $!
echo first

(sleep 1; exit 1) &
wait $!
echo second

@giner
Copy link
Contributor Author

giner commented Mar 21, 2024

Personally I'd rather just turn off set -e than do logical operands that aren't immediately obvious.

Could be done the following way if it makes it more obvious:

if wait "$rabbitmq_server_pid"; then
  exit "$?"
else
  exit "$?"
fi

Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giner thanks for taking the time to contribute. You're the first person to notice this behavior.

I've brought up the entire rabbitmq-server script and am very wary of changing anything in this script, especially given the comments that @dumbbell has made in the script itself.

The only way I would accept a change is with the following:

  • An explanation of when you've seen RabbitMQ exit with a non-zero code that was not returned when this script exited.
  • A "simplified" version of the rabbitmq-server script that can be run to demonstrate the expected behavior. That way we can run it using a variety of sh-compatible shells, like dash

Sorry to be so demanding but as you can imagine we must be very wary here.

@giner
Copy link
Contributor Author

giner commented Mar 21, 2024

@giner thanks for taking the time to contribute. You're the first person to notice this behavior.

I've brought up the entire rabbitmq-server script and am very wary of changing anything in this script, especially given the comments that @dumbbell has made in the script itself.

The only way I would accept a change is with the following:

* An explanation of when you've seen RabbitMQ exit with a non-zero code that was not returned when this script exited.

* A "simplified" version of the `rabbitmq-server` script that can be run to demonstrate the expected behavior. That way we can run it using a variety of `sh`-compatible shells, like `dash`

Sorry to be so demanding but as you can imagine we must be very wary here.

For the first one we've noticed the container not being restarted after it is stopped due to time skew which is frequently happening with docker on macOS. For the second one, I'm not sure exactly what you want me to do.

@giner
Copy link
Contributor Author

giner commented Mar 21, 2024

Personally I'd rather just turn off set -e than do logical operands that aren't immediately obvious.

Could be done the following way if it makes it more obvious:

if wait "$rabbitmq_server_pid"; then
  exit "$?"
else
  exit "$?"
fi

Another option:

exit_code=0
wait "$rabbitmq_server_pid" || exit_code=$?
exit "$exit_code" # To be replaced by return if inside a function

@michaelklishin
Copy link
Member

@giner an alternative sbin/rabbitmq-server version that behaves the way you'd expect, that can be tested in at least some environments. But maybe further reducing the script is not really necessary here.

The hesitancy comes from the fact that such changes are very subtle and very difficult to test across even a majority of environments where RabbitMQ can run: no single team or company can realistically cover them all.

@lukebakken
Copy link
Collaborator

For the first one we've noticed the container not being restarted after it is stopped due to #8940 which is frequently happening with docker on macOS

Have you tried the workaround that @michaelklishin provided? #8940 (comment)

@lukebakken
Copy link
Collaborator

For the second one, I'm not sure exactly what you want me to do.

@giner something like this:

https://github.com/lukebakken/rabbitmq-server-10819

At some point, it would be great to convert the above into a test, somehow.

@giner giner force-pushed the stas/propagate_exit_code branch from d24f4ec to 5d19b2e Compare March 21, 2024 22:49
@giner
Copy link
Contributor Author

giner commented Mar 21, 2024

Updated the PR with a less invasive change. Now it doesn't force return or exit and the only thing it does is propagating the exit code instead of shadowing it.

@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch from 5d19b2e to 03e7428 Compare March 21, 2024 23:19
@giner
Copy link
Contributor Author

giner commented Mar 22, 2024

Here is a way to demonstrate the effect of this change

Test
# Start a VM and get inside
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh

# Install docker
sudo apt update
sudo apt install docker.io
newgrp docker

# Prepare changes
mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .

# Run the original image
docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq

# Run the updated image
docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script

# Make RabbitMQ exit
sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'

# Check the status
sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a

# Result
# *** rabbitmq-orig ***
# 2024-03-22 01:09:54.975564+00:00 [info] <0.248.0>  Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
# User defined signal 2
# 
# *** rabbitmq-test-script ***
# 2024-03-22 01:10:05.521885+00:00 [info] <0.248.0>  Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
# User defined signal 2
# 2024-03-22 01:10:11.554601+00:00 [info] <0.248.0>  Starting RabbitMQ 3.13.0 on Erlang 26.2.2 [jit]
# 
# CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                      PORTS                                                 NAMES
# 2bf315e5164e   rabbitmq-test-script   "docker-entrypoint.s…"   22 seconds ago   Up 15 seconds               4369/tcp, 5671-5672/tcp, 15691-15692/tcp, 25672/tcp   rabbitmq-test-script
# d89867f6985b   rabbitmq               "docker-entrypoint.s…"   32 seconds ago   Exited (0) 15 seconds ago                                                         rabbitmq-orig

Copy link
Collaborator

@lukebakken lukebakken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the very detailed test process. I ran it and it demonstrated your change correctly. I also ran it with my suggested change from this review. I don't think the subshell is necessary.

deps/rabbit/scripts/rabbitmq-server Outdated Show resolved Hide resolved
deps/rabbit/scripts/rabbitmq-server Outdated Show resolved Hide resolved
lukebakken added a commit to lukebakken/rabbitmq-server-10819 that referenced this pull request Mar 22, 2024
@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch from 03e7428 to 7bc79f1 Compare March 22, 2024 19:26
@giner giner requested review from lukebakken and dumbbell March 26, 2024 06:06
@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch from b00d644 to 64fd1cf Compare March 26, 2024 13:58
@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch 2 times, most recently from bc71a34 to f875455 Compare April 3, 2024 22:49
@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch from f875455 to 5227229 Compare April 8, 2024 19:00
@michaelklishin
Copy link
Member

It would be great to wait until the OCI build failure is resolved. I don't think this change should affect Selenium tests but as things stand right now, it somehow does, even after rebasing on top of main.

giner and others added 3 commits April 9, 2024 14:56
Exit code is useful for monitoring and process supervisors when it comes
to deciding on what to do when the process exits, for example we may
want to restart it or send a report. The current implementation of
`rabbitmq-server` script does not propagate the exit code in a general
case which makes it impossible to know whether the exit was clean and,
for example, use restart policy `on-failure` in docker.

This change makes the exit code to be propagated.
@lukebakken lukebakken force-pushed the stas/propagate_exit_code branch from 2ea5943 to 0263b4c Compare April 9, 2024 21:56
@lukebakken lukebakken merged commit 78c842e into rabbitmq:main Apr 9, 2024
10 of 11 checks passed
@michaelklishin michaelklishin changed the title rabbitmq-server: Propagate exit code rabbitmq-server shell script: propagate BEAM VM process exit code Apr 9, 2024
lukebakken added a commit that referenced this pull request Apr 9, 2024
Follow-up to #10819

@dumbbell made a note that the `|| exit "$?"` code is probably redundant.

@lukebakken tested this patch using the RMQ docker image and `dash`, and
the test procedure provided by @giner here:

#10819 (comment)

```sh
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh

sudo apt update
sudo apt install docker.io
newgrp docker

mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .

docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq

docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script

sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'

sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a

```
michaelklishin added a commit that referenced this pull request Apr 9, 2024
rabbitmq-server shell script: propagate BEAM VM process exit code (backport #10819)
michaelklishin pushed a commit that referenced this pull request Apr 10, 2024
Follow-up to #10819

@dumbbell made a note that the `|| exit "$?"` code is probably redundant.

@lukebakken tested this patch using the RMQ docker image and `dash`, and
the test procedure provided by @giner here:

#10819 (comment)

```sh
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh

sudo apt update
sudo apt install docker.io
newgrp docker

mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .

docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq

docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script

sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'

sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a

```
michaelklishin pushed a commit that referenced this pull request Apr 11, 2024
Follow-up to #10819

@dumbbell made a note that the `|| exit "$?"` code is probably redundant.

@lukebakken tested this patch using the RMQ docker image and `dash`, and
the test procedure provided by @giner here:

#10819 (comment)

```sh
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh

sudo apt update
sudo apt install docker.io
newgrp docker

mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .

docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq

docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script

sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'

sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a

```
mergify bot pushed a commit that referenced this pull request Apr 11, 2024
Follow-up to #10819

@dumbbell made a note that the `|| exit "$?"` code is probably redundant.

@lukebakken tested this patch using the RMQ docker image and `dash`, and
the test procedure provided by @giner here:

#10819 (comment)

```sh
mkdir ~/ubuntu_24.04
vagrant init ubuntu/noble64
cd ~/ubuntu_24.04
vagrant up
vagrant ssh

sudo apt update
sudo apt install docker.io
newgrp docker

mkdir -p /tmp/rabbitmq
cd /tmp/rabbitmq
cat > Dockerfile << 'EOF'
from "rabbitmq"
RUN sed -i 's/wait "\$rabbitmq_server_pid" || true/wait "$rabbitmq_server_pid" || (exit "$?")/g' "$(which rabbitmq-server)"
EOF
docker build -t rabbitmq-test-script .

docker rm -f rabbitmq-orig 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-orig rabbitmq

docker rm -f rabbitmq-test-script 2>/dev/null
docker run --restart on-failure -d --name rabbitmq-test-script rabbitmq-test-script

sleep 5
docker exec rabbitmq-orig /bin/sh -c 'kill -USR2 $(pidof beam.smp)'
docker exec rabbitmq-test-script /bin/sh -c 'kill -USR2 $(pidof beam.smp)'

sleep 5
echo
echo "*** rabbitmq-orig ***"
docker logs rabbitmq-orig 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
echo "*** rabbitmq-test-script ***"
docker logs rabbitmq-test-script 2>&1 | grep "User defined signal\|Starting RabbitMQ"
echo
docker ps -a

```

(cherry picked from commit 8ba0562)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants