Integration tests sometimes hang #134

Closed
andrewazores opened this issue Mar 10, 2020 · 9 comments
Labels
bug Something isn't working

Comments

@andrewazores
Member

When running mvn clean verify (or a similar invocation), the build sometimes hangs on a pre-integration-test step. The problem appears to be in the exec plugin step that starts the container with podman. The exec plugin is configured to run synchronously, but podman is given the --detach flag, so the podman process should exit as soon as the container has started. Sometimes, though, either podman does not exit or the exec plugin does not notice that it has. When this happens, podman ps -a shows that the container does exist and is running. podman kill container-jfr-itest then typically complains with a "device or resource busy" error, although it still seems to succeed in killing and removing the container.
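
One way to tell the two possibilities apart (podman not exiting versus the exec plugin missing the exit) might be to run the same start command outside Maven under a deadline. The sketch below is only an illustration: classify_start is a hypothetical helper, the 60-second deadline in the usage comment is an arbitrary assumption, and the image argument is left as a placeholder because it is not given in this thread. It relies on coreutils timeout, which exits with status 124 when it has to kill the command.

```shell
# Hypothetical helper: run a command under a deadline and report whether it
# exited on its own or had to be killed by `timeout`.
# Caveat: a command that itself exits with status 124 would be misreported.
classify_start() (
  seconds="$1"
  shift
  status=0
  timeout "$seconds" "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung"
  else
    echo "exited:$status"
  fi
)

# Example usage (assumption: substitute the image and flags the build uses):
#   classify_start 60 podman run --detach --name container-jfr-itest <image>
```

If this ever reports hung when podman is run directly, the exec plugin is probably not the culprit.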

@andrewazores added the bug label Mar 10, 2020
@andrewazores self-assigned this Mar 10, 2020
@andrewazores
Member Author

On newer distros, Docker may no longer work out of the box, so checking whether this is a podman problem or an exec-plugin problem needs some setup first.

Temporary (until reboot) fix:

$ sudo mkdir /sys/fs/cgroup/systemd
$ sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd

"Permanent" (but reversible) fix:

$ sudo dnf install -y grubby
$ sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
$ sudo reboot
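
To confirm which cgroup hierarchy is actually in effect after either fix, and to back out the grubby change later, something like the following may help. This is a sketch, not part of the project: cgroup_mode is a hypothetical helper, and it assumes the usual convention that /sys/fs/cgroup is a cgroup2fs mount under the unified (v2) hierarchy and a tmpfs mount under the legacy or hybrid (v1) layout.

```shell
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a
# human-readable cgroup mode (assumed convention, see the note above).
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "unified (v2)" ;;
    tmpfs)     echo "legacy or hybrid (v1)" ;;
    *)         echo "unknown" ;;
  esac
}

# Report the mode of the running system; prints "unknown" if the path is absent.
fs_type="$(stat -fc %T /sys/fs/cgroup 2>/dev/null || true)"
cgroup_mode "$fs_type"

# To undo the kernel argument added with grubby, then reboot:
#   sudo grubby --update-kernel=ALL --remove-args="systemd.unified_cgroup_hierarchy=0"
```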

After applying either fix, run mvn -DimageBuilder=$(which docker) clean verify; docker image prune -f repeatedly to run the tests with Docker and observe how often they hang.

@andrewazores
Member Author

Actually, here is an even better invocation for running the tests repeatedly without a full rebuild each time:

mvn -DimageBuilder=$(which docker) exec:exec@start-container failsafe:integration-test exec:exec@stop-container
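
For measuring the hang rate without babysitting each run, a small loop along these lines could work. It is a rough sketch under stated assumptions: count_hangs is a hypothetical helper, the run count and per-run deadline are values you pick yourself, and a run is counted as a hang only when coreutils timeout has to kill it (exit status 124).

```shell
# Hypothetical helper: run a command N times under a per-run deadline and
# print how many runs had to be killed for exceeding it.
count_hangs() (
  runs="$1"
  deadline="$2"
  shift 2
  hangs=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    status=0
    timeout "$deadline" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
    i=$((i + 1))
  done
  echo "$hangs/$runs runs hit the deadline"
)

# Example usage (assumption: 50 runs, 120 s deadline per run):
#   count_hangs 50 120 mvn -DimageBuilder="$(which docker)" \
#     exec:exec@start-container failsafe:integration-test exec:exec@stop-container
```

A run that is killed this way never reaches the stop-container step, so the leftover container-jfr-itest container would likely need to be removed by hand before the next iteration can start cleanly.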

@andrewazores
Member Author

After applying the kernel cgroup change and rebooting to test this, I'm no longer seeing hangs with podman. Docker also works perfectly with the cgroup change.

@ebaron would you mind trying this out too?

@ebaron
Member

ebaron commented Mar 10, 2020

I still saw this with podman. Tried the following:

$ sudo mkdir /sys/fs/cgroup/systemd
$ sudo mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd
$ mvn exec:exec@start-container failsafe:integration-test exec:exec@stop-container
[INFO] Scanning for projects...
[INFO] 
[INFO] ------------< com.redhat.rhjmc.containerjfr:container-jfr >-------------
[INFO] Building container-jfr 0.16.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- exec-maven-plugin:1.6.0:exec (start-container) @ container-jfr ---
d2787c3464700eebe5d33d157ea3d32587889760e90ecd11750c4a28f68ac3fd
[INFO] 
[INFO] --- maven-failsafe-plugin:3.0.0-M4:integration-test (default-cli) @ container-jfr ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running itest.SanityIT
[INFO] Running itest.SanityIT$GetClientUrl
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.521 s - in itest.SanityIT$GetClientUrl
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.536 s - in itest.SanityIT
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 
[INFO] --- exec-maven-plugin:1.6.0:exec (stop-container) @ container-jfr ---
d2787c3464700eebe5d33d157ea3d32587889760e90ecd11750c4a28f68ac3fd
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.249 s
[INFO] Finished at: 2020-03-10T13:20:18-04:00
[INFO] ------------------------------------------------------------------------
$ mvn exec:exec@start-container failsafe:integration-test exec:exec@stop-container
[INFO] Scanning for projects...
[INFO] 
[INFO] ------------< com.redhat.rhjmc.containerjfr:container-jfr >-------------
[INFO] Building container-jfr 0.16.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- exec-maven-plugin:1.6.0:exec (start-container) @ container-jfr ---
(hang)

@andrewazores removed their assignment Mar 10, 2020
@ebaron
Member

ebaron commented Mar 10, 2020

I was also able to reproduce the hang after booting the kernel with systemd.unified_cgroup_hierarchy=0.

@andrewazores
Member Author

Hmm. Did you use Docker, or were you still using Podman? I don't know what effect, if any, those fixes have on Podman, but they allow Docker to run on F31. Without them, I got a "cgroup mount" error when trying to run anything with Docker.

Are you getting it to hang every time with Podman or is it also intermittent for you?

@ebaron
Member

ebaron commented Mar 10, 2020

It's intermittent with Podman. I'll try with Docker.

@ebaron
Member

ebaron commented Mar 10, 2020

Okay, did 50 runs with Docker and no hang.

@andrewazores
Member Author

Thanks for testing that. I'll close this, then, since it really seems to be an issue with podman, possibly the same as or related to containers/podman#4621.
