Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Build/Tests] Make the test JVM exit if OOME occurs #14509

Merged
merged 1 commit into from
Mar 1, 2022

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Mar 1, 2022

Motivation

Modifications

  • Add XX:+ExitOnOutOfMemoryError to test JVM arguments
  • disable ClientDeduplicationFailureTest.testClientDeduplicationCorrectnessWithFailure completely since it leads to OOMEs in certain conditions.

- OOMEs can make the build to take very long to complete.
  It's better to fail fast in tests when OOMEs occur.
Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, good idea!

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens in case of OOM with this flag ?
is surefire reporting the name of the broker test ?
otherwise it will be very hard to understand why CI failed

@lhotari
Copy link
Member Author

lhotari commented Mar 1, 2022

what happens in case of OOM with this flag ? is surefire reporting the name of the broker test ? otherwise it will be very hard to understand why CI failed

The OOME does get logged in surefire report files. IIRC, the error code was 3 for the terminated test JVM. This is visible in the console output.
It's actual even harder to understand why CI failed when the JVM keeps on running since a OOME can leave some components in a broken state.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try

+1

@michaeljmarshall
Copy link
Member

Just created an issue to track the fact that we disabled a test: #14523.

@lhotari
Copy link
Member Author

lhotari commented Mar 1, 2022

Here's an explanation and a fix for some of the recent OOMEs in tests: #14524 .

Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
- OOMEs can make the build to take very long to complete.
  It's better to fail fast in tests when OOMEs occur.
poorbarcode pushed a commit that referenced this pull request Apr 3, 2023
- OOMEs can make the build to take very long to complete.
  It's better to fail fast in tests when OOMEs occur.

(cherry picked from commit 89a36f9)
@poorbarcode
Copy link
Contributor

Hi @lhotari

The 2.10 branch also occurs the OOM exception, so I cherry-picked this PR into the branch-2.10.

see: https://github.com/apache/pulsar/actions/runs/4566713860/jobs/8111430173?pr=19971

nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 11, 2023
- OOMEs can make the build to take very long to complete.
  It's better to fail fast in tests when OOMEs occur.

(cherry picked from commit 89a36f9)
(cherry picked from commit a19879b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants