Flaky-test: ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild #23389

Closed
1 of 2 tasks
lhotari opened this issue Oct 2, 2024 · 4 comments · Fixed by #23852

Comments

@lhotari
Member

lhotari commented Oct 2, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Example failure

https://github.com/apache/pulsar/actions/runs/11125404278/job/30955253737?pr=23327#step:11:1680

Exception stacktrace

  Error:  Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 74.942 s <<< FAILURE! - in org.apache.pulsar.broker.service.ZkSessionExpireTest
  Error:  org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild[false, class org.apache.pulsar.broker.service.NetworkErrorTestBase$PreferBrokerModularLoadManager](4)  Time elapsed: 31.007 s  <<< FAILURE!
  org.awaitility.core.ConditionTimeoutException: Assertion condition expected [2] but found [1] within 10 seconds.
  	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
  	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
  	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
  	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
  	at org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild(ZkSessionExpireTest.java:154)
  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  	at java.base/java.lang.Thread.run(Thread.java:1583)
  Caused by: java.lang.AssertionError: expected [2] but found [1]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
  	at org.testng.Assert.assertEquals(Assert.java:131)
  	at org.testng.Assert.assertEquals(Assert.java:1418)
  	at org.testng.Assert.assertEquals(Assert.java:1382)
  	at org.testng.Assert.assertEquals(Assert.java:1428)
  	at org.apache.pulsar.broker.service.ZkSessionExpireTest.lambda$testTopicUnloadAfterSessionRebuild$4(ZkSessionExpireTest.java:155)
  	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
  	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
  	... 4 more

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari
Member Author

lhotari commented Jan 3, 2025

This test is very flaky at the moment. Failure in branch-4.0 build: https://github.com/apache/pulsar/actions/runs/12600662668/job/35120529770#step:11:1734

@lhotari
Member Author

lhotari commented Jan 3, 2025

Logs uploaded to https://gist.github.com/lhotari/8eb64203e95a352631957199b3d19420

The test output at https://gist.githubusercontent.com/lhotari/8eb64203e95a352631957199b3d19420/raw/7e9a48fb45e24d1ce3b91dc9ab944bf80841ccd1/org.apache.pulsar.broker.service.ZkSessionExpireTest-output.txt
contains log lines such as these just before the test failure:

2025-01-03T16:47:20,293 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,394 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,494 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,595 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
2025-01-03T16:47:20,695 - WARN  - [ForkJoinPool.commonPool-worker-1:BrokerService] - Namespace bundle for topic (persistent://public/default/tp-699e52dc-c940-484b-93b9-80d97d71fd03) not served by this instance:localhost:40793. Please redo the lookup. Request is denied: namespace=public/default
!!!!!!!!! FAILURE-- [TestClass name=class org.apache.pulsar.broker.service.ZkSessionExpireTest].testTopicUnloadAfterSessionRebuild([true, class org.apache.pulsar.broker.service.NetworkErrorTestBase$PreferBrokerModularLoadManager])-------
org.awaitility.core.ConditionTimeoutException: Assertion condition expected [true] but found [false] within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)
	at org.apache.pulsar.broker.service.ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild(ZkSessionExpireTest.java:161)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.AssertionError: expected [true] but found [false]
	at org.testng.Assert.fail(Assert.java:110)
	at org.testng.Assert.failNotEquals(Assert.java:1577)
	at org.testng.Assert.assertTrue(Assert.java:56)
	at org.testng.Assert.assertTrue(Assert.java:66)
	at org.apache.pulsar.broker.service.ZkSessionExpireTest.lambda$testTopicUnloadAfterSessionRebuild$5(ZkSessionExpireTest.java:163)
	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
	... 4 more

@lhotari
Member Author

lhotari commented Jan 5, 2025

@poorbarcode Would you have a chance to fix the flaky test ZkSessionExpireTest.testTopicUnloadAfterSessionRebuild?

@lhotari
Member Author

lhotari commented Jan 13, 2025

@poorbarcode The test usually passes when run locally on macOS.

mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=

The flakiness seems to appear when the test runs with constrained CPU resources, as in CI. I have a way to run tests in Docker using shell functions from https://github.com/lhotari/pulsar-contributor-toolbox.

ptbx_run_test_in_docker -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=

ptbx_run_test_in_docker sets up a Docker image, installs Java and tooling, and then runs the test in Docker with --cpus=2 --memory=6g to limit resources. This usually triggers the flakiness in many different ways.
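
For readers without the toolbox installed, the resource-limited run described above can be roughly approximated with plain `docker run`. This is only a sketch and not the actual `ptbx_run_test_in_docker` implementation: the function name `run_test_with_limits`, the image `maven:3.9-eclipse-temurin-21`, the `/workspace` mount point, and the `DRY_RUN` switch are all assumptions made for illustration. Only the `--cpus=2 --memory=6g` limits and the Maven flags come from the comment above.

```shell
# Hypothetical helper, NOT part of pulsar-contributor-toolbox.
# Runs "mvn test" for the current checkout inside a container with
# CPU and memory limits similar to a constrained CI runner.
# The body is a subshell "( ... )" so its variables do not leak.
run_test_with_limits() (
  cpus="${CPUS:-2}"                                # limit quoted in the thread
  memory="${MEMORY:-6g}"                           # limit quoted in the thread
  image="${IMAGE:-maven:3.9-eclipse-temurin-21}"   # assumed image

  # Build the full command line in the positional parameters;
  # the caller's Maven arguments ("$@") are appended at the end.
  set -- docker run --rm \
    --cpus="$cpus" --memory="$memory" \
    -v "$PWD:/workspace" -w /workspace \
    "$image" \
    mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test "$@"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"    # print the command instead of executing it
  else
    "$@"
  fi
)

# Example: print the command that would be run, without starting Docker.
DRY_RUN=1 run_test_with_limits -pl pulsar-broker -Dtest=ZkSessionExpireTest -DexcludedGroups=
```

Lowering `CPUS` further (for example `CPUS=1`) may make timing-sensitive failures show up more often, at the cost of a slower run.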
