
[DROOLS-6874] investigating flaky HACEP test #2735

Merged: 1 commit into kiegroup:main on Mar 18, 2022

Conversation

@tkobayas (Contributor) commented Mar 16, 2022:

JIRA:
https://issues.redhat.com/browse/DROOLS-6874

How to replicate CI configuration locally?

The build-chain tool runs "simple" Maven builds; each build is just a Maven command. However, because the repositories relate to and depend on each other, a change to an API or a class method can affect several of them, so the build-chain tool is needed to handle cross-repository builds and to make sure the latest version of the code is always used for each repository.

The build-chain tool can be used on the command line locally or in GitHub Actions workflows. If you need to change multiple repositories and send multiple dependent pull requests for one change, you can reproduce the same build either in the GitHub-hosted environment or locally in your development environment. See the local execution details for more information.

How to retest this PR or trigger a specific build:
  • to retest this pull request, add the comment: Jenkins retest this

  • to run a full downstream build, add the comment: Jenkins run fdb

  • to run a compile downstream build, add the comment: Jenkins run cdb

  • to run a full production downstream build, add the comment: Jenkins execute product fdb

  • to run an upstream build, add the comment: Jenkins run upstream

@mareknovotny (Member) commented:

thanks @tkobayas for looking at these flaky tests

@tkobayas (Contributor, Author) commented:

Retry is effective:

[2022-03-16T10:35:50.859Z] [INFO] Running org.kie.hacep.PodAsReplicaTest
[2022-03-16T10:35:50.859Z] 06:35:50.396 [main] WARN  k.server.BrokerMetadataCheckpoint warn - No meta.properties file under dir /home/jenkins/workspace/KIE/main/pullrequest/droolsjbpm-integration-main.pr/bc/kiegroup_droolsjbpm_integration/drools-ha/ha-core-infra/target/kafkatest-11263299798265473719/meta.properties
[2022-03-16T10:35:50.859Z] 06:35:50.458 [Time-limited test] DEBUG o.a.k.c.consumer.internals.Fetcher getTopicMetadata - [Consumer clientId=consumer-drools-56, groupId=drools] Topic metadata fetch included errors: {snapshot=INVALID_REPLICATION_FACTOR}
[2022-03-16T10:35:50.859Z] 06:35:50.459 [Time-limited test] ERROR o.k.h.c.i.u.SnapshotOnDemandUtilsImpl getConfiguredSnapshotConsumer - failed at partitionsFor
[2022-03-16T10:35:50.859Z] org.apache.kafka.common.KafkaException: Unexpected error fetching metadata for topic snapshot
[2022-03-16T10:35:50.859Z]      at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:409)
[2022-03-16T10:35:50.859Z]      at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1948)
[2022-03-16T10:35:50.859Z]      at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1916)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.core.infra.utils.SnapshotOnDemandUtilsImpl.getConfiguredSnapshotConsumer(SnapshotOnDemandUtilsImpl.java:144)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.core.infra.DefaultSessionSnapShooter.deserialize(DefaultSessionSnapShooter.java:86)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.consumer.DroolsConsumerHandler.initializeSessionContextWithSnapshotCheck(DroolsConsumerHandler.java:75)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.consumer.DroolsConsumerHandler.initializeKieSessionContext(DroolsConsumerHandler.java:68)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.consumer.DroolsConsumerHandler.<init>(DroolsConsumerHandler.java:58)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.core.InfraFactory.getConsumerHandler(InfraFactory.java:71)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.core.Bootstrap.startConsumers(Bootstrap.java:104)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.core.Bootstrap.startEngine(Bootstrap.java:49)
[2022-03-16T10:35:50.859Z]      at org.kie.hacep.PodAsReplicaTest.processOneSentMessageAsLeaderAndThenReplicaTest(PodAsReplicaTest.java:39)
[2022-03-16T10:35:50.859Z]      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2022-03-16T10:35:50.859Z]      at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2022-03-16T10:35:50.860Z]      at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2022-03-16T10:35:50.860Z]      at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[2022-03-16T10:35:50.860Z]      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
[2022-03-16T10:35:50.860Z]      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[2022-03-16T10:35:50.860Z]      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
[2022-03-16T10:35:50.860Z]      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
[2022-03-16T10:35:50.860Z]      at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
[2022-03-16T10:35:50.860Z]      at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
[2022-03-16T10:35:50.860Z]      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[2022-03-16T10:35:50.860Z]      at java.base/java.lang.Thread.run(Thread.java:829)
[2022-03-16T10:35:50.860Z] Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor is below 1 or larger than the number of available brokers.
[2022-03-16T10:35:50.860Z] 06:35:50.459 [Time-limited test] WARN  o.k.h.c.i.u.SnapshotOnDemandUtilsImpl getConfiguredSnapshotConsumer - *** Retrying... ***
[2022-03-16T10:35:54.139Z] 06:35:53.492 [pool-17-thread-1] WARN  o.k.h.c.i.e.LeaderElectionImpl lookupNewLeaderInfo - Pod[eng-jenkins-csb-business-automation.apps.ocp-c1.prod.psi.redhat.com] Unable to retrieve the current ConfigMap default-leaders from Kubernetes
[2022-03-16T10:35:56.023Z] 06:35:55.294 [pool-14-thread-1] WARN  o.k.h.c.i.e.LeaderElectionImpl lookupNewLeaderInfo - Pod[eng-jenkins-csb-business-automation.apps.ocp-c1.prod.psi.redhat.com] Unable to retrieve the current ConfigMap default-leaders from Kubernetes
[2022-03-16T10:35:56.023Z] 06:35:55.464 [Time-limited test] DEBUG o.a.k.c.consumer.internals.Fetcher getTopicMetadata - [Consumer clientId=consumer-drools-56, groupId=drools] Topic metadata fetch included errors: {snapshot=LEADER_NOT_AVAILABLE}
[2022-03-16T10:35:56.023Z] 06:35:55.567 [Time-limited test] WARN  o.k.h.c.i.u.SnapshotOnDemandUtilsImpl getConfiguredSnapshotConsumer - *** retry successful ***
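
The log above shows the retry inside SnapshotOnDemandUtilsImpl.getConfiguredSnapshotConsumer recovering from the transient InvalidReplicationFactorException. As a rough sketch of that pattern (only partitionsFor and KafkaException appear in the stack trace; the method name, maxAttempts, and backoffMs are illustrative assumptions, not the actual implementation):

    // Sketch only: retry partitionsFor until the broker can serve the topic metadata.
    // maxAttempts and backoffMs are assumed values, not taken from the PR.
    static List<PartitionInfo> partitionsWithRetry(KafkaConsumer<byte[], byte[]> consumer,
                                                   String topic, int maxAttempts, long backoffMs)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return consumer.partitionsFor(topic);
            } catch (KafkaException e) {
                // transient INVALID_REPLICATION_FACTOR / LEADER_NOT_AVAILABLE while the broker settles;
                // log "*** Retrying... ***" here and back off before the next attempt
                Thread.sleep(backoffMs);
            }
        }
        return Collections.emptyList();
    }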

-        kafkaServerTest.startServer();
+        KafkaServer server = kafkaServerTest.startServer();
+        for (int i = 0; i < MAX_RETRY; i++) {
+            if (server.kafkaController().kafkaScheduler().isStarted()) {
@tkobayas (Contributor, Author) commented on the diff, Mar 18, 2022:

Even if kafkaServerTest.startServer() returns, kafkaController's initialization runs on a different thread, so it may finish slightly later, which causes the failure on a slow Jenkins node. kafkaScheduler().isStarted() seems to be a suitable flag to check, as sketched below.
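
A minimal sketch of that wait, expanding the truncated hunk above (MAX_RETRY and the isStarted() check come from the diff; the helper method name and the sleep interval are assumptions, not necessarily what the PR uses):

    // Illustrative only: poll the controller's scheduler instead of assuming that
    // startServer() returning means the broker is fully initialized.
    static void waitForControllerStartup(KafkaServer server) throws InterruptedException {
        for (int i = 0; i < MAX_RETRY; i++) {
            if (server.kafkaController().kafkaScheduler().isStarted()) {
                return; // controller thread finished its startup work
            }
            Thread.sleep(200); // assumed short back-off between checks
        }
    }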

@tkobayas marked this pull request as ready for review, March 18, 2022 08:24
@tkobayas changed the title from [DO-NOT-MERGE] investigating flaky HACEP test to [DROOLS-6874] investigating flaky HACEP test, Mar 18, 2022
@sonarcloud (bot) commented Mar 18, 2022:

Kudos, SonarCloud Quality Gate passed!

  • 0 Bugs
  • 0 Vulnerabilities
  • 0 Security Hotspots
  • 1 Code Smell
  • No coverage information
  • 0.0% Duplication

@tkobayas (Contributor, Author) commented:

@mareknovotny Please review and merge, thanks!

@mareknovotny (Member) left a review comment:

lgtm

@mareknovotny merged commit 9d764b3 into kiegroup:main, Mar 18, 2022
@mareknovotny (Member) commented:

Please backport this to 7.67.x too.
