Skip to content

Conversation

@masseyke
Copy link
Member

The GeoIpDownloader has some race conditions (documented in #92888) that were made more likely to occur by #92335 since it's now more common to enable and disable the GeoIpDownloader in tests. This PR works around the race condition by adding a geoip processor that would be unnecessary without the race condition.
I don't have a great way to reproduce this. I ran it ~700 times with and without my fix using something like:

./gradlew ':modules:ingest-geoip:internalClusterTest' --tests "org.elasticsearch.ingest.geoip.GeoIpDownloaderIT.testGeoIpDatabasesDownloadNoGeoipProcessors" -Dtests.iters=300 -Dtests.locale=en-IE -Dtests.timezone=Europe/Samara -Druntime.java=17

Without my fix I saw it fail twice. With my fix I have not seen any failures. The failure is:

java.lang.NullPointerException: Cannot invoke "org.elasticsearch.ingest.geoip.GeoIpTaskState.getDatabases()" because "state" is null
	at org.elasticsearch.ingest.geoip.GeoIpDownloaderIT.lambda$cleanUp$0(GeoIpDownloaderIT.java:127)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1109)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1094)
	at org.elasticsearch.ingest.geoip.GeoIpDownloaderIT.cleanUp(GeoIpDownloaderIT.java:123)
	at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
	at java.base/java.lang.Thread.run(Thread.java:833)

@masseyke masseyke added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.7.0 labels Jan 30, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jan 30, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@masseyke masseyke merged commit 4e507ba into elastic:main Jan 31, 2023
@masseyke masseyke deleted the fix/GeoIpDownloaderIT-race-conditions branch January 31, 2023 13:34
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue Team:Data Management Meta label for data/management team v8.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants