Skip to content

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Dec 10, 2024

The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough. The read from perThreadProvider field could be stale and therefore returning a previous source provider. Instead the source provider should be returned from provider local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

The ReinitializingSourceProvider was introduced via #117792

Closes #118238

The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.
When multiple threads access `ReinitializingSourceProvider` the simple thread accounting could still result in returned `SourceProvider` being used by multiple threads concurrently.

The ReinitializingSourceProvider was introduced via elastic#117792

Closes elastic#118238
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Settings.Builder settings = indexSettings(1, 0).put(indexSettings()).put("index.mapping.source.mode", sourceMode);
client().admin().indices().prepareCreate("test-script").setMapping(mapping).setSettings(settings).get();
for (int i = 0; i < 10; i++) {
int numDocs = 256;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test adjustments were required to make it much more likely that the test failure as was reported in #118238 would reproduce locally. Without the fix to ReinitializingSourceProvider this test should fail after a few iterations, with the fix to ReinitializingSourceProvider this test failure reproduces.

provider = new PerThreadSourceProvider(sourceProviderFactory.get(), currentThread);
this.perThreadProvider = provider;
}
return perThreadProvider.source.getSource(ctx, doc);
Copy link
Member

@dnhatn dnhatn Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use provider not this.perThreadProvider here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try this, but then I ran into this assertion failure:

java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([89DC8DB16062966D]:0)
        at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757)
        at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62)
        at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142)
        at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
        at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:37)
        at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
        at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
        at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
        at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
        at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
        at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26)
        at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534)
        at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264)
        at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148)
        at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

So then I went with this approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But thinking more about this, just using the provider local variable should fix the issue. This is why I also went for it initially. Maybe there is something wrong elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed: 653ed40

The assertion that tripped here is because of another reason. The linked test failure failed because different threads using the same stored field readers or doc values instances (for the latter case, I observed this locally). This assertion failure here is because we go backwards with the current docid. I added protection against that in this commit, which reinitializes source provider if last seen docid is lower than the current docid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to reproduce this failure by running: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.Esql*IT.testScriptField" -Dtests.iters=128

Without the commit this fails, with the commit the executes successfully.

…using doc values APIs when source mode is synthetic:

```
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([BA10D1FF912D808]:0)
        at org.apache.lucene.tests.index.AssertingLeafReader$AssertingNumericDocValues.advanceExact(AssertingLeafReader.java:757)
        at org.apache.lucene.index.SingletonSortedNumericDocValues.advanceExact(SingletonSortedNumericDocValues.java:62)
        at org.elasticsearch.index.mapper.SortedNumericDocValuesSyntheticFieldLoader$ImmediateDocValuesLoader.advanceToDoc(SortedNumericDocValuesSyntheticFieldLoader.java:142)
        at org.elasticsearch.index.mapper.ObjectMapper$SyntheticSourceFieldLoader$ObjectDocValuesLoader.advanceToDoc(ObjectMapper.java:965)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.write(SourceLoader.java:210)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$SyntheticLeaf.source(SourceLoader.java:181)
        at org.elasticsearch.index.mapper.SourceLoader$Synthetic$LeafWithMetrics.source(SourceLoader.java:146)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:58)
        at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
        at org.elasticsearch.xpack.esql.plugin.ReinitializingSourceProvider.getSource(ReinitializingSourceProvider.java:41)
        at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
        at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
        at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
        at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
        at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
        at org.elasticsearch.index.fielddata.LongScriptDocValues.advanceExact(LongScriptDocValues.java:26)
        at org.elasticsearch.search.MultiValueMode$6.advanceExact(MultiValueMode.java:534)
        at org.elasticsearch.index.fielddata.FieldData$17.advanceExact(FieldData.java:620)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.getValueForDoc(LongComparator.java:80)
        at org.apache.lucene.search.comparators.LongComparator$LongLeafComparator.copy(LongComparator.java:105)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.collectAnyHit(TopFieldCollector.java:124)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:209)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:305)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:264)
        at org.elasticsearch.compute.lucene.LuceneOperator$LuceneScorer.scoreNextRange(LuceneOperator.java:193)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.collect(LuceneTopNSourceOperator.java:168)
        at org.elasticsearch.compute.lucene.LuceneTopNSourceOperator.getCheckedOutput(LuceneTopNSourceOperator.java:148)
        at org.elasticsearch.compute.lucene.LuceneOperator.getOutput(LuceneOperator.java:118)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)
```
@martijnvg martijnvg added the auto-backport Automatically create backport pull requests when merged label Dec 11, 2024
@martijnvg martijnvg changed the title Fix concurrency issue with ReinitializingSourceProvider Fix issues with ReinitializingSourceProvider Dec 11, 2024
@martijnvg martijnvg merged commit e3bddd0 into elastic:main Dec 11, 2024
16 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes elastic#118238
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.17
8.x

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes elastic#118238
elasticsearchmachine pushed a commit that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes #118238
elasticsearchmachine pushed a commit that referenced this pull request Dec 11, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.  The read from `perThreadProvider` field could be stale and therefore returning a previous source provider.  Instead the source provider should be returned from `provider` local variable.

This change also addresses another issue, sometimes current docid goes backwards compared to last seen docid and this causes issue when synthetic source provider is used, as doc values can't advance backwards. This change addresses that by returning a new source provider if backwards docid is detected.

Closes #118238

Co-authored-by: Elastic Machine <[email protected]>
dnhatn added a commit that referenced this pull request Feb 8, 2025
dnhatn added a commit that referenced this pull request Feb 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.17.1 v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] EsqlActionBreakerIT class failing

4 participants