-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Fix issues with ReinitializingSourceProvider #118370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
bf75a92
5e046ce
66e09b6
f8aee02
ead8717
653ed40
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pr: 118370 | ||
| summary: Fix concurrency issue with `ReinitializingSourceProvider` | ||
| area: Mapping | ||
| type: bug | ||
| issues: | ||
| - 118238 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,13 +15,23 @@ | |
| import java.util.function.Supplier; | ||
|
|
||
| /** | ||
| * This is a workaround for when compute engine executes concurrently with data partitioning by docid. | ||
| * This class exists as a workaround for using SourceProvider in the compute engine. | ||
| * <p> | ||
| * The main issue is when compute engine executes concurrently with data partitioning by docid (inter segment parallelization). | ||
| * A {@link SourceProvider} can only be used by a single thread and this wrapping source provider ensures that each thread uses | ||
| * its own {@link SourceProvider}. | ||
| * <p> | ||
| * Additionally, this source provider protects against going backwards, which the synthetic source provider can't handle. | ||
| */ | ||
| final class ReinitializingSourceProvider implements SourceProvider { | ||
|
|
||
| private PerThreadSourceProvider perThreadProvider; | ||
| private final Supplier<SourceProvider> sourceProviderFactory; | ||
|
|
||
| // Keeping track of last seen doc and if current doc is before last seen doc then source provider is initialized: | ||
| // (when source mode is synthetic then _source is read from doc values and doc values don't support going backwards) | ||
| private int lastSeenDocId; | ||
|
|
||
| ReinitializingSourceProvider(Supplier<SourceProvider> sourceProviderFactory) { | ||
| this.sourceProviderFactory = sourceProviderFactory; | ||
| } | ||
|
|
@@ -30,11 +40,12 @@ final class ReinitializingSourceProvider implements SourceProvider { | |
| public Source getSource(LeafReaderContext ctx, int doc) throws IOException { | ||
| var currentThread = Thread.currentThread(); | ||
| PerThreadSourceProvider provider = perThreadProvider; | ||
| if (provider == null || provider.creatingThread != currentThread) { | ||
| if (provider == null || provider.creatingThread != currentThread || doc < lastSeenDocId) { | ||
| provider = new PerThreadSourceProvider(sourceProviderFactory.get(), currentThread); | ||
| this.perThreadProvider = provider; | ||
| } | ||
| return perThreadProvider.source.getSource(ctx, doc); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should use
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did try this, but then I ran into this assertion failure: So then I went with this approach.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But thinking more about this, just using the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I pushed: 653ed40 The assertion that tripped here is because of another reason. The linked test failure failed because different threads using the same stored field readers or doc values instances (for the latter case, I observed this locally). This assertion failure here is because we go backwards with the current docid. I added protection against that in this commit, which reinitializes source provider if last seen docid is lower than the current docid.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was able to reproduce this failure by running: Without the commit this fails, with the commit the executes successfully. |
||
| lastSeenDocId = doc; | ||
| return provider.source.getSource(ctx, doc); | ||
| } | ||
|
|
||
| private record PerThreadSourceProvider(SourceProvider source, Thread creatingThread) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test adjustments were required to make it much more likely that the test failure as was reported in #118238 would reproduce locally. Without the fix to
ReinitializingSourceProviderthis test should fail after a few iterations, with the fix toReinitializingSourceProviderthis test failure reproduces.