Conversation

@msfroh
Contributor

@msfroh msfroh commented Jun 5, 2025

Description

The copyFrom method opens a source IndexInput, a destination IndexOutput, and calls copyBytes on the output, passing the input. For a RemoteDirectory, the output may be the remote store, which may try to parallelize the copyBytes implementation by doing a multipart upload across multiple threads.

Lucene's default copyFrom assumes that copyBytes always runs on the calling thread, which lets it open the source file with IOContext.READONCE. Since we can't guarantee that for RemoteDirectory, we must override the Lucene implementation to use the "true" IOContext.

It would arguably be a good idea to force the IndexInput context to be IOContext.DEFAULT, but there is an invariant that assumes the SegmentInfos file is always read with IOContext.READONCE. That should generally be fine, since the SegmentInfos file should never trigger a multipart upload (since it's tiny).
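The resulting context-selection rule can be sketched as follows. This is a simplified illustration, not the actual OpenSearch code: the enum and the contextFor helper are invented stand-ins for Lucene's IOContext and the RemoteDirectory logic.

```java
// Minimal sketch of the rule described above: force READONCE only for the tiny
// SegmentInfos (segments_N) file, which never triggers a multipart upload; all
// other files may be read from upload worker threads, so they keep DEFAULT.
enum IOContextKind { DEFAULT, READONCE }

final class ContextSelection {
    // READONCE assumes the whole read happens on the opening thread, so it is
    // only safe when the upload is guaranteed to stay on that thread.
    static IOContextKind contextFor(String fileName) {
        return fileName.startsWith("segments") ? IOContextKind.READONCE : IOContextKind.DEFAULT;
    }
}
```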

Related Issues

Resolves #15902

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the bug (Something isn't working), Storage (Issues and PRs relating to data and metadata storage), and Storage:Remote labels Jun 5, 2025
@msfroh msfroh force-pushed the iocontext_default_in_remotedirectory_copyfrom branch from 1ad097b to a0f175e on June 5, 2025 at 16:27
@msfroh msfroh changed the title Always use IOContext.DEFAULT for src file in RemoteDirectory's copyFrom Don't force IOContext.READONCE for src file in RemoteDirectory's copyFrom Jun 5, 2025
@github-actions
Contributor

github-actions bot commented Jun 5, 2025

❌ Gradle check result for a0f175e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh msfroh force-pushed the iocontext_default_in_remotedirectory_copyfrom branch from a0f175e to f4bc8ff on June 5, 2025 at 17:10
@github-actions
Contributor

github-actions bot commented Jun 5, 2025

❌ Gradle check result for f4bc8ff: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh msfroh force-pushed the iocontext_default_in_remotedirectory_copyfrom branch 2 times, most recently from 8a67d47 to e70cc01 on June 5, 2025 at 18:26
@msfroh
Contributor Author

msfroh commented Jun 5, 2025

@sachinpkale -- I'd appreciate your eyes on this. I tweaked your fix from #17502 to go back to using READONCE for the segments file, which meant that I needed to disable async uploads for that file.

I'm not sure if that has any performance implications for the s3-repository implementation, since it will end up synchronously uploading that one file.

@github-actions
Contributor

github-actions bot commented Jun 5, 2025

❌ Gradle check result for e70cc01: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh msfroh force-pushed the iocontext_default_in_remotedirectory_copyfrom branch from e70cc01 to 931edf9 on June 5, 2025 at 19:58
@github-actions
Contributor

github-actions bot commented Jun 5, 2025

❌ Gradle check result for 931edf9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

);
assertBusy(() -> assertEquals(0, refreshCountLatch.getCount()));
assertBusy(() -> assertEquals(1, successLatch.getCount()));
assertBusy(() -> assertEquals(0, successLatch.getCount()));
Contributor Author


@vikasvb90 -- I think this was one of your tests. I don't entirely understand why my change caused the value to change.

Member


Looks like the assertion below, that uploads failed > 1, is failing as well.

Contributor


Tagging @linuxpi, the author of these tests, to take a look. But please also see my comment below on whether this is really needed for the remote store.

try {
deleteFile(dest);
} catch (IOException e) {
// Ignore the exception
Contributor


Shall we log the exception, so that we know there could be a leftover file?

Contributor Author


The copyFrom method in Lucene normally swallows this with IOUtils.deleteFilesIgnoringExceptions.

Unfortunately, OpenSearch uses forbiddenApis to prevent use of Lucene's IOUtils.

Contributor

@vikasvb90 vikasvb90 left a comment


We actually don't parallelise copying bytes from a single IndexInput. Here's the code pointer. Two reasons:

  1. IndexInput is a stateful object, just like other InputStream impls, and therefore we can't read bytes from multiple positions in parallel in a single stream.
  2. We need the ability to track each set of bytes read by a stream separately so that we can compute individual checksums.

The way we do it today is to create a new IndexInput for each part of the file upload using a supplier passed to RemoteTransferContainer, track each one independently, and then process them all later for checksum verification.
Therefore, we may not need to parallelise within a single IndexInput.
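The per-part pattern described above can be sketched like this. It is illustrative only: a fresh checksum over a slice of a byte array stands in for the real supplier that hands out a new IndexInput per upload part, so no stateful stream is ever shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Sketch: each upload part gets its own independent reader state (here a fresh
// CRC32 over one slice), mirroring a supplier producing a new IndexInput per
// part, with each part's checksum tracked separately for later verification.
final class MultipartSketch {
    static List<Long> partChecksums(byte[] data, int partSize) {
        List<Long> checksums = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += partSize) {
            int len = Math.min(partSize, data.length - offset);
            CRC32 crc = new CRC32();          // independent state per part
            crc.update(data, offset, len);
            checksums.add(crc.getValue());    // collected for checksum verification
        }
        return checksums;
    }
}
```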

@msfroh
Contributor Author

msfroh commented Jun 11, 2025

Thanks @vikasvb90 ! That makes a lot of sense.

It occurs to me that OCI also must not be trying to do a parallel multipart upload. As I was jumping back and forth between the OpenSearch code, the OCI repository code, and the OCI storage client code, I assumed we were going into this if branch: https://github.com/oracle/oci-java-sdk/blob/master/bmc-objectstorage/bmc-objectstorage-extensions/src/main/java/com/oracle/bmc/objectstorage/transfer/UploadManager.java#L187. But that condition must be false: even though the OCI repository client has set uploadConfiguration.isAllowParallelUploads(), chunkCreator.supportsParallelReads() returns false, precisely because the InputStream doesn't allow parallel reads from multiple positions.

So, the remaining cause for the WrongThreadException is that the OCI storage client still hands the single-threaded upload off to a new thread created here. If it used the current thread instead, I think we'd be fine.
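The failure mode can be reproduced with a small stand-in for a thread-confined input (illustrative only; the class and exception here are invented, standing in for a READONCE-opened input backed by a thread-confined memory segment).

```java
// Stand-in sketch: a resource that, like an input opened with IOContext.READONCE,
// may only be read by the thread that opened it. Any other thread touching it
// fails, which models the WrongThreadException seen when the storage client hands
// the upload off to a new thread.
final class ThreadConfinedInput {
    private final Thread owner = Thread.currentThread();

    int read() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("wrong thread: opened on " + owner.getName());
        }
        return 0; // reads on the opening thread succeed
    }
}
```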

@msfroh
Contributor Author

msfroh commented Jun 11, 2025

That said, @vikasvb90, @linuxpi, and @sachinpkale -- do you see any risks or significant downsides with using IOContext.DEFAULT for these copyFrom calls?

While some remote store implementations (S3, Azure, and GCS) happen to execute the PUT request on the same thread that invoked copyFrom, at least one implementation (OCI) doesn't. While IOContext.READONCE may perform a little bit better when you're sure that the remote store impl keeps things on the current thread, do we want to assume that's true for all remote store implementations?

Maybe we should add a method to BlobContainer so that the impl can declare whether or not it promises to keep IO on the current thread? For the ones that do, we can use IOContext.READONCE, and for the ones that don't, we can use IOContext.DEFAULT.
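That proposal could look roughly like the following. The method name, the default, and the picker are hypothetical, invented for this sketch; none of this is part of the real BlobContainer API.

```java
// Hypothetical capability flag on the remote store abstraction;
// uploadsOnCallingThread is an invented name for illustration.
interface RemoteStoreSketch {
    // true if the store promises to perform all upload IO on the thread that
    // invoked copyFrom; false if the upload may be handed to other threads.
    default boolean uploadsOnCallingThread() {
        return false; // conservative default: assume IO may hop threads
    }
}

enum CopyContext { DEFAULT, READONCE }

final class CopyContextPicker {
    // Pick READONCE only when the store keeps IO on the calling thread.
    static CopyContext pick(RemoteStoreSketch store) {
        return store.uploadsOnCallingThread() ? CopyContext.READONCE : CopyContext.DEFAULT;
    }
}
```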

@vikasvb90
Contributor

Maybe we should add a method to BlobContainer so that the impl can declare whether or not it promises to keep IO on the current thread? For the ones that do, we can use IOContext.READONCE, and for the ones that don't, we can use IOContext.DEFAULT.

Yes, it looks like this is the only option to make multipart upload work in this client with a minimal set of changes.

@vikasvb90
Contributor

But I think a better approach for sequential reads would be to use the same thread. This is also how it was done in the legacy multipart code in the repository-s3 plugin here.


@msfroh I tested the changes with the oci repository plugin and the condition to have READONCE for segments doesn't work. We are still facing the same WrongThreadException in the case of put requests. I was able to run without errors when the IOContext is DEFAULT.


Agreed, the whole fix doesn't work. It only works if I pull in the RemoteDirectory changes.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.



Labels

  • bug: Something isn't working
  • lucene
  • skip-changelog
  • stalled: Issues that have stalled
  • Storage:Remote
  • Storage: Issues and PRs relating to data and metadata storage

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[BUG] Segment-based replication / Remote Store is not compatible with Lucene 9.12 / 10

6 participants