
@varunbharadwaj
Contributor

@varunbharadwaj varunbharadwaj commented Oct 25, 2025

Description

This PR refactors the pull-based indexing flow to support message mappers. A default message mapper retains the current behavior, and a new raw payload mapper supports ingesting raw payloads from any streaming source.

In raw payload mode, the Kafka offset or Kinesis sequence number will be used as the document ID, so rewinding or replaying the stream will not create duplicate documents. Document versioning will not be supported, so only an eventually consistent view of documents can be expected on message replay: an older message can overwrite a newer one until the consumer catches up with the lag. This will be an append-only indexing mode.
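A minimal Java sketch of the raw payload mode, assuming a hypothetical `MessageMapper` interface, hypothetical `PolledMessage` and `IndexOp` records, and an in-memory `ShardSketch` standing in for a shard. None of these names are the actual `IngestionMessageMapper` API from this PR, and the default mapper (which parses the existing message format) is omitted. The sketch only illustrates why using the stream pointer as the document ID makes rewind/replay idempotent: a replayed message maps to an ID that already exists, so it overwrites instead of duplicating.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message shape: a stream pointer (Kafka offset or Kinesis
// sequence number, rendered as a string) plus the raw payload bytes.
record PolledMessage(String pointer, byte[] payload) {}

// Hypothetical mapper output: the document ID and its JSON source.
record IndexOp(String id, String source) {}

// Hypothetical mapper abstraction; not the real IngestionMessageMapper API.
interface MessageMapper {
    IndexOp map(PolledMessage message);
}

// Raw payload mode (sketch): the stream pointer becomes the document ID and
// the payload is indexed as-is, assuming it is already a JSON document.
final class RawPayloadMapper implements MessageMapper {
    @Override
    public IndexOp map(PolledMessage message) {
        String source = new String(message.payload(), StandardCharsets.UTF_8);
        return new IndexOp(message.pointer(), source);
    }
}

// Stand-in for one shard consuming one partition. Kafka offsets are only
// unique within a partition, so the sketch keeps one partition per shard.
final class ShardSketch {
    private final Map<String, String> docs = new LinkedHashMap<>();
    private final MessageMapper mapper;

    ShardSketch(MessageMapper mapper) {
        this.mapper = mapper;
    }

    void consume(List<PolledMessage> batch) {
        for (PolledMessage message : batch) {
            IndexOp op = mapper.map(message);
            // Write by ID with no version check: a replayed message hits the
            // same ID and overwrites the existing document, never duplicating it.
            docs.put(op.id(), op.source());
        }
    }

    int docCount() {
        return docs.size();
    }

    String get(String id) {
        return docs.get(id);
    }
}
```

For example, consuming a batch with offsets `0` and `1`, then rewinding and consuming the same batch again, still leaves two documents. Because `put` overwrites unconditionally, the sketch has no version check, in line with the statement that document versioning is not supported in this mode.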

This model should make it straightforward to support other formats in the future as needed.

Related Issues

Resolves #19548

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing labels Oct 25, 2025
@varunbharadwaj varunbharadwaj force-pushed the vb/mappersupport branch 2 times, most recently from 3fab3ae to 72773f3 October 25, 2025 05:10
@varunbharadwaj varunbharadwaj changed the title [Pull-based Ingestion] Support message mappers to support different input formats [Pull-based Ingestion] Support message mappers to support different input formats and raw payloads Oct 25, 2025
@github-actions
Contributor

❌ Gradle check result for 72773f3: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 72773f3: FAILURE

@github-actions
Contributor

✅ Gradle check result for 85dae2e: SUCCESS

@github-actions
Contributor

❌ Gradle check result for e9e9f3f: FAILURE

@github-actions
Contributor

✅ Gradle check result for e9e9f3f: SUCCESS

@codecov

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 92.98246% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.21%. Comparing base (d5aa830) to head (881989d).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...g/opensearch/cluster/metadata/IngestionSource.java | 75.00% | 0 Missing and 2 partials ⚠️ |
| .../pollingingest/mappers/IngestionMessageMapper.java | 88.23% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19765      +/-   ##
============================================
- Coverage     73.23%   73.21%   -0.03%     
+ Complexity    71544    71539       -5     
============================================
  Files          5786     5789       +3     
  Lines        327013   327056      +43     
  Branches      47284    47288       +4     
============================================
- Hits         239498   239462      -36     
- Misses        68245    68291      +46     
- Partials      19270    19303      +33     


@github-actions
Contributor

❌ Gradle check result for 6591f14: FAILURE

@github-actions
Contributor

✅ Gradle check result for 6591f14: SUCCESS

@varunbharadwaj
Contributor Author

Looks like the disabled assertions have something to do with the new unit test file (my guess is that it's related to SuppressWarnings). I have temporarily deleted the file for testing, so this PR is not ready for review/merging yet.
I will update the file and send it for review.
cc @andrross

@github-actions
Contributor

❌ Gradle check result for 8490b5f: FAILURE

@github-actions
Contributor

❌ Gradle check result for 83797ef: FAILURE

@github-actions
Contributor

❌ Gradle check result for eb62bc7: FAILURE

@varunbharadwaj
Contributor Author

Looks like the problem was another test disabling assertions. Updating it seems to have solved the problem. Thanks @andrross for finding the culprit test!

@github-actions
Contributor

❌ Gradle check result for 881989d: null

@github-actions
Contributor

✅ Gradle check result for 881989d: SUCCESS

@andrross andrross merged commit 6966267 into opensearch-project:main Nov 14, 2025
32 of 34 checks passed
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Dec 5, 2025
…nput formats and raw payloads (opensearch-project#19765)

* refactor pull-based ingestion to support message mappers
* remove DefaultAssertionStatus false from segment warmer test

Signed-off-by: Varun Bharadwaj <[email protected]>

Labels

enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing

Development

Successfully merging this pull request may close these issues.

[Feature Request] Make pull-based ingestion work with OTel collector out of the box

3 participants