[META] Improve Data Freshness for Remote Store Indexes #17776

@ashking94

Description

Please describe the end goal of this project

This META issue tracks the implementation of improvements to data freshness for remote store indexes. The main goal is to minimize the delay between data updates and their appearance in search results.

Key Components

1. Parallel Shard Refreshes

  • Implement shard-level refresh scheduling
  • Develop staggered refresh mechanism across index shards
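
As a rough illustration of the staggering idea above, the sketch below spreads the refresh start times of an index's shards evenly across one refresh interval. The function name `staggered_offsets` and the even-spacing rule are assumptions made for illustration only; the issue does not specify how offsets would be chosen.

```python
def staggered_offsets(num_shards: int, refresh_interval_s: float) -> list[float]:
    """Return one start offset (in seconds) per shard.

    Hypothetical helper: shard i starts its refresh i * (interval / num_shards)
    seconds into each interval, so refreshes do not all fire at the same instant.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    step = refresh_interval_s / num_shards
    return [i * step for i in range(num_shards)]


if __name__ == "__main__":
    # Four shards with a 1 s interval start 0.25 s apart.
    print(staggered_offsets(4, 1.0))
```

With even spacing, some shard has refreshed within roughly `interval / num_shards` of any point in time, which is one plausible way to reduce the worst-case staleness seen across the index.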

2. Deterministic Refresh Interval

  • Modify refresh process to account for actual refresh duration
  • Compute the sleep before the next refresh from the configured interval and the measured refresh duration
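
One way to read the two points above: if a refresh takes 250 ms and the interval is 1 s, the scheduler sleeps 750 ms rather than a full second, so refreshes start at a fixed cadence. The sketch below assumes that interpretation; `next_sleep_s` and `refresh_loop` are hypothetical names, not existing OpenSearch code.

```python
import time


def next_sleep_s(interval_s: float, refresh_took_s: float) -> float:
    """Sleep only for the part of the interval the refresh did not consume."""
    return max(0.0, interval_s - refresh_took_s)


def refresh_loop(refresh, interval_s, iterations,
                 clock=time.monotonic, sleep=time.sleep):
    """Run `refresh` repeatedly, aiming to start one refresh per interval.

    `clock` and `sleep` are injectable so the loop can be exercised
    without real waiting.
    """
    for _ in range(iterations):
        started = clock()
        refresh()
        elapsed = clock() - started
        sleep(next_sleep_s(interval_s, elapsed))
```

If a refresh runs longer than the interval, this sketch starts the next one immediately instead of trying to catch up; whether that is the desired policy is an open design choice.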

3. Multipart Operations

  • Implement multipart downloads for segments
  • Leverage the existing multipart upload support in the repository-s3 plugin
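
A minimal sketch of what a multipart download could look like: split the segment file into fixed-size byte ranges, fetch the ranges concurrently, and reassemble them in order. `fetch_range` is a hypothetical callable standing in for a ranged read against the remote store; it is not a real repository API, and the part size and worker count are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def part_ranges(size_bytes: int, part_size: int) -> list[tuple[int, int]]:
    """Split [0, size_bytes) into inclusive (start, end) byte ranges."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (start, min(start + part_size, size_bytes) - 1)
        for start in range(0, size_bytes, part_size)
    ]


def download_multipart(fetch_range, size_bytes: int, part_size: int,
                       max_workers: int = 4) -> bytes:
    """Fetch all parts concurrently and join them in range order.

    `fetch_range(start, end)` is assumed to return the bytes for the
    inclusive range [start, end].
    """
    ranges = part_ranges(size_bytes, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so parts line up.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)
```

A real implementation would more likely stream each part to disk than hold the whole segment in memory; the in-memory join keeps the example short.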

4. Segment Management Improvements

  • Decouple segment uploads from local refreshes
  • Decouple segment merges from replication
  • Implement merge policy for prioritizing newer segments

5. Hybrid Block Level Fetch

  • Implement block-level fetch from primary/remote store
  • Develop caching mechanism for fetched blocks
  • Add configuration for data freshness preferences
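
To make the block-level fetch and caching points concrete, here is a small sketch of a read path that pulls fixed-size blocks on demand and keeps the most recently used ones in memory. Everything here is assumed for illustration: `BlockCache`, the LRU eviction policy, and `fetch_block`, a hypothetical callable that would read block `i` from either the primary or the remote store.

```python
from collections import OrderedDict


class BlockCache:
    """Serve byte-range reads from fixed-size blocks with LRU eviction."""

    def __init__(self, fetch_block, block_size: int, max_blocks: int):
        self._fetch_block = fetch_block   # hypothetical: index -> bytes
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()      # block index -> bytes, oldest first

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        first = offset // self._block_size
        last = (offset + length - 1) // self._block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        skip = offset - first * self._block_size
        return data[skip:skip + length]

    def _block(self, index: int) -> bytes:
        if index in self._blocks:
            self._blocks.move_to_end(index)   # mark as most recently used
            return self._blocks[index]
        block = self._fetch_block(index)
        self._blocks[index] = block
        if len(self._blocks) > self._max_blocks:
            self._blocks.popitem(last=False)  # evict least recently used
        return block
```

The freshness preference mentioned above could, under this sketch, decide which source `fetch_block` reads from, but that routing is left out because the issue does not describe it.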

Required Analysis

  • Performance impact assessment of parallel refreshes
  • Trade-off analysis for proposed solutions
  • Implementation complexity evaluation
  • Backward compatibility assessment
  • Resource utilization measurement

Supporting References

#17657

Issues

Related component

Storage:Remote
