This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-8591: Configurable data source for offloaded messages #2022

Closed
4 tasks
sijie opened this issue Jan 18, 2021 · 1 comment

Comments

@sijie
Member

sijie commented Jan 18, 2021

Original Issue: apache#8591


Currently, when data in Pulsar has been offloaded to the second storage layer, it can still exist in BookKeeper for a period of time, but the client reads it directly from the second layer.

This may lead to several problems:

  • Reading from the second layer has different performance characteristics, which may lead users to wrong estimates if they don't know which layer they are reading from.
  • The second layer may be managed by a team other than the Pulsar management team (for example, an independent HDFS management team), which may apply its own quota or authorization policies to users.
  • The second layer storage can be unbounded in theory; if a user accidentally sets a cursor to the wrong time, it can waste a lot of resources.

So it would be better to make the data source configurable when data exists in both layers.

The following options may be enough:

  • first layer only
  • first layer first
  • second layer only
  • second layer first

We can make "second layer first" the default value, which preserves the current behavior.
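To make the four options concrete, here is a minimal Java sketch of how a broker might pick the read source for a ledger, assuming it knows whether the ledger's data is still in the first layer (BookKeeper) and whether it has been offloaded to the second layer. The names `ReadSourcePolicy`, `StorageLayer`, and `select` are hypothetical and not part of Pulsar's API; returning an empty result when the policy excludes every layer that holds the data is also an assumption about how the "only" options would behave.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed read-source policy; not a Pulsar API.
public class ReadSourceSelector {

    // The storage layers a ledger's data may currently live in.
    enum StorageLayer { FIRST_LAYER, SECOND_LAYER }

    // The four options proposed in the issue.
    enum ReadSourcePolicy {
        FIRST_LAYER_ONLY,
        FIRST_LAYER_FIRST,
        SECOND_LAYER_ONLY,
        SECOND_LAYER_FIRST;

        // Default chosen to match the current behavior: prefer the second layer.
        static final ReadSourcePolicy DEFAULT = SECOND_LAYER_FIRST;
    }

    // Returns the layer to read from, or empty if the policy forbids
    // every layer that currently holds the data (assumed behavior).
    static Optional<StorageLayer> select(ReadSourcePolicy policy,
                                         boolean inFirstLayer,
                                         boolean inSecondLayer) {
        switch (policy) {
            case FIRST_LAYER_ONLY:
                return inFirstLayer ? Optional.of(StorageLayer.FIRST_LAYER) : Optional.empty();
            case SECOND_LAYER_ONLY:
                return inSecondLayer ? Optional.of(StorageLayer.SECOND_LAYER) : Optional.empty();
            case FIRST_LAYER_FIRST:
                if (inFirstLayer) {
                    return Optional.of(StorageLayer.FIRST_LAYER);
                }
                return inSecondLayer ? Optional.of(StorageLayer.SECOND_LAYER) : Optional.empty();
            case SECOND_LAYER_FIRST:
            default:
                if (inSecondLayer) {
                    return Optional.of(StorageLayer.SECOND_LAYER);
                }
                return inFirstLayer ? Optional.of(StorageLayer.FIRST_LAYER) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Data has been offloaded but not yet deleted from BookKeeper: both layers hold it.
        System.out.println(select(ReadSourcePolicy.DEFAULT, true, true));
        System.out.println(select(ReadSourcePolicy.FIRST_LAYER_FIRST, true, true));
        // Data already removed from BookKeeper, but the policy only allows the first layer.
        System.out.println(select(ReadSourcePolicy.FIRST_LAYER_ONLY, false, true));
    }
}
```

When both layers hold the data, the default resolves to the second layer, matching the current behavior, while "first layer first" keeps reads on BookKeeper until the data is deleted there.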

Todo list:

  • publish PIP
  • add configuration
  • implement & test
  • doc
@codelipenghui

Closed via apache#8717
