ISSUE-8591: Configurable data source for offloaded messages #2022

sijie · 2021-01-18T01:36:13Z

Currently, once data in Pulsar has been offloaded to the second storage layer, it can still exist in BookKeeper for a period of time, but the client reads it directly from the second layer.

This may lead to several problems:

Reads from the second layer have different performance characteristics, which may lead users to wrong estimates if they do not know which layer they are reading from.
The second layer may be managed by a team other than the Pulsar management team (for example, an independent HDFS management team), which may apply its own quota or authorization policies to users.
The second layer's storage can be unbounded in theory, so if a user accidentally resets a cursor to a wrong time, it can waste a lot of resources.

So it would be better to make the data source configurable when data exists in both layers.

Maybe the options below are enough:

first layer only
first layer first
second layer only
second layer first

We can make second layer first the default value, which results in the same behavior as the current version.
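
To make the four options concrete, here is a minimal sketch of how a read path might pick a layer. The names `ReadPriority` and `choose_source`, and the layer labels, are hypothetical and only illustrate the proposal; they are not Pulsar's actual API or configuration keys.

```python
from enum import Enum
from typing import Optional


class ReadPriority(Enum):
    """Hypothetical names for the four options proposed in this issue."""
    FIRST_LAYER_ONLY = "first-layer-only"
    FIRST_LAYER_FIRST = "first-layer-first"
    SECOND_LAYER_ONLY = "second-layer-only"
    SECOND_LAYER_FIRST = "second-layer-first"


# Proposed default: keeps the current behavior (read offloaded data from the second layer).
DEFAULT_PRIORITY = ReadPriority.SECOND_LAYER_FIRST


def choose_source(priority: ReadPriority,
                  in_bookkeeper: bool,
                  offloaded: bool) -> Optional[str]:
    """Return the layer to read from, or None if the policy forbids every available copy."""
    first = "bookkeeper" if in_bookkeeper else None
    second = "tiered-storage" if offloaded else None

    if priority is ReadPriority.FIRST_LAYER_ONLY:
        return first
    if priority is ReadPriority.FIRST_LAYER_FIRST:
        return first or second
    if priority is ReadPriority.SECOND_LAYER_ONLY:
        return second
    # SECOND_LAYER_FIRST: prefer the offloaded copy, fall back to BookKeeper.
    return second or first
```

With the "only" variants, a ledger that exists solely in the other layer is not readable, which is what would let operators enforce the quota and ownership boundaries described above.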

Todo list:

publish PIP
add configuration
implement & test
doc


codelipenghui · 2021-01-19T03:21:40Z

close via apache#8717

sijie added component/tieredstorage triage/week-49 type/feature labels Jan 18, 2021

codelipenghui closed this as completed Jan 19, 2021
