Replies: 4 comments 6 replies
-
Are there any references to other implementations that support shared storage, and to how they are designed and what trade-offs they make? I think the discussion currently lacks narrative and references, which makes it confusing to me. It would be better to complete the story with more causation and context at the top of the discussion. I will simply list some of my questions here:
Blob storage as well, I think, which is a Tonbo user need. I do not mind discussing any potential solutions here.
How do we estimate the reduction? Are all implementations of KV separation the same in this respect?
Is there any design write-up that shows how large the amount is?
-
Here is an article about Key-Value Separation in LSM Storage Engines.
-
directly using Byte (like actively search for Parquet's support for blob types. Additional relevant information is welcome.
-
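Repost of a Slack comment from @KKould:
Why Tonbo is not a good fit for KV separation
As I understand it, Tonbo is positioned as a low-intrusion storage engine that depends heavily on Parquet. Its benefit is that users can easily generate Parquet files directly from Rust, and it places no strong constraints on Parquet itself. Users can treat Tonbo purely as the data write source in their business code, then use DataFusion, with or without Tonbo, to read and analyze the Parquet files locally or on S3.
KV separation, by contrast, is a feature with a very narrow scope. Take WiscKey, the most mainstream implementation today: it has an extremely large impact on iteration performance, because iterating over keys requires a corresponding random point read for every value. The original paper exploits the strong random-read performance of SSDs and combines it with block prefetching to merge random reads as far as possible, but performance still suffers significantly. Nearly every storage engine in the industry that applies WiscKey is designed solely for extreme large-KV scenarios.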
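To make the iteration cost concrete, here is a minimal in-memory sketch of WiscKey-style KV separation. It is not Tonbo's or WiscKey's actual code: `KvSeparatedStore`, `ValuePtr` and the `vlog_reads` counter are hypothetical names, a `BTreeMap` stands in for the LSM tree, and a `Vec<u8>` stands in for the vLog file.

```rust
use std::collections::BTreeMap;

/// Location of a value inside the value log (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct ValuePtr {
    offset: usize,
    len: usize,
}

/// Toy KV-separated store: the sorted index holds only pointers,
/// the values live in an append-only log.
struct KvSeparatedStore {
    index: BTreeMap<Vec<u8>, ValuePtr>, // stands in for the LSM tree
    vlog: Vec<u8>,                      // stands in for the vLog file
    vlog_reads: usize,                  // counts point reads into the vLog
}

impl KvSeparatedStore {
    fn new() -> Self {
        Self { index: BTreeMap::new(), vlog: Vec::new(), vlog_reads: 0 }
    }

    /// Append the value to the log, then store only its pointer under the key.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        let ptr = ValuePtr { offset: self.vlog.len(), len: value.len() };
        self.vlog.extend_from_slice(value);
        self.index.insert(key.to_vec(), ptr);
    }

    /// Point lookup: one index lookup plus one vLog read.
    fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let ptr = *self.index.get(key)?;
        Some(self.read_value(ptr))
    }

    /// One read into the value log; on disk this would be a random read.
    fn read_value(&mut self, ptr: ValuePtr) -> Vec<u8> {
        self.vlog_reads += 1;
        self.vlog[ptr.offset..ptr.offset + ptr.len].to_vec()
    }

    /// Range scan: keys come back sorted, but every value needs its own
    /// vLog read, and log order follows insertion order, not key order.
    fn scan(&mut self) -> Vec<(Vec<u8>, Vec<u8>)> {
        let entries: Vec<(Vec<u8>, ValuePtr)> =
            self.index.iter().map(|(k, p)| (k.clone(), *p)).collect();
        entries
            .into_iter()
            .map(|(k, p)| {
                let v = self.read_value(p);
                (k, v)
            })
            .collect()
    }
}
```

Scanning N keys here costs N reads into the log at offsets unrelated to key order, which is the effect the comment describes. Prefetching can overlap those reads but does not remove them.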
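TerarkDB is a special case: it has a dedicated vSSTable that optimizes specifically for large KVs. (TerarkDB itself supports several different SSTable types, which brings considerable complexity, for example how to choose the SSTable type when large and small KVs are mixed.) Still, among large-KV approaches this is the only low-intrusion one that suits Tonbo.
With any approach other than this variable-SSTable one, Tonbo would inevitably face a second major refactor, and its read/write performance would no longer fit the scenarios described above. It might not even be suitable to implement on Parquet, because Parquet itself has no good solution here and a good KV separation design on top of it is very hard. (The exception is a column that records only a URL or an offset, which is an extreme hack: users who read the block would not know what it means, and reading the Parquet file would be forcibly coupled to Tonbo.)
Even if we follow the BlobDB approach and build a separate KV-separated version, and even if we accept the ugly URL-column approach just mentioned, keeping features in sync would add maintenance cost, especially when the implementations are not consistent.
Approaches for implementing Blob in Tonbo
Solution for Loro's Blob type requirement
Requirement: Loro needs a Blob type to store large objects, so Tonbo needs Blob support to serve it.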
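For readers unfamiliar with the "URL or offset column" idea, a rough sketch of what such a column would hold is below. All names (`BlobRef`, `Row`, `BlobFiles`, the `s3://` URL shape) are hypothetical and only illustrate the coupling problem; nothing here is an existing Tonbo or Parquet API.

```rust
use std::collections::HashMap;

/// What the blob column would actually contain in the Parquet file:
/// a pointer, not the bytes themselves (hypothetical encoding).
#[derive(Debug, Clone, PartialEq)]
enum BlobRef {
    Url(String),
    Offset { file_id: u32, offset: u64, len: u64 },
}

/// A row as a plain Parquet reader would see it.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: String,
    blob: BlobRef,
}

/// Stand-in for the engine-side blob storage that only the engine knows about.
struct BlobFiles {
    by_url: HashMap<String, Vec<u8>>,
    by_id: HashMap<u32, Vec<u8>>,
}

impl BlobFiles {
    /// Turn a pointer back into bytes. Without this engine-specific step,
    /// a reader of the Parquet file has only the pointer.
    fn resolve(&self, blob: &BlobRef) -> Option<&[u8]> {
        match blob {
            BlobRef::Url(url) => self.by_url.get(url).map(|b| b.as_slice()),
            BlobRef::Offset { file_id, offset, len } => {
                let file = self.by_id.get(file_id)?;
                let start = *offset as usize;
                let end = start.checked_add(*len as usize)?;
                file.get(start..end) // None if the range is out of bounds
            }
        }
    }
}
```

A tool such as DataFusion reading this file directly would get `Row { key, blob: BlobRef::Url(..) }` and nothing more. Getting the payload requires something equivalent to `resolve`, which is the forced coupling the comment objects to.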
-
Tonbo needs to support blob storage, so we need to refer to WiscKey for the implementation.
What benefits and problems would implementing WiscKey bring to Tonbo now?
😀
😅
BlobDB, but if supported in the existing version, it will inevitably reduce performance.
Value will be involved, which may result in reduced sequential read performance.
WiscKey requires modifying a large amount of code and requires the vLog component. Is the benefit brought by this feature valuable?
vLog manage files on S3? The randomness of data location may cause files on S3 to be modified frequently, while only a small number of values are modified in each file.
ref