Enhancing the Scalability of Hummock LSM #20640

Li0k · 2025-02-27T08:02:39Z

Meta Scalability Issues

Single-Core Bottleneck: Meta is bottlenecked on a single core, and scaling cannot relieve it.
Global Lock Problem: Version updates take a global lock. Different compaction groups (cg) do not actually conflict with each other, so the global lock is unnecessary; it leaves the non-conflicting nature of multi-cg commits in the LSM unused.
CPU Overhead: Meta has high CPU overhead, notably the Vec sort in apply_compact_ssts and the HashSet overhead in check_sst_ids_exist. Meta could adopt a more suitable structure, such as a list plus a map index, to avoid a global sort every time a version delta is applied. High CPU usage on meta is also a bottleneck for compaction efficiency.
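
The "list + map index" idea can be sketched roughly as follows. This is a minimal illustration, not RisingWave's actual data structure: `SstInfo`, `IndexedLevel`, and all of their methods are hypothetical names, and a `BTreeMap` stands in for the ordered list. The point is only that applying a compaction result touches the affected entries and never re-sorts the whole level, and that the existence check is a hash lookup.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified SST descriptor (not the real Hummock type).
#[derive(Debug, Clone, PartialEq)]
pub struct SstInfo {
    pub sst_id: u64,
    /// Smallest key covered by this SST; used as the ordering key.
    pub left_key: Vec<u8>,
}

/// One non-overlapping level: an ordered container keyed by `left_key`
/// plus a map index from `sst_id` to that key.
#[derive(Default)]
pub struct IndexedLevel {
    by_key: BTreeMap<Vec<u8>, SstInfo>,
    key_of: HashMap<u64, Vec<u8>>,
}

impl IndexedLevel {
    /// O(log n) insert; the level stays ordered without a global sort.
    pub fn insert(&mut self, sst: SstInfo) {
        // Drop a stale entry if this id was already stored under another key.
        if let Some(old_key) = self.key_of.insert(sst.sst_id, sst.left_key.clone()) {
            self.by_key.remove(&old_key);
        }
        // Drop the index entry of any SST displaced from the same key.
        if let Some(old) = self.by_key.insert(sst.left_key.clone(), sst) {
            self.key_of.remove(&old.sst_id);
        }
    }

    /// Existence check through the map index (expected O(1)).
    pub fn contains(&self, sst_id: u64) -> bool {
        self.key_of.contains_key(&sst_id)
    }

    /// Apply a compaction result: remove the input SSTs, add the outputs.
    /// Returns false and changes nothing if any input SST is missing.
    pub fn apply_compaction(&mut self, removed: &[u64], added: Vec<SstInfo>) -> bool {
        if !removed.iter().all(|id| self.contains(*id)) {
            return false;
        }
        for id in removed {
            if let Some(key) = self.key_of.remove(id) {
                self.by_key.remove(&key);
            }
        }
        for sst in added {
            self.insert(sst);
        }
        true
    }

    /// SST ids in key order, read straight from the ordered container.
    pub fn ordered_ids(&self) -> Vec<u64> {
        self.by_key.values().map(|s| s.sst_id).collect()
    }
}
```

Each delta costs O(k log n) for k changed SSTs instead of a full sort of the level. Whether this actually beats the current Vec-based path would need profiling on real version deltas.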

LSM Scalability Issues

Single LSM Size Limit: The size of a single LSM tree is limited. Once it exceeds the default configured size, the score calculation becomes distorted.
Subtree Splitting Granularity: LSM subtrees can only be split at the table level, not at any finer granularity. As a result, a single LSM utilizes resources less well than multiple LSMs would, and contention for the base level is more severe.

From a personal perspective, an LSM should not have a size limit, and the score-calculating algorithm may need further research. For a single large table, splitting the LSM into subtrees is a better solution. In the short term it is feasible to support subtree splitting at the vnode level, but note that this does not reduce overall compactor utilization. Unreasonable configurations and score-calculating algorithms may also amplify the jitter in write amplification.
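
As a rough illustration of vnode-level splitting, the sketch below divides a table's vnode space into contiguous ranges, one per subtree, and maps a vnode back to the subtree that owns it. The function names and the idea of using equal contiguous ranges are assumptions made for this example; the issue does not specify how the split would be chosen, and 256 in the usage note is only an example vnode count.

```rust
use std::ops::Range;

/// Split `vnode_count` virtual nodes into `parts` contiguous ranges
/// whose sizes differ by at most one. (Hypothetical helper.)
pub fn split_vnodes(vnode_count: usize, parts: usize) -> Vec<Range<usize>> {
    assert!(parts > 0 && parts <= vnode_count, "need 1 <= parts <= vnode_count");
    let base = vnode_count / parts;
    let extra = vnode_count % parts;
    let mut ranges = Vec::with_capacity(parts);
    let mut start = 0;
    for i in 0..parts {
        // The first `extra` ranges take one additional vnode each.
        let len = base + usize::from(i < extra);
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}

/// Index of the subtree whose range contains `vnode`, found by binary
/// search over the sorted, non-overlapping ranges. (Hypothetical helper.)
pub fn subtree_of(ranges: &[Range<usize>], vnode: usize) -> Option<usize> {
    let idx = ranges.partition_point(|r| r.end <= vnode);
    (idx < ranges.len() && ranges[idx].contains(&vnode)).then_some(idx)
}
```

For example, `split_vnodes(256, 3)` yields the ranges `0..86`, `86..171`, and `171..256`, and `subtree_of` routes vnode 86 to the second subtree. Splitting this way only redistributes data across subtrees; as noted above, it does not lower the total compaction work.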

Large-Scale Ingestion Optimization

Adequate Resource Assurance: Sufficient resources are needed to ensure the compaction speed.
New Solution Exploration: Consider whether a new approach could bypass the LSM write amplification problem, for example a better batch ingestion method, an append-only design, or delayed compaction.

Li0k added the type/feature label Feb 27, 2025

github-actions bot added this to the release-2.3 milestone Feb 27, 2025

Li0k added type/enhancement Improvements to existing implementation. storage component/storage Storage and removed storage labels Feb 27, 2025

Li0k modified the milestones: release-2.3, future-release-2.4 Feb 27, 2025
