feat(storage): improve storage memory #8847
Merged
Summary
In the previous design, a complete copy of the storage data items was kept in memory so that incremental updates could be applied correctly, and the data was synchronized to disk when an opportunity arose. Even if synchronization failed, no rollback was needed.
However, this held memory for a long time and increased overall memory overhead by 20%~30%. If that memory were simply released, a large number of files would have to be read again on the next update in order to pack the data reasonably. This PR therefore introduces the concept of generations (similar to webpack's memory cache).
Each storage update increments the generation, which usually happens at the end of a compilation. Each data item retains the generation in which it was produced, and the generation of a data pack is the maximum generation of all the data items in that pack. When the data in a pack changes, the data is poured out of the pack. Because both the poured data and the changed data must be written to a file anyway, they can be sorted by generation at this point, so that data generated earlier is packed together and is less likely to be modified again.
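The repacking idea above can be illustrated with a minimal sketch. This is not the code in this PR: `Item`, `pack_generation`, and `repack_by_generation` are hypothetical names, and packs are cut by item count here purely for simplicity, whereas a real strategy would more likely split by size.

```rust
/// Hypothetical data item; only the fields needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    key: String,
    /// Generation in which this item was produced.
    generation: usize,
}

/// A pack's generation is the maximum generation of the items it holds.
/// Returns `None` for an empty pack.
fn pack_generation(pack: &[Item]) -> Option<usize> {
    pack.iter().map(|item| item.generation).max()
}

/// Sort poured and changed items by generation (oldest first) and cut them
/// into packs, so that older, more stable items end up together.
/// Assumption: packs are limited by item count, not by byte size.
fn repack_by_generation(mut items: Vec<Item>, max_items_per_pack: usize) -> Vec<Vec<Item>> {
    assert!(max_items_per_pack > 0, "a pack must hold at least one item");
    // `sort_by_key` is stable, so items of the same generation keep their order.
    items.sort_by_key(|item| item.generation);
    items
        .chunks(max_items_per_pack)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With items from generations 5, 1, 4, and 2 and two items per pack, the first pack holds the items from generations 1 and 2, and the second holds those from generations 4 and 5, so a later change to a recent item leaves the older pack untouched.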
Data packs that were not modified in memory but were generated recently (as determined by `fresh_generation`) can also be poured out and included in this optimization to improve its effect. If a data pack's generation is far from the current one (as determined by `release_generation`), the pack has not been modified recently and can be released from memory. Only the contents of the data items in these packs are released, not their keys. When a data item in a released pack is modified, the file must be read again to recover the pack's contents. However, since generations are checked and data is released only after the file has definitely been written, reading the file is usually reliable.

This pull request introduces several changes to the persistent storage and pack management system to support generation tracking and improve asynchronous operations. The most important changes include adding generation tracking to various structures, updating asynchronous handling in `ScopeManager`, and modifying the pack strategy interfaces.

Generation Tracking Enhancements:
- `crates/rspack_core/src/cache/persistent/storage/mod.rs`: Added `fresh_generation` and `release_generation` options to `create_storage`.
- `crates/rspack_storage/src/pack/data/meta.rs`: Added `generation` field to `PackFileMeta` and `ScopeMeta` structs.
- `crates/rspack_storage/src/pack/data/pack.rs`: Introduced `PackGenerations` type and updated `Pack` struct to include `generations`. Added `release` and `is_released` methods to `PackContentsState`.

Asynchronous Handling:
- `crates/rspack_storage/src/pack/manager/mod.rs`: Converted `update_scopes` and `save_scopes` functions to asynchronous. Modified `ScopeManager` to await `update_scopes`.

Pack Strategy Interface Updates:
- `crates/rspack_storage/src/pack/strategy/mod.rs`: Updated `PackReadStrategy` and `PackWriteStrategy` interfaces to handle generations. Added `optimize_scope` method to `ScopeWriteStrategy`.
- `crates/rspack_storage/src/pack/strategy/split/mod.rs`: Modified `SplitPackStrategy` to include `fresh_generation` and `release_generation`. Updated `clean_unused` to `clean`.
- `crates/rspack_storage/src/pack/strategy/split/read_pack.rs`: Enhanced `read_pack_contents` to support reading generations.

These changes collectively enhance the functionality and performance of the storage system by introducing generation tracking and improving asynchronous operations.
Checklist