[Proposal] A New GC Strategy for Local Generation #184
Hey 👋!! First of all, thanks for the proposal. I agree the current GC strategy is quite confusing and prone to unexpected behavior if you don't configure it properly, so yeah, it would be great if we can improve it. I have a couple of doubts, though.

Currently, the size/memory check works with an inverse-proportional backoff. The idea is: the lower the size/memory, the greater the timeout (if there is still enough space, we don't want to check the limits that often). On the other hand, if the size/memory is reaching the limit, we want the check to happen more often, so that memory can be released faster. For that reason, you can configure the min and max timeouts for the check.

Based on this, my other question is: how will we ensure the size or memory is checked more often when we are reaching the limit, and less often when we are far from it? Perhaps, as you explain, this is a totally different strategy? But then, how will we ensure the size/memory is checked as soon as we reach the limit? From what I see in the chart, that isn't clear to me.

Anyway, let me know your thoughts. I'm glad to discuss this so we can improve the current logic. Thanks!!
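For reference, the current strategy is driven by options like these. This is an illustrative config sketch: the cache module and values are made up, while the option names are the local adapter's existing ones.

```elixir
# Illustrative values only.
config :my_app, MyApp.Cache,
  gc_interval: :timer.hours(12),
  max_size: 1_000_000,
  # Inverse-proportional backoff: checks run near gc_cleanup_max_timeout
  # while usage is far from max_size, and approach gc_cleanup_min_timeout
  # as usage nears the limit.
  gc_cleanup_min_timeout: :timer.seconds(10),
  gc_cleanup_max_timeout: :timer.minutes(10)
```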
Thank you for your response. I am glad to discuss such an important strategy on this great project with you.

The above proposal does not include a strategy for checking size and memory. For the cache generation, there are two backends. In my experience, most clusters are small, probably fewer than 16 nodes, so a checking interval of 3 seconds may be OK. If the cluster becomes larger, then the checking interval should be specified by the user, maybe 2 minutes or more; perhaps a configuration option for this interval could be added. In a large cluster, even a large checking interval should be acceptable.

In my opinion, reducing extra traffic at peak time may be more important than reducing extra traffic at non-peak time.
Ok, I think I'm understanding the strategy a bit more, but there are still some things I need to understand 😅. Let me first clarify some points that can simplify the problem a lot.
Agreed, even checking the size/memory every second is OK.
So, having said that, if I understand correctly: instead of having a min and max timeout for the size/memory check (`gc_cleanup_min_timeout` / `gc_cleanup_max_timeout`), we would check at a fixed interval and create a new generation as soon as any of the limits is reached?
Yes! Exactly!
Ok, that makes sense. I suggest introducing new config options for this.
I had not considered keeping the old configuration before, so I will take some time to design it. I will probably reply tomorrow.
The configuration proposal: the new configuration for Nebulex should be as simple as possible; the user needs to set only one option. The options for the new strategy are the three `generation_max_*` limits from the proposal (`generation_max_time`, `generation_max_size`, `generation_max_memory`).
If any of the three new options is set, the new strategy is used. This design can be compatible with the old options without any modification, and it also makes the configuration for the new strategy as simple as possible. If the old strategy is deprecated and finally removed, then we can enforce that at least one of the 3 options is required.
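For illustration, a cache opting into the proposed strategy might then be configured like this (cache name and values are invented; any one of the three limits would be enough to enable the new strategy):

```elixir
config :my_app, MyApp.Cache,
  generation_max_time: :timer.minutes(30),  # roll a generation at least every 30 min
  generation_max_size: 500_000,             # ...or once it holds 500k items
  generation_max_memory: 2_000_000_000      # ...or once it uses ~2 GB
```

Leaving all three unset would keep the old strategy, matching the compatibility rule above.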
Ok, some thoughts and doubts:
What is the difference between the new option and the existing one?

I wouldn't say ALL options; let's better say: ALL options except one.
The internal behavior is exactly the same; I suggested the new option for a few reasons.
Ok, that makes sense, but I'd rename it.
OK, I agree to rename it. I am currently on vacation until next weekend; after that, I'll implement this new strategy and make a PR.
Sure, no rush, take your time. Thank you very much for this; looking forward to seeing that PR 👍
@dongfuye whenever you have some time, check this out: https://github.com/nebulex-project/nebulex_local. It is the new local adapter for Nebulex v3, and I think it covers what we have discussed here. Let me know your thoughts about it.
Great! It is perfectly implemented!
I am using Nebulex for local caching, and a comparison of Cachex and Nebulex in LRU mode shows that Nebulex is 5 times faster than Cachex. It is amazing!
I have found a problem: the current GC strategy is hard to understand. If I only set `max_size` and `gc_cleanup_max_timeout`, then the used size will exceed `max_size`, which confused me.

Suppose a normal application using the local cache accesses individual entries according to a normal distribution. The following image shows the used memory:
When Nebulex starts up, the active generation will hold up to `max_size` items, and then `check_size` will create a new generation. After that and before `gc_cleanup_max_timeout`, there is no GC and Nebulex keeps all items, so the size of the kept items can be as large as `upper_limit`. After `gc_cleanup_max_timeout`, the active generation will hold all the items accessed within `gc_cleanup_max_timeout`, and `check_size` will create another new generation and drop the oldest one; then the total size will be `lower_limit`.

So the size of the items held by Nebulex is between `lower_limit` and `upper_limit`.
Cons:

- `upper_limit` and `lower_limit` are hard to estimate
- `gc_cleanup_max_timeout` must be carefully specified
- `upper_limit` - `lower_limit` may be too large, so memory utilization may be low

After deeply investigating the GC strategy, I have designed a better strategy to overcome the above disadvantages. The new strategy is quite simple: it just creates a new generation and drops the oldest one whenever the active generation reaches any limit. We can set 3 limits for the active generation:
- `generation_max_time`: if the active generation exceeds this time limit, then a new generation will be created. If the items of your local cache have an expire time, you can set `generation_max_time` to that expire time.
- `generation_max_size`: if the number of items held by the active generation exceeds this value, then a new generation will be created.
- `generation_max_memory`: if the memory used by the active generation exceeds this value, then a new generation will be created.

You can set only one of these configurations, or all of them. If all three are set, then a new generation is created when any limit is reached.
The memory used by this new strategy can easily be calculated, as the following chart shows. For example, with only `generation_max_size` set and two generations kept (as described above), the number of kept items oscillates between about `generation_max_size`, right after the oldest generation is dropped, and 2 × `generation_max_size`, just before a drop:
![image](https://user-images.githubusercontent.com/118151011/213101424-d5f447c8-7bc4-474f-9cf9-b3f829b6e551.png)
Pros:

- `upper_limit` and `lower_limit` can be easily calculated
- `upper_limit` - `lower_limit` is smaller than with the current strategy

If you are interested in this strategy, I will implement it.