
[Proposal] A New GC Strategy for Local Generation #184

Open
dongfuye opened this issue Jan 18, 2023 · 14 comments

Comments

@dongfuye
Contributor

dongfuye commented Jan 18, 2023

I am using Nebulex for a local cache, and a comparison of Cachex and Nebulex under LRU shows that Nebulex is 5 times faster than Cachex. It is amazing!

I have found that the current GC strategy is hard to understand. If I only set max_size and gc_cleanup_max_timeout, then the used size exceeds max_size, which confused me.

Suppose a typical application using the local cache accesses individual entries according to a normal distribution. The following image shows the memory used:

[figure: memory used over time under the current strategy]

When Nebulex starts up, the active generation holds up to max_size items, and then check_size creates a new generation. After that, and before gc_cleanup_max_timeout elapses, there is no GC and Nebulex keeps all items, so the size of the kept items can grow as large as upper_limit. After gc_cleanup_max_timeout, the active generation holds all the items accessed within gc_cleanup_max_timeout; check_size then creates another new generation and drops the oldest one, bringing the total size down to lower_limit.

So the size of the items held by Nebulex oscillates between lower_limit and upper_limit.

Cons:

  • It is difficult to understand and calculate upper_limit and lower_limit
  • Both max_size and gc_cleanup_max_timeout must be carefully specified
  • upper_limit - lower_limit may be too large, so memory utilization may be low
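For reference, the current strategy is driven by options like these in the cache configuration (a sketch; the cache module name and all values are illustrative, not recommendations):

```elixir
# Sketch of the current-strategy options for the Nebulex local adapter.
# MyApp.Cache and the values below are illustrative only.
config :my_app, MyApp.Cache,
  gc_interval: :timer.hours(12),               # time-based generation rotation
  max_size: 1_000_000,                         # size limit checked by the cleanup task
  allocated_memory: 2_000_000_000,             # memory limit in bytes
  gc_cleanup_min_timeout: :timer.seconds(10),  # shortest interval between limit checks
  gc_cleanup_max_timeout: :timer.minutes(10)   # longest interval between limit checks
```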

After investigating the GC strategy in depth, I have designed a better strategy that overcomes the above disadvantages. The new strategy is quite simple: it creates a new generation and drops the oldest generation whenever the active generation reaches any limit. We can set 3 limits for the active generation:

  • generation_max_time: If the active generation exceeds this time limit, a new generation is created. If the items in your local cache have an expiration time, you can set generation_max_time to that expiration time.
  • generation_max_size: If the number of items held by the active generation exceeds this value, a new generation is created.
  • generation_max_memory: If the memory used by the active generation exceeds this value, a new generation is created.

You can set only one of these configurations or all of them. If all three are set, then a new generation is created when any limit is reached.
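As a sketch, the proposed configuration could look like this (these option names do not exist in Nebulex yet; they only illustrate the proposal, and the values are arbitrary):

```elixir
# Hypothetical configuration for the proposed strategy.
# None of these options exist yet; names and values illustrate the idea.
config :my_app, MyApp.Cache,
  generation_max_time: :timer.minutes(10),  # rotate after 10 minutes at the latest
  generation_max_size: 500_000,             # rotate once 500k entries are held
  generation_max_memory: 2_000_000_000      # rotate once ~2 GB is used
```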

The memory used by this new strategy can easily be calculated, as shown below:
[figure: memory used over time under the proposed strategy]

Pros:

  • The configuration is easy to understand
  • It is easy to calculate upper_limit and lower_limit
  • The 3 limits can be set individually, with no complicated relations between them.
  • upper_limit - lower_limit is smaller than with the current strategy.
  • It is easy to implement and maintain

If you are interested in this strategy, I will implement it.

@cabol
Owner

cabol commented Jan 19, 2023

Hey 👋 !! First of all, thanks for the proposal, I agree the current GC strategy is quite confusing and prone to unexpected behavior if you don't configure it properly, so yeah, it would be great if we can improve it.

I have a couple of doubts though. Regarding generation_max_time, you mention "If the items of our local cache has expire time, then you can set generation_max_time to the expire time", but cache entries may have different expiration times, so I think that makes it very complicated; perhaps I misunderstood it?

Currently, the size/memory check works using an inverse-proportional backoff. The idea is: the lower the size/memory, the greater the timeout (if there is still enough space, we don't want to check the limits that often). On the other hand, if the size/memory is reaching the limit, we want the timeout to fire more often so that memory can be released faster. For that reason, you can configure gc_cleanup_min_timeout and gc_cleanup_max_timeout; with those values, along with max_size and allocated_memory, an inverse linear function is built to calculate the next timeout. This is basically how the GC works for the size/memory check. Something like:

[figure: inverse linear function mapping size/memory usage to the next cleanup timeout]
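The inverse linear interpolation described above can be sketched roughly like this (an assumed shape for illustration, not the exact Nebulex implementation; the module and function names are hypothetical):

```elixir
# Sketch: the fuller the cache, the shorter the next cleanup timeout.
# Linear interpolation between max_timeout (empty) and min_timeout (full).
defmodule CleanupTimeout do
  def next(size, max_size, min_timeout, max_timeout) do
    ratio = min(size / max_size, 1.0)
    round(max_timeout - (max_timeout - min_timeout) * ratio)
  end
end

# With min_timeout 10s and max_timeout 600s:
# an empty cache waits the full 600s before the next check,
# a full cache checks again after only 10s.
```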

Based on this, my other question is: how will we ensure the size or memory is checked more often when we are reaching the limit, and less often when we are far from it? Or perhaps, as you explain, this is a totally different strategy? But how will we ensure the size/memory is checked as soon as we reach the limit? According to the chart, max_time and 2 * max_time look analogous to gc_cleanup_min_timeout and gc_cleanup_max_timeout, don't you think?

Anyway, let me know your thoughts. I'm glad to discuss it so we can improve the current logic; that would be great. Thanks!!

@dongfuye
Contributor Author

Thank you for your response. I am glad to discuss such an important strategy on this great project with you.

The above proposal does not yet include the strategy for checking size and memory.

For the cache generations, there are two backends: ets and shards. The cost of getting the memory and size of an ets table is quite cheap, so even checking them every second is OK.

If the backend is shards, the cost of checking them becomes expensive as the cluster grows. For a cluster of 16 nodes, each check on each node results in 15 remote calls, and each check of the cluster overall results in 16 * 15 = 240 remote calls. For a cluster of 100 nodes, each check of the cluster overall results in 100 * 99 ≈ 10 K remote calls.

In my experience, most clusters are small, probably fewer than 16 nodes, so a checking interval of 3 seconds may be OK. If the cluster becomes larger, then the checking interval should be specified by the user, maybe 2 minutes or more.

Maybe a configuration option named gc_check_interval can be introduced. The default value would be 3 seconds, and the user would only need to change it for large clusters.

In a large cluster, there may be problems even if a large gc_check_interval is set. If the items are distributed uniformly across the shards and are accessed uniformly, then all nodes may reach the limit simultaneously and perform the check at the same time, causing a tremendous number of remote calls. So when implementing the size and memory check, randomizing the checking time is important.
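Such randomization could be as simple as adding jitter when scheduling the periodic check (a sketch; the message name and the gc_check_interval option are proposals from this discussion, not existing Nebulex APIs):

```elixir
# Hypothetical jitter for scheduling the periodic size/memory check, so nodes
# that reach their limits together do not all check at the same instant.
gc_check_interval = :timer.seconds(3)
jitter = :rand.uniform(div(gc_check_interval, 4))
Process.send_after(self(), :check_generation_limits, gc_check_interval + jitter)
```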

In my opinion, reducing extra traffic at peak time may be more important than reducing extra traffic at non-peak time.

@cabol
Owner

cabol commented Jan 19, 2023

Ok, I think I'm understanding the strategy a bit more, but I still need to understand some things 😅. Let me first clarify some things that can simplify the problem a lot.

For the cache generations, there are two backends: ets and shards. The cost of getting the memory and size of an ets table is quite cheap, so even checking them every second is OK.

Agreed, even checking the size/memory every second is OK.

If the backend is shards, the cost of checking them becomes expensive as the cluster grows. For a cluster of 16 nodes, each check on each node results in 15 remote calls, and each check of the cluster overall results in 16 * 15 = 240 remote calls. For a cluster of 100 nodes, each check of the cluster overall results in 100 * 99 ≈ 10 K remote calls.

Nebulex uses :shards but only locally, so there are no network calls at all. Therefore, the cost of checking the size here is proportional to the number of partitions (ETS partitions), but with no network calls, so I wouldn't be too worried about it. I'm mentioning this because it may simplify the problem: we shouldn't think about remote calls, because this happens locally; it is for the local adapter. Now, if you have a partitioned cache with the partitioned adapter, there are network calls, but they are not related to the GC. The GC works locally with its local adapter, which means that in a partitioned cache each partition has its own GC; they may run at the same time, but they are independent and make no remote calls. Let me know if that is clear.

So, having said that, if I understand correctly: instead of having min and max timeouts for the size/memory check (gc_cleanup_min_timeout and gc_cleanup_max_timeout) and calculating the actual cleanup timeout using an inverse linear function (which I agree is confusing and maybe complex), the strategy consists of having a fixed timeout (let's call it gc_cleanup_timeout) that checks whether the size and/or memory have exceeded the limit, but only for the current/newest generation, and if so, a new generation is created. Is that right?

@dongfuye
Contributor Author

the strategy consists of having a fixed timeout (let's call it gc_cleanup_timeout) that checks whether the size and/or memory have exceeded the limit, but only for the current/newest generation, and if so, a new generation is created. Is that right?

Yes! Exactly!

@cabol
Owner

cabol commented Jan 20, 2023

Ok, that makes sense. I suggest introducing new config options :generation_cleanup_timeout and :generation_max_size (and maybe also :generation_allocated_memory), where setting :generation_cleanup_timeout overrides :gc_cleanup_min_timeout and :gc_cleanup_max_timeout. In that way, we don't remove existing functionality or cause breaking changes; instead, by adding new functionality we can get feedback and maybe deprecate the old one later. It would be nice to take measurements to compare both strategies, especially in terms of memory efficiency. Overall, I like the idea. As you said, it is simple: just focus on the current/newest generation, which means we configure the max size (or max memory) per generation, which makes this easier. Let me know what you think.

@dongfuye
Contributor Author

I had not considered keeping the old configuration options before, so I will take some time to design it. Maybe I will reply to this tomorrow.

@dongfuye
Contributor Author

The configuration proposal:

The new version of the Nebulex configuration should be as simple as possible. The user should need to set only one option, for example generation_max_size.

The options for the new strategy are:

  • generation_max_size (will be implemented; the tests need this): If the number of items held by the newest generation exceeds this value, a new generation is created. No default value.
  • generation_allocated_memory (will be implemented; my work needs this): If the memory used by the newest generation exceeds this value, a new generation is created. No default value.
  • generation_max_time (may be implemented later): If the newest generation exceeds this time limit, a new generation is created. No default value.
  • generation_cleanup_timeout: The interval for checking whether the newest generation has exceeded any limit. The default value is 3 seconds.

If any of generation_max_time, generation_max_size, and generation_allocated_memory is set, then the new strategy takes effect and all the old options are ignored. If none of these 3 options is set, then the old strategy applies as before.

This design stays compatible with the old options without any modification and also keeps the configuration for the new strategy as simple as possible. If the old strategy is deprecated and eventually removed, then we can enforce that at least one of the 3 options is required.
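Under this proposal, enabling the new strategy could be as small as setting a single option (a hypothetical sketch; these option names are the ones proposed above, not existing Nebulex options, and the values are arbitrary):

```elixir
# Hypothetical minimal configuration: setting any generation_* limit
# enables the new strategy; generation_cleanup_timeout defaults to 3s.
config :my_app, MyApp.Cache,
  generation_max_size: 1_000_000,                # rotate at 1M entries
  generation_cleanup_timeout: :timer.seconds(3)  # explicit, matches the default
```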

@cabol
Owner

cabol commented Jan 21, 2023

Ok, some thoughts and doubts:

generation_max_time (may be implemented later): If the newest generation exceeds this time limit, a new generation is created. No default value.

What is the difference between :generation_max_time and the current :gc_interval? As far as I understand, and based on your explanation ("If the newest generation exceeds this time limit, then a new generation will be created"), it does the same thing, so I still don't get that part.

If any of generation_max_time, generation_max_size, and generation_allocated_memory is set, then the new strategy takes effect and all the old options are ignored. If none of these 3 options is set, then the old strategy applies as before.

I wouldn't say ALL options; let's rather say ALL options except :gc_interval, because that one is fundamental for the GC, and I don't see the difference from the :generation_max_time option you described. Am I missing something?

@dongfuye
Contributor Author

The internal behavior is exactly the same. My suggestion of this new option is based on the following reasons:

  • For the new strategy, the name :generation_max_time may be easier to understand. The other options for the new strategy are named generation_*, and a new option named :generation_max_time follows that convention.
  • Existing code already sets :gc_interval to 12 hours by default. This value may not be appropriate for some applications, so letting it default to nil in the new strategy may be a good choice.
  • It may confuse users that :gc_interval affects both the old and the new strategy.

@cabol
Owner

cabol commented Jan 22, 2023

Ok, that makes sense, but I'd rename :generation_max_time to :generation_start_timeout to make it even clearer (indicating this is the time, in milliseconds, that the GC waits before creating a new generation and deleting the oldest one).

@dongfuye
Contributor Author

OK, I agree to rename :generation_max_time to :generation_start_timeout.

I am currently on vacation until next weekend, and then I'll implement this new strategy and make a PR.

@cabol
Owner

cabol commented Jan 22, 2023

I am currently on vacation until next weekend, and then I'll implement this new strategy and make a PR.

Sure, no rush, take your time. Thank you very much for this; looking forward to seeing that PR 👍

@cabol
Owner

cabol commented Jun 16, 2024

@dongfuye whenever you have some time, check this out: https://github.com/nebulex-project/nebulex_local. It is the new local adapter for Nebulex v3, and I think it covers what we have discussed here. Let me know your thoughts about it.

@dongfuye
Contributor Author

whenever you have some time, check this out: https://github.com/nebulex-project/nebulex_local. It is the new local adapter for Nebulex v3, and I think it covers what we have discussed here. Let me know your thoughts about it.

Great! It is perfectly implemented!
