Add option to evict keys LRU from the sharded redis tables #3499
Conversation
  // The task was not in the GCS task table. It must therefore be in the
  // lineage cache.
- RAY_CHECK(lineage_cache_.ContainsTask(task_id));
+ RAY_CHECK(lineage_cache_.ContainsTask(task_id))
This seems to be the narrow waist at which we access evicted lineage, though there could be other sites I'm missing.
  handler_warning_timeout_ms_(100),
  heartbeat_timeout_milliseconds_(100),
- num_heartbeats_timeout_(100),
+ num_heartbeats_timeout_(300),
Raising this to 30s (300 heartbeats at the 100 ms heartbeat interval), since 10s is too easy to hit with random pauses (e.g., a forking process takes a long time, or the kernel stalls compacting hugepages).
Sounds good. It's possible that some of the tests are currently waiting for the full 10s, in which case they will become really slow. If we observe that, we can override this parameter specifically in those tests.
Checked, and Ape-X seems to be stable at an aggressive 500MB redis memory limit.
pcmoritz left a comment:
This is working as intended for me.
  # Check that we get warning messages for both raylets.
- wait_for_errors(ray_constants.REMOVED_NODE_ERROR, 2, timeout=20)
+ wait_for_errors(ray_constants.REMOVED_NODE_ERROR, 2, timeout=40)
Ah, for this test, can we actually do `internal_config=json.dumps({"num_heartbeats_timeout": 40})` or something like that?
Done
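A minimal sketch of what that per-test override might look like, assuming the test can pass an `_internal_config` JSON string through `ray.init` (the exact keyword name and how the test suite wires it up are assumptions):

```python
import json

import ray

# Hypothetical per-test setup: shorten the heartbeat timeout so the
# REMOVED_NODE_ERROR warning fires quickly instead of waiting for the
# new 30 s default (300 heartbeats x 100 ms).
internal_config = json.dumps({"num_heartbeats_timeout": 40})

# Assumption: ray.init forwards this JSON string to the raylet config.
ray.init(num_cpus=1, _internal_config=internal_config)
```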
python/ray/rllib/train.py (outdated)
  default=None,
  type=int,
  help="--redis-max-memory to pass to Ray."
       " This only has an affect in local mode.")
Hi @ericl, what does "local mode" mean here? Does this work in multi-node mode?
It just means that when using a cluster, you need to pass --redis-max-memory to ray start rather than to train.py.
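A rough sketch of the distinction, assuming `redis_max_memory` is accepted by `ray.init` for single-node runs (the exact flag spelling and units for `ray start` are illustrative):

```python
import ray

# Single-node ("local mode"): the cap is set when Ray itself is started,
# which is what train.py's --redis-max-memory flag feeds into.
ray.init(redis_max_memory=500 * 1024 * 1024)  # ~500 MB per non-primary shard

# On a cluster, the head node is started separately, e.g.
#   ray start --head --redis-max-memory=500000000
# and train.py only connects to it, so passing the flag to train.py has no effect.
```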
For now the memory flush policy does not support multiple redis shards. I notice that this PR can limit the memory used by each redis shard. If this PR is merged, does it mean we no longer need the redis memory flush, at least to some extent?
@llan-ml that's right, this should supersede redis flushing. I updated the doc page to remove the old flushing documentation.
What do these changes do?
This adds an experimental `redis_max_memory` flag that bounds the redis memory used per data shard. Note that this only applies to the non-primary shards, which store the majority of the task and object metadata; things like client metadata are never evicted.

Since profiling data has a nested structure and cannot be LRU evicted, profiling is also made controllable via `collect_profiling_data`, and it is disabled when `redis_max_memory` is set.

Analysis of redis's approximate LRU eviction algorithm: https://github.com/antirez/redis/blob/a2131f907a752e62c78ea6bb719daf9fe2f91402/src/evict.c#L118
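To illustrate the mechanism, here is a minimal sketch of the kind of per-shard Redis configuration this implies; the policy name, port, and the way Ray actually applies these settings are assumptions, and the values mirror the numbers discussed in this PR:

```python
import redis

# Hypothetical connection to one non-primary shard.
shard = redis.StrictRedis(host="127.0.0.1", port=6380)

# Bound the shard's memory and evict approximately-LRU keys once the cap is hit.
shard.config_set("maxmemory", 500 * 1024 * 1024)     # e.g. the 500MB limit tested above
shard.config_set("maxmemory-policy", "allkeys-lru")  # assumed LRU eviction policy
shard.config_set("maxmemory-samples", 10)            # sample size used in the analysis below
```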
We use `maxmemory_samples 10`. There is also a persisted eviction pool of 16 entries. This effectively gives us 26 tries per eviction to hit an old key (a lower bound). Let's assume the most recent 30% of keys are required for stable operation, and we evict at 10000 QPS. Then (the arithmetic is sketched below) a lower bound on reliability with approximate LRU eviction is 99% per year. The actual reliability will be much higher of course, since it's unlikely we need even 30% of the metadata, and the eviction pool is also persisted over time.
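A back-of-the-envelope check of that bound, under the stated assumptions (26 LRU candidates per eviction, the newest 30% of keys must survive, 10000 evictions per second):

```python
# Probability that a single eviction removes a "recent" key, i.e. that all
# 26 sampled candidates land in the newest 30% of keys.
p_bad_eviction = 0.3 ** 26                       # ~2.5e-14

evictions_per_year = 10_000 * 365 * 24 * 3600    # ~3.2e11 at 10000 QPS

# Expected number of bad evictions per year; for small values this also
# approximates the probability of at least one bad eviction.
p_failure_per_year = p_bad_eviction * evictions_per_year
print(p_failure_per_year)                        # ~0.008, i.e. roughly 99% reliability per year
```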
TODO:
Related issue number
#3306
#954
#3452