
Make transaction status service multi-threaded. #4032

Open · wants to merge 1 commit into master
Conversation

@fkouteib

Problem

As part of an investigation into Agave OOM issues in internal private cluster tests, I found that the transaction status service (TSS) receiver channel would get severely backed up (80k+ pending messages) when the cluster was running at a sustained 40k TPS (bench-tps workload; 80:20 FD-to-Agave node ratio). This slowed down the whole system and built up memory usage until the node OOMed (it crashed a 256 GB Agave node, and the Agave tile on a 512 GB FD node, in my tests). The issue reproduces more prominently when running with `--enable-rpc-transaction-history` and `--enable-extended-tx-metadata-storage`.

Summary of Changes

  • Make the transaction status receiver multi-threaded, running 4 worker threads (see the sketch below).
  • With this change, the queue depth stays in the 1k–5k pending-message range.
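
A minimal sketch of the pattern this PR describes, assuming a crossbeam channel feeding the service; the type, function, and thread names here are hypothetical stand-ins, not the actual Agave code:

```rust
use std::thread;
use crossbeam_channel::Receiver;

// Hypothetical stand-ins; the real types live in the transaction status service.
struct TransactionStatusMessage;
fn write_transaction_status_batch(_msg: TransactionStatusMessage) {
    // ... blocking write of the batch to the ledger store ...
}

const NUM_WORKERS: usize = 4;

fn spawn_status_workers(
    receiver: Receiver<TransactionStatusMessage>,
) -> Vec<thread::JoinHandle<()>> {
    (0..NUM_WORKERS)
        .map(|i| {
            // crossbeam channels are MPMC: cloning the receiver lets
            // multiple workers compete for messages from one queue.
            let receiver = receiver.clone();
            thread::Builder::new()
                .name(format!("solTxStatusWrt{i:02}"))
                .spawn(move || {
                    while let Ok(msg) = receiver.recv() {
                        write_transaction_status_batch(msg);
                    }
                })
                .expect("failed to spawn status worker")
        })
        .collect()
}
```

Because the workers share one receiver, the slow blocking writes proceed on 4 threads concurrently instead of serializing behind a single consumer, which is what lets the queue drain faster than it fills.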

Original issue:

The FD node failures are the Agave tile OOMing.
[Screenshot: 2024-12-09 at 21:46:32]

Improved state:

tiv1 and tiv2 are Agave nodes running the fix; the other nodes are running the same FD code as before.
[Screenshot: 2024-12-09 at 21:27:03]

Original code (without the transaction history flags):

[Screenshot: 2024-12-09 at 21:58:04]

@alessandrod

Thanks for looking at this! I haven't done a proper review yet, but skimming through the code, it looks like this would parallelize well using rayon instead?

@fkouteib (Author)

Thanks for the feedback, Alessandro. That makes sense; spinning off just the write_transaction_status_batch() call into a rayon task after a message is dequeued would be cleaner and achieve the same outcome. One follow-up, mostly because I'm not super familiar with how we manage this at large in Agave: I think we should do it with a private rayon thread pool that's still capped, rather than the global rayon pool, since I'm worried about introducing other perf variations and resource starvation by tapping the global pool. Is that what you have in mind?
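
A minimal sketch of the private-pool variant being proposed here, again with hypothetical stand-in names; the pool size and thread-name prefix are assumptions, not values from the PR:

```rust
use crossbeam_channel::Receiver;
use rayon::ThreadPoolBuilder;

// Hypothetical stand-ins for the real service types.
struct TransactionStatusMessage;
fn write_transaction_status_batch(_msg: TransactionStatusMessage) {
    // ... blocking write of the batch to the ledger store ...
}

fn run_receiver_loop(receiver: Receiver<TransactionStatusMessage>) {
    // Private pool capped at 4 threads, so the status writes can't
    // starve (or be starved by) other rayon users in the process.
    let pool = ThreadPoolBuilder::new()
        .num_threads(4)
        .thread_name(|i| format!("solTxStWriter{i:02}"))
        .build()
        .expect("failed to build private rayon pool");

    while let Ok(msg) = receiver.recv() {
        // Dequeue on this thread; fan the blocking write out to the
        // pool and immediately go back to draining the channel.
        pool.spawn(move || write_transaction_status_batch(msg));
    }
}
```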

@alessandrod

> Thanks for the feedback, Alessandro. That makes sense; spinning off just the write_transaction_status_batch() call into a rayon task after a message is dequeued would be cleaner and achieve the same outcome. One follow-up, mostly because I'm not super familiar with how we manage this at large in Agave: I think we should do it with a private rayon thread pool that's still capped, rather than the global rayon pool, since I'm worried about introducing other perf variations and resource starvation by tapping the global pool. Is that what you have in mind?

This is a tricky one, because the global rayon pool is actually 99.9% unused. But it does have a bajillion threads (num_cpus() I think?), so we should be careful not to crank it too hard. Between adding another pool and using the global one, I'd vote for using the global one, and then hopefully someone makes that pool smaller.
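
For comparison, the global-pool variant suggested here would drop the pool-building step entirely; this is a sketch with the same hypothetical stand-in types as above, not the actual Agave code:

```rust
use crossbeam_channel::Receiver;

struct TransactionStatusMessage; // hypothetical stand-in
fn write_transaction_status_batch(_msg: TransactionStatusMessage) {}

fn run_receiver_loop(receiver: Receiver<TransactionStatusMessage>) {
    while let Ok(msg) = receiver.recv() {
        // rayon::spawn queues the task on rayon's global pool: no extra
        // pool to manage, at the cost of sharing its threads with every
        // other global-pool user in the process.
        rayon::spawn(move || write_transaction_status_batch(msg));
    }
}
```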
