reduce number of pool connections #1384

Merged: 1 commit, Nov 1, 2024

@terrazoon terrazoon commented Oct 31, 2024

Description

We saw some BrokenPipeError exceptions that prevented messages from being sent. Some of the stack traces pointed to multiprocessing.Manager, which handles downloading S3 objects to build up our jobs cache.

A common recommendation for BrokenPipeError is to reduce the number of connections in the connection pool. That is the easiest fix to try.

Other options are:

  1. Increase memory and CPU
  2. Use more retry logic (yuck)
  3. Consider replacing multiprocessing.Manager with multiprocessing.Queue or Pipe
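
As a rough illustration of option 3, here is a minimal sketch of collecting results over a multiprocessing.Queue instead of a Manager-backed shared object. Everything in it is hypothetical: `download_worker`, `build_cache`, the key names, and the placeholder payload stand in for the real S3 download code, which is not shown in this PR.

```python
import multiprocessing as mp


def download_worker(keys, results):
    """Runs in a child process and sends (key, payload) pairs to the parent."""
    for key in keys:
        # Placeholder for the real S3 download (hypothetical payload).
        results.put((key, f"contents of {key}"))
    results.put(None)  # Sentinel: this worker has nothing more to send.


def build_cache(keys, num_workers=2, ctx=None):
    """Build a plain dict in the parent from results sent over a Queue."""
    ctx = ctx or mp.get_context()
    results = ctx.Queue()
    # Split the keys round-robin across the workers.
    chunks = [keys[i::num_workers] for i in range(num_workers)]
    workers = [
        ctx.Process(target=download_worker, args=(chunk, results))
        for chunk in chunks
    ]
    for worker in workers:
        worker.start()

    cache = {}
    finished = 0
    # Drain the queue before joining so no child blocks on a full pipe.
    while finished < len(workers):
        item = results.get()
        if item is None:
            finished += 1
        else:
            key, payload = item
            cache[key] = payload

    for worker in workers:
        worker.join()
    return cache
```

With this shape the cache lives only in the parent process, so there is no Manager server process whose pipe can break. On platforms that use the spawn or forkserver start method, the call to `build_cache` would need to sit under an `if __name__ == "__main__":` guard.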

NOTE: multiprocessing.Manager is used in app/aws/s3.py, and the whole cache-building workflow also relies on concurrent.futures.ThreadPoolExecutor. Both rely on AWS_CLIENT_CONFIG, which defines the maximum number of pool connections.
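
To make the relationship between the thread pool and the connection limit concrete, here is a minimal sketch that caps the number of worker threads at the pool size. It assumes AWS_CLIENT_CONFIG is a botocore Config, where the limit would be set with `Config(max_pool_connections=...)`. The value 10 and the names `MAX_POOL_CONNECTIONS`, `fetch_object`, and `build_cache_threaded` are hypothetical; the PR text does not state the number it settled on.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limit; not the value used in this PR.
MAX_POOL_CONNECTIONS = 10


def fetch_object(key):
    """Stand-in for one S3 download; returns the key and a fake size."""
    return key, len(key)


def build_cache_threaded(keys, max_workers=MAX_POOL_CONNECTIONS):
    """Download all keys with at most MAX_POOL_CONNECTIONS threads."""
    # Never run more threads than there are pooled connections.
    workers = min(max_workers, MAX_POOL_CONNECTIONS)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(executor.map(fetch_object, keys))
```

Keeping the thread count at or below the pool size means no thread has to open a connection beyond what the pool retains, which is one way to keep the two settings in step when the limit is lowered.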

Security Considerations

N/A

@terrazoon terrazoon marked this pull request as draft October 31, 2024 15:12
@terrazoon terrazoon marked this pull request as ready for review October 31, 2024 16:46

@ccostino ccostino left a comment

Thank you, @terrazoon!

@xlorepdarkhelm xlorepdarkhelm merged commit 38c566e into main Nov 1, 2024
7 checks passed
@xlorepdarkhelm xlorepdarkhelm deleted the reduce_connections branch November 1, 2024 17:39