@wjszlachta-man
What changes were proposed in this pull request?

On glibc-based Linux systems, select() can monitor only file descriptors whose numbers are less than FD_SETSIZE (1024).

This is an unreasonably low limit for many modern applications.

This PR replaces select.select() with select.poll() when running on a POSIX OS.
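
A minimal sketch of the idea, not the actual patch: the helper name `wait_readable` and its signature are hypothetical and chosen only for illustration. It assumes the caller wants to know whether a single socket or file object becomes readable within a timeout, and it falls back to select.select() where select.poll() is unavailable.

```python
import os
import select
import socket


def wait_readable(fileobj, timeout_s=1.0):
    """Return True if fileobj becomes readable within timeout_s seconds.

    Hypothetical helper for illustration only.
    """
    if os.name == "posix" and hasattr(select, "poll"):
        poller = select.poll()
        poller.register(fileobj, select.POLLIN)
        # poll() takes its timeout in milliseconds, select() in seconds.
        events = poller.poll(int(timeout_s * 1000))
        # Any reported event (including hang-up or error conditions, which
        # poll() reports even when not requested) means a read will not block.
        return bool(events)
    # Fallback for platforms without poll(), e.g. Windows.
    readable, _, _ = select.select([fileobj], [], [], timeout_s)
    return bool(readable)


if __name__ == "__main__":
    left, right = socket.socketpair()
    print(wait_readable(right, 0.0))  # nothing has been sent yet
    left.sendall(b"x")
    print(wait_readable(right, 1.0))  # one byte is waiting
    left.close()
    right.close()
```

Because poll() receives the descriptor numbers in an array rather than a fixed-size bit set, the descriptor's numeric value does not matter here.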

Why are the changes needed?

When running via PySpark, we frequently observe:

Exception occurred during processing of request from ('127.0.0.1', 46334)
Traceback (most recent call last):
  File "/usr/lib/python3.11/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.11/socketserver.py", line 755, in __init__
    self.handle()
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 293, in handle
    poll(authenticate_and_accum_updates)
  File "/usr/lib/python3.11/site-packages/pyspark/accumulators.py", line 266, in poll
    r, _, _ = select.select([self.rfile], [], [], 1)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: filedescriptor out of range in select()
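
The ValueError above can be reproduced in isolation. The sketch below assumes a glibc-based Linux system where FD_SETSIZE is 1024, as stated in the description; the helper name `select_rejects` is hypothetical. CPython checks the descriptor number before calling the underlying select(), so the descriptor does not need to be open for the error to appear.

```python
import select


def select_rejects(fd):
    """Return True if select.select() refuses fd as out of range.

    Hypothetical helper for illustration only.
    """
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # Raised when the descriptor number is >= FD_SETSIZE.
        return True
    except OSError:
        # The number is in range, but the descriptor is not open.
        return False
    return False


if __name__ == "__main__":
    # Assumes FD_SETSIZE == 1024 (glibc-based Linux).
    print(select_rejects(1024))
```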

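On POSIX systems, poll() should be used instead of select(): it takes an array of descriptors rather than a fixed-size bit set, so it is not bound by FD_SETSIZE.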

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests + manual run on YARN cluster (Linux).

Was this patch authored or co-authored using generative AI tooling?

No

@wjszlachta-man wjszlachta-man force-pushed the spark-51966-replace-select-with-poll-on-posix branch from 98d5e56 to d3fa95a Compare May 2, 2025 19:34
@github-actions
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
