@gaogaotiantian
Contributor

What changes were proposed in this pull request?

Use select.poll to replace select.select in worker.py on UNIX systems.

Why are the changes needed?

select.select has a known limitation: it fails for file descriptor numbers at or above FD_SETSIZE (typically 1024). Heavily loaded systems can reach this limit.
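
To illustrate the kind of replacement described above, here is a minimal sketch. It is not the actual worker.py change. `wait_readable` is a hypothetical helper name, and the fallback to `select.select` on platforms without `select.poll` is an assumption about how such a change might be structured.

```python
import os
import select


def wait_readable(fd, timeout=None):
    """Return True if fd is readable within `timeout` seconds (None blocks).

    Hypothetical helper for illustration only.
    """
    if hasattr(select, "poll"):
        # poll() is not bounded by FD_SETSIZE, so high fd numbers work.
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        # poll() takes its timeout in milliseconds; None blocks indefinitely.
        timeout_ms = None if timeout is None else int(timeout * 1000)
        events = poller.poll(timeout_ms)
        # Treat hang-up as readable too, so that EOF is noticed.
        return any(ev & (select.POLLIN | select.POLLHUP) for _, ev in events)
    # Fallback for platforms where select.poll is unavailable.
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    r, w = os.pipe()
    print(wait_readable(r, 0))  # nothing written yet
    os.write(w, b"x")
    print(wait_readable(r, 0))  # one byte is pending
    os.close(r)
    os.close(w)
```

One difference to keep in mind when porting: `select.select` takes its timeout in seconds, while `poll()` takes milliseconds.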

Does this PR introduce any user-facing change?

No

How was this patch tested?

test_udf passes locally; the remaining tests will run on CI.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Member

Merged to master.

xu20160924 pushed a commit to xu20160924/spark that referenced this pull request Dec 9, 2025
Closes apache#53388 from gaogaotiantian/replace-select-with-poll.

Authored-by: Tian Gao
Signed-off-by: Hyukjin Kwon
@wjszlachta-man
wjszlachta-man commented Dec 9, 2025

I think this only partially fixes the issue, as a similar failure can occur in python/pyspark/accumulators.py, where the polling function uses select.select() and can fail for file descriptor numbers >= 1024.

This was addressed in: #53306

@gaogaotiantian gaogaotiantian deleted the replace-select-with-poll branch December 9, 2025 18:39