Why is an unbounded channel used inside RepartitionExec? #4052

Answered by andygrove
YjyJeff asked this question in Q&A
Sure, I can explain. Let's say we have 2 input partitions and 2 output partitions, and we use small bounded channels (max 2 record batches).

Also, let's assume a single thread calls execute for partition 0, reads all of its results, and only then calls execute for partition 1. It will not read any results for partition 1 until it has read all results for partition 0. Meanwhile, there are tasks running for the input partitions, trying to write to both output partitions 0 and 1. Writes to partition 1 will block once its channel is full, because nothing is reading it yet, and partition 0 can never finish because the same tasks that feed it are stuck. The result is a deadlock. Does that make sense?
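To make the scenario concrete, here is a minimal sketch that reproduces it with plain threads and `std::sync::mpsc`. It is not DataFusion's code: the standard-library channels stand in for whatever channel type RepartitionExec uses, integers stand in for record batches, and the names `Tx`, `make_channel`, and `drains_sequentially` are made up for this illustration. A receive timeout is used only so the stalled case returns instead of hanging forever.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, RecvTimeoutError, Sender, SyncSender};
use std::thread;
use std::time::Duration;

// Hypothetical wrapper so the same producer code can use either channel kind.
#[derive(Clone)]
enum Tx {
    Bounded(SyncSender<usize>),
    Unbounded(Sender<usize>),
}

impl Tx {
    // Returns false if the receiving side has gone away.
    fn send(&self, batch: usize) -> bool {
        match self {
            Tx::Bounded(s) => s.send(batch).is_ok(), // blocks while the buffer is full
            Tx::Unbounded(s) => s.send(batch).is_ok(), // never blocks
        }
    }
}

fn make_channel(bounded: bool) -> (Tx, Receiver<usize>) {
    if bounded {
        let (tx, rx) = sync_channel(2); // "max 2 record batches"
        (Tx::Bounded(tx), rx)
    } else {
        let (tx, rx) = channel();
        (Tx::Unbounded(tx), rx)
    }
}

// Two input tasks write round-robin to two output partitions while a single
// consumer drains output 0 completely before touching output 1.
// Returns true if both outputs were drained, false if the consumer stalled.
fn drains_sequentially(bounded: bool) -> bool {
    const BATCHES_PER_INPUT: usize = 8;
    let (tx0, rx0) = make_channel(bounded);
    let (tx1, rx1) = make_channel(bounded);

    for _input in 0..2 {
        let outputs = [tx0.clone(), tx1.clone()];
        thread::spawn(move || {
            for batch in 0..BATCHES_PER_INPUT {
                if !outputs[batch % 2].send(batch) {
                    return; // consumer gave up
                }
            }
            // Senders are dropped here, which is what closes the outputs.
        });
    }
    // Drop the originals so each output closes once both input tasks finish.
    drop(tx0);
    drop(tx1);

    for rx in [rx0, rx1] {
        loop {
            match rx.recv_timeout(Duration::from_millis(500)) {
                Ok(_batch) => {}
                Err(RecvTimeoutError::Disconnected) => break, // output fully read
                Err(RecvTimeoutError::Timeout) => return false, // stalled
            }
        }
    }
    true
}
```

With bounded channels, both input tasks end up blocked on the full, unread channel for output 1, so output 0 never closes and `drains_sequentially(true)` reports a stall. With unbounded channels the input tasks always run to completion, so `drains_sequentially(false)` succeeds, at the cost of buffering everything destined for output 1 in memory.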

Answer selected by YjyJeff