Why the unbounded channel is used inside the RepartitionExec? #4052

YjyJeff · 2022-11-01T02:07:48Z

According to the comment in the execute method of RepartitionExec:

Note that this operator uses unbounded channels to avoid deadlocks because the output partitions can be read in any order and this could cause input partitions to be blocked when sending data to output UnboundedReceivers that are not being read yet.

Can you explain this in detail? Why would it cause deadlocks? As far as I know, sending a RecordBatch to a channel that is not being read yet only puts the Task to sleep (the tokio scheduler takes it off the CPU and wakes it later when the channel has space) rather than blocking the thread, so it should never cause a deadlock. A naive example:

// Single-threaded executor (the default #[tokio::main] flavor is multi-threaded)
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Bounded channel with room for a single value
    let (tx, mut rx) = tokio::sync::mpsc::channel(1);

    tokio::spawn(async move {
        for i in 0..10 {
            // Suspends this task (not the thread) while the channel is full
            tx.send(i).await.unwrap();
            println!("Send {i}");
        }
    });

    // Sleep for a while; rx is not being read yet, so tx.send suspends the sender task
    tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;
    while let Some(value) = rx.recv().await {
        println!("{value}");
    }
}

Thanks in advance!

HaoYang670 · 2022-11-09T09:12:43Z

Not 100% sure about it. For example, suppose we want to send A, B, and C to the receiver through a bounded channel with capacity = 2.
Then a deadlock could happen if the receiver reads in the order C, A, B:

SENDER                            RECEIVER      CHANNEL
send A (success)                                [A]
send B (success)                                [A, B]
sleep (as the channel is full)
                                  read C        [A, B]
                                  Deadlock occurs!

cc @andygrove .

5 replies

YjyJeff Nov 9, 2022
Author

You can only read from a channel in the order the values were written. If you send A, B, C to a channel, you can only read them back as A, then B, then C.

HaoYang670 Nov 9, 2022

Why is that? The comment says that: “output partitions can be read in any order”.

YjyJeff Nov 9, 2022
Author

"Output partitions can be read in any order" means that there are multiple channels, one per partition. You do not know which partition will be read first; they can be read in any order.

For example, take a single thread with channels A and B. The thread may read A first and then B, or it may read B first and then A.
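
To make the distinction concrete, here is a minimal sketch using the standard library's `std::sync::mpsc` channels as a stand-in for the tokio channels in RepartitionExec (that substitution, the `drain` helper, and the `A1`/`B1` labels are assumptions for illustration only). Each channel delivers its values in send order, while the reader is free to pick which channel to drain first:

```rust
use std::sync::mpsc::channel;

/// Sends three values into each of two channels, then drains the channels
/// in the requested order and returns everything that was read.
fn drain(read_b_first: bool) -> Vec<String> {
    let (tx_a, rx_a) = channel();
    let (tx_b, rx_b) = channel();
    for i in 1..=3 {
        tx_a.send(format!("A{i}")).unwrap();
        tx_b.send(format!("B{i}")).unwrap();
    }
    // Drop the senders so that iterating a receiver ends once it is empty.
    drop(tx_a);
    drop(tx_b);

    // The reader chooses the order of the channels, not the order within a channel.
    let order = if read_b_first { [&rx_b, &rx_a] } else { [&rx_a, &rx_b] };
    let mut seen = Vec::new();
    for rx in order {
        seen.extend(rx.iter());
    }
    seen
}

fn main() {
    println!("{:?}", drain(false)); // ["A1", "A2", "A3", "B1", "B2", "B3"]
    println!("{:?}", drain(true)); // ["B1", "B2", "B3", "A1", "A2", "A3"]
}
```

Within a channel the order is always 1, 2, 3; only the choice between A and B varies.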

HaoYang670 Nov 10, 2022

one channel for each partition

Hmm, I see. Then the reason for using an unbounded channel is beyond my knowledge, sorry about that. @andygrove, could you please take a look and help answer the question?

HaoYang670 Nov 10, 2022

                // Note that this operator uses unbounded channels to avoid deadlocks because
                // the output partitions can be read in any order and this could cause input
                // partitions to be blocked when sending data to output UnboundedReceivers that are not
                // being read yet. This may cause high memory usage if the next operator is
                // reading output partitions in order rather than concurrently. One workaround
                // for this would be to add spill-to-disk capabilities.
                let (sender, receiver) =
                    mpsc::unbounded_channel::<Option<ArrowResult<RecordBatch>>>();

Could the deadlock happen if one partition contains several RecordBatches?
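
The memory trade-off mentioned in the quoted comment can be sketched as follows. This is not the RepartitionExec code: it assumes plain `usize` values instead of RecordBatches, OS threads instead of tokio tasks, and the standard library's unbounded `channel` instead of tokio's `unbounded_channel`; the function name is hypothetical. Two input threads distribute batches round-robin over two outputs, and the reader drains output 0 completely before it looks at output 1:

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Returns (batches read from output 0, batches already buffered in output 1
/// at the moment output 0 is exhausted).
fn read_unbounded_in_order(batches_per_input: usize) -> (usize, usize) {
    let (tx0, rx0) = channel::<usize>();
    let (tx1, rx1) = channel::<usize>();

    // Two input partitions, each alternating between the two outputs.
    for _input in 0..2 {
        let outputs = [tx0.clone(), tx1.clone()];
        thread::spawn(move || {
            for batch in 0..batches_per_input {
                // An unbounded send never waits for the reader.
                if outputs[batch % 2].send(batch).is_err() {
                    return;
                }
            }
        });
    }
    // Keep only the senders owned by the input threads.
    drop(tx0);
    drop(tx1);

    // Drain output 0 until every input has finished and dropped its sender.
    let read_from_0 = rx0.iter().count();
    // By now all inputs are done, so everything for output 1 sits in its buffer.
    let buffered_in_1 = rx1.try_iter().count();
    (read_from_0, buffered_in_1)
}

fn main() {
    println!("{:?}", read_unbounded_in_order(100)); // (100, 100)
}
```

The reader never stalls, but every batch destined for output 1 is held in memory until the reader gets to it, which is the high memory usage the comment warns about.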

andygrove · 2022-11-14T15:46:07Z

Collaborator

Sure, I can explain. Let's say we have 2 input partitions and 2 output partitions, and we use small bounded channels (holding at most 2 record batches each).

Also, let's assume a single thread calls execute for partition 0, reads all of its results, and only then calls execute for partition 1. It will not read any results for partition 1 until it has read all results for partition 0. Meanwhile, the tasks running for the input partitions are trying to write to both output partitions 0 and 1. Once the channel for partition 1 is full, those tasks get blocked because it is not being read yet, so they also stop producing data for partition 0. The reader of partition 0 then waits for data that never arrives, resulting in a deadlock. Does that make sense?
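
The scenario above can be reproduced in a small sketch. It is an illustration under stated assumptions rather than the DataFusion code: OS threads and `std::sync::mpsc::sync_channel` stand in for tokio tasks and bounded tokio channels, `usize` values stand in for record batches, inputs are distributed round-robin, and a 500 ms `recv_timeout` is used only to detect the stall instead of hanging forever. The function name is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Reads output 0 to the end, then output 1. Returns Ok(total batches read),
/// or Err(partition) naming the output partition on which the reader stalled.
fn read_outputs_in_order(capacity: usize, batches_per_input: usize) -> Result<usize, usize> {
    // One bounded channel per output partition.
    let (tx0, rx0) = sync_channel::<usize>(capacity);
    let (tx1, rx1) = sync_channel::<usize>(capacity);

    // Two input partitions, each alternating between the two outputs.
    for _input in 0..2 {
        let outputs = [tx0.clone(), tx1.clone()];
        thread::spawn(move || {
            for batch in 0..batches_per_input {
                // Waits while the target channel is full; fails once the
                // receiver has been dropped, which lets the thread exit.
                if outputs[batch % 2].send(batch).is_err() {
                    return;
                }
            }
        });
    }
    // Keep only the senders owned by the input threads.
    drop(tx0);
    drop(tx1);

    let receivers = vec![rx0, rx1];
    let mut total = 0;
    for (partition, rx) in receivers.iter().enumerate() {
        loop {
            match rx.recv_timeout(Duration::from_millis(500)) {
                Ok(_) => total += 1,
                // All inputs finished: this output is complete.
                Err(RecvTimeoutError::Disconnected) => break,
                // No data and the inputs are still alive: the reader is stuck.
                Err(RecvTimeoutError::Timeout) => return Err(partition),
            }
        }
    }
    Ok(total)
}

fn main() {
    // Everything fits into the channels, so the in-order reader finishes.
    println!("{:?}", read_outputs_in_order(2, 2)); // Ok(4)
    // Output 1 fills up while only output 0 is being read: the reader stalls.
    println!("{:?}", read_outputs_in_order(2, 100)); // Err(0)
}
```

Note that the stall is reported on partition 0: the inputs are stuck sending to the full channel of partition 1, so partition 0 never receives its end-of-stream.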

2 replies

YjyJeff Nov 14, 2022
Author

Yes, a deadlock happens in this case, thanks~ I think the key point is:

Calling execute for partition 0, reads the results, then calls execute for partition 1

If we poll the output partitions in separate tasks, one task per output partition, the deadlock never happens. Am I right?

andygrove Nov 17, 2022
Collaborator

Yes, that is correct
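
A minimal sketch of that arrangement, under the same kind of assumptions as any stand-in example (OS threads and `std::sync::mpsc::sync_channel` instead of tokio tasks and bounded tokio channels, `usize` values instead of record batches, a hypothetical function name): with one reader per output partition, small bounded channels are enough for the run to complete.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Drains both output partitions concurrently, one reader thread per output,
/// and returns the total number of batches read.
fn read_outputs_concurrently(capacity: usize, batches_per_input: usize) -> usize {
    let (tx0, rx0) = sync_channel::<usize>(capacity);
    let (tx1, rx1) = sync_channel::<usize>(capacity);

    // Two input partitions, each alternating between the two outputs.
    for _input in 0..2 {
        let outputs = [tx0.clone(), tx1.clone()];
        thread::spawn(move || {
            for batch in 0..batches_per_input {
                // Waits only until the reader of that output makes room.
                if outputs[batch % 2].send(batch).is_err() {
                    return;
                }
            }
        });
    }
    // Keep only the senders owned by the input threads.
    drop(tx0);
    drop(tx1);

    // One reader per output partition, so no channel stays full for long.
    let readers: Vec<_> = vec![rx0, rx1]
        .into_iter()
        .map(|rx| thread::spawn(move || rx.iter().count()))
        .collect();
    readers.into_iter().map(|r| r.join().unwrap()).sum()
}

fn main() {
    println!("{}", read_outputs_concurrently(2, 100)); // 200
}
```

Because every output always has an active reader, a full channel only delays a sender briefly instead of stopping it for good.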
