Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -242,7 +242,7 @@ object ShuffleExchangeExec {
}
// The comparator for comparing row hashcode, which should always be Integer.
val prefixComparator = PrefixComparators.LONG
val canUseRadixSort = SQLConf.get.enableRadixSort

// The prefix computer generates row hashcode as the prefix, so we may decrease the
// probability that the prefixes are equal when input rows choose column values from a
// limited range.
Expand All @@ -264,7 +264,7 @@ object ShuffleExchangeExec {
prefixComparator,
prefixComputer,
pageSize,
canUseRadixSort)
false /* canUseRadixSort */)
Copy link
Contributor

@hvanhovell hvanhovell Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the same check as used in SortExec? That would be something like:

canUseRadixSort && outputAttributes.length == 1 && SortPrefixUtils.canSortFullyWithPrefix(SortOrder(outputAttributes.head, Ascending))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are comparing binary here, so SortPrefixUtils.canSortFullyWithPrefix always return false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I did not know that we use binary comparisons. Is this because of Map not being comparable? If it is, then that might be problematic in itself, because you expect the retried stage to return map elements in the same order right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, back to the code, the change looks good to me. It might help to add a more insightful comment here.

Copy link
Contributor

@cloud-fan cloud-fan Aug 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because of Map not being comparable

Yes, and it's also because we don't have to do an expensive normal sort.

The only problem we want to fix here is inputs with random order. Here we just want the inputs to have a stable order, but don't really care what the order is. So comparing via the unsafe row binary format is good enough here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree a better comment would be nice here, specifically reference the jira or say don't enable because...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we compare the UnsafeRow binary here, and we the sort here must have a stable result. Otherwise, it will cause the correctness bug.
Thanks for the advice, add more comments done.

sorter.sort(iter.asInstanceOf[Iterator[UnsafeRow]])
}
} else {
Expand Down