Skip to content

Conversation

@stevomitric
Copy link
Contributor

@stevomitric stevomitric commented Mar 21, 2024

What changes were proposed in this pull request?

Added the new MapSort expression in CollationOperations alongside new UTs.

Why are the changes needed?

In order to add the ability to do GROUP BY on map types we first have to be able to sort the maps by their key

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests in CollectionExpressionSuite

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Mar 21, 2024
* ascendingOrder - A boolean value describing the order in which the map will be sorted.
This can be either be ascending (true) or descending (false).
""",
examples = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need this part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed ExpressionDescription.

| Object $o1 = (($simpleEntryType) $o1entry).getKey();
| Object $o2 = (($simpleEntryType) $o2entry).getKey();
| $comp;
| return $order ? $c : -$c;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for just seeing this, but maybe we should do here the same thing that ArraySort does which is put the ordering into a variable outside of compare and just multiply it with the result?

this way we avoid branching in every comparison

Copy link
Contributor

@stefankandic stefankandic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if tests pass!

copy(child = newChild)
}

case class MapSort(base: Expression, ascendingOrder: Expression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case class MapSort(base: Expression, ascendingOrder: Expression)
case class MapSort(base: Expression, ascendingOrder: Boolean)

Copy link
Contributor Author

@stevomitric stevomitric Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the BinaryExpression require two expressions here? Do we demote this to UnaryExpression?

EDIT: Expression for ascendingOrder in array sorting has been set as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's is the internal use-cases for the expression? Do we need this parameter at all?

Seems like you are going to pass true as ascendingOrder always at
https://github.com/apache/spark/pull/45549/files#diff-11264d807efa58054cca2d220aae8fba644ee0f0f2a4722c46d52828394846efR2488

   case a @ Aggregate(groupingExpr, x, b) =>
      val newGrouping = groupingExpr.map { expr =>
        (expr, expr.dataType) match {
          case (_: MapSort, _) => expr
          case (_, _: MapType) =>
            MapSort(expr, Literal.TrueLiteral)
          case _ => expr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the point of internal use, we don't need it. Refactored expression as UnaryExpression and removed ordering altogether.

@stevomitric stevomitric requested a review from cloud-fan March 21, 2024 14:35
|""".stripMargin
}

override def prettyName: String = "map_sort"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this since the expression hasn't been bound to the function name.

copy(child = newChild)
}

case class MapSort(base: Expression, ascendingOrder: Expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's is the internal use-cases for the expression? Do we need this parameter at all?

Seems like you are going to pass true as ascendingOrder always at
https://github.com/apache/spark/pull/45549/files#diff-11264d807efa58054cca2d220aae8fba644ee0f0f2a4722c46d52828394846efR2488

   case a @ Aggregate(groupingExpr, x, b) =>
      val newGrouping = groupingExpr.map { expr =>
        (expr, expr.dataType) match {
          case (_: MapSort, _) => expr
          case (_, _: MapType) =>
            MapSort(expr, Literal.TrueLiteral)
          case _ => expr

Comment on lines 906 to 907
case Literal(_: Boolean, BooleanType) =>
TypeCheckResult.TypeCheckSuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have to be so strict on this argument? For example, could you imagine a case where you want to select the sort order based on the values in another column or the result of an expression? Is this needlessly restrictive?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, could you imagine a case where you want to select the sort order based on the values in another column or the result of an expression?

I can imagine the case but so far we are going to use the expression internally for one case only. Support of ascendingOrder = false or even an arbitrary boolean expression just overcomplicates the code.

@stevomitric stevomitric requested a review from MaxGekk March 21, 2024 21:58
@MaxGekk MaxGekk changed the title [SPARK-47007] Added MapSort expression [SPARK-47007][SQL] Add the MapSort expression Mar 22, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Mar 22, 2024

+1, LGTM. Merging to master.
Thank you, @stevomitric and @stefankandic @cloud-fan @markj-db for review.

@MaxGekk MaxGekk closed this in c94090e Mar 22, 2024
cloud-fan pushed a commit that referenced this pull request Mar 27, 2024
### What changes were proposed in this pull request?
Changes proposed in this PR include:
- Relaxed checks that prevent aggregating of map types
- Added new analyzer rule that uses `MapSort` expression proposed in [this PR](#45639)
- Created codegen that compares two sorted maps

### Why are the changes needed?
Adding new functionality to GROUP BY map types

### Does this PR introduce _any_ user-facing change?
Yes, ability to use `GROUP BY MapType`

### How was this patch tested?
With new UTs

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45549 from stevomitric/stevomitric/map-group-by.

Lead-authored-by: Stevo Mitric <[email protected]>
Co-authored-by: Stefan Kandic <[email protected]>
Co-authored-by: Stevo Mitric <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants