[SPARK-31915][SQL][PYTHON] Remove projection that adds grouping keys in grouped and cogrouped pandas UDFs #28745
@@ -23,14 +23,18 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.plans.logical.EventTimeWatermark
-import org.apache.spark.sql.catalyst.util.quoteIdentifier
+import org.apache.spark.sql.catalyst.util.{quoteIdentifier, toPrettySQL}
 import org.apache.spark.sql.types._

 object NamedExpression {
   private val curId = new java.util.concurrent.atomic.AtomicLong()
   private[expressions] val jvmId = UUID.randomUUID()
   def newExprId: ExprId = ExprId(curId.getAndIncrement(), jvmId)
   def unapply(expr: NamedExpression): Option[(String, DataType)] = Some((expr.name, expr.dataType))
+  def fromExpression(expr: Expression): NamedExpression = expr match {
+    case ne: NamedExpression => ne
+    case _: Expression => Alias(expr, toPrettySQL(expr))()
+  }
Member
Author
I will send another PR to use this in other places, for example:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala Lines 147 to 150 in a3a42b3
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala Lines 361 to 364 in 6c80ebb
spark/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala Lines 83 to 89 in 17857f9
and possibly at:
I can also leave this util out of this PR for now if anyone is unsure about it.
Contributor
I find
Member
Author
Yeah, let me take a look at that separately, under a separate JIRA.
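The behavior of the `fromExpression` helper discussed in this thread can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Attribute`, `Add`, `NamedExpressionSketch`, and the `sql` renderings below are simplified stand-ins invented for illustration, not Spark's Catalyst classes, and the real helper derives the alias name with `toPrettySQL` rather than the toy `sql` method used here.

```scala
// Toy model only: simplified stand-ins, NOT Spark's Catalyst classes.
sealed trait Expression { def sql: String }
sealed trait NamedExpression extends Expression { def name: String }

// A named column reference (hypothetical stand-in for an attribute).
final case class Attribute(name: String) extends NamedExpression {
  def sql: String = name
}

// Gives a name to an arbitrary expression.
final case class Alias(child: Expression, name: String) extends NamedExpression {
  def sql: String = s"${child.sql} AS $name"
}

// An unnamed expression (hypothetical stand-in for any non-named expression).
final case class Add(left: Expression, right: Expression) extends Expression {
  def sql: String = s"(${left.sql} + ${right.sql})"
}

object NamedExpressionSketch {
  // Same shape as the helper in the diff: named expressions pass through
  // unchanged; anything else is wrapped in an Alias named after its rendering.
  def fromExpression(expr: Expression): NamedExpression = expr match {
    case ne: NamedExpression => ne
    case other => Alias(other, other.sql)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val a = Attribute("a")
    // Already named: returned as-is.
    println(NamedExpressionSketch.fromExpression(a) == a)
    // Not named: wrapped in an Alias whose name is the toy rendering "(a + b)".
    println(NamedExpressionSketch.fromExpression(Add(a, Attribute("b"))).name)
  }
}
```

The point of centralizing this pattern is that call sites which currently repeat the "already named, else alias it" match could call one function instead; whether each listed call site fits that shape is for the follow-up PR to confirm.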
 }

 /**
So, basically this is for the case when grouping expressions are non-deterministic: