[SPARK-26224][SQL] Advice the user when creating many project on subsequent calls to withColumn #23285
@@ -40,7 +40,7 @@ import org.apache.spark.sql.catalyst.encoders._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection
 import org.apache.spark.sql.catalyst.json.{JacksonGenerator, JSONOptions}
-import org.apache.spark.sql.catalyst.optimizer.CombineUnions
+import org.apache.spark.sql.catalyst.optimizer.{CollapseProject, CombineUnions}
 import org.apache.spark.sql.catalyst.parser.{ParseException, ParserUtils}
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
@@ -2146,7 +2146,7 @@ class Dataset[T] private[sql](
    * Returns a new Dataset by adding columns or replacing the existing columns that has
    * the same names.
    */
-  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): DataFrame = {
+  private[spark] def withColumns(colNames: Seq[String], cols: Seq[Column]): DataFrame = withPlan {
     require(colNames.size == cols.size,
       s"The size of column names: ${colNames.size} isn't equal to " +
         s"the size of columns: ${cols.size}")
@@ -2164,16 +2164,16 @@ class Dataset[T] private[sql](
       columnMap.find { case (colName, _) =>
         resolver(field.name, colName)
       } match {
-        case Some((colName: String, col: Column)) => col.as(colName)
-        case _ => Column(field)
+        case Some((colName: String, col: Column)) => col.as(colName).named
+        case _ => field
       }
     }
 
-    val newColumns = columnMap.filter { case (colName, col) =>
+    val newColumns = columnMap.filter { case (colName, _) =>
       !output.exists(f => resolver(f.name, colName))
-    }.map { case (colName, col) => col.as(colName) }
+    }.map { case (colName, col) => col.as(colName).named }
 
-    select(replacedAndExistingColumns ++ newColumns : _*)
+    CollapseProject(Project(replacedAndExistingColumns ++ newColumns, logicalPlan))
   }
 
   /**
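To make the effect of the change easier to picture, here is a minimal sketch of what collapsing adjacent projections means, written against a toy plan model rather than Catalyst. Everything in it (`Plan`, `Relation`, `Project`, `Expr`, `collapse`, `withColumn`) is a hypothetical stand-in invented for illustration; the real `CollapseProject` rule applies additional conditions (for example around non-deterministic expressions) that this sketch ignores.

```scala
object CollapseSketch {
  // Hypothetical, simplified expression tree (not Catalyst's Expression).
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Hypothetical, simplified logical plan (not Catalyst's LogicalPlan).
  sealed trait Plan
  final case class Relation(columns: Seq[String]) extends Plan
  final case class Project(exprs: Seq[(String, Expr)], child: Plan) extends Plan

  // Column names produced by a plan node.
  def output(plan: Plan): Seq[String] = plan match {
    case Relation(columns) => columns
    case Project(exprs, _) => exprs.map(_._1)
  }

  // Toy withColumn: stacks one more projection on top of the plan.
  // Assumes `name` is a new column (replacement is not modelled).
  def withColumn(plan: Plan, name: String, expr: Expr): Plan = {
    val passThrough: Seq[(String, Expr)] = output(plan).map(n => n -> Ref(n))
    Project(passThrough :+ (name -> expr), plan)
  }

  // Replace references to the child's columns by the expressions defining them.
  def substitute(e: Expr, bindings: Map[String, Expr]): Expr = e match {
    case Ref(n)    => bindings.getOrElse(n, e)
    case Lit(_)    => e
    case Add(l, r) => Add(substitute(l, bindings), substitute(r, bindings))
  }

  // Merge every run of directly nested projections into a single projection.
  def collapse(plan: Plan): Plan = plan match {
    case Project(outer, child) =>
      collapse(child) match {
        case Project(inner, grandChild) =>
          val bindings = inner.toMap
          Project(outer.map { case (n, e) => n -> substitute(e, bindings) }, grandChild)
        case other => Project(outer, other)
      }
    case leaf => leaf
  }

  // Number of nodes on the path from the root to the leaf relation.
  def depth(plan: Plan): Int = plan match {
    case Project(_, child) => 1 + depth(child)
    case _: Relation       => 1
  }

  // Ten chained toy withColumn calls, each depending on the previous column.
  def chained(n: Int): Plan =
    (1 to n).foldLeft(Relation(Seq("id")): Plan) { (plan, i) =>
      val previous = if (i == 1) "id" else s"c${i - 1}"
      withColumn(plan, s"c$i", Add(Ref(previous), Lit(1)))
    }
}
```

In this toy model the chained plan gains one node per call, while the collapsed plan is a single projection over the relation with the same output columns. The diff above achieves a similar flattening by applying `CollapseProject` eagerly each time `withColumns` builds its `Project`.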
As stated on the JIRA ticket, the problem is the deep query plan. I think there are many ways to create such a deep query plan, not only with `withColumns`. For example, you can call `select` many times to do that too. This change makes `withColumns` a special case.

Yes, but I think this is a special case. I have seen many cases where `withColumn` is used in for loops; with this change such a pattern would be better supported.
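
For readers following the thread, the pattern under discussion can be sketched as follows. This is an illustration only, not code from the PR: the object and helper names (`WithColumnLoopSketch`, `addInLoop`, `addInOneSelect`, `projectCount`) are made up, and it assumes a local Spark session with the classic (non-Connect) Dataset API so that `queryExecution.analyzed` is available. It contrasts adding columns one at a time in a loop with adding them in a single `select`, and counts the `Project` nodes in each analyzed plan.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.Project
import org.apache.spark.sql.functions.{col, lit}

object WithColumnLoopSketch {
  // The loop pattern from the discussion: one withColumn call per new column.
  def addInLoop(base: DataFrame, n: Int): DataFrame =
    (1 to n).foldLeft(base) { (df, i) =>
      df.withColumn(s"c$i", col("id") + lit(i))
    }

  // The alternative: build all new columns first, then project once.
  def addInOneSelect(base: DataFrame, n: Int): DataFrame = {
    val added: Seq[Column] = (1 to n).map(i => (col("id") + lit(i)).as(s"c$i"))
    base.select(col("*") +: added: _*)
  }

  // Count Project nodes in the analyzed (not yet optimized) logical plan.
  def projectCount(df: DataFrame): Int =
    df.queryExecution.analyzed.collect { case p: Project => p }.size

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("withColumn-loop-sketch")
      .getOrCreate()
    try {
      val base = spark.range(3).toDF() // single column named "id"
      val looped = addInLoop(base, 50)
      val selected = addInOneSelect(base, 50)
      println(s"Project nodes, loop: ${projectCount(looped)}")
      println(s"Project nodes, single select: ${projectCount(selected)}")
      // Both approaches produce the same column names in the same order.
      assert(looped.columns.sameElements(selected.columns))
    } finally {
      spark.stop()
    }
  }
}
```

On Spark versions where each `withColumn` call adds its own projection, the first count should grow with the number of loop iterations while the second stays constant. The same applies to repeated `select` calls, which is the reviewer's point that the deep plan is not specific to `withColumns`.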