[SPARK-29063][SQL] Modify fillValue approach to support joined dataframe #25768
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -488,7 +488,7 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {` |
| | | `     }` |
| | | |
| | | `     val columnEquals = df.sparkSession.sessionState.analyzer.resolver` |
| | | `-    val projections = df.schema.fields.map { f =>` |
| | | `+    val fillColumnsInfo = df.schema.fields.filter { f =>` |
| | | `       val typeMatches = (targetType, f.dataType) match {` |
| | | `         case (NumericType, dt) => dt.isInstanceOf[NumericType]` |
| | | `         case (StringType, dt) => dt == StringType` |
| | | `@@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {` |
| | | `           throw new IllegalArgumentException(s"$targetType is not matched at fillValue")` |
| | | `       }` |
| | | `       // Only fill if the column is part of the cols list.` |
| | | `-      if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {` |
| | | `-        fillCol[T](f, value)` |
| | | `-      } else {` |
| | | `-        df.col(f.name)` |
| | | `-      }` |
| | | `+      typeMatches && cols.exists(col => columnEquals(f.name, col))` |
| | | `+    }.map { col =>` |
| | | `+      (col.name, fillCol[T](col, value))` |
| | | `     }` |
| | | `-    df.select(projections : _*)` |
| | | `+    df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))` |
| | | `   }` |
| | | ` }` |
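The shape of this change can be sketched without Spark. The following is a toy model only: `FillSketch`, `Field`, `resolve`, `oldProjections`, and `newReplacements` are hypothetical names introduced for illustration, `resolve` stands in for a by-name lookup such as `df.col(name)`, and the type-matching check from the diff is left out. It assumes, as the PR title suggests, that by-name lookup of a duplicated column is what fails on a joined DataFrame.

```scala
// Toy model only: nothing in this object is a Spark API.
object FillSketch {
  final case class Field(name: String)

  // Stand-in for df.col(name): lookup by name fails when the name is duplicated.
  def resolve(schema: Seq[Field], name: String): String = {
    if (schema.count(_.name == name) > 1)
      throw new IllegalArgumentException(s"Reference $name is ambiguous")
    name
  }

  // Stand-in for fillCol: it also resolves the column by name before filling.
  def fillCol(schema: Seq[Field], f: Field): String =
    s"coalesce(${resolve(schema, f.name)}, value)"

  // Old shape: one projection per field, so untouched fields are resolved too.
  def oldProjections(schema: Seq[Field], cols: Seq[String]): Seq[String] =
    schema.map { f =>
      if (cols.contains(f.name)) fillCol(schema, f) else resolve(schema, f.name)
    }

  // New shape: only the fields being filled are resolved; the rest are left alone.
  def newReplacements(schema: Seq[Field], cols: Seq[String]): Seq[(String, String)] =
    schema.filter(f => cols.contains(f.name)).map(f => (f.name, fillCol(schema, f)))
}
```

In this model, a schema with two `f2` fields makes `oldProjections` fail even when only `f4` is filled, while `newReplacements` fails only when `f2` itself is in the fill list.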
When `df` has a duplicate column name, what is the behavior? Also, we need to add test cases to ensure the behavior is consistent.
When we fill the duplicate column, we'll still get `AnalysisException: Reference xx is ambiguous`. Test cases added in 03305be.
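The scenario discussed here can be tried with a small joined DataFrame. This is a sketch, not the test from 03305be: the object name, the data, and the column names are made up for illustration, and it assumes a Spark build that includes this change and a local session. Per the reply above, filling the duplicated column is expected to raise an `AnalysisException`, while filling a column whose name is unique in the join output should go through.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Illustrative only: object name, data, and column names are assumptions.
object JoinedFillExample {
  // Returns (fill on unique column succeeded, fill on duplicated column succeeded).
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-fill-joined")
      .getOrCreate()
    import spark.implicits._
    try {
      // Both sides carry an "f2" column, so the join output contains it twice.
      val left: Seq[(String, String, String)] = Seq(("k1", "a", null), ("k2", null, "c"))
      val right: Seq[(String, String, String)] = Seq(("k1", null, null), ("k2", "b", "d"))
      val df1 = left.toDF("f1", "f2", "f3")
      val df2 = right.toDF("f1", "f2", "f4")
      val joined = df1.join(df2, Seq("f1"), "left_outer")

      // "f4" is unique in the joined schema.
      val fillUnique = Try(joined.na.fill("", Seq("f4")).collect())
      // "f2" is duplicated in the joined schema.
      val fillDuplicate = Try(joined.na.fill("", Seq("f2")).collect())
      (fillUnique.isSuccess, fillDuplicate.isSuccess)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (uniqueOk, duplicateOk) = run()
    println(s"fill f4 succeeded: $uniqueOk, fill f2 succeeded: $duplicateOk")
  }
}
```

Wrapping each call in `Try` keeps the two cases side by side, so the outcome on a given Spark version can be observed rather than assumed.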