```diff
@@ -606,9 +606,17 @@ abstract class HadoopFsRelation private[sql](
       // we need to cast into the data type that user specified.
       def castPartitionValuesToUserSchema(row: InternalRow) = {
         InternalRow((0 until row.numFields).map { i =>
-          Cast(
-            Literal.create(row.getString(i), StringType),
-            userProvidedSchema.fields(i).dataType).eval()
+          row.isNullAt(i) match {
+            case true =>
+              Cast(
+                Literal.create(null, StringType),
+                userProvidedSchema.fields(i).dataType).eval()
+            case false =>
+              Cast(
+                Literal.create(row.getString(i), StringType),
+                userProvidedSchema.fields(i).dataType).eval()
+
+          }
         }: _*)
       }
```

Review thread on `row.isNullAt(i) match {`:

Contributor: I think it's better to check for null in `InternalRow.getString()`.

Contributor Author: @davies Thanks for reviewing the change. In cases where we know in advance that the schema does not allow nulls, we can sometimes skip this null check. By moving it into `InternalRow`, would we lose the opportunity to optimize? Please let me know.

Contributor: The null-checking is cheap. Or we could use `Literal.create(row.getUTF8String(i), StringType)` to avoid the conversion between `String` and `UTF8String`.

Contributor Author: Sounds good, @davies. Made the change. Thanks a lot for your feedback.
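The essence of the change above is a null guard around the string-to-target-type cast of each partition value. As a standalone sketch outside Spark (the helper `castString` below is hypothetical and stands in for Spark's `Cast`/`Literal` machinery), the pattern looks like:

```scala
// Standalone sketch of the null-guarded cast pattern from the diff above.
// `castString` is a hypothetical stand-in for Cast(Literal.create(...), dt).eval().
object NullGuardedCast {
  def castString(raw: String, parse: String => Any): Any =
    if (raw == null) null   // null partition value: nothing to parse, propagate null
    else parse(raw)         // non-null: cast the string representation

  def main(args: Array[String]): Unit = {
    println(castString(null, _.toInt))   // null
    println(castString("42", _.toInt))   // 42
  }
}
```

Without the guard, calling a parser on the string form of a null value would throw, which is exactly the failure mode the patch addresses.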

```diff
@@ -252,6 +252,19 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
     }
   }
 
+  test("SPARK-11997 parquet with null partition values") {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+      sqlContext.range(1, 3)
+        .selectExpr("if(id % 2 = 0, null, id) AS n", "id")
+        .write.partitionBy("n").parquet(path)
+
+      checkAnswer(
+        sqlContext.read.parquet(path).filter("n is null"),
+        Row(2, null))
+    }
+  }
+
   // This test case is ignored because of parquet-mr bug PARQUET-370
   ignore("SPARK-10301 requested schema clipping - schemas with disjoint sets of fields") {
     withTempPath { dir =>
```
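For context on what this test exercises: with Hive-style partitioning, each partition value is encoded into a `col=value` directory name, and a null value is conventionally written with the sentinel `__HIVE_DEFAULT_PARTITION__`, which the reader must turn back into a null. A minimal sketch of decoding one such path segment (the object and method names here are illustrative, not Spark's actual parser):

```scala
// Illustrative decoder for one Hive-style "col=value" partition path segment.
object PartitionPathDecoder {
  // Hive's conventional sentinel for a null partition value.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Decode "col=value" into (column, Option[value]); None means null.
  def decodeSegment(segment: String): (String, Option[String]) = {
    val Array(col, raw) = segment.split("=", 2)
    if (raw == DefaultPartitionName) (col, None) else (col, Some(raw))
  }

  def main(args: Array[String]): Unit = {
    println(decodeSegment("n=__HIVE_DEFAULT_PARTITION__"))  // (n,None)
    println(decodeSegment("n=1"))                           // (n,Some(1))
  }
}
```

This is why the patched `castPartitionValuesToUserSchema` must tolerate a null in the partition row: a row read back from such a directory carries a genuine null, not the string form of one.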