Commit ec7570e

chenhao-db authored and MaxGekk committed
[SPARK-49275][SQL] Fix return type nullness of the xpath expression
### What changes were proposed in this pull request?

The `xpath` expression incorrectly declares its return type as an array of non-null strings. However, it can return an array containing nulls, e.g. when the path matches element nodes instead of text nodes. This can cause an NPE in generated code, as in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47796 from chenhao-db/fix_xpath_nullness.

Authored-by: Chenhao Li <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
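The NPE described above comes from code trusting the declared type: if an array type promises `containsNull = false`, code generation may skip null checks on its elements. The sketch below models that contract in plain Scala. `ArrayTypeSketch`, `NullnessContractSketch`, `coalesceFirst`, and `resultLength` are hypothetical names for illustration only, not Spark APIs, and the model is a deliberate simplification of what Catalyst's generated code does.

```scala
// Hypothetical, simplified model of an array type's nullness contract.
// These names are illustrative only; they are not Spark's Catalyst API.
final case class ArrayTypeSketch(containsNull: Boolean = true)

object NullnessContractSketch {
  // Models coalesce(arr[0], fallback) as a code generator might emit it.
  // When the declared type promises non-null elements, the null check is elided.
  def coalesceFirst(arr: Array[String], declared: ArrayTypeSketch, fallback: String): String = {
    val first = arr(0)
    if (!declared.containsNull) first // trusts the declared type; may leak a null
    else if (first != null) first     // honest type: check before using the value
    else fallback
  }

  // Downstream code that, like generated code, dereferences a value it was told is non-null.
  def resultLength(arr: Array[String], declared: ArrayTypeSketch, fallback: String): Int =
    coalesceFirst(arr, declared, fallback).length
}
```

With `Array[String](null)` standing in for the result of `xpath('<a></a>', 'a')`, calling `resultLength` with `ArrayTypeSketch(containsNull = false)` throws a `NullPointerException`, while the default `ArrayTypeSketch()` (nullable elements, mirroring the fixed `dataType`) returns the fallback's length.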
1 parent c274c5a commit ec7570e

2 files changed (+8, -1 lines changed)

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala

Lines changed: 3 additions & 1 deletion
@@ -242,13 +242,15 @@ case class XPathString(xml: Expression, path: Expression) extends XPathExtract {
   Examples:
     > SELECT _FUNC_('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
      ["b1","b2","b3"]
+    > SELECT _FUNC_('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
+     [null,null,null]
   """,
   since = "2.0.0",
   group = "xml_funcs")
 // scalastyle:on line.size.limit
 case class XPathList(xml: Expression, path: Expression) extends XPathExtract {
   override def prettyName: String = "xpath"
-  override def dataType: DataType = ArrayType(SQLConf.get.defaultStringType, containsNull = false)
+  override def dataType: DataType = ArrayType(SQLConf.get.defaultStringType)
 
   override def nullSafeEval(xml: Any, path: Any): Any = {
     val nodeList = xpathUtil.evalNodeList(xml.asInstanceOf[UTF8String].toString, pathString)
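Why does `'a/b'` yield `[null,null,null]` while `'a/b/text()'` yields strings? A plausible explanation, assuming `XPathList` builds its array from each matched node's DOM value (the `evalNodeList` context line suggests this, but the diff does not show the full body): in the W3C DOM, a text node's value is its text, whereas an element node's value is `null`. The sketch below checks this using only the JDK's `javax.xml.xpath` and DOM APIs; `XPathNodeValues` and `nodeValues` are hypothetical helper names, not Spark's `xpathUtil`.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

// Hypothetical helper (not Spark's implementation): evaluates an XPath expression
// and returns the DOM node value of every matched node.
object XPathNodeValues {
  def nodeValues(xml: String, path: String): Seq[String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val nodes = XPathFactory.newInstance()
      .newXPath()
      .evaluate(path, doc, XPathConstants.NODESET)
      .asInstanceOf[NodeList]
    // Text nodes report their text; element nodes report null.
    (0 until nodes.getLength).map(i => nodes.item(i).getNodeValue)
  }
}
```

Applied to the docstring input, `"a/b/text()"` gives `b1`, `b2`, `b3` and `"a/b"` gives three nulls. `nodeValues("<a></a>", "a")` gives a single null, which is the kind of array element the new unit test feeds to `Coalesce`.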

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala

Lines changed: 5 additions & 0 deletions
@@ -185,6 +185,11 @@ class XPathExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
     testExpr("<a><b class='bb'>b1</b><b>b2</b><b>b3</b><c class='bb'>c1</c><c>c2</c></a>",
       "a/*[@class='bb']/text()", Seq("b1", "c1"))
 
+    checkEvaluation(
+      Coalesce(Seq(
+        GetArrayItem(XPathList(Literal("<a></a>"), Literal("a")), Literal(0)),
+        Literal("nul"))), "nul")
+
     testNullAndErrorBehavior(testExpr)
   }
