
Conversation

@sureshthalamati
Contributor

What changes were proposed in this pull request?

JDBC read fails with an NPE when the source table has null values in an array-type column, because the read path is missing a null check for the array data type: for NULL values, ResultSet.getArray() returns null.
This PR adds a null-safe check on the ResultSet.getArray() value before invoking methods on the Array object.
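
For context, a minimal self-contained sketch of the null-safe pattern (the `nullSafeConvert` body mirrors the private helper in `JdbcUtils`; its exact definition here is illustrative):

```scala
// Sketch of the fix's pattern. nullSafeConvert mirrors the private helper in
// org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils (body illustrative).
object NullSafeSketch {
  def nullSafeConvert[T](input: T, f: T => Any): Any =
    if (input == null) null else f(input)

  def main(args: Array[String]): Unit = {
    // ResultSet.getArray() returns null for a SQL NULL in an array column.
    val sqlArray: java.sql.Array = null

    // Pre-fix shape: .getArray is invoked on the possibly-null java.sql.Array
    // *before* nullSafeConvert can check it, so a NULL column throws an NPE:
    //   nullSafeConvert[AnyRef](sqlArray.getArray, identity)

    // Post-fix shape: null-check the java.sql.Array itself and call .getArray
    // inside the conversion function, so a NULL column maps to null.
    val converted = nullSafeConvert[java.sql.Array](sqlArray, a => a.getArray)
    println(converted) // prints "null" instead of throwing
  }
}
```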

How was this patch tested?

Updated PostgresIntegrationSuite to test null values. Ran the Docker integration tests on my laptop.

@sureshthalamati
Contributor Author

@JoshRosen

@SparkQA

SparkQA commented Sep 22, 2016

Test build #65751 has finished for PR 15192 at commit 9eb40db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

@srowen SPARK-14536 is not a duplicate; this PR addresses that issue.

@mtrewartha

@sureshthalamati Any plans to get the conflicts fixed here and get things merged?

@gatorsmile
Member

The original support for the array type was added in #9662. Could you resolve the conflicts and reproduce the error on the latest code base?

@sureshthalamati
Contributor Author

Sure, I will resolve the conflicts today.

@gatorsmile
Member

Thanks!

@sureshthalamati force-pushed the jdbc_array_null_fix-SPARK-14536 branch from 9eb40db to eeba2b1 on January 19, 2017 08:32
@SparkQA

SparkQA commented Jan 19, 2017

Test build #71648 has finished for PR 15192 at commit eeba2b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

@gatorsmile Verified on master; the problem still exists. I resolved the conflicts. When you get a chance, could you please review?

Error stack without the fix:
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:469)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:467)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:328)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:310)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
.....
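
A hedged reproduction sketch (the URL, database, table, and column definitions below are hypothetical, not taken from this PR):

```scala
// Hypothetical PostgreSQL setup, e.g. in psql:
//   CREATE TABLE bar (c0 text, c1 integer[]);
//   INSERT INTO bar VALUES (NULL, NULL);
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SPARK-14536-repro")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/foo")
  .option("dbtable", "bar")
  .load()

// Without the fix, collecting the null array row fails with the NPE above.
df.collect()
```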

-          rs.getArray(pos + 1).getArray,
-          array => new GenericArrayData(elementConversion.apply(array)))
+        val array = nullSafeConvert[java.sql.Array](
+          rs.getArray(pos + 1),
@gatorsmile
Member

Nit: input = rs.getArray(pos + 1),
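
With the nit applied, the call would read roughly as follows (a sketch assembled from the diff above; the lambda body is inferred from the surrounding change):

```scala
val array = nullSafeConvert[java.sql.Array](
  input = rs.getArray(pos + 1),
  array => new GenericArrayData(elementConversion.apply(array.getArray)))
```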

assert(rows(0).getFloat(15) == 1.01f)
assert(rows(0).getShort(16) == 1)

// Test reading null values.
@gatorsmile
Member

gatorsmile commented Jan 20, 2017

// Test reading null values using the second row.

-    assert(rows.length == 1)
+    val rows = df.collect().sortBy(_.toString())
+    assert(rows.length == 2)
     val types = rows(0).toSeq.map(x => x.getClass)
@gatorsmile
Member

Add a comment above this line to indicate the following statements are testing the first row.
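
Taken together, the two suggestions imply a test layout along these lines (a sketch; the second-row assertion is illustrative, not from the PR):

```scala
val rows = df.collect().sortBy(_.toString())
assert(rows.length == 2)

// Test the types and values of the first (non-null) row.
val types = rows(0).toSeq.map(x => x.getClass)
assert(rows(0).getFloat(15) == 1.01f)
assert(rows(0).getShort(16) == 1)

// Test reading null values using the second row.
assert(rows(1).isNullAt(0))
```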

@gatorsmile
Member

I will also run the Docker integration tests on my local machine and post the results later. Thanks!

@gatorsmile
Member

LGTM pending tests. The Docker integration test cases pass on my local machine.

However, I saw an unrelated test case failure:

===== FINISHED o.a.s.sql.jdbc.OracleIntegrationSuite: 'SPARK-16625: General data types to be mapped to Oracle' =====

- SPARK-16625: General data types to be mapped to Oracle *** FAILED ***
  types.apply(9).equals("class java.sql.Date") was false (OracleIntegrationSuite.scala:136)

Submitted a JIRA: https://issues.apache.org/jira/browse/SPARK-19318

@sureshthalamati could you take a look at it and fix it? Thanks!

@SparkQA

SparkQA commented Jan 21, 2017

Test build #71746 has finished for PR 15192 at commit d8cbe54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit closed this in f174cdc on Jan 21, 2017
@sureshthalamati
Contributor Author

Thank you, @gatorsmile

@mtrewartha

Thanks guys, this will be super, super helpful to us!

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@holdenk
Contributor

holdenk commented Mar 27, 2017

Hi @gatorsmile, someone on the dev@ list asked whether we were considering backporting this into the 2.1 branch (for the 2.1.1 release). It looks like a reasonable candidate, and since you were the one who committed it, I figured you might want to take a look and decide.

@gatorsmile
Member

@holdenk Sure, we can backport it to Spark 2.1.

@sureshthalamati Could you please backport it to Spark 2.1?

@sureshthalamati
Contributor Author

Sure. Thanks!

@sureshthalamati
Contributor Author

Created a PR to backport this fix to 2.1: #17460

