
Conversation

@sureshthalamati
Contributor

What changes were proposed in this pull request?

JDBC read fails with an NPE when the source table has null values in an array-type column, because the read path is missing a null check for the array data type: for NULL values, ResultSet.getArray() returns null.
This PR adds a null-safe check on the ResultSet.getArray() value before invoking methods on the Array object.
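
For context, a minimal self-contained sketch of the null-safe pattern (the `nullSafeConvert` body mirrors the private helper in `JdbcUtils`; its exact definition here is illustrative):

```scala
// Sketch of the fix's pattern. nullSafeConvert mirrors the private helper in
// org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils (body illustrative).
object NullSafeSketch {
  def nullSafeConvert[T](input: T, f: T => Any): Any =
    if (input == null) null else f(input)

  def main(args: Array[String]): Unit = {
    // ResultSet.getArray() returns null for a SQL NULL in an array column.
    val sqlArray: java.sql.Array = null

    // Pre-fix shape: .getArray is invoked on the possibly-null java.sql.Array
    // *before* nullSafeConvert can check it, so a NULL column throws an NPE:
    //   nullSafeConvert[AnyRef](sqlArray.getArray, identity)

    // Post-fix shape: null-check the java.sql.Array itself and call .getArray
    // inside the conversion function, so a NULL column maps to null.
    val converted = nullSafeConvert[java.sql.Array](sqlArray, a => a.getArray)
    println(converted) // prints "null" instead of throwing
  }
}
```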

How was this patch tested?

Updated PostgresIntegrationSuite to test null values. Ran the Docker integration tests on my laptop.

@sureshthalamati
Contributor Author

@JoshRosen

@SparkQA

SparkQA commented Sep 22, 2016

Test build #65751 has finished for PR 15192 at commit 9eb40db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

@srowen SPARK-14536 is not a duplicate; this PR addresses that issue.

@mtrewartha

@sureshthalamati Any plans to get the conflicts fixed here and get things merged?

@gatorsmile
Member

The original support for the array type was added in #9662. Could you resolve the conflicts and reproduce the error on the latest code base?

@sureshthalamati
Contributor Author

Sure, I will resolve the conflicts today.

@gatorsmile
Member

Thanks!

@sureshthalamati force-pushed the jdbc_array_null_fix-SPARK-14536 branch from 9eb40db to eeba2b1 on January 19, 2017 08:32
@SparkQA

SparkQA commented Jan 19, 2017

Test build #71648 has finished for PR 15192 at commit eeba2b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sureshthalamati
Contributor Author

@gatorsmile Verified on master; the problem still exists. I resolved the conflicts. When you get a chance, could you please review?

Error stack without the fix:
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:469)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:467)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:328)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:310)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
.....
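
A hedged reproduction sketch (the URL, database, table, and column definitions below are hypothetical, not taken from this PR):

```scala
// Hypothetical PostgreSQL setup, e.g. in psql:
//   CREATE TABLE bar (c0 text, c1 integer[]);
//   INSERT INTO bar VALUES (NULL, NULL);
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SPARK-14536-repro")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/foo")
  .option("dbtable", "bar")
  .load()

// Without the fix, collecting the null array row fails with the NPE above.
df.collect()
```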

-          rs.getArray(pos + 1).getArray,
-          array => new GenericArrayData(elementConversion.apply(array)))
+        val array = nullSafeConvert[java.sql.Array](
+          rs.getArray(pos + 1),
@gatorsmile
Member

Nit: input = rs.getArray(pos + 1),
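
With the nit applied, the call would read roughly as follows (a sketch assembled from the diff above; the lambda body is inferred from the surrounding change):

```scala
val array = nullSafeConvert[java.sql.Array](
  input = rs.getArray(pos + 1),
  array => new GenericArrayData(elementConversion.apply(array.getArray)))
```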

assert(rows(0).getFloat(15) == 1.01f)
assert(rows(0).getShort(16) == 1)

// Test reading null values.
@gatorsmile
Member

gatorsmile commented Jan 20, 2017

// Test reading null values using the second row.

-    assert(rows.length == 1)
+    val rows = df.collect().sortBy(_.toString())
+    assert(rows.length == 2)
     val types = rows(0).toSeq.map(x => x.getClass)
@gatorsmile
Member

Add a comment above this line to indicate the following statements are testing the first row.
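
Taken together, the two suggestions imply a test layout along these lines (a sketch; the second-row assertion is illustrative, not from the PR):

```scala
val rows = df.collect().sortBy(_.toString())
assert(rows.length == 2)

// Test the types and values of the first (non-null) row.
val types = rows(0).toSeq.map(x => x.getClass)
assert(rows(0).getFloat(15) == 1.01f)
assert(rows(0).getShort(16) == 1)

// Test reading null values using the second row.
assert(rows(1).isNullAt(0))
```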

@gatorsmile
Member

I will also run the Docker integration tests on my local machine and post the results later. Thanks!

@gatorsmile
Member

LGTM pending tests. The Docker integration test cases pass on my local machine.

However, I saw an unrelated test case failure:

===== FINISHED o.a.s.sql.jdbc.OracleIntegrationSuite: 'SPARK-16625: General data types to be mapped to Oracle' =====

- SPARK-16625: General data types to be mapped to Oracle *** FAILED ***
  types.apply(9).equals("class java.sql.Date") was false (OracleIntegrationSuite.scala:136)

Submitted a JIRA: https://issues.apache.org/jira/browse/SPARK-19318

@sureshthalamati could you take a look at it and fix it? Thanks!

@SparkQA

SparkQA commented Jan 21, 2017

Test build #71746 has finished for PR 15192 at commit d8cbe54.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit closed this in f174cdc on Jan 21, 2017
@sureshthalamati
Contributor Author

Thank you, @gatorsmile

@mtrewartha

Thanks guys, this will be super, super helpful to us!

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
@holdenk
Contributor

holdenk commented Mar 27, 2017

Hi @gatorsmile, someone on the dev@ list asked whether we were considering backporting this into the 2.1 branch (for the 2.1.1 release). It looks like a reasonable candidate, and since you were the one who committed it, I figured you might want to take a look and decide.

@gatorsmile
Member

@holdenk Sure, we can backport it to Spark 2.1.

@sureshthalamati Could you please backport it to Spark 2.1?

@sureshthalamati
Contributor Author

Sure. Thanks!

@sureshthalamati
Contributor Author

Created a PR to backport this fix to 2.1: #17460

