[SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE/DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools #29303
Conversation
Test build #126807 has finished for PR 29303 at commit
cc @cloud-fan @gatorsmile @maropu @wangyum, thanks very much

@yaooqinn FYI to make the GitHub Actions tests pass, you might have to rebase.

I see, thanks, @HyukjinKwon.

retest this please
Test build #126845 has finished for PR 29303 at commit
Test build #126852 has finished for PR 29303 at commit
retest this please
Test build #126867 has finished for PR 29303 at commit
```scala
private def getDecimalDigits(typ: DataType): Option[Int] = typ match {
  case BooleanType | _: IntegerType => Some(0)
  case FloatType => Some(7)
  case DoubleType => Some(15)
```
how do we pick these numbers?
They are from Hive's TypeDescriptor, and I also checked that the results are the same in Hive and Spark, e.g. `select 0.12345678901234567890D`
the result is 0.12345678901234568, which has 17 decimal digits, but Hive says it's 15 for double...
It seems to follow IEEE 754 https://en.wikipedia.org/wiki/IEEE_754
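A minimal sketch of where 7 and 15 come from, assuming the IEEE 754 reading above: a binary significand of p bits carries roughly p * log10(2) decimal digits, which yields 7 for binary32 (float) and 15 for binary64 (double).

```scala
object Ieee754Digits extends App {
  // floor(p * log10(2)): binary32 has a 24-bit significand, binary64 a 53-bit one.
  def decimalDigits(significandBits: Int): Int =
    (significandBits * math.log10(2)).toInt

  println(decimalDigits(24)) // 7  -> float
  println(decimalDigits(53)) // 15 -> double
}
```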
Can we add a code comment to explain it?
OK
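A sketch of the method with such a comment added. The `DecimalType` branch and the fallback are assumptions about the rest of the match, which is not quoted above:

```scala
// Decimal digits reported to clients, per type. The float/double values
// follow IEEE 754: a p-bit binary significand holds about p * log10(2)
// decimal digits, i.e. 7 for binary32 and 15 for binary64.
private def getDecimalDigits(typ: DataType): Option[Int] = typ match {
  case BooleanType | _: IntegerType => Some(0)
  case FloatType => Some(7)
  case DoubleType => Some(15)
  case d: DecimalType => Some(d.scale) // assumed: report scale for decimals
  case _ => None                       // assumed: unknown for everything else
}
```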
...tserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala
```scala
private def getColumnSize(typ: DataType): Option[Int] = typ match {
  case StringType | BinaryType => None
  case ArrayType(et, _) => getColumnSize(et)
```
how can we report column size for array type? We don't know how many elements are in an array.
oh, right...
To check if we could fetch the new metadata via
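The comment above is cut off in this capture. One plausible way a client would verify the new fields is through the standard JDBC `DatabaseMetaData.getColumns` call; the connection URL, schema, and table names below are illustrative, and the Hive JDBC driver is assumed to be on the classpath:

```scala
import java.sql.DriverManager

object GetColumnsCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection to a local thrift server in binary mode.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
    try {
      val rs = conn.getMetaData.getColumns(null, "default", "src", "%")
      while (rs.next()) {
        // These column labels are defined by the JDBC DatabaseMetaData spec.
        println(Seq(
          rs.getString("COLUMN_NAME"),
          rs.getInt("COLUMN_SIZE"),
          rs.getInt("DECIMAL_DIGITS"),
          rs.getInt("NUM_PREC_RADIX"),
          rs.getInt("ORDINAL_POSITION")).mkString("\t"))
      }
    } finally conn.close()
  }
}
```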
Test build #126887 has finished for PR 29303 at commit
```scala
test("SparkGetColumnsOperation") {
```
Please make this test title clearer, too.
```scala
val decimalType = DecimalType(10, 2)
val ddl =
  s"""
     |CREATE TABLE $schemaName.$tableName
```
I think we don't have a strict rule for the format, though. How about following the existing format?
spark/sql/core/src/test/scala/org/apache/spark/sql/ShowCreateTableSuite.scala
Lines 35 to 41 in 8014b0b
| s"""CREATE TABLE ddl_test ( | |
| | a STRING, | |
| | b STRING, | |
| | `extra col` ARRAY<INT>, | |
| | `<another>` STRUCT<x: INT, y: ARRAY<BOOLEAN>> | |
| |) | |
| |USING json |
OK, this one looks better.
| s""" | ||
| |CREATE TABLE $schemaName.$tableName | ||
| | ( | ||
| | a boolean comment '0', |
Could you test all the types where possible for test coverage?
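A sketch of a wider-coverage DDL in the suggested format. The column list is illustrative; the exact set of types in the merged test may differ:

```scala
val ddl =
  s"""CREATE TABLE $schemaName.$tableName (
     |  c0 boolean, c1 tinyint, c2 smallint, c3 int, c4 bigint,
     |  c5 float, c6 double, c7 decimal(38, 20), c8 decimal(10, 2),
     |  c9 string, c10 binary, c11 date, c12 timestamp,
     |  c13 array<bigint>, c14 map<smallint, tinyint>,
     |  c15 struct<X: bigint, Y: double>)
     |USING parquet""".stripMargin
```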
```scala
assert(typeNames.get(3) === decimalType.sql)

val colSize = columns.get(6).getI32Val.getValues
assert(colSize.get(3).intValue() === decimalType.defaultSize)
```
Could you check all the elements in colSize? (Rather, I think we need to check all elements in the fetched rowSet.)
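A sketch of what checking every element might look like, extending the snippet above. The expected sizes are assumptions that simply mirror each fixed-width type's `defaultSize` under the illustrative DDL; variable-length columns would be null:

```scala
// Expected COLUMN_SIZE per column for the illustrative DDL: boolean=1,
// tinyint=1, smallint=2, int=4, bigint=8, float=4, double=8,
// decimal(38, 20)=16, decimal(10, 2)=8 (DecimalType.defaultSize).
val expectedSizes = Seq(1, 1, 2, 4, 8, 4, 8, 16, 8)
val colSize = columns.get(6).getI32Val.getValues
expectedSizes.zipWithIndex.foreach { case (expected, i) =>
  assert(colSize.get(i).intValue() === expected)
}
```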
```scala
 * returned.
 * For array, map, string, and binaries, the column size is variable, return null as unknown.
 */
private def getColumnSize(typ: DataType): Option[Int] = typ match {
```
Does Hive return almost the same values for column size?
https://github.com/apache/hive/blob/3e5e99eae154ceb8f9aa4e4ec71e6b05310e98e4/service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java#L187-L211
Hive does not return the same result for each type.

The new test can cover the metadata operations end-to-end for both binary and HTTP mode, but SparkMetadataOperationSuite seems to work in binary mode only.

Test build #126960 has finished for PR 29303 at commit

Test build #126965 has finished for PR 29303 at commit
```scala
} else {
  Some(sizeArr.map(_.get).sum)
}
case other => Some(other.defaultSize)
```
nit: I think it's safer to list the types whose size we know, instead of the types whose size we don't know. I'd prefer:

```scala
case dt @ (_: NumericType | DateType | TimestampType) => dt.defaultSize
...
```
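Putting the quoted pieces together, a sketch of that allow-list style as it might look inside SparkGetColumnsOperation. The exact branch list in the merged code may differ:

```scala
import org.apache.spark.sql.types._

// Allow-list style: report a size only for fixed-width types; everything
// variable-length (string, binary, array, map) yields None, surfaced as NULL.
private def getColumnSize(typ: DataType): Option[Int] = typ match {
  case dt @ (BooleanType | _: NumericType | DateType | TimestampType) =>
    Some(dt.defaultSize)
  case StructType(fields) =>
    // A struct has a known size only if every field does.
    val sizeArr = fields.map(f => getColumnSize(f.dataType))
    if (sizeArr.contains(None)) None else Some(sizeArr.map(_.get).sum)
  case _ => None
}
```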
Test build #126974 has finished for PR 29303 at commit
```diff
  null, // CHAR_OCTET_LENGTH
- null, // ORDINAL_POSITION
+ pos.asInstanceOf[AnyRef], // ORDINAL_POSITION
  "YES", // IS_NULLABLE
```
Should this respect column.nullable?
IS_NULLABLE is related to primary keys; "YES" here is more suitable, as we don't have such a thing implemented.
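For reference, a hypothetical variant that would surface nullability instead; this is not what the PR does, since it keeps the constant "YES":

```scala
// Hypothetical: derive IS_NULLABLE from the column's schema nullability.
val isNullable = if (column.nullable) "YES" else "NO"
```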
thanks, merging to master!
```scala
  Thread.sleep(10)
  status = client.getOperationStatus(opHandle)
}
val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)
```
Hi, @yaooqinn and @cloud-fan . I'm not sure but this new test case seems to break all Jenkins Maven jobs on master branch. Could you take a look?
```
org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator cannot be cast to org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider
  at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:42)
  at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.getColumns(ThriftCLIServiceClient.java:275)
  at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$17(ThriftServerWithSparkContextSuite.scala:146)
```
Let's revert if it takes a while to fix.
Looks like something related to another version of Hive, belonging to another (maybe failed) job, not being cleaned up.
Ya. It seems that Hive 2.3 is affected. And the JDK11 Maven build seems not to run those test suites.
Do we have a follow-up idea, @yaooqinn and @cloud-fan?
The error message is clear: java.lang.ClassCastException. If there is no idea, I'd recommend reverting first so the Maven Jenkins jobs pass properly.
BTW, this is not a flaky failure. This broke some profiles consistently.
BTW, it's up to you, @cloud-fan. I'll not revert this by myself since the PRBuilder is still working.
### What changes were proposed in this pull request?

The newly added test fails Jenkins maven jobs, see #29303 (comment)

We move the test from `ThriftServerWithSparkContextSuite` to `SparkMetadataOperationSuite`. The former uses an embedded thrift server, where the server and the client live in the same JVM process; the latter forks a new process to start the server, so the server and client are isolated.

The sbt runner seems to be fine with the test in `ThriftServerWithSparkContextSuite`, but the maven runner with the `scalatest` plugin faces a classloader issue, as we switch the classloader to the one in the `sharedState`, which is not the one Hive uses to load some classes. This is more likely an issue belonging to the maven runner or `scalatest`, so in this PR we simply move the test to bypass it.

BTW, we should test against the embedded-thrift-server setup to verify whether it is just a maven issue or not, as there could be use cases for this API.

### Why are the changes needed?

Jenkins recovery

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

modified uts

Closes #29347 from yaooqinn/SPARK-32492-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>


What changes were proposed in this pull request?
This PR fulfills some missing fields for SparkGetColumnsOperation including COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION
and improve the test coverage.
Why are the changes needed?
make jdbc tools happier
Does this PR introduce any user-facing change?
yes,
before
after
How was this patch tested?
add unit tests