Conversation

@yaooqinn
Member

@yaooqinn yaooqinn commented Jul 30, 2020

What changes were proposed in this pull request?

This PR fills in several missing fields for SparkGetColumnsOperation, including COLUMN_SIZE, DECIMAL_DIGITS, NUM_PREC_RADIX, and ORDINAL_POSITION, and improves test coverage.

Why are the changes needed?

Make JDBC client tools happier: they can now read correct column size, decimal digits, numeric radix, and ordinal position metadata for each column.

Does this PR introduce any user-facing change?

Yes.

Before: (screenshot omitted)

After: (screenshots omitted)

How was this patch tested?

add unit tests

@SparkQA

SparkQA commented Jul 30, 2020

Test build #126807 has finished for PR 29303 at commit cc8c90d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

cc @cloud-fan @gatorsmile @maropu @wangyum thanks very much

@HyukjinKwon
Member

@yaooqinn FYI to make the GitHub Actions tests pass, you might have to rebase.

@yaooqinn
Member Author

I see, thanks, @HyukjinKwon.

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jul 31, 2020

Test build #126845 has finished for PR 29303 at commit 2a319ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExecutorProcessLost(

@SparkQA

SparkQA commented Jul 31, 2020

Test build #126852 has finished for PR 29303 at commit 2a319ff.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExecutorProcessLost(

@yaooqinn
Member Author

retest this please

@SparkQA

SparkQA commented Jul 31, 2020

Test build #126867 has finished for PR 29303 at commit 2a319ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExecutorProcessLost(

private def getDecimalDigits(typ: DataType): Option[Int] = typ match {
case BooleanType | _: IntegerType => Some(0)
case FloatType => Some(7)
case DoubleType => Some(15)
Contributor

how do we pick these numbers?

Member Author

They come from Hive's TypeDescriptor, and I also checked that the results are the same in Hive and Spark, e.g. select 0.12345678901234567890D

Member Author

@yaooqinn yaooqinn Jul 31, 2020

The result is 0.12345678901234568, which has 17 decimal digits, but Hive says it's 15 for double...

Member Author

It seems to follow IEEE 754 (https://en.wikipedia.org/wiki/IEEE_754): a binary64 double guarantees only 15 exact decimal digits, even though up to 17 may be printed.

Contributor

Can we add a code comment to explain it?

Member Author

OK
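The IEEE 754 point discussed above can be sketched in plain Scala (a standalone illustration, not Spark code): a binary64 double carries a 53-bit significand, which guarantees floor(52 · log10 2) = 15 exact decimal digits, while the shortest round-trip representation printed by the runtime may show up to 17 digits.

```scala
// Standalone sketch of the IEEE 754 reasoning discussed above.
// A binary64 double guarantees 15 exact decimal digits; its shortest
// round-trip representation may need up to 17 digits, which is why the
// literal below prints with 17 significant digits.
val literal = 0.12345678901234567890
println(literal)                          // 0.12345678901234568

// Digits guaranteed exact by a 52-bit fraction (plus the implicit bit).
val guaranteedDigits = (52 * math.log10(2)).toInt
println(guaranteedDigits)                 // 15
```

This matches both observations in the thread: the printed value has 17 significant digits, yet Hive's TypeDescriptor reports 15 for DOUBLE because only 15 digits are guaranteed exact.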


private def getColumnSize(typ: DataType): Option[Int] = typ match {
case StringType | BinaryType => None
case ArrayType(et, _) => getColumnSize(et)
Contributor

How can we report a column size for the array type? We don't know how many elements an array contains.

Member Author

oh, right...

@yaooqinn yaooqinn changed the title [SPARK-32492][SQL] Fulfill missing column meta information for thriftserver client tools [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools Jul 31, 2020
@maropu
Member

maropu commented Jul 31, 2020

To check if we could fetch the new metadata via DatabaseMetaData, could you add tests in SparkMetadataOperationSuite, too?

@SparkQA

SparkQA commented Jul 31, 2020

Test build #126887 has finished for PR 29303 at commit 5ec2092.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("SparkGetColumnsOperation") {
Member

Please make this test title clearer, too.

val decimalType = DecimalType(10, 2)
val ddl =
s"""
|CREATE TABLE $schemaName.$tableName
Member

@maropu maropu Jul 31, 2020

I don't think we have a strict rule for the format, but how about following the existing one?

s"""CREATE TABLE ddl_test (
| a STRING,
| b STRING,
| `extra col` ARRAY<INT>,
| `<another>` STRUCT<x: INT, y: ARRAY<BOOLEAN>>
|)
|USING json

Member Author

OK, this one looks better.

s"""
|CREATE TABLE $schemaName.$tableName
| (
| a boolean comment '0',
Member

Could you test all the types where possible for test coverage?

assert(typeNames.get(3) === decimalType.sql)

val colSize = columns.get(6).getI32Val.getValues
assert(colSize.get(3).intValue() === decimalType.defaultSize)
Member

Could you check all the elements in colSize? (Rather, I think we need to check all elements in the fetched rowSet.)

* returned.
* For array, map, string, and binaries, the column size is variable, return null as unknown.
*/
private def getColumnSize(typ: DataType): Option[Int] = typ match {

Member Author

Hive does not return the same result for each type.

@yaooqinn
Member Author

yaooqinn commented Aug 3, 2020

To check if we could fetch the new metadata via DatabaseMetaData, could you add tests in SparkMetadataOperationSuite, too?

The new test covers the metadata operations end to end for both binary and HTTP mode, but SparkMetadataOperationSuite seems to work for binary mode only.

@SparkQA

SparkQA commented Aug 3, 2020

Test build #126960 has finished for PR 29303 at commit 66d5812.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2020

Test build #126965 has finished for PR 29303 at commit 8e1a276.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
Some(sizeArr.map(_.get).sum)
}
case other => Some(other.defaultSize)
Contributor

@cloud-fan cloud-fan Aug 3, 2020

nit: I think it's safer to list the types whose size we know, instead of listing the types whose size we don't know. I'd prefer

case dt @ (_: NumericType | DateType | TimestampType) => dt.defaultSize
... 
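As a standalone sketch of the allow-list style suggested above (using hypothetical, simplified stand-ins for Spark's DataType hierarchy, not the actual classes):

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy,
// only to illustrate the allow-list style suggested above: report a
// COLUMN_SIZE for types whose width is known and fixed, and None for
// variable-length types (string, binary, array, map).
sealed trait DataType { def defaultSize: Int }
sealed trait NumericType extends DataType
case object IntegerType extends NumericType { val defaultSize = 4 }
case object DoubleType extends NumericType { val defaultSize = 8 }
case object DateType extends DataType { val defaultSize = 4 }
case object TimestampType extends DataType { val defaultSize = 8 }
case object StringType extends DataType { val defaultSize = 20 }

def getColumnSize(typ: DataType): Option[Int] = typ match {
  // Allow-list: only types whose size we know for sure.
  case dt @ (_: NumericType | DateType | TimestampType) => Some(dt.defaultSize)
  // Everything else: size is data-dependent, so report it as unknown.
  case _ => None
}
```

Compared with a deny-list, this fails closed: a newly added type reports an unknown size rather than a possibly wrong one.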

@SparkQA

SparkQA commented Aug 3, 2020

Test build #126974 has finished for PR 29303 at commit cf34aa3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

null, // CHAR_OCTET_LENGTH
null, // ORDINAL_POSITION
pos.asInstanceOf[AnyRef], // ORDINAL_POSITION
"YES", // IS_NULLABLE
Contributor

Should this respect column.nullable?

Member Author

IS_NULLABLE is related to primary-key constraints; "YES" is appropriate here since we don't have such a thing implemented.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 7f5326c Aug 3, 2020
Thread.sleep(10)
status = client.getOperationStatus(opHandle)
}
val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)
Member

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

Hi, @yaooqinn and @cloud-fan . I'm not sure but this new test case seems to break all Jenkins Maven jobs on master branch. Could you take a look?

org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator cannot be cast to org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider
      at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:42)
      at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.getColumns(ThriftCLIServiceClient.java:275)
      at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$17(ThriftServerWithSparkContextSuite.scala:146)

In the screenshot, please ignore the SBT failures.
(screenshot: Jenkins build matrix, 2020-08-03 5:20 PM)

Member

Let's revert if it takes a while to fix.

Member Author

Looks like it's related to another version of Hive belonging to another job (maybe a failed one) not being cleaned up.

Contributor

Not all maven builds are broken:
(screenshot omitted)

Let's wait a little bit and see if it's an env issue.

Member

Ya. It seems that Hive 2.3 is affected, and the JDK 11 Maven job doesn't seem to run those test suites.
Do we have a follow-up idea, @yaooqinn and @cloud-fan?
The error message is clear: java.lang.ClassCastException. If there is no idea, I'd recommend reverting first so the Maven Jenkins jobs pass properly.

Member

BTW, this is not a flaky failure. This broke some profiles consistently.

Member

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

BTW, it's up to you, @cloud-fan . I'll not revert this by myself since PRBuilder is still working.

cloud-fan pushed a commit that referenced this pull request Aug 5, 2020
### What changes were proposed in this pull request?

The newly added test fails the Jenkins Maven jobs; see #29303 (comment)

We move the test from `ThriftServerWithSparkContextSuite` to `SparkMetadataOperationSuite`, the former uses an embedded thrift server where the server and the client are in the same JVM process and the latter forks a new process to start the server where the server and client are isolated.
The sbt runner seems to be fine with the test in `ThriftServerWithSparkContextSuite`, but the Maven runner with the `scalatest` plugin hits a classloader issue: we switch the classloader to the one in the `sharedState`, which is not the one Hive uses to load some classes. This is more likely an issue with the Maven runner or `scalatest`.
So in this PR, we simply move it to bypass the issue.

BTW, we should still test the embedded-thrift-server path to verify whether this is just a Maven issue, as there could be use cases for that API.

### Why are the changes needed?

Jenkins recovery

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Modified unit tests.

Closes #29347 from yaooqinn/SPARK-32492-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>