Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Aug 7, 2019

What changes were proposed in this pull request?

This PR fix Hive 0.12 JDBC client can not handle binary type:

Connected to: Hive (version 3.0.0-SNAPSHOT)
Driver: Hive (version 0.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.12.0 by Apache Hive
0: jdbc:hive2://localhost:10000> SELECT cast('ABC' as binary);
Error: java.lang.ClassCastException: [B incompatible with java.lang.String (state=,code=0)

Server log:

19/08/07 10:10:04 WARN ThriftCLIService: Error fetching results:
java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible with java.lang.String
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(AccessController.java:770)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy26.fetchResults(Unknown Source)
	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:819)
Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String
	at org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198)
	at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
	at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$getNextRowSet$1(SparkExecuteStatementOperation.scala:151)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$Lambda$1923.000000009113BFE0.apply(Unknown Source)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withSchedulerPool(SparkExecuteStatementOperation.scala:299)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:113)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	... 18 more

How was this patch tested?

unit tests

@SparkQA
Copy link

SparkQA commented Aug 7, 2019

Test build #108776 has finished for PR 25379 at commit 984e69b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

return stringValue(((HiveDecimal)value));
case BINARY_TYPE:
return stringValue((String)value);
return stringValue(new String((byte[])value));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We stored it as Array[Byte]:

@SparkQA
Copy link

SparkQA commented Aug 8, 2019

Test build #108796 has finished for PR 25379 at commit c75b1dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good if the test works.

@SparkQA
Copy link

SparkQA commented Aug 8, 2019

Test build #108804 has finished for PR 25379 at commit 7b506e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@wangyum wangyum deleted the SPARK-28474 branch August 8, 2019 08:09
@dongjoon-hyun
Copy link
Member

Hi, @wangyum and @HyukjinKwon .
This test seems to be unstable. I filed a JIRA for this.

case BINARY_TYPE:
return stringValue((String)value);
String strVal = value == null ? null : UTF8String.fromBytes((byte[])value).toString();
return stringValue(strVal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangyum, #25480 this might be the cause. UTF8String.toString().

new String(getBytes(), StandardCharsets.UTF_8);

It actually reads bytes with UTF-8. IIRC, Java mangles the data if it's not conformed as specified encoding.
For some protocols, it might be unable to recognise those mangled strings back to binary. It might be the cause.

BTW, we should allow arbitrary binary, for instance, for the use case like images.

If you are unable to find the root cause, we can partially revert to the failed protocols for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants