
Exceeding the Hive avro.schema.literal max size due to a very long Avro schema #331

Open
daigorowhite opened this issue May 29, 2018 · 7 comments

@daigorowhite

Hi team, I have an issue with a long Avro schema when using kafka-connect-hdfs with Hive integration.

When I sink a table with a long schema into HDFS via kafka-connect-hdfs, the data is written to HDFS successfully, but I get this error when I run a query:

0: jdbc:hive2://localhost:10000/default> select * from db.table limit 1 ;
Error: Error while compiling statement: FAILED: RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
 at [Source: java.io.StringReader@2c87f90b; line: 1, column: 6001]) (state=42000,code=40000)

I investigated the root cause: it is the varchar size of the Hive metastore parameter column that stores the schema literal.
https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/avro/AvroHiveUtil.java#L69
https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/avro/AvroHiveUtil.java#L95

mysql> describe TABLE_PARAMS;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| TBL_ID      | bigint(20)    | NO   | PRI | NULL    |       |
| PARAM_KEY   | varchar(256)  | NO   | PRI | NULL    |       |
| PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
+-------------+---------------+------+-----+---------+-------+
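For reference, a query along these lines against a MySQL-backed metastore shows how large the stored literals are (TBLS and TABLE_PARAMS are the standard metastore tables; adjust for your metastore schema version):

SELECT t.TBL_NAME, LENGTH(p.PARAM_VALUE) AS literal_len
FROM TBLS t
JOIN TABLE_PARAMS p ON p.TBL_ID = t.TBL_ID
WHERE p.PARAM_KEY = 'avro.schema.literal'
ORDER BY literal_len DESC;

If the literal is longer than the PARAM_VALUE column allows, it gets truncated, which is what produces the unexpected end-of-input error above.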

Do you have any solution/idea for this in kafka-connect-hdfs?

@rhauch
Member

rhauch commented May 29, 2018

I've seen this workaround: log into Hive Metastore DB, then run alter table SERDE_PARAMS MODIFY PARAM_VALUE VARCHAR(40000);

Probably want to pick a realistic size, though.

@daigorowhite
Author

That is one way to avoid the problem, but it depends on the Hive metastore setup, and the column could be shrunk back again by a Hive upgrade.
How about just using avro.schema.url so the schema size is not limited?
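For illustration only (the table name and paths are placeholders, not DDL generated by the connector), a Hive table that reads its schema from a file instead of an inline literal looks roughly like this:

CREATE EXTERNAL TABLE db.table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://namenode/topics/table'
TBLPROPERTIES ('avro.schema.url'='hdfs://namenode/schemas/table.avsc');

With that, the metastore only stores the URL, so the PARAM_VALUE length limit no longer applies to the schema itself.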

@rhauch
Member

rhauch commented May 30, 2018

@daigorowhite yes, using avro.schema.url is fine as long as the schema can be stored somewhere. HDFS is an obvious place, but that requires non-trivial code changes to handle all of the cases (e.g., when the schema of a consumed topic changes and the new schema needs to be stored somewhere).

I only mentioned the workaround because it works today and requires no code changes. It may not be ideal, and it may not work for everyone.

@OneCricketeer

OneCricketeer commented Jun 8, 2018

See HIVE-12274.

You could manually apply the upgrade script for MySQL.
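For reference, on a MySQL-backed metastore the HIVE-12274 change amounts to widening the parameter columns, roughly along these lines (check the exact upgrade script that ships with your Hive version before running anything by hand):

ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
ALTER TABLE SD_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;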

@daigorowhite
Author

Thanks for sharing it! 👍

@OneCricketeer

Duplicates #145

@Vincent-Zeng

Hi, team.
With hive.integration=true, how can the kafka-connect sink use avro.schema.url instead of avro.schema.literal? Or do I need to alter the table manually in Hive?
