[HUDI-313] Fix select count star error when querying a realtime table #972
Diff excerpt (hunk `@@ -197,10 +197,27 @@ private static synchronized Configuration addRequiredProjectionFields(Configurat`):

```java
    return configuration;
  }

  /**
   * Hive appends the read columns' ids to the old columns' ids during getRecordReader. In some cases, e.g.
   * SELECT COUNT(*), the read column ids are an empty string, and Hive combines them with the Hoodie required
   * projection ids, producing e.g. ",2,0,3", which causes an error. This method removes the leading comma.
   */
  private static synchronized Configuration cleanProjectionColumnIds(Configuration conf) {
    // Default to "" so that a missing key does not cause a NullPointerException.
    String columnIds = conf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "");
    if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
      conf.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, columnIds.substring(1));
      if (LOG.isDebugEnabled()) {
        LOG.debug("The projection Ids: {" + columnIds + "} start with ','. First comma is removed");
      }
    }
    return conf;
  }

  @Override
  public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split, final JobConf job,
      final Reporter reporter) throws IOException {
    this.conf = cleanProjectionColumnIds(job);
    LOG.info("Before adding Hoodie columns, Projections :" + job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
        + ", Ids :" + job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
    // ... (remainder of the method is outside this hunk)
```
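To make the failure mode concrete, here is a stand-alone sketch that reproduces the leading-comma string and the Hudi-side cleanup without Hadoop or Hive on the classpath. It is a simulation under stated assumptions: a `HashMap` stands in for Hadoop's `Configuration`, the key string is assumed to mirror `ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR`, `naiveAppend` is a hypothetical stand-in for Hive's append step (not its real implementation), and `2,0,3` are example ids taken from the Javadoc.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simulation of the leading-comma problem. A plain Map replaces Hadoop's
 * Configuration; the append logic is an assumption, not Hive's actual code.
 */
public class ProjectionIdsDemo {

  // Assumed to mirror ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR.
  static final String READ_COLUMN_IDS = "hive.io.file.readcolumn.ids";

  // Hypothetical "buggy" append: always inserts a separator, even when the
  // existing value is empty (as it is for SELECT COUNT(*)).
  static void naiveAppend(Map<String, String> conf, String ids) {
    String old = conf.getOrDefault(READ_COLUMN_IDS, "");
    conf.put(READ_COLUMN_IDS, old + "," + ids);
  }

  // Same check as cleanProjectionColumnIds in the diff, applied to a Map.
  static Map<String, String> cleanProjectionColumnIds(Map<String, String> conf) {
    String columnIds = conf.getOrDefault(READ_COLUMN_IDS, "");
    if (!columnIds.isEmpty() && columnIds.charAt(0) == ',') {
      conf.put(READ_COLUMN_IDS, columnIds.substring(1)); // drop only the first comma
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(READ_COLUMN_IDS, "");                  // COUNT(*) reads no user columns
    naiveAppend(conf, "2,0,3");                     // example Hudi projection ids
    System.out.println(conf.get(READ_COLUMN_IDS));  // ,2,0,3
    cleanProjectionColumnIds(conf);
    System.out.println(conf.get(READ_COLUMN_IDS));  // 2,0,3
  }
}
```

Note that the cleanup only strips a single leading comma; it relies on the malformed value having exactly that shape, which matches the ",2,0,3" case described in the Javadoc.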
As discussed with you internally, this appears to be a bug in Hive. It manifests because Hudi needs to append its minimum set of projection columns, i.e. its metadata columns, even for a `count` query. Ideally, though, this should be fixed in Hive so that it does not happen in the first place: https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java#L119
Can we file a Jira with Hive and reference it in the comment here?
Yes. After the discussion and some investigation, Hive is where this bug originates: it creates projection column ids like ",2,0,3". My change works around that bug inside Hudi.
Hive fixed this bug after 3.0.0, but versions before 3.0.0 still have the problem. The Jira for Hive is here: https://issues.apache.org/jira/browse/HIVE-22438.
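For comparison, fixing the problem on the Hive side, as suggested in the review, would mean never producing the leading comma at all. The sketch below is a hypothetical illustration of that idea only. It is not Hive's code and not the change tracked in the Jira above. `SafeIdAppend` and `appendIds` are invented names, and the sketch assumes ids are joined as "existing, then new".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: append projection ids while skipping an empty or
 * missing existing value, so no leading comma is ever produced.
 */
public class SafeIdAppend {

  // Joins the existing ids and the new ids, ignoring null or blank parts.
  static String appendIds(String old, String ids) {
    List<String> parts = new ArrayList<>();
    for (String s : new String[] {old, ids}) {
      if (s != null && !s.trim().isEmpty()) {
        parts.add(s.trim());
      }
    }
    return String.join(",", parts);
  }

  public static void main(String[] args) {
    System.out.println(appendIds("", "2,0,3"));   // 2,0,3
    System.out.println(appendIds("1", "2,0,3"));  // 1,2,0,3
    System.out.println(appendIds(null, "2,0,3")); // 2,0,3
  }
}
```

Guarding at the point where the string is built handles every caller, whereas the Hudi-side cleanup only repairs the value after the fact. That is why the review treats the Hudi change as a workaround for Hive versions before 3.0.0.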