Skip to content

Conversation

@fsilent
Copy link
Contributor

@fsilent fsilent commented Nov 3, 2022

Change Logs

#6989 cannot support column type evolution.

class that implements SelfDescribingInputFormatInterface,
hive will try to do colum type evolution by itself

Impact

support type change

Risk level (write none, low medium or high below)

low.

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@fsilent
Copy link
Contributor Author

fsilent commented Nov 3, 2022

@xiarixiaoyao please help to review.

@xiarixiaoyao xiarixiaoyao self-requested a review November 3, 2022 12:00
@xiarixiaoyao xiarixiaoyao self-assigned this Nov 3, 2022
@xiarixiaoyao xiarixiaoyao changed the title [MINOR] Support schema evolution for Hive [MINOR] Support column type evolution for Hive Nov 3, 2022
@xiarixiaoyao xiarixiaoyao reopened this Nov 3, 2022
@fsilent fsilent closed this Nov 3, 2022
@fsilent fsilent reopened this Nov 3, 2022
@xiarixiaoyao xiarixiaoyao reopened this Nov 3, 2022
</dependency>

<!-- Hive -->
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need hive dependency here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed. now dont need to add hive dependency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still showing up. Can you remove this dependency as the change is in hudi-hadoop-mr package and hive-exec is already defined as dependency in hudi-hadoop-mr /pom.xml

@UseFileSplitsFromInputFormat
public class HoodieParquetInputFormat extends HoodieParquetInputFormatBase {
public class HoodieParquetInputFormat extends HoodieParquetInputFormatBase implements
SelfDescribingInputFormatInterface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be better to implements SelfDescribingInputFormatInterface by HoodieParquetInputFormatBase
let's move this change to HoodieParquetInputFormatBase

public abstract class HoodieParquetInputFormatBase extends MapredParquetInputFormat implements Configurable, SelfDescribingInputFormatInterface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. has changed

@xiarixiaoyao xiarixiaoyao reopened this Nov 4, 2022
@xiarixiaoyao xiarixiaoyao reopened this Nov 4, 2022
Copy link
Contributor

@leesf leesf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Please the CI failure and fix

@nsivabalan nsivabalan added priority:high Significant impact; potential bugs engine:hive Hive integration reader-core labels Nov 7, 2022
@fsilent
Copy link
Contributor Author

fsilent commented Nov 7, 2022

Because of SelfDescribingInputFormatInterface not exist in hive 1.x, so i have to add dependency in hudi-spark-common. Otherwise, the SelfDescribingInputFormatInterface class cannot be found if Hive 1.x is used during hudi-spark-common compilation.

@xiarixiaoyao xiarixiaoyao reopened this Nov 8, 2022
@xiarixiaoyao
Copy link
Contributor

@leesf @xushiyan could you pls help review again thanks
an additional question
What is the current community attitude towards hive1.x
hive1.x cannot support type change This is hive's own limitation, not hudi

@xiarixiaoyao
Copy link
Contributor

@hudi-bot run azure

@fsilent
Copy link
Contributor Author

fsilent commented Nov 9, 2022

@hudi-bot run azure

@xiarixiaoyao xiarixiaoyao reopened this Nov 9, 2022
@leesf
Copy link
Contributor

leesf commented Nov 9, 2022

@leesf @xushiyan could you pls help review again thanks an additional question What is the current community attitude towards hive1.x hive1.x cannot support type change This is hive's own limitation, not hudi

I do not think we need to support so old hive version for this feature.

@xushiyan xushiyan self-assigned this Nov 9, 2022
@xiarixiaoyao
Copy link
Contributor

@hudi-bot run azure

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@pratyakshsharma
Copy link
Contributor

pratyakshsharma commented Nov 16, 2022

LGTM as well, though I am still curious to understand in detail how does this interface SelfDescribingInputFormatInterface help in achieving column type evolution. I tried searching for docs, but could not find anything. If you can point me to relevant docs, it would be great @fsilent @xiarixiaoyao

@fsilent
Copy link
Contributor Author

fsilent commented Nov 16, 2022

LGTM as well, though I am still curious to understand in detail how does this interface SelfDescribingInputFormatInterface help in achieving column type evolution. I tried searching for docs, but could not find anything. If you can point me to relevant docs, it would be great @fsilent @xiarixiaoyao

In hive FetechOpertor, will judge needConversion by the inputformot whether implement SelfDescribingInputFormatInterface
image

@nsivabalan nsivabalan added the release-0.12.2 Patches targetted for 0.12.2 label Dec 6, 2022
@codope codope removed the release-0.12.2 Patches targetted for 0.12.2 label Dec 7, 2022
<artifactId>hive-common</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this dependency as the change is in hudi-hadoop-mr package and hive-exec is already defined as dependency in hudi-hadoop-mr /pom.xml

</dependency>

<!-- Hive -->
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still showing up. Can you remove this dependency as the change is in hudi-hadoop-mr package and hive-exec is already defined as dependency in hudi-hadoop-mr /pom.xml

@bvaradar
Copy link
Contributor

@fsilent : Thanks a lot for the PR. Can you fix the package dependencies and also add a jira ticket in the description.

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Feb 26, 2024
Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bvaradar is this change still needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

engine:hive Hive integration priority:high Significant impact; potential bugs size:S PR with lines of changes in (10, 100]

Projects

Status: 🏗 Under discussion

Development

Successfully merging this pull request may close these issues.

10 participants