@CTTY
Contributor

@CTTY CTTY commented Mar 15, 2023

Change Logs

Fixed a potential serialization issue when Hudi runs on a FileSystem implementation whose FileStatus is not serializable.

Impact

no impact

Risk level (write none, low medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@CTTY
Contributor Author

CTTY commented Mar 15, 2023

@hudi-bot run azure

@danny0405 danny0405 self-assigned this May 4, 2023
@danny0405
Contributor

Can you rebase with the latest master and re-trigger the test?

@danny0405
Contributor

There are test failures. Can you squash with the latest master? Let's see whether that fixes the failures.

@danny0405
Contributor

@CTTY You need to rebase onto the latest master to trigger the Azure CI; there have been some changes to the Azure CI conf files.

Contributor

@danny0405 danny0405 left a comment

+1, fine with the change. Can you take care of the test failures, @CTTY?

@yihua
Contributor

yihua commented Jun 10, 2023

@CTTY It looks like Hive query fails in the bundle validation due to: org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/avro/AvroMissingFieldException. Could you take a look? This is likely because hudi-hadoop-mr-bundle relies on Avro 1.8 which does not have the class, while other modules can be compiled with a higher Avro version. And this PR uses Avro to serialize the file status.

@CTTY
Contributor Author

CTTY commented Jun 10, 2023

> @CTTY It looks like Hive query fails in the bundle validation due to: org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/avro/AvroMissingFieldException. Could you take a look? This is likely because hudi-hadoop-mr-bundle relies on Avro 1.8 which does not have the class, while other modules can be compiled with a higher Avro version. And this PR uses Avro to serialize the file status.

Thanks for pointing this out.

I guess we still need a new SerializableFileStatus so that we don't have to depend on the Avro-generated HoodieFileStatus, given that hudi-hadoop-mr-bundle has to use hive.avro.version, which is 1.8.2. I'll try to add it back later.
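
For illustration only, here is a minimal sketch of what an Avro-free wrapper along these lines could look like. It is not the class from this PR or from Hudi: the field set, constructor, and the toBytes/fromBytes helpers are all assumptions made for the example, and a real version would be built from a Hadoop FileStatus rather than from raw values. The point is only that a plain java.io.Serializable holder round-trips through Java serialization without touching any Avro classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable file status holder.
// Names and fields are illustrative assumptions, not Hudi's actual API.
class SerializableFileStatus implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String path;           // file path kept as a plain string
    private final long length;           // file size in bytes
    private final boolean directory;     // true if the entry is a directory
    private final long modificationTime; // epoch milliseconds

    SerializableFileStatus(String path, long length, boolean directory, long modificationTime) {
        this.path = path;
        this.length = length;
        this.directory = directory;
        this.modificationTime = modificationTime;
    }

    String getPath() { return path; }
    long getLength() { return length; }
    boolean isDirectory() { return directory; }
    long getModificationTime() { return modificationTime; }

    // Serialize with plain Java serialization; no Avro classes are involved.
    static byte[] toBytes(SerializableFileStatus status) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(status);
        }
        return bytes.toByteArray();
    }

    // Rebuild the holder from the bytes produced by toBytes.
    static SerializableFileStatus fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializableFileStatus) in.readObject();
        }
    }
}
```

In an engine such as Spark, a holder like this could be returned from the parallel listing step in place of the non-serializable FileStatus and converted back on the driver; how that conversion would look in Hudi is not shown here.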

@hudi-bot
Collaborator

hudi-bot commented Aug 7, 2023

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build


// List all directories in parallel
engineContext.setJobStatus(this.getClass().getSimpleName(), "Listing all partitions with prefix " + relativePathPrefix);
List<FileStatus> dirToFileListing = engineContext.flatMap(pathsToList, path -> {
Contributor

@CTTY It looks like in the latest master we no longer return FileStatus here (Path instances are used instead). Is this PR still needed?

Contributor Author

You are right; I don't think this is needed anymore.

Contributor

Got it. I'll close this PR.

Labels

engine:spark Spark integration

Projects

Status: ✅ Done

4 participants