
Conversation

@Paddy0523 (Contributor)

Change Logs

Fixed hadoop configuration not being applied by org.apache.hudi.source.FileIndex

Impact

FileIndex uses the static HoodieFlinkEngineContext.DEFAULT to resolve partition paths, so the information in the job configuration is never applied.
Since I was connecting to a remote Hadoop cluster, I subsequently got the following error due to the missing configuration:
[error screenshot not preserved]

Risk level (write none, low medium or high below)

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@Paddy0523 (Contributor Author)

PTAL @danny0405

  String[] partitions = getOrBuildPartitionPaths().stream()
      .map(p -> fullPartitionPath(path, p))
      .toArray(String[]::new);
  - FileStatus[] allFiles = FSUtils.getFilesInPartitions(HoodieFlinkEngineContext.DEFAULT, metadataConfig, path.toString(), partitions);
  + FileStatus[] allFiles = FSUtils.getFilesInPartitions(hoodieFlinkEngineContext, metadataConfig, path.toString(), partitions);
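The gist of the change can be illustrated without Hudi's classes. In the sketch below (hypothetical names; a plain `Properties` stands in for Hadoop's `Configuration`), a static DEFAULT context built from an empty config cannot see job-level options such as a remote `fs.defaultFS`, while a context constructed from the job's config can:

```java
import java.util.Properties;

// Hypothetical stand-in for an engine context that carries a Hadoop-style config.
class EngineContext {
    // A shared default built from an empty config -- the pre-fix behavior.
    static final EngineContext DEFAULT = new EngineContext(new Properties());

    final Properties hadoopConf;

    EngineContext(Properties hadoopConf) {
        this.hadoopConf = hadoopConf;
    }

    String fsDefaultFs() {
        // Falls back to a local FS when the option was never applied.
        return hadoopConf.getProperty("fs.defaultFS", "file:///");
    }
}

public class ConfigPropagation {
    public static void main(String[] args) {
        // Job-level options, e.g. pointing at a remote HDFS namenode.
        Properties jobConf = new Properties();
        jobConf.setProperty("fs.defaultFS", "hdfs://remote-cluster:8020");

        // Pre-fix: the static DEFAULT ignores the job config entirely.
        System.out.println(EngineContext.DEFAULT.fsDefaultFs()); // file:///

        // Post-fix: build the context from the job's config instead.
        EngineContext ctx = new EngineContext(jobConf);
        System.out.println(ctx.fsDefaultFs()); // hdfs://remote-cluster:8020
    }
}
```

This mirrors why the PR swaps `HoodieFlinkEngineContext.DEFAULT` for a context derived from the per-job Hadoop configuration.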
Contributor

What kind of hadoop configuration do you want to pass around?

@Paddy0523 (Contributor Author) May 4, 2023

Something like this:


  private FileIndex(Path path, Configuration conf, RowType rowType, DataPruner dataPruner, PartitionPruners.PartitionPruner partitionPruner, int dataBucket) {
    org.apache.hadoop.conf.Configuration hadoopConf = HadoopConfigurations.getHadoopConf(conf);
    // ...
  }

Contributor

Can we keep the hadoop conf as a member instead?

Contributor Author

Yeah, I will keep the hadoop conf as a member.

Contributor Author

I got a warning (screenshot not preserved). Can we ignore it, or do I need to make any other changes?

Contributor

Generating the HoodieFlinkEngineContext on the fly should be fine.
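The agreed design (conf kept as a member, engine context generated on the fly) can be sketched with hypothetical, non-Hudi names; the serializable index holds only the lightweight config and builds the context inside the method that needs it:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an engine context; not a Hudi class.
class EngineCtx {
    final Map<String, String> hadoopConf;

    EngineCtx(Map<String, String> hadoopConf) {
        this.hadoopConf = hadoopConf;
    }
}

class FileIndexSketch implements Serializable {
    // Keep only the plain config as a member, per the review suggestion.
    private final HashMap<String, String> hadoopConf;

    FileIndexSketch(HashMap<String, String> hadoopConf) {
        this.hadoopConf = hadoopConf;
    }

    // Build the engine context on the fly, right where it is needed,
    // instead of caching a heavyweight object as a field.
    String resolveDefaultFs() {
        EngineCtx ctx = new EngineCtx(hadoopConf);
        return ctx.hadoopConf.getOrDefault("fs.defaultFS", "file:///");
    }
}
```

Constructing the context per call keeps the index itself cheap to ship around while still honoring the per-job Hadoop options.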

Contributor Author

You are right.

@danny0405 danny0405 self-assigned this May 4, 2023
@hudi-bot (Collaborator)

hudi-bot commented May 4, 2023

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit 053dd4b into apache:master May 4, 2023
yihua pushed a commit to yihua/hudi that referenced this pull request May 15, 2023
…he#8595)

Passed the hadoop config options from per-job to the FileIndex correctly.
yihua pushed a commit to yihua/hudi that referenced this pull request May 17, 2023
…he#8595)

Passed the hadoop config options from per-job to the FileIndex correctly.

Labels

engine:flink Flink integration

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants