@tprelle tprelle commented Apr 20, 2022

Fix : #4604
When you add an inner struct field as a partition field without any transformation on an Iceberg ORC table, you can no longer select only this field. Because the field is a partition, Iceberg does not read it from the file, and because it is inside a struct, OrcValueReaders is not able to reconstruct a constant struct.

It works when you also select another field from the struct, because OrcValueReaders is then able to read the struct from the file using the other field's reader.

So instead of trying to add it as a constant, read the value of a hidden partition based on an inner struct value directly from the files, even if you only read this column.
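
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ConstantStructSketch`, `idsReadFromFile`, and the `location{country, city}` struct with ids 3 and 4 are hypothetical names invented for the illustration, and none of this is Iceberg's or ORC's actual reader code. It only shows that once the partition source field is treated as a constant, a projection containing just that field leaves the struct with nothing to read from the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: field ids stand in for columns; none of this is Iceberg's API.
class ConstantStructSketch {

  // Returns the ids of a projected struct that still have to be read from the data file,
  // i.e. the projected ids that are not supplied as partition constants.
  static List<Integer> idsReadFromFile(List<Integer> projectedStructIds, Set<Integer> constantIds) {
    List<Integer> fromFile = new ArrayList<>();
    for (Integer id : projectedStructIds) {
      if (!constantIds.contains(id)) {
        fromFile.add(id);
      }
    }
    return fromFile;
  }

  public static void main(String[] args) {
    // Hypothetical struct location{country (id 3), city (id 4)}, identity-partitioned on country.
    Set<Integer> constantIds = new HashSet<>(Arrays.asList(3));

    // SELECT location.country: nothing in the struct is backed by a file reader.
    System.out.println(idsReadFromFile(Arrays.asList(3), constantIds));    // prints []

    // SELECT location.country, location.city: city still has a file reader.
    System.out.println(idsReadFromFile(Arrays.asList(3, 4), constantIds)); // prints [4]
  }
}
```

In the first case the struct has no file-backed field left, which corresponds to the situation where selecting only the partition field fails. In the second case the remaining field gives the reader something to build the struct from.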

@tprelle tprelle changed the title from "fixAddPartitionInnerStructField" to "Orc : Fix inner struct field as partition (#4604)" Apr 21, 2022

rdblue commented Apr 24, 2022

@shardulm94 can you take a look at this?

@kbendick kbendick left a comment

Hi @tprelle! Thanks for this patch.

Normally we only change one version per project at a time and then backport. Because of API changes that's not always possible, but it does make reviewing easier. Would that be possible?

return selectNot(struct, fieldIds, false);
}

public static Types.StructType selectNot(Types.StructType struct, Set<Integer> fieldIds, boolean filteredStruct) {

Nit: This boolean name is a bit strange to me. Is there perhaps a more descriptive name for what this corresponds to? Maybe doesHaveInnerStruct or doesHaveInnerProjectedStruct?

I was going to ask you to inline the parameter name next to the raw boolean values in the tests, like true /* filteredStruct */, but that still left me with some confusion.

Contributor Author

Hi @kbendick,
Thanks for the review.
I had trouble naming this variable. Maybe doesNeedToFilteredInnerStruct would be closer.
I added this variable because in some cases (ORC) we need to filter the inner struct, and in other cases we do not.

Is doesNeedToFilteredInnerStruct better?

Contributor Author

I just renamed it to doesNeedToKeepInnerStruct.
The meaning is closer to this: if the field is a struct and all of its ids would be filtered out, we need to keep the whole struct, so we remove all of its ids from the set of filtered ids.
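
A minimal sketch of that behaviour, assuming a simplified type model. `SelectNotSketch`, `Field`, `primitive`, `struct`, and `describe` are hypothetical helpers written for this illustration; this is not the actual `TypeUtil.selectNot` implementation from the patch. Dropping a fully filtered struct when the flag is `false` is also an assumption of the toy. The two-argument overload delegates with `false`, mirroring how the diff above keeps the old signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a simplified stand-in for nested struct types, not Iceberg's classes.
class SelectNotSketch {

  static final class Field {
    final int id;
    final String name;
    final List<Field> children; // null for a primitive field

    Field(int id, String name, List<Field> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }
  }

  static Field primitive(int id, String name) {
    return new Field(id, name, null);
  }

  static Field struct(int id, String name, Field... children) {
    return new Field(id, name, Arrays.asList(children));
  }

  // Old signature kept for compatibility: delegates with the flag turned off.
  static List<Field> selectNot(List<Field> fields, Set<Integer> fieldIds) {
    return selectNot(fields, fieldIds, false);
  }

  // Removes the fields whose ids are in fieldIds. When every child of a struct would be
  // removed and doesNeedToKeepInnerStruct is true, the struct is kept with all its children.
  static List<Field> selectNot(List<Field> fields, Set<Integer> fieldIds,
                               boolean doesNeedToKeepInnerStruct) {
    List<Field> result = new ArrayList<>();
    for (Field field : fields) {
      if (fieldIds.contains(field.id)) {
        continue; // explicitly filtered out
      }
      if (field.children == null) {
        result.add(field); // primitive that is not filtered
        continue;
      }
      List<Field> kept = selectNot(field.children, fieldIds, doesNeedToKeepInnerStruct);
      if (!kept.isEmpty()) {
        result.add(new Field(field.id, field.name, kept));
      } else if (doesNeedToKeepInnerStruct) {
        result.add(field); // keep the whole struct so it can be read from the file
      }
      // Otherwise the struct is left out entirely (assumption of this toy model).
    }
    return result;
  }

  // Renders a field list as e.g. "id,location{country,city}" for easy inspection.
  static String describe(List<Field> fields) {
    StringBuilder sb = new StringBuilder();
    for (Field field : fields) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(field.name);
      if (field.children != null) {
        sb.append("{").append(describe(field.children)).append("}");
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Projection of only location.country, where country (id 3) is a partition constant.
    List<Field> projection = Arrays.asList(struct(2, "location", primitive(3, "country")));
    Set<Integer> constantIds = new HashSet<>(Arrays.asList(3));

    System.out.println("[" + describe(selectNot(projection, constantIds)) + "]");
    System.out.println("[" + describe(selectNot(projection, constantIds, true)) + "]");
  }
}
```

With the flag off, the projection of only `location.country` ends up empty in this model. With the flag on, `location{country}` survives, so a file reader still has a struct to read.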

tprelle commented May 14, 2022

> Hi @tprelle! Thanks for this patch.
>
> Normally we only change one version per project at a time and then backport. Because of API changes that's not always possible, but it does make reviewing easier. Would that be possible?

So @kbendick, since I kept the old method signature in the API, it's possible.
To confirm: in the first PR I fix the major version of Flink and Spark, and then I publish a backport PR for the other versions?

Instead of trying to add it as a constant, read directly from the files
the value of a hidden partition based on inner struct value.

github-actions bot commented Aug 8, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 8, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 16, 2024

Successfully merging this pull request may close these issues.

Orc : Bug when adding a inner struct field as partition field
