Skip to content

Conversation

@ConeyLiu
Copy link
Contributor

@ConeyLiu ConeyLiu commented Oct 28, 2021

This patch fixes two problems:

  1. For V1 tables, we use a VoidTransform to replace the removed partition field. While the result type of VoidTransform same as the type of the field. For example, the result type of Bucket(2, string_type_field) is int, while is string for VoidTransform(string_type_field). So we should use the original partition field type(int instead of string) to build the common partitioning type.
  2. We should use the matched PartitionSpec to convert the PartitionFiedlSummary to human string, not the current table spec.

Closes #3374

@github-actions github-actions bot added the core label Oct 28, 2021
@ConeyLiu
Copy link
Contributor Author

ConeyLiu commented Oct 28, 2021

Hi @RussellSpitzer @rdblue @openinx @jackye1995, could you help to review this? Thanks a lot.

@RussellSpitzer
Copy link
Member

OK I think I understand the issue now, this is specifically an error where the attempt to build a common Partition Spec builds a spec using void transforms in V1 tables, the void column has a different return type than the original partition transform and throws an error. Is that correct?

I would recommend renaming this issue/pr to match the underlying issue. I also am guessing because this is the issue, there is no problem with V2 tables.

@aokolnychyi Should probably also take a look

@aokolnychyi
Copy link
Contributor

I may be able to take a look within the next few days


Table manifestsTable = new ManifestsTable(table.ops(), table);
TableScan scan = manifestsTable.newScan()
.filter(Expressions.lessThan("length", 10000L));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we filtering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be removed. The test is adjusted from testManifestsTableAlwaysIgnoresResiduals.

@ConeyLiu
Copy link
Contributor Author

Thanks @RussellSpitzer for the review.

OK I think I understand the issue now, this is specifically an error where the attempt to build a common Partition Spec builds a spec using void transforms in V1 tables, the void column has a different return type than the original partition transform and throws an error. Is that correct?

Yes, that's right.

I would recommend renaming this issue/pr to match the underlying issue. I also am guessing because this is the issue, there is no problem with V2 tables.

Any suggestion for this? What about Fixes read metadata failed after dropped partition in V1 tables?

@ConeyLiu
Copy link
Contributor Author

ConeyLiu commented Mar 2, 2022

@RussellSpitzer, I am sorry for the late update. Thanks for the review. Comments have been addressed. Pls take another look when you are free.

@aokolnychyi
Copy link
Contributor

Sorry for the delay on my side. This PR is the next one I'll look into. I took a quick look and I think I remember the context.

@aokolnychyi
Copy link
Contributor

Okay, I think I got the issue. Essentially, we don't know the correct type produced by Void transforms that we assign in v1 tables when a partition field is dropped. We assume the output type matches the source field type but it is not always the case. Let me take a look at the solution on Monday.

Map<Integer, PartitionField> fieldMap = Maps.newHashMap();
List<NestedField> structFields = Lists.newArrayList();
Map<Integer, Type> typeMap = Maps.newHashMap();
Map<Integer, String> nameMap = Maps.newHashMap();
Copy link
Contributor

@szlta szlta Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run into similar issues, and I think this will help resolve the type change of columns in older specs.
Another thing I have seen and is probably still a problem, is that this method may return a column name multiple times. Consider the following:
Table schema: a int, b date, c date
spec0: year(b), a
spec1: a
spec2: year(b), year(c)
then the result is something like this: 1000: b_year int, 1001: a int, 1002: b_year int, 1003: c_year int
Further down the line when we construct a Schema object, we will have a failure due to b_year name being present in two fields (1000, 1002).
How should this case be handled? Maybe appending _r [fieldId] to each column name?
cc: @RussellSpitzer , @aokolnychyi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szlta, Iceberg should rename older columns when it updates the spec to avoid conflicts. That's why we have to take the latest column names. I think we should have some tests in TestPartitioning. Could you check, @szlta?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aokolnychyi I think you're referring to what's happening in V1 tables. For those the spec is ever-growing in a way that no partition fields/transforms are removed, but rather converted to void.
The rename logic is there: https://github.com/apache/iceberg/blob/master/core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Ficeberg%2FBaseUpdatePartitionSpec.java#L179
but I think this is not used for V2 tables, as per https://github.com/apache/iceberg/blob/master/core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Ficeberg%2FBaseUpdatePartitionSpec.java#L261-L268

Since in V2, specs don't retain old deleted partition fields this rename is not required for normal operations.

The problem I'm describing only affects the metadata table queries, because for V2, due to the lack of above renames, Partitioning.partitionType() collects all partition fields from all previous specs too. With the lack of renames this can result in the same field name being present multiple times, and cause the PartitionsTable's (or DataFilesTable's) schema to be failed to get constructed.

Copy link
Contributor

@aokolnychyi aokolnychyi Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are right, @szlta. Here is an example to reproduce.

PartitionSpec initialSpec = PartitionSpec.builderFor(SCHEMA)
    .identity("data")
    .build();
TestTables.TestTable table = TestTables.create(tableDir, "test", SCHEMA, initialSpec, V2_FORMAT_VERSION);

table.updateSpec()
    .removeField("data")
    .commit();

table.updateSpec()
    .addField("data")
    .addField("id")
    .commit();

struct<1000: data: optional string, 1001: data: optional string, 1002: id: optional int>

While it would be great to update the logic that evolves the spec, I think we have to adapt the method that builds a common representation too. Otherwise, existing tables may be broken. Maintaining a set of used names and appending a suffix of the field ID sounds like a reasonable approach.

Thoughts, @rdblue @RussellSpitzer @szehon-ho @flyrain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are using field IDs while projecting values from the common representation in Spark.
At least, that part is working.

protected Map<Integer, StructProjection> buildPartitionProjections(Types.StructType partitionType,
                                                                   Map<Integer, PartitionSpec> specs) {
  Map<Integer, StructProjection> partitionProjections = Maps.newHashMap();
  specs.forEach((specID, spec) ->
      partitionProjections.put(specID, StructProjection.create(partitionType, spec.partitionType()))
  );
  return partitionProjections;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have two cases to solve this problem. The combined partition struct type is used in both metadata tables and when we need to project the _partition field for certain queries. For queries that use the _partition field, I think adding a suffix to make the names unique is the right approach. That's internal so it doesn't really matter what we rename to, as long as we get the projections that produce the partition tuple for a given spec right.

For metadata tables, we expect the field names to match the partition names. A simple example is that we use identity partitions using the original column name. So it would be weird to partition by category and need to query category_r1003 in the metadata table. I think that the right thing to do for metadata tables is to have a separate way to produce the combined struct that produces only one partition column per name.

In @aokolnychyi's example, 1001: data should be present and 1000: data should not be used for metadata tables.

This is also an area where we may want to bring back compatible partition columns. There's no reason why Anton's example couldn't detect that 1000: data was in an old spec and reuse the ID to avoid this problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about partition transforms where the transform itself could change, but the field name remains the same?
E.g.: bucket(data, 10) would be data_bucket, and after dropping this from spec and re-adding with a new spec bucket(data, 8), id, then we would have 1000: data_bucket, 1001: data_bucket, 1002: id where the two data_bucket fields describe different things. I guess this is similar for truncate.

A partition in the old spec where data_bucket = 0 would be different from a partition in the new spec where data_bucket = 0

I agree with @rdblue that it's weird to have category_r1003 in queries, but if we want to avoid it, I see two ways of proceeding from metadata query perspective:
A metadata table should..

  1. ..either give back partition information as per the latest spec only..
  2. ..or we could combine the data returned for similarly named partition fields cascaded into just one field, but extend such tables with spec_id information in these cases.

So as per above example use case the partitions table would look like this for the two cases (with 1 partition in each spec):

1:
+---------------------------------+--------------------+------------------+
|         test.partition          | test.record_count  | test.file_count  |
+---------------------------------+--------------------+------------------+
| {data_bucket : 0, id : 1}       |          1         |        1         |
+---------------------------------+--------------------+------------------+

2:
+---------------------------------+--------------------+------------------+--------------+
|         test.partition          | test.record_count  | test.file_count  | test.spec_id |
+---------------------------------+--------------------+------------------+--------------+
| {data_bucket : 0, id : null}    |          1         |        1         |       0      |
| {data_bucket : 0, id : 1}       |          1         |        1         |       1      |
+---------------------------------+--------------------+------------------+--------------+

Note this is how it would be for V2 tables. For V1, due to the renaming we're already doing, the renamed fields would be present too (unless we aim at changing that too):

1:
+---------------------------------+--------------------+------------------+
|         test.partition          | test.record_count  | test.file_count  |
+---------------------------------+--------------------+------------------+
| {data_bucket : 0, id : 1}       |          1         |        1         |
+---------------------------------+--------------------+------------------+

2:
+----------------------------------------------------------+--------------------+------------------+--------------+
|         test.partition                                   | test.record_count  | test.file_count  | test.spec_id |
+----------------------------------------------------------+--------------------+------------------+--------------+
| {data_bucket_1000 : 0,    data_bucket : null, id : null} |          1         |        1         |       0      |
| {data_bucket_1000 : null, data_bucket: 0,     id : 1}    |          1         |        1         |       1      |
+----------------------------------------------------------+--------------------+------------------+--------------+

I'm personally more for solution 2, where we don't have to omit the old partitions but at the same time we get a nice and coherent partition info result. It's kind of in league what I'm proposing in issue #4292 (we could also continue the discussion of this problem there, I didn't mean to hijack @ConeyLiu 's PR like this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also up for solution 2, where we rename just the columns from older specs. That sounds like a reasonable solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll catch up today. Sorry for the delay!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the fixes in this PR look good other than a couple minor updates that are needed. We should definitely follow up with a PR that fixes the names as suggested by @szlta.

@rdblue
Copy link
Contributor

rdblue commented Mar 9, 2022

@ConeyLiu, can you update the description? It is really helpful to not only say what you're trying to fix, but also to describe the approach and why the changes are needed. Thanks!

@ConeyLiu ConeyLiu changed the title Fixes read metadata failed after dropped partition Fixes read metadata failed after dropped partition for V1 format Mar 13, 2022
@ConeyLiu
Copy link
Contributor Author

@ConeyLiu, can you update the description? It is really helpful to not only say what you're trying to fix, but also to describe the approach and why the changes are needed. Thanks!

Updated.

.sorted(Comparator.comparingInt(NestedField::fieldId))
List<NestedField> sortedStructFields = fieldMap.keySet().stream()
.sorted(Comparator.naturalOrder())
.map(fieldId -> NestedField.optional(fieldId, nameMap.get(fieldId), typeMap.get(fieldId)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes in this file are correct and fix the first issue, where the type may be incorrect for void transform fields in v1 tables.

@ConeyLiu
Copy link
Contributor Author

Rebased, thanks for the reminder. @aokolnychyi

@RussellSpitzer RussellSpitzer merged commit 6441d6e into apache:master Mar 28, 2022
@RussellSpitzer
Copy link
Member

Thanks @ConeyLiu !

@RussellSpitzer
Copy link
Member

Also thanks to all of the reviewers who took their time to check on this PR, I know this was a tricky issue and I'm glad we had so many eyes on the issue.

@ConeyLiu
Copy link
Contributor Author

Thanks all!

@szehon-ho
Copy link
Member

szehon-ho commented Apr 15, 2022

Should this fix be included in 0.13.2? @rdblue @RussellSpitzer

@rdblue
Copy link
Contributor

rdblue commented Apr 15, 2022

@szehon-ho, that sounds fine to me.

@rdblue rdblue added this to the Iceberg 0.13.2 Release milestone Apr 15, 2022
@szehon-ho
Copy link
Member

szehon-ho commented Apr 15, 2022

👍 looks like it has popped up in a few places.

Thanks, @ConeyLiu let me know if you want to make a pr against 0.13.x branch and I can review, otherwise i will take a look. (not sure exactly if thats the process)

@ConeyLiu
Copy link
Contributor Author

Thanks @szehon-ho, will do it.

ConeyLiu added a commit to ConeyLiu/iceberg that referenced this pull request Apr 16, 2022
…r V1 format (apache#3411)

V1 Tables replace dropped partition transforms with a Void Transform which always returns null. The type of this Void transform always matches the column's original type. This has issues when the column's type differed from the transform used to previously partition the column. Here the issue is patched by retaining the correct type for the transform values when those values are being read after the transform has been dropped.
szehon-ho pushed a commit that referenced this pull request Apr 18, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
…r V1 format (apache#3411)

V1 Tables replace dropped partition transforms with a Void Transform which always returns null. The type of this Void transform always matches the column's original type. This has issues when the column's type differed from the transform used to previously partition the column. Here the issue is patched by retaining the correct type for the transform values when those values are being read after the transform has been dropped.

(cherry picked from commit 6441d6e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

select .files and rewrite data files failed after remove partition field

6 participants