
Improve performance of reading iceberg table with many equality delete files#17115

Merged
findepi merged 1 commit into trinodb:master from Heltman:iceberg-deletefile-optimize
Sep 26, 2023

Conversation

@Heltman
Contributor

@Heltman Heltman commented Apr 19, 2023

Description

Fixes #17114

Additional context and related issues

If a split has many delete files, chaining predicates with RowPredicate.and will create a deep call stack. This PR compacts all StructLikeSets into a collection to reduce the stack depth.

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

The main change is to group delete files by their equality field IDs, so that a single StructLikeSet collects the deleted rows for each group.
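The grouping described above can be sketched as follows. This is a minimal illustration using plain Java collections and hypothetical simplified types (a delete row is modeled as a list of values), not the actual Trino/Iceberg StructLikeSet API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the core idea: delete rows from all files that share the same
// equality field IDs are collected into one shared set, instead of one
// set per delete file.
public class DeleteGrouping
{
    // Each entry models one delete file: its equality field IDs and its rows
    public static Map<List<Integer>, Set<List<Object>>> groupByFieldIds(
            List<Map.Entry<List<Integer>, List<List<Object>>>> deleteFiles)
    {
        Map<List<Integer>, Set<List<Object>>> grouped = new HashMap<>();
        for (Map.Entry<List<Integer>, List<List<Object>>> file : deleteFiles) {
            // Files with identical field IDs feed the same set, so a key
            // deleted in several files is stored and checked only once
            grouped.computeIfAbsent(file.getKey(), ids -> new HashSet<>())
                    .addAll(file.getValue());
        }
        return grouped;
    }
}
```

With this shape, a row scanned from a data file needs one set lookup per distinct field-ID group, rather than one per delete file.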

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Improve performance of reading iceberg table with many equality delete files. ({issue}`17114`)

@cla-bot cla-bot bot added the cla-signed label Apr 19, 2023
@Heltman Heltman requested a review from electrum April 19, 2023 06:26
@github-actions github-actions bot added the iceberg Iceberg connector label Apr 19, 2023
Member

@alexjo2144 alexjo2144 left a comment


The core change here, reducing the number of StructLikeSets created seems like a good idea to me. Just a couple code simplification questions.

Just for reference here's a thread on this from the original PR: #13219 (comment)
cc: @findepi

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from cd0d4d8 to 4f8e9ca Compare April 20, 2023 04:59
@xiacongling
Contributor

Great job on a quick fix of the issue, @Heltman @alexjo2144! It may significantly improve Trino's performance on Iceberg delete file filtering.

IMHO, the patch made by @alexjo2144 makes fewer code changes and seems easier to understand, but some problems may not be completely resolved:

  1. EqualityDeleteSets of delete files can be grouped by their schemas. Since the equality field IDs come from the table schema, for two delete files with the same fields in different order, the projection structs will be identical. Using a set is preferred, and it is what Iceberg does in its org.apache.iceberg.data.DeleteFilter.
  2. Chaining RowPredicates with RowPredicate.and leads to recursive method calls, which hurts performance and risks stack overflow when too many delete files are present. Since delete file grouping can significantly reduce the number of delete filters, this improvement may not seem as important. However, the delete filter is applied to every row scanned from data files, so even the smallest enhancement may improve query efficiency. Would you consider adding this change as well, @alexjo2144? Are there any stats that could be provided to prove the performance improvement, @Heltman?
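Point 2 can be illustrated with plain java.util.function.Predicate standing in for Trino's RowPredicate (a hypothetical simplified sketch; rows are modeled as long arrays, and the names below are not from the PR):

```java
import java.util.List;
import java.util.function.Predicate;

// Why chaining with and() is risky: each and() wraps the previous predicate,
// so testing a single row recurses once per delete filter, while iterating a
// flat list of predicates keeps the stack depth constant.
public class PredicateChaining
{
    public static Predicate<long[]> chained(List<Predicate<long[]>> filters)
    {
        Predicate<long[]> result = row -> true;
        for (Predicate<long[]> filter : filters) {
            // One extra stack frame per filter at test() time
            result = result.and(filter);
        }
        return result;
    }

    public static boolean flat(List<Predicate<long[]>> filters, long[] row)
    {
        // Equivalent result, constant stack depth regardless of filter count
        for (Predicate<long[]> filter : filters) {
            if (!filter.test(row)) {
                return false;
            }
        }
        return true;
    }
}
```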

@alexjo2144
Member

Looks like using a Set is safe there in the Iceberg Spark implementation because they do an additional projection step to ensure that the readers return the Schema stored in the Set there, even if it doesn't match the file schema. We don't have that additional projection here yet, so it will cause problems in Trino.

It needs some clean-up but here's a test case illustrating the problem: https://gist.github.com/alexjo2144/8a80ff5146ab3c82fa0c5fc5b4f33e66

So we either need to add the additional projection, or use an ordered Collection like a List.

@Heltman
Contributor Author

Heltman commented Apr 25, 2023

@alexjo2144 Schema mismatch is indeed a problem, but fortunately the additional projection step implemented by Spark is also easy to implement in Trino. We only need to follow Spark's implementation and arrange the fields in order when reading the delete files. Trino's ParquetReader already has the additional projection (same as Spark). Please check the new commit; I fixed this problem and added your test case.
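The projection step being discussed can be sketched like this (plain collections and hypothetical simplified types, not Trino's actual ParquetReader projection):

```java
import java.util.List;
import java.util.stream.Collectors;

// Rows read from a delete file whose columns appear in file order are
// reordered to match the canonical equality field ID order from table
// metadata, so rows from differently-ordered delete files land in the
// same set and compare equal.
public class DeleteProjection
{
    public static List<Object> project(
            List<Integer> canonicalFieldIds, // order from table metadata
            List<Integer> fileFieldIds,      // order in this delete file
            List<Object> fileRow)
    {
        return canonicalFieldIds.stream()
                .map(id -> fileRow.get(fileFieldIds.indexOf(id)))
                .collect(Collectors.toList());
    }
}
```

Without this reordering, two delete files listing the same fields in different order would produce structurally different rows, which is exactly the problem the gist above demonstrates.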

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from c571e0a to fb9c955 Compare June 8, 2023 08:05
@findinpath
Contributor

findinpath commented Jun 8, 2023

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

Can you please provide a test case in your PR that showcases this situation?

I tried with the code from master and see only one StructLikeSet with 7 elements (same as in your submission).

@Test
public void testMultipleDeletes()
        throws Exception
{
    String tableName = "test_equality_deletes_different_schemas_" + randomNameSuffix();
    assertUpdate("CREATE TABLE " + tableName + " AS SELECT * FROM tpch.tiny.nation", 25);
    Table icebergTable = updateTableToV2(tableName);
    Assertions.assertThat(icebergTable.currentSnapshot().summary().get("total-equality-deletes")).isEqualTo("0");
    Path metadataDir = new Path(metastoreDir.toURI());
    TrinoFileSystem fs = HDFS_FILE_SYSTEM_FACTORY.create(SESSION);

    String deleteFile1 = "delete_file_" + UUID.randomUUID();
    List<String> firstDeleteFileColumns = ImmutableList.of("regionkey");
    Schema deleteRowSchema = icebergTable.schema().select(firstDeleteFileColumns);
    List<Integer> equalityFieldIds = firstDeleteFileColumns.stream()
            .map(name -> deleteRowSchema.findField(name).fieldId())
            .collect(toImmutableList());
    Parquet.DeleteWriteBuilder writerBuilder = Parquet.writeDeletes(new ForwardingFileIo(fs).newOutputFile(new Path(metadataDir, deleteFile1).toString()))
            .forTable(icebergTable)
            .rowSchema(deleteRowSchema)
            .createWriterFunc(GenericParquetWriter::buildWriter)
            .equalityFieldIds(equalityFieldIds)
            .overwrite();
    EqualityDeleteWriter<Record> writer = writerBuilder.buildEqualityWriter();

    Record dataDelete = GenericRecord.create(deleteRowSchema);
    try (Closeable ignored = writer) {
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 1L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 2L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 3L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 4L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 5L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 6L)));
        writer.write(dataDelete.copy(ImmutableMap.of("regionkey", 7L)));
    }
    icebergTable.newRowDelta().addDeletes(writer.toDeleteFile()).commit();

    assertQuery("SELECT * FROM " + tableName, "SELECT * FROM nation WHERE regionkey > 7");
    assertUpdate("DROP TABLE " + tableName);
}

@Heltman
Contributor Author

Heltman commented Jun 8, 2023

Looks like using a Set is safe there in the Iceberg Spark implementation because they do an additional projection step to ensure that the readers return the Schema stored in the Set there, even if it doesn't match the file schema. We don't have that additional projection here yet, so it will cause problems in Trino.

It needs some clean-up but here's a test case illustrating the problem: https://gist.github.com/alexjo2144/8a80ff5146ab3c82fa0c5fc5b4f33e66

So we either need to add the additional projection, or use an ordered Collection like a List.

@findinpath, please check below.

@Heltman
Contributor Author

Heltman commented Jun 8, 2023

The stack depth is only a hidden danger. The real problem is that the multiple StructLikeSets of a split are not merged according to their equality field IDs, so too many StructLikeSets are generated, which makes filtering very inefficient.

Can you pls provide a test case in your PR which can be used to showcase this situation?

It is difficult to simulate this performance problem, because it requires many delete files where the same columns have been updated many times.

Imagine two delete files, each with 10,000 rows, 9,000 of which are the same. Originally each scanned row would have to be checked against 20,000 entries, but after merging we only need to match against 11,000.
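A toy sketch of that arithmetic, modeling the deleted keys as plain Long values rather than Iceberg structs:

```java
import java.util.HashSet;
import java.util.Set;

// Two delete files with heavy overlap collapse into one much smaller merged
// set, so each scanned row is checked against far fewer entries.
public class MergedDeleteSize
{
    public static int mergedSize(Set<Long> fileA, Set<Long> fileB)
    {
        Set<Long> merged = new HashSet<>(fileA);
        merged.addAll(fileB); // keys deleted in both files are stored once
        return merged.size();
    }
}
```

For two 10,000-entry files sharing 9,000 keys, the merged set holds 11,000 entries instead of the 20,000 checked when each file keeps its own set.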

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from fb9c955 to 66fe0b4 Compare June 9, 2023 03:27
@Heltman
Contributor Author

Heltman commented Jun 9, 2023

@findinpath, I added testMultipleEqualityDeletes; the delete files are compacted:
[screenshot]

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 66fe0b4 to 852299d Compare June 9, 2023 04:00
Contributor

@findinpath findinpath left a comment


Overall there are some cosmetic improvements needed, and testing coverage for nested deletes is missing.

Great work! 👍

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from 552b092 to 58a765d Compare August 7, 2023 09:49
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from d9193b1 to 947faab Compare August 31, 2023 12:19
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from c1a4fa9 to 7a23964 Compare August 31, 2023 15:03
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 7a23964 to 541cbb4 Compare September 4, 2023 14:28
@alexjo2144
Member

Pending the last couple comments from Marius, this looks good to me.
It does need a rebase though.

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 541cbb4 to 3e0278f Compare September 18, 2023 03:22
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch 2 times, most recently from ff47acc to 6ed8f78 Compare September 18, 2023 06:34
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 6ed8f78 to 69ca01b Compare September 21, 2023 07:19
Member


The call site looks like new EqualityDeleteSet(deleteSchema, schemaFromHandles(readColumns)).
Are the constructor arg names right? Is the call site right?

Contributor Author


I suggest public EqualityDeleteSet(Schema deleteSchema, Schema fileSchema), because the first one is the schema from Iceberg metadata, and the second is the schema from the file being read (Parquet, ORC, etc.).

Contributor


@Heltman yes, public EqualityDeleteSet(Schema deleteSchema, Schema fileSchema) should be fine. Let's go forward with this suggestion.

Contributor Author


Finally we reached a consensus: public EqualityDeleteSet(Schema deleteSchema, Schema dataSchema) is a good idea. deleteSchema comes from Iceberg metadata, dataSchema comes from the equality delete file.

@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 69ca01b to 62276ce Compare September 26, 2023 11:58
@findinpath findinpath requested a review from findepi September 26, 2023 12:06
@Heltman Heltman force-pushed the iceberg-deletefile-optimize branch from 62276ce to a3d08f2 Compare September 26, 2023 12:11
@findepi findepi merged commit 04ac9ba into trinodb:master Sep 26, 2023
@findepi
Member

findepi commented Sep 26, 2023

Merged, thanks!

@findepi findepi changed the title Improve performance of reading iceberg table with many delete files Improve performance of reading iceberg table with many equality delete files Sep 26, 2023
@github-actions github-actions bot added this to the 427 milestone Sep 26, 2023

Labels

cla-signed iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

Read Iceberg v2 table with many delete file is very slowly

6 participants