Skip to content

Reduce writers retained memory utilization#14823

Merged
sopel39 merged 7 commits intotrinodb:masterfrom
starburstdata:ks/reduce_insert_mem
Nov 2, 2022
Merged

Reduce writers retained memory utilization#14823
sopel39 merged 7 commits intotrinodb:masterfrom
starburstdata:ks/reduce_insert_mem

Conversation

@sopel39
Copy link
Copy Markdown
Member

@sopel39 sopel39 commented Oct 28, 2022

Whenever writer is closed, it's instance was
retained so that rollback could be performed
in case of an error. However, this was retaining
excessive amount of memory which was not needed
anymore.

This commit introduces RollbackAction that can
be retained after driver is closed which should
significantly reduce memory usage during long
inserts.

Description

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 28, 2022
@sopel39 sopel39 requested review from dain and removed request for electrum October 28, 2022 16:18
@sopel39 sopel39 force-pushed the ks/reduce_insert_mem branch 3 times, most recently from dd5b9ec to 271064f Compare October 28, 2022 16:24
Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which writers are holding onto memory after they are closed? Can we fix that directly instead of making the code more complicated with this change?

@sopel39
Copy link
Copy Markdown
Member Author

sopel39 commented Oct 28, 2022

Which writers are holding onto memory after they are closed? Can we fix that directly instead of making the code more complicated with this change?

it's OrcWriter and ParquetWriter. I decided that it's too cumbersome and not really elegant to make multiple fields inside these writers @Nullable and mutable. IMO it's cleaner and simpler just to drop writer when it's not needed.

@electrum
Copy link
Copy Markdown
Member

Alternative approach: #14824

@sopel39 sopel39 force-pushed the ks/reduce_insert_mem branch from 271064f to d7e9d07 Compare October 31, 2022 11:47
Copy link
Copy Markdown
Member Author

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ac

@sopel39 sopel39 force-pushed the ks/reduce_insert_mem branch 4 times, most recently from fc5bfe7 to c7bd53a Compare October 31, 2022 17:28
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me making the rollback action a Closeable looks weird, there is no resource that we are releasing.
You could use Runnable instead if the problem is that you want to avoid extra return null.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use Runnable instead if the problem is that you want to avoid extra return null.

I actually use it in try-with-resource, e.g:

        try (rollbackAction) {
            avroWriter.close();
        }
        catch (Exception e) {
            throw new TrinoException(ICEBERG_WRITER_CLOSE_ERROR, "Error rolling back write to Avro file", e);
        }

so I don't have to deal with syntax sugar myself. I think using native Java try-with-resource is actually better than relying on Guava closer.

Instance size of writers was not accounted for by
Delta, Hive and Iceberg writers. Once it becomes
accounted for, then it seems that estimated
memory for Orc and Parquet is excessive.
Whenever writer is closed, it's instance was
retained so that rollback could be performed
in case of an error. However, this was retaining
excessive amount of memory which was not needed
anymore.

This commit makes io.trino.plugin.hive.FileWriter#commit
return rollback action reference so that
page sinks don't have to keep reference to
writers themselves.
@sopel39 sopel39 force-pushed the ks/reduce_insert_mem branch from c7bd53a to bde343f Compare November 2, 2022 13:35
@sopel39 sopel39 merged commit 150f7cc into trinodb:master Nov 2, 2022
@sopel39 sopel39 mentioned this pull request Nov 2, 2022
@sopel39 sopel39 deleted the ks/reduce_insert_mem branch November 2, 2022 14:18
@github-actions github-actions bot added this to the 402 milestone Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants