Reduce memory utilization on the Driver during the commit phase by arhimondr · Pull Request #16120 · prestodb/presto

arhimondr · 2021-05-19T18:00:00Z

In Presto each writer produces a PartitionUpdate object that contains file names and other meta information for the files being written. This information is then collected on the driver to perform a final commit (do file renames). In some cases this meta information could be quite large. This patch tries to optimize several things:

Reduce PartitionUpdate memory footprint on the Driver by serializing to SMILE instead of JSON and applying ZSTD compression
Release serialized and compressed pages as soon as they are read by the engine on the driver. This should help avoid double memory utilization

== RELEASE NOTES ==

Presto on Spark Changes
* Reduce commit memory footprint on the Driver

arhimondr · 2021-05-19T18:01:26Z

This is all in addition to #16036, which should significantly reduce memory utilization on the driver, as the statistics pages no longer have to be buffered in the TableFinishOperator for the Presto on Spark use case (Thanks @viczhang861 for optimizing it!)

viczhang861

Try to improve the commit title "Release inmemory input pages incrementally"; inmemory is only used for PrestoSparkTaskInputs

presto-hive/src/main/java/com/facebook/presto/hive/CreateEmptyPartitionProcedure.java

presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java

presto-hive/src/main/java/com/facebook/presto/hive/HivePageSink.java

presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java

presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkQueryExecutionFactory.java

viczhang861 · 2021-05-20T00:37:02Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveClientConfig.java

extra "the" after "compress"

To decrease memory pressure pages from the inmemory input can be released as soon as they are read by the Spark source operator

tdcmeehan · 2021-05-20T15:02:23Z

presto-hive/src/main/java/com/facebook/presto/hive/HiveUtil.java

+        try (ByteArrayOutputStream output = new ByteArrayOutputStream();
+                ZstdOutputStreamNoFinalizer zstdOutput = new ZstdOutputStreamNoFinalizer(output)) {
+            codec.writeBytes(zstdOutput, instance);
+            zstdOutput.close();
+            output.close();
+            return output.toByteArray();
+        }
+        catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+    }


Suggested change

-        try (ByteArrayOutputStream output = new ByteArrayOutputStream();
-                ZstdOutputStreamNoFinalizer zstdOutput = new ZstdOutputStreamNoFinalizer(output)) {
-            codec.writeBytes(zstdOutput, instance);
-            zstdOutput.close();
-            output.close();
-            return output.toByteArray();
-        }
-        catch (IOException e) {
-            throw new UncheckedIOException(e);
-        }
-    }
+        try (ByteArrayOutputStream output = new ByteArrayOutputStream()) {
+            try (ZstdOutputStreamNoFinalizer zstdOutput = new ZstdOutputStreamNoFinalizer(output)) {
+                codec.writeBytes(zstdOutput, instance);
+            }
+            return output.toByteArray();
+        }
+        catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+    }

In theory it should be the same. Java guarantees that resources declared in a try-with-resources statement are closed in the reverse order of their declaration.

import java.io.Closeable;

public class Main {
    private static class Closeable1 implements Closeable {
        @Override
        public void close() {
            System.out.println("Close Closeable1");
        }
    }

    private static class Closeable2 implements Closeable {
        @Override
        public void close() {
            System.out.println("Close Closeable2");
        }
    }

    public static void main(String[] args) {
        // Resources are closed after the body, last declared first
        try (Closeable1 closeable1 = new Closeable1();
                Closeable2 closeable2 = new Closeable2()) {
            System.out.println("Body");
        }
    }
}

Prints

Body
Close Closeable2
Close Closeable1

The comment was more around not calling close() explicitly, since the try-with-resources does that.

Oh, sorry. I misunderstood. Let me create a patch

Actually I'm not sure if it is correct to call return output.toByteArray() before the ByteArrayOutputStream is closed? The implementation allows it, but I wonder if that's what is expected?
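On the question above: the `ByteArrayOutputStream` Javadoc states that closing the stream has no effect and that its methods can still be called after it has been closed, so reading the bytes before or after `close()` gives the same result. What does matter is that the compressing stream is closed first, so that its final bytes are flushed into the buffer. The following is a minimal sketch of that ordering, not the code from this PR: `GZIPOutputStream` from the JDK stands in for `ZstdOutputStreamNoFinalizer`, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; GZIP is a stand-in for ZSTD so that the sketch needs only the JDK.
final class CompressBeforeRead
{
    private CompressBeforeRead() {}

    static byte[] compress(byte[] input)
    {
        try (ByteArrayOutputStream output = new ByteArrayOutputStream()) {
            try (GZIPOutputStream compressed = new GZIPOutputStream(output)) {
                compressed.write(input);
            }
            // The inner try has already closed the compressor, so its trailer is in the buffer.
            // The outer stream is still open here, which is fine for ByteArrayOutputStream.
            return output.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] decompress(byte[] input)
    {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(input));
                ByteArrayOutputStream output = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                output.write(buffer, 0, read);
            }
            return output.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        byte[] original = "partition update metadata".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = decompress(compress(original));
        System.out.println(new String(roundTripped, StandardCharsets.UTF_8));
    }
}
```

The nested `try` keeps the "close the compressor, then read the buffer" order explicit without any manual `close()` calls.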

tdcmeehan · 2021-05-20T15:10:48Z

presto-spark-base/src/main/java/com/facebook/presto/spark/util/PrestoSparkUtils.java

        return scala.reflect.ClassTag$.MODULE$.apply(clazz);
    }
+
+    public static <T> Iterator<T> getNullifyingIterator(List<T> list)


I guess you can't just call remove() on the iterator because that's a code change in Spark?

In theory, remove() on the iterator of an ArrayList is an O(N) operation (because it has to shift the "tail"). In practice I don't think it is going to be an issue; this is just to be on the safe side, in case a query produces a list with enough pages for that complexity to become a problem.
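The body of `getNullifyingIterator` is not shown in this thread, so the following is only a sketch of how such an iterator could work, under the assumption that the backing list supports `set` and permits null elements (as `ArrayList` does). Each call to `next()` overwrites the slot with null, which drops the list's reference in constant time instead of shifting the tail the way `remove()` would. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the implementation from the PR.
final class NullifyingIteratorSketch
{
    private NullifyingIteratorSketch() {}

    static <T> Iterator<T> getNullifyingIterator(List<T> list)
    {
        return new Iterator<T>()
        {
            private int index;

            @Override
            public boolean hasNext()
            {
                return index < list.size();
            }

            @Override
            public T next()
            {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // set() returns the previous element and replaces it with null,
                // so the list no longer keeps the element reachable
                return list.set(index++, null);
            }
        };
    }

    public static void main(String[] args)
    {
        List<String> pages = new ArrayList<>(Arrays.asList("page-0", "page-1", "page-2"));
        Iterator<String> iterator = getNullifyingIterator(pages);
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
        // The list keeps its size, but every slot is now null
        System.out.println(pages);
    }
}
```

The list keeps its length, so the cost per element stays O(1); the trade-off is that the caller must not rely on the list contents after iteration.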

arhimondr requested review from aweisberg, pgupta2, tdcmeehan and viczhang861 May 19, 2021 18:01

viczhang861 reviewed May 19, 2021

pgupta2 reviewed May 19, 2021

pgupta2 approved these changes May 19, 2021

arhimondr force-pushed the optimize-partition-update branch from a089538 to 07a07d0 Compare May 19, 2021 22:44

viczhang861 approved these changes May 20, 2021

arhimondr force-pushed the optimize-partition-update branch 2 times, most recently from d3c8757 to 5af3c26 Compare May 20, 2021 13:39

arhimondr added 3 commits May 20, 2021 10:13

Support compression for PartitionUpdate in Hive connector

9f09379

Release inmemory input pages incrementally

9976238

To decrease memory pressure pages from the inmemory input can be released as soon as they are read by the Spark source operator

Log sizes for pages received on the Driver

faa30da

arhimondr force-pushed the optimize-partition-update branch from 5af3c26 to faa30da Compare May 20, 2021 14:13

tdcmeehan approved these changes May 20, 2021

arhimondr merged commit 4bd5dc5 into prestodb:master May 20, 2021

arhimondr deleted the optimize-partition-update branch May 20, 2021 15:53

sujay-jain mentioned this pull request May 21, 2021

Add release notes for 0.254 #16141

Merged

10 tasks

ajaygeorge mentioned this pull request May 26, 2021

[TEST] Add release notes for 0.254 #16165

Closed

16 tasks
