Enable writer scaling by default #10614
Conversation
Force-pushed 6fa2dc1 to d068d3c
Force-pushed d068d3c to fd86491
Force-pushed 463d3e9 to 848ff39
arhimondr
left a comment
Here is my understanding of the most recent changes:
- supportsReportingWrittenBytes replaces a connector capabilities flag. We chose this approach to allow a connector to return different results for different tables.
- supportsReportingWrittenBytes has to accept TableMetadata, as a TableHandle is not available for tables that are about to be created (e.g. a CREATE TABLE AS SELECT statement).
- TableMetadata has to be included in WriterTarget, which must be JSON serializable. The TableMetadata object therefore has to be JSON serializable itself.
- TableMetadata includes a property map that is currently excluded from serialization. The property map is technically designed to store information such as the table format. If this information is lost during serialization, supportsReportingWrittenBytes may not function as intended. However, at this moment supportsReportingWrittenBytes is invoked before the serialization round trip.
Relying on the fact that TableMetadata is not serialized / deserialized before the supportsReportingWrittenBytes call is made seems a little fragile. To make it less fragile we have to make sure the table properties stored in the TableMetadata are JSON serializable.
@radek-starburst, @martint How strongly do we feel about being able to return different results for different tables? Is there a real precedent where an existing connector would want to return different results? I'm just wondering whether it is worth messing with the table properties serialization to provide this capability.
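A minimal sketch of the per-table capability check being discussed. The type names stub out the Trino SPI for illustration (they are not the actual SPI signatures), and the "format" property key is a hypothetical example of data a connector might read from the JSON-serializable property map to answer differently per table:

```java
import java.util.Map;

// Stub of the capability-check shape: the decision takes table-level
// information instead of a single connector-wide capabilities flag.
interface ConnectorMetadataSketch
{
    boolean supportsReportingWrittenBytes(Map<String, Object> tableProperties);
}

class ExampleConnectorMetadata implements ConnectorMetadataSketch
{
    // Hypothetical: a connector opts out for tables stored in a
    // "legacy" format recorded in the table property map
    @Override
    public boolean supportsReportingWrittenBytes(Map<String, Object> tableProperties)
    {
        return !"legacy".equals(tableProperties.get("format"));
    }
}
```

If the property map were lost in a serialization round trip, the "format" key above would read as null and the check would silently return true, which is the fragility the comment points out.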
core/trino-main/src/main/java/io/trino/metadata/TableMetadata.java
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
core/trino-spi/src/main/java/io/trino/spi/connector/ColumnMetadata.java
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorTableMetadata.java
Thanks @arhimondr for listing the issues with metadata / TableMetadata / serialization here. Writer scaling is somewhat similar to (and currently mutually exclusive with) the write layout.
@radek-starburst did you consider such an approach?
Firstly, thanks for your review :) Yes
Yes, exactly
Yes
No, it is not invoked before the serialization round trip (not in tests). This object is serializable but not completely serializable, which means that some fields will be null.
Yes, it is not so easy to make it completely serializable because of
For me, I cannot imagine a situation where we would have different results for different tables.
Force-pushed 848ff39 to d13e1b4
@findepi, I did not. I could try to implement it as you suggest, but first we had better wait for everyone to agree on a common approach.
Force-pushed d13e1b4 to dd68992
Force-pushed dd68992 to 2ef4539
TableMetadata is a container object that describes the shape of a table (columns, properties, etc). Using it to identify a table is not appropriate, especially given that a caller can make up any TableMetadata it wants.
Tables should be identified either by name (catalog/schema/name) or by handle (i.e., an opaque identifier).
@arhimondr @radek-starburst @martint
I happened to discuss this with @radek-starburst today:
- for existing tables (INSERT) the ConnectorTableHandle should be presented to the ConnectorMetadata to drive the decision. ConnectorTableHandle is the best info we can provide, so let's use it,
- for new tables (CTAS) the ConnectorTableMetadata should be presented to the connector. It contains the best information possible. Showing only the properties is not sufficient -- it allows the connector to inspect e.g. partitioning, but doesn't allow it to inspect types,
- for UPDATE and DELETE nothing is needed, since writer scaling is not applicable for these operations -- the operations happen within a source split,
- for TABLE EXECUTE -- ideally we should provide ConnectorTableHandle + ConnectorTableExecuteHandle, since the connector's decision may depend on the actual operation being executed. However, since we don't have such a use case today and it is easy to add such an API in a backwards-compatible way in the future, let's provide ConnectorTableHandle only, as in the INSERT case.
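The proposed split above can be sketched as two overloads on the connector metadata interface. This is a hedged, self-contained sketch with stubbed types, not the actual Trino SPI declaration:

```java
// Opaque identifier for an existing table (stub)
interface ConnectorTableHandle { }

// Description of a table that does not exist yet (stub)
interface ConnectorTableMetadata { }

interface ConnectorMetadata
{
    // INSERT (and, for now, TABLE EXECUTE): existing table, identified
    // by its handle -- the best information available
    default boolean supportsReportingWrittenBytes(ConnectorTableHandle tableHandle)
    {
        return false; // conservative default: writer scaling stays off
    }

    // CTAS: no handle exists yet, so pass the full table metadata
    // (properties alone would hide e.g. column types from the connector)
    default boolean supportsReportingWrittenBytes(ConnectorTableMetadata tableMetadata)
    {
        return false;
    }
}
```

Defaulting both overloads to false keeps the change backwards-compatible: connectors that do not override them simply never get scaled writers.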
Force-pushed 2b925fd to 923eb04
arhimondr
left a comment
LGTM % nits
We usually try to avoid (but do not always manage to) having commits like Implement class A. Generally the goal should be for every commit to be self-sufficient and represent a meaningful, self-contained change.
I would recommend squashing the commits into a single one with a simple title: Enable scaled_writers by default for supported connectors
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/AddExchanges.java
core/trino-main/src/main/java/io/trino/sql/planner/plan/TableWriterNode.java
core/trino-main/src/main/java/io/trino/sql/planner/plan/TableWriterNode.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Force-pushed 209079b to 732a85d
Do you mean to squash all commits into one?
Force-pushed 732a85d to 8f5bd98
Force-pushed 1e0ae04 to f355e69
arhimondr
left a comment
Do you mean to squash all commits into one?
Yes, they are all part of the same logical change
core/trino-main/src/main/java/io/trino/sql/planner/plan/TableWriterNode.java
core/trino-main/src/main/java/io/trino/sql/planner/plan/TableWriterNode.java
core/trino-main/src/main/java/io/trino/sql/planner/plan/TableWriterNode.java
Force-pushed b43d561 to 87a6587
Force-pushed 87a6587 to 8032524
This PR enables scaled writers by default, adds a supportsReportingWrittenBytes method to Metadata, and implements the supportsReportingWrittenBytes method for the Iceberg, Delta Lake and Hive connectors. Additionally, a validator, ValidateScaledWritersUsage, was added to verify whether the SCALED_WRITER_DISTRIBUTION partitioning scheme should actually be used. Tests for this feature were added and some other tests fixed; a test for the validator was added too.
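The validator's core check can be sketched as follows. This is a minimal, self-contained illustration of the invariant ValidateScaledWritersUsage enforces; the Partitioning enum and WriterTarget record here are simplified stand-ins for the actual planner types:

```java
// Simplified stand-ins for the planner's partitioning handle and write target
enum Partitioning { SINGLE_DISTRIBUTION, SCALED_WRITER_DISTRIBUTION }

record WriterTarget(boolean supportsReportingWrittenBytes) { }

final class ValidateScaledWritersUsage
{
    // Scaled writers rebalance based on bytes written, so the plan is
    // invalid if the target cannot report written bytes
    void validate(Partitioning partitioning, WriterTarget target)
    {
        if (partitioning == Partitioning.SCALED_WRITER_DISTRIBUTION
                && !target.supportsReportingWrittenBytes()) {
            throw new IllegalStateException(
                    "Scaled writers require a write target that reports written bytes");
        }
    }
}
```

Running such a check as a plan sanity validator catches misuse of SCALED_WRITER_DISTRIBUTION at planning time rather than failing (or silently misbehaving) at execution time.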
Force-pushed 8032524 to 2e96713
I think we need a release notes entry for this. Any suggestions?
First cut of docs update .. #12261
.setSystemProperty(TASK_WRITER_COUNT, "2")
.build();
getQueryRunner().execute(session, format("CREATE TABLE IF NOT EXISTS %s AS SELECT * FROM %s", "linetime_multiple_file_backed", "tpch.tiny.lineitem")).getMaterializedRows();
getQueryRunner().execute(session, format("CREATE TABLE IF NOT EXISTS %s AS SELECT * FROM %s", "orders_multiple_file_backed", "tpch.tiny.orders")).getMaterializedRows();

// We need to prepare tables for this test. The test is required to use tables that are backed by at least two files
Session session = Session.builder(getSession())
    .setSystemProperty(TASK_WRITER_COUNT, "2")
I don't think writer count >= 2 guarantees the number of files >= 2.
It still may be possible that all data ends up on a single machine and a single thread (just not very likely).
Explicit partitioning, or a small target file size, can guarantee 2+ files.
Also, if the test relies on the number of files, it should explicitly validate the state of the table, e.g. via SELECT count(DISTINCT "$path").
cc @alexjo2144
I did not know that. I was thinking that it depends on the number of writers. I am going to analyze it more deeply.
- supportsReportingWrittenBytes method added to the Metadata interface
- supportsReportingWrittenBytes method implemented for the Iceberg and Hive connectors