Fix inserting into transactional table when task_writer_count > 1 #10261
homar wants to merge 1 commit into trinodb:master from
Conversation
a4526f4 to 5b9dcc7
The failure is not related: #8432
if this is a transaction
do you mean transactional table?
node.getPartitioningScheme().getPartitioning().getHandle().getTransactionHandle().isPresent()
This doesn't let us recognize what kind of table we're dealing with.
In fact, I'd expect this to be always true whenever we're dealing with connector-provided partitioning.
I am validating my understanding with the "Remove redundant null-friendliness" commit in #10293.
What about dropping this condition, and adding the "if arguments list is empty" logic directly to the code below?
do you mean transactional table?
This is exactly what I meant. Was that an incorrect assumption?
this doesn't let us recognize what kind of table we're dealing with.
Any idea how to recognize if we are dealing with a transactional table?
What about dropping this condition, and adding the "if arguments list is empty" logic directly to the code below?
Actually I wanted to avoid that; doing it this way would change the behaviour for all situations, and I want the change only for transactional tables. There is an explicit check:
checkArgument(distribution == SINGLE || !this.partitioningColumns.equals(Optional.of(ImmutableList.of())),
"Multiple streams must not be partitioned on empty set");
Modifying the logic you mentioned may cause this check not to fail in situations when it should. I just wanted to make it pass for transactional tables, as we know there will be one bucket and thus only one stream.
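The guard in question can be reproduced in isolation. The sketch below is hypothetical and simplified (the real check lives in Trino's stream property derivation code); it only illustrates why a 0-arg bucketing function, which yields an empty partitioning column list combined with a multiple-stream distribution, trips the checkArgument:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified reproduction of the checkArgument discussed above.
// SINGLE distribution may use an empty partitioning column list; MULTIPLE must not.
class StreamCheckSketch
{
    enum Distribution { SINGLE, MULTIPLE }

    // minimal stand-in for Guava's Preconditions.checkArgument
    static void checkArgument(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    static void validate(Distribution distribution, Optional<List<String>> partitioningColumns)
    {
        checkArgument(distribution == Distribution.SINGLE || !partitioningColumns.equals(Optional.of(List.of())),
                "Multiple streams must not be partitioned on empty set");
    }

    public static void main(String[] args)
    {
        // single stream with no partitioning columns: allowed
        validate(Distribution.SINGLE, Optional.of(List.of()));
        // multiple streams partitioned on an empty set -- what a 0-arg
        // bucketing function produces -- is rejected
        try {
            validate(Distribution.MULTIPLE, Optional.of(List.of()));
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```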
Here is how it works:
- Hive provides a bucketing function to be used when distributing writes. The function is 0-arg, because it's an artificial bucketing. The point is -- we want to have exactly one writer to the table.
- StreamPropertyDerivations chokes on the 0-arg bucketing function.

The fix can be:
- make StreamPropertyDerivations not choke on that (like you did)
- find some other way for a connector to make sure there is only one writer
- make Hive fool the engine -- declare a false argument to the bucketing function, pretending it's not 0-arg; that would be working around the engine's limitation. We shouldn't need to do that though
- anything else? -- @electrum might know better
I just wonder if this choking is accidental, or if it was made this way on purpose, in which case removing it might break some other cases.
The pattern Optional.of(expr).filter(condition) is clever, but IMO it doesn't make the code more readable.
Add a hint why we don't want that.
Counter -> Count
GE -> GreaterThan
This is the well-known 25. Just use it, even without declaring a constant.
Do we expect exactly one file to be created? Let's have an assertion on that.
Isn't nation too small? Would two writers still be used for a source table which only has 1 split (I assume nation has just one)?
Maybe use a UNION of a couple of NATION tables as a source.
This was just an example from the issue description (#9149). I removed that whole test.
When this fails, you don't know how many files there are.
// There should be only 1 file
assertThat(onTrino().executeQuery("SELECT count(DISTINCT \"$path\") FROM " + tableName))
        .containsOnly(row(1L));
a484216 to fefbcc0
@findepi please take another look
8637167 to fefbcc0
I'm limited, so I find .orElse(false) hard to follow.
I'd write
if (node.getPartitioningScheme().isPresent() &&
node.getPartitioningScheme().get().getPartitioning().getHandle().isSingleNode()) {
It's not about consistency between non-transactional bucketed and transactional, non-bucketed (implicitly bucketed) tables.
The naming rules for the two are different.
For transactional tables, the naming convention is bucket_<bucket-number> (e.g. bucket_00000). It doesn't contain any random or incrementing part, so we simply cannot create more than one file.
For bucketed, non-transactional tables -- actually I am not sure why we have this condition here. @dain would know.
My reading of the code leads to the following naming pattern for bucketed files
format("0%s_0_%s", paddedBucket, queryId.get())
-- if this is the right one (didn't test), then it's constant for bucket x query, so we cannot create more than 1 file either. (We could easily improve that by incrementing the _0_ part, but that's another story.)
The condition seems good, but the comment needs rewording.
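The two naming schemes discussed above can be sketched as follows. This is illustrative only: the 5-digit padding width and the example query id are assumptions for the sketch, not taken from Trino's actual writer code.

```java
// Illustrative sketch of the two file-naming conventions discussed above.
// The 5-digit padding width is an assumption made for this example.
class FileNameSketch
{
    // Transactional tables: bucket_<bucket-number> (e.g. bucket_00000).
    // No random or incrementing part, so at most one file per bucket.
    static String transactionalName(int bucket)
    {
        return String.format("bucket_%05d", bucket);
    }

    // Bucketed, non-transactional tables, per the reading above:
    // constant for a given (bucket, query) pair.
    static String bucketedName(int bucket, String queryId)
    {
        String paddedBucket = String.format("%05d", bucket);
        return String.format("0%s_0_%s", paddedBucket, queryId);
    }

    public static void main(String[] args)
    {
        System.out.println(transactionalName(0));
        System.out.println(bucketedName(0, "some_query_id"));
    }
}
```

Since neither name contains a random or incrementing component, writing a second file for the same bucket would collide with the first, which is why at most one file per bucket can exist.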
Redundant (...) parens in a sequence of ||.
I meant to be consistent with the behaviour: when a table is non-transactional and has 1 bucket, the writer count is ignored and only 1 file is created. I must have used the wrong wording.
move before handle.getBucketProperty(),
use assertEquals(numberOfCreatedFiles, 1, "There should be only 1 file created")
There is a test failure:
fefbcc0 to 135459d
Move !hiveTypes.isEmpty() to be next to !partitionColumns.isEmpty()
Do we need the test above? This one seems to cover the same stuff + DELETE.
135459d to e73eff1
I don't understand why && !bucketingColumnTypes.isEmpty() is added here.
In my understanding, because we started to rely more on HivePartitioningHandle and its isSingleNode method, and I wanted isUsePartitionedBucketing to return a value that is consistent with isSingleNode: if isSingleNode returns true, then isUsePartitionedBucketing should return false.
It seems that isUsePartitionedBucketing is now false for partitioned, unbucketed, non-transactional tables, while it used to be true.
Unbucketed means hiveBucketHandle.isEmpty(), so it looks like we should bail out of the method earlier and never get here.
Unbucketed means hiveBucketHandle.isEmpty(), so it looks like we should bail out of the method earlier and never get here.
Yes. I think we should hit if (hiveBucketHandle.isEmpty()) { earlier in the code, so I don't think this check here is needed.
I probably don't understand something, but I just tested, and a partitioned transactional table (so 1 implicit bucket) seems to work fine: different partitions are created and each of them has 1 bucket.
Writes will be correct, but with the change here only one node and one thread (in the entire cluster) will be writing data. The code you changed distributes writes between worker nodes, so we can avoid a single writer in the entire cluster.
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
I think we want to redistribute writes even for transactional, partitioned and bucketed (implicit 1 bucket).
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
Unbucketed, non-transactional should not reach this code because of the if (hiveBucketHandle.isEmpty()) { earlier in the code. But unfortunately I am afraid that @sopel39's comment about decreasing the number of writers is still a valid concern.
@sopel39 again, I probably miss something, but even with my changes, for a transactional, unbucketed (so 1 implicit bucket) and partitioned table, when I make an insert that creates 100 partitions, here https://github.com/trinodb/trino/blob/ee2ef32e6f09515a888a016adc1cc6ccd32cbae4/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSink.java#L354 I get 100 writers. Maybe you meant different writers; in that case, please point me to the particular part of the code.
ImmutableList<HiveType> bucketingColumnTypes = hiveBucketHandle.get().getColumns().stream()
-> List<HiveType> bucketingColumnTypes = hiveBucketHandle.get().getColumns().stream()
make final
and maybe move before private final int[] dataColumnInputIndex
move before this.hdfsEnvironment = ...
(This isn't ideal; ideally fields, constructor params, and assignments would follow the same ordering, but this is a mess here, so the ideal place doesn't exist.)
This isn't very specific. "Output partitioning: SINGLE" could appear, for example, in the top Output stage, or many times in the plan. Would it be possible to identify the actual stage we want to test for here? It's probably the source of the table writer, right?
cc @losipiuk
public void testDataIsNotBrokenInUnbucketedTransactionalTableWithTaskWriterCountGreaterThan1()
-> public void testUnbucketedTransactionalTableWithTaskWriterCountGreaterThan1()
e73eff1 to 133cbe4
Should we have a test with task_writer_count > 1 for unpartitioned (here) and also partitioned tables?
I think we should.
d106344 to 3a35e96
Could you add tests similar to
Did this also impact writes made via an UPDATE query?
3a35e96 to 3fc53b0
@sopel39 I added 2 tests to
@alexjo2144 I checked, and actually the debugger doesn't stop at any place I made a change to while performing an UPDATE.
98c5447 to 3fc53b0
The table name does not match the test.
I will go with assertEquals and assertTrue, because assertThat is imported from tempto.assertions and I don't want to work with integers.
Wrong error message.
Btw: can we merge the tests and have a boolean partitioned argument?
Sure, I can give this a try.
c934618 to 71e30f7
}

@Test
public void testInsertBucketedTransactionalTableLayout()
Because AbstractTestHive is also extended by other classes like TestHiveAlluxioMetastore.
because AbstractTestHive is also extended by other classes like TestHiveAlluxioMetastore
Yet io.trino.plugin.hive.AbstractTestHive#testInsertBucketedTableLayout and io.trino.plugin.hive.AbstractTestHive#testInsertPartitionedBucketedTableLayout are in AbstractTestHive.
public boolean isSingleNode()
{
    // empty hiveTypes means there is no bucketing
    return hiveTypes.isEmpty() && !usePartitionedBucketing;
}
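The predicate above can be exercised in isolation with a simplified stand-in (a hypothetical class, not the real HivePartitioningHandle):

```java
import java.util.List;

// Hypothetical, simplified stand-in for HivePartitioningHandle, just to
// exercise the isSingleNode predicate from the snippet above.
class PartitioningSketch
{
    private final List<String> hiveTypes; // bucketing column types
    private final boolean usePartitionedBucketing;

    PartitioningSketch(List<String> hiveTypes, boolean usePartitionedBucketing)
    {
        this.hiveTypes = hiveTypes;
        this.usePartitionedBucketing = usePartitionedBucketing;
    }

    boolean isSingleNode()
    {
        // empty hiveTypes means there is no bucketing
        return hiveTypes.isEmpty() && !usePartitionedBucketing;
    }

    public static void main(String[] args)
    {
        // transactional table with the artificial 0-arg bucketing: single node
        System.out.println(new PartitioningSketch(List.of(), false).isSingleNode());
        // ordinary bucketed table with one int bucketing column: distributed
        System.out.println(new PartitioningSketch(List.of("int"), false).isSingleNode());
    }
}
```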
Why does no bucketing mean no insert distribution? Because you want a single file?
{
    // Set table writer count
    context.setDriverInstanceCount(getTaskWriterCount(session));
    // being a single node means there is one node and one writer so
being a single node means there is one node and one writer so
A single node doesn't mean a single writer (there can be multiple writers per node).
Currently, single-node partitioning is used only by the system partitioning handle, and it's not for the insert path.
This code only deals with local distribution, but there is also io.trino.sql.planner.optimizations.AddLocalExchanges.Rewriter#visitTableWriter and possibly more; see the changes in b8e4e3f.
I would rather not change this code.
Could we just handle your case using a dedicated constant partitioning function which would direct all rows to a single writer?
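The suggested dedicated constant partitioning function could look roughly like the sketch below. This is a hypothetical shape: Trino's real SPI is io.trino.spi.connector.BucketFunction, which operates on a Page and a position; a simplified row-based interface is used here so the example is self-contained.

```java
// Hypothetical sketch of a constant partitioning function: every row is
// assigned to bucket 0, so a single writer receives all the data.
// (Trino's real SPI, io.trino.spi.connector.BucketFunction, works on pages;
// this simplified interface is only for illustration.)
class ConstantBucketSketch
{
    interface RowBucketFunction
    {
        int getBucket(Object row);
    }

    // directs all rows to the same bucket, regardless of content
    static final RowBucketFunction SINGLE_WRITER = row -> 0;

    public static void main(String[] args)
    {
        System.out.println(SINGLE_WRITER.getBucket("row-1"));
        System.out.println(SINGLE_WRITER.getBucket("row-2"));
    }
}
```

The appeal of this design is that the engine's existing distribution machinery stays untouched; the connector simply declares a partitioning in which every row maps to the same bucket.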
This seems to be superseded by: #10460
Fixes: #9149