Fix inserting into transactional table when task_writer_count > 1 #10261
homar wants to merge 1 commit into trinodb:master from
Conversation
a4526f4 to 5b9dcc7
The failure is not related: #8432
if this is a transaction
do you mean transactional table?
node.getPartitioningScheme().getPartitioning().getHandle().getTransactionHandle().isPresent()
This doesn't let us recognize what kind of table we're dealing with.
In fact, I'd expect this to be always true whenever we're dealing with connector-provided partitioning.
I am validating my understanding with the "Remove redundant null-friendliness" commit in #10293.
What about dropping this condition, and adding the "if arguments list is empty" logic directly to the code below?
do you mean transactional table?
This is exactly what I meant. Was that an incorrect assumption?
this doesn't let us recognize what kind of table we're dealing with.
Any idea how to recognize if we are dealing with a transactional table?
What about dropping this condition, and adding the "if arguments list is empty" logic directly to the code below?
Actually I wanted to avoid that; doing it this way would change the behaviour for all situations, and I want the change only for transactional tables. There is an explicit check:
checkArgument(distribution == SINGLE || !this.partitioningColumns.equals(Optional.of(ImmutableList.of())),
"Multiple streams must not be partitioned on empty set");
Modifying the logic you mentioned may cause this check not to fail in situations when it should. I just wanted to make it pass for transactional tables, as we know there will be one bucket and thus only one stream.
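The guard in question can be reproduced in isolation. The sketch below is hypothetical and simplified (the real check lives in Trino's stream property derivation code); it only illustrates why a 0-arg bucketing function, which yields an empty partitioning column list combined with a multiple-stream distribution, trips the checkArgument:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified reproduction of the checkArgument discussed above.
// SINGLE distribution may use an empty partitioning column list; MULTIPLE must not.
class StreamCheckSketch
{
    enum Distribution { SINGLE, MULTIPLE }

    // minimal stand-in for Guava's Preconditions.checkArgument
    static void checkArgument(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    static void validate(Distribution distribution, Optional<List<String>> partitioningColumns)
    {
        checkArgument(distribution == Distribution.SINGLE || !partitioningColumns.equals(Optional.of(List.of())),
                "Multiple streams must not be partitioned on empty set");
    }

    public static void main(String[] args)
    {
        // single stream with no partitioning columns: allowed
        validate(Distribution.SINGLE, Optional.of(List.of()));
        // multiple streams partitioned on an empty set -- what a 0-arg
        // bucketing function produces -- is rejected
        try {
            validate(Distribution.MULTIPLE, Optional.of(List.of()));
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```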
Here is how it works:
- Hive provides a bucketing function to be used when distributing writes. The function is 0-arg, because it's an artificial bucketing. The point is -- we want to have exactly one writer to the table.
- StreamPropertyDerivations chokes on the 0-arg bucketing function.

The fix can be:
- make StreamPropertyDerivations not choke on that (like you did)
- find some other way for a connector to make sure there is only one writer
- make Hive fool the engine -- declare a false argument to the bucketing function, pretending it's not 0-arg; that would be working around the engine's limitation. We shouldn't need to do that though
- anything else? -- @electrum might know better
I just wonder if this choking is accidental, or if it was made this way on purpose, in which case removing it might break some other cases.
The pattern Optional.of(expr).filter(condition) is clever, but IMO it doesn't make the code more readable.
Add a hint why we don't want that.
Counter -> Count
GE -> GreaterThan
This is the well-known 25. Just use it, even without declaring a constant.
Do we expect exactly one file to be created? Let's have an assertion on that.
Isn't nation too small? Would two writers still be used for a source table which only has 1 split (I assume nation has just one)?
Maybe use a UNION of a couple of NATION tables as a source.
This was just an example from the issue description (#9149). I removed that whole test.
When this fails, you don't know how many files there are.
// There should be only 1 file
assertThat(onTrino().executeQuery("SELECT count(DISTINCT \"$path\") FROM " + tableName))
        .containsOnly(row(1L));
a484216 to fefbcc0
@findepi please take another look
8637167 to fefbcc0
I'm limited, so I find .orElse(false) hard to follow.
I'd write
if (node.getPartitioningScheme().isPresent() &&
node.getPartitioningScheme().get().getPartitioning().getHandle().isSingleNode()) {
It's not about consistency between non-transactional bucketed and transactional, non-bucketed (implicitly bucketed) tables.
The naming rules for the two are different.
For transactional tables, the naming convention is bucket_<bucket-number> (e.g. bucket_00000). It doesn't contain any random or incrementing part, so we simply cannot create more than one file.
For bucketed, non-transactional tables -- actually I am not sure why we have this condition here. @dain would know.
My reading of the code leads to the following naming pattern for bucketed files
format("0%s_0_%s", paddedBucket, queryId.get())
-- if this is the right one (didn't test), then it's constant for bucket x query, so we cannot create more than 1 file either. (We could easily improve that by incrementing the _0_ part, but that's another story.)
The condition seems good, but the comment needs rewording.
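The two naming schemes discussed above can be sketched as follows. This is illustrative only: the 5-digit padding width and the example query id are assumptions for the sketch, not taken from Trino's actual writer code.

```java
// Illustrative sketch of the two file-naming conventions discussed above.
// The 5-digit padding width is an assumption made for this example.
class FileNameSketch
{
    // Transactional tables: bucket_<bucket-number> (e.g. bucket_00000).
    // No random or incrementing part, so at most one file per bucket.
    static String transactionalName(int bucket)
    {
        return String.format("bucket_%05d", bucket);
    }

    // Bucketed, non-transactional tables, per the reading above:
    // constant for a given (bucket, query) pair.
    static String bucketedName(int bucket, String queryId)
    {
        String paddedBucket = String.format("%05d", bucket);
        return String.format("0%s_0_%s", paddedBucket, queryId);
    }

    public static void main(String[] args)
    {
        System.out.println(transactionalName(0));
        System.out.println(bucketedName(0, "some_query_id"));
    }
}
```

Since neither name contains a random or incrementing component, writing a second file for the same bucket would collide with the first, which is why at most one file per bucket can exist.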
Redundant (...) parens in a sequence of ||.
I meant to be consistent with the behaviour: when a table is non-transactional and has 1 bucket, the writer count is ignored and only 1 file is created. I must have used the wrong wording.
move before handle.getBucketProperty(),
use assertEquals(numberOfCreatedFiles, 1, "There should be only 1 file created")
There is a test failure:
fefbcc0 to 135459d
Move !hiveTypes.isEmpty() to be next to !partitionColumns.isEmpty()
Do we need the test above? This one seems to cover the same stuff + DELETE.
135459d to e73eff1
I don't understand why && !bucketingColumnTypes.isEmpty() is added here.
In my understanding, because we started to rely more on HivePartitioningHandle and its isSingleNode method, and I wanted isUsePartitionedBucketing to return a value that is consistent with isSingleNode: if isSingleNode returns true, then isUsePartitionedBucketing should return false.
It seems that isUsePartitionedBucketing is now false for partitioned, unbucketed, non-transactional tables, while it used to be true.
Unbucketed means hiveBucketHandle.isEmpty(), so it looks like we should bail out of the method earlier and never get here.
Unbucketed means hiveBucketHandle.isEmpty(), so it looks like we should bail out of the method earlier and never get here.
Yes. I think we should hit if (hiveBucketHandle.isEmpty()) { earlier in the code, so I don't think this check here is needed.
I probably don't understand something, but I just tested, and a partitioned transactional table (so 1 implicit bucket) seems to work fine: different partitions are created and each of them has 1 bucket.
Writes will be correct, but with the change here only one node and one thread (in the entire cluster) will be writing data. The code you changed distributes writes between worker nodes, so we can avoid a single writer in the entire cluster.
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
I think we want to redistribute writes even for transactional, partitioned and bucketed (implicit 1 bucket).
@homar it sounds like you tested transactional partitioned tables (unbucketed; aka with implicit 1 bucket)
The concern is about INSERT into non-transactional partitioned, unbucketed table.
Unbucketed, non-transactional should not reach this code because of the if (hiveBucketHandle.isEmpty()) { earlier in the code. But unfortunately I am afraid that @sopel39's comment about decreasing the number of writers is still a valid concern.
@sopel39 again, I probably miss something, but even with my changes, for a transactional, unbucketed (so 1 implicit bucket) and partitioned table, when I make an insert that creates 100 partitions, here https://github.com/trinodb/trino/blob/ee2ef32e6f09515a888a016adc1cc6ccd32cbae4/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSink.java#L354 I get 100 writers. Maybe you meant different writers; in that case, please point me to the particular part of the code.
ImmutableList<HiveType> bucketingColumnTypes = hiveBucketHandle.get().getColumns().stream()
-> List<HiveType> bucketingColumnTypes = hiveBucketHandle.get().getColumns().stream()
make final
and maybe move before private final int[] dataColumnInputIndex
move before this.hdfsEnvironment = ...
(This isn't ideal; ideally fields, constructor params, and assignments would follow the same ordering, but this is a mess here, so the ideal place doesn't exist.)
This isn't very specific. "Output partitioning: SINGLE" could appear, for example, in the top Output stage, or many times in the plan. Would it be possible to identify the actual stage we want to test for here? It's probably the source of the table writer, right?
cc @losipiuk
public void testDataIsNotBrokenInUnbucketedTransactionalTableWithTaskWriterCountGreaterThan1()
-> public void testUnbucketedTransactionalTableWithTaskWriterCountGreaterThan1()
e73eff1 to 133cbe4
Should we have a test with task_writer_count > 1 for unpartitioned (here) and also partitioned tables?
I think we should.
d106344 to 3a35e96
Could you add tests similar to
Did this also impact writes made via an UPDATE query?
3a35e96 to 3fc53b0
@sopel39 I added 2 tests to
@alexjo2144 I checked, and actually the debugger doesn't stop at any place I made a change to while performing an UPDATE.
98c5447 to 3fc53b0
The table name does not match the test.
I will go with assertEquals and assertTrue, because assertThat is imported from tempto.assertions and I don't want to work with integers.
Wrong error message.
Btw: can we merge the tests and have a boolean partitioned argument?
Sure, I can give this a try.
c934618 to 71e30f7
}

@Test
public void testInsertBucketedTransactionalTableLayout()
Because AbstractTestHive is also extended by other classes like TestHiveAlluxioMetastore.
because AbstractTestHive is also extended by other classes like TestHiveAlluxioMetastore
Yet io.trino.plugin.hive.AbstractTestHive#testInsertBucketedTableLayout and io.trino.plugin.hive.AbstractTestHive#testInsertPartitionedBucketedTableLayout are in AbstractTestHive.
public boolean isSingleNode()
{
    // empty hiveTypes means there is no bucketing
    return hiveTypes.isEmpty() && !usePartitionedBucketing;
}
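The predicate above can be exercised in isolation with a simplified stand-in (a hypothetical class, not the real HivePartitioningHandle):

```java
import java.util.List;

// Hypothetical, simplified stand-in for HivePartitioningHandle, just to
// exercise the isSingleNode predicate from the snippet above.
class PartitioningSketch
{
    private final List<String> hiveTypes; // bucketing column types
    private final boolean usePartitionedBucketing;

    PartitioningSketch(List<String> hiveTypes, boolean usePartitionedBucketing)
    {
        this.hiveTypes = hiveTypes;
        this.usePartitionedBucketing = usePartitionedBucketing;
    }

    boolean isSingleNode()
    {
        // empty hiveTypes means there is no bucketing
        return hiveTypes.isEmpty() && !usePartitionedBucketing;
    }

    public static void main(String[] args)
    {
        // transactional table with the artificial 0-arg bucketing: single node
        System.out.println(new PartitioningSketch(List.of(), false).isSingleNode());
        // ordinary bucketed table with one int bucketing column: distributed
        System.out.println(new PartitioningSketch(List.of("int"), false).isSingleNode());
    }
}
```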
Why does no bucketing mean no insert distribution? Because you want a single file?
{
    // Set table writer count
    context.setDriverInstanceCount(getTaskWriterCount(session));
    // being a single node means there is one node and one writer so
being a single node means there is one node and one writer so
A single node doesn't mean a single writer (there can be multiple writers per node).
Currently, single-node partitioning is used only by the system partitioning handle, and it's not for the insert path.
This code only deals with local distribution, but there is also io.trino.sql.planner.optimizations.AddLocalExchanges.Rewriter#visitTableWriter and possibly more; see the changes in b8e4e3f.
I would rather not change this code.
Could we just handle your case using a dedicated constant partitioning function which would direct all rows to a single writer?
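The suggested dedicated constant partitioning function could look roughly like the sketch below. This is a hypothetical shape: Trino's real SPI is io.trino.spi.connector.BucketFunction, which operates on a Page and a position; a simplified row-based interface is used here so the example is self-contained.

```java
// Hypothetical sketch of a constant partitioning function: every row is
// assigned to bucket 0, so a single writer receives all the data.
// (Trino's real SPI, io.trino.spi.connector.BucketFunction, works on pages;
// this simplified interface is only for illustration.)
class ConstantBucketSketch
{
    interface RowBucketFunction
    {
        int getBucket(Object row);
    }

    // directs all rows to the same bucket, regardless of content
    static final RowBucketFunction SINGLE_WRITER = row -> 0;

    public static void main(String[] args)
    {
        System.out.println(SINGLE_WRITER.getBucket("row-1"));
        System.out.println(SINGLE_WRITER.getBucket("row-2"));
    }
}
```

The appeal of this design is that the engine's existing distribution machinery stays untouched; the connector simply declares a partitioning in which every row maps to the same bucket.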
This seems to be superseded by: #10460
Fixes: #9149