Skip to content

Deny DML commands within explicit transactions in Hive#10798

Merged
findepi merged 1 commit intotrinodb:masterfrom
findinpath:hive-fail-insert-in-explicit-transactions
Jan 28, 2022
Merged

Deny DML commands within explicit transactions in Hive#10798
findepi merged 1 commit intotrinodb:masterfrom
findinpath:hive-fail-insert-in-explicit-transactions

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

@findinpath findinpath commented Jan 25, 2022

Fixes #10760

@cla-bot cla-bot bot added the cla-signed label Jan 25, 2022
@findinpath findinpath requested review from alexjo2144, findepi and losipiuk and removed request for findepi January 25, 2022 19:42
@findinpath findinpath force-pushed the hive-fail-insert-in-explicit-transactions branch from 05c033c to 2310135 Compare January 26, 2022 13:10
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove isTransactionalTable -- beginDelete won't be called for a non-acid table.

Copy link
Copy Markdown
Contributor Author

@findinpath findinpath Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the scenario of deleting from a non-transactional table that is being partitioned and this worked.

CREATE TABLE test2 (a_string varchar, a_integer int)
WITH (format = 'ORC', transactional = false, partitioned_by = ARRAY['a_integer']);

insert into test2 values ('hello', 1);
insert into test2 values ('world', 2);
insert into test2 values ('Trino', 1);

delete from test2 where a_integer =1 ;

I tend to think that it makes sense to keep the transactional table check.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that removing isTransactionalTable check here would break this scneario?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example above would call applyDelete, righ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@losipiuk yes

Copy link
Copy Markdown
Contributor Author

@findinpath findinpath Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trino:default> start transaction;
START TRANSACTION
trino:default> delete from test2 where a_integer =1 ;
DELETE

Query 20220128_112521_00030_bfch6, FINISHED, 1 node
http://localhost:8080/ui/query.html?20220128_112521_00030_bfch6
Splits: 1 total, 1 done (100.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 100% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
Peak Memory: 116B
0.43 [0 rows, 0B] [0 rows/s, 0B/s]

trino:default> select * from test2;
 a_string | a_integer 
----------+-----------
 world    |         2 
(1 row)

Query 20220128_112531_00031_bfch6, FINISHED, 1 node
http://localhost:8080/ui/query.html?20220128_112531_00031_bfch6
Splits: 9 total, 9 done (100.00%)
CPU Time: 0.0s total,   166 rows/s,   49KB/s, 46% active
Per Node: 0.0 parallelism,     3 rows/s,   965B/s
Parallelism: 0.0
Peak Memory: 150B
0.31 [1 rows, 301B] [3 rows/s, 965B/s]

trino:default> rollback;
ROLLBACK
trino:default> select * from test2;
 a_string | a_integer 
----------+-----------
 hello    |         1 
 Trino    |         1 
 world    |         2 
(3 rows)

Query 20220128_112538_00033_bfch6, FINISHED, 1 node
http://localhost:8080/ui/query.html?20220128_112538_00033_bfch6
Splits: 11 total, 11 done (100.00%)
CPU Time: 0.0s total,   187 rows/s, 54.9KB/s, 42% active
Per Node: 0.0 parallelism,     6 rows/s, 1.92KB/s
Parallelism: 0.0
Peak Memory: 150B
0.46 [3 rows, 900B] [6 rows/s, 1.92KB/s]

I've tried this functionality on master on config-default & config-hdp3 where the check is not present.

Copy link
Copy Markdown
Contributor Author

@findinpath findinpath Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trino:default> CREATE TABLE txtest (a_string varchar) WITH (format = 'ORC', transactional = true);
CREATE TABLE
trino:default> insert into txtest values ('a'), ('b'), ('c');
INSERT: 3 rows

Query 20220128_113930_00016_cj4we, FINISHED, 1 node
http://localhost:8080/ui/query.html?20220128_113930_00016_cj4we
Splits: 19 total, 19 done (100.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 7% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
Peak Memory: 416B
1.08 [0 rows, 0B] [0 rows/s, 0B/s]

trino:default> start transaction;
START TRANSACTION
trino:default> delete from txtest where a_string = 'a';
DELETE: 1 row

Query 20220128_113950_00018_cj4we, FINISHED, 1 node
http://localhost:8080/ui/query.html?20220128_113950_00018_cj4we
Splits: 10 total, 10 done (100.00%)
CPU Time: 0.1s total,    57 rows/s,   13KB/s, 50% active
Per Node: 0.1 parallelism,     6 rows/s, 1.36KB/s
Parallelism: 0.1
Peak Memory: 108B
0.50 [3 rows, 693B] [6 rows/s, 1.36KB/s]

trino:default> delete from txtest where a_string = 'b';
Query 20220128_113954_00019_cj4we failed: Tried to access metastore after transaction has been committed/aborted
java.lang.IllegalStateException: Tried to access metastore after transaction has been committed/aborted
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.checkReadable(SemiTransactionalHiveMetastore.java:2441)
	at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.getTable(SemiTransactionalHiveMetastore.java:217)
	at io.trino.plugin.hive.HiveMetadata.getView(HiveMetadata.java:2391)

Tested on config-hdp3 on master

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i fail to understand the response.
it would help me if you did remove the check and the CI showed the failure for me.

Copy link
Copy Markdown
Member

@losipiuk losipiuk Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath just dro the isTransactionalTable() part of check.
It is not needed. It is already checked above via ensureTableSupportsDelete()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not be strongly against verify(isTransactionalTable(table.getParameters()) but it feels overchatty.

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i fail to understand the response.
it would help me if you did remove the check and the CI showed the failure for me.

@findinpath findinpath force-pushed the hive-fail-insert-in-explicit-transactions branch from 9c1ca1a to ff1f792 Compare January 28, 2022 16:25
@findinpath
Copy link
Copy Markdown
Contributor Author

i fail to understand the response.
it would help me if you did remove the check and the CI showed the failure for me.

Nevermind. I am actually confused about how the Trino transactions work and that's why I went digressing with testing scenarios. I observed now that the isTransactionalTable(table.getParameters() check would be a duplicate to ensureTableSupportsDelete().

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 28, 2022

please don't rebase when updating PR

fortunately the diff this time was tiny (https://github.com/trinodb/trino/compare/9c1ca1ac52da5476c09de08a1b98a69c7a02edc2..ff1f79297a6594d078ddc496fc907036616b3d91), so that's just a reminder.

@findinpath
Copy link
Copy Markdown
Contributor Author

please don't rebase when updating PR

do you recommend doing a rebase only at the end of the PR?

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 28, 2022

do you recommend doing a rebase only at the end of the PR?

no; use this: git rebase -i $(git merge-base head master)

@findepi
Copy link
Copy Markdown
Member

findepi commented Jan 28, 2022

2022-01-28T17:31:38.6128622Z tests               | 2022-01-28 23:16:38 INFO: FAILURE     /    io.trino.tests.product.hive.TestHiveCompatibility.testSmallDecimalFieldWrittenByOptimizedParquetWriterCannotBeReadByHive (Groups: storage_formats_detailed) took 0.7 seconds
2022-01-28T17:31:38.6170871Z tests               | 2022-01-28 23:16:38 SEVERE: Failure cause:
2022-01-28T17:31:38.6173566Z tests               | java.lang.AssertionError: 
2022-01-28T17:31:38.6173888Z tests               | Expecting message:
2022-01-28T17:31:38.6174876Z tests               |   <"java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://hadoop-master:9000/user/hive/warehouse/parquet_table_small_decimal_created_in_trino/20220128_173138_01082_w8u6u_22065fa1-0530-4e65-a213-ffdf5e7f7d8a">
2022-01-28T17:31:38.6257591Z tests               | to match regex:
2022-01-28T17:31:38.6258108Z tests               |   <".* org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file .*">
2022-01-28T17:31:38.6258573Z tests               | but did not.
2022-01-28T17:31:38.6258808Z tests               | 
2022-01-28T17:31:38.6259059Z tests               | Throwable that failed the check:
2022-01-28T17:31:38.6259318Z tests               | 

this is unrelated to this PR, but something to fix nonetheless @findinpath

@findepi findepi merged commit ddd080e into trinodb:master Jan 28, 2022
@github-actions github-actions bot added this to the 370 milestone Jan 28, 2022
@@ -1723,7 +1726,9 @@ public HiveInsertTableHandle beginInsert(ConnectorSession session, ConnectorTabl
if (isTransactional && retryMode != NO_RETRIES) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"create table as select" can work as "insert into", so the table create in transaction will still be there after rollback...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Incorrect results when transactional / ORC ACID Hive table accessed in Trino transaction

4 participants