Skip to content

Don't drop destination table when non-transactional insert fails#12229

Merged
findepi merged 1 commit intotrinodb:masterfrom
lhofhansl:non-t-drop
May 10, 2022
Merged

Don't drop destination table when non-transactional insert fails#12229
findepi merged 1 commit intotrinodb:masterfrom
lhofhansl:non-t-drop

Conversation

@lhofhansl
Copy link
Copy Markdown
Member

@lhofhansl lhofhansl commented May 3, 2022

Description

See #12225

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Most JDBC connectors will have this issue. Verified with Postgres.

How would you describe this change to a non-technical end user or system administrator?

Trino will happily drop your entire table if it encounters an error during a non-transactional insert.
This is a pretty SERIOUS issue.

Related issues, pull requests, and links

Fixes #12225

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 3, 2022
@lhofhansl
Copy link
Copy Markdown
Member Author

Simple fix. I'll think on a quick test too (very limited time, though.)

@lhofhansl lhofhansl added the bug label May 3, 2022
@lhofhansl
Copy link
Copy Markdown
Member Author

lhofhansl commented May 3, 2022

Haven't run tests in a while. Always fails with docker-machine executable was not found on PATH ... (This is a Fedora 34 box)

@lhofhansl
Copy link
Copy Markdown
Member Author

Something like this. But I cannot run the tests locally ATM.

@lhofhansl lhofhansl force-pushed the non-t-drop branch 3 times, most recently from 9c27216 to c1499f9 Compare May 4, 2022 01:28
@lhofhansl lhofhansl requested a review from hashhar May 4, 2022 03:15
@findinpath findinpath requested a review from findepi May 4, 2022 04:31
@hashhar hashhar requested review from ebyhr and wendigo May 4, 2022 06:15
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a logic of jdbcClient.rollbackCreateTable
since isNonTransactionalInsert is handled by jdbcClient.beginInsertTable

(eg imagine a connector which overrides jdbcClient.beginInsertTable and ignores isNonTransactionalInsert session property; it would have no way to behave correctly)

Copy link
Copy Markdown
Member Author

@lhofhansl lhofhansl May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if someone calls jdbcClient.rollbackCreateTable in some other context? In that case we'd want to remove that table regardless of whether inserts are transactional or not.

Edit: I guess they'd have to get it right in jdbcClient.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem it's wrong either way:
DefaultJdbcMetadata.beginCreateTable also sets up jdbcClient.rollbackCreateTable as rollback action. In that case we always want the rollback to happen.
For DefaultJdbcMetadata.beginInsert we want the rollback handled conditionally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think JdbcOutputTableHandle needs to track information whether this is a temp table or not.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. That's would work.
I think one can also argue if there's code in a JdbcClient that creates a table it wants deleted then it should setup up the rollback action (not DefaultJdbcMetadata).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... Even that would not work (nicely). It would need to be a specific flag for isNonTransactionalInsert.
All other table creation (that we want to have removed) have to be marked as temporary. Now you have remember explicitly not do that, seems more frail to me. IMHO this has to be connected to an action, not the OutputTable.

I think my initial fix is still the safest. BaseJdbcClient.beginInsertTable is not overriden (except for disabling via throwing an Exception). And when one overrides that, one would have to aware of the implications. One could put a comment on that method.

As is, this will happily drop your TB size table if you fail to insert into it on non-transactional mode. :(

IMHO, the damage is done in the original implementation that separated table creation from the responsibility to clean the table up into two different classes. It's not immediately clear how to undo that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my initial fix is still the safest.

it spreads responsibility between the two classes: DefaultJdbcMetadata (which generally should not be overridden) and JdbcClient which is the extension point for all the jdbc-based connectors

All other table creation (that we want to have removed) have to be marked as temporary. Now you have remember explicitly not do that, seems more frail to me

I thought about adding new parameter to JdbcOutputTableHandle. You cannot forget about it, because it's a required parameter.

However, we don't need a new parameter, we should just compare JdbcOutputTableHandle#tableName and JdbcOutputTableHandle#temporaryTableName. Or, better, make temporaryTableName Optional.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it spreads responsibility between the two classes

That already happened because BaseJdbcClient creates the tables, but DefaultJdbcMetadata is responsible for the cleanup.

I thought about adding new parameter to JdbcOutputTableHandle. You cannot forget about it, because it's a required parameter.

I started with that, easy enough to do. But I ran into the issue I mentioned above.

compare JdbcOutputTableHandle#tableName and JdbcOutputTableHandle#temporaryTableName

So from looking BaseJdbcClient.beginInsertTable you are suggesting that DefaultJdbcMetadata.beginInsert to set up the rollback only handler when tableName and temporaryTableName are identical? (real tables seem to have these two set differently)
Or that the rollback handler not do anything when the tableName and temporaryTableName are identical? (that would seem weird)
Won't any derived beginInsertTable now equally have to be aware of that or have the table removed?

Happy to do that as long as we can fix the problem.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

@lhofhansl
Copy link
Copy Markdown
Member Author

We should get this into 380. This is a serious bug that will delete data unexpectedly.

@findepi @martint

@lhofhansl lhofhansl force-pushed the non-t-drop branch 2 times, most recently from 8507a06 to a1f6ada Compare May 7, 2022 23:05
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving the logic, but this doesn't change the fact the logic belongs to jdbcClient.rollbackCreateTable, as it is coupled with jdbcClient.beginInsertTable.

As suggested in a1f6ada#r866674664, let's make temporaryTableName Optional.

Copy link
Copy Markdown
Member Author

@lhofhansl lhofhansl May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Sorry, I get what you meant now.

I pushed a new update. (Will rename squash/rename the commits later).

Not sure I like it.
Perhaps I could abstract the repeated handle.getTemporaryTableName().orElse(handle.getTableName()) into JdbcOutputTableHanlde itself, and add a separate method to check whether a temporary table has been provided.

@lhofhansl lhofhansl force-pushed the non-t-drop branch 2 times, most recently from 25f32ac to e8663c5 Compare May 9, 2022 17:17
@lhofhansl
Copy link
Copy Markdown
Member Author

Test failures look unrelated.

@findepi findepi changed the title BaseJdbc: Do not drop the connector table upon errors during non-transactional inserts. Don't drop destination table when non-transactional insert fails May 10, 2022
@findepi
Copy link
Copy Markdown
Member

findepi commented May 10, 2022

(changed commit title and applied small style changes)

@findepi findepi merged commit f23b4bd into trinodb:master May 10, 2022
@findepi findepi mentioned this pull request May 10, 2022
@lhofhansl
Copy link
Copy Markdown
Member Author

As usually, thank for the thorough review.

@github-actions github-actions bot added this to the 381 milestone May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

PostgreSQL connector drops destination table upon error during non-transactional insert

3 participants