Skip to content

Do not create temp staging directory for CREATE TABLE without data#5802

Closed
laurachenyu wants to merge 1 commit intotrinodb:masterfrom
laurachenyu:internal_patch
Closed

Do not create temp staging directory for CREATE TABLE without data#5802
laurachenyu wants to merge 1 commit intotrinodb:masterfrom
laurachenyu:internal_patch

Conversation

@laurachenyu
Copy link
Copy Markdown
Contributor

In case of CREATE TABLE (managed), empty staging directory was not cleaned up at commit time, but relying on /tmp periodic delete.

@cla-bot
Copy link
Copy Markdown

cla-bot bot commented Nov 3, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Yu Chen.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link
Copy Markdown
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, @findepi do you want to take a look?

Copy link
Copy Markdown
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see that it works. It would be great to have a test for this though

Also, i think it should be solved at a higher level (earlier). See inline comment

@findepi
Copy link
Copy Markdown
Member

findepi commented Nov 12, 2020

@laurachenyu is there a test covering CTAS WITH NO DATA ensuring

  • the table gets registered with the right loction in the metastore
  • the location is correctly created
  • same for partitioned table -- of course, with NO DATA no partition will be created, but table declaration and on disk location should be the same

@electrum
Copy link
Copy Markdown
Member

There's some confusion about the problem due to the phrase "without data" in the commit message. This PR is for a regular CREATE TABLE statement, not for CREATE TABLE AS ... WITH NO DATA (which might have the same issue, but let's consider that out of scope for this PR).

@laurachenyu I suggest removing the "without data" part from the commit message to avoid confusion.

Solving this in LocationService seems correct, since the LocationService design/API is the source of this problem. The forNewTable() method returns a LocationHandle which contains both the final target path and the temporary path used for writing. The location service creates the temporary directory before returning it. In the case of a simple CREATE TABLE, we only want the final target path, not a temporary path. The simple solution in this PR is to ask the location service not to create it.

Another option is to add a new API to directly return the target path for a new table. The existing LocationService API is fairly convoluted and could be simplified. (it was originally built as an extension point but is no longer needed)

String schemaName = temporaryCreateEmptyTable.getSchemaName();
String tableName = temporaryCreateEmptyTable.getTableName();
LocationService locationService = getLocationService();
LocationHandle locationHandle = locationService.forNewTable(transaction.getMetastore(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each arg in newline


HdfsContext context = new HdfsContext(session, schemaName, tableName);
if (locationHandle.getWriteMode() == STAGE_AND_MOVE_TO_TARGET_DIRECTORY) {
assertFalse(pathExists(context, hdfsEnvironment, writePath),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.prestosql.plugin.hive.util.HiveWriteUtils#createTemporaryPath uses randomUUID, so different temp directory will be used in transaction.getMetastore().createTable than writePath.

What you should do instead is to set some random staging directory (via temporary_staging_directory_path session property) and check that no dir was created there after createTable

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Oct 20, 2022

👋 @laurachenyu - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that. cc @electrum @sopel39 @findepi

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@weijiii
Copy link
Copy Markdown
Member

weijiii commented Oct 27, 2022

@mosabua Laura was a previous member on our team and we are working to merge some of our internal patches back to the OSS. We would like to resume working on this patch but I may open a separate PR for this.

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Nov 1, 2022

Superseded by #14830

@ebyhr ebyhr closed this Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

8 participants