Increase test coverage for Hive S3 streaming upload #11607

Merged
losipiuk merged 1 commit into trinodb:master from linzebing:issue-10797 on Mar 24, 2022

@linzebing linzebing commented Mar 22, 2022

Add a test testInsertIntoPartitionedTableLargeFiles to exercise multiple code paths of S3 streaming upload, with part size 5MB:

  1. file size <= 5MB (shortcut to direct upload)
  2. file size > 5MB but <= 10MB (the case that triggered #10710, "Writing to S3 sometimes results in corrupted files")
  3. file size > 10MB

The query will write three files, of sizes 15336718 bytes, 26283542 bytes and 33655242 bytes respectively.
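As a rough illustration of the three size ranges listed above, the sketch below classifies a file size against the part size and checks whether a set of file sizes touches every range. It is not Trino's implementation: the class `UploadPathSketch`, the enum, and both helper methods are hypothetical names, and treating 5MB as 5 * 1024 * 1024 bytes is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: names and structure are illustrative, not Trino's code.
final class UploadPathSketch
{
    // Part size from the PR description; 5MB is assumed to mean 5 * 1024 * 1024 bytes.
    static final long PART_SIZE = 5L * 1024 * 1024;

    enum UploadPath
    {
        DIRECT_UPLOAD,      // file size <= part size
        UP_TO_TWO_PARTS,    // part size < file size <= 2 * part size
        MORE_THAN_TWO_PARTS // file size > 2 * part size
    }

    private UploadPathSketch() {}

    // Maps a file size to one of the three ranges named in the PR description.
    static UploadPath classify(long fileSize)
    {
        if (fileSize < 0) {
            throw new IllegalArgumentException("fileSize must be non-negative");
        }
        if (fileSize <= PART_SIZE) {
            return UploadPath.DIRECT_UPLOAD;
        }
        if (fileSize <= 2 * PART_SIZE) {
            return UploadPath.UP_TO_TWO_PARTS;
        }
        return UploadPath.MORE_THAN_TWO_PARTS;
    }

    // Returns true only if the given sizes hit every range at least once.
    static boolean coversAllPaths(long... fileSizes)
    {
        Set<UploadPath> seen = EnumSet.noneOf(UploadPath.class);
        for (long size : fileSizes) {
            seen.add(classify(size));
        }
        return seen.size() == UploadPath.values().length;
    }
}
```

A test could call a helper like `coversAllPaths` on the sizes of the files it wrote, so that a change in data volume or compression that shifts a file out of its intended range fails loudly instead of silently reducing coverage.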

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

trino-hive plugin test

How would you describe this change to a non-technical end user or system administrator?

This increases test coverage for our Hive S3 streaming upload path.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Member

Can you help me understand how a single test adds coverage for all 3 cases you mention in the commit message?
It would be worth adding a comment in the code itself to explain what is happening, and also (if possible) an assertion that validates that you tested exactly the cases you wanted to.

Member Author

Addressed comments. This query writes three files whose sizes fall into the different ranges.

Added an assertion on the file sizes and a comment explaining the test.

@linzebing
Member Author

It seems this test hit an OOM. I will look into it.

Add a test testInsertIntoPartitionedTableLargeFiles to exercise multiple code paths of S3 streaming upload, with upload part size 5MB:
1. file size <= 5MB (shortcut to direct upload)
2. file size > 5MB but <= 10MB (which triggered trinodb#10710)
3. file size > 10MB
@losipiuk losipiuk merged commit e91a653 into trinodb:master Mar 24, 2022
@github-actions github-actions bot added this to the 375 milestone Mar 24, 2022
@findepi
Member

findepi commented Mar 25, 2022

See #11659 -- the test is reported to be flaky

Development

Successfully merging this pull request may close these issues.

Extend test coverage for S3 streaming upload