Hive Connector with Amazon S3 documentation updates by dnanuti · Pull Request #15035 · trinodb/trino

dnanuti · 2022-11-15T17:34:38Z

Description

Documentation updates following changes to the Hive connector with Amazon S3: a fix for S3 Select pushdown for uncompressed files, the addition of JSON support to Amazon S3 Select, and the use of S3 Select scan range requests.
Relevant PRs: 12633, 13354, 13477, 13754, 14040

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

arhimondr · 2022-11-15T17:37:30Z

@jhlodin Could you please take a look?

jhlodin

Thanks for writing docs! I suggested edits for the paragraph and a recommended anchor link.

Are any changes needed to the "Is S3 Select a good fit for my workload" section above this content?

jhlodin · 2022-11-15T18:44:03Z

docs/src/main/sphinx/connector/hive-s3.rst

Suggested change

-For uncompressed files, Scan Range feature of S3 Select is used.
-An Amazon S3 Select scan range request runs across the specified byte range.
-This range is aligned with the internal Hive splits for the query fragments
-that get pushed down to Select. Changes in the Hive connector performance
-tuning configuration properties would be reflected here as well.
+For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range
+requests run across the byte ranges of the internal Hive splits for the query fragments
+pushed down to S3 Select. Changes to the Hive catalog's :ref:`performance tuning
+configuration properties <hive-performance-tuning-configuration>` are reflected
+here as well.

To make the anchor link work, please add
.. _hive-performance-tuning-configuration:
above line 734 of the hive connector page (/connector/hive.rst)

Thanks a lot! Updated 👍

I can't reply to the above comment. No changes are needed for the "Is S3 Select a good fit for my workload" section.

It looks like only the ref link was added, can you apply the rest of the suggested edits?

Should look like the following:

For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range requests run across the byte ranges of the internal Hive splits for the query fragments pushed down to S3 Select. Changes to the Hive catalog's :ref:`performance tuning configuration properties <hive-performance-tuning-configuration>` are reflected here as well.

This was rephrased a bit on our side as well. From a technical perspective, I think we should say Hive connector, not Hive catalog, as this is related to how the connector works.

Does this work for you?

For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range requests run across the byte ranges of the internal Hive splits for the query fragments pushed down to S3 Select. Changes in the Hive connector :ref:`performance tuning configuration properties <hive-performance-tuning-configuration>` are likely to impact S3 Select pushdown performance.

Yep, that makes sense to me! Once that change is in, LGTM
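
To make the scan range behavior discussed in this thread concrete, here is a minimal Python sketch of how byte ranges derived from splits could map onto S3 Select requests. It is not the Trino implementation. The helper names `split_ranges` and `build_select_request`, the fixed split size, and the CSV serialization settings are all assumptions for illustration. The request keys follow the parameters of boto3's `select_object_content`. How the `End` offset is treated at range boundaries is not verified here and should be checked against the AWS documentation.

```python
from typing import Any, Dict, Iterator, Tuple


def split_ranges(file_size: int, split_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets that together cover the whole file.

    Hypothetical helper: each pair stands in for one internal Hive split.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    start = 0
    while start < file_size:
        end = min(start + split_size, file_size)
        yield start, end
        start = end


def build_select_request(
    bucket: str, key: str, expression: str, start: int, end: int
) -> Dict[str, Any]:
    """Build keyword arguments for one S3 Select scan range request.

    Hypothetical helper: the dict mirrors boto3's select_object_content
    parameters. Boundary handling of "End" is an assumption in this sketch.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Scan ranges apply to uncompressed input, so no compression is set.
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "NONE"},
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
        "ScanRange": {"Start": start, "End": end},
    }


# One request per split; independent requests can be issued in parallel.
requests = [
    build_select_request(
        "example-bucket", "data/part-0.csv", "SELECT * FROM S3Object s", start, end
    )
    for start, end in split_ranges(file_size=250, split_size=100)
]
```

Each dict could then be passed to a boto3 S3 client as `client.select_object_content(**request)`. A smaller split size produces more, narrower scan range requests, which is why changes to split-related tuning properties can affect S3 Select pushdown performance.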

jhlodin

LGTM % one last minor suggestion

jhlodin · 2022-11-22T17:10:35Z

docs/src/main/sphinx/connector/hive-s3.rst

"they are retrieving" -> "they retrieve"

cla-bot bot added the cla-signed label Nov 15, 2022

github-actions bot added the docs label Nov 15, 2022

arhimondr approved these changes Nov 15, 2022

jhlodin reviewed Nov 15, 2022

dnanuti force-pushed the master branch 3 times, most recently from 5786c92 to 60e50e8 on November 16, 2022 14:29

martint force-pushed the master branch from d011a28 to d002d09 on November 16, 2022 15:35

dnanuti requested a review from jhlodin on November 21, 2022 14:50

dnanuti force-pushed the master branch from 60e50e8 to b860707 on November 22, 2022 17:08

jhlodin approved these changes Nov 22, 2022

Hive Connector with Amazon S3 documentation updates

00625b7

Fixed Select pushdown for uncompressed files, added JSON support to Amazon S3 Select and started using S3 Select scan range requests. Relevant PRs: 12633, 13354, 13477, 13754, 14040

dnanuti force-pushed the master branch from b860707 to 00625b7 on November 22, 2022 17:12

arhimondr merged commit af813ca into trinodb:master Nov 22, 2022

github-actions bot added this to the 404 milestone Nov 23, 2022

colebow mentioned this pull request Nov 23, 2022

Add Trino 405 release notes #15139

Merged


