Hive Connector with Amazon S3 documentation updates #15035
arhimondr merged 1 commit into trinodb:master
@jhlodin Could you please take a look?
jhlodin
left a comment
Thanks for writing docs! I suggested edits for the paragraph and a recommended anchor link.
Are any changes needed to the "Is S3 Select a good fit for my workload" section above this content?
Original:
For uncompressed files, Scan Range feature of S3 Select is used.
An Amazon S3 Select scan range request runs across the specified byte range.
This range is aligned with the internal Hive splits for the query fragments
that get pushed down to Select. Changes in the Hive connector performance
tuning configuration properties would be reflected here as well.
Suggested:
For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range
requests run across the byte ranges of the internal Hive splits for the query fragments
pushed down to S3 Select. Changes to the Hive catalog's :ref:`performance tuning
configuration properties <hive-performance-tuning-configuration>` are reflected
here as well.
To make the anchor link work, please add
.. _hive-performance-tuning-configuration:
above line 734 of the hive connector page (/connector/hive.rst)
I can't reply to the comment above. No changes are needed for the "Is S3 Select a good fit for my workload" section.
It looks like only the ref link was added. Can you apply the rest of the suggested edits?
Should look like the following:
For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range
requests run across the byte ranges of the internal Hive splits for the query fragments
pushed down to S3 Select. Changes to the Hive catalog's :ref:`performance tuning
configuration properties <hive-performance-tuning-configuration>` are reflected
here as well.
This was rephrased a bit on our side as well. From a technical perspective, I think we should say Hive connector, not Hive catalog, as this is related to how the connector works.
Does this work for you?
For uncompressed files, S3 Select scans ranges of bytes in parallel. The scan range
requests run across the byte ranges of the internal Hive splits for the query fragments
pushed down to S3 Select. Changes in the Hive connector :ref:`performance tuning
configuration properties <hive-performance-tuning-configuration>` are likely to impact
S3 Select pushdown performance.
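
To illustrate the paragraph above, here is a minimal sketch of how one file could be cut into per-split byte ranges, each becoming its own scan range request. It does not call AWS and is not the connector's actual code. The function names, the split size, the bucket, and the key are hypothetical. The dictionary is shaped like the keyword arguments of boto3's `select_object_content`, and the end offset is assumed to be inclusive.

```python
from typing import Dict, Iterator


def split_scan_ranges(file_size: int, split_size: int) -> Iterator[Dict[str, int]]:
    """Yield one scan range per split-sized slice of an uncompressed file.

    Assumption: "End" is an inclusive byte offset, so adjacent ranges
    do not share a byte.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    for start in range(0, file_size, split_size):
        end = min(start + split_size, file_size) - 1
        yield {"Start": start, "End": end}


def build_select_request(bucket: str, key: str, expression: str,
                         scan_range: Dict[str, int]) -> dict:
    """Build keyword arguments for a single S3 Select scan range request.

    Nothing is sent here; the dict could be passed as
    s3_client.select_object_content(**request).
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,
        "ExpressionType": "SQL",
        # Scan ranges apply to uncompressed input, hence CompressionType NONE.
        "InputSerialization": {"CSV": {}, "CompressionType": "NONE"},
        "OutputSerialization": {"CSV": {}},
        "ScanRange": scan_range,
    }


# Hypothetical 10-byte file read with a 4-byte split size.
ranges = list(split_scan_ranges(file_size=10, split_size=4))
requests = [
    build_select_request("example-bucket", "data/example.csv",
                         "SELECT s._1 FROM S3Object s", r)
    for r in ranges
]
```

Because the ranges follow the split size, a tuning property that changes split sizes also changes how many scan range requests are issued and how large each one is.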
Yep, that makes sense to me! Once that change is in, LGTM
5786c92 to 60e50e8
jhlodin
left a comment
LGTM % one last minor suggestion
"they are retrieving" -> "they retrieve"
Fixed Select pushdown for uncompressed files, added JSON support to Amazon S3 Select and started using S3 Select scan range requests. Relevant PRs: 12633, 13354, 13477, 13754, 14040
Description
Documentation updates following up on changes to the Hive connector with Amazon S3: a fix for S3 Select pushdown for uncompressed files, the addition of JSON support to Amazon S3 Select, and the use of S3 Select scan range requests.
Relevant PRs:
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: