
findinpath: Support OR-ed condition in Delta checkpoint iterator#17

Closed
findinpath wants to merge 106 commits into master from findinpath/delta-checkpoint-entry-or


@findinpath
Owner

Description

Alternative for trinodb#19240

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findinpath findinpath changed the title Support OR-ed condition in Delta checkpoint iterator findinpath: Support OR-ed condition in Delta checkpoint iterator Oct 18, 2023
@findinpath findinpath force-pushed the findinpath/delta-checkpoint-entry-or branch from dcc9c6d to 6d8e652 October 18, 2023 11:26
findinpath and others added 9 commits October 18, 2023 13:56
Metadata files should typically be accessed only once, so prefer `.add(x)`
over `.addCopies(x, 1)`; that way, `.addCopies` stands out as something
potentially worth addressing.
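The commit refers to builder methods of the shape `add(x)` / `addCopies(x, n)`; Guava's `ImmutableMultiset.Builder` offers both. To keep the sketch JDK-only, it uses a hypothetical stand-in builder with the same two methods to show how expected file accesses could be declared so that repeated reads are visually distinct. The file names follow Delta Lake log naming but are illustrative, not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in with the same add/addCopies shape as Guava's
// ImmutableMultiset.Builder, so this sketch needs only the JDK.
public final class ExpectedAccessesSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    // One expected access: the normal case for a metadata file.
    public ExpectedAccessesSketch add(String file) {
        return addCopies(file, 1);
    }

    // Several expected accesses: should be rare, so it is easy to spot in review.
    public ExpectedAccessesSketch addCopies(String file, int occurrences) {
        if (occurrences < 0) {
            throw new IllegalArgumentException("occurrences must not be negative: " + occurrences);
        }
        if (occurrences > 0) {
            counts.merge(file, occurrences, Integer::sum);
        }
        return this;
    }

    public Map<String, Integer> build() {
        return Map.copyOf(counts);
    }

    // Illustrative expectation for a table read.
    public static Map<String, Integer> expectedAccesses() {
        return new ExpectedAccessesSketch()
                .add("_last_checkpoint")
                .add("00000000000000000002.checkpoint.parquet")
                .addCopies("00000000000000000003.json", 2) // stands out as worth investigating
                .build();
    }
}
```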
Once the metadata & protocol entries are found, the scanning of
multi-part checkpoint files can be stopped.
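A minimal sketch of the early exit, assuming a simplified model where each checkpoint part is a list of entries tagged by action name (`metaData`, `protocol`, `add`, ...). The `Entry` and `Result` records are hypothetical; the real iterator reads Parquet checkpoint files.

```java
import java.util.List;
import java.util.Optional;

public final class CheckpointScanSketch {
    // Hypothetical, simplified checkpoint entry.
    public record Entry(String kind, String payload) {}

    public record Result(Optional<Entry> metadata, Optional<Entry> protocol, int partsRead) {}

    // Scans the parts of a multi-part checkpoint in order and stops as soon
    // as both the metadata and the protocol entry have been found.
    public static Result findMetadataAndProtocol(List<List<Entry>> parts) {
        Entry metadata = null;
        Entry protocol = null;
        int partsRead = 0;
        for (List<Entry> part : parts) {
            partsRead++;
            for (Entry entry : part) {
                if (metadata == null && entry.kind().equals("metaData")) {
                    metadata = entry;
                }
                else if (protocol == null && entry.kind().equals("protocol")) {
                    protocol = entry;
                }
            }
            if (metadata != null && protocol != null) {
                break; // the remaining parts are never opened
            }
        }
        return new Result(Optional.ofNullable(metadata), Optional.ofNullable(protocol), partsRead);
    }
}
```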
The `metadata` & `protocol` entries are already read (and saved) once
when retrieving the table handle.
Reuse this information while retrieving the active files for the table.
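The reuse can be pictured as passing the entries carried by the table handle into the active-files lookup instead of re-reading them. Every type below (`TableHandle`, `TransactionLog`, the entry records) is a hypothetical simplification, not the connector's real API; the counter only makes the avoided re-read observable.

```java
import java.util.List;

public final class ReuseTableEntriesSketch {
    public record MetadataEntry(String schemaString) {}
    public record ProtocolEntry(int minReaderVersion, int minWriterVersion) {}
    public record TableHandle(String location, MetadataEntry metadata, ProtocolEntry protocol) {}

    // Hypothetical transaction log that counts metadata/protocol reads.
    public static final class TransactionLog {
        private int metadataAndProtocolReads;

        MetadataEntry readMetadata() {
            metadataAndProtocolReads++;
            return new MetadataEntry("struct<id:bigint>");
        }

        ProtocolEntry readProtocol() {
            metadataAndProtocolReads++;
            return new ProtocolEntry(1, 2);
        }

        // A real reader would use the schema and protocol to decode add entries.
        List<String> readActiveFiles(MetadataEntry metadata, ProtocolEntry protocol) {
            return List.of("part-00000.parquet", "part-00001.parquet");
        }

        public int metadataAndProtocolReads() {
            return metadataAndProtocolReads;
        }
    }

    // Metadata and protocol are read once, when the handle is created.
    public static TableHandle getTableHandle(String location, TransactionLog log) {
        return new TableHandle(location, log.readMetadata(), log.readProtocol());
    }

    // Reuses the entries stored in the handle; no second metadata/protocol read.
    public static List<String> getActiveFiles(TableHandle handle, TransactionLog log) {
        return log.readActiveFiles(handle.metadata(), handle.protocol());
    }
}
```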
@findinpath findinpath force-pushed the findinpath/delta-checkpoint-entry-or branch from 6d8e652 to b00803c October 18, 2023 16:14
findepi and others added 9 commits October 18, 2023 20:18
It was found to be confusing.
Adds SESSION_AUTHORIZATION to ClientCapabilities, so that the
client side support for this feature can be identified using
the headers sent to Trino.
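A rough sketch of how a capability could be advertised in, and detected from, a request header. The header name `X-Trino-Client-Capabilities` and the `PATH` constant are assumptions for illustration; only `SESSION_AUTHORIZATION` comes from the commit message.

```java
import java.util.Set;
import java.util.stream.Collectors;

public final class ClientCapabilitiesSketch {
    // Illustrative enum; only SESSION_AUTHORIZATION is named in the commit.
    public enum ClientCapability { PATH, SESSION_AUTHORIZATION }

    // Assumed header name, used here only for illustration.
    public static final String CAPABILITIES_HEADER = "X-Trino-Client-Capabilities";

    // Client side: serialize the supported capabilities as a comma-separated list.
    public static String headerValue(Set<ClientCapability> capabilities) {
        return capabilities.stream()
                .map(Enum::name)
                .sorted()
                .collect(Collectors.joining(","));
    }

    // Server side: check whether the client advertised SESSION_AUTHORIZATION.
    public static boolean supportsSessionAuthorization(String headerValue) {
        if (headerValue == null || headerValue.isBlank()) {
            return false;
        }
        for (String token : headerValue.split(",")) {
            if (token.trim().equals(ClientCapability.SESSION_AUTHORIZATION.name())) {
                return true;
            }
        }
        return false;
    }
}
```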
This upgrades Delta Lake version to 3.0.0.
Refactor TestRetryingConnectionFactory and TestLazyConnectionFactory
to use dependency injection. The motivation behind this change is to be
able to add dependent objects to the tested class later without
major changes to the test class itself.
Extend the logic that treats metadata queries specially and allocates
no memory for them, so that they are scheduled immediately, to cover
all schemas of the system catalog (previously only the jdbc schema was
covered).
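The scope change can be pictured as relaxing a predicate from "system catalog, jdbc schema" to "system catalog, any schema". This is a hypothetical sketch: `TableRef`, the all-tables-match rule, and `memoryToReserve` are assumptions about how such a check might look, not the scheduler's real code.

```java
import java.util.List;

public final class MetadataQuerySketch {
    // Hypothetical reference to a table read by a query.
    public record TableRef(String catalog, String schema, String table) {}

    // Previous scope: only tables in system.jdbc counted as metadata.
    public static boolean isMetadataQueryJdbcOnly(List<TableRef> tables) {
        return !tables.isEmpty()
                && tables.stream().allMatch(t -> t.catalog().equals("system") && t.schema().equals("jdbc"));
    }

    // Extended scope: any schema of the system catalog qualifies.
    public static boolean isMetadataQuery(List<TableRef> tables) {
        return !tables.isEmpty() && tables.stream().allMatch(t -> t.catalog().equals("system"));
    }

    // Metadata queries reserve no memory, so they can be scheduled immediately.
    public static long memoryToReserve(List<TableRef> tables, long estimatedBytes) {
        return isMetadataQuery(tables) ? 0L : estimatedBytes;
    }
}
```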
@findinpath findinpath force-pushed the findinpath/delta-checkpoint-entry-or branch 2 times, most recently from e134ec5 to 000da7e October 19, 2023 14:55
takezoe and others added 2 commits October 19, 2023 16:52
This bit of code is making the assumption that it's safe to remove the Sort because there are no exchanges yet. However, it's forgetting the fact that the data needs to be sorted, so later when the exchange is added this expectation is broken.
Implements column-wise hash calculations in FlatHashCompiler and changes
FlatGroupByHash to use it to implement a batched approach to first
computing a range of position hashes and then attempting to insert those
positions using the hashes that were precomputed.

By calculating position hashes in a columnar traversal path, we can
avoid repeated expensive bounds checking and allow the JIT to unroll
loops into much more efficient forms.

Additionally, when attempting to insert the positions into the
FlatGroupByHash after precomputing the hash code we can start loading
the relevant portion of the hash table into memory sooner, since we
aren't intermixing computing the hash with the memory accesses.
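The two-phase idea (hash a batch column by column, then insert using the precomputed hashes) can be sketched over plain `long[]` columns. This is a simplified illustration, not `FlatGroupByHash`: the fixed-capacity linear-probing table, the `31 * h + mix(v)` combining rule, and the MurmurHash3 `fmix64` finalizer are choices made for the sketch.

```java
import java.util.Arrays;

// Simplified batched group-by hash; layout and hash function are assumptions.
public final class BatchedGroupByHashSketch {
    private static final int BATCH_SIZE = 1024;

    private final int channels;
    private final int[] slots;           // slot -> group id, -1 when empty
    private final long[][] groupValues;  // group id -> copied row values
    private final long[] groupHashes;    // group id -> precomputed hash
    private int groupCount;

    public BatchedGroupByHashSketch(int channels, int maxGroups) {
        this.channels = channels;
        // Fixed capacity (no resizing here): at least 2 * maxGroups slots, a power of two.
        int capacity = Integer.highestOneBit(Math.max(1, 2 * maxGroups - 1)) << 1;
        this.slots = new int[capacity];
        Arrays.fill(slots, -1);
        this.groupValues = new long[maxGroups][];
        this.groupHashes = new long[maxGroups];
    }

    public int getGroupCount() {
        return groupCount;
    }

    // columns[c][p] is the value of channel c at position p.
    public int[] getGroupIds(long[][] columns, int positionCount) {
        int[] groupIds = new int[positionCount];
        long[] hashes = new long[BATCH_SIZE];
        for (int start = 0; start < positionCount; start += BATCH_SIZE) {
            int length = Math.min(BATCH_SIZE, positionCount - start);
            // Phase 1: hash the whole batch, one column at a time.
            computeHashes(columns, start, length, hashes);
            // Phase 2: probe and insert using the precomputed hashes.
            for (int i = 0; i < length; i++) {
                groupIds[start + i] = putIfAbsent(columns, start + i, hashes[i]);
            }
        }
        return groupIds;
    }

    // One tight loop per column: simple array indexing the JIT can unroll.
    static void computeHashes(long[][] columns, int start, int length, long[] hashes) {
        Arrays.fill(hashes, 0, length, 0L);
        for (long[] column : columns) {
            for (int i = 0; i < length; i++) {
                hashes[i] = hashes[i] * 31 + mix(column[start + i]);
            }
        }
    }

    // MurmurHash3 fmix64 finalizer.
    private static long mix(long value) {
        value ^= value >>> 33;
        value *= 0xff51afd7ed558ccdL;
        value ^= value >>> 33;
        value *= 0xc4ceb9fe1a85ec53L;
        value ^= value >>> 33;
        return value;
    }

    private int putIfAbsent(long[][] columns, int position, long hash) {
        int mask = slots.length - 1;
        int slot = (int) (hash & mask);
        while (slots[slot] != -1) {
            int group = slots[slot];
            if (groupHashes[group] == hash && rowEquals(group, columns, position)) {
                return group;
            }
            slot = (slot + 1) & mask; // linear probing
        }
        if (groupCount == groupHashes.length) {
            throw new IllegalStateException("more groups than maxGroups");
        }
        int group = groupCount++;
        long[] row = new long[channels];
        for (int c = 0; c < channels; c++) {
            row[c] = columns[c][position];
        }
        groupValues[group] = row;
        groupHashes[group] = hash;
        slots[slot] = group;
        return group;
    }

    private boolean rowEquals(int group, long[][] columns, int position) {
        long[] row = groupValues[group];
        for (int c = 0; c < channels; c++) {
            if (row[c] != columns[c][position]) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the hash loop separate from the probing loop mirrors the commit's argument: the first phase is pure sequential array work, and the second phase does only table lookups.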
dain and others added 22 commits October 20, 2023 20:44
These implementations delegate to Block, but the block implementations
for these types do not support `getSlice`.
The connector runs connector smoke tests with HMS.
The synthetic dictionary, rle, and lazy blocks require special handling
in the codebase, and new synthetic blocks cannot reasonably be added
without updating the existing code. This commit blocks direct extensions
of Block, only allowing extensions to ValueBlock.

Seal Block interface
Mark ValueBlock interface non-sealed
Mark all current ValueBlock implementations as final
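The sealing scheme (sealed `Block`, non-sealed `ValueBlock`, final implementations) maps onto Java 17 sealed interfaces. The types below reuse Trino-like names but are heavily simplified stand-ins; the lazy block is omitted, and `readLong` is a hypothetical helper showing that code dispatching on `Block` can handle every case.

```java
import java.util.Objects;

public final class SealedBlockSketch {
    // Only the listed types may implement Block directly.
    public sealed interface Block permits DictionaryBlock, RunLengthEncodedBlock, ValueBlock {
        int getPositionCount();
    }

    // Open extension point: new block types implement ValueBlock instead.
    public non-sealed interface ValueBlock extends Block {
        long getLong(int position);
    }

    public static final class LongArrayBlock implements ValueBlock {
        private final long[] values;

        public LongArrayBlock(long... values) {
            this.values = values.clone();
        }

        @Override
        public int getPositionCount() {
            return values.length;
        }

        @Override
        public long getLong(int position) {
            return values[position];
        }
    }

    public static final class DictionaryBlock implements Block {
        private final ValueBlock dictionary;
        private final int[] ids;

        public DictionaryBlock(ValueBlock dictionary, int... ids) {
            this.dictionary = dictionary;
            this.ids = ids.clone();
        }

        @Override
        public int getPositionCount() {
            return ids.length;
        }

        public long getLong(int position) {
            return dictionary.getLong(ids[position]);
        }
    }

    public static final class RunLengthEncodedBlock implements Block {
        private final ValueBlock value; // single-position block with the repeated value
        private final int positionCount;

        public RunLengthEncodedBlock(ValueBlock value, int positionCount) {
            this.value = value;
            this.positionCount = positionCount;
        }

        @Override
        public int getPositionCount() {
            return positionCount;
        }

        public long getLong(int position) {
            Objects.checkIndex(position, positionCount);
            return value.getLong(0);
        }
    }

    // Because Block is sealed, these three checks cover every possible Block.
    public static long readLong(Block block, int position) {
        if (block instanceof ValueBlock valueBlock) {
            return valueBlock.getLong(position);
        }
        if (block instanceof DictionaryBlock dictionary) {
            return dictionary.getLong(position);
        }
        if (block instanceof RunLengthEncodedBlock rle) {
            return rle.getLong(position);
        }
        throw new AssertionError("unreachable: Block is sealed");
    }
}
```

A class outside this file that tries to implement `Block` directly fails to compile, which is the guarantee the commit is after.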
Co-authored-by: Yuya Ebihara <ebyhry@gmail.com>