Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

Add support for hash join on rowid #203

Merged
merged 1 commit into from
Feb 23, 2023
Merged

Add support for hash join on rowid #203

merged 1 commit into from
Feb 23, 2023

Conversation

alexbaden
Copy link
Contributor

Support join on rowid by skipping the hash table build and computing the row offset in the probe. Offset computation consists of null handling and range handling. For an empty table, we use roughly the expression range of (0, -1) which always returns -1 inside the newly added rowid_hash_join_idx runtime function.

There is one limitation -- because of this approach of not building the table we cannot support a query of the form t1.rowid = t2.rowid AND .... . Loop joins still work of course. If this query becomes something we need to support with hash join, we can push the secondary join expression into filters and run a mini-loop join on each row that passes the rowid "hash".

Added a bunch of tests but could probably use some more coverage. Open to ideas. The way I force the hash table builder to skip the build is by marking it empty during construction. This is kind of a hack, but I think it's ok for now with the checks I've added.

Closes #165

Copy link
Contributor

@ienkovich ienkovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

executor_->cgen_state_->castToTypeIn(key_lvs.front(), 64)};

const auto& query_info = getInnerQueryInfo(val_col_var).info;
if (query_info.getPhysicalNumTuples() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why we can't always use rhs_source_col_range_. In theory, for an empty table, we should have some invalid ranges for all columns where max < min.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's set for rowid specially -- I wonder if that's worth changing? https://github.com/intel-ai/hdk/blob/main/omniscidb/QueryEngine/ExpressionRange.cpp#L547

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a bug we might hit someday. I guess the referenced handling should go after the next if-statement, not before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, you're probably right. I'll try it in a separate branch.

@alexbaden alexbaden merged commit 274907a into main Feb 23, 2023
@alexbaden alexbaden deleted the alex/join_on_rowid branch February 23, 2023 01:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to join tables if there are more than 1000 rows
2 participants