Skip to content

Improve Clickhouse pushdown. Attempt 1#22921

Closed
ssheikin wants to merge 5 commits intotrinodb:masterfrom
ssheikin:ssheikin/65/trino/clickhouse-minmax
Closed

Improve Clickhouse pushdown. Attempt 1#22921
ssheikin wants to merge 5 commits intotrinodb:masterfrom
ssheikin:ssheikin/65/trino/clickhouse-minmax

Conversation

@ssheikin
Copy link
Copy Markdown
Contributor

@ssheikin ssheikin commented Aug 2, 2024

Description

Per https://clickhouse.com/docs/en/sql-reference/statements/show#show_columns ClickHouse has no per-column collations
Clickhouse is UTF-8 encoded with byte-by-byte comparison. https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#collation-support So exactly as trino.
https://github.com/airlift/slice/blob/2.2/src/main/java/io/airlift/slice/Slice.java#L1205 That’s why we can pushdown all operations on varchars.

Additional context and related issues

Relates to

Superceeded by

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* . ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Aug 2, 2024
@ssheikin ssheikin changed the title Assume remote collation is case sensitive Assume remote collation is case sensitive in ClickHouse Aug 2, 2024
@ssheikin ssheikin force-pushed the ssheikin/65/trino/clickhouse-minmax branch from 194a5df to 233f342 Compare August 2, 2024 15:24
@ssheikin ssheikin changed the title Assume remote collation is case sensitive in ClickHouse Improve Clickhouse varchar pushdown Aug 2, 2024
@ssheikin ssheikin force-pushed the ssheikin/65/trino/clickhouse-minmax branch 5 times, most recently from c92db94 to 4592e44 Compare August 2, 2024 16:31
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it something we could restrict the scope inside applyJoin itself ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a bit tricky -

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find prettier solution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having different version or ConnectorRewrite for each predicate like filter or join or aggregation might make the framework brittle. Can we implement something like isSupportedJoinCondition for the newer join framework ? Plus doing this change just for one connector i.e Clickhouse doesn't support inequality joins as of now might also be tranisient as Clickhouse might resolve them sometime in the future. This code might loose its importance once it has been implemented.

Comment on lines 261 to 264
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be applied for all the data types incl double and real as well - What would be its behaviour wrt to NaN and Infinity ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relates to #23496
pushdowns for real are disabled,
inequalities for double to be tested

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having different version or ConnectorRewrite for each predicate like filter or join or aggregation might make the framework brittle. Can we implement something like isSupportedJoinCondition for the newer join framework ? Plus doing this change just for one connector i.e Clickhouse doesn't support inequality joins as of now might also be tranisient as Clickhouse might resolve them sometime in the future. This code might loose its importance once it has been implemented.

@ssheikin ssheikin requested a review from SemionPar August 8, 2024 11:40
@github-actions
Copy link
Copy Markdown

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Aug 29, 2024
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Aug 29, 2024

I assume you are still running with the @ssheikin and @Praveen2112 ..

@Praveen2112
Copy link
Copy Markdown
Member

Yes

@github-actions github-actions bot removed the stale label Aug 30, 2024
from  convert predicate for filtering, because constructing predicates
for join may be more restrictive than for filtering. E.g. in clickhouse
https://clickhouse.com/docs/en/sql-reference/statements/select/join#experimental-join-with-inequality-conditions-for-columns-from-different-tables
Assume remote collation is case sensitive
Per https://clickhouse.com/docs/en/sql-reference/statements/show#show_columns
ClickHouse has no per-column collations
Clickhouse is UTF-8 encoded with byte-by-byte comparison.
https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#collation-support
So exactly as trino.
https://github.com/airlift/slice/blob/2.2/src/main/java/io/airlift/slice/Slice.java#L1205
That’s why we can pushdown all operations on varchars.

Predicate pushdown
- inequality
- in
- like
- not equal (!=, <>)
- less than (<)
- less than or equal (<=)
- greater than (>)
- greater than or equal (>=)
- between
- is null
- is not null
Assume remote collation is case sensitive
Per https://clickhouse.com/docs/en/sql-reference/statements/show#show_columns
ClickHouse has no per-column collations
Clickhouse is UTF-8 encoded with byte-by-byte comparison.
https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#collation-support
So exactly as trino.
https://github.com/airlift/slice/blob/2.2/src/main/java/io/airlift/slice/Slice.java#L1205
That’s why we can pushdown all operations on varchars.
@ssheikin ssheikin force-pushed the ssheikin/65/trino/clickhouse-minmax branch from 4592e44 to 9e49ef7 Compare September 16, 2024 12:49
@ssheikin ssheikin changed the title Improve Clickhouse varchar pushdown Improve Clickhouse pushdown Sep 18, 2024
@ssheikin ssheikin changed the title Improve Clickhouse pushdown Improve Clickhouse pushdown. Attempt 1. Sep 20, 2024
@ssheikin ssheikin closed this Sep 20, 2024
@ssheikin ssheikin changed the title Improve Clickhouse pushdown. Attempt 1. Improve Clickhouse pushdown. Attempt 1 Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants