Trivial count optimization with primary key #60463

amosbird · 2024-02-27T17:44:14Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables.

This implements a more general version of #36732

This PR also improves key analysis by making fewer checkInRange calls.
This PR also improves projection analysis by making fewer selectRangesToRead calls.
This PR also fixes incorrect logic in BoolMask, although that logic was unused before.
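
A minimal sketch of the partial trivial count idea, not the code from this PR: rows in ranges known to match the filter exactly are summed without reading data, and only the remaining ranges are left for a normal scan. `RangeSketch`, `CountPlan` and `planCount` are hypothetical names invented for illustration; only the `exact_count` field name is borrowed from the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a selected range of one part (not ClickHouse's MarkRange).
struct RangeSketch
{
    std::size_t rows; // number of rows covered by the range
    bool exact;       // true if every row in the range is known to satisfy the filter
};

struct CountPlan
{
    std::uint64_t exact_count = 0;    // rows counted without reading any data
    std::vector<RangeSketch> to_scan; // ranges that still need the filter evaluated
};

// Splits the selected ranges into a precomputed count and a residual scan.
CountPlan planCount(const std::vector<RangeSketch> & ranges)
{
    CountPlan plan;
    for (const auto & range : ranges)
    {
        if (range.exact)
            plan.exact_count += range.rows;
        else
            plan.to_scan.push_back(range);
    }
    return plan;
}
```

The final count would then be `exact_count` plus the number of rows that pass the filter in `to_scan`.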

robot-ch-test-poll1 · 2024-02-27T17:51:04Z

This is an automated comment for commit 9419a25 with description of existing statuses. It's updated for the latest CI running


Check name	Description	Status
AST fuzzer	Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help	❌ failure
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	❌ failure
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	❌ failure
Stress test	Runs stateless functional tests concurrently from several clients to detect concurrency-related errors	❌ failure

Successful checks

Check name	Description	Status
A Sync	If it fails, ask a maintainer for help	✅ success
ClickBench	Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table	✅ success
ClickHouse build check	Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process	✅ success
Compatibility check	Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help	✅ success
Docker keeper image	The check to build and optionally push the mentioned image to docker hub	✅ success
Docker server image	The check to build and optionally push the mentioned image to docker hub	✅ success
Docs check	Builds and tests the documentation	✅ success
Fast test	Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here	✅ success
Flaky tests	Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc	✅ success
Install packages	Checks that the built packages are installable in a clear environment	✅ success
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	✅ success
Mergeable Check	Checks if all other necessary checks are successful	✅ success
PR Check	Checks correctness of the PR's body	✅ success
Performance Comparison	Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Style check	Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report	✅ success
Unit tests	Runs the unit tests for different release types	✅ success
Upgrade check	Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts	✅ success

KochetovNicolai · 2024-03-11T17:56:09Z

src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp

+    }
+    else /// trivial count optimization
+    {
+        expr_or_filter_node = &projection_reading_node;


Let's rename expr_or_filter_node to expr_or_filter_or_projection_reading_node or to something else shorter :)

KochetovNicolai · 2024-03-11T18:01:55Z

src/Processors/QueryPlan/Optimizations/optimizeUseAggregateProjection.cpp

+    /// Stores row count from exact ranges of parts.
+    size_t exact_count = 0;


Let's remove this from AggregateProjectionCandidates. This is only initialized/used in optimizeUseAggregateProjections itself.

KochetovNicolai · 2024-03-11T18:42:04Z

src/Processors/QueryPlan/ReadFromMergeTree.h

@@ -101,7 +101,9 @@ class ReadFromMergeTree final : public SourceStepWithFilter
        UInt64 selected_marks_pk = 0;
        UInt64 total_marks_pk = 0;
        UInt64 selected_rows = 0;
+        bool find_exact_ranges = false;


Rename to has_exact_ranges?

KochetovNicolai · 2024-03-12T11:26:31Z

src/Storages/MergeTree/KeyCondition.cpp

@@ -88,13 +88,14 @@ String extractFixedPrefixFromLikePattern(std::string_view like_pattern, bool req
    return fixed_prefix;
 }

-/// for "^prefix..." string it returns "prefix"
-static String extractFixedPrefixFromRegularExpression(const String & regexp)
+/// for "^prefix..." string it returns ("prefix", is inclusive)


Let's add a comment explaining what the inclusive flag means.

Can we add this change in a separate PR?
And, possibly, other changes in KeyCondition which are not about trivial count optimization.

Sure. I'll extract common building blocks into a separate PR.

KochetovNicolai · 2024-03-12T11:36:51Z

src/Storages/MergeTree/KeyCondition.cpp

@@ -158,6 +162,7 @@ static String extractFixedPrefixFromRegularExpression(const String & regexp)
                if (!fixed_prefix.empty())
                    fixed_prefix.pop_back();

+                inclusive = false;


This is not clear to me, at least.
E.g. for abc.*, wouldn't this return (abc, false)? (I would expect abc itself to be accepted by this regex.)

KochetovNicolai · 2024-03-12T11:59:42Z

src/Storages/MergeTree/KeyCondition.cpp

@@ -1797,6 +1802,8 @@ bool KeyCondition::extractAtomFromTree(const RPNBuilderTreeNode & node, RPNEleme
                    func_name = "lessOrEquals";
                else if (func_name == "greater")
                    func_name = "greaterOrEquals";
+
+                relaxed = true;


It is not clear to me that this is the only case.
I don't know how to prove the correctness.

Ideally, I would prefer an approach that is clearly correct, even if it works only for very simple cases.

I will provide some proofs.
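
One way to see why an atom rewritten from less to lessOrEquals cannot produce exact ranges, sketched under an assumption: suppose the rewrite happens because a monotonic but non-injective transformation is applied to both sides of the comparison. Integer division by 10 stands in for that transformation here, and every name below is hypothetical. The relaxed predicate never rejects a matching value, but it does accept some non-matching ones.

```cpp
#include <cstdint>

// Hypothetical monotonic (non-decreasing) but non-injective key transformation.
constexpr std::int64_t bucket(std::int64_t x) { return x / 10; }

// Original predicate on the raw value.
constexpr bool original(std::int64_t x) { return x < 25; }

// Relaxed predicate on the transformed value: strict `<` has to become `<=`.
constexpr bool relaxedPredicate(std::int64_t x) { return bucket(x) <= bucket(25); }

// Counts values in [lo, hi) accepted by the relaxed predicate but not by the original.
int falsePositives(std::int64_t lo, std::int64_t hi)
{
    int count = 0;
    for (std::int64_t x = lo; x < hi; ++x)
        if (relaxedPredicate(x) && !original(x))
            ++count;
    return count;
}

// Counts values in [lo, hi) accepted by the original predicate but not by the relaxed one.
int falseNegatives(std::int64_t lo, std::int64_t hi)
{
    int count = 0;
    for (std::int64_t x = lo; x < hi; ++x)
        if (original(x) && !relaxedPredicate(x))
            ++count;
    return count;
}
```

Over [0, 40) the relaxed predicate additionally accepts 25 through 29, so a range selected by it is a superset of the matching rows and its row count cannot be used as is.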

KochetovNicolai · 2024-03-12T14:57:55Z

src/Storages/MergeTree/BoolMask.h

+    bool isComplete(const BoolMask & initial_mask) const
    {
-        return {can_be_false, can_be_true};
+        if (initial_mask == consider_only_can_be_true)
+            return can_be_true;
+        else if (initial_mask == consider_only_can_be_false)
+            return can_be_false;
+        else
+            return can_be_true && can_be_false;
    }


I did not understand why this change is needed.

const BoolMask BoolMask::consider_only_can_be_true(false, true);
const BoolMask BoolMask::consider_only_can_be_false(true, false);

Will explain it in separate PR.
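
A simplified sketch of how the isComplete logic in the diff above can be read, assuming the constructor arguments are ordered (can_be_true, can_be_false). `MaskSketch` and the free function `isComplete` are hypothetical stand-ins, not the real BoolMask. Under this reading, the initial mask names the component that is already pinned, so analysis can stop as soon as the other component becomes true.

```cpp
// Hypothetical stand-in for BoolMask with the assumed field order.
struct MaskSketch
{
    bool can_be_true;
    bool can_be_false;

    constexpr bool operator==(const MaskSketch & other) const
    {
        return can_be_true == other.can_be_true && can_be_false == other.can_be_false;
    }
};

// Mirrors the two constants quoted in the discussion.
constexpr MaskSketch consider_only_can_be_true{false, true};
constexpr MaskSketch consider_only_can_be_false{true, false};

// Same branching as the isComplete shown in the diff, written as a free function.
constexpr bool isComplete(const MaskSketch & mask, const MaskSketch & initial_mask)
{
    return initial_mask == consider_only_can_be_true ? mask.can_be_true
        : initial_mask == consider_only_can_be_false ? mask.can_be_false
        : (mask.can_be_true && mask.can_be_false);
}
```

With any other initial mask, both components must be true before the result is considered complete.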

KochetovNicolai · 2024-03-12T15:38:53Z

src/Storages/MergeTree/KeyCondition.cpp

@@ -3106,6 +3113,58 @@ bool KeyCondition::alwaysFalse() const
    return rpn_stack[0] == 0;
 }

+bool KeyCondition::canBeFalseAlwaysUnknown() const


I don't fully understand the implementation.
If we have the function canBeTrueAlwaysUnknown, what would be the difference?

If we have the function canBeTrueAlwaysUnknown, what would be the difference?

The main difference would be the implementation of RPNElement::FUNCTION_OR and RPNElement::FUNCTION_AND.

atom_can_be_false_always_unknown AND any_atom => atom_can_be_false_always_unknown

atom_can_be_true_always_unknown OR any_atom => atom_can_be_true_always_unknown

canBeFalseAlwaysUnknown is introduced as an optimization for finding exact ranges during key analysis. If the key condition can never tell whether some hyperrectangle can be false, exact ranges cannot be determined. In that case we disable exact range finding and fall back to the faster analysis that checks only .can_be_true.
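
The propagation rule for AND can be sketched as a small RPN evaluation. This is an illustration, not the KeyCondition code: `Op` and `canBeFalseAlwaysUnknownSketch` are hypothetical names, NOT is left out, and the OR rule (both operands must be unknown) is an assumption derived from the mask algebra rather than something stated in the thread.

```cpp
#include <vector>

// Hypothetical RPN element kinds; UnknownAtom is an atom whose can_be_false is always unknown.
enum class Op { KnownAtom, UnknownAtom, And, Or };

// Returns true if can_be_false of the whole condition is always unknown.
// Returns false for a malformed RPN sequence.
bool canBeFalseAlwaysUnknownSketch(const std::vector<Op> & rpn)
{
    std::vector<bool> stack; // true means "can_be_false is always unknown"
    for (Op op : rpn)
    {
        switch (op)
        {
            case Op::KnownAtom:
                stack.push_back(false);
                break;
            case Op::UnknownAtom:
                stack.push_back(true);
                break;
            case Op::And:
            case Op::Or:
            {
                if (stack.size() < 2)
                    return false;
                bool rhs = stack.back();
                stack.pop_back();
                bool lhs = stack.back();
                stack.pop_back();
                // AND: one unknown operand is enough. OR (assumed): both must be unknown.
                stack.push_back(op == Op::And ? (lhs || rhs) : (lhs && rhs));
                break;
            }
        }
    }
    return stack.size() == 1 && stack.back();
}
```

When this returns true, exact range finding would be skipped and only can_be_true would be checked.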

KochetovNicolai · 2024-03-12T15:53:32Z

src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

        {
-            if (range.end == marks_count && !has_final_mark)


It's to ease the implementation of finding exact ranges (end points are important). Will refactor this in separate PR.

KochetovNicolai · 2024-03-12T15:54:17Z

src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp

@@ -1158,28 +1179,23 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange(
            }
            else
            {
-                if (has_final_mark && range.end == marks_count)


I don't understand why this is removed.

amosbird · 2024-06-07T07:51:23Z

AST fuzzer (asan) — Logical error: 'RangeReader read 1 rows, but 8192 expected.'.

#64960

Stateless tests (coverage) [6/6] — fail: 1, passed: 1084, skipped: 2

Unrelated. Failure in share_big_sets_between_multiple_mutations_tasks_long also found in many other PRs.

Stress test (tsan) — Killed by signal (in clickhouse-server.log)

#64054

robot-ch-test-poll1 added the pr-improvement Pull request with some product improvements label Feb 27, 2024

KochetovNicolai self-assigned this Feb 27, 2024

amosbird marked this pull request as ready for review March 1, 2024 13:51

KochetovNicolai reviewed Mar 12, 2024


amosbird mentioned this pull request Mar 15, 2024

Refactor KeyCondition and key analysis #61459

Merged

amosbird force-pushed the trivial-count-opt branch from 0e618ca to 683dc03 Compare June 4, 2024 16:02

Trivial count optimization with primary key

9419a25

amosbird force-pushed the trivial-count-opt branch from 683dc03 to 9419a25 Compare June 6, 2024 15:21

KochetovNicolai approved these changes Jun 11, 2024


KochetovNicolai added this pull request to the merge queue Jun 11, 2024

Merged via the queue into ClickHouse:master with commit 96fcc30 Jun 11, 2024
239 of 246 checks passed

robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 11, 2024

amosbird mentioned this pull request Jul 12, 2024

Clean up projection inside storage snapshot #66443

Merged

19 tasks
