
Fix waiting while dropping empty parts #23315

Merged · 10 commits · Jun 1, 2021

Conversation

@CurtizJ (Member) commented Apr 19, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix waiting for the automatic dropping of empty parts. It could lead to the background pool filling up and replication getting stuck.

Fixes #23292.
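
The mechanics of the bug, as a minimal sketch rather than the actual ClickHouse code (getDataPartsVector mirrors the real accessor; enqueueDropInBackground and waitForDrop are hypothetical helpers): cleanup used to wait for each empty part to be dropped, tying up one background pool thread per part.

    #include <cstddef>
    #include <string>
    #include <vector>

    struct Part { std::string name; size_t rows_count; };

    // Hypothetical helpers standing in for the real storage methods.
    std::vector<Part> getDataPartsVector();
    void enqueueDropInBackground(const std::string & part_name);
    void waitForDrop(const std::string & part_name);

    void clearEmptyParts()
    {
        for (const auto & part : getDataPartsVector())
        {
            if (part.rows_count != 0)
                continue;

            enqueueDropInBackground(part.name);

            // Old behaviour (the bug): waiting here kept a background pool
            // thread busy per empty part, so many empty parts could fill
            // the pool and leave no threads for the replication queue.
            // waitForDrop(part.name);
        }
    }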

@robot-clickhouse added the pr-bugfix label (Pull request with bugfix, not backported by default) on Apr 19, 2021
@CurtizJ (Member, Author) commented Apr 19, 2021

It's hard to implement a test with stable behaviour for this PR, because the dropping of empty parts is executed asynchronously and there is no mechanism to control when it starts.

@alexey-milovidov self-assigned this on Apr 19, 2021
auto parts = getDataPartsVector();
for (const auto & part : parts)
{
    if (part->rows_count == 0)
    {
        ASTPtr literal = std::make_shared<ASTLiteral>(part->name);
        /// If another replica has already started drop, it's ok, no need to throw.
-       dropPartition(literal, /* detach = */ false, /*drop_part = */ true, getContext(), /* throw_if_noop = */ false);
+       dropPartition(literal, /* detach = */ false, /*drop_part = */ true, context_for_drop, /* throw_if_noop = */ false);
Member:

Looks ugly. Why not extract another method from dropPartition that would drop the internal parts without all the query-related logic?
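
A hypothetical sketch of that extraction (dropEmptyPart and dropPartImpl are illustrative names, not the PR's actual API): the query-level dropPartition keeps the AST and context handling, while the empty-part cleanup calls a thin internal method directly.

    #include <memory>
    #include <string>

    using String = std::string;

    // Illustrative stand-ins for the real ClickHouse types.
    struct IAST {};
    using ASTPtr = std::shared_ptr<IAST>;
    struct Context {};
    using ContextPtr = std::shared_ptr<const Context>;

    class MergeTreeData
    {
    public:
        /// Query-level entry point: keeps the partition-AST and context
        /// handling, then delegates to the internal method below.
        void dropPartition(const ASTPtr & partition, bool detach, bool drop_part, ContextPtr context);

        /// What the background cleanup of empty parts could call directly:
        /// no query-related machinery, no exception on a no-op.
        void dropEmptyPart(const String & part_name);

    private:
        /// The logic extracted from dropPartition.
        void dropPartImpl(const String & part_name, bool detach, bool throw_if_noop);
    };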

@alexey-milovidov (Member):

@Mergifyio update

@mergify bot (Contributor) commented Apr 21, 2021

Command update: success

Branch has been successfully updated

-virtual void dropPartition(const ASTPtr & partition, bool detach, bool drop_part, ContextPtr context, bool throw_if_noop = true) = 0;
+/// Tries to drop part in background without any waits or throwing exceptions in case of errors.
+virtual void dropPart(const String & name) = 0;
+virtual void dropPartition(const ASTPtr & partition, bool detach, bool drop_part, ContextPtr context) = 0;
Member:

But now the parameter drop_part looks weird.

@@ -624,12 +624,13 @@ class StorageReplicatedMergeTree final : public ext::shared_ptr_helper<StorageReplicatedMergeTree>
/// Info about how other replicas can access this one.
ReplicatedMergeTreeAddress getReplicatedMergeTreeAddress() const;

void dropPart(const String & name) override;
bool dropPart(zkutil::ZooKeeperPtr & zookeeper, String part_name, LogEntry & entry, bool detach, bool throw_if_noop);
Member:

Here the code becomes prohibitively complicated.

Member (on the same diff hunk):

At least the method needs to be named differently.

Member (on the same diff hunk):

Move to private? Name it with Impl?
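
Roughly what the review is asking for, as a hypothetical sketch (the method names here are assumptions, not necessarily what was merged): keep the public entry point small, and hide the ZooKeeper-level overload behind a private name with an Impl suffix, so the two are no longer same-name overloads.

    #include <string>

    using String = std::string;

    // Illustrative stand-ins for the real ClickHouse types.
    namespace zkutil { struct ZooKeeper; using ZooKeeperPtr = ZooKeeper *; }
    struct LogEntry {};

    class StorageReplicatedMergeTree
    {
    public:
        /// Small public entry point for the background cleanup of empty
        /// parts: no waits, no exceptions on a no-op.
        void dropPartNoWaitNoThrow(const String & part_name);

    private:
        /// The ZooKeeper-level logic, moved to private and renamed with an
        /// Impl suffix, as the review suggests.
        bool dropPartImpl(zkutil::ZooKeeperPtr & zookeeper, String part_name,
                          LogEntry & entry, bool detach, bool throw_if_noop);
    };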

@alexey-milovidov added the st-waiting-for-fix label (We are waiting for fixes in this issue or pull request) on May 10, 2021
@CurtizJ (Member, Author) commented May 17, 2021

Stress test looks suspicious:

2021.05.17 20:03:59.789358 [ 269 ] {} <Fatal> BaseDaemon: (version 21.6.1.6861, build id: 51338596DD99F8FD4D9DB1771AC33D051563333F) (from thread 333) Terminate called for uncaught exception:
Code: 49, e.displayText() = DB::Exception: Part 0_0_1_999999999 intersects previous part 0_0_0_0_1. It is a bug., Stack trace (when copying this message, always include the lines below):
2021.05.17 20:03:59.634206 [ 272 ] {ba358a21-a74a-49d8-bab6-d71ea3ace476} <Debug> executeQuery: (from [::1]:44052, using production parser) alter table rmt drop partition '0';
2021.05.17 20:03:59.634415 [ 272 ] {ba358a21-a74a-49d8-bab6-d71ea3ace476} <Trace> ContextAccess (default): Access granted: ALTER DELETE ON test_13.rmt
2021.05.17 20:03:59.682750 [ 272 ] {ba358a21-a74a-49d8-bab6-d71ea3ace476} <Trace> test_13.rmt: Deleted 1 deduplication block IDs in partition ID 0
2021.05.17 20:03:59.682818 [ 272 ] {ba358a21-a74a-49d8-bab6-d71ea3ace476} <Debug> test_13.rmt: Disabled merges covered by range 0_0_1_999999999
2021.05.17 20:03:59.695639 [ 272 ] {ba358a21-a74a-49d8-bab6-d71ea3ace476} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

@CurtizJ (Member, Author) commented May 17, 2021

Probably the fix is here: #24208.

{
    /// Asks to complete merges and does not allow them to start.
    /// This protects against "revival" of data for a removed partition after completion of merge.
    auto merge_blocker = stopMergesAndWait();
Contributor:

It's a pity that we can't do it per partition... Stopping (and waiting for) all merges in a 'hot' partition while dropping some old partition is not great. And no merges will start while the files are being removed.

Maybe we can release the merge_blocker a bit earlier - for example, after removePartsFromWorkingSet?

Member (Author):

Yes, the lock can be released after removePartsFromWorkingSet. It would be better to block merges only for the parts that are currently being removed, but not in this PR. A sketch of the idea follows.
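
A minimal sketch of that suggestion, assuming merge_blocker is a scoped RAII guard and treating removePartsFromWorkingSet and clearPartsFromFilesystem as the two phases of the drop (the surrounding call names are assumptions, in the style of the quoted snippet above):

    DataPartsVector parts_to_remove;
    {
        /// Asks running merges to complete and blocks new ones from starting.
        auto merge_blocker = stopMergesAndWait();

        /// Phase 1: make the parts invisible to queries and merges. This is
        /// the only step that actually needs merges to be blocked.
        parts_to_remove = removePartsFromWorkingSet(/* parts in the dropped range */);
    }   /// merge_blocker goes out of scope here, so merges can resume.

    /// Phase 2: slow filesystem removal runs without holding the blocker,
    /// so merges in unrelated "hot" partitions are no longer stalled.
    clearPartsFromFilesystem(parts_to_remove);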

@alexey-milovidov merged commit c7bd3b8 into ClickHouse:master on Jun 1, 2021
@azat added a commit to azat/ClickHouse that referenced this pull request on Jun 30, 2021, with the following message:
CI reports failure of the 01442_merge_detach_attach test [1]:


    2021-06-21 02:25:43 01442_merge_detach_attach:                                              [ FAIL ] 122.37 sec. - result differs with reference:
    2021-06-21 02:25:43 --- /usr/share/clickhouse-test/queries/0_stateless/01442_merge_detach_attach.reference	2021-06-21 00:43:12.000000000 +0300
    2021-06-21 02:25:43 +++ /tmp/clickhouse-test/0_stateless/01442_merge_detach_attach.stdout	2021-06-21 02:25:43.211212197 +0300
    2021-06-21 02:25:43 @@ -0,0 +1 @@
    2021-06-21 02:25:43 +2

    2021.06.21 02:25:08.930896 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Debug> executeQuery: (from [::1]:36540, using production parser) (comment: '/usr/share/clickhouse-test/queries/0_stateless/01442_merge_detach_attach.sh') ALTER TABLE t DETACH PARTITION tuple()
    2021.06.21 02:25:08.931245 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> executeQuery: (from [::1]:36542, using production parser) (comment: '/usr/share/clickhouse-test/queries/0_stateless/01442_merge_detach_attach.sh') OPTIMIZE TABLE t FINAL
    2021.06.21 02:25:08.931826 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Trace> ContextAccess (default): Access granted: ALTER DELETE ON test_89nl0v.t
    2021.06.21 02:25:08.932159 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Trace> ContextAccess (default): Access granted: OPTIMIZE ON test_89nl0v.t
    2021.06.21 02:25:08.932889 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Information> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249): Detaching all_143_143_0
    2021.06.21 02:25:08.932921 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Selected 2 parts from all_143_143_0 to all_144_144_0
    2021.06.21 02:25:08.933530 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> DiskLocal: Reserving 1.00 MiB on disk `default`, having unreserved 4.60 TiB.
    2021.06.21 02:25:08.933705 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Information> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249): Detaching all_144_144_0
    2021.06.21 02:25:08.934215 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Merging 2 parts: from all_143_143_0 to all_144_144_0 into Compact
    2021.06.21 02:25:08.934280 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Information> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249): Detached 2 parts.
    2021.06.21 02:25:08.934948 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Selected MergeAlgorithm: Horizontal
    2021.06.21 02:25:08.936090 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> MergeTreeSequentialSource: Reading 2 marks from part all_143_143_0, total 1 rows starting from the beginning of the part, column x
    2021.06.21 02:25:08.937621 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> MergeTreeSequentialSource: Reading 2 marks from part all_144_144_0, total 1 rows starting from the beginning of the part, column x
    2021.06.21 02:25:08.938124 [ 100543 ] {16eb4fe5-2d6b-4c81-a6be-c6a3b293edd3} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
    2021.06.21 02:25:08.939928 [ 100543 ] {} <Debug> TCPHandler: Processed in 0.018537432 sec.
    2021.06.21 02:25:08.942140 [ 100543 ] {} <Debug> TCPHandler: Done processing connection.
    2021.06.21 02:25:08.948343 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Merge sorted 2 rows, containing 1 columns (1 merged, 0 gathered) in 0.014203821 sec., 140.80718139154246 rows/sec., 140.81 B/sec.
    2021.06.21 02:25:08.952021 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Trace> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249): Renaming temporary part tmp_merge_all_143_144_1 to all_143_144_1.
    2021.06.21 02:25:08.952869 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Warning> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Unexpected number of parts removed when adding all_143_144_1: 0 instead of 2
    2021.06.21 02:25:08.953264 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Trace> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (MergerMutator): Merged 2 parts: from all_143_143_0 to all_144_144_0
    2021.06.21 02:25:08.953913 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> MemoryTracker: Peak memory usage: 4.01 MiB.
    2021.06.21 02:25:08.958369 [ 100540 ] {bd4b710d-cd27-44ea-9dc9-e45d6ace0bd7} <Debug> MemoryTracker: Peak memory usage (for query): 4.01 MiB.
    ...
    2021.06.21 02:25:09.216075 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> executeQuery: (from [::1]:36544, using production parser) (comment: '/usr/share/clickhouse-test/queries/0_stateless/01442_merge_detach_attach.sh') SELECT count() FROM t HAVING count() > 0
    2021.06.21 02:25:09.229491 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Trace> ContextAccess (default): Access granted: SELECT(x) ON test_89nl0v.t
    2021.06.21 02:25:09.232000 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
    2021.06.21 02:25:09.239907 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (SelectExecutor): Key condition: unknown
    2021.06.21 02:25:09.240358 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> test_89nl0v.t (efd364af-6721-4249-afd3-64af67214249) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1/1 marks by primary key, 1 marks to read from 1 ranges
    2021.06.21 02:25:09.241560 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> MergeTreeSelectProcessor: Reading 1 ranges from part all_143_144_1, approx. 2 rows starting from 0
    2021.06.21 02:25:09.256053 [ 58403 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Trace> AggregatingTransform: Aggregating
    2021.06.21 02:25:09.256410 [ 58403 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Trace> Aggregator: Aggregation method: without_key
    2021.06.21 02:25:09.257576 [ 58403 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> AggregatingTransform: Aggregated. 2 to 1 rows (from 2.00 B) in 0.013910412 sec. (143.777 rows/sec., 143.78 B/sec.)
    2021.06.21 02:25:09.257911 [ 58403 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Trace> Aggregator: Merging aggregated data
    2021.06.21 02:25:09.262595 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Information> executeQuery: Read 2 rows, 2.00 B in 0.045805192 sec., 43 rows/sec., 43.66 B/sec.
    2021.06.21 02:25:09.263337 [ 100543 ] {1699b404-b132-4b96-b04a-1c28cd062a79} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.


  [1]: https://clickhouse-test-reports.s3.yandex.net/25513/b96df135aee40b1a54a4fc3f0d4db89e11385564/functional_stateless_tests_(memory).html#fail1

The problem here is that there is a tiny race window between DETACH
and OPTIMIZE.

DETACH cancels merges, checks under a lock that there are no merges in
progress for the affected parts, and only later changes the state of the
parts. After that, it allows merges again.

However, OPTIMIZE picks the parts to process under the lock, and only
checks that merges have not been cancelled once it starts writing the
temporary part.

So suppose the following scenario:

      T#OPTIMIZE                       T#DETACH

                                       - cancel merges
                                           - lock
                                             - check that no merges are in progress
                                           - unlock
    - lock
      - get parts

          *<--Here, state of the parts are not changed yet,
              and lock is not held,
              so OPTIMIZE may (and will) get those parts. -->*

      - write temporary part
        - check merges not canceled
    - unlock
                                           - change the parts state
                                       - allows merges

On top of that, this patch also makes merges fail earlier in the case of
a concurrent DETACH (a minimal sketch of this idea follows the
references below).

Refs: ClickHouse#13746
Refs: ClickHouse#23315
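
A minimal illustration of the "fail merges earlier" idea. All names here (data_mutex, merges_cancelled, choosePartsFromWorkingSet, selectPartsToMerge) are illustrative stand-ins, not the real ClickHouse identifiers; the point is that the cancellation flag gets checked under the same lock that guards part selection, instead of only at the point where the temporary part is written:

    #include <atomic>
    #include <mutex>
    #include <optional>
    #include <string>
    #include <vector>

    // Illustrative stand-in for the real merge-selection result.
    struct FutureMergedPart { std::vector<std::string> source_parts; };

    std::mutex data_mutex;                     // guards the set of data parts
    std::atomic<bool> merges_cancelled{false}; // set by DETACH while it runs

    std::optional<FutureMergedPart> choosePartsFromWorkingSet(); // assumed helper

    std::optional<FutureMergedPart> selectPartsToMerge()
    {
        std::lock_guard<std::mutex> lock(data_mutex);

        // Fail early, under the same lock that DETACH takes around changing
        // the parts' state, instead of noticing the cancellation only when
        // the temporary part is about to be written.
        if (merges_cancelled.load())
            return std::nullopt;

        return choosePartsFromWorkingSet();
    }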

Labels: pr-bugfix (Pull request with bugfix, not backported by default), st-waiting-for-fix (We are waiting for fixes in this issue or pull request)

Successfully merging this pull request may close these issues:

  • Replication may stuck due to clearEmptyParts (>=20.12)