Commit 3b34575

clickhouse: prevent replicated tables from starting in read-only mode.
On start, ClickHouse compares the local state of each replicated table to its shared state. If the discrepancy is too large, it starts the table in read-only mode. When this happens, oximeter can't write new records to the affected table(s). In the past, we've worked around this by manually instructing ClickHouse to restore the table using the `force_restore_data` sentinel file, but this requires manual detection and intervention each time a table starts up in read-only mode.

This patch sets the `replicated_max_ratio_of_wrong_parts` setting to 1.0 so that ClickHouse always accepts the shared state and never starts tables in read-only mode.

As described in ClickHouse/ClickHouse#66527, this appears to be a bug, or at least an ergonomic flaw, in ClickHouse. One replica of a table can routinely fall behind the others, e.g. due to a restart or network partition, and shouldn't require manual intervention to start back up.

Part of #8595.
1 parent c8716e2 commit 3b34575

2 files changed, +10 -2 lines changed

clickhouse-admin/types/src/config.rs

Lines changed: 5 additions & 1 deletion
@@ -201,9 +201,13 @@ impl ReplicaConfig {
         <max_tasks_in_queue>1000</max_tasks_in_queue>
     </distributed_ddl>
 
-    <!-- Disable sparse column serialization, which we expect to not need -->
     <merge_tree>
+        <!-- Disable sparse column serialization, which we expect to not need -->
         <ratio_of_defaults_for_sparse_serialization>1.0</ratio_of_defaults_for_sparse_serialization>
+
+        <!-- Prevent ClickHouse from setting distributed tables to read-only. -->
+        <!-- See https://github.com/oxidecomputer/omicron/issues/8595 for details. -->
+        <replicated_max_ratio_of_wrong_parts>1.0</replicated_max_ratio_of_wrong_parts>
     </merge_tree>
     {macros}
     {remote_servers}
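
The `{macros}` and `{remote_servers}` placeholders in this hunk suggest that the XML is a template rendered with Rust string formatting. As a rough illustration only, the resulting `<merge_tree>` block could be produced by something like the sketch below. The function `render_merge_tree_settings` and its parameter are hypothetical and are not part of this commit, which hard-codes the value 1.0 in the template.

```rust
/// Hypothetical helper, not part of the commit: renders a `<merge_tree>`
/// block like the one this change adds to the replica config template.
/// The parameter name and the one-decimal formatting are assumptions.
fn render_merge_tree_settings(max_ratio_of_wrong_parts: f64) -> String {
    format!(
        "<merge_tree>
    <!-- Disable sparse column serialization, which we expect to not need -->
    <ratio_of_defaults_for_sparse_serialization>1.0</ratio_of_defaults_for_sparse_serialization>

    <!-- Prevent ClickHouse from setting replicated tables to read-only. -->
    <replicated_max_ratio_of_wrong_parts>{max_ratio_of_wrong_parts:.1}</replicated_max_ratio_of_wrong_parts>
</merge_tree>"
    )
}
```

Calling `render_merge_tree_settings(1.0)` yields a block containing `<replicated_max_ratio_of_wrong_parts>1.0</replicated_max_ratio_of_wrong_parts>`, the setting this commit adds.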

clickhouse-admin/types/testutils/replica-server-config.xml

Lines changed: 5 additions & 1 deletion
@@ -104,9 +104,13 @@
         <max_tasks_in_queue>1000</max_tasks_in_queue>
     </distributed_ddl>
 
-    <!-- Disable sparse column serialization, which we expect to not need -->
     <merge_tree>
+        <!-- Disable sparse column serialization, which we expect to not need -->
         <ratio_of_defaults_for_sparse_serialization>1.0</ratio_of_defaults_for_sparse_serialization>
+
+        <!-- Prevent ClickHouse from setting distributed tables to read-only. -->
+        <!-- See https://github.com/oxidecomputer/omicron/issues/8595 for details. -->
+        <replicated_max_ratio_of_wrong_parts>1.0</replicated_max_ratio_of_wrong_parts>
     </merge_tree>
 
     <macros>
