Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve recovery of ReplicatedAccessStorage after errors. #39977

Conversation

vitlibar
Copy link
Member

@vitlibar vitlibar commented Aug 8, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improve recovery of ReplicatedAccessStorage after errors.

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Aug 8, 2022
@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch from c7dbcd6 to 644e2d2 Compare August 9, 2022 11:48
@vitlibar vitlibar marked this pull request as ready for review August 9, 2022 16:05
@vitlibar vitlibar changed the title [WIP] Improve recovery of ReplicatedAccessStorage after errors. Improve recovery of ReplicatedAccessStorage after errors. Aug 9, 2022
@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch 6 times, most recently from a0e5cc6 to d083bc8 Compare August 10, 2022 19:05
@antonio2368 antonio2368 self-assigned this Aug 18, 2022
Comment on lines +558 to 532
if (name_collision)
id_by_name = it_by_name->second->id;

if (name_collision && !replace_if_exists)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (name_collision)
id_by_name = it_by_name->second->id;
if (name_collision && !replace_if_exists)
{
if (name_collision)
id_by_name = it_by_name->second->id;
if (!replace_if_exists)
{

Comment on lines +94 to 96
if (name_collision)
id_by_name = it_by_name->second->id;

if (name_collision && !replace_if_exists)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (name_collision)
id_by_name = it_by_name->second->id;
if (name_collision && !replace_if_exists)
if (name_collision)
id_by_name = it_by_name->second->id;
if (!replace_if_exists)
{

Copy link
Member Author

@vitlibar vitlibar Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the code like this for symmetry with later code, see the condition below

if (id_collision && !replace_if_exists)

@@ -59,7 +60,7 @@ ReplicatedAccessStorage::ReplicatedAccessStorage(
if (zookeeper_path.front() != '/')
zookeeper_path = "/" + zookeeper_path;

initializeZookeeper();
initZooKeeperIfNeeded();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be in try-catch block so CH doesn't fail to startup on unstable ZK connection?
(E.g. operation timeout when root nodes are being created)
All other functions will simply initialize it later on through getZooKeeper right?

Copy link
Member Author

@vitlibar vitlibar Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure on startup can be because of a wrong configuration and in this case it's can be better to stop immediately.
Also ReplicatedAccessStorage can contain some critical users or roles and if they don't appear after start it can cause confusion and even security problems. But yes, you're right connection to ZK can be unstable. I think a simple retry might help a bit with unstable connection without problems related to later initialization.

@@ -46,6 +46,7 @@ ReplicatedAccessStorage::ReplicatedAccessStorage(
, zookeeper_path(zookeeper_path_)
, get_zookeeper(get_zookeeper_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be deleted now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's in use, see ReplicatedAccessStorage::getZooKeeperNoLock()

Comment on lines +558 to +585
auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
if (entity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
if (entity)
if (auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
entity)

Comment on lines +576 to +603
bool exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
if (!exists)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
if (!exists)
if (exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
!exists)

Copy link
Member Author

@vitlibar vitlibar Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a bit less readable because the scope of the variable exists doesn't really matter here.



# ReplicatedAccessStorage must be able to continue working after reloading ZooKeeper.
def test_reload_zookeeper(started_cluster):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test where you start CH without ZK?
You can use

with PartitionManager() as pm:
    pm.drop_instance_zk_connections(node1)

@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch from 2d26ab5 to 3c0c41c Compare September 16, 2022 16:40
@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch 2 times, most recently from 34f3409 to 0ce9ef5 Compare September 16, 2022 17:20
@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch from 0ce9ef5 to b2868cc Compare September 18, 2022 10:23
@vitlibar vitlibar force-pushed the improve-recovery-of-replicated-access-storage branch from b2868cc to e7e51ab Compare September 18, 2022 10:44
@vitlibar vitlibar merged commit b119987 into ClickHouse:master Sep 19, 2022
@vitlibar vitlibar deleted the improve-recovery-of-replicated-access-storage branch September 19, 2022 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants