Improve recovery of ReplicatedAccessStorage after errors. #39977
if (name_collision)
    id_by_name = it_by_name->second->id;

if (name_collision && !replace_if_exists)
{
Suggested change:
- if (name_collision)
-     id_by_name = it_by_name->second->id;
- if (name_collision && !replace_if_exists)
- {
+ if (name_collision)
+     id_by_name = it_by_name->second->id;
+ if (!replace_if_exists)
+ {
if (name_collision)
    id_by_name = it_by_name->second->id;

if (name_collision && !replace_if_exists)
Suggested change:
- if (name_collision)
-     id_by_name = it_by_name->second->id;
- if (name_collision && !replace_if_exists)
+ if (name_collision)
+     id_by_name = it_by_name->second->id;
+ if (!replace_if_exists)
+ {
I'd prefer to keep the code like this for symmetry with the later code; see the condition below:
if (id_collision && !replace_if_exists)
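For readers following this thread, the symmetry between the name check and the id check can be illustrated with a minimal sketch. `ToyStorage` and its members are hypothetical and greatly simplified; this is not the actual ClickHouse implementation, only an illustration of two guards written in the same shape.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an access storage: every entity is indexed
// both by id and by name, and the two indexes must stay in sync.
struct ToyStorage
{
    std::map<std::string, std::string> name_by_id;  // id -> name
    std::map<std::string, std::string> id_by_name;  // name -> id

    void insertWithID(const std::string & id, const std::string & name, bool replace_if_exists)
    {
        auto it_by_name = id_by_name.find(name);
        const bool name_collision = (it_by_name != id_by_name.end());
        if (name_collision && !replace_if_exists)
            throw std::runtime_error("name collision: " + name);

        // Same shape as the guard above, this time for the id.
        const bool id_collision = (name_by_id.count(id) != 0);
        if (id_collision && !replace_if_exists)
            throw std::runtime_error("id collision: " + id);

        // Replacement path: drop the entity that currently holds the name...
        if (name_collision)
        {
            const std::string old_id = it_by_name->second;  // copy before erasing
            id_by_name.erase(name);
            name_by_id.erase(old_id);
        }
        // ...and the entity that still holds the id, if any.
        if (name_by_id.count(id) != 0)
        {
            const std::string old_name = name_by_id[id];
            name_by_id.erase(id);
            id_by_name.erase(old_name);
        }

        name_by_id[id] = name;
        id_by_name[name] = id;
        assert(name_by_id.size() == id_by_name.size());  // indexes stay in sync
    }
};
```

Keeping both conditions in the form `X_collision && !replace_if_exists` makes it easy to see at a glance that the two kinds of collision are handled the same way.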
@@ -59,7 +60,7 @@ ReplicatedAccessStorage::ReplicatedAccessStorage(
     if (zookeeper_path.front() != '/')
         zookeeper_path = "/" + zookeeper_path;

-    initializeZookeeper();
+    initZooKeeperIfNeeded();
Can this be in a try-catch block so CH doesn't fail to start up on an unstable ZK connection (e.g. an operation timeout while the root nodes are being created)?
All other functions will simply initialize it later on through getZooKeeper, right?
A failure on startup can be caused by a wrong configuration, and in that case it can be better to stop immediately.
Also, ReplicatedAccessStorage can contain some critical users or roles, and if they don't appear after start it can cause confusion and even security problems. But yes, you're right, the connection to ZK can be unstable. I think a simple retry might help a bit with an unstable connection without the problems related to later initialization.
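A rough sketch of what "a simple retry" could look like is given here for illustration. The helper `retryInit`, its parameters and the fixed delay are assumptions made for this example and are not taken from the PR. The key property is that the last error is rethrown, so a persistent failure such as a wrong configuration still stops startup.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>

// Hypothetical helper, not part of ClickHouse: calls `init` up to
// `max_attempts` times and sleeps `delay` between failed attempts.
// Returns the number of attempts that were needed.
inline std::size_t retryInit(
    const std::function<void()> & init,
    std::size_t max_attempts,
    std::chrono::milliseconds delay)
{
    assert(static_cast<bool>(init));  // a callable must be provided
    if (max_attempts == 0)
        throw std::invalid_argument("max_attempts must be positive");

    for (std::size_t attempt = 1;; ++attempt)
    {
        try
        {
            init();
            return attempt;
        }
        catch (...)
        {
            // Out of attempts: propagate the last error to the caller,
            // so a misconfiguration still fails fast instead of being hidden.
            if (attempt >= max_attempts)
                throw;
            std::this_thread::sleep_for(delay);
        }
    }
}
```

In the constructor this would wrap the initialization call, for example `retryInit([&] { initZooKeeperIfNeeded(); }, 3, std::chrono::milliseconds(100));`, where the attempt count and delay are placeholder values.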
@@ -46,6 +46,7 @@ ReplicatedAccessStorage::ReplicatedAccessStorage(
     , zookeeper_path(zookeeper_path_)
     , get_zookeeper(get_zookeeper_)
can this be deleted now?
No, it's in use, see ReplicatedAccessStorage::getZooKeeperNoLock()
auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
if (entity)
Suggested change:
- auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
- if (entity)
+ if (auto entity = tryReadEntityFromZooKeeper(zookeeper, id);
+     entity)
bool exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
if (!exists)
Suggested change:
- bool exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
- if (!exists)
+ if (exists = zookeeper->tryGetWatch(entity_path, entity_definition, &entity_stat, watch_entity);
+     !exists)
I think this would be a bit less readable, because the scope of the variable `exists` doesn't really matter here.
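For context, the two styles under discussion differ only in the scope of the variable. The following is a small self-contained illustration; `tryRead`, `lengthOrZero` and `lengthOrZeroTwoStep` are hypothetical names invented for this example and do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup standing in for a "try read" call that may find nothing.
inline std::optional<std::string> tryRead(const std::map<int, std::string> & data, int id)
{
    auto it = data.find(id);
    if (it == data.end())
        return std::nullopt;
    return it->second;
}

// C++17 if-with-initializer: `entity` is visible only inside the
// if statement, so it cannot be touched by code that follows it.
inline std::size_t lengthOrZero(const std::map<int, std::string> & data, int id)
{
    if (auto entity = tryRead(data, id); entity)
        return entity->size();
    return 0;
}

// Separate declaration: `entity` stays in scope until the end of the
// function. The behaviour is identical; only the scope differs.
inline std::size_t lengthOrZeroTwoStep(const std::map<int, std::string> & data, int id)
{
    auto entity = tryRead(data, id);
    if (entity)
        return entity->size();
    return 0;
}
```

When nothing after the `if` could misuse the variable, as the reply argues for `exists`, the choice between the two forms is mostly a matter of readability.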
# ReplicatedAccessStorage must be able to continue working after reloading ZooKeeper.
def test_reload_zookeeper(started_cluster):
Can you add a test where you start CH without ZK?
You can use
with PartitionManager() as pm:
    pm.drop_instance_zk_connections(node1)
…oading ZooKeeper.
…oryAccessStorage::insertWithID().
…essStorage::insertWithID().
Co-authored-by: Antonio Andelic <[email protected]>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Improve recovery of ReplicatedAccessStorage after errors.