
Backport #11879, #12214, #12266 to v4.0.x #12273

Merged: 15 commits, Sep 10, 2024
Conversation

the-mikedavis (Member)
This is a backport of #11879, #12214 and #12266 and commit fcb90e4 from `main` to `v4.0.x`. These are Khepri-related changes that build on the tree reorganization in #11225.

Currently we use a combination of `khepri_tx:get_many/1` followed by either
`khepri_tx:delete/1` or `khepri_tx:delete_many/1`. This isn't a
functional change: switching to `khepri_tx_adv:delete_many/1` is
essentially equivalent, but it performs the lookup and deletion in one
command and one traversal of the tree. This should improve performance
when deleting an exchange with many bindings.
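To illustrate the idea, here is a minimal sketch (not the actual RabbitMQ code; the `rabbitmq`/`bindings` path components and the `delete_bindings/2` name are hypothetical) of collapsing the lookup-then-delete pair into a single `khepri_tx_adv:delete_many/1` call:

```erlang
%% Sketch only: delete all bindings under an exchange in one tree
%% traversal. The path pattern below is illustrative, not RabbitMQ's
%% real schema.
delete_bindings(VHost, XName) ->
    PathPattern = [rabbitmq, bindings, VHost, XName, ?KHEPRI_WILDCARD_STAR],
    khepri:transaction(
      fun() ->
              %% khepri_tx_adv:delete_many/1 removes every matching tree
              %% node and returns the deleted nodes' properties, replacing
              %% a separate khepri_tx:get_many/1 + delete_many/1 pair.
              khepri_tx_adv:delete_many(PathPattern)
      end).
```

The `_adv` variant matters because the caller still gets the deleted nodes back, so no information is lost relative to the old two-step approach.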

Currently we delete each exchange one-by-one, which requires three
commands: the delete itself plus a put and delete for a runtime
parameter that acts as a lock to prevent a client from declaring an
exchange while it's being deleted. The lock is unnecessary during vhost
deletion because permissions are cleared for the vhost before any
resources are deleted.

We can use a transaction to delete all exchanges and bindings for a
vhost in a single command against the Khepri store. This minimizes the
number of commands we need to send against the store and therefore the
latency of the deletion.

In a quick test with a vhost containing 10,000 exchanges (and no
bindings, queues, users, etc.), this is an order-of-magnitude speedup:
with the prior commit the vhost takes 22s to delete, while with this
commit it is deleted in 2s.
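A minimal sketch of the single-command approach (hypothetical paths and function name; RabbitMQ's real tree layout differs):

```erlang
%% Sketch only: remove every exchange and binding in a vhost with one
%% Khepri transaction, instead of one command (plus lock put/delete)
%% per exchange.
delete_vhost_resources(VHost) ->
    khepri:transaction(
      fun() ->
              %% Bindings may sit at varying depths, hence the recursive
              %% wildcard; exchanges are direct children of the vhost node.
              khepri_tx:delete_many([rabbitmq, bindings, VHost,
                                     ?KHEPRI_WILDCARD_STAR_STAR]),
              khepri_tx:delete_many([rabbitmq, exchanges, VHost,
                                     ?KHEPRI_WILDCARD_STAR]),
              ok
      end).
```

Because the whole cleanup is one Raft command, the per-exchange round trips (and the lock parameter's put/delete) disappear from the critical path.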

With the change in the parent commit we no longer set and clear a
runtime parameter when deleting an exchange as part of vhost deletion.
We need to adapt the `audit_vhost_internal_parameter` test case to test
that the parameter is set and cleared when the exchange is deleted
instead.

Previously this function threw errors. With this minor refactor we
return them instead so that `register_projection/0` is easier for
callers to work with. (In the child commit we will add another caller.)

This is a cosmetic change. `?RA_CLUSTER_NAME` is equivalent but is used
for clustering commands. Commands sent via the `khepri`/`khepri_adv`
APIs consistently use the `?STORE_ID` macro instead.

This causes a clearer error when the `enable_feature_flags/2` function
returns something not in the shape `ok | {skip, any()}`.

This covers a specific case where we need to register projections not
covered by the enable callback of the `khepri_db` feature flag. The
feature flag may be enabled if a node has been part of a cluster which
enabled the flag, but the metadata store might be reset. Upon init the
feature flag will be enabled but the store will be empty and the
projections will not exist, so operations like inserting default data
will fail when, for example, asserting that a vhost exists.

This fixes the `cluster_management_SUITE:forget_cluster_node_in_khepri/1`
case when running the suite with `RABBITMQ_METADATA_STORE=khepri`, which
fails as mentioned above.

We could run projection registration unconditionally when using Khepri:
the command is idempotent once projections are registered. But the
commands are somewhat large, so there's no need to send them every time.

`khepri:fence/0,1,2` queries the leader's Raft index and blocks the
caller for the given (or default) timeout until the local member has
caught up in log replication to that index. We want to do this during
Khepri init to ensure that the local Khepri store is reasonably up to
date before continuing in the boot process and starting listeners.

This is conceptually similar to the call to `mnesia:wait_for_tables/2`
during `rabbit_mnesia:init/0` and should have the same effect.
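A sketch of how such a fence might be used during init (the `wait_for_khepri/2` wrapper and its error handling are illustrative, not RabbitMQ's actual code):

```erlang
%% Sketch only: block boot until the local Khepri member has replicated
%% up to the leader's current Raft index, analogous in spirit to
%% mnesia:wait_for_tables/2.
wait_for_khepri(StoreId, Timeout) ->
    case khepri:fence(StoreId, Timeout) of
        ok ->
            ok;
        {error, Reason} ->
            %% Surface the failure rather than start listeners against a
            %% store that may be far behind the leader.
            error({khepri_not_caught_up, Reason})
    end.
```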
…docs


This function is meant to remove any projections which were mistakenly
registered in 3.13.x rather than all existing projections.

Previously, about half of the Khepri projection names were pluralized.

Without these, there is no indication of projections being unregistered
and registered.
@the-mikedavis the-mikedavis self-assigned this Sep 10, 2024
@michaelklishin michaelklishin added this to the 4.0.0 milestone Sep 10, 2024
@the-mikedavis the-mikedavis merged commit ecdf04d into v4.0.x Sep 10, 2024
291 checks passed
@the-mikedavis the-mikedavis deleted the md/bp/v4.0.x/khepri-prs branch September 10, 2024 21:01