Skip to content

Latest commit

 

History

History
124 lines (90 loc) · 6.61 KB

pip-369.md

File metadata and controls

124 lines (90 loc) · 6.61 KB

PIP-369: Flag based selective unload on changing ns-isolation-policy

Background knowledge

In Apache Pulsar, namespace isolation policies are used to limit the ownership of certain subsets of namespaces to specific broker groups. These policies are defined using regular expressions to match namespaces and specify primary and secondary broker groups along with failover policy configurations. This ensures that the ownership of the specified namespaces is restricted to the designated broker groups.

For more information, refer to the Pulsar documentation on namespace isolation.

History/Context

In Apache Pulsar 2.7.1+, there was a flag introduced (enableNamespaceIsolationUpdateOnTime) that controlled whether to unload namespaces or not when a namespace isolation policy is applied. #8976

Later on, in 2.11, rework was done as part of PIP-149 to make get/set isolationData calls async, which resulted in namespaces to always get unloaded irrespective of enableNamespaceIsolationUpdateOnTime config, not adhering to this config at all.

And now in 3.3, enableNamespaceIsolationUpdateOnTime broker config was deprecated as it no longer serves any purpose. #22449

Motivation

In Apache Pulsar 3.x, changing a namespace isolation policy results in unloading all namespace bundles that match the namespace's regular expression provided in the isolation policy. This can be problematic for cases where the regex matches a large subset of namespaces, such as tenant-x/.*. One of such case is mentioned on this issue #23092 where policy change resulted in 100+ namespace bundles to get unloaded. And broker exhausted all the available connections due to too many unload calls happening at once resulting in 5xx response. Other issues that happens with this approach are huge latency spikes as topics are unavailable until bundles are loaded back, increasing the pending produce calls. The only benefit this approach serves is ensuring that all the namespaces matching the policy regex will come to correct broker group. But when namespace bundles are already on the correct broker group (according to the policy), unloading those namespaces doesn't serve any purpose.

This PIP aims to address the need to either prevent unnecessary unloading or provide a more granular approach to determine what should be unloaded.

Some of the cases covered by this PIP are discussed in #23094 by @grssam.

  • unload nothing as part of the set policy call
  • unload every matching namespace as part of the policy set call
  • unload only the changed namespaces (newly added + removed)

Goals

In Scope

This PIP proposes a flag-based approach to control what should be unloaded when an isolation policy is applied. The possible values for this flag are:

  • all_matching: Unload all the namespaces that matches either old or new policy change.
  • changed: Only unload namespaces that are either added or removed due to the policy change.
  • none: Do not unload anything. Unloading can occur naturally due to load balancing or can be done manually using the unload admin call.

This flag will be a part of isolation policy data with defaults. Objective is to keep the default behavior unchanged on applying the new policy.

Out of Scope

Applying concurrency reducer to limit how many async calls will happen in parallel is out of the scope for this PIP. This should be addressed in a separate PIP, as solving the issue of infinite asynchronous calls probably requires changes to broker configurations and is a problem present in multiple areas.

Detailed Design

Design & Implementation Details

A new flag will be introduced in NamespaceIsolationData.

enum UnloadScope {
    all_matching,        // unloads everything, OLD ⋃ NEW
    changed,    // unload namespaces delta, (new ⋃ old) - (new ∩ old) 
    none,       // skip unloading anything, ϕ
};

Filters will be added based on the above when namespaces are selected for unload in set policy call. UnloadScope.all_matching will be the default in current version.

NOTE:
For 3.x unchanged behaviour, the all_matching UnloadScope option should only unload namespaces matching new policy (NEW). This matches the current behavior and maintains backward compatibility from implementation POV.

For 4.x,

  1. The behaviour for the all_matching flag should change to unload everything matching either the old or new policy (union of both).
  2. The default flag value should be changed, so accidentally missing this flag while applying the policy shouldn't impact workloads already on the correct broker group.

Public API

A new flag will be added in the NamespaceIsolationData. This changes the request body when set policy API is called. To keep things backwards compatible, unload_scope will be optional. API params will remain unchanged.

Path: /{cluster}/namespaceIsolationPolicies/{policyName}

{
  "policy-name": {
    "namespaces": [...],
    "primary": [...],
    "secondary": [...],
    "auto_failover_policy": {
      ...
    },
    "unload_scope": "all_matching|changed|none"
  }
}

CLI

# set call will have an optional flag. Sample command as shown below:
#
pulsar-admin ns-isolation-policy set cluster-name policy-name --unload-scope none
# Possible values for unload-scope: [all_matching, changed, none]

Backward & Forward Compatibility

Added flag is optional, that doesn't require any changes to pre-existing policy data. If the flag is not present then default value shall be considered.

Alternatives

Boolean flag passed during set policy call to either unload the delta namespaces (removed and added) without affecting unchanged namespaces or unload nothing. PR: #23094

Limitation: This approach does not consider cases where unloading is needed for every matching namespace as part of the policy set call. Manual unloading would be required for unchanged namespaces not on the correct broker group.

Links

PS: This PIP should get cherry-picked to 3.0.x as it provides a way to resolve the bug mentioned at #23092 which exist in all the production system today.