Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rate-limiters to ServerPolicy #3305

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alpeb
Copy link
Member

@alpeb alpeb commented Oct 29, 2024

This adds the local_rate_limit module to the server-policy crate, that
ServerPolicy uses for its new local_rate_limit field, containing
three optional rate-limiters: total, identity, overrides (this one is
really a vector of limiters, one per configured override).

I tried putting that under Protocol instead, but the PartialEq
requirement made it very hard to follow. Server OTOH doesn't really
require that trait, so I was able to remove it and accommodate the
limiters.

I made sure to avoid pulling the dashmap dependency in governor; I
haven't checked yet the necessity of the "jitter" and "quanta" features.

This temporarily overrides linkerd2-proxy-api dependency to pick changes
from linkerd/linkerd2-proxy-api#388

@alpeb alpeb requested a review from a team as a code owner October 29, 2024 17:40
@alpeb alpeb marked this pull request as draft October 29, 2024 17:59
@alpeb alpeb force-pushed the alpeb/server-policy-rate-limiter branch 5 times, most recently from 0536b1f to 8636f8e Compare October 29, 2024 22:19
This adds the local_rate_limit module to the server-policy crate, that
`ServerPolicy` uses for its new `local_rate_limit` field, containing
three optional rate-limiters: total, identity, overrides (this one is
really a vector of limiters, one per configured override).

I tried putting that under `Protocol` instead, but the `PartialEq`
requirement made it very hard to follow. `Server` OTOH doesn't really
require that trait, so I was able to remove it and accommodate the
limiters.

I made sure to avoid pulling the dashmap dependency in `governor`; I
haven't checked yet the necessity of the "jitter" and "quanta" features.

This temporarily overrides linkerd2-proxy-api dependency to pick changes
from linkerd/linkerd2-proxy-api#388
@alpeb alpeb force-pushed the alpeb/server-policy-rate-limiter branch from 8636f8e to b580e65 Compare October 29, 2024 22:19
linkerd/proxy/server-policy/Cargo.toml Show resolved Hide resolved
Comment on lines +8 to +19
#[derive(Debug, Default)]
pub struct HttpLocalRateLimit {
pub total: Option<RateLimiter<NotKeyed, InMemoryState, DefaultClock>>,
pub identity: Option<RateLimiter<String, DefaultKeyedStateStore<String>, DefaultClock>>,
pub overrides: Vec<HttpLocalRateLimitOverride>,
}

#[derive(Debug)]
pub struct HttpLocalRateLimitOverride {
pub ids: Vec<String>,
pub rate_limit: RateLimiter<Vec<String>, DefaultKeyedStateStore<Vec<String>>, DefaultClock>,
}
Copy link
Member

@olix0r olix0r Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think we can maintain a narrower API--no need to expose all of the governor types
  • It would be preferable to use the Identity types directly here rather than require String round-tripping (allocations).
  • I think the override rate limiter is misconfigured in your version--it should not be keyed on Vec.
  • Furthermore, it seems preferable to store the overrides for efficient lookup (i.e. in a HashMap).
Suggested change
#[derive(Debug, Default)]
pub struct HttpLocalRateLimit {
pub total: Option<RateLimiter<NotKeyed, InMemoryState, DefaultClock>>,
pub identity: Option<RateLimiter<String, DefaultKeyedStateStore<String>, DefaultClock>>,
pub overrides: Vec<HttpLocalRateLimitOverride>,
}
#[derive(Debug)]
pub struct HttpLocalRateLimitOverride {
pub ids: Vec<String>,
pub rate_limit: RateLimiter<Vec<String>, DefaultKeyedStateStore<Vec<String>>, DefaultClock>,
}
#[derive(Debug, Default)]
pub struct LocalRateLimit {
total: Option<RateLimit<InMemoryState>>,
per_identity: Option<RateLimit<DefaultKeyedStateStore<Option<Id>>>>,
overrides: HashMap<Id, Arc<RateLimit<InMemoryState>>>,
}
#[derive(Debug)]
struct RateLimit<S: StateStore> {
rps: NonZeroU32,
limiter: RateLimiter<S::Key, S, DefaultClock>,
}
#[derive(Debug, thiserror::Error, PartialEq, Eq)]
pub enum RateLimitError {
#[error("total rate limit exceeded: {0}rps")]
Total(NonZeroU32),
#[error("per-identity rate limit exceeded: {0}rps")]
PerIdentity(NonZeroU32),
#[error("override rate limit exceeded: {0}rps")]
Override(NonZeroU32),
}
impl LocalRateLimit {
    pub fn check(&self, id: Option<&Id>) -> Result<(), RateLimitError> {
        if let Some(lib) = &self.total {
            if lim.limiter.check().is_err() {
                return Err(RateLimitError::Total(lim.rps));
            }
        }

        if let Some(id) = id {
            if let Some(lim) = self.overrides.get(id) {
                if lim.limiter.check().is_err() {
                    return Err(RateLimitError::Override(lim.rps));
                }
                return Ok(());
            }
        }

        if let Some(lim) = &self.per_identity {
            if limiter.check_key(&lim.id.cloned()).is_err() {
                return Err(RateLimitError::PerIdentity(lim.rps));
            }
        }

        Ok(())
    }
}

The server-policy crate can hide all of the specifics about rate limiting and let callers have a very simple interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the override rate limiter is misconfigured in your version--it should not be keyed on Vec.

As per linkerd/linkerd2#13231, an HTTPLocalRateLimitPolicy would look something like this:

apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPLocalRateLimitPolicy
metadata:
  namespace: emojivoto
  name: web-rl
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: web-http
  total:
    requestsPerSecond: 100
  identity:
    requestsPerSecond: 20
  overrides:
  - requestsPerSecond: 10
    clientRefs:
    - kind: ServiceAccount
      namespace: emojivoto
      name: default

Under overrides, I implemented the clientRefs as an array, which explains the types I was hinting at (I may have made that an array after our conversations, sorry about that). It's definitely simpler to have it be just one object though like in your suggestion, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The override representation in the YAML doesn't dictate how we store them at runtime.

Also note that I interpret the overrides to support "pooled" limits. For example:

  - requestsPerSecond: 10
    clientRefs:
    - kind: ServiceAccount
      namespace: emojivoto
      name: web
    clientRefs:
    - kind: ServiceAccount
      namespace: emojivoto
      name: default

In this situation I would expect these two accounts to share a rate limit of 10RPS. This is distinct from the configuration in which they have distinct but identical limits:

  - requestsPerSecond: 10
    clientRefs:
    - kind: ServiceAccount
      namespace: emojivoto
      name: web
  - requestsPerSecond: 10
    clientRefs:
    - kind: ServiceAccount
      namespace: emojivoto
      name: default

I imagine that we could build a map of overrides from the existing spec:

            let overrides = proto
                .overrides
                .into_iter()
                .flat_map(|ovr| {
                    let Some(lim) = ovr.limit else {
                        return vec![];
                    };

                    let Some(rps) = NonZeroU32::new(lim.requests_per_second) else {
                        return vec![];
                    };
                    let limiter = RateLimiter::direct(Quota::per_second(rps));
                    let limit = Arc::new(RateLimit { rps, limiter });

                    ovr.clients
                        .into_iter()
                        .flat_map(|cl| {
                            cl.identities
                                .into_iter()
                                .filter_map(|id| id.name.parse::<Id>().ok())
                        })
                        .map(move |id| (id, limit.clone()))
                        .collect::<Vec<(Id, Arc<RateLimit<InMemoryState>>)>>()
                })
                .collect::<HashMap<Id, Arc<RateLimit<InMemoryState>>>>();

This supports shared limits and efficient map-based lookups.

Comment on lines +37 to +63
let total = proto.total.map(|lim| {
let quota = Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
RateLimiter::direct(quota)
});

let identity = proto.identity.map(|lim| {
let quota = Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
RateLimiter::keyed(quota)
});

let overrides = proto
.overrides
.into_iter()
.flat_map(|ovr| {
ovr.limit.map(|lim| {
let ids = ovr
.clients
.into_iter()
.flat_map(|cl| cl.identities.into_iter().map(|id| id.name))
.collect();
let quota =
Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
let rate_limit = RateLimiter::keyed(quota);
HttpLocalRateLimitOverride { ids, rate_limit }
})
})
.collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we cannot call unwrap (or expect) when handling input from the control plane. We need to make sure the None-ness propagates to cause the item to be ignored:

Suggested change
let total = proto.total.map(|lim| {
let quota = Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
RateLimiter::direct(quota)
});
let identity = proto.identity.map(|lim| {
let quota = Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
RateLimiter::keyed(quota)
});
let overrides = proto
.overrides
.into_iter()
.flat_map(|ovr| {
ovr.limit.map(|lim| {
let ids = ovr
.clients
.into_iter()
.flat_map(|cl| cl.identities.into_iter().map(|id| id.name))
.collect();
let quota =
Quota::per_second(NonZeroU32::new(lim.requests_per_second).unwrap());
let rate_limit = RateLimiter::keyed(quota);
HttpLocalRateLimitOverride { ids, rate_limit }
})
})
.collect();
let total = proto.total.and_then(|lim| {
let rps = NonZeroU32::new(lim.requests_per_second)?;
let limiter = RateLimiter::direct(Quota::per_second(rps));
Some(RateLimit { rps, limiter })
});
let per_identity = proto.identity.and_then(|lim| {
let rps = NonZeroU32::new(lim.requests_per_second)?;
let limiter = RateLimiter::hashmap(Quota::per_second(rps));
Some(RateLimit { rps, limiter })
});
let overrides = proto
.overrides
.into_iter()
.flat_map(|ovr| {
let Some(lim) = ovr.limit else {
return vec![];
};
let Some(rps) = NonZeroU32::new(lim.requests_per_second) else {
return vec![];
};
let limiter = RateLimiter::direct(Quota::per_second(rps));
let limit = Arc::new(RateLimit { rps, limiter });
ovr.clients
.into_iter()
.flat_map(|cl| {
cl.identities
.into_iter()
.filter_map(|id| id.name.parse::<Id>().ok())
})
.map(move |id| (id, limit.clone()))
.collect::<Vec<(Id, Arc<RateLimit<InMemoryState>>)>>()
})
.collect::<HashMap<Id, Arc<RateLimit<InMemoryState>>>>();

linkerd/proxy/server-policy/src/lib.rs Show resolved Hide resolved
@alpeb alpeb mentioned this pull request Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants