-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(zero): add a flag to limit the uid lease per namespace #7568
Conversation
7639a45
to
86c8b7c
Compare
d387418
to
c9d7449
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there're a lot of changes. So, reduce the scope.
Reviewed 10 of 18 files at r2, 3 of 3 files at r3, 5 of 6 files at r4.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @NamanJain8 and @vvbalaji-dgraph)
dgraph/cmd/bulk/loader.go, line 192 at r4 (raw file):
for { ctx, cancel := context.WithTimeout(context.Background(), time.Second) ctx = x.AttachNamespaceOutgoing(ctx, x.GalaxyNamespace)
We don't need to do this. If Zero doesn't find a namespace, then just assume GalaxyNs.
dgraph/cmd/live/run.go, line 634 at r4 (raw file):
x.Checkf(err, "Unable to connect to zero, Is it running at %s?", opt.zero) creds := z.NewSuperFlag(Live.Conf.GetString("creds")).MergeAndCheckDefault(x.DefaultCreds)
same over here?
dgraph/cmd/live/run.go, line 636 at r4 (raw file):
creds := z.NewSuperFlag(Live.Conf.GetString("creds")).MergeAndCheckDefault(x.DefaultCreds) alloc := xidmap.New(xidmap.XidMapOptions{ Zero: connzero,
Might want to rename, if this could be a conn to Alpha.
dgraph/cmd/zero/assign.go, line 184 at r4 (raw file):
defer span.End() validateAndGetToken := func() error {
rateLimit
dgraph/cmd/zero/assign.go, line 186 at r4 (raw file):
validateAndGetToken := func() error { if num.GetType() != pb.Num_UID { // We only rate limit lease of UIDs.
same thing would apply to timestamp too.
dgraph/cmd/zero/assign.go, line 194 at r4 (raw file):
return err } if num.Val > opts.limit.GetUint64("uid-lease") {
Parse it once, and use it. Don't parse it again and again.
dgraph/cmd/zero/assign.go, line 201 at r4 (raw file):
if !s.rateLimiter.Allow(ns, num.Val) { // Return error after random delay. delay := rand.Intn(int(opts.limit.GetDuration("refill-interval")))
parse once. For rand, use a locally generated rand. rand.NewSource rand.NewRand or something.
dgraph/cmd/zero/assign.go, line 203 at r4 (raw file):
delay := rand.Intn(int(opts.limit.GetDuration("refill-interval"))) time.Sleep(time.Duration(delay) * time.Second) return errors.New("Cannot lease UID because UID lease for the namespace is exhausted." +
for namespace: %#x is exhausted.
dgraph/cmd/zero/assign.go, line 236 at r4 (raw file):
// pass on the incoming metadata to the zero leader. if md, ok := metadata.FromIncomingContext(ctx); ok { ctx = metadata.NewOutgoingContext(ctx, md)
Do you need to do this?
dgraph/cmd/zero/http.go, line 71 at r4 (raw file):
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) // Consider all the request coming through /assign belong to GalaxyNamespace. ctx = x.AttachNamespace(ctx, x.GalaxyNamespace)
can remove.
dgraph/cmd/zero/run.go, line 109 at r4 (raw file):
in an interval specified by refill-interval.`). Flag("refill-interval", "The interval after which the tokens for UID lease are replenished,").
comma at the end.
x/x.go, line 1413 at r4 (raw file):
// RateLimiter implements a basic rate limiter. type RateLimiter struct {
There are good libraries which exist already.
x/x.go, line 1425 at r4 (raw file):
r := &RateLimiter{ limiter: lockedMap{ mp: make(map[uint64]uint64),
Use sync.Map for the map, you store the key and the pointer. And use atomics for the values.
And then also try maybe current approach. Do some parallel benchmarking, see which one is faster. In your benchmarks assume that all the namespaces that need to be accessed are already in the map.
Also benchmark with 3rd party solutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @manishrjain from 7 discussions.
Reviewable status: 2 of 17 files reviewed, 7 unresolved discussions (waiting on @manishrjain, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)
dgraph/cmd/bulk/loader.go, line 192 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
We don't need to do this. If Zero doesn't find a namespace, then just assume GalaxyNs.
Thanks. Done.
dgraph/cmd/live/run.go, line 634 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
same over here?
Done.
dgraph/cmd/live/run.go, line 636 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Might want to rename, if this could be a conn to Alpha.
Renamed it to UidAssigner
dgraph/cmd/zero/assign.go, line 184 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
rateLimit
Done.
dgraph/cmd/zero/assign.go, line 194 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Parse it once, and use it. Don't parse it again and again.
Done.
dgraph/cmd/zero/assign.go, line 201 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
parse once. For rand, use a locally generated rand. rand.NewSource rand.NewRand or something.
Not sure about the benefit of it. As the local source is thread-unsafe and we will need to have some extra mechanism to do locking and stuff.
dgraph/cmd/zero/assign.go, line 203 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
for namespace: %#x is exhausted.
Done.
dgraph/cmd/zero/assign.go, line 236 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Do you need to do this?
Yes, this is needed because the metadata from the incoming context is lost when the context is passed to some other server.
dgraph/cmd/zero/http.go, line 71 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
can remove.
Done.
dgraph/cmd/zero/run.go, line 109 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
comma at the end.
Done.
x/x.go, line 1425 at r4 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Use sync.Map for the map, you store the key and the pointer. And use atomics for the values.
And then also try maybe current approach. Do some parallel benchmarking, see which one is faster. In your benchmarks assume that all the namespaces that need to be accessed are already in the map.
Also benchmark with 3rd party solutions.
Done benchmarking various approaches (See https://gist.github.com/NamanJain8/5a935fb33cc43b6175d7da66cc7ead70). Sync map with atomics works best. Couldn't find any third-party rate limiter that fits our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 21 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @manishrjain, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)
dgraph/cmd/alpha/run.go, line 735 at r5 (raw file):
x.Config.LimitQueryEdge = x.Config.Limit.GetUint64("query-edge") x.Config.BlockClusterWideDrop = x.Config.Limit.GetBool("disallow-drop") x.Config.LimitNormalizeNode = int(x.Config.Limit.GetInt64("normalize-node"))
try to keep this local.
x/x.go, line 470 at r5 (raw file):
// AttachJWTNamespaceOutgoing attaches the namespace in the JWT claims to the outgoing metadata of // the context. func AttachJWTNamespaceOutgoing(ctx context.Context) (context.Context, error) {
ContextFromJWT
x/x.go, line 482 at r5 (raw file):
// AttachNamespaceOutgoing adds given namespace in the outgoing metadata of the context. func AttachNamespaceOutgoing(ctx context.Context, namespace uint64) context.Context {
func ContextFromNamespace(...)
) There might be a scenario where in a multi-tenant cluster a malicious user leases out a large number of UIDs or bump it to a large number, causing an issue for the other tenants. This PR adds a super flag --limit to zero. This flag limits the number of UIDs (uid-lease=100000;) that can be leased by a namespace within an interval specified by refill-interval=30s;. This PR also changes the xidmap's New() API as we need to pass in the dgo client. Along with it, it does some minor cleaning where flag parsing was in the critical path.
Fixes DGRAPH-3124
There might be a scenario where in a multi-tenant cluster a malicious user leases out a large number of UIDs or bump it to a large number, causing an issue for the other tenants.
This PR adds a super flag
--limit
to zero. This flag limits the number of UIDs (uid-lease=100000;
) that can be leased by a namespace within an interval specified byrefill-interval=30s;
. This PR also changes the xidmap'sNew()
API as we need to pass in the namespace and dgo client.Along with it, it does some minor cleaning where flag parsing was in the critical path.
This change is