Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(zero): add a flag to limit the uid lease per namespace #7568

Merged
merged 17 commits into from
Mar 30, 2021

Conversation

NamanJain8
Copy link
Contributor

@NamanJain8 NamanJain8 commented Mar 15, 2021

Fixes DGRAPH-3124

There might be a scenario where in a multi-tenant cluster a malicious user leases out a large number of UIDs or bump it to a large number, causing an issue for the other tenants.
This PR adds a super flag --limit to zero. This flag limits the number of UIDs (uid-lease=100000;) that can be leased by a namespace within an interval specified by refill-interval=30s;. This PR also changes the xidmap's New() API as we need to pass in the namespace and dgo client.

Along with it, it does some minor cleaning where flag parsing was in the critical path.


This change is Reviewable

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there're a lot of changes. So, reduce the scope.

Reviewed 10 of 18 files at r2, 3 of 3 files at r3, 5 of 6 files at r4.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @NamanJain8 and @vvbalaji-dgraph)


dgraph/cmd/bulk/loader.go, line 192 at r4 (raw file):

	for {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		ctx = x.AttachNamespaceOutgoing(ctx, x.GalaxyNamespace)

We don't need to do this. If Zero doesn't find a namespace, then just assume GalaxyNs.


dgraph/cmd/live/run.go, line 634 at r4 (raw file):

	x.Checkf(err, "Unable to connect to zero, Is it running at %s?", opt.zero)

	creds := z.NewSuperFlag(Live.Conf.GetString("creds")).MergeAndCheckDefault(x.DefaultCreds)

same over here?


dgraph/cmd/live/run.go, line 636 at r4 (raw file):

	creds := z.NewSuperFlag(Live.Conf.GetString("creds")).MergeAndCheckDefault(x.DefaultCreds)
	alloc := xidmap.New(xidmap.XidMapOptions{
		Zero:      connzero,

Might want to rename, if this could be a conn to Alpha.


dgraph/cmd/zero/assign.go, line 184 at r4 (raw file):

	defer span.End()

	validateAndGetToken := func() error {

rateLimit


dgraph/cmd/zero/assign.go, line 186 at r4 (raw file):

	validateAndGetToken := func() error {
		if num.GetType() != pb.Num_UID {
			// We only rate limit lease of UIDs.

same thing would apply to timestamp too.


dgraph/cmd/zero/assign.go, line 194 at r4 (raw file):

			return err
		}
		if num.Val > opts.limit.GetUint64("uid-lease") {

Parse it once, and use it. Don't parse it again and again.


dgraph/cmd/zero/assign.go, line 201 at r4 (raw file):

		if !s.rateLimiter.Allow(ns, num.Val) {
			// Return error after random delay.
			delay := rand.Intn(int(opts.limit.GetDuration("refill-interval")))

parse once. For rand, use a locally generated rand. rand.NewSource rand.NewRand or something.


dgraph/cmd/zero/assign.go, line 203 at r4 (raw file):

			delay := rand.Intn(int(opts.limit.GetDuration("refill-interval")))
			time.Sleep(time.Duration(delay) * time.Second)
			return errors.New("Cannot lease UID because UID lease for the namespace is exhausted." +

for namespace: %#x is exhausted.


dgraph/cmd/zero/assign.go, line 236 at r4 (raw file):

		// pass on the incoming metadata to the zero leader.
		if md, ok := metadata.FromIncomingContext(ctx); ok {
			ctx = metadata.NewOutgoingContext(ctx, md)

Do you need to do this?


dgraph/cmd/zero/http.go, line 71 at r4 (raw file):

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	// Consider all the request coming through /assign belong to GalaxyNamespace.
	ctx = x.AttachNamespace(ctx, x.GalaxyNamespace)

can remove.


dgraph/cmd/zero/run.go, line 109 at r4 (raw file):

			in an interval specified by refill-interval.`).
		Flag("refill-interval",
			"The interval after which the tokens for UID lease are replenished,").

comma at the end.


x/x.go, line 1413 at r4 (raw file):

// RateLimiter implements a basic rate limiter.
type RateLimiter struct {

There are good libraries which exist already.


x/x.go, line 1425 at r4 (raw file):

	r := &RateLimiter{
		limiter: lockedMap{
			mp: make(map[uint64]uint64),

Use sync.Map for the map, you store the key and the pointer. And use atomics for the values.

And then also try maybe current approach. Do some parallel benchmarking, see which one is faster. In your benchmarks assume that all the namespaces that need to be accessed are already in the map.

Also benchmark with 3rd party solutions.

@NamanJain8 NamanJain8 requested a review from pawanrawal as a code owner March 17, 2021 17:16
@NamanJain8 NamanJain8 marked this pull request as draft March 23, 2021 16:08
Copy link
Contributor Author

@NamanJain8 NamanJain8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @manishrjain from 7 discussions.
Reviewable status: 2 of 17 files reviewed, 7 unresolved discussions (waiting on @manishrjain, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


dgraph/cmd/bulk/loader.go, line 192 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We don't need to do this. If Zero doesn't find a namespace, then just assume GalaxyNs.

Thanks. Done.


dgraph/cmd/live/run.go, line 634 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

same over here?

Done.


dgraph/cmd/live/run.go, line 636 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Might want to rename, if this could be a conn to Alpha.

Renamed it to UidAssigner


dgraph/cmd/zero/assign.go, line 184 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

rateLimit

Done.


dgraph/cmd/zero/assign.go, line 194 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Parse it once, and use it. Don't parse it again and again.

Done.


dgraph/cmd/zero/assign.go, line 201 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

parse once. For rand, use a locally generated rand. rand.NewSource rand.NewRand or something.

Not sure about the benefit of it. As the local source is thread-unsafe and we will need to have some extra mechanism to do locking and stuff.


dgraph/cmd/zero/assign.go, line 203 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

for namespace: %#x is exhausted.

Done.


dgraph/cmd/zero/assign.go, line 236 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do you need to do this?

Yes, this is needed because the metadata from the incoming context is lost when the context is passed to some other server.


dgraph/cmd/zero/http.go, line 71 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

can remove.

Done.


dgraph/cmd/zero/run.go, line 109 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

comma at the end.

Done.


x/x.go, line 1425 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use sync.Map for the map, you store the key and the pointer. And use atomics for the values.

And then also try maybe current approach. Do some parallel benchmarking, see which one is faster. In your benchmarks assume that all the namespaces that need to be accessed are already in the map.

Also benchmark with 3rd party solutions.

Done benchmarking various approaches (See https://gist.github.com/NamanJain8/5a935fb33cc43b6175d7da66cc7ead70). Sync map with atomics works best. Couldn't find any third-party rate limiter that fits our use case.

@NamanJain8 NamanJain8 marked this pull request as ready for review March 25, 2021 06:06
@NamanJain8 NamanJain8 requested a review from manishrjain March 25, 2021 13:26
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 17 of 21 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @manishrjain, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


dgraph/cmd/alpha/run.go, line 735 at r5 (raw file):

	x.Config.LimitQueryEdge = x.Config.Limit.GetUint64("query-edge")
	x.Config.BlockClusterWideDrop = x.Config.Limit.GetBool("disallow-drop")
	x.Config.LimitNormalizeNode = int(x.Config.Limit.GetInt64("normalize-node"))

try to keep this local.


x/x.go, line 470 at r5 (raw file):

// AttachJWTNamespaceOutgoing attaches the namespace in the JWT claims to the outgoing metadata of
// the context.
func AttachJWTNamespaceOutgoing(ctx context.Context) (context.Context, error) {

ContextFromJWT


x/x.go, line 482 at r5 (raw file):

// AttachNamespaceOutgoing adds given namespace in the outgoing metadata of the context.
func AttachNamespaceOutgoing(ctx context.Context, namespace uint64) context.Context {

func ContextFromNamespace(...)

x/x.go Show resolved Hide resolved
@NamanJain8 NamanJain8 changed the title fix(zero): add a flag to limit the uid lease in a single request fix(zero): add a flag to limit the uid lease per namespace Mar 29, 2021
@NamanJain8 NamanJain8 merged commit 1f84a52 into release/v21.03 Mar 30, 2021
@NamanJain8 NamanJain8 deleted the naman/uid-lease branch March 30, 2021 05:58
aman-bansal pushed a commit that referenced this pull request Apr 8, 2021
)

There might be a scenario where in a multi-tenant cluster a malicious user leases out a large number of UIDs or bump it to a large number, causing an issue for the other tenants.
This PR adds a super flag --limit to zero. This flag limits the number of UIDs (uid-lease=100000;) that can be leased by a namespace within an interval specified by refill-interval=30s;. This PR also changes the xidmap's New() API as we need to pass in the dgo client.

Along with it, it does some minor cleaning where flag parsing was in the critical path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants