Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved latency in live loader using conflict resolution at client level #4362

Merged
merged 31 commits into from
Dec 18, 2019

Conversation

harshil-goel
Copy link
Contributor

@harshil-goel harshil-goel commented Dec 5, 2019

The live loader takes about 1 hour for 21 million dataset when run with -c=1. But when we run the same benchmark for c=10, the transactions are conflicting between themselves. This increases the time taken to process the nquads exponentially (~1 day).

Benchmarks for master at c=1
Number of TXs run : 21240 Number of N-Quads processed : 21239870 Time spent : 1h8m12.999735722s N-Quads processed per second : 5190

This PR brings in a conflict detection at the client level, using which we can decide when to send the request. We also implemented a heuristic, that batches conflicting NQuads together so that there are less conflicts to start with.
Benchmark for this PR at c = 10
Number of TXs run : 21240 Number of N-Quads processed : 21239870 Time spent : 23m42.685914685s N-Quads processed per second : 14936

This change is Reviewable

return true
}

func (l *loader) removeMap(req api.Mutation) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func (*loader).removeMap is unused (from unused)

@@ -190,15 +201,24 @@ func (l *loader) processLoadFile(ctx context.Context, rd *bufio.Reader, ck chunk
if len(nqs) == 0 {
continue
}
remainingNqs := make([]*api.NQuad, 0, 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S1019: should use make([]*api.NQuad, 0) instead (from gosimple)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a map[uint64]struct{}. When a txn starts, it would acquire lock, loop over the keys to check. If they all check out, then do the writes, then release lock. When a txn is done, then delete the keys.

If a req has conflict, block and loop repeatedly until it can proceed.

After further discussion, we decided to have a local buffer of size N in each goroutine. We can then test out N = 1 and N = inf, to see how the live loader performs.

Reviewable status: 0 of 2 files reviewed, 4 unresolved discussions (waiting on @harshil-goel)


dgraph/cmd/live/batch.go, line 78 at r2 (raw file):

	threadId uint64

	currentUIDS map[string]uint64

Keys can be uint64 and vals can be bool, I think.


dgraph/cmd/live/batch.go, line 170 at r2 (raw file):

func (l *loader) loadOrStore(id string, threadId uint64) bool {
	if val, ok := l.currentUIDS[id]; !(ok && val != threadId) {

Don't need thread id thing.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 11 unresolved discussions (waiting on @golangcibot and @harshil-goel)


dgraph/cmd/live/batch.go, line 212 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func (*loader).removeMap is unused (from unused)

removeConflictKeys


dgraph/cmd/live/batch.go, line 78 at r4 (raw file):

	start time.Time

	currentUIDS map[uint64]struct{}

conflicts


dgraph/cmd/live/batch.go, line 139 at r4 (raw file):

			atomic.AddUint64(&l.nquads, uint64(len(req.Set)))
			atomic.AddUint64(&l.txns, 1)
			l.removeMap(req)

l.deregister(req)


dgraph/cmd/live/batch.go, line 139 at r4 (raw file):

			atomic.AddUint64(&l.nquads, uint64(len(req.Set)))
			atomic.AddUint64(&l.txns, 1)
			l.removeMap(req)

Do a defer upfront?


dgraph/cmd/live/batch.go, line 169 at r4 (raw file):

}

func (l *loader) writeMap(req *api.Mutation) bool {

addConflictKeys


dgraph/cmd/live/batch.go, line 176 at r4 (raw file):

	for _, i := range req.Set {
		objectKey := farm.Fingerprint64([]byte(i.ObjectId))

Conflict keys look at pred, subject. And then other things to generate more keys.

Better to generate the list of conflict keys upfront, before acquiring locks. Then acquire locks, do the checks, etc. Can even make it part of the request.


dgraph/cmd/live/batch.go, line 184 at r4 (raw file):

		mSlice = append(mSlice, objectKey)

		if !reversePreds[i.Predicate] {

No reverse pred stuff. Use conflict keys.


dgraph/cmd/live/run.go, line 145 at r4 (raw file):

	for _, schemaLine := range strings.Split(string(b), "\n") {
		if !strings.Contains(schemaLine, "@reverse") {

Remove all this special handling.

@arijitAD arijitAD force-pushed the harshil-goel/remove-conflict-uids branch from 72f4463 to 6cc0cbd Compare December 9, 2019 17:39
@@ -159,12 +167,99 @@ func (l *loader) request(req api.Mutation, reqNum uint64) {
go l.infinitelyRetry(req, reqNum)
}

func (l *loader) print(req api.Mutation) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func (*loader).print is unused (from unused)


// TLS configuration
x.RegisterClientTLSFlags(flag)
}

// processSchemaFile process schema for a given gz file.
func processSchemaFile(ctx context.Context, file string, dgraphClient *dgo.Dgraph) error {
func processSchemaFile(ctx context.Context, file string, dgraphClient *dgo.Dgraph) (*schema.ParsedSchema, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 114 characters (from lll)

op := &api.Operation{}
op.Schema = string(b)
return dgraphClient.Alter(ctx, op)
sch, err := schema.Parse(op.Schema)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ineffectual assignment to err (from ineffassign)

fmt.Println("here")
}
txn := dgraphClient.NewTxn()
defer txn.Discard(ctx)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of txn.Discard is not checked (from errcheck)

}

var sch LiveSchema
json.Unmarshal(res.GetJson(), &sch)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of json.Unmarshal is not checked (from errcheck)

bufferSize int
}

type LivePredicate struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type name will be used as live.LivePredicate by other packages, and that stutters; consider calling this Predicate (from golint)

ValueType types.TypeID
}

type LiveSchema struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type name will be used as live.LiveSchema by other packages, and that stutters; consider calling this Schema (from golint)

@harshil-goel harshil-goel marked this pull request as ready for review December 12, 2019 15:26
@harshil-goel harshil-goel requested review from martinmr and a team as code owners December 12, 2019 15:26
Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r8, 3 of 5 files at r9, 2 of 6 files at r11.
Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @golangcibot, @harshil-goel, and @martinmr)


dgraph/cmd/live/batch.go, line 55 at r11 (raw file):

	PrintCounters bool
	MaxRetries    uint32
	bufferSize    int

Add a comment about how this is different from Size.


dgraph/cmd/live/batch.go, line 175 at r11 (raw file):

}

func typeValFrom(val *api.Value) (types.Val, error) {

Just expose this function in the gql package and reuse from there.


dgraph/cmd/live/batch.go, line 255 at r11 (raw file):

}

func (l *loader) getConflictKeys(nq *api.NQuad) []uint64 {

Would have liked this function to be shared between server and client but looks like that is not easily possible?


dgraph/cmd/live/batch.go, line 320 at r11 (raw file):

func (l *loader) getConflicts(req *api.Mutation) []uint64 {
	keys := make([]uint64, 0, len(req.Set))
	for _, i := range req.Set {

Call it nquad and not i, i is for index.


dgraph/cmd/live/batch.go, line 327 at r11 (raw file):

func (l *loader) addConflictKeys(req *api.Mutation) bool {
	mSlice := l.getConflicts(req)

Whats mSlice? Call it keys or conflictKeys.


dgraph/cmd/live/batch.go, line 332 at r11 (raw file):

	defer l.uidsLock.Unlock()

	for _, val := range mSlice {
for _, key := range conflictKeys {

dgraph/cmd/live/batch.go, line 346 at r11 (raw file):

func (l *loader) deregister(req *api.Mutation) {
	mSlice := l.getConflicts(req)

This calculation would happen multiple times, could we cache this info somewhere after the first time its calculated?


dgraph/cmd/live/batch.go, line 371 at r11 (raw file):

		}

		for len(buffer) >= l.opts.bufferSize-1 {

Why have -1 when you have >=?


dgraph/cmd/live/batch.go, line 387 at r11 (raw file):

	for len(buffer) > 0 {
		i := 0

All the code below this is repeat of the code above, please store it in a sub-function within makeRequests and call that at both these places.


dgraph/cmd/live/run.go, line 151 at r11 (raw file):

		md.Append("auth-token", opt.authToken)
		ctx = metadata.NewOutgoingContext(ctx, md)
		fmt.Println("here")

remove this


dgraph/cmd/live/run.go, line 259 at r11 (raw file):

			buffer = append(buffer, nqs...)

			if len(buffer) >= opt.bufferSize*opt.batchSize {

if len < ... {
continue
}

// rest of the logic


dgraph/cmd/live/run.go, line 259 at r11 (raw file):

			buffer = append(buffer, nqs...)

			if len(buffer) >= opt.bufferSize*opt.batchSize {

Is the first value supposed to be opt.bufferSize or the concurrency?


dgraph/cmd/live/run.go, line 265 at r11 (raw file):

					t := func(a *predicate) int {
						if a == nil {

The idea here just seems to be to have the predicates with Count at last. Could we simplified to.

if a != nil && a.Count {
  return 1
}
return 0

Also add a comment talking about what you are doing here, that is keeping predicates with count index after those without it. And then also sorting predicates by their name.


dgraph/cmd/live/run.go, line 295 at r11 (raw file):

		}

		for len(buffer) > 0 {

again repeat logic, can be put in a function called drain or something.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address @pawanrawal ’s comments. I can review this tomorrow in the office.

Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @golangcibot, @harshil-goel, and @martinmr)

Copy link
Contributor Author

@harshil-goel harshil-goel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @golangcibot, @harshil-goel, @manishrjain, @martinmr, and @pawanrawal)


dgraph/cmd/live/batch.go, line 212 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

removeConflictKeys

Done.


dgraph/cmd/live/batch.go, line 78 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Keys can be uint64 and vals can be bool, I think.

Done.


dgraph/cmd/live/batch.go, line 170 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't need thread id thing.

Done.


dgraph/cmd/live/batch.go, line 78 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

conflicts

Done.


dgraph/cmd/live/batch.go, line 139 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

l.deregister(req)

Done.


dgraph/cmd/live/batch.go, line 139 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do a defer upfront?

Done.


dgraph/cmd/live/batch.go, line 169 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

addConflictKeys

Done.


dgraph/cmd/live/batch.go, line 176 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Conflict keys look at pred, subject. And then other things to generate more keys.

Better to generate the list of conflict keys upfront, before acquiring locks. Then acquire locks, do the checks, etc. Can even make it part of the request.

Done.


dgraph/cmd/live/batch.go, line 184 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No reverse pred stuff. Use conflict keys.

Done.


dgraph/cmd/live/batch.go, line 170 at r6 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func (*loader).print is unused (from unused)

Done.


dgraph/cmd/live/batch.go, line 55 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Add a comment about how this is different from Size.

Done.


dgraph/cmd/live/batch.go, line 255 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Would have liked this function to be shared between server and client but looks like that is not easily possible?

A lot of the data structures are different and since there are no generics, we will have to rewrite those functions.


dgraph/cmd/live/batch.go, line 320 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Call it nquad and not i, i is for index.

Done.


dgraph/cmd/live/batch.go, line 327 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Whats mSlice? Call it keys or conflictKeys.

Done.


dgraph/cmd/live/batch.go, line 332 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…
for _, key := range conflictKeys {

Done.


dgraph/cmd/live/batch.go, line 371 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Why have -1 when you have >=?

Done.


dgraph/cmd/live/batch.go, line 387 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

All the code below this is repeat of the code above, please store it in a sub-function within makeRequests and call that at both these places.

Done.


dgraph/cmd/live/run.go, line 204 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

S1019: should use make([]*api.NQuad, 0) instead (from gosimple)

Done.


dgraph/cmd/live/run.go, line 145 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove all this special handling.

Done.


dgraph/cmd/live/run.go, line 118 at r7 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 114 characters (from lll)

Done.


dgraph/cmd/live/run.go, line 155 at r7 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

ineffectual assignment to err (from ineffassign)

Done.


dgraph/cmd/live/run.go, line 70 at r8 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

type name will be used as live.LivePredicate by other packages, and that stutters; consider calling this Predicate (from golint)

Done.


dgraph/cmd/live/run.go, line 82 at r8 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

type name will be used as live.LiveSchema by other packages, and that stutters; consider calling this Schema (from golint)

Done.


dgraph/cmd/live/run.go, line 151 at r8 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

Error return value of txn.Discard is not checked (from errcheck)

Done.


dgraph/cmd/live/run.go, line 159 at r8 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

Error return value of json.Unmarshal is not checked (from errcheck)

Done.


dgraph/cmd/live/run.go, line 151 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

remove this

Done.


dgraph/cmd/live/run.go, line 259 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

if len < ... {
continue
}

// rest of the logic

Done.


dgraph/cmd/live/run.go, line 259 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Is the first value supposed to be opt.bufferSize or the concurrency?

It is supposed to be opt.bufferSize. It's just the count of the number of requests that a thread can store in memory.


dgraph/cmd/live/run.go, line 265 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

The idea here just seems to be to have the predicates with Count at last. Could we simplified to.

if a != nil && a.Count {
  return 1
}
return 0

Also add a comment talking about what you are doing here, that is keeping predicates with count index after those without it. And then also sorting predicates by their name.

Done.


dgraph/cmd/live/run.go, line 295 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

again repeat logic, can be put in a function called drain or something.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Looks great. Add a lot of comments. This PR's description should be pretty elaborate about what and why and how. Also, mention the benchmarks that you ran and how things are improved.

Got a bunch of comments to address, so do those. Also, wait for @pawanrawal 's LGTM.

Reviewed 1 of 3 files at r8, 3 of 5 files at r9, 3 of 3 files at r12.
Reviewable status: all files reviewed, 52 unresolved discussions (waiting on @golangcibot, @harshil-goel, @manishrjain, @martinmr, and @pawanrawal)


dgraph/cmd/live/batch.go, line 55 at r11 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Done.

could this be a const? If so, can be declared outside.


dgraph/cmd/live/batch.go, line 346 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

This calculation would happen multiple times, could we cache this info somewhere after the first time its calculated?

Don't calculate again. Store the keys in a derived struct. All refs to api.Mutation, can now be a request object or something.


dgraph/cmd/live/batch.go, line 85 at r12 (raw file):

	start time.Time

	conflicts map[uint64]bool

Maybe make it a struct{}


dgraph/cmd/live/batch.go, line 91 at r12 (raw file):

	reqs     chan api.Mutation
	zeroconn *grpc.ClientConn
	sch      *schema

schema


dgraph/cmd/live/batch.go, line 216 at r12 (raw file):

		Facets: nq.Facets,
	}
	val, err := getTypeVal(nq.ObjectValue)

if err != nil { return ... }
// do something.


dgraph/cmd/live/batch.go, line 237 at r12 (raw file):

}

func (l *loader) getConflictKeys(nq *api.NQuad) []uint64 {

return error.


dgraph/cmd/live/batch.go, line 237 at r12 (raw file):

}

func (l *loader) getConflictKeys(nq *api.NQuad) []uint64 {

conflictKeysForNQuad(nq)


dgraph/cmd/live/batch.go, line 238 at r12 (raw file):

func (l *loader) getConflictKeys(nq *api.NQuad) []uint64 {
	sid, _ := strconv.ParseUint(nq.Subject, 0, 64)

error? If so, at least capture it and log it.


dgraph/cmd/live/batch.go, line 253 at r12 (raw file):

	keys := make([]uint64, 0, 1)

extra vertical space.


dgraph/cmd/live/batch.go, line 267 at r12 (raw file):

	if pred.Reverse {
		oi, _ := strconv.ParseUint(nq.ObjectId, 0, 64)

oi, err := ..
if err == nil { keys = append(...) }


dgraph/cmd/live/batch.go, line 271 at r12 (raw file):

	}

	if nq.ObjectValue == nil || !(pred.Count || pred.Index) {

Put a comment at the top, referring to the code from which this logic is inspired.


dgraph/cmd/live/batch.go, line 275 at r12 (raw file):

	}

	for _, tokerName := range pred.Tokenizer {

tokName


dgraph/cmd/live/batch.go, line 276 at r12 (raw file):

	for _, tokerName := range pred.Tokenizer {
		toker, ok := tok.GetTokenizer(tokerName)

tok


dgraph/cmd/live/batch.go, line 278 at r12 (raw file):

		toker, ok := tok.GetTokenizer(tokerName)
		if !ok {
			fmt.Printf("unknown tokenizer %q", tokerName)

continue?

Also, are we using fmt.Printfs or are we using glog.Infofs.


dgraph/cmd/live/batch.go, line 287 at r12 (raw file):

		schemaVal, err := types.Convert(storageVal, types.TypeID(pred.ValueType))
		x.Check(err)

This can trigger. Should not be a check failure.


dgraph/cmd/live/batch.go, line 289 at r12 (raw file):

		x.Check(err)
		toks, err := tok.BuildTokens(schemaVal.Value, tok.GetLangTokenizer(toker, nq.Lang))
		x.Check(err)

just return err


dgraph/cmd/live/batch.go, line 300 at r12 (raw file):

}

func (l *loader) getConflicts(req *api.Mutation) []uint64 {

conflictKeysForRequest()


dgraph/cmd/live/batch.go, line 304 at r12 (raw file):

	for _, nq := range req.Set {
		keys = append(keys, l.getConflictKeys(nq)...)
	}

Put a comment saying that live loader only needs to look at sets, not deletes.


dgraph/cmd/live/batch.go, line 321 at r12 (raw file):

	for _, key := range keys {
		l.conflicts[key] = true

= struct{}{}


dgraph/cmd/live/batch.go, line 345 at r12 (raw file):

	buffer := make([]api.Mutation, 0, l.opts.bufferSize)

Remove vert space.


dgraph/cmd/live/batch.go, line 347 at r12 (raw file):

	drain := func(min int) {
		for len(buffer) > min {

for len(buffer) > maxSize

maxSize = l.opts.bufferSize


dgraph/cmd/live/batch.go, line 349 at r12 (raw file):

		for len(buffer) > min {
			i := 0
			for _, mu := range buffer {

for _, req := range buffer


dgraph/cmd/live/batch.go, line 350 at r12 (raw file):

			i := 0
			for _, mu := range buffer {
				if l.addConflictKeys(&mu) {

if !l.addConflictKeys() { buffer[i] = mu; i++; continue }
Execute req. req will no longer be part of the buffer.


dgraph/cmd/live/batch.go, line 351 at r12 (raw file):

			for _, mu := range buffer {
				if l.addConflictKeys(&mu) {
					reqNum := atomic.AddUint64(&l.reqNum, 1)

Move it to l.request.


dgraph/cmd/live/batch.go, line 355 at r12 (raw file):

					continue
				}
				buffer[i] = mu

Add a comment here explaining that there's a shift going on.


dgraph/cmd/live/batch.go, line 365 at r12 (raw file):

	for req := range l.reqs {
		if l.addConflictKeys(&req) {
			reqNum := atomic.AddUint64(&l.reqNum, 1)

Doubt we need to pass reqNum to l.request.


dgraph/cmd/live/run.go, line 151 at r11 (raw file):

Previously, harshil-goel (Harshil Goel) wrote…

Done.

Remove this auth token business. The dgo client can deal with it on its own.


dgraph/cmd/live/run.go, line 263 at r12 (raw file):

					return t(iPred) < t(jPred)
				}

vert space


dgraph/cmd/live/run.go, line 268 at r12 (raw file):

			for len(buffer) > 0 {

vert space


dgraph/cmd/live/run.go, line 269 at r12 (raw file):

			for len(buffer) > 0 {

				f := opt.batchSize

sz


dgraph/cmd/live/run.go, line 293 at r12 (raw file):

			buffer = append(buffer, nqs...)
			if len(buffer) < opt.bufferSize*opt.batchSize {

All of this needs a lot of explanation for what is going on.

Copy link
Contributor Author

@harshil-goel harshil-goel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 52 unresolved discussions (waiting on @golangcibot, @manishrjain, @martinmr, and @pawanrawal)


dgraph/cmd/live/batch.go, line 55 at r11 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

could this be a const? If so, can be declared outside.

This is a user-defined variable, with default = 100


dgraph/cmd/live/batch.go, line 175 at r11 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Just expose this function in the gql package and reuse from there.

Done.


dgraph/cmd/live/batch.go, line 346 at r11 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't calculate again. Store the keys in a derived struct. All refs to api.Mutation, can now be a request object or something.

Done.


dgraph/cmd/live/batch.go, line 85 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Maybe make it a struct{}

Done.


dgraph/cmd/live/batch.go, line 91 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

schema

Done.


dgraph/cmd/live/batch.go, line 216 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if err != nil { return ... }
// do something.

Done.


dgraph/cmd/live/batch.go, line 237 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

return error.

Done.


dgraph/cmd/live/batch.go, line 237 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

conflictKeysForNQuad(nq)

Done.


dgraph/cmd/live/batch.go, line 238 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

error? If so, at least capture it and log it.

Done.


dgraph/cmd/live/batch.go, line 253 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

extra vertical space.

Done.


dgraph/cmd/live/batch.go, line 267 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

oi, err := ..
if err == nil { keys = append(...) }

Done.


dgraph/cmd/live/batch.go, line 271 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Put a comment at the top, referring to the code from which this logic is inspired.

Done.


dgraph/cmd/live/batch.go, line 275 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

tokName

Done.


dgraph/cmd/live/batch.go, line 276 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

tok

Done.


dgraph/cmd/live/batch.go, line 278 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

continue?

Also, are we using fmt.Printfs or are we using glog.Infofs.

fmt.Printf


dgraph/cmd/live/batch.go, line 287 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can trigger. Should not be a check failure.

Done.


dgraph/cmd/live/batch.go, line 289 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

just return err

Done.


dgraph/cmd/live/batch.go, line 300 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

conflictKeysForRequest()

Done.


dgraph/cmd/live/batch.go, line 304 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Put a comment saying that live loader only needs to look at sets, not deletes.

Done.


dgraph/cmd/live/batch.go, line 321 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

= struct{}{}

Done.


dgraph/cmd/live/batch.go, line 345 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove vert space.

Done.


dgraph/cmd/live/batch.go, line 347 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

for len(buffer) > maxSize

maxSize = l.opts.bufferSize

Done.


dgraph/cmd/live/batch.go, line 349 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

for _, req := range buffer

Done.


dgraph/cmd/live/batch.go, line 350 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if !l.addConflictKeys() { buffer[i] = mu; i++; continue }
Execute req. req will no longer be part of the buffer.

Done.


dgraph/cmd/live/batch.go, line 351 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move it to l.request.

Done.


dgraph/cmd/live/batch.go, line 355 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment here explaining that there's a shift going on.

Done.


dgraph/cmd/live/batch.go, line 365 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Doubt we need to pass reqNum to l.request.

Done.


dgraph/cmd/live/run.go, line 263 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

vert space

Done.


dgraph/cmd/live/run.go, line 268 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

vert space

Done.


dgraph/cmd/live/run.go, line 269 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

sz

Done.


dgraph/cmd/live/run.go, line 293 at r12 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

All of this needs a lot of explanation for what is going on.

Done.

@harshil-goel harshil-goel changed the title Remove conflict uids Improved latency in live loader using conflict resolution at client level Dec 18, 2019
Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 5 of 7 files reviewed, 40 unresolved discussions (waiting on @golangcibot, @harshil-goel, @manishrjain, and @martinmr)


dgraph/cmd/live/run.go, line 249 at r14 (raw file):

			// We collect opt.bufferSize requests and preprocess them. For the requests
			// to not confict between themself, we sort them on the basis of their predicates.
			// Predicates with count index will conflict among themselfs, so we keep them at

themselves

@harshil-goel harshil-goel merged commit b9627f1 into master Dec 18, 2019
@harshil-goel harshil-goel deleted the harshil-goel/remove-conflict-uids branch December 18, 2019 14:29
danielmai pushed a commit that referenced this pull request Jan 12, 2020
…evel (#4362)

The live loader takes about 1 hour for 21 million dataset when run with -c=1. But when we run the same benchmark for c=10, the transactions are conflicting between themselves. This increases the time taken to process the nquads exponentially (~1 day). 

Benchmarks for master at c=1
`
Number of TXs run            : 21240
Number of N-Quads processed  : 21239870
Time spent                   : 1h8m12.999735722s
N-Quads processed per second : 5190
`


This PR brings in a conflict detection at the client level, using which we can decide when to send the request. We also implemented a heuristic, that batches conflicting NQuads together so that there are less conflicts to start with.  
Benchmark for this PR at c = 10
`
Number of TXs run            : 21240
Number of N-Quads processed  : 21239870
Time spent                   : 23m42.685914685s
N-Quads processed per second : 14936
`

<!-- Reviewable:start -->
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/dgraph-io/dgraph/4362)
<!-- Reviewable:end -->

(cherry picked from commit b9627f1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants