Live loading on 21M data not returning consistent result for count index #4689

Closed
ashish-goswami opened this issue Jan 28, 2020 · 8 comments · Fixed by #5689
Labels
area/indexes Related to indexes (indices) and their usage. kind/bug Something is broken. status/accepted We accept to investigate/work on it.

Comments

@ashish-goswami
Contributor

What version of Dgraph are you using?

Master (commit abc6afa)

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

Ubuntu 18.10, 16-core CPU, 64 GB RAM

Steps to reproduce the issue (command/config used to run Dgraph).

  • Run the live loader on the 21M data set, on a cluster with 1 Zero and 1 Alpha
dgraph live -f 21million.rdf.gz -s 21million.schema
  • Run the code below to get the uid count for predicates on which the count index is enabled.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

type QRes struct {
	Me []struct {
		Count int `json:"count"`
	} `json:"me"`
}

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal("While trying to dial gRPC")
	}

	dc := api.NewDgraphClient(conn)
	dg := dgo.NewDgraphClient(dc)

	edgeCount := []int{1, 2, 3, 30, 100}
	preds := []string{"starring", "actor.film", "director.film", "~director.film"}

	for _, pred := range preds {
		for _, c := range edgeCount {
			q := fmt.Sprintf(`{
				me(func: eq(count(%s), %d)) {
					count(uid)
				}
			}`, pred, c)

			res, err := dg.NewReadOnlyTxn().Query(context.Background(), q)
			if err != nil {
				panic(err)
			}

			var r QRes
			err = json.Unmarshal(res.GetJson(), &r)
			if err != nil {
				panic(err)
			}

			if len(r.Me) > 0 {
				fmt.Println("predicate name:", pred, ",Count:", r.Me[0].Count)
			} else {
				fmt.Println("predicate name:", pred, "Count: empty")
			}
		}
	}
}
  • Note down the counts and repeat the above steps on a clean cluster.

Expected behaviour and actual result.

Results should be the same after every run.

@sleto-it sleto-it added kind/bug Something is broken. status/accepted We accept to investigate/work on it. area/indexes Related to indexes (indices) and their usage. labels Jan 28, 2020
@ashish-goswami
Contributor Author

Output of Dgraph version:

Dgraph version   : v2.0.0-beta1-22-g2eca86c46
Dgraph SHA-256   : 940c47f22fb71b38ed2a49e7a880392de89b58af5e73aa09fe87de99f70c8ed3
Commit SHA-1     : 2eca86c46
Commit timestamp : 2020-02-07 12:58:27 +0530
Branch           : master
Go version       : go1.13.4

I ran the live loader using the command below:
dgraph live -f 21million.rdf.gz -s 21million.schema

After each successful run, I fetched the counts by running the above Go code against the Alpha.
Result after the first run:

predicate name: starring ,Count: 34821
predicate name: starring ,Count: 17162
predicate name: starring ,Count: 15315
predicate name: starring ,Count: 181
predicate name: starring ,Count: 10
predicate name: actor.film ,Count: 303114
predicate name: actor.film ,Count: 58938
predicate name: actor.film ,Count: 26932
predicate name: actor.film ,Count: 326
predicate name: actor.film ,Count: 10
predicate name: director.film ,Count: 61879
predicate name: director.film ,Count: 13509
predicate name: director.film ,Count: 5792
predicate name: director.film ,Count: 55
predicate name: director.film ,Count: 0
predicate name: ~director.film ,Count: 201405
predicate name: ~director.film ,Count: 14660
predicate name: ~director.film ,Count: 1534
predicate name: ~director.film ,Count: 1
predicate name: ~director.film ,Count: 0

After the second run:

predicate name: starring ,Count: 33408
predicate name: starring ,Count: 16504
predicate name: starring ,Count: 14728
predicate name: starring ,Count: 181
predicate name: starring ,Count: 10
predicate name: actor.film ,Count: 298306
predicate name: actor.film ,Count: 57893
predicate name: actor.film ,Count: 26522
predicate name: actor.film ,Count: 326
predicate name: actor.film ,Count: 10
predicate name: director.film ,Count: 58623
predicate name: director.film ,Count: 12817
predicate name: director.film ,Count: 5614
predicate name: director.film ,Count: 55
predicate name: director.film ,Count: 0
predicate name: ~director.film ,Count: 190155
predicate name: ~director.film ,Count: 14299
predicate name: ~director.film ,Count: 1534
predicate name: ~director.film ,Count: 1
predicate name: ~director.film ,Count: 0

After the third run:

predicate name: starring ,Count: 34821
predicate name: starring ,Count: 17162
predicate name: starring ,Count: 15315
predicate name: starring ,Count: 181
predicate name: starring ,Count: 10
predicate name: actor.film ,Count: 309357
predicate name: actor.film ,Count: 60038
predicate name: actor.film ,Count: 27311
predicate name: actor.film ,Count: 326
predicate name: actor.film ,Count: 10
predicate name: director.film ,Count: 60450
predicate name: director.film ,Count: 13224
predicate name: director.film ,Count: 5748
predicate name: director.film ,Count: 55
predicate name: director.film ,Count: 0
predicate name: ~director.film ,Count: 198444
predicate name: ~director.film ,Count: 13964
predicate name: ~director.film ,Count: 1534
predicate name: ~director.film ,Count: 1
predicate name: ~director.film ,Count: 0

@ashish-goswami
Contributor Author

Running the live loader on the same version with the 1M dataset and the schema below returns consistent results:

director.film        : [uid] @reverse @count .
actor.film           : [uid] @count .
genre                : [uid] @reverse @count .
rating               : [uid] @reverse .
country              : [uid] @reverse .
loc                  : geo @index(geo) .
name                 : string @index(hash, term, trigram, fulltext) @lang .
starring             : [uid] @count .
performance.character_note : string @lang .
tagline              : string @lang .
cut.note             : string @lang .
rated                : [uid] @reverse .
email                : string @index(exact) @upsert .
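
(As an aside, the same @count entries can also be applied programmatically via dgo's Alter call. The sketch below is mine, assuming a local Alpha at 127.0.0.1:9080; it is not part of the repro, which loads the schema file via -s.)

// Hedged sketch: setting a count index via dgo's Alter API instead of the
// schema file passed to the live loader. The Alpha endpoint is an assumption.
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("127.0.0.1:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal("While trying to dial gRPC: ", err)
	}
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Apply one of the @count predicates from the schema above.
	op := &api.Operation{Schema: `director.film: [uid] @reverse @count .`}
	if err := dg.Alter(context.Background(), op); err != nil {
		log.Fatal(err)
	}
}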

Results across 3 runs are consistent.

predicate name: starring ,Count: 1037
predicate name: starring ,Count: 634
predicate name: starring ,Count: 336
predicate name: starring ,Count: 32
predicate name: starring ,Count: 4
predicate name: actor.film ,Count: 52534
predicate name: actor.film ,Count: 9471
predicate name: actor.film ,Count: 3554
predicate name: actor.film ,Count: 5
predicate name: actor.film ,Count: 0
predicate name: director.film ,Count: 827
predicate name: director.film ,Count: 169
predicate name: director.film ,Count: 110
predicate name: director.film ,Count: 6
predicate name: director.film ,Count: 0
predicate name: ~director.film ,Count: 5861
predicate name: ~director.film ,Count: 374
predicate name: ~director.film ,Count: 54
predicate name: ~director.film ,Count: 0
predicate name: ~director.film ,Count: 0

@ashish-goswami
Contributor Author

Disabled rollups by commenting out the lines below in worker/draft.go:
https://github.com/dgraph-io/dgraph/blob/b76fcbd46367ceb568870103d420b1fec8a7b539/worker/draft.go#L439-L446
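
(For context: a rollup periodically collapses the chain of delta versions of a posting list into one complete list at a read timestamp. The snippet below is only a conceptual sketch of that merge step, with invented names; it is not Dgraph's actual worker/draft.go code.)

// Conceptual sketch only, not Dgraph's rollup code: a rollup collapses the
// delta versions of a posting list into a single sorted, de-duplicated list
// as of a given read timestamp.
package main

import (
	"fmt"
	"sort"
)

type delta struct {
	commitTs uint64
	uids     []uint64 // UIDs added by this delta (deletions omitted for brevity)
}

// rollup merges every delta visible at readTs into one complete list.
func rollup(deltas []delta, readTs uint64) []uint64 {
	seen := map[uint64]struct{}{}
	for _, d := range deltas {
		if d.commitTs > readTs {
			continue // not yet visible at this timestamp
		}
		for _, uid := range d.uids {
			seen[uid] = struct{}{}
		}
	}
	out := make([]uint64, 0, len(seen))
	for uid := range seen {
		out = append(out, uid)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	ds := []delta{{commitTs: 5, uids: []uint64{1, 2}}, {commitTs: 9, uids: []uint64{2, 3}}}
	fmt.Println(rollup(ds, 10)) // [1 2 3]
}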

Ran the live loader 3 times and got consistent results.

predicate name: starring ,Count: 34821
predicate name: starring ,Count: 17162
predicate name: starring ,Count: 15315
predicate name: starring ,Count: 181
predicate name: starring ,Count: 10
predicate name: actor.film ,Count: 309357
predicate name: actor.film ,Count: 60038
predicate name: actor.film ,Count: 27311
predicate name: actor.film ,Count: 326
predicate name: actor.film ,Count: 10
predicate name: director.film ,Count: 62640
predicate name: director.film ,Count: 13684
predicate name: director.film ,Count: 5847
predicate name: director.film ,Count: 55
predicate name: director.film ,Count: 0
predicate name: ~director.film ,Count: 204296
predicate name: ~director.film ,Count: 14761
predicate name: ~director.film ,Count: 1534
predicate name: ~director.film ,Count: 1
predicate name: ~director.film ,Count: 0

@mangalaman93 mangalaman93 self-assigned this Mar 6, 2020
@mangalaman93
Member

I think PR #4916 has fixed this issue. I am writing a systest for it.

@mangalaman93
Member

mangalaman93 commented Mar 31, 2020

All of the above tests pass on master, but the existing test fails (sometimes) when you run the following command:

./systest/21million/test-21million.sh --loader live

@mangalaman93
Member

mangalaman93 commented Apr 1, 2020

Dgraph is at 0e90415
cc: @jarifibrahim @ashish-goswami

I noticed that the Alpha returns different results when the same query is executed multiple times. This can be reproduced as follows:

./systest/21million/test-21million.sh --loader live --cleanup server

This will run the live loader and leave the dgraph_data volume in place. Take the p directory out of the volume and run a single-node Zero and Alpha cluster using this p directory. To get the data out of the volume, you can run the following command to find the path of the data on the host machine:

docker inspect dgraph_data

If you run the following curl command, it may return different results:

curl "http://localhost:8080/query" -d '{me(func: gt(count(director.film), 5)) {count(uid)}}' -H "Co
ntent-Type: application/graphql+-"

Example output that I observed:

aman@dgraph:~/gocode/src/github.com/dgraph-io/dgraph/scratch/count$ curl "http://localhost:8080/query" -d '{me(func: gt(count(director.film), 5)) {count(uid)}}' -H "Content-Type: application/graphql+-"
{"data":{"me":[{"count":7579}]},"extensions":{"server_latency":{"parsing_ns":92339,"processing_ns":62198132,"encoding_ns":1451940,"assign_timestamp_ns":550206,"total_ns":64375107},"txn":{"start_ts":260002},"metrics":{"num_uids":{"director.film":0}}}}
aman@dgraph:~/gocode/src/github.com/dgraph-io/dgraph/scratch/count$ curl "http://localhost:8080/query" -d '{me(func: gt(count(director.film), 5)) {count(uid)}}' -H "Content-Type: application/graphql+-"
{"data":{"me":[{"count":7576}]},"extensions":{"server_latency":{"parsing_ns":65559,"processing_ns":28964336,"encoding_ns":1207962,"assign_timestamp_ns":480137,"total_ns":30781844},"txn":{"start_ts":260003},"metrics":{"num_uids":{"director.film":0}}}}
aman@dgraph:~/gocode/src/github.com/dgraph-io/dgraph/scratch/count$ curl "http://localhost:8080/query" -d '{me(func: gt(count(director.film), 5)) {count(uid)}}' -H "Content-Type: application/graphql+-"
{"data":{"me":[{"count":7574}]},"extensions":{"server_latency":{"parsing_ns":67739,"processing_ns":24988380,"encoding_ns":1179102,"assign_timestamp_ns":543826,"total_ns":26850677},"txn":{"start_ts":260004},"metrics":{"num_uids":{"director.film":0}}}}

I thought that this could occur due to a data race in the code, so I compiled the Alpha with -race and ran it. I observed the following data race:

==================
WARNING: DATA RACE
Read at 0x00c00028eed8 by goroutine 20:
  github.com/dgraph-io/badger/v2/table.(*Table).DoesNotHave()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:515 +0x3f8
  github.com/dgraph-io/badger/v2.(*IteratorOptions).pickTables()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:407 +0x29e
  github.com/dgraph-io/badger/v2.(*levelHandler).appendIterators()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/level_handler.go:308 +0x38e
  github.com/dgraph-io/badger/v2.(*levelsController).appendIterators()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/levels.go:1025 +0xcc
  github.com/dgraph-io/badger/v2.(*Txn).NewIterator()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:468 +0x4d3
  github.com/dgraph-io/badger/v2.(*Txn).NewKeyIterator()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:489 +0xfb
  github.com/dgraph-io/dgraph/posting.getNew()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/mvcc.go:345 +0x20c
  github.com/dgraph-io/dgraph/posting.(*LocalCache).getInternal()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/lists.go:208 +0x4d6
  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func1()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/lists.go:241 +0x416
  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func2()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:811 +0x47

Previous write at 0x00c00028eed8 by goroutine 291:
  [failed to restore the stack]

Goroutine 20 (running) created at:
  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:810 +0x58e
  github.com/dgraph-io/dgraph/worker.(*queryState).helpProcessTask()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:966 +0x10c9
  github.com/dgraph-io/dgraph/worker.processTask()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:887 +0x7ac
  github.com/dgraph-io/dgraph/worker.ProcessTaskOverNetwork()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:149 +0x5f9
  github.com/dgraph-io/dgraph/query.ProcessGraph()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/query/query.go:2001 +0x3864

Goroutine 291 (running) created at:
  github.com/dgraph-io/dgraph/query.(*Request).ProcessQuery()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/query/query.go:2662 +0x1016
  github.com/dgraph-io/dgraph/query.(*Request).Process()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/query/query.go:2717 +0x8c
  github.com/dgraph-io/dgraph/edgraph.processQuery()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/edgraph/server.go:913 +0x472
  github.com/dgraph-io/dgraph/edgraph.(*Server).doQuery()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/edgraph/server.go:844 +0x717
  github.com/dgraph-io/dgraph/edgraph.(*Server).Query()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/edgraph/server.go:760 +0xd9
  github.com/dgraph-io/dgraph/dgraph/cmd/alpha.queryHandler()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/http.go:236 +0x59f
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2007 +0x51
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2387 +0x288
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2802 +0xce
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1890 +0x837
==================

I also tried compiling with the suggestion in golang/go#10661, but I still couldn't get a better stack trace.

@jarifibrahim
Contributor

I am aware of the data race in Badger. I'll send a PR to fix it in both Badger and Dgraph. The issue was introduced in cc3c5cd.

==================
WARNING: DATA RACE
Read at 0x00c00028eed8 by goroutine 20:
  github.com/dgraph-io/badger/v2/table.(*Table).DoesNotHave()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/table.go:515 +0x3f8
  github.com/dgraph-io/badger/v2.(*IteratorOptions).pickTables()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:407 +0x29e
  github.com/dgraph-io/badger/v2.(*levelHandler).appendIterators()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/level_handler.go:308 +0x38e
  github.com/dgraph-io/badger/v2.(*levelsController).appendIterators()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/levels.go:1025 +0xcc
  github.com/dgraph-io/badger/v2.(*Txn).NewIterator()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:468 +0x4d3
  github.com/dgraph-io/badger/v2.(*Txn).NewKeyIterator()
      /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:489 +0xfb
  github.com/dgraph-io/dgraph/posting.getNew()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/mvcc.go:345 +0x20c
  github.com/dgraph-io/dgraph/posting.(*LocalCache).getInternal()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/lists.go:208 +0x4d6
  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func1()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/posting/lists.go:241 +0x416
  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func2()
      /home/aman/gocode/src/github.com/dgraph-io/dgraph/worker/task.go:811 +0x47
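
(For readers unfamiliar with this class of bug, the sketch below shows the general lazy-initialization pattern behind such a race and one way to guard it. All names are hypothetical; this is not Badger's actual table code.)

// Illustrative sketch of a lazy-initialization data race and one way to guard
// it. All names are hypothetical; this is not Badger's implementation.
package main

import "sync"

type bloomFilter struct{ bits []byte }

func loadBloom() *bloomFilter { return &bloomFilter{bits: make([]byte, 64)} }

// mayContain is a placeholder membership check.
func (b *bloomFilter) mayContain(hash uint64) bool { return true }

type table struct {
	mu sync.Mutex
	bf *bloomFilter // loaded lazily when blooms are not loaded at open time
}

// Racy shape: concurrent callers read and write t.bf without synchronization,
// which is the kind of access the race detector reports above.
func (t *table) doesNotHaveRacy(hash uint64) bool {
	if t.bf == nil {
		t.bf = loadBloom()
	}
	return !t.bf.mayContain(hash)
}

// Guarded shape: the lazy load happens under a mutex (sync.Once also works).
func (t *table) doesNotHave(hash uint64) bool {
	t.mu.Lock()
	if t.bf == nil {
		t.bf = loadBloom()
	}
	bf := t.bf
	t.mu.Unlock()
	return !bf.mayContain(hash)
}

func main() {
	t := &table{}
	_ = t.doesNotHaveRacy(42)
	_ = t.doesNotHave(42)
}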

@mangalaman93
Member

Now that I think of it, this data race may not have anything to do with this issue, though I observed it while debugging.

jarifibrahim pushed a commit to dgraph-io/badger that referenced this issue Apr 8, 2020
There is a race condition in the `t.bf` variable if `table.LoadBloomsOnOpen` is set to `false`.
This issue was also seen in hypermodeinc/dgraph#4689 (comment)
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
There is a race condition in the `t.bf` variable if `table.LoadBloomsOnOpen` is set to `false`.
This issue was also seen in hypermodeinc/dgraph#4689 (comment)