Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live & bulk loader changes #2961

Merged
merged 62 commits into from
Feb 6, 2019
Merged

Conversation

codexnull
Copy link
Contributor

@codexnull codexnull commented Feb 1, 2019

Fixes #2889, #2848, #2927.

Summary of loader changes:

  • Support live loading JSON files

  • Support creating UID on the fly based on one or more other JSON fields in both loaders

  • Support loading RDF or JSON stream instead of requiring files in live loader

  • Auto-detect compressed load data instead of requiring extension in filename in both loaders

  • Auto-detect JSON load data instead of requiring extension in filename in both loaders

  • Lots of refactoring of live and bulk loaders to share code for the same functionality


This change is Reviewable

Javier Alvarado added 30 commits January 11, 2019 16:11
@codexnull codexnull requested a review from a team February 1, 2019 01:50
z/exec.go Outdated
}
}

cmd[i].Start()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of (*os/exec.Cmd).Start is not checked (from errcheck)

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 21 of 21 files at r1.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @codexnull and @golangcibot)


dgraph/cmd/bulk/loader.go, line 40 at r1 (raw file):

	"github.com/dgraph-io/dgraph/x"
	"github.com/dgraph-io/dgraph/xidmap"
	"google.golang.org/grpc"

Why is this line being moved?


dgraph/cmd/bulk/loader.go, line 69 at r1 (raw file):

	shardOutputDirs []string
	keyFields       []string
}

If you are going to have two fields that look the same, there should probably a comment explaining the difference to avoid confusion in the future.

Personally, if I understand the difference, I would call keyFields something like parsedKeyFields.


dgraph/cmd/live/load_json_test.go, line 39 at r1 (raw file):

var testDataDir string
var dg *dgo.Dgraph
var tmpDir string

move these into a var block


dgraph/cmd/live/run.go, line 69 at r1 (raw file):

var Live x.SubCommand

var keyFields []string

also move all these vars into a var block


dgraph/cmd/live/run.go, line 174 at r1 (raw file):

}

// forward file to the RDF or JSON processor as appropriate

change it to "processFile forwards file to the ...."


edgraph/nquads_from_json.go, line 422 at r1 (raw file):

func replaceBlankIds(nqs []*api.NQuad, keyFields *[]string) error {
	field2idx := make(map[string]int)

I think this would be a bit more readable if it was called fieldToIdx


edgraph/nquads_from_json.go, line 450 at r1 (raw file):

	if err == nil {
		if nqs[0].Subject == "_:blank-0" {
			if keyFields == nil {

maybe assert this at the very beginning and exit early if keyFields == nil?


loadfile/chunk.go, line 62 at r1 (raw file):

func (rdfChunker) Begin(r *bufio.Reader) error {
	return nil

Add a comment on why this chunker just returns nil.


loadfile/chunk.go, line 68 at r1 (raw file):

	batch.Grow(1 << 20)
	for lineCount := 0; lineCount < 1e5; lineCount++ {

maybe assign these constants to a variable and explain why these values were chosen.


loadfile/chunk.go, line 122 at r1 (raw file):

func (rdfChunker) End(r *bufio.Reader) error {
	return nil

a comment explaining why this function is empty would also be helpful


loadfile/chunk.go, line 145 at r1 (raw file):

func (jsonChunker) Chunk(r *bufio.Reader) (*bytes.Buffer, error) {
	out := new(bytes.Buffer)
	out.Grow(1 << 20)

Also this constant but I assume since it has the same value as above it represents the same quantity.


z/exec.go, line 67 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

Error return value of (*os/exec.Cmd).Start is not checked (from errcheck)

Fix this


z/json.go, line 91 at r1 (raw file):

func sortJSONArray(a []interface{}) uint64 {
	h := uint64(0)

var h uint64

Copy link
Contributor Author

@codexnull codexnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @golangcibot and @martinmr)


dgraph/cmd/bulk/loader.go, line 40 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Why is this line being moved?

Because the automatic formatting in my editor decided to do it. Sometimes it rearranges imports and I don't always catch it.


dgraph/cmd/bulk/loader.go, line 69 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

If you are going to have two fields that look the same, there should probably a comment explaining the difference to avoid confusion in the future.

Personally, if I understand the difference, I would call keyFields something like parsedKeyFields.

Done.


dgraph/cmd/live/load_json_test.go, line 39 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
var testDataDir string
var dg *dgo.Dgraph
var tmpDir string

move these into a var block

OK. Is that just because it's the preferred Go style?


dgraph/cmd/live/run.go, line 69 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

also move all these vars into a var block

Done.


dgraph/cmd/live/run.go, line 174 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

change it to "processFile forwards file to the ...."

Done.


edgraph/nquads_from_json.go, line 422 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I think this would be a bit more readable if it was called fieldToIdx

Done.


edgraph/nquads_from_json.go, line 450 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

maybe assert this at the very beginning and exit early if keyFields == nil?

It's OK for keyFields to be nil most of the time. Only when a JSON map does not contain a uid field (which is what the "_:blank-0" check is for) is it required.


loadfile/chunk.go, line 62 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Add a comment on why this chunker just returns nil.

Done.


loadfile/chunk.go, line 68 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
	batch.Grow(1 << 20)
	for lineCount := 0; lineCount < 1e5; lineCount++ {

maybe assign these constants to a variable and explain why these values were chosen.

Those values were there already, but my guess is they are arbitrary.


loadfile/chunk.go, line 122 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

a comment explaining why this function is empty would also be helpful

Done.


loadfile/chunk.go, line 145 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Also this constant but I assume since it has the same value as above it represents the same quantity.

Yes, this I just copied for consistency.


z/json.go, line 91 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

var h uint64

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @codexnull, @golangcibot, and @martinmr)


dgraph/cmd/live/load_json_test.go, line 39 at r1 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

OK. Is that just because it's the preferred Go style?

I believe so


loadfile/chunk.go, line 61 at r2 (raw file):

}

// RDF files don't require any special processing at the beginning of the file

period at the end


loadfile/chunk.go, line 122 at r2 (raw file):

}

// RDF files don't require any special processing at the end of the file

period at the end


z/json.go, line 91 at r2 (raw file):

func sortJSONArray(a []interface{}) uint64 {
	h := uint64(0)

var h uint64 here as well

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 21 files at r1, 7 of 7 files at r2.
Reviewable status: all files reviewed, 30 unresolved discussions (waiting on @codexnull, @golangcibot, and @martinmr)


dgraph/cmd/bulk/loader.go, line 204 at r2 (raw file):

			defer thr.Done()

			r, cleanup_fn := x.FileReader(file)

r, cleanup := x.FileReader(file)

defer cleanup()


dgraph/cmd/bulk/mapper.go, line 120 at r2 (raw file):

}

func (m *mapper) run(inputFormat int, keyFields *[]string) {

Slices are pointers by default. No need to pass in a pointer to a slice.

If you're changing the slice, then the Go way is to return the updated slice as a result.


dgraph/cmd/bulk/mapper.go, line 135 at r2 (raw file):

			}

			gqlNqs := make([]gql.NQuad, len(nqs))

What is this for? Are you copying them over for some reason?


dgraph/cmd/bulk/run.go, line 51 at r2 (raw file):

	flag := Bulk.Cmd.Flags()
	flag.StringP("rdfs", "r", "",
		"Location of RDF data to load.")

I think the previous comment was clearer than this one.


dgraph/cmd/bulk/run.go, line 54 at r2 (raw file):

	// would be nice to use -j to match -r, but already used by --num_go_routines
	flag.String("jsons", "",
		"Location of JSON data to load.")

same here. This isn't as clear -- unless you changed how we detect files.


dgraph/cmd/bulk/run.go, line 94 at r2 (raw file):

	flag.String("custom_tokenizers", "",
		"Comma separated list of tokenizer plugins")
	flag.StringP("key", "k", "", "Comma-separated list of JSON fields to identify a uid")

Add (Optional) prefix in the description.

... to identify a node uniquely.


dgraph/cmd/bulk/run.go, line 184 at r2 (raw file):

	for _, f := range strings.Split(opt.KeyFields, ",") {
		opt.parsedKeyFields = append(opt.parsedKeyFields, strings.TrimSpace(f))

We should avoid having 2 fields in opt, which both represent the same information. You can just overwrite opt.KeyFields here.


dgraph/cmd/live/run.go, line 85 at r2 (raw file):

	flag := Live.Cmd.Flags()
	flag.StringP("rdfs", "r", "", "Location of RDF or JSON file(s) to load")

Rename to files, and f. What do you expect the pattern to be? like this:

/*.json.gz? Whatever it is, mention it in the description so it is clear. ___ *[dgraph/cmd/live/run.go, line 179 at r2](https://reviewable.io/reviews/dgraph-io/dgraph/2961#-LXuWoxX2EeASxWdBWgZ:-LXuWoxX2EeASxWdBWg_:bvbuzff) ([raw file](https://github.com/dgraph-io/dgraph/blob/116a7943ec9e9618ea97d5f93a0eaad25152b3d6/dgraph/cmd/live/run.go#L179)):* > ```Go > > rd, cleanup_fn := x.FileReader(file) > defer cleanup_fn() > ```

Go uses camelcase. But, as mentioned above, no need for _fn. Just call it cleanup. Similar to how context.WithCancel works.


dgraph/cmd/live/run.go, line 189 at r2 (raw file):

	} else {
		isJson, err = x.IsJSONData(rd)
		if isJson {

Shouldn't you check for error first? But, also, why try to guess? Just complain and crash out. Users' responsibility to give us the right files.


dgraph/cmd/live/run.go, line 199 at r2 (raw file):

}

func (l *loader) nextNquads(batch []*api.NQuad, nqs []*api.NQuad) []*api.NQuad {

Typically these things just get inlined into the loops, instead of being funcs on their own.


dgraph/cmd/live/run.go, line 211 at r2 (raw file):

		mu := api.Mutation{Set: batch[:opt.batchSize]}
		l.reqs <- mu
		batch = batch[opt.batchSize:]

I smell some funkiness here. The way batch is being passed around doesn't look good. Yeah, just inline this in the caller.

Also, after passing batch to a channel, create a new batch, don't modify it anymore. This particular case might just work, but in general, a recipe for disaster.


dgraph/cmd/live/run.go, line 217 at r2 (raw file):

}

func (l *loader) finalNquads(batch []*api.NQuad) {

Inline this func.


dgraph/cmd/live/run.go, line 253 at r2 (raw file):

}

func (l *loader) processJsonFile(ctx context.Context, rd *bufio.Reader) error {

Not a fan of 2-line functions. Rather inline it and avoid the code jumps required to read this code.


dgraph/cmd/live/run.go, line 258 at r2 (raw file):

}

func (l *loader) processRdfFile(ctx context.Context, rd *bufio.Reader) error {

And here.


dgraph/cmd/live/run.go, line 325 at r2 (raw file):

	for _, f := range strings.Split(opt.keyFields, ",") {
		keyFields = append(keyFields, strings.TrimSpace(f))

Some other place, this was being set back into the opt? (looks like in bulk) But comments there apply here as well. Overwrite opt.keyFields, decrease a variable.


edgraph/nquads_from_json.go, line 437 at r2 (raw file):

	for _, nq := range nqs {
		nq.Subject = str

This won't work if another node is pointing to this node. The ObjectId which was a variable before would have been correctly set by xidToUid map, but now the subjects get changed, leaving those pointers dangling.


loadfile/chunk.go, line 62 at r1 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

Done.

I don't see the changes required by Martin.


loadfile/chunk.go, line 17 at r2 (raw file):

 */

package loadfile

package chunker.

Also, move rdf parsing within this package.


loadfile/chunk.go, line 214 at r2 (raw file):

	}

	nqs, err := edgraph.JsonToNquads(chunkBuf.Bytes(), keyFields)

Move it to this package.


x/file.go, line 105 at r2 (raw file):

// and decompressed automatically even without the gz extension. The caller is responsible for
// calling the returned cleanup function when done with the reader.
func FileReader(file string) (rd *bufio.Reader, cleanup func()) {

Move this to chunker pkg.


z/client.go, line 2 at r2 (raw file):

/*
 * Copyright 2017-2018 Dgraph Labs, Inc. and Contributors

2019


z/exec.go, line 2 at r2 (raw file):

/*
 * Copyright 2017-2018 Dgraph Labs, Inc. and Contributors

2019


z/json.go, line 2 at r2 (raw file):

/*
 * Copyright 2017-2018 Dgraph Labs, Inc. and Contributors

2019

@@ -139,7 +140,7 @@ func (m *mapper) run(inputFormat int) {
}
}

m.processNQuad(nq)
m.processNQuad(gql.NQuad{nq})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gql.NQuad composite literal uses unkeyed fields (from govet)

Copy link
Contributor Author

@codexnull codexnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 21 files reviewed, 31 unresolved discussions (waiting on @codexnull, @golangcibot, @manishrjain, and @martinmr)


dgraph/cmd/bulk/loader.go, line 204 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

r, cleanup := x.FileReader(file)

defer cleanup()

Done.


dgraph/cmd/bulk/mapper.go, line 120 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Slices are pointers by default. No need to pass in a pointer to a slice.

If you're changing the slice, then the Go way is to return the updated slice as a result.

Done.


dgraph/cmd/bulk/mapper.go, line 135 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

What is this for? Are you copying them over for some reason?

chunker.Parse() returns a slice of *api.NQuad, but the call to mapper.processNquad() below needs a gql.NQuad, so this code was converting from a slice of *api.NQuad to a slice of gql.NQuad. Looking at the code again, though, it's not necessary to create an array of gql.NQuad; each one can be just converted at the mapper.processNquad() invocation.


dgraph/cmd/bulk/mapper.go, line 143 at r3 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

gql.NQuad composite literal uses unkeyed fields (from govet)

Done.


dgraph/cmd/bulk/run.go, line 51 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I think the previous comment was clearer than this one.

The reason I changed the wording is because for the live loader, the argument isn't necessarily a directory. It may be the rdf file directly, a comma-separated list of file, or a directory containing the rdf files. I changed the bulk loader to behave the same way as the live loader.


dgraph/cmd/bulk/run.go, line 54 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

same here. This isn't as clear -- unless you changed how we detect files.

The argument here may also be a file, a list of files, or a directory.


dgraph/cmd/bulk/run.go, line 94 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add (Optional) prefix in the description.

... to identify a node uniquely.

Aren't all options optional? :)

Removing the --key option since the feature requires more thought.


dgraph/cmd/bulk/run.go, line 184 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We should avoid having 2 fields in opt, which both represent the same information. You can just overwrite opt.KeyFields here.

I don't think we can do that given that opt.KeyFields is the string given as the option argument (e.g. "last,first,middle") and opt.parsedKeyFields is a slice of the individual fields (e.g. ["last","first","middle"]), but I'll see if I can move parsedKeyFields out of opt to some other (non-global variable) spot.


dgraph/cmd/live/run.go, line 85 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Rename to files, and f. What do you expect the pattern to be? like this:

/*.json.gz? Whatever it is, mention it in the description so it is clear.

I kept --rdfs because I didn't want to break backward compatibility. Should we keep it and deprecate it? I will change the option to --files in the bulk loader so that they are consistent.


dgraph/cmd/live/run.go, line 179 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Go uses camelcase. But, as mentioned above, no need for _fn. Just call it cleanup. Similar to how context.WithCancel works.

Done.


dgraph/cmd/live/run.go, line 189 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Shouldn't you check for error first? But, also, why try to guess? Just complain and crash out. Users' responsibility to give us the right files.

I check isJson first because it can only be true if err is nil and I wanted to let errors fall through to the return at the end of the function, but I'll change it to check err since that seems to be the preferred go idiom.

As for detecting JSON data, the reason is so that the user can do something like "curl | dgraph live ..." to stream data in (that is, there's no filename to check for an extension). Would it be better to make the user specify something like --format=json instead?


dgraph/cmd/live/run.go, line 199 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Typically these things just get inlined into the loops, instead of being funcs on their own.

OK


dgraph/cmd/live/run.go, line 211 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I smell some funkiness here. The way batch is being passed around doesn't look good. Yeah, just inline this in the caller.

Also, after passing batch to a channel, create a new batch, don't modify it anymore. This particular case might just work, but in general, a recipe for disaster.

batch is not a mutation, it's just a regular slice that holds a queue of nquads. When there are batchSize or more nquads in the queue, then the mutation is created, the leading batchSize items are copied to the mutation and removed from the queue, and the mutation is sent off through the channel and forgotten about. I think the queue will be unnecessary when I inline the functions


dgraph/cmd/live/run.go, line 253 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Not a fan of 2-line functions. Rather inline it and avoid the code jumps required to read this code.

Done.


dgraph/cmd/live/run.go, line 258 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

And here.

Done.


dgraph/cmd/live/run.go, line 325 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Some other place, this was being set back into the opt? (looks like in bulk) But comments there apply here as well. Overwrite opt.keyFields, decrease a variable.

Yes, the bulk and live loaders have different opt structures. Removing this feature since it requires more thought.


edgraph/nquads_from_json.go, line 437 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This won't work if another node is pointing to this node. The ObjectId which was a variable before would have been correctly set by xidToUid map, but now the subjects get changed, leaving those pointers dangling.

Removing this feature since it requires more thought.


loadfile/chunk.go, line 61 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

period at the end

Done.


loadfile/chunk.go, line 122 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

period at the end

Done.


z/exec.go, line 67 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Fix this

Done.


z/json.go, line 91 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

var h uint64 here as well

Done.

@@ -31,6 +31,7 @@ import (
"sync/atomic"

"github.com/dgraph-io/dgo/protos/api"
"github.com/dgraph-io/dgraph/chunker"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is not goimports-ed (from goimports)

Copy link
Contributor Author

@codexnull codexnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 30 files reviewed, 32 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


dgraph/cmd/bulk/mapper.go, line 135 at r2 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

chunker.Parse() returns a slice of *api.NQuad, but the call to mapper.processNquad() below needs a gql.NQuad, so this code was converting from a slice of *api.NQuad to a slice of gql.NQuad. Looking at the code again, though, it's not necessary to create an array of gql.NQuad; each one can be just converted at the mapper.processNquad() invocation.

Done.


dgraph/cmd/bulk/mapper.go, line 34 at r4 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

File is not goimports-ed (from goimports)

Done.


dgraph/cmd/bulk/run.go, line 184 at r2 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

I don't think we can do that given that opt.KeyFields is the string given as the option argument (e.g. "last,first,middle") and opt.parsedKeyFields is a slice of the individual fields (e.g. ["last","first","middle"]), but I'll see if I can move parsedKeyFields out of opt to some other (non-global variable) spot.

Removing the --key option since the feature requires more thought.


dgraph/cmd/live/run.go, line 85 at r2 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

I kept --rdfs because I didn't want to break backward compatibility. Should we keep it and deprecate it? I will change the option to --files in the bulk loader so that they are consistent.

Done.


dgraph/cmd/live/run.go, line 217 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Inline this func.

Done.


x/file.go, line 105 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move this to chunker pkg.

Done.


z/client.go, line 2 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

2019

Done.


z/exec.go, line 2 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

2019

Done.


z/json.go, line 2 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

2019

Done.


loadfile/chunk.go, line 17 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

package chunker.

Also, move rdf parsing within this package.

Done.


loadfile/chunk.go, line 214 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move it to this package.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Made and pushed a small change to fix an issue. Good to go.

Reviewed 3 of 9 files at r3, 8 of 15 files at r4, 12 of 13 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @codexnull, @golangcibot, @manishrjain, and @martinmr)


dgraph/cmd/live/run.go, line 189 at r2 (raw file):

Previously, codexnull (Javier Alvarado) wrote…

I check isJson first because it can only be true if err is nil and I wanted to let errors fall through to the return at the end of the function, but I'll change it to check err since that seems to be the preferred go idiom.

As for detecting JSON data, the reason is so that the user can do something like "curl | dgraph live ..." to stream data in (that is, there's no filename to check for an extension). Would it be better to make the user specify something like --format=json instead?

OK.


dgraph/cmd/live/run.go, line 221 at r5 (raw file):

			batch = append(batch, nqs...)
			for len(batch) >= opt.batchSize {
				mu := api.Mutation{Set: batch[:opt.batchSize]}

send the whole batch out, and create a new batch. Don't modify a slice after its ownership has changed. I've pushed a change to this


edgraph/server.go, line 35 at r5 (raw file):

	"github.com/dgraph-io/dgo/y"

	nqjson "github.com/dgraph-io/dgraph/chunker/json"

cjson? short of chunker/json?

@codexnull codexnull merged commit bcc7796 into master Feb 6, 2019
@codexnull codexnull deleted the javier/issue2889_live_load_json branch March 1, 2019 22:53
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Fixes hypermodeinc#2889, hypermodeinc#2927.

Summary of loader changes:

    * Support live loading JSON files

    * Support loading RDF or JSON stream instead of requiring files in live loader

    * Auto-detect compressed load data instead of requiring extension in filename in both loaders

    * Auto-detect JSON load data instead of requiring extension in filename in both loaders

    * Lots of refactoring of live and bulk loaders to share code for the same functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants