-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for JSON export #3309
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 15 of 15 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull)
dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):
return } default:
is this right? It looks like this logic will reply to requests with invalid if "format" is not the first key in the array.
If multiple keys are allowed in form, then the default case should be a continue
statement. If "format" is not found in r.Form
, use the default format.
If only format is allowed in the form, check that r.Form
is either empty or only has that key before the iteration.
Am I missing something?
worker/export.go, line 97 at r1 (raw file):
// trim null character at end trimmed := strings.TrimRight(str.Value.(string), "\x00") buf.WriteString(strconv.Quote(trimmed))
I am guessing you changed from using buffers to just printing the strings directly because it seemed overkill.
Can you just confirm (maybe with Manish) whether there was a specific reason buffers were being used before?
worker/export.go, line 162 at r1 (raw file):
fmt.Fprint(bp, "]") } else { if p.PostingType != pb.Posting_VALUE_LANG {
flip the cases here so that the first branch is
if p.PostingType == pb.Posting_VALUE_LANG {
//
} else {
//
}
worker/export.go, line 172 at r1 (raw file):
// Copying this behavior from RDF exporter, // but how is this not data loss?
leave a TODO to investigate.
For now, I think the export process should inform the user that entries were ignored. Maybe the logs are enough but I sense they might be easily missed.
worker/export.go, line 444 at r1 (raw file):
switch { case pk.IsSchema():
Types are not being handled here. There's a different key type for types that should have its own case. It shouldn't be too different from the schema but feel free to ask me for help.
I suppose import has a similar issue but that might be handled in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull and @martinmr)
dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
is this right? It looks like this logic will reply to requests with invalid if "format" is not the first key in the array.
If multiple keys are allowed in form, then the default case should be a
continue
statement. If "format" is not found inr.Form
, use the default format.If only format is allowed in the form, check that
r.Form
is either empty or only has that key before the iteration.Am I missing something?
You're right. Since "format" is the only valid parameter, a request containing any other key is invalid.
worker/export.go, line 162 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
flip the cases here so that the first branch is
if p.PostingType == pb.Posting_VALUE_LANG { // } else { // } </blockquote></details> Why? Just to to use '==' instead of '!='? I like to put what I think is the more common case first. I don't mind changing it, just curious about the rationale. <!-- Sent from Reviewable.io -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @codexnull)
dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
You're right. Since "format" is the only valid parameter, a request containing any other key is invalid.
Then you should probably check that condition is valid before iterating over the map.
worker/export.go, line 162 at r1 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
Why? Just to to use '==' instead of '!='? I like to put what I think is the more common case first. I don't mind changing it, just curious about the rationale.
I think it make the code a bi more readable because it gives more emphasis that VALUE_LANG are a special case that should be handled differently. It also makes the restriction more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a general look. Overall structure looks OK. I did not take a deep look at the logic, i.e. whether the type conversions are correct or not. Leave it to @martinmr to have a deeper look at that and do the final approval.
Reviewed 13 of 15 files at r1, 4 of 4 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @codexnull and @martinmr)
dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Then you should probably check that condition is valid before iterating over the map.
Given what you're looking for, this way of iteration seems misleading. You could do:
if vals, ok := r.Form["format"]; ok {
// deal with vals.
}
Even if a user sent multiple options, we can just ignore those, instead of being strict about them.
worker/export.go, line 162 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I think it make the code a bi more readable because it gives more emphasis that VALUE_LANG are a special case that should be handled differently. It also makes the restriction more explicit.
Should really be:
if p == REF { } else if p == VALUE_LANG { } else { .. }
worker/export.go, line 120 at r2 (raw file):
// fctToString convert a facet value to a string. func fctToStr(fct *api.Facet) (string, error) {
Maybe call it facetToStr
. In general, not a fan of removing vowels to shorten names.
worker/export.go, line 138 at r2 (raw file):
var buf bytes.Buffer bp := &buf
How about buf := new(bytes.Buffer)? Do we need another variable bp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @codexnull, @manishrjain, and @martinmr)
dgraph/cmd/alpha/admin.go, line 81 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Given what you're looking for, this way of iteration seems misleading. You could do:
if vals, ok := r.Form["format"]; ok { // deal with vals. }
Even if a user sent multiple options, we can just ignore those, instead of being strict about them.
OK, I'll change it.
As a general rule, though, I think it's better to do strict validation (especially of external input) for both security and correctness. I've run into cases where programs ran with the wrong configuration due to a typo in an argument name and, because the invalid name was silently ignored, went undetected in production for weeks or months. But I think the risk here is low.
worker/export.go, line 97 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I am guessing you changed from using buffers to just printing the strings directly because it seemed overkill.
Can you just confirm (maybe with Manish) whether there was a specific reason buffers were being used before?
I was using buf.WriteString(fmt.Sprintf(..)) and Manish suggested using fmt.Fprintf(&buf, ...) instead.
worker/export.go, line 172 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// Copying this behavior from RDF exporter, // but how is this not data loss?
leave a TODO to investigate.
For now, I think the export process should inform the user that entries were ignored. Maybe the logs are enough but I sense they might be easily missed.
Done.
worker/export.go, line 120 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Maybe call it
facetToStr
. In general, not a fan of removing vowels to shorten names.
Done.
worker/export.go, line 138 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
How about buf := new(bytes.Buffer)? Do we need another variable
bp
?
Done. Meant to do that earlier and then forgot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)
worker/export.go, line 444 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Types are not being handled here. There's a different key type for types that should have its own case. It shouldn't be too different from the schema but feel free to ask me for help.
I suppose import has a similar issue but that might be handled in a separate PR.
I am marking this as resolved. The type import/export can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)
worker/export.go, line 162 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Should really be:
if p == REF { } else if p == VALUE_LANG { } else { .. }
The code for the last two cases (p = VALUE | VALUE_LANG) is the same except for one print statement.
* Parse query args in export request url. * Handle import of empty JSON arrays. * Add format field to ExportRequest protobuf message. * Add format arg to ExportOverNetwork(). * Add language tags to JSON live load test. * Add note about JSON exports to documentation. * Add export JSON test. * Refactor out common code from JSON and RDF export tests. * Try to put all export format related settings in one spot. * Replace buf.WriteString()s with fmt.Fprintf()s. * Make toRDF and toJSON methods instead of functions. * Change all buf.Write* to fmt.Fprint* for consistency. * Fix logic bug. * Refactor RDF exporter to share more code with JSON exporter. * Remove unused function. * Add test of request arg handling. * Set HTTP status in response. * Changes suggested by PR review.
Change summary:
This change is