Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make export namespace aware #7377

Merged
merged 22 commits into from
Feb 4, 2021
Merged

Conversation

NamanJain8
Copy link
Contributor

@NamanJain8 NamanJain8 commented Jan 28, 2021

This PR makes export and backup/restore to work.
Export
Export will be namespace-aware with each RDF containing the info about the namespace.
In NQuads, I have kept that in the label.

RDF format:

<0x1> <dgraph.type> "dgraph.type.cors"^^<xs:string> <0x0> .
<0x2> <person_name> "alice berry" <0x0> .
...
<0x9> <person_name> "bob berry" <0x12> .

JSON format:

{"uid":"0x7532","namespace":"0x0","person_name":"bob berry"},

Schema Format:

[0x0] <dgraph.cors>:[string] @index(exact) @upsert . 
[0x0] <dgraph.type>:[string] @index(exact) . 
[0x0] <person_name>:default . 
...
[0x0] <dgraph.graphql.schema_created_at>:datetime . 
[0x12] <person_name>:default . 
...
[0x13] <dgraph.cors>:[string] @index(exact) @upsert . 
[0x13] <dgraph.type>:[string] @index(exact) . 
[0x13] <person_name>:default . 
...
[0x14] <dgraph.graphql.p_sha256hash>:string @index(exact) . 
[0x14] <dgraph.graphql.schema_history>:string . 
[0x14] <dgraph.graphql.schema_created_at>:datetime . 
[0x0] type <dgraph.graphql> {
	dgraph.graphql.schema
	dgraph.graphql.xid
}
[0x0] type <dgraph.type.cors> {
	dgraph.cors
}
...

Backup
We can't figure out if the backups taken on an older version (v20.11), or on 20.03.
Either we ask the user to specify that and trust him, else we add a field in the manifest
about the version the backup was taken on.
The backup taken on v20.11, when restored, goes into the default namespace.


This change is Reviewable

ee/backup/run.go Outdated
@@ -294,6 +296,8 @@ func initExportBackup() {
"The folder to which export the backups.")
flag.StringVarP(&opt.format, "format", "f", "rdf",
"The format of the export output. Accepts a value of either rdf or json")
flag.Uint64VarP(&opt.namespace, "namespace", "n", math.MaxUint64, "The namespace for which the"+
"backup has to be exported. By default, it will export all namespaces.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't verified the functioning of export_backup command.

worker/backup_common.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got some comments. I still have to review some code.

dgraph/cmd/bulk/loader.go Outdated Show resolved Hide resolved
edgraph/server.go Show resolved Hide resolved
edgraph/server.go Outdated Show resolved Hide resolved
edgraph/server.go Outdated Show resolved Hide resolved
ee/backup/run.go Outdated Show resolved Hide resolved
worker/backup_common.go Outdated Show resolved Hide resolved
worker/export.go Show resolved Hide resolved
worker/export.go Show resolved Hide resolved
worker/export.go Show resolved Hide resolved
worker/export.go Show resolved Hide resolved
worker/export.go Outdated Show resolved Hide resolved
// Parse parses a schema string and returns the schema representation for it.
func Parse(s string) (*ParsedSchema, error) {
// parse parses a schema string and returns the schema representation for it.
func parse(s string, namespace uint64) (*ParsedSchema, error) {
Copy link
Contributor Author

@NamanJain8 NamanJain8 Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example:
Schema 1:

[ns1] name: string .
[ns2] age: string .

parse(schema, 0) --> All the schema fields go to namespace 0.
parse(schema, x) --> All the schema fields go to namespace x.
parse(schema, math.MaxUint64) --> name (ns1), age(ns2) // Preserve the namespace

Schema 2:

name: string .
age: string .

parse(schema, 0) --> All the schema fields go to namespace 0.
parse(schema, x) --> All the schema fields go to namespace x.
parse(schema, math.MaxUint64) --> All the schema fields go to namespace 0.

This function can be modified if we want to enforce that the schema should contain a namespace. Though not a big issue, but I think allowing schema without namespace will be helpful with data loading.
If we want the user to give either consistent schema (all fields should have namespace or none should), then we can make this code section simpler.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Remove the version stuff by using an extra field for namespace in backup key proto. Fix the lexer to spit and parse hex namespaces, but still work with int namespaces.

Rest of it looks good.

Reviewed 17 of 38 files at r3, 7 of 13 files at r4, 3 of 4 files at r5, 2 of 2 files at r7.
Reviewable status: all files reviewed, 31 unresolved discussions (waiting on @ahsanbarkati, @danielmai, @jarifibrahim, @martinmr, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


edgraph/server.go, line 117 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

But don't you need to wait for the indexing to complete?

I think types need to be replicated to all the groups, if I'm not wrong. But, we could change the behavior by having types lie only on group 1, and having those streamed out to other groups, like we do with ACL.xn


ee/backup/run.go, line 318 at r7 (raw file):

	flag.StringVarP(&opt.format, "format", "f", "rdf",
		"The format of the export output. Accepts a value of either rdf or json")
	flag.Uint64VarP(&opt.namespace, "namespace", "n", math.MaxUint64, "The namespace for which the"+

Backup can only backup every namespace. Not a specific namespace. So, no need for this flag.


ee/backup/run.go, line 330 at r7 (raw file):

	exporter := worker.BackupExporter{}
	return exporter.ExportBackup(opt.location, opt.destination, opt.format, opt.namespace, opt.key)

Instead of having 5 arguments, just send opt itself.


posting/index.go, line 579 at r7 (raw file):

		WithEncryptionKey(x.WorkerConfig.EncryptionKey).
		WithLoggingLevel(badger.WARNING).
		WithNamespaceOffset(1)

Don't need it here?


schema/parse.go, line 417 at r7 (raw file):

		return 0, errors.Errorf("Unable to peek: %v", err)
	}
	if nextItems[0].Typ != itemNumber || nextItems[1].Typ != itemRightSquare {

can itemNumber identify hex?

"0xabc" should be identified as a number. Use hex.


schema/parse.go, line 461 at r7 (raw file):

// parse parses a schema string and returns the schema representation for it.
func parse(s string, namespace uint64) (*ParsedSchema, error) {
	defaultNs := x.DefaultNamespace

Add a comment for the roles that namespace and defaultNs play. It's not clear why are there two different variables.


schema/state.go, line 113 at r7 (raw file):

}

func lexNumber(l *lex.Lexer) lex.StateFn {

Make this also parse hex.


worker/export.go, line 290 at r7 (raw file):

func toSchema(attr string, update *pb.SchemaUpdate) *bpb.KV {
	// bytes.Buffer never returns error for any of the writes. So, we don't need to check them.
	ns, attr := x.ParseNamespaceAttr(attr)

We should have a separate field in the backup key proto, which contains the namespace. It would be zero for older backups. And it would have a legit value for newer backups. So, it would work naturally.


worker/export.go, line 813 at r7 (raw file):

			case x.ByteSchema:
				if !skipZero {
					if servesTablet, err := groups().ServesTablet(pk.Attr); err != nil || !servesTablet {

100 chars.


worker/online_restore_ee.go, line 240 at r7 (raw file):

	// Version is 0 if the backup was taken on an old version (v20.11).
	if lastManifest.Version == 0 {

Remove all the version stuff.x


worker/sort.go, line 61 at r7 (raw file):

		return &emptySortResult, err
	} else if gid == 0 {
		return &emptySortResult, errors.Errorf("Cannot sort by unknown attribute %s", x.ParseAttr(q.Order[0].Attr))

100 chars.


worker/sort.go, line 65 at r7 (raw file):

	if span := otrace.FromContext(ctx); span != nil {
		span.Annotatef(nil, "worker.SortOverNetwork. Attr: %s. Group: %d", x.ParseAttr(q.Order[0].Attr), gid)

and here.


x/keys.go, line 253 at r7 (raw file):

}

func DefaultSchemaKey(attr string) []byte {

can't we say

SchemaKey(attr string, ns uint64)?


x/keys.go, line 258 at r7 (raw file):

}

func DefaultTypeKey(attr string) []byte {

Move them to test util or something, if possible.


x/x.go, line 134 at r7 (raw file):

	DgraphCostHeader = "Dgraph-TouchedUids"

	DgraphVersion = 2103

don't need all this.

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 38 files at r3.
Reviewable status: 12 of 27 files reviewed, 31 unresolved discussions (waiting on @ahsanbarkati, @danielmai, @jarifibrahim, @manishrjain, @martinmr, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


graphql/admin/export.go, line 99 at r8 (raw file):

	err = json.Unmarshal(inputByts, &input)

	// Export everything if namespace is not specified.

Add a todo to fix this and use jwt later


schema/parse.go, line 287 at r8 (raw file):

				seen[tokenizer.Name()] = true
			} else {
				return errors.Errorf("Duplicate tokenizers present for attr %s", x.ParseAttr(schema.Predicate))

100 chars


worker/export.go, line 865 at r8 (raw file):

		return nil, err
	}
	if err := writeSchema(x.ByteType); err != nil {

call it writePrefix


x/keys.go, line 321 at r8 (raw file):

// SkipSchema returns the first key after all the schema keys.
func (p ParsedKey) SkipSchema() []byte {

Remove these in a separate PR.

Copy link
Contributor Author

@NamanJain8 NamanJain8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 27 files reviewed, 30 unresolved discussions (waiting on @ahsanbarkati, @danielmai, @jarifibrahim, @manishrjain, @martinmr, @NamanJain8, @pawanrawal, and @vvbalaji-dgraph)


edgraph/server.go, line 289 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Remove this.

Done.


edgraph/server.go, line 314 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Remove this as well.

Done.


graphql/admin/admin.go, line 212 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Don't make this mandatory. We can find the namespace from the jwt token.

Will be done once ACL is there.


graphql/admin/export.go, line 55 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Add name next to todo.

Done.


graphql/admin/export.go, line 99 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Add a todo to fix this and use jwt later

Done.


posting/index.go, line 579 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't need it here?

Done.


protos/pb.proto, line 653 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Check ctx for token, if not found, export default namespace.

How would we export all the namespaces?
-> Use the guardian of the galaxy token for this.

Will be done with ACL.


schema/parse.go, line 417 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

can itemNumber identify hex?

"0xabc" should be identified as a number. Use hex.

Done. It can lex hexadecimal as well as integer now.


schema/parse.go, line 461 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment for the roles that namespace and defaultNs play. It's not clear why are there two different variables.

Done.


schema/parse.go, line 287 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

100 chars

Done.


schema/state.go, line 113 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Make this also parse hex.

Done.


worker/export.go, line 595 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Add a comment that this stream is just for the data

Done.


worker/export.go, line 597 at r3 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Get this from token.

Added TODO.


worker/export.go, line 829 at r4 (raw file):

Previously, NamanJain8 (Naman Jain) wrote…

They don't return an error. So removed it from the return values of the function.

Done.


worker/export.go, line 290 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We should have a separate field in the backup key proto, which contains the namespace. It would be zero for older backups. And it would have a legit value for newer backups. So, it would work naturally.

Done.


worker/export.go, line 813 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.


worker/export.go, line 865 at r8 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

call it writePrefix

Done.


worker/online_restore_ee.go, line 240 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove all the version stuff.x

This is needed because for the predicates stored in manifest.Group does not have info regarding the namespace. It will not be prefixed with a namespace in case of older backups.


worker/sort.go, line 61 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.


worker/sort.go, line 65 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

and here.

Done.


x/keys.go, line 258 at r7 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Move them to test util or something, if possible.

Done.

@NamanJain8 NamanJain8 merged commit 597891d into ibrahim/multitenancy Feb 4, 2021
@NamanJain8 NamanJain8 deleted the naman/multitenancy/export branch February 4, 2021 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants