Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds @noconflict directive to prevent conflict detection. #4454

Merged
merged 4 commits into from
Dec 31, 2019

Conversation

arijitAD
Copy link

@arijitAD arijitAD commented Dec 20, 2019

Fixes #4079
This is an experimental feature.
It adds @noconflict directive to prevent conflict detection at the predicate level. This can cause data loss, especially when using count indexes. This is not a recommended directive, but exists to help avoid conflicts for predicates which don't have high correctness requirements.


This change is Reviewable

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r1.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @arijitAD and @manishrjain)


query/query4_test.go, line 279 at r1 (raw file):

}

func TestSchemaUpdateNoConflict(t *testing.T) {

I don't think this test is needed but you could simplify it by having the predicate be in the initial schema (found in common_test.go) and just verifying that the no_conflictflag is assigned to this predicate.


query/query4_test.go, line 307 at r1 (raw file):

Quoted 15 lines of code…
	// Verify queries work as expected.
	q1 = `schema(pred: [name]) { }`
	js = processQueryNoErr(t, q1)
	require.JSONEq(t, `{
		"data": {
			"schema": [{
				"predicate": "name",
				"type": "string",
				"index": true,
				"tokenizer": ["term", "exact", "trigram"],
				"count": true,
				"lang": true
			}]
		}
	}`, js)

why is this querying a different predicate? Is this needed?


query/query4_test.go, line 314 at r1 (raw file):

}

func TestNoConflictQuery(t *testing.T) {

I would also have a test where you send mutations that contains triples for a predicate with the noconflict tag and a predicate without it and verify that conflicts happen.


query/query4_test.go, line 323 at r1 (raw file):

	schema := `
		type node {
		name: string

Some tests are breaking because you are using the name predicate and then adding data without deleting it. The query tests are a bit of a weird case because we're running them all on the same database without clearing the data in between so you get these type of errors.

The easiest way to fix this would be to use a different predicate for the name (e.g node_name).


query/query4_test.go, line 348 at r1 (raw file):

	in := []node{}
	for i := 0; i < 2; i++ {

I'd have more than just two mutations. It's possible (not sure if likely) that the go scheduler will send one of the mutations after the other has been committed, in which case there would be no conflict anyways. More mutations would increase the chance of conflict happening. I'd say 5 is a good number.


schema/parse_test.go, line 272 at r1 (raw file):

	result, err := Parse("age:uid @noconflict .")
	require.NotNil(t, result)
	require.Nil(t, err)

use require.NoError instead


schema/schema.go, line 307 at r1 (raw file):

Quoted 8 lines of code…
func (s *state) HasNoConflict(pred string) bool {
	s.RLock()
	defer s.RUnlock()
	if schema, ok := s.predicate[pred]; ok {
		return schema.NoConflict
	}
	return false
}
func (s *state) HasNoConflict(pred string) bool {
    s.Rlock()
    defer s.RUnlock()
    return s.predicate[pred].GetNoConflict()
}

I think this should work. If pred is not in predicates, go will return a zero value (in this case a nil pointer). And calling GetNoConflict will handle the nil case correctly (in golang, it's safe to call functions on a nil value as long as the function correctly deals with this case).

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manishrjain @arijitAD: Just thought about something that should be considered before merging this. Should other directives (e.g @count) be compatible with @noconflict? I don't see how @count and @noconflict can be used on the same predicate since conflict resolution is most likely needed to maintain accurate counts.

Thoughts?

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @arijitAD and @manishrjain)

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 10 files reviewed, 8 unresolved discussions (waiting on @arijitAD and @manishrjain)


schema/schema.go, line 307 at r2 (raw file):

	s.RLock()
	defer s.RUnlock()
	if schema, ok := s.predicate[pred]; ok {

schema is a package name, can avoid this as a variable

@arijitAD
Copy link
Author

@martinmr when using @noconflict directive, the user is aware that there could be data loss since we are allowing conflicting transactions to proceed. Hence I don't think there is any special handling need for @count directive. We could write about the consequence of using @noconflict in Dgraph docs.

Copy link
Author

@arijitAD arijitAD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 11 files reviewed, 8 unresolved discussions (waiting on @mangalaman93, @manishrjain, and @martinmr)


query/query4_test.go, line 279 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I don't think this test is needed but you could simplify it by having the predicate be in the initial schema (found in common_test.go) and just verifying that the no_conflictflag is assigned to this predicate.

Added a test in common_test.go aswell. This test is to make sure that noconflict is returned in the JSON response when querying the predicate's schema.


query/query4_test.go, line 307 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
	// Verify queries work as expected.
	q1 = `schema(pred: [name]) { }`
	js = processQueryNoErr(t, q1)
	require.JSONEq(t, `{
		"data": {
			"schema": [{
				"predicate": "name",
				"type": "string",
				"index": true,
				"tokenizer": ["term", "exact", "trigram"],
				"count": true,
				"lang": true
			}]
		}
	}`, js)

why is this querying a different predicate? Is this needed?

To check whether the predicate without @noconflict is working as expected.


query/query4_test.go, line 314 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I would also have a test where you send mutations that contains triples for a predicate with the noconflict tag and a predicate without it and verify that conflicts happen.

Done.


query/query4_test.go, line 323 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Some tests are breaking because you are using the name predicate and then adding data without deleting it. The query tests are a bit of a weird case because we're running them all on the same database without clearing the data in between so you get these type of errors.

The easiest way to fix this would be to use a different predicate for the name (e.g node_name).

Done.


query/query4_test.go, line 348 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I'd have more than just two mutations. It's possible (not sure if likely) that the go scheduler will send one of the mutations after the other has been committed, in which case there would be no conflict anyways. More mutations would increase the chance of conflict happening. I'd say 5 is a good number.

Done.


schema/parse_test.go, line 272 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

use require.NoError instead

Done.


schema/schema.go, line 307 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
func (s *state) HasNoConflict(pred string) bool {
	s.RLock()
	defer s.RUnlock()
	if schema, ok := s.predicate[pred]; ok {
		return schema.NoConflict
	}
	return false
}
func (s *state) HasNoConflict(pred string) bool {
    s.Rlock()
    defer s.RUnlock()
    return s.predicate[pred].GetNoConflict()
}

I think this should work. If pred is not in predicates, go will return a zero value (in this case a nil pointer). And calling GetNoConflict will handle the nil case correctly (in golang, it's safe to call functions on a nil value as long as the function correctly deals with this case).

Done.


schema/schema.go, line 307 at r2 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

schema is a package name, can avoid this as a variable

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2, 3 of 4 files at r3.
Reviewable status: 10 of 11 files reviewed, 3 unresolved discussions (waiting on @arijitAD, @mangalaman93, @manishrjain, and @martinmr)


dgraph/cmd/live/batch.go, line 240 at r3 (raw file):

func (l *loader) conflictKeysForNQuad(nq *api.NQuad) ([]uint64, error) {
	keys := make([]uint64, 0)

define keys after the if statement below and return nil in the if statement.

	if _, found := noConflictPreds[nq.Predicate]; found {
		return nil, nil
	}
	keys := make([]uint64, 0)

dgraph/cmd/live/run.go, line 202 at r3 (raw file):

Quoted 8 lines of code…
	noConflictPreds = make(map[string]struct{})
	for _, schemaLine := range strings.Split(string(b), "\n") {
		if !strings.Contains(schemaLine, "@noconflict") {
			continue
		}
		predicate := strings.Fields(schemaLine)[0]
		noConflictPreds[predicate] = struct{}{}
	}

This could potentially cause bugs in certain inputs (e.g "name @noconflict string" would be added to the map but it's not a valid declaration).

Maybe do the following.

  1. Run alter command
  2. Query the schema.
  3. Check which predicates are marked as non-conflict.

Copy link
Author

@arijitAD arijitAD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 11 files reviewed, 3 unresolved discussions (waiting on @mangalaman93, @manishrjain, and @martinmr)


dgraph/cmd/live/batch.go, line 240 at r3 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

define keys after the if statement below and return nil in the if statement.

	if _, found := noConflictPreds[nq.Predicate]; found {
		return nil, nil
	}
	keys := make([]uint64, 0)

Done.


dgraph/cmd/live/run.go, line 202 at r3 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
	noConflictPreds = make(map[string]struct{})
	for _, schemaLine := range strings.Split(string(b), "\n") {
		if !strings.Contains(schemaLine, "@noconflict") {
			continue
		}
		predicate := strings.Fields(schemaLine)[0]
		noConflictPreds[predicate] = struct{}{}
	}

This could potentially cause bugs in certain inputs (e.g "name @noconflict string" would be added to the map but it's not a valid declaration).

Maybe do the following.

  1. Run alter command
  2. Query the schema.
  3. Check which predicates are marked as non-conflict.

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @mangalaman93 and @manishrjain)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Do add a major documentation update, plus PR description explaining that this could cause adverse effect on the correctness of the indices. Also, that this is not a recommended directive, but exists to help avoid conflicts for predicates which don't have high correctness requirements.

Reviewed 5 of 8 files at r1, 4 of 4 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @mangalaman93)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Maybe mark this feature as experimental in the PR description.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @mangalaman93)

@arijitAD arijitAD changed the title Adds @noconflict directive to prevent conflict detection of predicate. Adds @noconflict directive to prevent conflict detection. Dec 31, 2019
@arijitAD arijitAD merged commit a4d6998 into master Dec 31, 2019
@arijitAD arijitAD deleted the arijitAD/noconflict branch December 31, 2019 09:26
arijitAD pushed a commit that referenced this pull request Dec 31, 2019
Adds documentation for the @noconflict directive that was introduced in #4454.
This was referenced Dec 31, 2019
danielmai pushed a commit that referenced this pull request Jan 12, 2020
* Add @noconflict directive to prevent conflict detection of predicate.

* Add handling of noconflict directive in live loader.

* Addressed comments

* Get schema from dgraph alpha.

(cherry picked from commit a4d6998)
arijitAD pushed a commit that referenced this pull request Jan 20, 2020
Adds documentation for the @noconflict directive that was introduced in #4454.
arijitAD pushed a commit that referenced this pull request Jan 24, 2020
* @noconflict directives docs

Adds documentation for the @noconflict directive that was introduced in #4454.
danielmai pushed a commit that referenced this pull request Jan 28, 2020
* @noconflict directives docs

Adds documentation for the @noconflict directive that was introduced in #4454.
danielmai pushed a commit that referenced this pull request Feb 21, 2020
* @noconflict directives docs

Adds documentation for the @noconflict directive that was introduced in #4454.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent write of nodes referencing the same node fails with Dgraph v1.1
4 participants