Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store namespace in predicate as a hex separated by a hyphen to prevent json marshal issues #8601

Merged
merged 1 commit into from
Feb 3, 2023

Conversation

harshil-goel
Copy link
Contributor

@harshil-goel harshil-goel commented Jan 10, 2023

We used to store predicate as | (pipe | signifies concatenation). We store this as a string. is 8 bytes uint64, which when marshaled to JSON bytes mess up the predicate. This is because for the namespace greater than 127, the UTF-8 encoding might take up several bytes (also if the mapping does not exist, then it replaces it with some other rune). This affects three identified places in Dgraph:

  • Live loader using guardian of galaxy
  • Backup and List Backup
  • Http clients and Ratel
  • Schema and predicate

Fix:
Fix is to have a UTF-8 string when dealing with JSON. A better idea is to use UTF-8 string even for internal operations. Only when we read/write to badger we convert it into the format of the byte.
New Format: - (- is the hyphen literal) is a string "81" in hex

We also update the manifest version after update. This diff takes care that older backups are still compatible and can be used to restore.

@CLAassistant
Copy link

CLAassistant commented Jan 10, 2023

CLA assistant check
All committers have signed the CLA.

@harshil-goel harshil-goel changed the base branch from main to harshil/manifest_tool January 10, 2023 19:45
@github-actions github-actions bot added area/commercial area/testing Testing related issues labels Jan 10, 2023
@harshil-goel harshil-goel changed the title Harshil/manifest bug fix Fix manifest update logic in dgraph after #7810 Jan 10, 2023
@mangalaman93 mangalaman93 added the slash-to-main PRs which bring slash branch on par with main. label Jan 11, 2023
@mangalaman93 mangalaman93 added this to the v23.0.0 milestone Jan 11, 2023
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch from 1680463 to dc1ccb0 Compare January 16, 2023 14:59
@harshil-goel harshil-goel changed the base branch from harshil/manifest_tool to harshil/cp January 16, 2023 15:00
@mangalaman93
Copy link
Member

Do we need this tool now? Can we not just make sure that we can do the upgrade ourselves in the process?

ee/updatemanifest/run.go Outdated Show resolved Hide resolved
worker/backup_ee.go Outdated Show resolved Hide resolved
worker/restore_map.go Outdated Show resolved Hide resolved
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch from dc1ccb0 to 79347e7 Compare January 30, 2023 04:10
@mangalaman93 mangalaman93 changed the base branch from harshil/cp to main January 31, 2023 10:21
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch 3 times, most recently from 8716ada to d954797 Compare February 1, 2023 10:39
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch from d954797 to 813c83c Compare February 1, 2023 10:51
ee/updatemanifest/run.go Outdated Show resolved Hide resolved
systest/backup/filesystem/backup_test.go Outdated Show resolved Hide resolved
worker/backup_manifest.go Outdated Show resolved Hide resolved
@coveralls
Copy link

coveralls commented Feb 1, 2023

Coverage Status

Coverage: 66.687% (-0.2%) from 66.879% when pulling 35103c8 on harshil/manifest_bug_fix into f5f49da on main.

@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch from 813c83c to 2cfd03d Compare February 2, 2023 04:41
@harshil-goel harshil-goel changed the title Fix manifest update logic in dgraph after #7810 Store namespace in predicate as a hex separated by a hyphen to prevent json marshal issues Feb 2, 2023
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch from 2cfd03d to 95b0684 Compare February 2, 2023 04:49
x/keys.go Show resolved Hide resolved
@harshil-goel harshil-goel force-pushed the harshil/manifest_bug_fix branch 3 times, most recently from 014b6fe to eea16be Compare February 3, 2023 09:55
mangalaman93
mangalaman93 previously approved these changes Feb 3, 2023
We used to store predicate as <namespace>|<attribute> (pipe | signifies concatenation). We store this as a string. <namespace> is 8 bytes uint64, which when marshaled to JSON bytes mess up the predicate. This is because for the namespace greater than 127, the UTF-8 encoding might take up several bytes (also if the mapping does not exist, then it replaces it with some other rune). This affects three identified places in Dgraph:

Live loader
Backup and List Backup
Http clients and Ratel
Fix:
Fix is to have a UTF-8 string when dealing with JSON. A better idea is to use UTF-8 string even for internal operations. Only when we read/write to badger we convert it into the format of the byte.
New Format: <anmespace>-<attribute> (- is the hyphen literal)
@harshil-goel harshil-goel merged commit 43f1d8a into main Feb 3, 2023
@harshil-goel harshil-goel deleted the harshil/manifest_bug_fix branch February 3, 2023 18:58
all-seeing-code pushed a commit that referenced this pull request Feb 8, 2023
…t json marshal issues (#8601)

We used to store predicate as <namespace>|<attribute> (pipe | signifies
concatenation). We store this as a string. <namespace> is 8 bytes
uint64, which when marshaled to JSON bytes mess up the predicate. This
is because for the namespace greater than 127, the UTF-8 encoding might
take up several bytes (also if the mapping does not exist, then it
replaces it with some other rune). This affects three identified places
in Dgraph:

- Live loader using guardian of galaxy
- Backup and List Backup
- Http clients and Ratel
- Schema and predicate

Fix:
Fix is to have a UTF-8 string when dealing with JSON. A better idea is
to use UTF-8 string even for internal operations. Only when we
read/write to badger we convert it into the format of the byte.
New Format: <namespace>-<attribute> (- is the hyphen literal)
<namespace> is a string "81" in hex

We also update the manifest version after update. This diff takes care
that older backups are still compatible and can be used to restore.

Contains: 
#7838
#7828
#7825
#7815
#7810
all-seeing-code pushed a commit that referenced this pull request Feb 8, 2023
…t json marshal issues (#8601)

We used to store predicate as <namespace>|<attribute> (pipe | signifies
concatenation). We store this as a string. <namespace> is 8 bytes
uint64, which when marshaled to JSON bytes mess up the predicate. This
is because for the namespace greater than 127, the UTF-8 encoding might
take up several bytes (also if the mapping does not exist, then it
replaces it with some other rune). This affects three identified places
in Dgraph:

- Live loader using guardian of galaxy
- Backup and List Backup
- Http clients and Ratel
- Schema and predicate

Fix:
Fix is to have a UTF-8 string when dealing with JSON. A better idea is
to use UTF-8 string even for internal operations. Only when we
read/write to badger we convert it into the format of the byte.
New Format: <namespace>-<attribute> (- is the hyphen literal)
<namespace> is a string "81" in hex

We also update the manifest version after update. This diff takes care
that older backups are still compatible and can be used to restore.

Contains: 
#7838
#7828
#7825
#7815
#7810
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/testing Testing related issues slash-to-main PRs which bring slash branch on par with main.
Development

Successfully merging this pull request may close these issues.

5 participants