
Bug when importing a previously exported dataset (the --new_uids flag avoids this issue). #4996

Closed
MichelDiz opened this issue Mar 21, 2020 · 0 comments · Fixed by #5132
Labels
area/import-export Issues related to data import and export. area/live-loader Issues related to live loading. area/performance Performance related issues. kind/bug Something is broken. status/accepted We accept to investigate/work on it.

Comments

@MichelDiz
Contributor

MichelDiz commented Mar 21, 2020

What version of Dgraph are you using?

v20.03.0-beta.20200320 (local build, due to #4995)

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

macOS Catalina, 32GB of RAM

Steps to reproduce the issue (command/config used to run Dgraph).

1 - Import the dataset.
2 - Export it (JSON and RDF; both tested, with the same result).
3 - Create a new env and reimport the exported dataset.
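For reference, the steps above correspond roughly to the commands below. This is a sketch: the dataset name, ports, and export directory are illustrative, and the export call assumes the HTTP `/admin/export` endpoint available in this version.

```shell
# 1. Load the original dataset with the live loader.
dgraph live -f 1million.rdf.gz -s 1million.schema \
  -a localhost:9080 -z localhost:5080

# 2. Export it (RDF by default; JSON via ?format=json).
curl -s localhost:8080/admin/export

# 3. In a fresh cluster, reimport the exported files.
dgraph live -f export/dgraph.rXXXX.uYYYY/g01.rdf.gz \
  -s export/dgraph.rXXXX.uYYYY/g01.schema.gz \
  -a localhost:9080 -z localhost:5080
```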

At the start, the load goes well. Over the course of a few seconds, however, the number of N-Quads/s drops dramatically to zero, and the loader keeps logging zero N-Quads/s from then on. Even so, the system shows ongoing disk writes at 40 MB/s, but the load never finishes; until the moment I kill the load, the system continues to write at 40 MB/s.

The way I temporarily worked around this was to use the --new_uids flag. With this flag the load was extremely fast, reaching 103,800 N-Quads/s, which was impressive: roughly twice the normal throughput.

The fact that the --new_uids flag avoids the problem makes me confident that it comes from using the predefined UIDs from the export instead of requesting new UIDs.

Expected behaviour and actual result.

Expected

Running transaction with dgraph endpoint: 127.0.0.1:9080

Processing schema file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.schema.gz"
Processed schema file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.schema.gz"

Found 1 data file(s) to process
Processing data file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.rdf.gz"
[13:12:25-0300] Elapsed: 05s Txns: 70 N-Quads: 70000 N-Quads/s [last 5s]: 14000 Aborts: 0
[13:12:30-0300] Elapsed: 10s Txns: 140 N-Quads: 140000 N-Quads/s [last 5s]: 14000 Aborts: 0
[13:12:35-0300] Elapsed: 15s Txns: 236 N-Quads: 236000 N-Quads/s [last 5s]: 19200 Aborts: 0
[13:12:40-0300] Elapsed: 20s Txns: 755 N-Quads: 755000 N-Quads/s [last 5s]: 103800 Aborts: 0
[13:12:45-0300] Elapsed: 25s Txns: 1033 N-Quads: 1032244 N-Quads/s [last 5s]: 55449 Aborts: 0
Number of TXs run            : 1042                                                                 
Number of N-Quads processed  : 1041244
Time spent                   : 28.309227713s
N-Quads processed per second : 37187

Actual

Running transaction with dgraph endpoint: 127.0.0.1:9080

Processing schema file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.schema.gz"
Processed schema file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.schema.gz"

Found 1 data file(s) to process
Processing data file "/Users/micheldiz/Desktop/Test place/dgraph-darwin-amd64/export/dgraph.r2091.u0321.1603/g01.rdf.gz"
[13:05:30-0300] Elapsed: 05s Txns: 73 N-Quads: 73000 N-Quads/s [last 5s]: 14600 Aborts: 0
[13:05:35-0300] Elapsed: 10s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:  5400 Aborts: 0
[13:05:40-0300] Elapsed: 15s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:05:45-0300] Elapsed: 20s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:05:50-0300] Elapsed: 25s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:05:55-0300] Elapsed: 30s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:00-0300] Elapsed: 35s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:05-0300] Elapsed: 40s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:10-0300] Elapsed: 45s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:15-0300] Elapsed: 50s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:20-0300] Elapsed: 55s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:25-0300] Elapsed: 01m00s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:30-0300] Elapsed: 01m05s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:35-0300] Elapsed: 01m10s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:40-0300] Elapsed: 01m15s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:45-0300] Elapsed: 01m20s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:50-0300] Elapsed: 01m25s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:06:55-0300] Elapsed: 01m30s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:00-0300] Elapsed: 01m35s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:05-0300] Elapsed: 01m40s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:10-0300] Elapsed: 01m45s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:15-0300] Elapsed: 01m50s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:20-0300] Elapsed: 01m55s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:25-0300] Elapsed: 02m00s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:07:30-0300] Elapsed: 02m05s Txns: 100 N-Quads: 100000 N-Quads/s [last 5s]:     0 Aborts: 0
@MichelDiz MichelDiz added kind/bug Something is broken. area/performance Performance related issues. status/accepted We accept to investigate/work on it. area/import-export Issues related to data import and export. area/live-loader Issues related to live loading. labels Mar 21, 2020
martinmr added a commit that referenced this issue Apr 7, 2020
The live loader has trouble loading exported data with its existing uids because it makes too many requests for new uids. The current version requests a new uid allocation for every uid greater than the current maximum. In exported data, the uids tend to come in increasing order, which triggers a new uid request for every NQuad.

This PR changes the code to pre-allocate the uids once per batch of NQuads received from the NQuad buffer channel.

Tested with the 1 million movie data set; load times are now similar to the live loader with the --new_uids option enabled.

Fixes #4996
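The effect of the fix can be sketched with a toy counter: treat each uid reservation above the running maximum as one simulated RPC to the allocator, and compare per-NQuad requests against a single reservation per batch. The `allocator` type and its methods below are illustrative, not Dgraph's actual code.

```go
package main

import "fmt"

// allocator stands in for the uid lease service: max is the highest
// uid reserved so far, and calls counts simulated allocation RPCs.
type allocator struct {
	max   uint64
	calls int
}

// reservePerNQuad mimics the old behavior: any uid above the current
// maximum triggers a fresh allocation request.
func (a *allocator) reservePerNQuad(uid uint64) {
	if uid > a.max {
		a.calls++
		a.max = uid
	}
}

// reserveBatch mimics the fix: scan a whole batch once and reserve up
// to the batch maximum with a single request.
func (a *allocator) reserveBatch(uids []uint64) {
	var hi uint64
	for _, u := range uids {
		if u > hi {
			hi = u
		}
	}
	if hi > a.max {
		a.calls++
		a.max = hi
	}
}

func main() {
	// Exported data tends to present uids in increasing order.
	uids := make([]uint64, 1000)
	for i := range uids {
		uids[i] = uint64(i + 1)
	}

	old := &allocator{}
	for _, u := range uids {
		old.reservePerNQuad(u)
	}

	fixed := &allocator{}
	fixed.reserveBatch(uids)

	fmt.Println(old.calls, fixed.calls) // prints "1000 1"
}
```

With increasing uids, the per-NQuad path makes one request per quad, while the batched path collapses the whole batch into a single request, which is why the fix restores --new_uids-level throughput.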
martinmr added a commit that referenced this issue Apr 9, 2020
martinmr added a commit that referenced this issue Apr 9, 2020
dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 18, 2020