Cost of each mutation grows as more mutations are in a transaction #3046

Closed
mooncake4132 opened this issue Feb 20, 2019 · 3 comments
Labels
  • area/performance: Performance related issues.
  • kind/enhancement: Something could be better.
  • priority/P1: Serious issue that requires eventual attention (can wait a bit).
  • status/accepted: We accept to investigate/work on it.
  • status/needs-attention: This issue needs more eyes on it; more investigation might be required before accepting/rejecting it.

Comments

@mooncake4132

I originally asked this on slack, but it might be more useful to track it as an issue.

Every few days our application will need to insert up to 3 million predicates (this number may grow) into the database. To assess Dgraph's performance, I wrote the little Python script below to benchmark the time it takes to insert 1000, 10000, 30000, 50000, and 100000 predicates. Results are as follows:

Updated schema in 1.824007272720337 seconds.
Mutating 1000 N-Quads took 0.0899970531463623 seconds.
Mutating 10000 N-Quads took 1.6726512908935547 seconds.
Mutating 30000 N-Quads took 11.846931219100952 seconds.
Mutating 50000 N-Quads took 27.030992031097412 seconds.
Mutating 100000 N-Quads took 111.02126455307007 seconds.

The growth in time is a bit worrying. Why does inserting 100 thousand predicates take roughly 70× as long as inserting 10 thousand?
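For reference, the measured growth is close to quadratic: going from 10000 to 100000 N-Quads (10×) costs about 111.02 / 1.67 ≈ 66× more time, and going from 30000 to 100000 (3.3×) costs about 111.02 / 11.85 ≈ 9.4×, against 3.3² ≈ 11 for exactly quadratic scaling.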

Here's the script:

#!/usr/bin/env python3
import time

import pydgraph


# Connect to the alpha and start from an empty database.
client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)
client.alter(pydgraph.Operation(drop_all=True))

schema = """
test: string @index(fulltext) @lang .
"""
start_time = time.time()
client.alter(pydgraph.Operation(schema=schema))
print('Updated schema in {} seconds.'.format(time.time() - start_time))

for n in (1_000, 10_000, 30_000, 50_000, 100_000):
    # Build n N-Quads and commit them all in a single transaction.
    rdf = '\n'.join('<_:node_{}> <test> "test" .'.format(i) for i in range(n))
    transaction = client.txn()
    start_time = time.time()
    transaction.mutate(set_nquads=rdf, commit_now=True)
    print('Mutating {} N-Quads took {} seconds.'.format(n, time.time() - start_time))

Initially, I thought it was because of the fulltext index, so I also tried without @index(fulltext). Here are the results:

Updated schema in 0.004003763198852539 seconds.
Mutating 1000 N-Quads took 0.07899928092956543 seconds.
Mutating 10000 N-Quads took 1.236546277999878 seconds.
Mutating 30000 N-Quads took 7.040283203125 seconds.
Mutating 50000 N-Quads took 16.69643545150757 seconds.
Mutating 100000 N-Quads took 59.379029989242554 seconds.

It's slightly better, but the time growth is still worrying.

Any guidance is appreciated.

Configurations:

  • Running in Docker on Windows.
  • One Zero and one Alpha.

    Dgraph version   : v1.0.11
    Commit SHA-1     : b2a09c5
    Commit timestamp : 2018-12-17 09:50:56 -0800
    Branch           : HEAD
    Go version       : go1.11.1
@manishrjain manishrjain added the investigate Requires further investigation label Feb 20, 2019
@codexnull
Contributor

Thanks for the report and for providing the test script. We confirmed that transaction time does grow more than linearly with transaction size, and we will dig deeper to find improvements.

In the meantime, we suggest clients use transactions of around 1,000 N-Quads each and rely on concurrency instead to increase throughput.
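
For illustration, here is a minimal sketch of that batched, concurrent approach, adapted from the script above. It assumes the pydgraph client can be shared across worker threads (the underlying gRPC channel is thread-safe), and BATCH_SIZE and WORKERS are hypothetical knobs to tune rather than measured recommendations:

#!/usr/bin/env python3
from concurrent.futures import ThreadPoolExecutor

import pydgraph

BATCH_SIZE = 1000  # transaction size suggested above
WORKERS = 8        # hypothetical concurrency level; tune for your setup

client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

def mutate_batch(rdf):
    # Each batch is its own transaction, committed immediately.
    client.txn().mutate(set_nquads=rdf, commit_now=True)

def batches(n):
    # Split n N-Quads into chunks of BATCH_SIZE.
    for start in range(0, n, BATCH_SIZE):
        yield '\n'.join('<_:node_{}> <test> "test" .'.format(i)
                        for i in range(start, min(start + BATCH_SIZE, n)))

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    list(pool.map(mutate_batch, batches(100_000)))

Note that with commit_now=True each batch commits independently, so a failure in one batch does not roll back the others; an application that needs all-or-nothing semantics would have to handle retries itself.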

@manishrjain manishrjain removed the investigate Requires further investigation label Feb 21, 2019
@mooncake4132
Author

Thanks for confirming. We can definitely split the mutations into different transactions.

I'll let you decide if you want to close this issue or leave it open for tracking.

@campoy campoy added area/performance Performance related issues. and removed optimization labels May 31, 2019
@campoy campoy added status/accepted We accept to investigate/work on it. status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it priority/P2 Somehow important but would not block a release. kind/enhancement Something could be better. labels Sep 13, 2019
@lgalatin lgalatin added priority/P1 Serious issue that requires eventual attention (can wait a bit) and removed priority/P2 Somehow important but would not block a release. labels Apr 6, 2020
@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
