Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(countindex): Optimize count index (#4666) #5971

Merged
merged 1 commit into from
Jul 13, 2020

Conversation

ashish-goswami
Copy link
Contributor

@ashish-goswami ashish-goswami commented Jul 13, 2020

Fixes: DGRAPH-890

Currently count index is calculated in the following way:

Calculate length of posting list for which mutation is getting performed.
Perform the actual mutation.
Calculate the length of posting list after mutation is performed.
Add count index for new count and delete count index for old count.
Finding length of posting list:
Length of posting list is calculated by merging mutable and immutable layer. Immutable layer is already sorted. We need to sort mutable layer for merging. This sorting turns out to be a costly function and shows up in CPU profile. Since we are calculating length twice, sorting is happening twice.

Optimization:
This PR avoids calculating length twice. It uses length calculated before and edge(for mutation) information, to calculate length after mutation.

Live loader performance comparison:
Master:

Number of TXs run : 21240
Number of N-Quads processed : 21239870
Time spent : 14m13.252187311s
N-Quads processed per second : 24900
This PR:

Number of TXs run : 21240
Number of N-Quads processed : 21239870
Time spent : 12m52.835002981s
N-Quads processed per second : 27512

(cherry picked from commit 1167250)


This change is Reviewable

Docs Preview: Dgraph Preview

Fixes: DGRAPH-890

Currently count index is calculated in the following way:

Calculate length of posting list for which mutation is getting performed.
Perform the actual mutation.
Calculate the length of posting list after mutation is performed.
Add count index for new count and delete count index for old count.
Finding length of posting list:
Length of posting list is calculated by merging mutable and immutable layer. Immutable layer is already sorted. We need to sort mutable layer for merging. This sorting turns out to be a costly function and shows up in CPU profile. Since we are calculating length twice, sorting is happening twice.

Optimization:
This PR avoids calculating length twice. It uses length calculated before and edge(for mutation) information, to calculate length after mutation.

Live loader performance comparison:
Master:

Number of TXs run            : 21240                                                                
Number of N-Quads processed  : 21239870
Time spent                   : 14m13.252187311s
N-Quads processed per second : 24900
This PR:

Number of TXs run            : 21240                                                                
Number of N-Quads processed  : 21239870
Time spent                   : 12m52.835002981s
N-Quads processed per second : 27512

(cherry picked from commit 1167250)
@ashish-goswami ashish-goswami requested review from manishrjain, martinmr and a team as code owners July 13, 2020 17:16
Copy link
Contributor

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on @manishrjain and @martinmr)

@parasssh parasssh merged commit 7922c65 into release/v20.03 Jul 13, 2020
@NamanJain8 NamanJain8 deleted the ashish/v20.03-countindex branch May 20, 2021 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants