Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Intersect to Narrow Iterate Range and Reduce Memory Allocation #9271

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
64 changes: 54 additions & 10 deletions posting/list.go
Original file line number Diff line number Diff line change
Expand Up @@ -1672,10 +1672,8 @@ func (l *List) Uids(opt ListOptions) (*pb.List, error) {
if opt.First == 0 {
opt.First = math.MaxInt32
}
// Pre-assign length to make it faster.
l.RLock()
// Use approximate length for initial capacity.
res := make([]uint64, 0, l.mutationMap.len()+codec.ApproxLen(l.plist.Pack))

out := &pb.List{}
if l.mutationMap.len() == 0 && opt.Intersect != nil && len(l.plist.Splits) == 0 {
if opt.ReadTs < l.minTs {
Expand All @@ -1687,16 +1685,62 @@ func (l *List) Uids(opt ListOptions) (*pb.List, error) {
return out, nil
}

absFirst := opt.First
if opt.First < 0 {
absFirst = -opt.First
}
preAllowcateLength := min(absFirst, l.mutationMap.len()+codec.ApproxLen(l.plist.Pack))
Copy link
Contributor

@harshil-goel harshil-goel Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a function for it (getting approx len)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the actual length could be way more for some postings (split postings). Lets use the exact length function here. It would only hurt in case of split postings. We can resolve that later.

if opt.Intersect != nil {
preAllowcateLength = min(preAllowcateLength, len(opt.Intersect.Uids))
}
// Pre-assign length to make it faster.
res := make([]uint64, 0, preAllowcateLength)

checkLimit := func() bool {
// We need the last N.
// TODO: This could be optimized by only considering some of the last UidBlocks.
if opt.First < 0 {
if len(res) > -opt.First {
res = res[1:]
}
} else if len(res) > opt.First {
return true
}
return false
}

if opt.Intersect != nil && len(opt.Intersect.Uids) < l.mutationMap.len()+codec.ApproxLen(l.plist.Pack) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same: use exact length.
lets create another function inside list that intersects with the list of uids.

start := 0
if opt.AfterUid > 0 {
start = sort.Search(len(opt.Intersect.Uids), func(i int) bool {
return opt.Intersect.Uids[i] > opt.AfterUid
})
}

for i := start; i < len(opt.Intersect.Uids); i++ {
uid := opt.Intersect.Uids[i]

found, _, err := l.findPosting(opt.ReadTs, uid)
if err != nil {
l.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of manually calling RUnlock() each time a return happens, you could do defer l.RUnlock() at the start of the function for the same effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of manually calling RUnlock() each time a return happens, you could do defer l.RUnlock() at the start of the function for the same effect.

When implementing this code, I also considered making this change, but I referred to the previous implementation and believe it was designed to minimize the time spent holding the read lock. Therefore, I chose not to change this implementation.

return out, errors.Wrapf(err, "While find posting for UIDs")
}
if found {
res = append(res, uid)
if checkLimit() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking for negative first here like this, why not just check the ids in reverse.

break
}
}
}
out.Uids = res
l.RUnlock()
return out, nil
}

err := l.iterate(opt.ReadTs, opt.AfterUid, func(p *pb.Posting) error {
if p.PostingType == pb.Posting_REF {
res = append(res, p.Uid)
if opt.First < 0 {
// We need the last N.
// TODO: This could be optimized by only considering some of the last UidBlocks.
if len(res) > -opt.First {
res = res[1:]
}
} else if len(res) > opt.First {
if checkLimit() {
return ErrStopIteration
}
}
Expand Down
Loading