Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up removal from a set #127

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

FrnchFrgg
Copy link

Introduce a new Filter() command to doubly linked lists that can perform several removals in a single O(n) sweep, and use that new method to make set removal O(n) instead of O(n**2)

Julien "_FrnchFrgg_" Rivaud added 2 commits November 22, 2019 23:37
The method removes all elements that do not satisfy the hook, in one go.
This is often faster than calling IndexOf then Remove multiple times.
Instead of using O(n) times an O(n) List.Remove(), use the new
List.Filter() method to keep items that were not deleted.
@FrnchFrgg
Copy link
Author

All tests pass and I introduced a new test for the Filter() method.

@emirpasic
Copy link
Owner

@FrnchFrgg

how is the current implementation O(n**2) as I only see a single loop in https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104

also check out the enumerable functions found in also O(n) AFAICS https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L33

@emirpasic emirpasic closed this Mar 31, 2020
@FrnchFrgg
Copy link
Author

Indeed, I didn't see the Select method in enumerable; it can do the same as Filter (except that my proposed Filter does it in place and thus does not reallocate anything, whereas Select creates a new list).

As for the current implementation, removing a single element in a doubly linked list is O(n). But the API to remove elements from a set accepts multiple elements, and removes each one separately, in a loop. In effect, in means the complexity of set removal is O(m × n) where m is the amount of items to remove and n the set size. Worse comes to worst, m = n/2 and we get O(n**2).

My patch series make that removal O(n) regardless on the amount of elements to remove. It could build up on Select instead, but that would mean the set would reallocate its internal list for each removal, which is wasteful.

@emirpasic
Copy link
Owner

emirpasic commented Mar 31, 2020

@FrnchFrgg

It is common that these filter/select/remove/map (enumerable) functions return a new collection, AFAIK at least in Ruby/Java/Javascript, etc.

You could still use the current Select function to have O(n) removal for multiple indexes given the condition function evaluates this, right? It would create a new collection, but it'd run in O(n).

Now to think about it, I am not sure why I haven't made the Remove(i) a variadic argument function so that it can accept multiple indexes at once Remove(1,3,4,5, ...) etc.. Implementing that wouldn't even be a breaking change :)

ref. https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104

@emirpasic
Copy link
Owner

...really strange as most of the other functions have variadic arguments :/

@emirpasic
Copy link
Owner

Let me reopen this PR to have a reminder to have that implemented for all lists, I would start making this a variadic function and applying it to all three list sub-types:
https://github.com/emirpasic/gods/blob/master/lists/lists.go#L20

@emirpasic emirpasic reopened this Mar 31, 2020
@FrnchFrgg
Copy link
Author

You'd have to first loop through the list to find out (and remember in a toRemove list) all indices of elements that need removal, to then call the remove function with "toRemove...". A lot of allocations that can be avoided. But making remove accept variadic arguments is a good idea nevertheless.

I don't think using Select is a very good idea: the common case is probably removing a single item from the set, and having the set re-create a new list each time is very wasteful.

I agree that Select is good API in general, especially in go where append, etc. return the new slice, but in that case it is not optimal to use it.

@FrnchFrgg
Copy link
Author

The idea behind "Filter" is that it was not index-based, so it would sidestep this issue. With the updated set, checking if an element needs to be filtered out of the doubly linked list is O(1), given its value.

@emirpasic
Copy link
Owner

@FrnchFrgg so same can be achieved with the Select function, right? ...just not in place.

@FrnchFrgg
Copy link
Author

Indeed, the same can be achieved with the Select function. At the very least, there should be a check for the number of items to remove from the set, to avoid doing a complete reallocation for the common case.

The better way would be to change the "Filter" I introduced into an in-place "Select", i.e. reuse the exact same signature, and make it equivalent to list = list.Select(...). In most List structures it would be possible to do so without reallocation, but at worst harnessing the existing Select gives an easy implementation.

@FrnchFrgg
Copy link
Author

Oh I just found out that Select is not yet part of the EnumerableWithIndex interface, it is commented out.

@emirpasic
Copy link
Owner

emirpasic commented Apr 1, 2020

@emirpasic
Copy link
Owner

emirpasic commented Apr 1, 2020

So it's implemented, but I could not enforce it here (some technical issue that I could not resolve without type checking, which I didn't want to do):

https://github.com/emirpasic/gods/blob/master/containers/enumerable.go#L19

In other words you are right, not part of the EnumerableWithIndex, but you can still use Select for lists

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants