Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bisect.insort should return the index #96571

Closed
Serpens66 opened this issue Sep 5, 2022 · 9 comments
Closed

bisect.insort should return the index #96571

Serpens66 opened this issue Sep 5, 2022 · 9 comments
Assignees
Labels
pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement

Comments

@Serpens66
Copy link

Serpens66 commented Sep 5, 2022

Feature or enhancement

https://docs.python.org/3/library/bisect.html#bisect.insort
bisect.insort (and of course also insort_left/right) should return the index they inserted the "x" in "a".

Pitch

Usecase:
One wants to insort something and also needs the index it was insorted.

Problem:
The current way would be to use "insort" (which finds the index and inserts it, but does not return the index) and afterwards use bisect_right to find the index again. So this is ~double of the work that is needed.
I assume the user could also instead call bisect_right and afterwards insert "x" in "a" themself and return the index, but I'm sure this also would be much slower than the C implementation of bisect.insort.

Solution:
Simply let "insort" return the index it already found. This should be backwards compatible (since it returns None currently) and not cost any extra time. So the usecase would be fullfilled in ~half of the time that is needed right now.

Disadvantage:
I see no harm in doing so, only advantage for everyone who needs this.

Projects I know which would profit:
grantjenks/python-sortedcontainers#195

edit::
I was a bit confused because the python source code has a python and a C version of bisect... But both versions lack the return of the index, so this request is valid for both.

@rhettinger
Copy link
Contributor

rhettinger commented Sep 5, 2022

The norm for Python is for mutating methods is to return None. This is different from the norms in other languages. We don't really want to deviate from our norms without good reason.

The current way would be to use "insort" (which finds the index and inserts it, but does not return the index) and afterwards use bisect_right to find the index again.

That doesn't make sense to me. Why wouldn't you just do the bisection first and then insert?

i = bisect(arr, val)
arr.insert(i, val)           # Or perhaps arr[i:i] = val

I'll send @grantjenks a note to find out more about what he was thinking.

@rhettinger rhettinger self-assigned this Sep 5, 2022
@Serpens66
Copy link
Author

Serpens66 commented Sep 5, 2022

Thank you for your response.

Yes, one can do replace insort with these two steps (like I already wrote, maybe I edited that line after you read it), but without measuring I think I can safely assume that doing this in python is much slower than the C implementation of insort.
...
And the whole purpose of bisect (and using C) is to optimize the insertion process.
So to argue "you can do this in two steps" sounds for me like "why implement something like bisect, you can iterate over every single value in the list and compare it one by one to find the index". So why did python implement bisect and even in C at all?
I think this invalidates your argument about the "two steps".

So the only argument left is "we always return None for mutating methods".
Ok, there is nothing I can do to invalidate this. It is just not efficient and quite sad.

edit:
But I admit I'm no professional programmer, I just need this functionality as fast as possible and thought there is not a single disadvantage in returning the index. But maybe better programmers/grantjenks want to add something to my view..

@rhettinger
Copy link
Contributor

So this is ~double of the work that is needed

If insort was followed by bisect, that would be double work. In contrast, running bisect followed by list.insert (not insort) is not double work. There is no overlap between the two steps. The only extra work is a bit of method call overhead.

@Serpens66
Copy link
Author

So this is ~double of the work that is needed

If insort was followed by bisect, that would be double work. In contrast, running bisect followed by list.insert (not insort) is not double work. There is no overlap between the two steps. The only extra work is a bit of method call overhead.

Yes, but just as I already wrote twice, I assume that it still would be slower than the C implementation of insort.
If list.insert(...) is exactly as fast (or faster) as/than the insertion code used in the C implementation of insort, then everything is fine and no need to change anything.
But I doubt it and I'm sure grantjenks has more insights in regards to performance, so lets wait what he says...

But does it even matter? If everything I write was correct, would this be enough to break the rule to always return None?

@pochmann
Copy link
Contributor

pochmann commented Sep 5, 2022

What if there's code that relies on the None result, for example similar to this one relying on set.add returning None? You'd break that.

Wouldn't it create (and at some point delete) a Python int object, which it doesn't now, and that would make it slower? I.e., hurt everyone who doesn't want the index?

@rhettinger
Copy link
Contributor

If list.insert(...) is exactly as fast (or faster) as/than the insertion code
used in the C implementation of insort, then everything is fine and no need to change anything.

They both share the same underlying code, ins1(). Aside from calling overhead, the speed is identical.

Yes, but just as I already wrote twice, I assume that it still would be slower than the C implementation of insort.

Your mental model seems to be inaccurate. The Python call list.insert() is implemented in C and runs at C speed.

@Serpens66
Copy link
Author

Okay, thank you very much for your responses. You convinced me.
Lets wait a day or two if grantjenks wants to add something. If not I will close this request.

Thank you for your time and explanations.

@hauntsaninja hauntsaninja added the pending The issue will be closed if no feedback is provided label Sep 6, 2022
@matthiasgoergens
Copy link
Contributor

I would suggest, you implement a version of what you like as a third party library. Cython is a really convenient way to write C-extensions, if you need that.

You can call your library 'better-bisect' if you want to.

Look at 'more-itertools' for inspiration.

@grantjenks
Copy link
Contributor

@rhettinger wrote:

That doesn't make sense to me. Why wouldn't you just do the bisection first and then insert?

And indeed, that's what the code in SortedKeyList.add() does (sortedlist.py):

                idx = bisect_right(_keys[pos], key)
                _lists[pos].insert(idx, value)
                _keys[pos].insert(idx, key)

SortedKeyList.add() is a very hot function and shows up in nearly every benchmark and real-world use case. So removing a function call would be helpful. But it is only the function call overhead cost.

@pochmann raises two points:

What if there's code that relies on the None result, for example similar to this one relying on set.add returning None? You'd break that.

That's a good point. I'm not sure it's worth a breaking change.

Wouldn't it create (and at some point delete) a Python int object, which it doesn't now, and that would make it slower? I.e., hurt everyone who doesn't want the index?

Also a good point. And in SortedContainers, maintaining the positional index is much more expensive which is why it's not done by default.

Thanks @Serpens66 for asking here and to all the helpful replies.

@rhettinger rhettinger closed this as not planned Won't fix, can't repro, duplicate, stale Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants