Use vectorcall (where possible) when calling Python functions #4456

ChayimFriedman2 · 2024-08-19T18:36:11Z

This works without any changes to user code.

It works by adding methods to `IntoPy` for calling functions, and specializing them for tuples.

For methods, this currently supports only calls without kwargs; for functions, kwargs are supported through a somewhat slow approach (converting from a PyDict). This can be improved, but doing so will require additional API.

We may consider adding more specialized IntoPy<Py<PyTuple>> impls (for example, for arrays and Vec), but this is a good start.
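To make the mechanism concrete, here is a minimal sketch in plain Rust, without PyO3. It assumes a simplified model in which a "callable" takes a slice of integers; the names `IntoArgs`, `CallPath`, `Callable`, `call` and `sum` are hypothetical and are not the names used in this PR. The point is only the shape: a trait method with a default (slow) body that tuples override with a cheaper one, while the caller's code stays the same.

```rust
// Hypothetical model, not PyO3's real API: a "callable" takes its arguments
// as a slice, loosely mirroring how vectorcall receives a C array.
type Callable = fn(&[i64]) -> i64;

/// Records which path a call took, so the dispatch is observable.
#[derive(Debug, PartialEq)]
enum CallPath {
    /// Arguments were first packed into a heap container.
    Packed,
    /// Arguments were passed straight from a stack array.
    Direct,
}

trait IntoArgs: Sized {
    /// Generic conversion: always available, but allocates.
    fn into_packed(self) -> Vec<i64>;

    /// Call helper whose default goes through the packed form.
    /// Implementors may override it with something cheaper.
    fn call_with(self, f: Callable) -> (CallPath, i64) {
        let packed = self.into_packed();
        (CallPath::Packed, f(&packed))
    }
}

// A Vec keeps the default helper.
impl IntoArgs for Vec<i64> {
    fn into_packed(self) -> Vec<i64> {
        self
    }
}

// A tuple overrides the helper and skips the intermediate container.
impl IntoArgs for (i64, i64) {
    fn into_packed(self) -> Vec<i64> {
        vec![self.0, self.1]
    }

    fn call_with(self, f: Callable) -> (CallPath, i64) {
        let args = [self.0, self.1];
        (CallPath::Direct, f(&args))
    }
}

/// The entry point is the same for every argument type; only the dispatch differs.
fn call<A: IntoArgs>(f: Callable, args: A) -> (CallPath, i64) {
    args.call_with(f)
}

/// Example callable: adds up its arguments.
fn sum(args: &[i64]) -> i64 {
    args.iter().sum()
}
```

In this model, `call(sum, (1_i64, 2_i64))` takes the `Direct` path and `call(sum, vec![1_i64, 2, 3])` takes the `Packed` path, with no change at the call site.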

What should I put in the newsfragment? There is no `perf` category.

davidhewitt

Thanks for this, this is a nice win. `changed` is the right choice of newsfragment 👍

Given the upcoming changes for the IntoPyObject trait, I think we need to adjust slightly how this is implemented; we might as well figure that out in this PR.

Aside from that, I have just a few suggestions which might help with clarity for myself and future readers :)

It would be nice to also be able to use vectorcall for keyword arguments, though that needs a new API designed as per #4414 or similar. Can leave for the future.

davidhewitt · 2024-08-19T22:09:40Z

src/conversion.rs

+    // The following methods are helpers to use the vectorcall API where possible.
+    // They are overridden on tuples to perform a vectorcall.
+    // Be careful when you're implementing these: they can never refer to `Bound` call methods,
+    // as those refer to these methods, so this will create an infinite recursion.
+    #[doc(hidden)]
+    #[inline]
+    fn __py_call_vectorcall1<'py>(


Rather than adding to this trait, we should look at its upcoming replacement IntoPyObject and consider how to slot these methods on there or a companion trait. We need to migrate the IntoPy<Py<PyTuple>> bound on the .call functions anyway, so this is a good time to bring this up cc @Icxolu.

I don't know what you expect the new trait(s) to look like, but it shouldn't be hard to migrate. I believe it is out of scope for this PR though.

Also, I think IntoPyObject has a worse API (in some respects): it cannot convert one Rust type to multiple Python types, which can especially hurt calls (for example, because it prevents supporting calling with arrays or Vec without an inefficient conversion).

This has advantages (fewer type annotations), but I think these traits can coexist (with calls using IntoPy).
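The trade-off described here is the usual one between a trait with a generic target parameter and a trait with an associated target type. The sketch below shows it in plain Rust; `ConvertInto`, `ConvertObject`, `List` and `Tuple` are hypothetical stand-ins, not PyO3 types.

```rust
// Hypothetical stand-ins for two different Python target types.
#[derive(Debug, PartialEq)]
struct List(Vec<i64>);

#[derive(Debug, PartialEq)]
struct Tuple(Vec<i64>);

/// Generic-parameter style (the shape of `IntoPy<T>`): one Rust type may
/// have several impls, one per target, so callers must name the target.
trait ConvertInto<T> {
    fn convert(self) -> T;
}

impl ConvertInto<List> for Vec<i64> {
    fn convert(self) -> List {
        List(self)
    }
}

impl ConvertInto<Tuple> for Vec<i64> {
    fn convert(self) -> Tuple {
        Tuple(self)
    }
}

/// Associated-type style (the shape of `IntoPyObject`): exactly one target
/// per Rust type, so no annotation is needed, but no second target is possible.
trait ConvertObject {
    type Target;
    fn convert_object(self) -> Self::Target;
}

impl ConvertObject for Vec<i64> {
    type Target = List;

    fn convert_object(self) -> List {
        List(self)
    }
}
```

With the first trait, `let t: Tuple = vec![1_i64, 2].convert();` needs the annotation to pick an impl. With the second, `vec![1_i64, 2].convert_object()` needs none, but a second impl targeting `Tuple` would be rejected as a conflicting implementation.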

Hey, thanks for the ping David! I have to say upfront I'm not really familiar with these different calling conversions.

Also, I think IntoPyObject has a worse API (in some respects): it cannot convert one Rust type to multiple Python types,

Together with fallibility, I would consider that one of the two major advantages of the new API. During the experimentation phase we concluded that there is generally a clear Python target type for any Rust type. The additional complexity would make this overall less ergonomic to use while bringing not much benefit in general.

This has advantages (fewer type annotations), but I think these traits can coexist (with calls using IntoPy).

IMO we should not keep IntoPy around. It has clear problems regarding fallible conversions. Also, there should really be one trait responsible for converting Rust values into Python objects. Everything else is way harder to explain and to maintain. For example, the implementations could get out of sync, and the same value in Rust would be converted differently depending on which API it is given to. (This can already happen with ToPyObject and IntoPy currently, and I think we should get rid of it and not introduce a new form here.)

If I understood correctly, the problem is that we also want to convert arrays, Vecs, ... to a PyTuple while they normally convert into a PyList. I think we can support that special casing with IntoPyObject as well, using another method that converts Self into a PyTuple "args" object. A quick sketch below with my limited understanding.

    pub trait IntoPyObject<'py>: Sized {
        ....

        #[doc(hidden)]
        /// Turn `Self` into callable args, can be specialized for tuples, array, ...
        fn into_args(self, py: Python<'py>, _: private::Token) -> PyResult<Bound<'py, PyTuple>>
        where
            PyErr: From<Self::Error>,
        {
            (self,).into_pyobject(py) // for tuples this can then be `self.into_pyobject(py)`
        }

        #[doc(hidden)]
        /// Call `function` with `obj` as `arg`; can use specialized calling conventions
        fn vectorcall(
            obj: Self,
            py: Python<'py>,
            function: Borrowed<'_, 'py, PyAny>,
            token: private::Token,
        ) -> PyResult<Bound<'py, PyAny>>
        where
            PyErr: From<Self::Error>,
        {
            #[inline]
            fn inner<'py>(
                py: Python<'py>,
                function: Borrowed<'_, 'py, PyAny>,
                args: Bound<'py, PyTuple>,
            ) -> PyResult<Bound<'py, PyAny>> {
                use crate::ffi_ptr_ext::FfiPtrExt;
                unsafe {
                    ffi::PyObject_Call(function.as_ptr(), args.as_ptr(), std::ptr::null_mut())
                        .assume_owned_or_err(py)
                }
            }

            // make this use `into_args`
            inner(py, function, obj.into_args(py, token)?.into_bound())
        }
    }

If I got something wrong, or overlooked something, let me know, but in general I think it should be possible to support this with IntoPyObject as well.

I believe it is indeed possible to support this with IntoPyObject.

If we are already making a breaking change, I think a better path than adding methods on IntoPyObject is to use another trait for calls, say PyCallArgs. This has the following advantages:

Assuming we seal PyCallArgs, this will allow us to easily enable future possibilities, even ones that we cannot predict, around performance and beyond.

If you take IntoPyObject, you have to check you actually got a tuple. The overhead can be mitigated for known-tuples by specializing methods on them, but it is still not the best API since it does not prevent non-tuples at compile time and doesn't even signal the user their code is going to fail.
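Both points can be seen in a small sealed-trait sketch in plain Rust. The names `CallArgs`, `private::Sealed`, `into_arg_list` and `call` are hypothetical and are not a proposal for the actual PyCallArgs API; stringified arguments stand in for Python objects.

```rust
mod private {
    /// Code outside this crate cannot name `Sealed`, so it cannot implement `CallArgs`.
    pub trait Sealed {}
}

/// Argument packs accepted by `call`. Because the trait is sealed, the crate
/// stays free to add or change its methods later without breaking users.
pub trait CallArgs: private::Sealed {
    /// Lay the arguments out as a flat list, standing in for a vectorcall array.
    fn into_arg_list(self) -> Vec<String>;
}

impl private::Sealed for () {}
impl CallArgs for () {
    fn into_arg_list(self) -> Vec<String> {
        Vec::new()
    }
}

impl<A: ToString> private::Sealed for (A,) {}
impl<A: ToString> CallArgs for (A,) {
    fn into_arg_list(self) -> Vec<String> {
        vec![self.0.to_string()]
    }
}

impl<A: ToString, B: ToString> private::Sealed for (A, B) {}
impl<A: ToString, B: ToString> CallArgs for (A, B) {
    fn into_arg_list(self) -> Vec<String> {
        vec![self.0.to_string(), self.1.to_string()]
    }
}

/// Only tuple-like argument packs are accepted.
pub fn call<A: CallArgs>(args: A) -> Vec<String> {
    args.into_arg_list()
}

// `call(5)` or `call(vec![1, 2])` would be rejected at compile time,
// because neither `i32` nor `Vec<i32>` implements `CallArgs`.
```

With a bound like this, passing a non-tuple fails at compile time instead of at run time, which is the behavior argued for above.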

Anyway, this is unrelated to this PR. We can land it now, and I expect any changes around calling can be adjusted fairly trivially.

An additional reason I find the separate-trait approach tempting is that it can give a kwargs API that is both more convenient and more performant, even without waiting for a pycall! macro: if we choose this path, instead of taking kwargs: Option<PyDict> we can take a generic type that can convert to a dict.

For example,

fn call<Args, Kwargs>(&self, args: Args, kwargs: Kwargs) where (Args, Kwargs): PyCallArgs { ... }

That already means people can use kwargs more nicely, with syntax like call((arg1, arg2, ...), [("a", 1), ("b", 2), ...]). In addition, we may specialize the impls so that, instead of converting to a PyDict, they use the vectorcall API directly.
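A rough sketch of that idea in plain Rust, under simplifying assumptions: arguments are `i64` values, positional arguments are a 2-tuple, and kwargs are an array of `(name, value)` pairs. `FlatCall`, `CallArgs` and `call` are hypothetical names. The flat layout (positional values first, then keyword values, with keyword names carried separately) loosely follows the vectorcall calling convention.

```rust
/// Flat call layout: positional values, then keyword values,
/// with the keyword names kept in a separate list.
#[derive(Debug, PartialEq)]
struct FlatCall {
    values: Vec<i64>,
    /// Number of leading entries in `values` that are positional.
    nargs: usize,
    kwnames: Vec<&'static str>,
}

/// Implemented on the *pair* (positional args, keyword args).
trait CallArgs {
    fn flatten(self) -> FlatCall;
}

// Two positional arguments plus any number of keyword pairs, with no
// intermediate dictionary.
impl<const N: usize> CallArgs for ((i64, i64), [(&'static str, i64); N]) {
    fn flatten(self) -> FlatCall {
        let ((a, b), kwargs) = self;
        let mut values = vec![a, b];
        let mut kwnames = Vec::with_capacity(N);
        for (name, value) in kwargs {
            kwnames.push(name);
            values.push(value);
        }
        FlatCall { values, nargs: 2, kwnames }
    }
}

/// Same shape as the signature proposed above: the bound is on the pair.
fn call<Args, Kwargs>(args: Args, kwargs: Kwargs) -> FlatCall
where
    (Args, Kwargs): CallArgs,
{
    (args, kwargs).flatten()
}
```

A call such as `call((1_i64, 2_i64), [("a", 3_i64), ("b", 4_i64)])` then yields the four values in order, with `nargs` equal to 2 and the names `a` and `b` kept separately.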

@Icxolu, what do you think of doing that (i.e. release 0.23 now as an interim towards a complete switchover for 0.24)?

Generally I'm open to that. I guess that depends a little on how we want to structure/explain the migration. I guess the current state is fairly minimal in the amount of actual breakage. My proposal for the trait bounds migration would have been to provide impl<'a, 'py, T> IntoPyObject<'py> for &'a T where T: ToPyObject {} as a blanket impl, since the vast majority of the APIs are generic over ToPyObject. I would hope that this keeps breakage low, but it's probably going to be higher than now. So if you prefer, we can definitely delay that to 0.24.

On a different note, there is still a bit of Bound API cleanup left that I think we should finish before 0.23, and I think we can also put #4449 in 0.23.

My proposal for the trait bounds migration would have been to provide impl<'a, 'py, T> IntoPyObject<'py> for &'a T where T: ToPyObject {}

Hmm, interesting. So I played around with this (and ideas like a blanket-impl of ToPyObject from IntoPyObject, i.e. the reverse direction). TBH, neither felt great. For example implementing IntoPyObject for &'a T where T: ToPyObject will only help when users pass references for their custom types. Having the blanket might just be more confusion.
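The limitation of the reference-only blanket can be reproduced in a few lines of plain Rust. `OldToObject` and `NewIntoObject` are hypothetical stand-ins for ToPyObject and IntoPyObject, with `String` standing in for a Python object.

```rust
// Hypothetical stand-ins for the old by-reference trait and the new by-value trait.
trait OldToObject {
    fn to_object(&self) -> String;
}

trait NewIntoObject {
    fn into_object(self) -> String;
}

// The blanket under discussion: a reference to any old-style type is accepted
// by APIs bounded on the new trait.
impl<'a, T: OldToObject> NewIntoObject for &'a T {
    fn into_object(self) -> String {
        self.to_object()
    }
}

// A user type that has only implemented the old trait.
struct Custom(u32);

impl OldToObject for Custom {
    fn to_object(&self) -> String {
        format!("Custom({})", self.0)
    }
}

// An API that has already migrated its bound to the new trait.
fn new_api<T: NewIntoObject>(value: T) -> String {
    value.into_object()
}

// `new_api(&Custom(7))` compiles thanks to the blanket.
// `new_api(Custom(7))` does not: `Custom` itself has no `NewIntoObject` impl.
```

So users who pass their custom types by value would still hit a compile error after the migration, which is the source of the confusion mentioned above.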

Having looked at that more, I think that in 0.23 we should just go for it and migrate all trait bounds without a blanket and commit to the bigger breakage. While it's a big (ish) breakage, I think it's actually the easiest state for users to understand, and I think we can make the migration easier for users by adding the derive proposed in #4458. (They might then just be able to switch to the derive and delete code in a lot of cases).

That said, I think we need to cut a 0.22.3 release to resolve #4452 and ship #4396, so I am open to the idea of merging this PR as-is and cherry-picking it as a perf enhancement in 0.22.3. @ChayimFriedman2, if we did that, would you be willing to help work on the follow-up to move this off IntoPy and onto new traits?

@ChayimFriedman2, if we did that, would you be willing to help work on the follow-up to move this off IntoPy and onto new traits?

Yes. Ping me when you need my help.

I'm actually working now on a pycall!() draft, which would be the most performant, most capable and most convenient way to call a Python method. Let's see where this brings us (it is still worth landing this PR because it benefits users who have not migrated).

For example implementing IntoPyObject for &'a T where T: ToPyObject will only help when users pass references for their custom types. Having the blanket might just be more confusion.

That's true, I hadn't thought of that. In that case I tend to agree: providing any blanket will probably make it worse.

Having looked at that more, I think that in 0.23 we should just go for it and migrate all trait bounds without a blanket and commit to the bigger breakage. While it's a big (ish) breakage, I think it's actually the easiest state for users to understand, and I think we can make the migration easier for users by adding the derive proposed in #4458.

Sure thing, I'll prepare the PR with the trait bounds change and afterwards look into the derive macro.

src/conversion.rs

src/types/any.rs

src/types/tuple.rs

pyo3-benches/benches/bench_call.rs

ChayimFriedman2 · 2024-08-22T08:00:53Z

Done using the compat functions.

davidhewitt

Thanks, as agreed let's merge this as is. I'll pick it into 0.22.3 and then let's work out new trait bounds for 0.23.

davidhewitt · 2024-08-24T06:52:10Z

Ah, needs a conflict resolved. Sorry for the delay on my part.

ChayimFriedman2 · 2024-08-24T18:45:03Z

@davidhewitt Resolved the conflict.

davidhewitt · 2024-08-24T21:53:09Z

src/instance.rs

@@ -1516,9 +1514,8 @@ impl<T> Py<T> {
    ) -> PyResult<PyObject>
    where
        N: IntoPyObject<'py, Target = PyString>,
-        A: IntoPyObject<'py, Target = PyTuple>,
+        A: IntoPy<Py<PyTuple>>,


@Icxolu I reverted the bounds here and on the other call functions, given the likely plan is to move these bounds to a separate trait.

* Use vectorcall (where possible) when calling Python functions

  This works without any changes to user code. The way it works is by creating methods on `IntoPy` to call functions, and specializing them for tuples. This currently supports only non-kwargs for methods, and kwargs with a somewhat slow approach (converting from PyDict) for functions. This can be improved, but that will require additional API. We may consider adding more impls IntoPy<Py<PyTuple>> that specialize (for example, for arrays and `Vec`), but this is a good start.

* Add vectorcall benchmarks

* Fix Clippy (elide a lifetime)

---------

Co-authored-by: David Hewitt

ChayimFriedman2 force-pushed the call-vectorcall branch 6 times, most recently from b4ff834 to 8c6658e Compare August 19, 2024 20:49

davidhewitt reviewed Aug 19, 2024

ChayimFriedman2 force-pushed the call-vectorcall branch from 8c6658e to a8179a9 Compare August 19, 2024 23:08

davidhewitt mentioned this pull request Aug 20, 2024

ffi: add compat functions for no-argument calls #4461

Merged

ChayimFriedman2 force-pushed the call-vectorcall branch from a8179a9 to 964b1f1 Compare August 22, 2024 08:00

Icxolu mentioned this pull request Aug 23, 2024

Contribution opportunity: improved conversion traits #4041

Closed

davidhewitt approved these changes Aug 24, 2024

ChayimFriedman2 added 2 commits August 24, 2024 21:42

Add vectorcall benchmarks

9a80eba

ChayimFriedman2 force-pushed the call-vectorcall branch from 964b1f1 to 9a80eba Compare August 24, 2024 18:44

davidhewitt enabled auto-merge August 24, 2024 18:47

davidhewitt added this pull request to the merge queue Aug 24, 2024

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Aug 24, 2024

Merge branch 'main' into call-vectorcall

2e310c2

davidhewitt reviewed Aug 24, 2024

davidhewitt enabled auto-merge August 24, 2024 21:53

Fix Clippy (elide a lifetime)

7b3872a

auto-merge was automatically disabled August 25, 2024 02:30
Head branch was pushed to by a user without write access

davidhewitt added this pull request to the merge queue Aug 25, 2024

Merged via the queue into PyO3:main with commit 2e891d0 Aug 25, 2024
43 checks passed

Icxolu mentioned this pull request Oct 13, 2024

deprecate IntoPy in favor or IntoPyObject #4618

Merged

3 tasks

This was referenced Oct 25, 2024

migrate call API to IntoPyObject #4653

Merged

Reintroduce vectorcall specialization for Python call API. #4656

Closed

Icxolu mentioned this pull request Dec 5, 2024

reintroduce vectorcall optimization with new PyCallArgs trait #4768

Merged
