Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add second lifetime to FromPyObject #4390

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

Icxolu
Copy link
Contributor

@Icxolu Icxolu commented Jul 28, 2024

This adds the second lifetime to FromPyObject and removes FromPyObjectBound. I hope I adjusted all the bounds correctly. I believe the equivalent to our current implementation is use HRTB for the input lifetime on any generic bound. But this should only be necessary for containers that create temporary Python references during extraction. For wrapper types it should be possible to just forward the relaxed bound.

For easier trait bounds FromPyObjectOwned is introduced. This is blanket implemented for any FromPyObject type that does not depend on the input lifetime. It is intended to be used as a trait bound, the idea is inspired by serde (Deserialize <=> DeserializeOwned)

I tried to document different cases in the migration guide, but it can probably still be extended. Changelog entry is still missing, will write that tomorrow.

Breaking changes:

  • the second lifetime
  • the extract method without default
  • the addition lifetime bound on extract_bound to make the default work

@mejrs
Copy link
Member

mejrs commented Jul 28, 2024

I propose that we add a FromPyObjectOwned trait similar to serde's DeserializeOwned. It's much easier to explain "just use this trait if the output doesn't borrow from Python" rather than "you need higher ranked trait bounds".

@Icxolu
Copy link
Contributor Author

Icxolu commented Jul 29, 2024

Thanks! I think that's a great idea. It's essentially the same thing, but it's much easier to grasp by giving it a distinct name, plus we can better document it in the API docs. I will adapt this to make use of that.

@Icxolu Icxolu force-pushed the from-pyobject branch 3 times, most recently from c960124 to a1c0bed Compare July 29, 2024 18:16
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing on with this! Overall it looks good to me, and I only have a few docs suggestions.

Before we definitely commit to this, I want to take a brief moment just to reflect on the choice of Borrowed over &Bound. There's at least two good technical reasons, which is the lifetime constraint and performance by avoiding pointer-to-pointer. The codspeed branch does show a few slight perf improvements in the ~4% range on tuple extraction and we have further possibilities like PyRef containing Borrowed once we do this.

That said, we deliberately kept the Borrowed smart pointer out of the way as much as possible in the original 0.21 design to try to keep new concepts down. Is it fine to increase its use? I think the upsides justify this, though we might want to increase the quality of documentation and examples on Borrowed (which I'd left quite minimal so far).

guide/src/migration.md Outdated Show resolved Hide resolved
guide/src/migration.md Outdated Show resolved Hide resolved
guide/src/migration.md Outdated Show resolved Hide resolved
guide/src/migration.md Outdated Show resolved Hide resolved
pyo3-macros-backend/src/frompyobject.rs Outdated Show resolved Hide resolved
@@ -616,8 +616,9 @@ pub fn build_derive_from_pyobject(tokens: &DeriveInput) -> Result<TokenStream> {
let ident = &tokens.ident;
Ok(quote!(
#[automatically_derived]
impl #trait_generics #pyo3_path::FromPyObject<#lt_param> for #ident #generics #where_clause {
fn extract_bound(obj: &#pyo3_path::Bound<#lt_param, #pyo3_path::PyAny>) -> #pyo3_path::PyResult<Self> {
impl #trait_generics #pyo3_path::FromPyObject<'_, #lt_param> for #ident #generics #where_clause {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be an interesting question for later if we ever support the 'a lifetime in our #[derive(FromPyObject)]. Definitely not for now 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something to experiment with in the future, but out of scope for now for sure 😅

/// the normal `FromPyObject` trait. This trait has a blanket implementation
/// for `T: FromPyObject`.
/// Note: depending on the implementation, the lifetime of the extracted result may
/// depend on the lifetime of the `obj` or the `prepared` variable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a rather than prepared here and below? Also maybe written as 'py or 'a, as we do elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I was a bit lazy and just uncommented the docs that we temporarily removed during gil-ref migration.

How about

/// Note: depending on the implementation, the extracted result may
/// depend on the Python lifetime `'py` or the input lifetime `'a` of `obj`.

Also the example below that seem a bit confusing, what do you think about using Cow<str> as an example, which may or may not borrow from the input ('a) depending on the Python runtime type and maybe Bound<'py, PyString> as an example that depends on the Python lifetime? Maybe we also add an example of a collection type (maybe Vec) to introduce FromPyObjectOwned...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of that sounds great, yes please 👍

src/conversion.rs Outdated Show resolved Hide resolved
Comment on lines +309 to +559
impl<'py, T> FromPyObject<'_, 'py> for PyRef<'py, T>
where
T: PyClass,
{
fn extract_bound(obj: &Bound<'py, PyAny>) -> PyResult<Self> {
fn extract(obj: Borrowed<'_, 'py, PyAny>) -> PyResult<Self> {
obj.downcast::<T>()?.try_borrow().map_err(Into::into)
}
}

impl<'py, T> FromPyObject<'py> for PyRefMut<'py, T>
impl<'py, T> FromPyObject<'_, 'py> for PyRefMut<'py, T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then the follow-up future PR will be what to do about PyRef. Should we split into PyRef / PyRefBorrowed, or keep the one type PyRef<'a, 'py, T> and accept some breakage? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I we want to provide both variants, I would prefer PyRef<'a, 'py, T> and PyRefOwned<'py, T> since that would be more consistent with FromPyObjectOwned.

In any case we would need to find a good way to differentiate their constructors. Neither obj.borrow_borrowed() nor obj.borrow_owned() feel to appealing to me 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's tough to find names. I'm somewhat tempted to wait until after 0.23 and to first see how all these trait changes we've already made play out before we change PyRef.

@Icxolu
Copy link
Contributor Author

Icxolu commented Aug 3, 2024

Thanks for the review! You are right, the choice of Borrowed over &Bound is definitely something to discuss about. I used it here because it was the final form of FromPyObjectBound that we landed on. Looking back, the change to Borrowed happened in #3959, which was related to the gil-refs API, so it might not be strictly necessary anymore (haven't checked). The points you mentioned seem like good justification why it still makes sense to introduce additional complexity here.

I think the upsides justify this, though we might want to increase the quality of documentation and examples on Borrowed (which I'd left quite minimal so far).

I agree, if we put Borrowed in such a core part of the infrastructure it deserves a documentation overhaul. We might also want to take a look at the current (public) API and additionally expose some helpers. Something like to_any comes to mind, which is currently internal only.

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the recently added FromPyObject documentation is too "implementation detail"-y.

I'd prefer if this documentation is kept short and concise, one paragraph max.

For example:

Depending on the python version and implementation some `FromPyObject`
 implementations may produce Rust types that point into the Python type.
For example `PyString` and `PyBytes` may convert into `str` and `[u8]` references
 that borrow from the original Python object.

Types that do not borrow from the input should use `FromPyObjectOwned` instead.

src/conversion.rs Outdated Show resolved Hide resolved
src/conversion.rs Outdated Show resolved Hide resolved
/// borrow from the input lifetime `'a`. The behavior depends on the runtime
/// type of the Python object. For a Python byte string, the existing string
/// data can be borrowed (lifetime: `'a`) into a [`Cow::Borrowed`]. For a Python
/// Unicode string, the data may have to be reencoded to UTF-8, and copied into
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not, depending on the python version the string will be interned in the unicode object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm aware of that, but that felt like too much detail for me, and is not really relevant for the point I am trying to make here. That's why I tried to phrase it in a rather vague way.

@Icxolu
Copy link
Contributor Author

Icxolu commented Aug 4, 2024

Thanks for the feedback! I took another pass over it and tried to make it a bit more concise. I do however think that it's worth to talk about the individual lifetimes at play here, since this trait is right in the center of PyO3. I moved the paragraph about collections into the FromPyObjectOwned docs, because it works much better there with the code example, and replaced it with a smaller note for more info. I reworded my examples a bit and moved them under a # Details section, maybe this works better?

Let me know what you think about this.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for keeping this branch alive, I guess let's move forward with this before it gets too painful to keep updating.

I had yet another reflection on &Bound vs Borrowed and I'm feeling reasonably comfortable that Borrowed is the correct choice.

src/conversion.rs Outdated Show resolved Hide resolved
src/conversion.rs Outdated Show resolved Hide resolved
@@ -656,8 +656,8 @@ macro_rules! tuple_conversion ({$length:expr,$(($refN:ident, $n:tt, $T:ident)),+
}
}

impl<'py, $($T: FromPyObject<'py>),+> FromPyObject<'py> for ($($T,)+) {
fn extract_bound(obj: &Bound<'py, PyAny>) -> PyResult<Self>
impl<'a, 'py, $($T: FromPyObject<'a, 'py>),+> FromPyObject<'a, 'py> for ($($T,)+) {
Copy link
Member

@davidhewitt davidhewitt Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record and the decision to stick with Borrowed, the reason is that if we used &'a Bound here for the input argument, then tuple extraction would be forced to restrict elements to FromPyObjectOwned.

That maybe wouldn't be a terrible thing, as this is currently a special case when compared with every other container, which is forced to use FromPyObjectOwned due to their mutability.

But I think it would be a breaking change, so it's probably not a good change overall. (E.g. users would lose the ability to extract tuples containing &str, &Bound and Borrowed)

I think as a secondary effect, by using Borrowed here we might be able to change extract_argument.rs to make more use of Borrowed, which IIRC is a possible performance win.

Co-authored-by: David Hewitt <[email protected]>
@Icxolu
Copy link
Contributor Author

Icxolu commented Aug 31, 2024

I believe this now also finishes the _bound methods deprecations 🎉

@davidhewitt
Copy link
Member

Fantastic, thank you so much for your help on pushing that over the line. When I get some time I'll try to write up a list of what we might have left for 0.23 (hopefully not much).

@davidhewitt davidhewitt added this pull request to the merge queue Aug 31, 2024
@davidhewitt davidhewitt removed this pull request from the merge queue due to a manual request Aug 31, 2024
@davidhewitt
Copy link
Member

Ah, I just started playing with this locally and found one additional finding: having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T> - because the input borrowed can't be cast into a &'a Bound<'py, T> for a sufficient lifetime (pointer goes in, pointer-to-pointer goes out).

Is that a problem? I think not, because that's what .downcast() is for (and it's potentially nice to separate them like this), however just worth a brief double-check that others agree.

@davidhewitt
Copy link
Member

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

@davidhewitt
Copy link
Member

Should we be considering changing those to Borrowed too, for consistency?

It feels to me like we might want to consider macro tricks such that users can use both &Bound and Borrowed.

@Icxolu
Copy link
Contributor Author

Icxolu commented Sep 1, 2024

having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T>

That's true, but as you mention below, I actually think this a good thing and makes a clearer use case between .extract() and .downcast().

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

Hmm, good question. I guess that is part of the question about how prominent we want Borrowed to be. Providing some flexibility via macro magic as you suggested sounds like a good compromise. I can explore that as a followup.

@davidhewitt
Copy link
Member

Yes agreed, it's definitely a follow-up to make changes to expand from_py_with.

I wonder, are there any other options we have here? E.g. does it work to have a trait with both extract and extract,_borroeed, maybe with a default implementation of extract,_borrowed? I suspect that might not work because of the same reason we can't currently have &Bound as an output.

I am still feeling a little uneasy about the choice to commit Borrowed more directly into the face of users. I think it is probably the right choice still, as long as we improve the documentation and API on borrowed. Maybe I will quickly try to get the perf wins out of extract_argument.rs later today to add another data point.

@davidhewitt
Copy link
Member

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?
  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

Both changes would significantly complicate the trait. I wonder, is there a way that we can support a simple trait like FromPyObject and a more advanced FromPyObjectImpl, which has a blanket from FromPyObject, so users can just implement the simple trait if that's good enough?

All food for thought and I think worth discussing while we're committing to breaking changes here.

@Icxolu
Copy link
Contributor Author

Icxolu commented Sep 1, 2024

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

No problem 😄

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?

If we're willing to do more breaking changes here, that sounds like a good idea to me. The only downsides I can see

  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

This would be really cool indeed. I guess we would maybe somehow need to take advantage of method precedence between trait and inherent methods, to make the fall through to PyAny work...

@davidhewitt davidhewitt mentioned this pull request Sep 13, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants