Stop leaking in `new_closure` #2690

mejrs · 2022-10-16T16:17:06Z

Previously this leaked memory on every call:

use pyo3::prelude::*;
use pyo3::types::*;

fn main() {
    Python::with_gil(|_py| loop {
        let pool = unsafe { _py.new_pool() };
        let py = pool.python();
        let add_one = |args: &PyTuple, _kwargs: Option<&PyDict>| -> PyResult<_> {
            let i = args.extract::<(i64,)>()?.0;
            Ok(i + 1)
        };
        let closure = PyCFunction::new_closure(add_one, py).unwrap();
    });
}

Now it does no more:

The PyMethodDef is cleaned up when the closure gets dropped
Any Cstrings are deallocated if necessary

I am not sure what the best next step is. I'm leaning towards an api that takes &'str (and always allocates a CString) and another api taking &'static CStr that does not allocate cstrings.

mejrs · 2022-10-16T16:19:15Z

src/types/function.rs

+            .as_method_def()
+            .map_err(|err| PyValueError::new_err(err.0))?;
+
+        let def = Box::into_raw(Box::new(def));


This still leaks; I haven't looked at what actually calls this function so I'm not sure if that's important.

It's called by the PyCFunction::new and new_with_keywords methods.

I think the only way around this would be to have a way to construct &static ffi::PyMethodDef safely and change the PyCFunction apis to take it (even if it's wrapped up in another type so users can't see it).

Either that, or should they be changed to use a capsule in the same way as PyCFunction::new_closure?

Either that, or should they be changed to use a capsule in the same way as PyCFunction::new_closure?

I think this would be the smallest change to fix this. Everything else could be a follow-up optimization.

(Generally speaking, I think there would be an optimization opportunity by creating a #[pyclass] wrapping a ffi::PyMethodDef instead of using a capsule to avoid one layer of dynamic allocation as ffi::PyMethodDef could be stored inline in the #[pyclass] whereas the capsule is itself limited to a pointer if I understand things correctly.)

While true, we've steered clear of having #[pyclass] types internal to PyO3 because the wouldn't be publicly accessible or re-used between PyO3 modules. Solutions to that seem hard. (Maybe it's not actually a problem though.)

because the wouldn't be publicly accessible

I think that would not be a problem here as this should really be an implementation detail. I suspect having to manually write the #[pyclass] to avoid the macros as it used to be done in rust-numpy would be a bit of a hassle though (but easier from within PyO3 than from within rust-numpy).

re-used between PyO3 modules.

I guess this makes it a trade-off between less indirection but more type objects. There are usually multiple method definitions per extension module, so I suspect this could work out. At least I know of no complaints w.r.t rust-numpy's PySliceContainer which suffers from the same issue.

(This does mean that this could be applied only for "one-way" types though, i.e. when we hand data to Python for taking care of. If we might happen to consume such a type created by a different PyO3-written extension module, this would break.)

Actually can use the #[pyclass(crate = "crate")] to just use the macros I think. However it'd mean this feature wouldn't work if the macros feature was disabled.

Yes, perhaps you are right that in this "internal" case it would be suitable.

mejrs · 2022-10-16T16:27:10Z

src/types/function.rs

+        let method_def = pymethods::PyMethodDef::cfunction_with_keywords(
+            "pyo3-closure\0",
+            pymethods::PyCFunctionWithKeywords(run_closure::<F, R>),
+            "\0",


This should probably be a None

adamreichold · 2022-10-16T18:09:57Z

src/internal_tricks.rs

@@ -35,15 +35,45 @@ macro_rules! pyo3_exception {
 #[derive(Debug)]
 pub(crate) struct NulByteInString(pub(crate) &'static str);

+#[derive(Copy, Clone)]
+pub(crate) enum MaybeLeaked {


Isn't this actually MaybeOwned::Owned(*mut c_char) because the point of the type is to not leak things?

Also could this also be something like

pub(crate) enum MaybeOwned { Static(&'static CStr), Owned(CString), }

or does this violate any aliasing guarantees?

I went with my approach over this for some reasons:

the type is smaller (doesn't really matter, I suppose)

I don't think aliasing CString is a problem in practice but using it would make ClosureDestructor (more of) a self referential struct, so I'd really prefer to use raw pointers

it's a lot easier to create a *const c_char at compile time than a &'static CStr (msrv 😭 )

Another option that comes to mind is to not store the Static variant at all (we do not need it for destruction) and use a signature similar to as_method_def above for extract_cstr_or_leak_cstring, e.g. make it return

Result<(*const c_char, ManuallyDrop<Option<CString>>), NulByteInString>

and use Option<CString> inside of PyMethodDefDestructor (which is a sort of "double-option" ATM).

adamreichold · 2022-10-16T18:13:05Z

src/types/function.rs

@@ -45,11 +55,12 @@ where
    let trap = PanicTrap::new("uncaught panic during drop_closure");
    let pool = GILPool::new();
    if let Err(payload) = std::panic::catch_unwind(|| {
-        let boxed_fn: Box<F> = Box::from_raw(ffi::PyCapsule_GetPointer(
+        let destructor: Box<ClosureDestructor<F>> = Box::from_raw(ffi::PyCapsule_GetPointer(


Would this code benefit from using our generic capsule API?

adamreichold · 2022-10-17T13:36:08Z

src/internal_tricks.rs

-        .map_err(|_| NulByteInString(err_msg))
+) -> Result<MaybeLeaked, NulByteInString> {
+    let bytes = src.as_bytes();
+    let nul_pos = memchr::memchr(0, bytes);


Why do we need to do this ourselves? Can't we keep using CStr::from_bytes_with_nul and CString::new and just wrap the result in MaybeLeaked?

Otherwise we need to traverse bytes twice; both of those functions check for internal nul bytes.

I understand that. I would say though that the price of a new dependency (memchr was only a dev-dependency until now) and more unsafe blocks is relatively steep for that. I also suspect that the compiler might just be able to optimize out the redundant checks in the safe alternative.

(As an aside, the change is somewhat orthogonal to the topic of the PR. It might be good to propose it separately to avoid derailing the fix for the memory leak. But then, this is probably not too convincing coming from the one doing the derailing in the first place.)

How about this alternative (pseudocode):

let bytes = src.as_bytes(); if !bytes.is_empty() && bytes[-1] == '\0' { CStr::from_bytes_with_nul(bytes) } else { CString::new(bytes) }

Checking for the trailing nul doesn't need an external dependency. We can then branch on the leak-or-not operation and we know that any errors must be due to internal nuls.

davidhewitt · 2022-12-27T20:30:14Z

Sorry didn't mean to force-push here; opened #2842 instead as a simpler alternative inspired by this one.

2842: Stop leaking in `new_closure` r=adamreichold a=davidhewitt This is a rebase of #2690 which simplifies the `MaybeLeaked` abstraction from that PR with just `Cow<'static, CStr>`. This enabled me to annotate with `FIXME` all the places where we still leak; I wonder if we could potentially use `GILOnceCell` in future and statics to avoid those. As those callsities are in `#[pyclass]` and `#[pyfunction]` these are effectively in statics anyway, but it would be nice to tidy up. Co-authored-by: David Hewitt <[email protected]>

Stop leaking in new_closure

4876f1d

mejrs commented Oct 16, 2022

View reviewed changes

adamreichold reviewed Oct 16, 2022

View reviewed changes

mejrs marked this pull request as draft October 16, 2022 21:14

adamreichold reviewed Oct 17, 2022

View reviewed changes

dalcde mentioned this pull request Oct 23, 2022

Support passing name and doc to new_closure. #2686

Merged

adamreichold mentioned this pull request Nov 27, 2022

0.18 release #2785

Closed

7 tasks

davidhewitt modified the milestones: 0.19, 0.18 Dec 27, 2022

davidhewitt force-pushed the leaky branch 2 times, most recently from a4353cb to d4a64f3 Compare December 27, 2022 20:26

davidhewitt mentioned this pull request Dec 27, 2022

Stop leaking in new_closure #2842

Merged

davidhewitt force-pushed the leaky branch from d4a64f3 to a4353cb Compare December 27, 2022 20:29

davidhewitt force-pushed the leaky branch 5 times, most recently from 196eafb to 4876f1d Compare December 27, 2022 21:21

davidhewitt closed this Dec 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop leaking in `new_closure` #2690

Stop leaking in `new_closure` #2690

mejrs commented Oct 16, 2022

mejrs Oct 16, 2022

davidhewitt Oct 18, 2022

adamreichold Oct 18, 2022

davidhewitt Oct 18, 2022

adamreichold Oct 18, 2022 •

edited

Loading

davidhewitt Oct 18, 2022

mejrs Oct 16, 2022

adamreichold Oct 16, 2022

mejrs Oct 16, 2022

adamreichold Oct 17, 2022 •

edited

Loading

adamreichold Oct 16, 2022

adamreichold Oct 17, 2022

mejrs Oct 18, 2022

adamreichold Oct 18, 2022 •

edited

Loading

davidhewitt Oct 27, 2022

davidhewitt commented Dec 27, 2022

Stop leaking in new_closure #2690

Stop leaking in new_closure #2690

Conversation

mejrs commented Oct 16, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamreichold Oct 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamreichold Oct 17, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamreichold Oct 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidhewitt commented Dec 27, 2022

Stop leaking in `new_closure` #2690

Stop leaking in `new_closure` #2690

adamreichold Oct 18, 2022 •

edited

Loading

adamreichold Oct 17, 2022 •

edited

Loading

adamreichold Oct 18, 2022 •

edited

Loading