Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attribute intermittently returns the wrong object #249

Closed
mhsmith opened this issue Jan 18, 2023 · 5 comments · Fixed by #246
Closed

Attribute intermittently returns the wrong object #249

mhsmith opened this issue Jan 18, 2023 · 5 comments · Fixed by #246
Labels
bug A crash or error in behavior.

Comments

@mhsmith
Copy link
Member

mhsmith commented Jan 18, 2023

This was discovered while testing beeware/toga#1707: I'll leave it to @freakboy3742 to fill in the details.


The Details

This is something that turns up when running the newly added Toga GUI test suite. I believe it could manifest in other scenarios, but it's proving hard to replicate. Even in the Toga GUI test suite, the problem is intermittent - maybe 1 in 20 runs.

This is the sequence of issues that occurs:

  1. An ObjC object is created, and a Python proxy is created. (In the test case where I'm seeing it, the object is an NSLayoutConstraint; presumably one of the objects created when setting up a layout.
  2. The ObjC object falls out of scope, and is disposed. However, the Python object has not yet been garbage collected of remains on the ObjCInstance._cached_objects web dictionary.
  3. The Objective C runtime re-allocates the same memory address to a new object (In the example case, it's reliably being re-used as the return value for NSView.subviews (an NSArray).

When the ObjCInstance is constructed for the NSArray object, Rubicon checks the memory address in the instance weakref dictionary, it determines that an instance already exists, and returns that instance. However, that instance has been previously loaded as an NSLayoutConstraint, so the methods cache has been loaded for a different class; and attempting to iterate over the NSArray raises an error that it isn't an iterable object.

It isn't yet clear if the issue is the gap between deletion of the object on ObjC and deletion/garbage collection of the Python object; or an issue with the retain/release strategy. However, a workaround has been to intercept the return value of ObjCMethod.__call__, and if it's an ObjC object, use an object_getClass to compare the class name associated with the returned pointer with the cached object; if there's a discrepancy, the cache is purged.

@mhsmith mhsmith changed the title Attribute intermittently returns a different object Attribute intermittently returns the wrong object Jan 18, 2023
@mhsmith
Copy link
Member Author

mhsmith commented Jan 18, 2023

Some notes on how Chaquopy deals with object identity and deletion. All paths are relative to product/runtime.

JNI doesn't expose memory addresses directly; instead it uses opaque JNI references. These are represented by a low-level class (never exposed to the Chaquopy user) in src/main/python/java/env.pxi. It has a __dealloc__ method (the Cython equivalent of __del__), which automatically releases the reference when the object is destroyed. As long as a Java object has at least one JNI reference, it will not be garbage collected. JNI references which refer to the same object compare equal because they implement __richcmp__ in terms of IsSameObject (src/main/python/java/env.pxi).

In src/main/python/java/class.pxi there's an instance_cache which is a WeakValueDictionary. The key is a tuple of (class, JNI reference), and the value is the Python proxy object which is exposed to the Chaquopy user. The class has to be included in the key because Chaquopy allows a Java object to be "cast" to one of its ancestor classes.

The Python proxy object itself also stores a copy of the JNI reference (assigned to _chaquopy_this in the function set_this).

Once the Python proxy is destroyed, the WeakValueDictionary is automatically notified, and removes the object's entry along with the JNI reference in the key. If the Java object is passed to Python again after this point, a new Python proxy object will be created for it.

Then, the Python proxy's __dict__ is destroyed along with the other JNI reference in _chaquopy_this. With all JNI references released, whether the object continues to exist now depends on whether it has other references on the Java side.

This is tested in src/test/python/chaquopy/test/test_reflect.py:

  • test_identity tests reusing the Python proxy object.
  • test_gc tests the deletion process. It creates a DelTrigger object (defined in src/test/java/com/chaquo/python/TestReflect.java), and checks that deleting the Python object causes the Java object to be deleted as well.

Hopefully this will give you some ideas.

@freakboy3742
Copy link
Member

@mhsmith Thanks for those details - sounds remarkably close to what Rubicon is doing. The only real differences appear to be the garbage collection chain on the weak references (which doesn't appear to be quite as immediate from the ObjC side); and the lack of a good "IsSameObject" analog (Rubicon has to work with the memory address, which (a) can be re-used, and (b) has no way to identify if it's currently valid)

@freakboy3742 freakboy3742 added the bug A crash or error in behavior. label Jan 19, 2023
@mhsmith
Copy link
Member Author

mhsmith commented Jan 22, 2023

The ObjC object falls out of scope, and is disposed. However, the Python object has not yet been garbage collected

At first I was confused by how this is possible. But I guess it's explained by the documentation section on Reference counting, which says that Rubicon will only take "ownership" of an Objective-C instance when you create it using a method whose name begins with “alloc”, “new”, “copy”, or “mutableCopy”, or when you manually retain an object.

Does this mean that all other Python wrapper objects do nothing to keep the Objective-C object alive, and can have the object disappear from underneath them at any time? If so, doesn't the workaround in #246 still leave the potential for all kinds of incomprehensible bugs and crashes?

  • Python object spontaneously switches to a different Objective-C object of the same type (mentioned in this comment).
  • Python object pointing into random memory in the middle of another object.
  • Python object pointing into freed memory.

In Chaquopy, all Python wrapper objects have a strong reference to the Java object, so the object cannot be disposed by Java until after the Python object is destroyed. Could this be implemented in Rubicon?

@freakboy3742
Copy link
Member

So - yes, there's potential for crashes for the reasons you identify - if you haven't got your memory allocation correct. Objective C is "C with syntactic sugar", not a fully memory managed language; and as such, the author needs to be very aware of malloc/free usage (spelled retain and release).

Unfortunately, what you need to do is not completely straightforward, because it's dependent on the API you're invoking. The line about "alloc, new, copy or mutableCopy" comes from the Objective C runtime's memory management guide; but other APIs will explicitly call out that "the user calling the method takes ownership of the allocated object" (or similar).

There's one small affordance - autorelease - that allows you to mark an object to be automatically released when the retain count reaches 0. The main.m generated by Briefcase turns this on for all Python projects.

I don't think the "strong reference" route is entirely viable (although I'm open to ideas); it would definitely prevent crashes - if you retain everything and only release when the Python object is deleted, then Objective C can't free or reallocate the memory. However, this would be very prone to memory leaks - it would very very easy to get into a situation where Python is the entity preventing the release of memory.

The other issue is timing. Garbage collection on the Python side isn't guaranteed to be immediate, which leaves a (small, but clearly reproducible) window where an object still exists on the Python side, but it hasn't been garbage collected; and as a result, it hasn't been removed from the weak reference dictionary.

I definitely won't rule out the possibility that there's something we could do to improve things. However, Rubicon's current strategy has (until this PR, anyway) been reasonably stable; messing around with memory allocation code is a I'm hesitant to go spelunking in this area without a clear idea of a future state, or a reproducible problem to solve.

@mhsmith
Copy link
Member Author

mhsmith commented Oct 19, 2023

There's one small affordance - autorelease - that allows you to mark an object to be automatically released when the retain count reaches 0. The main.m generated by Briefcase turns this on for all Python projects.

Based on this page, it looks like objects are always deallocated when their reference count reaches 0, but there's a way to defer a release until the end of the current main loop iteration. So if I understand correctly, the @autoreleasepool block in main.m, which wraps around the entire main loop, isn't doing anything useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A crash or error in behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants