Improve correspondence between ObjectiveC objects and Python wrappers #256
Comments
A similar segfault occurred due to some missing
I was revisiting this again due to beeware/toga#2468, and wanted to jot down some options before I forget them again.
Background
The changes from #201 were introduced as a convenience to rubicon-objc users. We cannot use Automatic Reference Counting from Python, which means that objects created by The affordance introduced in #201 was to always
Possible options
FWIW - I think the option described by (1) (i.e., #249) is worth exploring. (3) is working well enough, but is still prone to leaks (and occasional crashes); as you've noted (2) requires a set of skills that most Python programmers (and, to be brutally honest, most programmers period) either don't have, or can't exercise with 100% reliability. (1) has the benefit of being easy to explain conceptually, and should (?) be relatively straightforward to implement. I have 2 core sources of concern around (1) as an approach:
Agreed with your broad assessment that option (2) is very hard to do without enforcing particular design patterns and potential tooling support. There is a reason why Apple strongly recommends ARC. Regarding your concerns around #249:
Alternatively, how about this?
That covers all the use cases I can think of, except for the pattern from beeware/toga#2468, where a failed Then, as long as we can assume that the plain This may clash with the way the Python constructor is currently being used to wrap an existing Objective-C object, but it should be possible to distinguish that mode by checking argument types, or by passing a keyword argument like
Yes, I think that's the only reasonable option.
There are also some type methods that don't follow the naming convention, like How does Objective-C's automatic reference counting handle this? Does it depend on metadata in header files, and is that information available at runtime? Unless we can fix this completely, I think the documentation is overly optimistic to say "You won’t have to manage reference counts in Python, Rubicon Objective-C will do that work for you."
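For reference, the naming convention that Rubicon's current automatic handling keys on ("alloc", "new", etc.) can be checked from the selector name alone. ARC calls this the selector's "method family": ignoring leading underscores, a name that starts with alloc, new, copy or mutableCopy, followed by nothing or by a character that is not a lowercase letter, returns an owned (+1) reference. (ARC also treats init specially, which this sketch ignores.) Methods that break the convention are annotated in their headers, for example with clang's ns_returns_retained or ns_returns_not_retained attributes; as far as I know that information is compile-time only, so a name-based runtime check like this one cannot see it. returns_owned_reference is a hypothetical helper, not part of Rubicon.

```python
# Hypothetical helper: classify a selector by Cocoa's ownership naming rule.
# Only the selector name is inspected; header-level annotations that override
# the convention are invisible to this check.
OWNING_FAMILIES = ("alloc", "new", "copy", "mutableCopy")


def returns_owned_reference(selector: str) -> bool:
    """Return True if `selector` belongs to an owning (+1) method family."""
    name = selector.lstrip("_")  # leading underscores are ignored
    for family in OWNING_FAMILIES:
        if name.startswith(family):
            rest = name[len(family):]
            # "newFoo" and "copyWithZone:" match; "newton" and "copyright" do not.
            if not rest or not rest[0].islower():
                return True
    return False


for sel in ("alloc", "newton", "copyWithZone:", "mutableCopy", "URLWithString:"):
    print(sel, returns_owned_reference(sel))
```

A class method such as URLWithString: falls outside every owning family, so its result is unowned; that is the case where an explicit retain (or an automatic one) is needed to keep the object alive.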
The baseline memory management rules describe the conditions for API memory retention; as I understand it, As far as I can make out, we are currently leaking in the
seems to slowly leak memory (as measured by RSS allocated size); whereas:
seems much more stable over time. There's still a lot of growth in process size, but this seems to be recovered over time, whereas there's no (or, at least, much less) observable memory recovery in the former version. In both cases, the number of cached objects ( So... it would appear that we definitely have work to do here. The approach you've described of adding a The other issue I can see is with uses that currently require the use of
Yes, that's why I said "Call
The current implementation of those methods calls
Maybe we can be more specific: what if Rubicon automatically does a retain/autorelease on any object which is Either way, any existing code that uses retain/release/autorelease correctly would continue to work, but many of those uses would now be redundant and could be removed when convenient.
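To illustrate why a retain/autorelease pair on a returned, non-owned object helps, here is a pure-Python sketch that only models retain counts; nothing here touches the real Objective-C runtime. FakeObjCObject, AutoreleasePool, Container and call_unowned are hypothetical stand-ins. The accessor hands out a +0 reference, the container then drops its own reference, and only the retain/autorelease pair keeps the object valid until the pool drains.

```python
class FakeObjCObject:
    """Models an Objective-C object's retain count (not a real object)."""

    def __init__(self):
        self.retain_count = 1  # +1 for whoever created it
        self.deallocated = False

    def retain(self):
        self.retain_count += 1
        return self

    def release(self):
        self.retain_count -= 1
        if self.retain_count == 0:
            self.deallocated = True


class AutoreleasePool:
    """Minimal pool: releases each pending object once when the block exits."""

    current = None

    def __enter__(self):
        self.pending, self.previous = [], AutoreleasePool.current
        AutoreleasePool.current = self
        return self

    def __exit__(self, *exc_info):
        AutoreleasePool.current = self.previous
        for obj in self.pending:
            obj.release()
        return False


def autorelease(obj):
    # Must be called while a pool is active.
    AutoreleasePool.current.pending.append(obj)
    return obj


class Container:
    """Owns one child; the accessor hands out a +0 (unowned) reference."""

    def __init__(self):
        self.child = FakeObjCObject()

    def get_child(self):
        return self.child

    def remove_child(self):
        self.child.release()  # the container gives up its ownership


def call_unowned(method):
    """Hypothetical: retain/autorelease every +0 value returned to Python."""
    return autorelease(method().retain())


with AutoreleasePool():
    container = Container()
    child = call_unowned(container.get_child)
    container.remove_child()
    alive_before_drain = not child.deallocated  # still usable here
alive_after_drain = not child.deallocated       # released when the pool drained
```

Without call_unowned, the same get_child/remove_child sequence leaves the child deallocated immediately, which is the kind of dangling reference behind the segfaults discussed in this issue.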
🤦 My apologies - you did.
Looking closer at the Tree/Table/DetailedList code, I think the Making that situation work would require Rubicon to Alternatively, the ObjCInstance reference could be considered unrelated to the ObjC memory handling, and require full retain/release calls... but that seems like a bit of a headache.
I'm not certain that's true. If you have a method that is an accessor, rather than a factory, the memory handling requirements of return values are different. Factory methods only need a An automatic
You're right; I didn't understand that calling
That's why I said retain/autorelease, not just retain. But after thinking about all these examples, I agree it's simpler not to have any special-casing of a returned object, and just change the
The main effect is that any Objective-C object referenced from Python would have its lifetime extended until the end of the current main loop iteration. That should only cause a problem if something depends on the object's As an escape hatch, we could still provide a way to do a more immediate release. For example, we could make
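A pure-Python sketch of the "autorelease instead of release" idea: the wrapper takes one reference when it is created, and its __del__ hands that reference to the current autorelease pool instead of releasing it immediately, so the object survives until the end of the current main loop iteration. Wrapper, release_now and the fake retain-count classes are hypothetical, and the demo relies on CPython running __del__ as soon as the last reference goes away.

```python
class FakeObjCObject:
    """Models an Objective-C object's retain count (not a real object)."""

    def __init__(self):
        self.retain_count = 1
        self.deallocated = False

    def retain(self):
        self.retain_count += 1
        return self

    def release(self):
        self.retain_count -= 1
        if self.retain_count == 0:
            self.deallocated = True


class AutoreleasePool:
    """Stands in for the pool drained at the end of each main loop iteration."""

    current = None

    def __enter__(self):
        self.pending, self.previous = [], AutoreleasePool.current
        AutoreleasePool.current = self
        return self

    def __exit__(self, *exc_info):
        AutoreleasePool.current = self.previous
        for obj in self.pending:
            obj.release()
        return False


def autorelease(obj):
    pool = AutoreleasePool.current
    if pool is None:
        obj.release()  # sketch-only fallback: no pool, so release immediately
    else:
        pool.pending.append(obj)


class Wrapper:
    """Hypothetical Python wrapper that owns exactly one reference."""

    def __init__(self, obj, owned=False):
        self.obj = obj if owned else obj.retain()
        self._released = False

    def release_now(self):
        """Hypothetical escape hatch: give up the reference immediately."""
        if not self._released:
            self._released = True
            self.obj.release()

    def __del__(self):
        if not self._released:  # defer the release until the pool drains
            self._released = True
            autorelease(self.obj)


view = FakeObjCObject()          # owned by the Objective-C side
with AutoreleasePool():
    wrapper = Wrapper(view)      # the wrapper takes its own reference
    view.release()               # Objective-C lets go; the wrapper's ref remains
    del wrapper                  # CPython runs __del__ here; release is deferred
    alive_until_drain = not view.deallocated
freed_after_drain = view.deallocated
```

The _released flag keeps release_now and __del__ from releasing the same reference twice, so an explicit early release stays balanced.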
That doesn't prove whether And as far as I can see, the documentation of
That's a good point... I guess I should also add a call to iterate one or more iterations of the Cocoa event loop (or possibly run the memory test as a callback in the event loop, to ensure that all the autorelease pool handling is active).
No object is created autoreleased; but it can be created unowned - which is what happens with both constraints and the URLWithString constructor. The crash that was being addressed with the extra retain calls is really a different manifestation of the deeper issue we're trying to address in this ticket. The constraint object is being created without being owned; however, as part of cleaning up constraints, the Python code tries to use the object. Since there's no strong correspondence between the existence of the Python object and the underlying ObjC object, it's possible that the object that is referenced as a constraint has been disposed; and so we get a crash when Toga does a cleanup on constraints. Explicitly retaining and releasing the constraint means that the Python side retains a permanent reference to the constraint instance - which is what would ideally be implied by the existence of a Python-side ObjCInstance that hasn't been garbage collected. If the code wasn't trying to explicitly remove constraints on the cleanup of the
Catching up
I've been trying to catch up on this discussion just now. Summarizing:
@mhsmith's approach vs what we have
The approach described by #249 (comment) still makes a lot of sense to me and it builds on some concepts that we've already started to use in Rubicon: An ObjCInstance on the Python side effectively "owns" a single refcount in ObjC as long as the Python refcount is >= 1. When the Python instance is deleted, we Currently, this lifecycle handling is only automatic if we take ownership through an "alloc", "new", etc. method. In other cases, Rubicon users will need to explicitly call Also summing up @mhsmith's proposal for my own understanding: For every Rubicon method returning an ObjC object we check if we already have a corresponding ObjCInstance on the Python side, in some weakref registry (e.g., the existing
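A pure-Python sketch of the lookup-and-own logic summarized above. The weak registry keyed by address mirrors the cache idea; FakeObjCObject, Wrapper, wrap and the owned flag are hypothetical names for illustration, not Rubicon's API. owned is True when the producing method returned a +1 reference (alloc, new, copy, mutableCopy); the wrapper either keeps that reference or takes its own with retain, so each Python wrapper owns exactly one refcount and a second wrapper for the same object is never created.

```python
import weakref


class FakeObjCObject:
    """Models an Objective-C object; `address` stands in for its pointer."""

    def __init__(self, address):
        self.address = address
        self.retain_count = 1

    def retain(self):
        self.retain_count += 1

    def release(self):
        self.retain_count -= 1


class Wrapper:
    """Hypothetical wrapper that owns exactly one reference to `obj`."""

    def __init__(self, obj):
        self.obj = obj

    def __del__(self):
        self.obj.release()  # give back the single reference we own


registry = weakref.WeakValueDictionary()  # address -> live Wrapper


def wrap(obj, owned):
    """Return the one Python wrapper for `obj`, creating it if needed."""
    wrapper = registry.get(obj.address)
    if wrapper is not None:
        if owned:
            obj.release()  # the existing wrapper already owns a reference
        return wrapper
    if not owned:
        obj.retain()  # +0 return value: take our own reference
    wrapper = Wrapper(obj)
    registry[obj.address] = wrapper
    return wrapper


# A +0 object (e.g. owned by a parent view) wrapped twice from Python.
child = FakeObjCObject(0x1000)
first = wrap(child, owned=False)
second = wrap(child, owned=False)
same_wrapper = first is second
count_while_wrapped = child.retain_count
del first, second  # CPython: the wrapper is freed and releases its reference
count_after_unwrap = child.retain_count
```

Because the wrapper's reference keeps the object alive, its address cannot be reused while the registry entry exists.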
Our
Open questions
I see the main complexity in ensuring that this lookup works properly since we cannot use Chaquopy's approach of JNI references and Storing a unique identifier on the ObjC side might also be an option, e.g., with Besides that, there is also a question around whether we still want to allow manual
Thanks, this is a good summary.
As long as the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is released, I think the address alone is a good enough key. I'm not sure if there's any guarantee of the relative order of an object's weakref callbacks and its
I think we should keep them as an escape hatch in case we've missed something. It should be possible to implement this in such a way that any existing correct use of these methods would continue to be correct, even if redundant.
Agreed - thanks. To rephrase this in terms of high-level requirements rather than implementation details: It should be impossible to have a Python-side reference to an Objective C object that has been disposed. This should currently be true for any object created on the Python side by
There's an edge case where a new ObjC instance of the same type is created and wrapped by Python before garbage collection has concluded on the original wrapper - this would cause the old Python wrapper to be re-used, so any Python-side Using
My concern here is whether there are any consequences to "always use autorelease". I think the only consequence is slightly deferred ObjC deallocation... but I'm not 100% sure on that. A related question: does it make any difference if we call autorelease in the
Agreed. Longer term, we may want to consider making them no-ops, but at least in the short term, having extra manual (but balanced) retain/release calls shouldn't make things any less stable (although we should confirm that as part of testing of any patch).
If the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is fully released, then it's impossible for any Objective-C object subsequently allocated at the same address to use the same wrapper. Calling Chaquopy doesn't have an equivalent to
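A pure-Python sketch of that ordering: the wrapper removes its own registry entry before it releases its Objective-C reference, so by the time the address can be reused nothing in Python can find the old wrapper. Doing the removal explicitly in __del__ avoids depending on whether weakref callbacks run before or after __del__. FakeObjCObject, Wrapper and lookup are hypothetical stand-ins, and addresses are plain integers.

```python
import weakref


class FakeObjCObject:
    """Models an Objective-C object; `address` stands in for its pointer."""

    def __init__(self, address):
        self.address = address
        self.retain_count = 1
        self.deallocated = False

    def retain(self):
        self.retain_count += 1

    def release(self):
        self.retain_count -= 1
        if self.retain_count == 0:
            self.deallocated = True  # the address may now be reused


registry = weakref.WeakValueDictionary()  # address -> live Wrapper


class Wrapper:
    """Hypothetical wrapper holding one reference and one registry entry."""

    def __init__(self, obj):
        obj.retain()
        self.obj = obj
        registry[obj.address] = self

    def __del__(self):
        # 1. Unregister first (only if the entry is still ours), so no lookup
        #    can return this wrapper once the address is free for reuse.
        if registry.get(self.obj.address) is self:
            del registry[self.obj.address]
        # 2. Only then give up the Objective-C reference.
        self.obj.release()


def lookup(obj):
    """Return the cached wrapper for obj's address, or wrap obj afresh."""
    wrapper = registry.get(obj.address)
    return wrapper if wrapper is not None else Wrapper(obj)


old = FakeObjCObject(0x1000)
old_wrapper = lookup(old)
old.release()        # the Objective-C side lets go; the wrapper keeps it alive
del old_wrapper      # unregister, then release: `old` is deallocated
reused = FakeObjCObject(0x1000)   # a new object at the same address
new_wrapper = lookup(reused)
```

The new object at the reused address gets a fresh wrapper rather than the stale one.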
I guess the only way to answer this is to try it. The Toga test suite should be a pretty thorough check.
Not sure what you mean by this – wouldn't that cause the Python wrapper to become invalid if it was kept into the next main loop iteration?
🤦 Of course - too many cobwebs from the existing implementation in my mind.
Ignore me - this is another cobweb in my brain (the same cobweb that led to the retain/autorelease pair we recently removed from Toga).
Coming back this issue after looking at #539, it seems to me that we have two separate problems:
If we have the need to reliably identify the actual ObjC instance for which we cached a Python instance, could we potentially store some additional info, e.g., a UUID, with the ObjC instance? For example using
If we keep the ObjC object alive until there are no Python references left, then wrong cache hits will no longer be possible, because the address could never be reused as long as its corresponding Python object exists. All the other workarounds will then become unnecessary, including the class name check.
Fair point! There might be some value in having a cache key that more directly corresponds to what we are caching, independent of guarantees about the lifecycle of the Python wrapper or the corresponding ObjC object. But it's definitely not required if such guarantees exist.
What is the problem or limitation you are having?
In the process of building the Toga GUI testbed, we discovered #249. A fix for this problem was included in #246; however, that fix was really a safety catch for a specific set of circumstances. The bigger question: how did that set of circumstances happen in the first place?
At present, it is possible for an object to be disposed of by Objective C, but due to delays in propagating garbage collection, the ObjCInstance wrapping that object may not be immediately disposed of. This means the weak reference dictionary can contain stale objects.
#246 guarded against this in the specific case where a memory address was reused - but if a Python object points at memory that has been disposed of and not reused, you'll generate a segfault. This is obviously undesirable.
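A pure-Python model of why a guard on address reuse can only be a safety catch. As an assumption, the guard is modelled as a class-name comparison on cache hits (the "class name check" mentioned in the discussion); FakeObjCObject, Wrapper and lookup are hypothetical names, and addresses are plain integers. A reused address is caught only when the new object has a different class.

```python
import weakref


class FakeObjCObject:
    """Models an Objective-C object: a pointer-like address plus a class name."""

    def __init__(self, address, class_name):
        self.address = address
        self.class_name = class_name


class Wrapper:
    """Hypothetical wrapper that remembers the class it was created for."""

    def __init__(self, obj):
        self.obj = obj
        self.class_name = obj.class_name


cache = weakref.WeakValueDictionary()  # address -> Wrapper


def lookup(obj):
    """Reuse a cached wrapper unless the class name shows the address was reused."""
    cached = cache.get(obj.address)
    if cached is not None and cached.class_name == obj.class_name:
        return cached  # assumed (possibly wrongly) to wrap the same object
    wrapper = Wrapper(obj)
    cache[obj.address] = wrapper
    return wrapper


old_view = FakeObjCObject(0x1000, "NSView")
old_wrapper = lookup(old_view)
# Suppose Objective-C frees old_view while old_wrapper is still alive in Python,
# and later allocates new objects at the same address.
same_class = lookup(FakeObjCObject(0x1000, "NSView"))       # not detected
other_class = lookup(FakeObjCObject(0x1000, "NSTextField"))  # detected
```

In the same-class case the returned wrapper still points at the freed object, which is exactly the stale reference this issue wants to make impossible.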
Describe the solution you'd like
Functionally - it should be impossible to generate a segfault from Python code.
An idea posited by @mhsmith in the discussion about #249 was that we could maintain a stronger reference to the ObjC object on the Python side. This would mean adding a retain call whenever a Python-side wrapper is created, and releasing the object only when Python has finished with it. The idea here is that it's better to have a larger memory footprint (or even a small leak) than to have a segfault because a Python object has a stale memory reference.
Describe alternatives you've considered
Do nothing. The status quo is stable(ish?), and while it may be theoretically possible to get a stale ObjCInstance object, in practice we're not seeing that problem (after #246 landed, anyway).
Additional context
Other than the general hesitation about fiddling with a complex area of code that appears to work at present, the complication to watch out for here is cyclic references between Python objects and Objective C objects that prevent Objective C from ever releasing memory. It's easy to avoid segfaults by never releasing... but eventually the memory will run out.