Challenges for supporting copying GC include
- Some fields cannot be updated, and have to pin the objects they point to.
- Some data structures depend on the address of objects, and need to be updated.
- ID-to-address and address-to-ID map
- address-to-gen_ivtbl map
- finalizer_table
Handling of roots:
- Pin roots using the object-pinning API (mmtk::memory_manager::pin_object)
- Make all roots "black roots" (i.e. use the mark bits of Immix directly instead of the pinning bit)
- Support for pinning roots (transitive and non-transitive) has been merged into mmtk-core. See "This PR enables transitively pinning (TP) objects from particular roots for Immix/StickyImmix" (mmtk-core#897).
- mmtk-ruby now presents all roots to mmtk-core as pinning roots. 94ae9c1
- Make some roots movable. It is not worth doing so: only about 10 roots can be moved.
Upstream changes needed for copying GC
- mmtk-core
- Scan roots using trace_object directly. See "Expose trace_object to the VM binding" (mmtk-core#710).
- Needed to support movable roots. No. We probably don't need it.
- Support black roots, i.e. roots that pin the immediate children (and keep them alive), but do not pin transitively.
- Adding support for red/black roots (mmtk-core#706); This PR enables transitively pinning (TP) objects from particular roots for Immix/StickyImmix (mmtk-core#897)
- Needed to support non-movable "black" roots.
- Ruby doesn't need red roots.
- Can be worked around using the object-pinning API, but not as efficient. Immix can use the existing "marked bit" to both keep the object from moving and keep the object alive.
- Ability to pin objects (but do not keep them alive): Support for object pinning mmtk-core#703
- Ruby
- Make more PPP types movable by implementing updating functions and removing rb_gc_mark.
Correctness goals
- Pass all btests
- Note: an upstream bug is causing some tests to fail randomly: Trying to open buckets when there are pending coordinator packets mmtk-core#770
- Pass "full" tests that are already enabled
- Tests related to Ractors are disabled because it is not our current goal to support Ractor.
Performance goals
- Performance on par with, or exceeding, CRuby's vanilla GC
- Currently, when running the Liquid benchmark, MMTk-Ruby is close to vanilla Ruby at 2x minimum heap size, and can outperform vanilla Ruby at 3x minimum heap size or greater.
- Still room for improvement for STW time. See: GC performance issues for MMTk Ruby #25
Un-update-able references
One challenge of supporting copying GC in Ruby is that some object references cannot be updated. Specifically,
- Due to conservative stack scanning, local variables (in C functions, not in Ruby functions) cannot be updated.
- Some fields in global data structures are marked with rb_gc_mark during gc_mark_roots. Those fields cannot be updated.
- Some objects have fields that cannot be updated.
- See Remove pinning fields in built-in Ruby types. ruby#54 for a list of such fields
Because object references held in those places cannot be updated, the objects pointed to by those references must be pinned. In other words, if an object has type T_DATA, T_IMEMO, T_HASH or has the EXIVAR flag, the object itself can move, but it pins its children.
Recording "potential pinning parents"
The Ruby binding shall maintain a runtime list of "potential pinning parents" (PPP for short). That includes all types of objects covered by the third item in the previous section (objects that have fields that cannot be updated). Specifically,
- When T_DATA objects, and the listed T_IMEMO objects, are instantiated, we add them to the PPP list.
- When a T_HASH becomes compare_by_identity or when an object gets the EXIVAR flag, we add it to the PPP list.
Note that some PPPs don't always pin their children. Some T_DATA types can actually move their children because they are modern: their developers used rb_gc_mark_movable and provided the dcompact function. For other PPPs, the pinning fields may just be nil at the moment of the GC. But we have to add them to the PPP list conservatively because we don't know whether a T_DATA is modern enough or whether any field is nil.
We visit all PPPs before GC and pin their children (via pinning fields only), so when GC starts, those children won't move. Note that
- those children are not kept alive, and
- the PPPs' children's children are not recursively pinned.
After GC, re-visit the PPP list, and remove all dead objects from it. Unpin live objects in the PPP list.
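The pin-before-GC and clean-up-after-GC steps can be sketched in C as below. This is a minimal toy model, not mmtk-ruby code: ppp_obj, ppp_list and all function names are hypothetical, the alive flag stands in for the result of the GC trace, and the sketch assumes that "unpin" means unpinning the children that were pinned for this GC and that survived it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object model; none of these names come from CRuby or MMTk. */
#define MAX_PIN_FIELDS 4
#define MAX_PPP 16

typedef struct ppp_obj {
    bool alive;   /* set by the (simulated) GC trace */
    bool pinned;  /* a pinned object must not be moved */
    size_t n_pin_fields;                        /* must be <= MAX_PIN_FIELDS */
    struct ppp_obj *pin_fields[MAX_PIN_FIELDS]; /* un-updatable fields, may be NULL */
} ppp_obj;

typedef struct {
    ppp_obj *items[MAX_PPP];                    /* the PPP list itself */
    size_t len;
    ppp_obj *pinned[MAX_PPP * MAX_PIN_FIELDS];  /* children pinned for this GC */
    size_t n_pinned;
} ppp_list;

/* Conservatively register an object as a potential pinning parent. */
void ppp_register(ppp_list *list, ppp_obj *obj) {
    assert(list->len < MAX_PPP);
    list->items[list->len++] = obj;
}

/* Before GC: pin direct children reachable through pinning fields only.
 * This neither marks a child alive nor recurses into grandchildren. */
void ppp_pin_children(ppp_list *list) {
    list->n_pinned = 0;
    for (size_t i = 0; i < list->len; i++) {
        ppp_obj *parent = list->items[i];
        for (size_t f = 0; f < parent->n_pin_fields; f++) {
            ppp_obj *child = parent->pin_fields[f];
            if (child != NULL && !child->pinned) {
                child->pinned = true;
                list->pinned[list->n_pinned++] = child;
            }
        }
    }
}

/* After GC: unpin surviving children, then drop dead parents from the list. */
void ppp_after_gc(ppp_list *list) {
    for (size_t i = 0; i < list->n_pinned; i++) {
        if (list->pinned[i]->alive) {
            list->pinned[i]->pinned = false;
        }
    }
    list->n_pinned = 0;

    size_t kept = 0;
    for (size_t i = 0; i < list->len; i++) {
        if (list->items[i]->alive) {
            list->items[kept++] = list->items[i];
        }
    }
    list->len = kept;
}
```

Keeping a separate record of what was pinned means a child is unpinned even when the parent that pinned it died during the same GC.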
More language-neutral discussions on this topic are here: mmtk/mmtk-core#690
Ways to reduce the number of potential pinning parents
- Introduce declarative marking. T_DATA objects that support declarative marking are not considered PPPs.
- Whitelist T_DATA types in Ruby core/stdlib that are known not to pin children.
- Fix T_DATA types in Ruby core/stdlib: replace their rb_gc_mark with rb_gc_mark_movable, and introduce compaction functions for them.
Address-aware data structures
Global address-to-ID table
In Ruby, objects may optionally have an ID. Once the ID of an object is seen, it will never change as long as the object is alive. Vanilla Ruby maintains two tables:
typedef struct rb_objspace {
// ...
st_table *id_to_obj_tbl;
st_table *obj_to_id_tbl;
// ...
} rb_objspace_t;

Those tables are maintained when objects are moved (in gc_move), and entries are removed when objects die (in obj_free).
In MMTk, we should consider them weak maps, and use our existing weak reference processing framework to handle them. Effectively, those table entries are things that get garbage-collected when their owner (the object) dies. We can treat both id_to_obj_tbl and obj_to_id_tbl as one single bi-directional weak map.
Interestingly, this is exactly what WeakReferences are intended for. In Java documentation:
Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings.
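To illustrate treating the two tables as one bi-directional weak map, here is a minimal C sketch. It rests on assumptions: id_obj, id_map and every function below are hypothetical names, the alive and forwarded fields stand in for whatever liveness and forwarding queries the GC provides, and a flat array replaces the real st_table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy object header; alive/forwarded stand in for GC queries. */
typedef struct id_obj {
    int alive;                 /* nonzero if the GC reached this object */
    struct id_obj *forwarded;  /* new location if the object moved, else NULL */
} id_obj;

typedef struct {
    id_obj *obj;
    uint64_t id;
} id_entry;

#define MAX_IDS 16

/* One set of entries answers both obj -> id and id -> obj lookups. */
typedef struct {
    id_entry entries[MAX_IDS];
    size_t len;
} id_map;

void id_map_add(id_map *m, id_obj *obj, uint64_t id) {
    assert(m->len < MAX_IDS);
    m->entries[m->len].obj = obj;
    m->entries[m->len].id = id;
    m->len++;
}

/* id -> obj direction; returns NULL if the ID is unknown. */
id_obj *id_map_obj_for(const id_map *m, uint64_t id) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->entries[i].id == id) return m->entries[i].obj;
    }
    return NULL;
}

/* obj -> id direction; returns 1 and stores the ID in *out if found. */
int id_map_id_for(const id_map *m, const id_obj *obj, uint64_t *out) {
    for (size_t i = 0; i < m->len; i++) {
        if (m->entries[i].obj == obj) {
            *out = m->entries[i].id;
            return 1;
        }
    }
    return 0;
}

/* Weak processing after tracing: the map never keeps an object alive.
 * Entries of dead objects are dropped; entries of moved objects are updated. */
void id_map_process_weak(id_map *m) {
    size_t kept = 0;
    for (size_t i = 0; i < m->len; i++) {
        id_entry e = m->entries[i];
        if (!e.obj->alive) continue;
        if (e.obj->forwarded != NULL) e.obj = e.obj->forwarded;
        m->entries[kept++] = e;
    }
    m->len = kept;
}
```

Because both directions read the same entries, one pass after tracing fixes both lookups at once, and an ID stays attached to its object across a move.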
Global address-to-gen_ivtbl map
If an object is not T_OBJECT, its instance variables (@foo, @bar, ... in the Ruby language) are stored in an external generic instance variable table (gen_ivtbl), which is associated with the object via a global hash map (generic_iv_tbl_), with the object address as key and the gen_ivtbl as value.
This global map (generic_iv_tbl_ in variable.c) needs to be updated whenever an object is moved. In vanilla Ruby, this is done in gc_move, by calling rb_mv_generic_ivar.
Like the address-to-ID table, this is also a "canonical map", and should be treated as a weak map with weak keys.
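Because the key is an address, a moved object generally hashes to a different bucket, so the entry has to be re-inserted under the new address rather than patched in place. The C sketch below shows one way this could look, rebuilding the table after GC. It is only an illustration: iv_obj, iv_map and the functions are hypothetical, an int stands in for the gen_ivtbl pointer, and a real implementation would also free the gen_ivtbl of each dead object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IV_SLOTS 8 /* must be a power of two */

/* Hypothetical toy object header; alive/forwarded stand in for GC queries. */
typedef struct iv_obj {
    int alive;
    struct iv_obj *forwarded; /* new location if moved, else NULL */
} iv_obj;

typedef struct {
    iv_obj *key; /* weak key: the object address; NULL means empty slot */
    int ivtbl;   /* stand-in for a struct gen_ivtbl pointer */
} iv_slot;

typedef struct {
    iv_slot slots[IV_SLOTS];
} iv_map;

/* The hash depends on the address, so it changes when the object moves. */
size_t iv_hash(const iv_obj *key) {
    return (size_t)(((uintptr_t)key >> 4) & (IV_SLOTS - 1));
}

/* Insert or overwrite using linear probing; returns 0 if the table is full. */
int iv_map_put(iv_map *m, iv_obj *key, int ivtbl) {
    size_t h = iv_hash(key);
    for (size_t i = 0; i < IV_SLOTS; i++) {
        iv_slot *s = &m->slots[(h + i) & (IV_SLOTS - 1)];
        if (s->key == NULL || s->key == key) {
            s->key = key;
            s->ivtbl = ivtbl;
            return 1;
        }
    }
    return 0;
}

/* Look up by address; returns a pointer to the value, or NULL if absent. */
const int *iv_map_get(const iv_map *m, const iv_obj *key) {
    size_t h = iv_hash(key);
    for (size_t i = 0; i < IV_SLOTS; i++) {
        const iv_slot *s = &m->slots[(h + i) & (IV_SLOTS - 1)];
        if (s->key == NULL) return NULL;
        if (s->key == key) return &s->ivtbl;
    }
    return NULL;
}

/* After GC: rebuild the table, dropping dead keys and re-hashing moved ones. */
void iv_map_update_after_gc(iv_map *m) {
    iv_map fresh = {0};
    for (size_t i = 0; i < IV_SLOTS; i++) {
        iv_slot s = m->slots[i];
        if (s.key == NULL || !s.key->alive) continue;
        iv_obj *key = s.key->forwarded != NULL ? s.key->forwarded : s.key;
        int ok = iv_map_put(&fresh, key, s.ivtbl);
        assert(ok); /* cannot fail: fresh has as many slots as the old table */
        (void)ok;
    }
    *m = fresh;
}
```

Rebuilding avoids deletion from a linear-probing table, at the cost of touching every slot on each GC.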