Skip to content

Support copying GC #18

@wks

Description

@wks

Challenges for supporting copying GC include

  • Some fields cannot be updated, and have to pin the objects they point to.
  • Some data structures depend on the address of objects, and need to be updated.
    • ID-to-address and address-to-ID map
    • address-to-gen_ivtbl map
    • finalizer_trable

Handling of roots:

Upstream changes needed for copying GC

Correctness goals

Performance goals

  • Performance on par or exceed CRuby's vanilla GC
    • Currently when running Liquid benchmark, MMTk-Ruby is close to vanilla Ruby at 2x minimum heap size, and can outperform vanilla Ruby at 3x minimum heap size or greater.
    • Still room for improvement for STW time. See: GC performance issues for MMTk Ruby #25

Un-update-able references

One challenge of supporting copying GC in Ruby is that some object references cannot be updated. Specifically,

  1. Due to conservative stack scanning, local variables (in C functions, not in Ruby functions) cannot be updated.
  2. Some fields in global data structures are marked with rb_gc_mark during gc_mark_roots. Those fields cannot be updated.
  3. Some objects have fields that cannot be updated.

Because object references held in those places cannot be updated, the objects pointed by those reference must be pinned. In other words, if an object has type T_DATA, T_IMEMO, T_HASH or has the EXIVAR flag, the object itself can move, but it pins its children.

Recording "potential pinning parents"

The Ruby binding shall maintain a runtime list of "potential pinning parents" (PPP for short). That includes all types of object in (3) in the previous section. Specifically,

  • When T_DATA, and those listed T_MEMO are instantiated, we add them to the PPP list.
  • When a T_HASH becomes compare_by_identity or when an object gets the EXIVAR flag, we add it to the PPP list.

Note that some PPPs don't always pin their children. Some T_DATA can actually move their children because it is modern, and the developers used rb_gc_mark_movable and provided the dcompact function. For other PPPs, their pinning fields may just be nil at the moment of the GC. But we have to add them to the PPP list conservatively because we don't know if a T_DATA is modern enough or if any field is nil.

We visit all PPPs before GC and pin their children (via pinning fields only), so when GC starts, those children won't move. Note that

  • those children are not kept alive, and
  • the PPPs' children's children are not recursively pinned.

After GC, re-visit the PPP list, and remove all dead objects from it. Unpin live objects in the PPP list.

More language-neutral discussions on this topic are here: mmtk/mmtk-core#690

Ways to reduce the number of potential pinning parents

  • Introduce declarative marking. T_DATA objects that support declarative marking are not considered PPPs.
  • Whitelist T_DATA types in Ruby core/stdlib that are known not to pin children
  • Fix T_DATA types in Ruby core/stdlib, replace their rb_gc_mark with rb_gc_mark_movable, and introduce compaction functions for them.

Address-aware data structures

Global address-to-ID table

In Ruby, objects may optionally have an ID. Once the ID of an object is seen, it will never change as long as it is alive. In vanilla Ruby, it maintains two tables:

typedef struct rb_objspace {
    // ...
    st_table *id_to_obj_tbl;
    st_table *obj_to_id_tbl;
    // ...
} rb_objspace_t;

Those table are maintained when objects are moved (in gc_move) and entries are removed when objects die (in obj_free).

In MMTk, we should consider them weak maps, and use our existing weak reference processing framework to handle them. Effectively, those table entries are things that gets garbage-collected when their owners (the object) die. We can treat both id_to_obj_tbl and obj_to_id_tbl as one single bi-directional weak map.

Interestingly, this is exactly what WeakReferences are intended for. In Java documentation:

Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings.

Global address-to-gen_ivtbl map

If an object is not T_OBJECT, its instance variables (@foo, @bar, ... in Ruby language) are stored in an external generic instance variable table (gen_ivtbl), and is associated to the object via a global hash map (generic_iv_tbl_), with the object address as key, and the gen_ivtbl as value.

This global map (generic_iv_tbl_ in variable.c) needs to be updated whenever an object is moved. In vanilla Ruby, this is done in gc_move, by calling rb_mv_generic_ivar

Like the address-to-ID table, this is also a "canonical map", and should be treated as a weak map with weak keys.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions