Skip to content

NMatrix Developer Guide

Colin J. Fuller edited this page Feb 24, 2014 · 17 revisions

The goal of this page is to give a general overview of NMatrix for those who may want to hack on the code, and to make it easier for new contributors to jump right in. This is obviously still a work in progress, but will hopefully fill out over time.

Portions of this guide were taken from the original NMatrix Red Book by John Woods.

Terminology

####dtype ext/nmatrix/nmatrix.h The data type stored in a matrix, e.g., int8, int16, float64, rational128, Ruby object. These are generally represented by symbols, and include both the basic type and the number of bits in their names (except for Ruby objects, which vary system to system). These are represented by enum dtype_t in nmatrix.h. ####stype ext/nmatrix/nmatrix.h The storage type of a matrix, such as dense, list-of-lists, or “new” Yale. We generally refer to these as simply dense, list, or yale. These are represented by enum stype_t in nmatrix.h. ####itype ext/nmatrix/nmatrix.h Yale matrices have this property, which is similar to dtype but indicates the type of integer storage used for the matrix indices. The itype of a matrix is a function of the matrix’s shape. itype and dtype may not be interchanged in function pointer arrays except under very specific circumstances. These are represented by enum itype_t in nmatrix.h. ####shape The dimensions of a matrix, e.g., n-by-m, which is given in the order: [rows, columns, …].

Namespaces and Prefixes

While most of NMatrix is now written in C++, it is important that compatibility be maintained with C. If a templated global C++ function needs to be accessed from C (e.g., for either API or the Ruby interface), the following convention is utilized:

  • The C++ function is placed in a namespace (e.g., namespace nm { }) or is declared static if possible. The C function receives the prefix nm_, e.g., nm_multiply() (this function also happens to be static).
  • C macros are capitalized and generally have the prefix NM_, as with NM_DTYPE().
  • C functions (and macros, for consistency) are placed within extern “C” { } blocks to turn off C++ mangling.
  • C macros (in extern blocks) may represent C++ constants (which are always defined in namespace nm {} or a child thereof).

Components and how they work

Dense matrices

List matrices

Yale matrices

Interacting with the ruby garbage collector

####Background

From within C/C++ code, ruby exposes objects as the typedef VALUE, which as of this writing is an unsigned long. Internally, this might be just the (slightly modified) value of a number (for things like fixnum), or a casted pointer to a ruby data structure.

The ruby garbage collector is a mark and sweep type collector. For ruby programs (ignoring C extensions for the moment), this means, in the simplest sense, that the interpreter keeps track of all objects in existence, and when garbage collection runs, a first pass marks all objects that are accessible from the code, and a second pass frees up all the objects that are no longer accessible. In a C extension, this might cause problems: the C code might use ruby VALUEs that aren't accessible from the ruby side (and thus wouldn't be marked), but still shouldn't be freed up. There are a number of mechanisms in place to prevent this problem:

  • Garbage collection only runs once C code has returned to ruby, or during a call to a ruby C API method. This means that you don't need to worry about something like a dedicated GC thread starting garbage collection at any arbitrary point in your function; only defined points can be problematic.

  • Data_Wrap_Struct: this is a ruby C API method that is used to wrap a C struct in a ruby VALUE (see ruby's README.ext for more details). It allows you to pass a marking function and a freeing function. If the ruby garbage collector marks this ruby VALUE, then the marking function will be called. By creating an appropriate marking function, it's possible to mark VALUEs hidden in the C struct and prevent them from being garbage collected. For NMatrix, this mechanism is key for the implementation of object-dtype NMatrix objects.

  • Ruby checks the stack for VALUEs and pointers to VALUEs still in use by your C code. This is pretty neat. If you for instance have a case where your code is:

VALUE x = ...;
rb_call_some_c_api_method();
return x;

Then ruby should see x on the stack and make sure not to garbage collect it during that api call. The same is true if x is a VALUE* to some VALUE(s) on the heap.

The problem

Two cases aren't sufficiently dealt with by these mechanisms.

  • You have a pointer on the stack to some struct that internally contains VALUEs, but you don't have a pointer to those VALUEs (or the VALUEs themselves) on the stack, and you want to make a ruby C API call. This would be simply solved by just putting the VALUEs on the stack before the API call if not for the second problem.

  • Optimizing compilers. If you're running the compiler with any optimizations turned on, it's hard to guarantee that any particular VALUE is actually on the stack when you need it to be. Given that NMatrix is a library for scientific computing, in which it's common to be CPU-limited, turning off optimizations is not ideal.

The typical solution

The typical solution to the problem of the optimizing compiler is to mark VALUEs as volatile, a keyword that (simplistically) indicates that some code that the compiler doesn't know about (whether hardware, another thread, etc.) might interact with the variable declared volatile. This generally means that the compiler won't optimize volatile variables out because there might be some unintended side effect.

To solve the problem using volatile:

  • Find everywhere there's a call to a ruby API method (or a call to an NMatrix method that calls a ruby API method, etc.).
  • Before each call, ensure that all VALUEs in use by the code (whether normally declared directly or as part of a struct, etc.) are stored in a volatile variable on the stack.

However, it's not completely clear whether this will prevent all optimizations that would cause issues with the garbage collector. Even if volatile does prevent all problematic optimizations, it's not clear that this is desirable from a performance perspective (however, more testing would be needed to figure this out). A reasonable interpretation of recent C++ specifications might also be that use of volatile is discouraged except for hardware interactions. Thus, just marking all VALUEs volatile is perhaps not ideal.

The NMatrix solution

NMatrix has chosen instead to use a static data structure that tracks all VALUEs currently in use by the code and that can tell the garbage collector to mark these at the appropriate time. (This structure is created in ruby_nmatrix.c in __nm_initialize_value_container and marked by __nm_mark_value_container.) This structure is currently just a linked list implementation of a stack of VALUEs in use. While this in theory is not great because it requires a linear-time search for a VALUE in order to remove it when it's no longer needed, in practice, this removal usually happens in the reverse order that VALUEs were added, so only the top of the stack needs to be checked.

In order for this solution to work, a function must ensure that any VALUEs in use are registered in this data structure before any ruby API call that could cause garbage collection. To do this, there are functions nm_register_value(VALUE&), nm_register_values(VALUE*, size_t), nm_register_storage(stype_t, STORAGE*), and nm_register_nmatrix(NMATRIX*).

Here's an (admittedly contrived) example:

VALUE* do_some_calculation() {
  VALUE result = some_other_function();
  nm_register_value(result);
  VALUE* result_holder = NM_ALLOC(VALUE);
  *result_holder = result;
  nm_unregister_value(result);
  return result_holder;
}

In this example it's possible with optimizations on that result might be garbage collected in the NM_ALLOC macro, which just expands to the ruby ALLOC macro, which in turns expands to an allocation function that can run garbage collection if memory is low. The nm_register_value call ensures that result is safe during the ruby API call.

Note from this example that a call to nm_unregister_value is also required. All of the nm_register_ functions have a matching nm_unregister_ partner. This should be called after the last dangerous ruby API call to remove the VALUE from the registration data structure. Otherwise the VALUE will never be freed up during GC and will effectively be leaked. This is analogous to malloc and free, although one important difference is that by convention, a VALUE registered in a function should always be unregistered in that same function (since it's only necessary to maintain the registration during ruby API calls). This makes it easier to avoid memory leaks resulting from having to track registrations across function boundaries. For the current implementation of the registration structure (a stack), it's most efficient to unregister VALUEs in reverse order from their registrations, so if there's a choice, do this.

What happens when we're not sure?

The registration approach is complicated in that it's not always easy to know whether a given registration is required or not, since the appearance of bugs will depend on the precise timing of when GC runs and what compiler optimizations are happening. It's undesirable to have too many extra registrations because each registration imposes a slight performance decrease, and NMatrix is designed to be high-performance. For cases where we're not sure that a registration is needed or isn't needed, we've added a preprocessor macro NM_CONSERVATIVE, which just deletes a statement. For example:

NM_CONSERVATIVE(nm_register_value(result));

would make this registration not happen in production code. This does make the code messier, though, so if you're totally sure a registration isn't needed, don't put it in at all.

For debugging purposes, you can turn back on all the NM_CONSERVATIVE statements by defining NM_GC_CONSERVATIVE during compilation. This is the first thing to try if you're seeing a bug where it looks like something is being garbage collected too soon. If this fixes it, it should then be possible to script going through all the NM_CONSERVATIVE statements and disable each one in turn to find which is the problematic one.

Also note that when marking registrations with NM_CONSERVATIVE its partner unregistration should also be marked conservative. While unregistering a VALUE that wasn't registered won't cause a serious problem, it may incur a performance overhead from checking through all the VALUEs that are registered to see if any match.

NMatrix allocation macros

As an additional debugging tool, we've defined NMatrix-specific allocation macros that mirror the ruby ones. For example, NM_ALLOC(type) does the same thing as ruby's ALLOC(type). We've also added NM_FREE to replace xfree. Use these NMatrix-specific ones in NMatrix code. When debugging memory issues, this allows easier instrumentation of allocations and deallocations throughout the NMatrix code without having to worry about redefining ALLOC everywhere that its definition is included.

Clone this wiki locally