
Stack overflow in the garbage collector #21

Open
DemiMarie opened this issue Jun 19, 2016 · 19 comments

@DemiMarie

When running julia --lisp (so using Julia's version of Femtolisp) with this Scheme file:

(define (longer-and-longer x)
  (let ((my-car (car x)))
    (set-car! x (cons my-car #f))
    (longer-and-longer x)))

(longer-and-longer (cons #f #f))

the result is a SIGSEGV due to a stack overflow (confirmed by Valgrind).

I suspect this is in the garbage collector, since replacing car with cdr (and set-car! with set-cdr!) makes it take much longer before segfaulting (possibly due to a use-after-free, most likely caused by integer overflow).

@briandamgaard

briandamgaard commented Jan 12, 2017

It's obvious why the stack overflow occurs. The Cheney-style copying garbage collectors I've seen in Bezanson's projects have always suffered from the foolish bug that the relocate() function calls itself recursively when it relocates the car part of a lisp pair, and also when it relocates the final cdr part of the lisp pair at the end of a list.

This has always been an unfortunate blemish on all the otherwise very elegant and intelligent code he so graciously has published over the years.

static value_t relocate(value_t v)
...
car_(nc) = relocate(a);                /* recursive call on the car */
...
*pcdr = (d==NIL) ? NIL : relocate(d);  /* recursive call on the final cdr */

@JeffBezanson
Owner

projects have always
over the years

How many projects are we talking about here?

@briandamgaard

briandamgaard commented Jan 12, 2017

Jeff wrote:

How many projects are we talking about here?

The main FemtoLisp project and all its so-called tiny variants, e.g., "lisp.c" and "lisp2.c"

@JeffBezanson
Owner

Ok, so one project.

@JeffBezanson
Owner

I would be happy to see a pull request fixing this. I do think there are trade-offs though --- we would need to use more heap space, and it might be slower, and car-deep structures are pretty rare in real code. Of course it could be a build-time option if necessary.

@briandamgaard

briandamgaard commented Jan 12, 2017

Jeff wrote:

we would need to use more heap space, and it might be slower,

  1. It does not need more heap space.
  2. It does not run slower.

Your GC code aspires to be a simple copying collector, which is a perfectly fine choice, in particular for the tiny variants. You have just misunderstood how such a simple copying GC works, so you have conflated copying and reference-updating.

A correct implementation is at least as simple and elegant as what you have, and it won't even take an hour's work to rewrite your code.

For convenience, here is a link: https://en.wikipedia.org/wiki/Cheney's_algorithm

Please note that this is not a different algorithm. This is what your code always has aspired to implement: A simple copying garbage collector. Your existing code just suffers from a misunderstanding about how it's done.
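For concreteness, the linked algorithm can be sketched in C over a toy cons-only heap. All names here (forward, collect, FORWARDED, and the fromspace/tospace arrays) are illustrative assumptions, not femtolisp's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal Cheney-style collector over a cons-only heap. */

#define HEAP_CELLS 1024
#define FORWARDED  ((uintptr_t)-1)  /* broken-heart marker; we assume no
                                       live car ever holds this value  */

typedef struct { uintptr_t car, cdr; } cons_t;

static cons_t fromspace[HEAP_CELLS], tospace[HEAP_CELLS];
static cons_t *alloc_ptr, *scan_ptr;

/* Does v point into fromspace?  (Stands in for a real tag check.) */
static int is_cons(uintptr_t v) {
    return v >= (uintptr_t)fromspace &&
           v <  (uintptr_t)(fromspace + HEAP_CELLS);
}

/* Copy one cell to tospace, or follow its forwarding pointer.
   Note: no recursion, regardless of how deep the structure is. */
static uintptr_t forward(uintptr_t v) {
    if (!is_cons(v)) return v;            /* immediates pass through   */
    cons_t *c = (cons_t *)v;
    if (c->car == FORWARDED) return c->cdr;
    cons_t *nc = alloc_ptr++;
    *nc = *c;                             /* shallow copy; fields are  */
    c->car = FORWARDED;                   /* fixed up later by the     */
    c->cdr = (uintptr_t)nc;               /* scan loop below           */
    return (uintptr_t)nc;
}

/* Forward the root, then scan tospace left to right. */
static uintptr_t collect(uintptr_t root) {
    alloc_ptr = scan_ptr = tospace;
    root = forward(root);
    while (scan_ptr < alloc_ptr) {        /* scan..alloc is the queue  */
        scan_ptr->car = forward(scan_ptr->car);
        scan_ptr->cdr = forward(scan_ptr->cdr);
        scan_ptr++;
    }
    return root;
}
```

The key point is that the loop in collect replaces the recursion: the region between the scan pointer and the allocation pointer acts as the breadth-first work queue, so no C stack is consumed no matter how deep the structure is.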

@JeffBezanson
Owner

It seems to me the difference is breadth-first vs. depth-first order. Yes, I see breadth-first order might be better.

@briandamgaard

briandamgaard commented Jan 12, 2017

Jeff wrote:

It seems to me the difference is breadth-first vs. depth-first order.
Yes, I see breadth-first order might be better.

It's not just a question about being better, it's a question about being correct. When it's done the right way, there is no recursion. Put differently, there is no drain on stack space, so stack overflow is not an issue.

The "might" in your sentence indicates that you haven't fully understood the crucial difference yet. When you've had the time to let it sink in, I'm sure you'll realize how easily and quickly you can fix it, both for the benefit of the users and for the algorithmic beauty, which is one of the rewards for us programmers. Your projects are so good that they surely deserve a correctly implemented and elegant garbage collector!

@nilern

nilern commented Feb 26, 2020

Indeed I assumed that Cheney's algorithm would be used as it is very much compatible with the "simple, interesting & fast" philosophy of Femtolisp. I was going to just implement it myself and make a PR but then I realized that

  1. It actually isn't a small change. Femtolisp does not have uniform heap object headers like e.g. Chicken, instead using tagged pointers also for heap object type tags. But the Cheney scan pointer is not a tagged pointer so the algorithm would not have access to the layout information for the current object.
  2. Adding the required object headers would actually use more heap space.
  3. Breadth-first copying in itself might actually be faster (no stack!), but on the other hand it spreads list conses around while depth-first copying colocates them, so overall cache utilization would be worse.

@briandamgaard

Nilern wrote:

It actually isn't a small change. Femtolisp does not have uniform heap
object headers like e.g. Chicken, instead using tagged pointers
also for heap object type tags.

Ah, the missing object headers are indeed a nuisance, but off the top of my head, I'd say that it's still doable in an easy manner. All it takes is ensuring that all objects on the heap are 8-byte aligned. That buys you the needed extra tag value to signal that an object in the old heap contains a forward pointer after it has been relocated.

nilern wrote:

Breadth-first copying in itself might actually be faster (no stack!),
but on the other hand it spreads list conses around while
depth-first copying colocates them, so overall cache utilization would be worse.

FemtoLisp is a small and elegant Lisp. It cannot be - and is not meant to be - particularly efficient, so this is an unimportant detail.

@briandamgaard

Brian wrote:

All it takes is ensuring that all objects on the heap are 8-byte aligned.
That buys you the needed extra tag-value to signal that an object in the old
heap contains a forward pointer after it has been relocated.

I have only looked at the very small FemtoLisp versions (I cannot remember if there is a larger one, more geared toward production mode), and there an extra tag value for a forward pointer isn't even necessary. The only objects on the heap are cons cells.

This means that an untagged pointer (as if it has the tag "TAG_NUM") can be used to signal that an object in the old heap contains a forward pointer. A pleasant side effect is that it makes the handling of the forward pointers as efficient as it possibly can be.

@nilern

nilern commented Feb 26, 2020

Breadth-first copying in itself might actually be faster (no stack!),
but on the other hand it spreads list conses around while
depth-first copying colocates them, so overall cache utilization would be worse.

FemtoLisp is a small and elegant Lisp. It cannot be - and is not meant to be - particularly efficient, so this is an unimportant detail.

It's not particularly inefficient either. Just thought to mention that the breadth-first algorithm is not strictly superior.

I have only looked at the very small FemtoLisp versions (I cannot remember if there is a larger one, more geared toward production mode), and there an extra tag value for a forward pointer isn't even necessary. The only objects on the heap are cons cells.

Indeed there is a main production version and it has vectors and C interop values and whatnot.

This means that an untagged pointer (as if it has the tag "TAG_NUM") can be used to signal that an object in the old heap contains a forward pointer. A pleasant side effect is that it makes the handling of the forward pointers as efficient as it possibly can be.

Forwarding pointers are already implemented more or less like that. The problem is finding the start of the next object in the tospace queue. If there were just cons cells we could just bump a value_t pointer by two. When there are many types their sizes have to be stored on the heap in some uniform manner and the start of an object has to be distinguishable from alignment holes if some objects are more than word-aligned.

Of course there is also the Big Bag of Pages arrangement (i.e. group the objects by type so headers are not needed) but I really don't think we want to go there.

@briandamgaard

briandamgaard commented Feb 26, 2020

nilern wrote:

The problem is finding the start of the next object in the tospace queue.
If there were just cons cells we could just bump a value_t pointer by two.
When there are many types their sizes have to be stored on the heap
in some uniform manner and the start of an object has to be distinguishable
from alignment holes if some objects are more than word-aligned.

It's not a problem to find the start of the next object, and you don't need to store object sizes for all object types.

It's easy and efficient to use a lookup table indexed by object type, reserving the return value "length = 0" to represent "variable length object type".

All variable sized objects on the heap contain their length, or they contain enough information to compute the length on the fly, including alignment.

Strictly speaking, variable-sized objects don't need to store their lengths in a uniform manner. Uniformity is more a matter of convenience for the programmer.
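A sketch of such a lookup table, with hypothetical type tags (femtolisp's real tags differ), assuming the type tag can be read from the object's first word - which, as the discussion below shows, femtolisp does not currently guarantee:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object-type tags, 3 bits wide. */
enum { T_CONS, T_SYMBOL, T_VECTOR };

/* Fixed sizes in words, indexed by the 3-bit tag; 0 means
   "variable-length: read the size from the object itself". */
static const size_t fixed_size[8] = {
    [T_CONS]   = 2,   /* car + cdr                */
    [T_SYMBOL] = 3,   /* illustrative             */
    [T_VECTOR] = 0,   /* length word follows tag  */
};

/* Size in words of the object starting at p.  We assume the first
   word carries the type tag in its low 3 bits, and that a vector's
   element count sits in the second word. */
static size_t object_size(const uintptr_t *p) {
    size_t n = fixed_size[p[0] & 0x7];
    if (n == 0)
        n = 2 + p[1];         /* tag word + length word + elements */
    return n;
}
```

The scan loop can then advance with `scan += object_size(scan)` without a per-object size field for the fixed-size types.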

@nilern

nilern commented Feb 26, 2020

It's easy and efficient to use a lookup table indexed by object type, reserving the return value "length = 0" to represent "variable length object type".
All variable sized objects on the heap contain their length, or they contain enough information to compute the length on the fly, including alignment.

Yes, if we have the type at hand. But the Cheney scan pointer cannot be a tagged pointer, so we don't have the tag bits of the current object, so there is nothing to index with; the type is unknown.

Also if there are uninitialized alignment holes we can't get the alignment of the next object because we don't know where it starts so we can't hop over the hole to its start to get the alignment...

Strictly speaking, variable-sized objects don't need to store their lengths in a uniform manner. Uniformity is more a matter of convenience for the programmer.

"Uniform" was a poor choice of words. What I meant is that the heap contains enough "reflection" information for the GC to traverse it in order of addresses instead of following pointers. The specific term is "heap parsability". Actual uniformity does also have some performance advantage but I don't know how significant that is, especially in this context.

@briandamgaard

briandamgaard commented Feb 26, 2020

nilern wrote:

Yes, if we have the type at hand. But the Cheney scan pointer
cannot be a tagged pointer, so we don't have the tag bits of the
current object, so there is nothing to index with; the type is unknown.

That's not true. You do have the tag bits of the current object. An object copied into the new heap is complete, but its pointers to other objects will first be updated when the object is scanned.

There is nothing difficult in implementing a Lisp heap and using Cheney's algorithm for the garbage collection. I'm sure that the necessary minor modifications can be accomplished in a day's work or so.

Nilern wrote:

Also if there are uninitialized alignment holes we can't get the alignment
of the next object because we don't know where it starts
so we can't hop over the hole to its start to get the alignment...

Alignment is no problem. There are no gaps. It's the responsibility of variable-sized objects always to allocate memory up to alignment, if necessary. If the implementation doesn't do that already, then it's your job to complete the task. If it makes things easier and more efficient, then add a field with the total length to the variable-sized objects. These objects have a size where it doesn't matter to add an extra field.
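The "allocate up to alignment" rule is the usual align-up idiom; a minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of `align` (a power of
   two), so variable-sized objects leave no gaps that the scan
   pointer would have to hop over. */
#define ALIGN_UP(n, align) \
    (((uintptr_t)(n) + ((align) - 1)) & ~((uintptr_t)(align) - 1))
```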

nilern wrote:

Actual uniformity does also have some performance advantage

Uniformity in itself doesn't matter for performance, but it may make things easier for the programmer, and if there is one efficient method that works well for all object types, then there is no reason to do it differently. But strictly speaking, since there are individual code lumps for scanning each object type anyway, object type A is welcome to use one method for computing or storing its length, while object type B uses a different strategy.

@briandamgaard

briandamgaard commented Feb 27, 2020

Brian wrote:

That's not true. You do have the tag bits of the current object.
An object copied into the new heap is complete, but its pointers
to other objects will first be updated when the object is scanned.

Ah, you're right, nilern. I woke up this morning finally understanding what you say. Your initial description of the problem sent me in the wrong direction by saying that the problem is the omission of object header fields.

It's perfectly all right to omit an explicit object header field, also for a copying garbage collector like Cheney's algorithm. All it takes, of course, is that the heap can still be parsed, as you also point out.

To ensure that the heap can be parsed, the program should store the object type in the form of tag bits in the first field of the object. This is what I - very naturally - assumed the program was doing, because that's necessary in order to make things work with a copying garbage collector. This is the background for everything I have written until now.

There is room for the type tag in the low bits of the first field of each object, and in effect this works as a minimal object header, without adding to the memory footprint.

So what you're pointing out is that FemtoLisp has made the design mistake of not tagging the objects, but tagging the pointers to objects. That's a near-fatal mistake when the intention is to use a copying garbage collector.

As I see it, there are two choices to turn FemtoLisp into a usable (i.e., non-crashing) Lisp implementation:

Change the tagging, so it tags objects instead of the pointers to the objects, thereby making it a clean and straightforward design.

This modification is probably not as complicated as it may sound. The "is-object-type?" functions and the "extract-pointer" functions are already isolated in macros.

It's funny: This operation confirms the old saying, that there is nothing in programming that cannot be solved by adding an extra layer of indirection :-)

While garbage-collecting, use a vector with one byte per heap lisp cell to store the type of the relocated object starting at that cell, if any. The vector is no longer needed after the garbage collection.

Maybe this is the easiest and safest way to make the program work, without changing anything else in the program, but to my taste it's a less attractive alternative because of the lack of elegance.

@nilern

nilern commented Feb 27, 2020

To ensure that the heap can be parsed, the program should store the object type in the form of tag bits in the first field of the object. This is what I - very naturally - assumed the program was doing, because that's necessary in order to make things work with a copying garbage collector. This is the background for everything I have written until now.

The tag bits of the first field are already used for the type of the field value. The object type does not need to be stored in the heap to use a copying collector. All that is needed is a way to convert a heap object into a "broken heart" which stores a forwarding pointer and can be identified as such by the GC. Femtolisp does as much and implements copying collection just fine (except for the stack overflow).

So what you're pointing out is that FemtoLisp has made the design mistake of not tagging the objects, but tagging the pointers to objects. That's a near-fatal mistake when the intention is to use a copying garbage collector.

I would not call it a mistake. It saves space and makes type tests faster. But it does severely limit the choice of GC algorithms.

Change the tagging, so it tags objects instead of the pointers to the objects, thereby making it a clean and straightforward design.
This modification is probably not as complicated as it may sound. The "is-object-type?" functions and the "extract-pointer" functions are already isolated in macros.

Nevertheless it is quite a fundamental change. These things are also subtle and tedious to debug.

While garbage-collecting, use a vector with one byte per heap lisp cell to store the type of the relocated object starting at that cell, if any. The vector is no longer needed after the garbage collection.

That feels like a kludge.

As I see it, there are two choices to turn FemtoLisp into a usable (i.e., non-crashing) Lisp implementation

It would also be possible to retain the current algorithm by removing the recursion and using a heap-allocated stack instead of the rather small C stack. At least for mark and sweep there is also the pointer reversal trick that stores the mark stack in the objects themselves (the 'Deutsch-Schorr-Waite algorithm').
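The explicit-stack variant could look roughly like this, again over a toy cons-only heap with illustrative names (relocate_iter, FWD), not femtolisp's code. Pending slots go on a malloc'd worklist, preserving depth-first order without touching the C stack:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CELLS 1024
#define FWD   ((uintptr_t)-1)  /* assume no live car holds this value */

typedef struct { uintptr_t car, cdr; } cell_t;

static cell_t from[CELLS], to[CELLS];
static cell_t *free_ptr;

static int in_from(uintptr_t v) {
    return v >= (uintptr_t)from && v < (uintptr_t)(from + CELLS);
}

/* Depth-first copying without C-stack recursion: each slot that may
   still hold a fromspace pointer is pushed on the worklist.  Worst
   case is two slots per live cell plus the root. */
static uintptr_t relocate_iter(uintptr_t root) {
    uintptr_t **stack = malloc((2 * CELLS + 1) * sizeof *stack);
    size_t sp = 0;
    free_ptr = to;

    uintptr_t result = root;
    stack[sp++] = &result;
    while (sp > 0) {
        uintptr_t *slot = stack[--sp];
        uintptr_t v = *slot;
        if (!in_from(v)) continue;        /* immediate, or already moved */
        cell_t *c = (cell_t *)v;
        if (c->car == FWD) { *slot = c->cdr; continue; }
        cell_t *nc = free_ptr++;
        *nc = *c;                         /* copy, then leave forwarding */
        c->car = FWD;
        c->cdr = (uintptr_t)nc;
        *slot = (uintptr_t)nc;
        stack[sp++] = &nc->cdr;           /* cdr handled after car, so   */
        stack[sp++] = &nc->car;           /* the order stays depth-first */
    }
    free(stack);
    return result;
}
```

Unlike a breadth-first Cheney copy, this keeps the existing depth-first copying order, so the cache-locality trade-off mentioned earlier doesn't arise; the cost is the auxiliary stack itself.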

@briandamgaard

nilern wrote:

The tag bits of the first field are already
used for the type of the field value.

Yes, that's how it is now, but just for the record, there was nothing wrong with what I said: I suggested doing it differently, i.e., storing the type tags in the objects and not in the pointers.

nilern wrote:

It would also be possible to retain the current algorithm
by removing the recursion and using a heap-allocated
stack instead of the rather small C stack.

That's right, and to me that sounds like the best choice, given the current architecture.

nilern wrote:

At least for mark and sweep there is also the pointer
reversal trick that stores the mark stack in the
objects themselves (the 'Deutsch-Schorr-Waite algorithm').

I also considered that alternative briefly, but I would have to think more about it before I know for sure whether it can be done without running into the problem with the missing types in the objects. Upon return to a pointer slot, it requires quite a lot of information to know how to proceed.

@briandamgaard

nilern wrote:

It would also be possible to retain the current algorithm
by removing the recursion and using a heap-allocated
stack instead of the rather small C stack.

Brian wrote:

That's right, and to me that sounds like the best choice,
given the current architecture.

A heap-allocated stack is not the right choice, but the good news is that considering the stack as one of the few good possibilities provides the killer argument against choosing anything but the natural solution.

The memory footprint of the stack is theoretically exactly the same as if a header field is added to each object in the heap. This means that there is no argument against adding such a header field. That's a big relief because there is no need to feel bad about spending memory on the header field.

The header field contains the obvious pieces of information for the copying garbage collector:

A. The object size, including alignment, in the upper bits, thereby providing the size in a uniform and efficient manner, so the copying garbage collector knows where the next object begins.

B. The object type in the lower bits, so the garbage collector knows what to do with the current object.

It's a win-win situation. It's the easiest way to implement the copying garbage collector, and it's efficient. The added memory footprint is of no concern. All the existing code runs unmodified, with its types stored in pointers. The only changes to the source code are a few added lines in the object constructor functions - no existing lines change. They build the header fields for their respective object types.

So the bottom line is, that there is no reason to look for alternatives to the natural solution.
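Such a header word might be packed like this; the bit layout, field widths, and helper names are all illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header word as described above: object size (in words, alignment
   included) in the upper bits, object type in the lower bits. */

#define TYPE_BITS 3
#define TYPE_MASK ((uintptr_t)((1u << TYPE_BITS) - 1))

static uintptr_t make_header(size_t size_words, unsigned type) {
    return ((uintptr_t)size_words << TYPE_BITS) | (type & TYPE_MASK);
}

static size_t header_size(uintptr_t h) {       /* A: where next object begins */
    return (size_t)(h >> TYPE_BITS);
}

static unsigned header_type(uintptr_t h) {     /* B: how to scan this object  */
    return (unsigned)(h & TYPE_MASK);
}
```

With this in every object's first word, the copying collector's scan loop advances with `scan += header_size(*scan)` and dispatches on `header_type(*scan)`.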
