feat: Recursive datatypes #1337
base: master
This makes it easier to work with validation functions at call sites and paves the way for permitting recursive types (we pass along the type name to validation procedures).
This commit implements initial support for recursive product data types. In C, they're represented as structs that have a field that is a pointer to the same struct type. In Carp, we currently substitute recursive references with pointers to the type, and users must provide a pointer argument during instantiation. To make creating initial values of these types easier, we define a make function, which initializes a value of the type with its recursive part set to the null pointer.
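A minimal C sketch of that representation, assuming a type shaped like the `IntList` product type used later in this PR (`head Int`, `tail IntList`). The names `IntList_init` and `IntList_make` are hypothetical stand-ins for the generated functions, not the compiler's actual output.

```c
#include <stddef.h>

/* A recursive product type: the recursive field is a pointer to the same struct. */
typedef struct IntList {
    int head;
    struct IntList *tail; /* NULL marks the end of the recursion chain */
} IntList;

/* Hypothetical raw initializer: the caller supplies a pointer for the recursive part. */
IntList IntList_init(int head, IntList *tail) {
    IntList l;
    l.head = head;
    l.tail = tail;
    return l;
}

/* Hypothetical `make`: builds a value whose recursive part is the null pointer. */
IntList IntList_make(int head) {
    return IntList_init(head, NULL);
}
```

With `make`, only the last node needs no pointer argument. For example, `IntList_init(2, &last)` with `last = IntList_make(1)` builds a two-element chain.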
This commit adds a number of alternative type getters/initers for recursive product types. These are primarily needed to hide the underlying pointer implementation from the user (otherwise, users need to deal with pointers explicitly). This permits one to write:

```clojure
(deftype IntList [head Int tail IntList])
(IntList.tail &(IntList.init 2 (IntList.make 1)))
```

Instead of writing:

```clojure
(IntList.tail (Pointer.to-ref &(IntList.init 2 (Pointer.to-value (IntList.make 1)))))
```
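In rough C terms, an alternative initer can copy a by-value argument to the heap, and a getter can hand back a reference to the pointed-to value. The Carp caller then never touches the pointer. This is a sketch under assumptions: `IntList_init_value` and `IntList_tail` are hypothetical names, and `malloc` stands in for Carp's allocator.

```c
#include <stdlib.h>

typedef struct IntList {
    int head;
    struct IntList *tail; /* hidden pointer to the recursive part */
} IntList;

/* Hypothetical value-taking initer: copies `tail` to the heap so the caller never sees a pointer. */
IntList IntList_init_value(int head, IntList tail) {
    IntList l;
    l.head = head;
    l.tail = malloc(sizeof(IntList));
    if (l.tail != NULL) {
        *l.tail = tail;
    }
    return l;
}

/* Hypothetical getter: returns a reference to the pointed-to value, like `(IntList.tail &l)`. */
const IntList *IntList_tail(const IntList *l) {
    return l->tail;
}
```

Freeing the heap copy is left to the caller in this sketch. In Carp, that job would presumably fall to the generated deleter.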
This is in keeping with the way we handle other structs in Carp.
Before, we attempted to free some memory that was never allocated (since we just print type string literals for recursive portions of a type).
Previously we did not delete the pointers of children of recursive structs, only their immediate member pointers. This commit fixes that issue. Note that this is currently handled as a special case and should be made general.
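Here is a minimal C sketch of the fix, assuming each `tail` points to its own heap allocation. The deleter walks the whole chain instead of freeing only the outer value's immediate pointer. `IntList_delete`, `IntList_cons`, and `IntList_length` are hypothetical names. The walk is written as a loop rather than as recursion, so very long lists cannot overflow the stack.

```c
#include <stdlib.h>

typedef struct IntList {
    int head;
    struct IntList *tail; /* heap-allocated child, or NULL at the end */
} IntList;

/* Hypothetical deleter: frees every heap node reachable through `tail`,
   not just the immediate member pointer. The outer value itself is not freed. */
void IntList_delete(IntList l) {
    IntList *node = l.tail;
    while (node != NULL) {
        IntList *next = node->tail; /* read the child before freeing its parent */
        free(node);
        node = next;
    }
}

/* Helper for the example: prepends a value, moving the old list to the heap. */
IntList IntList_cons(int head, IntList tail) {
    IntList l;
    l.head = head;
    l.tail = malloc(sizeof(IntList));
    if (l.tail != NULL) {
        *l.tail = tail;
    }
    return l;
}

/* Counts nodes, including the outer value. */
int IntList_length(const IntList *l) {
    int n = 0;
    for (; l != NULL; l = l->tail) {
        n++;
    }
    return n;
}
```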
This commit is bigger than it should be, for which I apologize, but it bundles several changes that all work toward supporting recursive data types:

- Makes type candidates their own module and additionally allows them to specify interface constraints: that one or more member types must implement some set of interfaces.
- Updates recursive type handling to allow for "indirect" recursion. This permits using types that implement two interfaces, `alloc` and `indirect`, as containers for the recursive part.
- Forward declares recursive types to support the case above.
- Adds a (currently unsafe) Box type for supporting heap-allocated, memory-managed indirection.
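In C, a struct cannot contain itself by value. The recursive part therefore has to sit behind an indirection, and the container type must be nameable before the element type is complete. The sketch below shows both ideas with a forward declaration and a Box-like wrapper. `Box_IntList`, `Box_IntList_init`, and `Box_IntList_nil` are hypothetical names, and `malloc` stands in for whatever the `alloc` interface would provide.

```c
#include <stdlib.h>

/* Forward declaration: lets Box_IntList mention IntList before IntList is complete. */
typedef struct IntList IntList;

/* Hypothetical Box: an owned, heap-allocated indirection to one IntList. */
typedef struct {
    IntList *data;
} Box_IntList;

/* The recursive part lives behind the Box, so the struct has a finite size. */
struct IntList {
    int head;
    Box_IntList tail;
};

/* Hypothetical Box.init: moves the value onto the heap. */
Box_IntList Box_IntList_init(IntList value) {
    Box_IntList b;
    b.data = malloc(sizeof(IntList));
    if (b.data != NULL) {
        *b.data = value;
    }
    return b;
}

/* Hypothetical Box.nil: an empty box (the currently unsafe, nullable case). */
Box_IntList Box_IntList_nil(void) {
    Box_IntList b = { NULL };
    return b;
}
```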
Enables users to use "direct" recursion on sum types and abstracts away pointers in function signatures such as case initers.
We need to dereference implicitly cast pointers to structs back to their values so that `match` works for recursive types the same way it does for non-recursive types.
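Here is a rough C picture of why the dereference is needed. It assumes the `Cons` case of a sum type stores its recursive field as a pointer. Before a nested match can inspect the next node's tag, that pointer has to be turned back into a value. The type layout and `match_example` are hypothetical. The function mirrors the nested `match` in the `IntList.Cons` example later in this thread.

```c
#include <stddef.h>

/* Hypothetical lowering of (deftype IntList (Nil []) (Cons [Int IntList])). */
typedef enum { IntList_Nil_tag, IntList_Cons_tag } IntList_tag;

typedef struct IntList {
    IntList_tag tag;
    union {
        struct {
            int head;
            struct IntList *tail; /* direct recursion stored as a pointer */
        } Cons;
    } u;                          /* Nil carries no payload */
} IntList;

IntList IntList_Nil(void) {
    IntList l = { IntList_Nil_tag };
    return l;
}

IntList IntList_Cons(int head, IntList *tail) {
    IntList l;
    l.tag = IntList_Cons_tag;
    l.u.Cons.head = head;
    l.u.Cons.tail = tail;
    return l;
}

/* Each level dereferences the stored pointer back to a value before matching on it. */
int match_example(IntList is) {
    if (is.tag == IntList_Cons_tag) {
        IntList next = *is.u.Cons.tail; /* pointer -> value */
        if (next.tag == IntList_Nil_tag) {
            return 0;
        }
        IntList rest = *next.u.Cons.tail; /* same again one level down */
        return rest.tag == IntList_Nil_tag ? 0 : 1;
    }
    return 1;
}
```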
@carp-lang/maintainers if you want an early look. So far we have:
I need to clean things up, add some tests and add support for generics, but we're getting there.
E.g. you can run this:

```clojure
(deftype IntList (Nil []) (Cons [Int IntList]))

(defn main []
  (let [is (IntList.Cons 2 (IntList.Cons 1 (IntList.Nil)))]
    (match is
      (IntList.Cons x next)
      (match next
        (IntList.Nil) 0
        (IntList.Cons _ rest)
        (match rest
          (IntList.Nil) 0
          _ 1))
      _ 1)))
```

Or this, but note the final call in

```clojure
(deftype IntList [head Int tail (Box IntList)])

(defn printit []
  (let [is (IntList.init 3 (Box.init (IntList.init 2 (Box.init (IntList.init 1 (Box.nil))))))]
    (do (IO.println &(str (IntList.head &is)))
        (IO.println &(str (IntList.tail &is)))
        (IO.println &(str (IntList.head (Box.unbox (IntList.tail &is)))))
        (IO.println &(str (IntList.tail &(Box.deref @(IntList.tail (Box.unbox (IntList.tail &is)))))))
        (IO.println &(str (Box.unbox (IntList.tail &(Box.deref @(IntList.tail (Box.unbox (IntList.tail &is)))))))))))

(defn main []
  (do (printit)
      0))
```
Of course, I'll need to fix the newly introduced errors as well. But you get the idea.
For empty structs, we generate a dummy field for ANSI C compatibility. This field needs to be included in initializers for the struct, but should not be emitted in any other functions. I erroneously included it in other functions in a previous merge. This commit fixes the issue by ensuring the dummy field is only included in the struct initializer.
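Standard C requires a struct to have at least one named member, which is why the placeholder exists. Below is a sketch of the intended split, with hypothetical names (`Empty`, `dummy`, `Empty_init`, `Empty_copy`). Only the initializer sets the placeholder; other functions such as copy never read it.

```c
/* Standard C does not allow a struct with no members, so an empty type gets a placeholder. */
typedef struct {
    char dummy; /* hypothetical name for the generated placeholder field */
} Empty;

/* The initializer is the only function that mentions the placeholder. */
Empty Empty_init(void) {
    Empty e;
    e.dummy = 0;
    return e;
}

/* Other generated functions ignore it: there are no real members to copy. */
Empty Empty_copy(const Empty *e) {
    (void)e;
    return Empty_init();
}
```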
Our current recursion check introduced a bug whereby generic types receiving instances of themselves, e.g. `(Trivial t)`, would be identified as recursive and generate incorrect type emissions. For now, we simply don't consider generic types recursive, though a future change will add recursion support for these types as well.
Instead of passing types and members separately to routines, we now use type candidates as input to recursion checks. This simplifies both validation and recursion checking on types, and abstracts away the structural differences between sum type and product type members. I also had to adjust some test output, which I will restore in a future commit.
Looks really useful; you know I've been wanting this for a long time :) I would much prefer it if we disallowed creating a NULL Box, however; it feels like a big footgun for something that would be used quite often. Aside from that, I'm not a big fan of the implicit tail boxing; it feels a bit at odds with "allocation and copying are explicit", but I feel a lot less strongly about it than about the nullable Box 😄
Sounds good! In terms of removing
Yeah, I go back and forth on this too. On the one hand, it's nice for things like continuous signal applications, since the "bottom" of the recursion can be a fixpoint of the data type. For example, you can modulate a recursive value representing some signal every N clicks and return the fixpoint every M clicks...stuff like that (e.g. accessing the recursive part of a value created with
I think @TimDeve suggests that the Box would always contain a live pointer, right? So you can safely unwrap (

Regarding the infinite data structures, how would you construct those? A code example would be great!

Like we said in the chat, I think a good way to prevent surprises would be a combination of metadata and/or interfaces on the types, so that you can't accidentally create a heap-allocated struct without knowing.
Yes, that's what I mean; a nullable pointer would be

Was the main reason you wanted a nullable Box that we don't have Maybe in the compiler, and you wanted to do some stuff with nullable pointers in the compiler? If we did have Maybe in the compiler, we could do some magic Rust-like stuff where
Yes, this was the only reason. The bulletproof impl would be conversions from

Actually, I quite like the following, wdyt? In the compiler:
Maybe there's some magical way to do init without copying, but idk. This would mean you can't use Box for recursive indirection in product types. But that doesn't make much sense in product types anyway, since you need a bottom value (which a sum type case can provide).
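Here is a C sketch of that design, built on three assumptions. Box is never NULL. `Box.init` takes its argument by value (ownership) and moves it to the heap. The sum type's `Nil` case supplies the bottom of the recursion. All names are hypothetical, and calling `abort` on allocation failure is just one way to keep the non-null invariant.

```c
#include <stdlib.h>

typedef struct IntList IntList;

/* Hypothetical non-nullable Box: data always points at a live heap value. */
typedef struct { IntList *data; } Box_IntList;

typedef enum { Nil_tag, Cons_tag } IntList_tag;

struct IntList {
    IntList_tag tag;
    union {
        struct { int head; Box_IntList tail; } cons; /* Cons: tail is never NULL */
    } u;                                              /* Nil: no payload, the bottom value */
};

/* Box.init takes ownership of a value and moves it to the heap. */
Box_IntList Box_init(IntList value) {
    Box_IntList b;
    b.data = malloc(sizeof *b.data);
    if (b.data == NULL) {
        abort(); /* keep the "never NULL" invariant */
    }
    *b.data = value;
    return b;
}

IntList IntList_Nil(void) {
    IntList l = { Nil_tag };
    return l;
}

IntList IntList_Cons(int head, IntList tail) {
    IntList l;
    l.tag = Cons_tag;
    l.u.cons.head = head;
    l.u.cons.tail = Box_init(tail);
    return l;
}

/* Recursion stops at Nil, so the box itself never needs a null check. */
void IntList_delete(IntList l) {
    if (l.tag == Cons_tag) {
        IntList_delete(*l.u.cons.tail.data);
        free(l.u.cons.tail.data);
    }
}

int IntList_sum(const IntList *l) {
    return l->tag == Nil_tag ? 0 : l->u.cons.head + IntList_sum(l->u.cons.tail.data);
}
```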
I prefer init taking ownership personally so

Are you saying it
I like it.
You’re both right! I don’t really know why I selected a ref; I think conceptually it made sense to me to take something that’s already a pointer and transform it into another kind of pointer, but actually taking a value is better!
Ohhh, I hadn’t thought of that. It would require altering the memory management a bit, then. Before, I was working under the assumption that a box was necessarily a newly heap-allocated value, but it could totally be a shallow copy too. In that case, I think its delete function would need to perform a page/nullness check before calling delete, and I guess vice versa for the other side of the copy.
We're talking about the
That is how I implemented the deleter. Likewise, for Box a shallow copy should be enough, because you would be taking ownership of the data structure.
@TimDeve thanks! re the

```c
// pseudo c
// this is what we have in the recursive type branch
Box_t box_init(t t) {
  Box_t box;
  box.data = CARP_MALLOC(sizeof(t));
  *box.data = t;
  return box;
}
```

For delete we currently have:

```c
// if t has delete
void box_delete(Box_t box) {
  t_delete(*box.data);
  CARP_FREE(box.data);
}

// if t doesn't have delete (non-managed type)
void box_delete(Box_t box) {
  /* Ignore non-managed type inside box */
  CARP_FREE(box.data);
}
```

How should we change these, if at all? Or can we remove
That looks good to me. Do we need the struct, or would it work with a pointer? The benefit of the pointer is that you can register a type with a field that's a managed pointer without converting from
I think in theory there's really no difference; it would only make a difference if we had plans to augment the
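Here is a small C check of the "no difference" intuition, assuming a Box over `int`. On mainstream ABIs, a struct holding a single pointer has the same size as the pointer. The C standard does not strictly promise this, since it allows trailing padding. Converting between the two forms is a plain field access. The names are hypothetical.

```c
/* Two candidate representations for a Box of int. */
typedef struct { int *data; } Box_int; /* struct wrapper */
typedef int *BoxPtr_int;               /* bare pointer */

/* Converting between them involves no allocation or copying of the pointee. */
BoxPtr_int Box_int_to_ptr(Box_int b) {
    return b.data;
}

Box_int Box_int_from_ptr(BoxPtr_int p) {
    Box_int b;
    b.data = p;
    return b;
}
```

The practical difference is mostly in the type system: a distinct struct type cannot be silently mixed up with an arbitrary raw pointer.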
fixes box stringer templates
@eriksvedang and I chatted about this a bit. I'm going to break some of these changes out into separate PRs so that they're easier to review and merge cleanly! This'll stick around in draft form as a reference until all the pieces are in master.
This PR is a WIP implementation of recursive data type support in Carp. At the moment, it only supports recursive product types, which are backed by structs with fields that point to values of their own type.
Currently, the following sample code will work:
Here's a list of what's been implemented thus far, and what remains:
- `make` function for initializing a recursive type with a null recursive part (the end of the recursion chain).