Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making Hy models immutable. #1542

Closed
gilch opened this issue Mar 25, 2018 · 13 comments · Fixed by #1804
Closed

Consider making Hy models immutable. #1542

gilch opened this issue Mar 25, 2018 · 13 comments · Fixed by #1804

Comments

@gilch
Copy link
Member

gilch commented Mar 25, 2018

We've discussed this a bit recently in other issues. Particularly #1519. (also #1538). But it's important enough for a dedicated issue.

The mutable models are causing serious but subtle bugs (the worst kind) in the compiler which are most obvious in macros. User macros may run into the same kind of issues, even if we add workarounds to ours. Not good.

Fixing the compiler without switching to immutable Hy models may be very difficult. Our tests are not adequate--so we can't be very sure we've fixed all the problems, and it would be easy to reintroduce similar problems by accident. It not clear we could add enough tests to catch everything either. The best practice without changing the interface may be to copy on receipt and copy-on-write everywhere in the compiler, which would be slow. Switching to immutable models may actually be easier and perform faster.

The obvious approach would be to base our models on tuple instead of list, but that's not the only option. We could use Pyrsistent, or roll our own basic linked list, or add an immutable seq abstraction for iterators in general hylang/hyrule#32.

Regardless of the chosen approach, this is a very deep change that will break the world--to fix it. Fixing Hy's core will be a pain. Most basic user code would be unchanged, but anything that touches the Hy models directly (particularly macros) will break and have to be rewritten to the new interface.

@vodik
Copy link
Contributor

vodik commented Mar 25, 2018

Also related to this, I think this could improve hy2py as well with macro output. It be nicer to see this:

HyExpression([HySymbol('setv'), HySymbol('x'), HyInteger(1)])

instead of

HyExpression([] + [HySymbol('setv')] + [HySymbol('x')] + [HyInteger(1)])

@gilch
Copy link
Member Author

gilch commented Mar 25, 2018

It's a consequence of the implementation of quasiquoting. We'd basically need the list monad to get the ~@ unquote splice, i.e. if it's already an iterable, just convert it to a list instead of wrapping it in one. E.g.

`(0 ~@[1 2 3] 4)

currently compiles to

HyExpression([] + [HyInteger(0)] + list([1, 2, 3] or []) + [HyInteger(4)])

but could be

HyExpression([HyInteger(0), *list([1, 2, 3] or []), HyInteger(4)])

but only since PEP 448 in v3.5. We're still supporting 3.4 though.

[Edit:
better yet, use tuples:

HyExpression([HyInteger(0), *tuple([1, 2, 3] or ()), HyInteger(4)])

]

I'm not sure how immutable models help us here.

@Kodiologist
Copy link
Member

Kodiologist commented Apr 6, 2018

#1575 does at least make the copying step that we currently depend on more obvious.

@Kodiologist
Copy link
Member

A thought: traditionally, Lisp expressions are built out of cons cells, which are mutable (they're just plain old linked lists). I can't remember ever getting into trouble in Emacs or Common Lisp by mutating a macro argument that I shouldn't have. Does Hy have some other feature that makes mutable models a problem, or have I simply not used other Lisps enough to be bitten by mutability?

@gilch
Copy link
Member Author

gilch commented Aug 7, 2018

Clojure's core data structures are very immutable (not Java's though). That includes its linked list implementation used for code, so you certainly don't need mutability.

I know that mutability of cons cells can bite you in Common Lisp, though in practice, it doesn't happen often because you usually don't mutate them, even in macros.

With real linked lists, you can attach multiple heads pointing to the same shared tail list (using cons) without causing problems, as long as you don't mutate the tail. (Which you can't do anyway in Clojure.) Mutation is sometimes done as a kind of performance optimization, but you have to be very careful not to mutate a shared tail.

Python's array-based lists can't do that kind of sharing, and they have a lot of mutating methods that are so tempting to use in macros.

And worse, our compiler mutates the code as it compiles it, which is very weird if your macro points to the same subexpression twice. Other Lisps don't do that.

@Kodiologist
Copy link
Member

Thanks, that's a good description.

@Linden6
Copy link

Linden6 commented Apr 4, 2019

I'm completely new to Hy, but looking into using it for a project - I love the idea of coding with the range of packages available in Python, but with the elegance of lisp.

Before taking the plunge I'm trying to understand potential pitfalls with using Hy. Apart from the issue of breaking changes during this beta development stage, it appears that there are two critical issues as outlined by Gilch in #1649 : order of evaluation and bugs relating to mutable models. Although it appears to me that if models became immutable then evaluation order would no longer be a problem.

Although it has been flagged up that bugs relating to mutable models arise in macros, it appears that they arise in simple functions as well. This function appears to suffer from both bugs relating to order evaluation and mutable models:

(defn bug-func [a]
  [a (do (.append a 1) a)])

(bug-func [0 0])
Result: [[0, 0, 1], [0, 0, 1]]
Anticipated result: [[0, 0], [0, 0, 1]]

Does this illustrate the nub of the problem, or is there a more significant issue that I'm overlooking?

If this is the problem, it seems to me that rather than attempting to replace mutable lists with tuples in HyExpressions - which sounds like it would be a lot of work - there are two other options:

  1. explicitly exclude in-place list methods (such as .append, .pop, .insert, .remove etc.) from Hy.
  2. detect in-place/destructive methods during parsing and take remedial action, by - for example - copying incoming data structures, and using these for evaluation.

It appears that what makes a data structure immutable isn't anything intrinsic to it, but just an absence of mutable methods. Remove the methods and the structure become immutable. And, because lisp is such a versatile language non-destructive replacement methods can easily be introduced.

Side-effects, such as destructive changes to objects, appear to be frowned upon in functional programming. So taking sanctions against in-place methods - if only initially raising warnings against 'undefined behaviour' - seem entirely justified and not a sign of a deficiency or defect in a lisp dialect such as Hy.

Let me know if any of this makes sense!

@Kodiologist
Copy link
Member

Kodiologist commented Apr 4, 2019

[0 0], the value you're passing to bug-func, is a plain Python list, not a Hy model object. The mutability of Python lists is a matter of Python semantics which Hy doesn't touch. The immutable equivalent of Python lists are tuples, like (, 0 0). It has always been my intention to allow Hy users to write imperative code if they want to. Python is multi-paradigm, and so is Hy.

Order of evaluation also wouldn't affect what [a (do (.append a 1) a)] returns, because this expression includes a in both cases by reference, not by value. Since the first and second object in the list are in fact the same object, they must have the same value at all times.

The moral of the story is, understanding Python semantics is necessary to understand Hy semantics, because Hy's semantics are mostly just Python's.

@Linden6
Copy link

Linden6 commented Apr 4, 2019

Thanks, and apologies for my ignorance.

I'm still trying to get my head around how potential bugs might arise from the mutability of data structures used in expressions - in particular this #1542 (comment) from Gilch:
'And worse, our compiler mutates the code as it compiles it, which is very weird if your macro points to the same subexpression twice'.

Here I've put together a very contrived example, where two identical subexpressions before and after an in-place method evaluate differently:

(setv a [0 0])
(setv b 1)
(defmacro bug-macro [c d] `(do (setv before (sum ~c))
                                  (.append ~c ~d) 
                                  (setv after (sum ~c))
                                  [before after]))

(setv e (bug-macro a b))
e
Result: [0, 1]

The first subexpression '(sum ~c)' evaluates as 0, and the second '(sum ~c)' evaluates as 1.

Is this is the kind of bug causing mutation that Gilch is mentioning? Or, am I barking up the wrong tree again? Thanks!

@Kodiologist
Copy link
Member

That code looks correct to me. The macro call expands to:

  (do (setv before (sum a))
      (.append a b) 
      (setv after (sum a))
      [before after])

Here it should be clear that before should be set to the sum of a before the append, while after should be set to the sum of a after the append. Hence, [0 1].

Our compiler code used to explicitly mutate models a lot, but this has been reduced by various changes, especially #1593. I couldn't readily point you to a a use of model mutation in the compiler as it is now. Maybe we actually got rid of it all. But chances are there's a few cases left that will only be revealed once we try to make models immutable.

An example of how you could mutate a model in a macro is if you take your bug-func example and say (defmacro bug-func ...) instead of (defn bug-func ...). Because it's a macro, bug-func will receive the model object (HyList [(HyInteger 0) (HyInteger 0)]), which represents the code [0 0], instead of the plain Python list that results from evaluating that code. As for how mutating a model in a macro could lead to an unexpected bug, I have to admit that it's not clear to me how that would happen.

@Kodiologist
Copy link
Member

Kodiologist commented Apr 4, 2019

Maybe a clearer example of model mutation is

(defmacro m [x]
  (setv (get x -1) 4)
  x)
(print (m (* 2 3)))

which replaces the 3 with a 4 and prints 8.

@Linden6
Copy link

Linden6 commented Apr 4, 2019

Many thanks.

I still fail to understand why mutable models are viewed by some as a 'critical issue' #1649 (comment) requiring a potential change from list-based to tuple-based HyExpressions. I can see that making models immutable would prevent unanticipated bug-causing mutations - in the same way that say 'defvar' prevents accidental alterations once set. But, is this really such a problem? And the flipside is that mutable models might potentially offer greater code efficiency and flexibility.

What am I missing? If there is a compelling reason then I'd like to understand it so that I can to some extent anticipate future changes and mitigate against them - such as avoiding using exclusively list-based methods in new code.

@Kodiologist
Copy link
Member

Kodiologist commented Apr 4, 2019

@gilch may be able to give his take on it, but I wouldn't say this is a critical issue. It's just something that could potentially lead to surprising problems at some point, so it would be nice to have out of the way. The only thing you have to do as a Hy user is to avoid mutating models explicitly, which probably isn't something you'd do anyway. Macros are typically written with syntax-quote (the backtick operator), which lends itself to a non-imperative style of constructing new forms out of old ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants