Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Looks like replace does unnecessary work #1519

Closed
vodik opened this issue Feb 27, 2018 · 7 comments
Closed

Looks like replace does unnecessary work #1519

vodik opened this issue Feb 27, 2018 · 7 comments

Comments

@vodik
Copy link
Contributor

vodik commented Feb 27, 2018

I think I'm seeing some unnecessary work going on: HyList.replace calls replace_hy_obj, which in turn can wrap_value. But the result of replace_hy_obj is discarded and thus, depending on how a particular expression is constructed, its possible wrap_value is evaluated 3 or 4 times on the same object.

Fixing HyList.replace so it instead self mutates:

    def replace(self, other):
        self[:] = [replace_hy_obj(x, other) for x in self]
        HyObject.replace(self, other)
        return self

Wrapping only happens once now. But this seems to have some consequences with using iterables in unquote. The following expression now does this:

=> (defn numbers [] (yield 1) (yield 2) (yield 3))
=> (macroexpand `(do ~(numbers)))
HyExpression([
  HySymbol('do'),
  HyList([
    HyInteger(1),
    HyInteger(2),
    HyInteger(3)])])

instead of

=> (defn numbers [] (yield 1) (yield 2) (yield 3))
=> (macroexpand `(do ~(numbers)))
HyExpression([
  HySymbol('do'),
  <generator object numbers at 0x7f2d2da38db0>])

(which can't be compiled: TypeError: Don't know how to wrap a <class 'generator'> object to a HyObject)

It also means unquote-splice's compiler logic can be simplified a bit as well.

Are there any other concerns or thoughts on the topic before I dig further into this? I couldn't find any other open issues around this, but maybe I just don't know what to look for?

@gilch
Copy link
Member

gilch commented Feb 27, 2018

Using a mutable data structure for Hy code is a bit dangerous, but convenient in macros. If the compiler has to look at the same code object more than once, but mutates it during the compilation process, that could cause bugs. There's nothing stopping a macro from returning multiple references to the same HyExpression object. The compiler must not break code it hasn't gotten to yet.

Ideally, we'd be using immutable data structures like Clojure does, so we wouldn't have to worry about it. But relying on Pyrsistent or something would add another dependency. And if we used a simple singly-linked list like most other Lisps, we'd lose most of the nice list methods that Python programmers are used to. That means we need to be very careful about mutations in the compiler, and I'm already not certain if we've tested this well enough.

@vodik
Copy link
Contributor Author

vodik commented Feb 27, 2018

I think making replace return a new object is likely going to work well enough. We can be a bit more aggressive with __slots__, and maybe also consider backing HyList with tuple instead of list - that should also be a slight performance improvement.

You shouldn't see any loops in the data structures, so the Python reference counter should be just fine.

The mutation was a quick hack to see what would happen. For all I knew the everything would have outright blown up.

@refi64
Copy link
Contributor

refi64 commented Feb 28, 2018

One thing to note: compile times are pretty slow to begin with, so we'd have to be pretty careful using immutable data structures, and definitely not linked lists.

@vodik
Copy link
Contributor Author

vodik commented Feb 28, 2018

Yeah, but immutable should be faster, not because immutable datastructures are faster, but because using pop to iterate through a list is slow:

In [1]: def consumelist1(x):
   ...:     while x:
   ...:         x.pop(0)
   ...:         

In [2]: def consumelist2(x):
   ...:     for _ in x:
   ...:         pass
   ...:     

In [3]: %timeit consumelist1(list(range(100)))
21.2 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [4]: %timeit consumelist2(list(range(100)))
2.24 µs ± 332 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I suspect making HyExpression an immutable tuple, and dealing with the occasional mutability when needed will perform a lot faster than constantly mutating lists.

Some initial work I've done seems promising like it might improve compile times, but I haven't quite got Hy working fully yet.

Where we need lookahead, we can built it into the iterator instead: https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.py#L135 (we could always borrow the code too instead of adding another dependency)

@Kodiologist
Copy link
Member

Using a mutable data structure for Hy code is a bit dangerous

In fact, it created #1537, and probably a lot of other subtle bugs we haven't found yet.

@gilch
Copy link
Member

gilch commented Mar 21, 2018

using pop to iterate through a list is slow

See #1406.

we'd have to be pretty careful using immutable data structures, and definitely not linked lists.

Performance claims need actual measurements. I don't know where the compiler bottlenecks are. We could try different approaches and profile, but changing the Hy model interface seems like a lot of work because it's a pretty deep change.

Linked lists seem to work fine for other Lisps. But maybe they have some kind of background optimizations for that kind of thing. Something like the Clojure seq abstraction might be an option. We could back it with any iterable for performance and still treat it like an immutable linked list by caching the returned values. hylang/hyrule#32 Hy also has a cons. We'd be able to integrate that better with seq.

But maybe inheriting from tuple is good enough. In macros, you'd usually use the quasiquote syntax anyway. But if you did want to construct code using a listcomp or something, you could use a normal list and still convert it afterwards to the tuple form using the HyExpression constructor. It's a little less convenient, but a lot less error prone than the status quo.

vodik added a commit to vodik/hy that referenced this issue Mar 25, 2018
Calling `replace` can trigger a call to `replace_hy_obj`, which will
try to also wrap values inside a `HyObject`.

However, this means when we try to recusively replace inside a nested
object, like a `HyList`, we should try and capture these new wrapped
objects.

Hy still works if we don't, but it results in the object getting
rewrapped multiple times during a compile.

Closes hylang#1519
vodik added a commit to vodik/hy that referenced this issue Mar 25, 2018
Calling `replace` can trigger a call to `replace_hy_obj`, which will
try to also wrap values inside a `HyObject`.

However, this means when we try to recusively replace inside a nested
object, like a `HyList`, we should try and capture these new wrapped
objects.

Hy still works if we don't, but it results in the object getting
rewrapped multiple times during a compile.

Closes hylang#1519
@Kodiologist
Copy link
Member

Now it looks like we always use the return value of hy.as_model (née wrap_value), and the sequential types like List are immutable anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants