-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Builtins] Monomorphize 'readKnown' #4307
[Builtins] Monomorphize 'readKnown' #4307
Conversation
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and '916535235' (PR)
|
WAT |
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and '916535235' (PR)
|
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and '916535235' (PR)
|
9165352
to
1db9fbb
Compare
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and '1db9fbbd2' (PR)
|
I'm quite puzzled as to how this branch is so ridiculously (and consistently) faster than the baseline. Below is the comparison of changes in generated Core that I was able to recognize. In
this PR:
In
this PR:
And that's it. Like yeah, this PR clearly compiles better than Anyway, I don't want to spend more time digging through Core, so I'll write some docs, ask for reviews and merge this. |
One possible explanation would be that we somehow end up forcing the call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My best hypothesis is that the second diff you show is the key one. In particular, it stops us from doing anything with the ST state token, except passing it to the continuation. Maybe that matters!
coerceVia _ = coerce | ||
{-# INLINE coerceVia #-} | ||
|
||
coerceArg :: Coercible a b => (a -> r) -> b -> r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you hate eta-expansion so much? :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much do when it comes to performance-critical code. It's pretty common to call coerce
over the entire function instead of just its argument, because that does not create any closures and in some cases makes the whole definition more sharing-friendly. I'll make a comment.
let runtime' = BuiltinRuntime schB (f x) . exF $ toExMemory arg | ||
res <- evalBuiltinApp fun term' env runtime' | ||
returnCek unbudgetedSteps ctx res | ||
TypeSchemeArrow _ schB -> case readKnown (Just argTerm) arg of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be curious to see what happens if you do it this way:
do
x <- liftEither $ readKnown ...
...
That might force us back into ST (if it doesn't get optimized away, which it probably will), which might be enlightening. And if it does get optimized away, it's arguably better style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, thank you.
But actually, I can't think why it would matter. We replace one pattern match (to get the state token) with another one (to take apart the Either and call throwError). So it seems like no win. Maybe it's just that |
1db9fbb
to
cc30b67
Compare
/benchmark |
cc30b67
to
3d1f229
Compare
Comparing benchmark results of 'c8c5183f7' (base) and 'cc30b671b' (PR)
|
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and '2a500e25b' (PR)
|
This reverts commit 2a500e2.
/benchmark |
Comparing benchmark results of 'c8c5183f7' (base) and 'eb1706059' (PR)
|
Oops, @kwxm, sorry, I forgot to ask for your review. Doing that now. I'll address comments in another PR. |
I don't have much to say, except that the performance changes seem rather mysterious. Maybe we should try some detailed profiling on one of the examples with a big speedup to see if anything jumps out. |
Also, looking at the top of
It's a pity we can't tell it which extensions we don't want... |
HLS told me that we don't need |
@thealmarty I'm using it in #4312, which I'm planning to get merged next, so don't bother. |
This gave us a speedup of up to 19% for unclear reasons.
Do not look here yet.