Open
Labels
performance (Must go faster)
Description
Someone on Discourse recently posted about `findmax` on floats being 4x slower than the C version. We do a bunch of special handling that doesn't necessarily match what the fast native instructions do, but I did some poking around and I think we could do much better than we currently are:
- On ARM the `fmax` and `fmin` instructions already do the right thing (propagate NaNs), so we should make sure that we emit the simple code there.
- On x64, it seems like the native instructions actually happen to implement `isless` (all NaNs sort after all non-NaNs), and we should make sure that this is used where appropriate.
  - For `maximum` and `findmax` this is actually already the right order! So we should make sure that we just use the native instruction there.
  - For `minimum` and `findmin` this is not the right order, but we can just negate, take the max, and then negate again at the end and get the right answer.
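To make the negate, max, negate idea concrete, here is a small Python model of the ordering described above. It is only a sketch of the semantics, not Julia's implementation and not a claim about what any particular instruction does: `isless`, `max_isless`, `maximum`, and `minimum` are illustrative helpers written for this example, and the NaN-last, `-0.0`-before-`0.0` ordering is assumed to match Julia's `isless` on floats.

```python
import math


def isless(a: float, b: float) -> bool:
    """Assumed model of an isless-style total order on floats:
    NaN sorts after every non-NaN value, and -0.0 sorts before 0.0."""
    if math.isnan(a):
        return False  # NaN is never less than anything
    if math.isnan(b):
        return True   # every non-NaN value is less than NaN
    if a == b == 0.0:
        # Distinguish signed zeros by their sign bit.
        return math.copysign(1.0, a) < math.copysign(1.0, b)
    return a < b


def max_isless(a: float, b: float) -> float:
    """Two-argument max under the isless order (NaN wins)."""
    return b if isless(a, b) else a


def maximum(xs):
    """Reduce with max_isless; any NaN in xs propagates to the result."""
    acc = xs[0]
    for x in xs[1:]:
        acc = max_isless(acc, x)
    return acc


def minimum(xs):
    """Minimum built only from the max primitive: negate, max, negate."""
    return -maximum([-x for x in xs])


print(maximum([1.0, float("nan"), 3.0]))
print(minimum([1.0, float("nan"), 3.0]))
print(minimum([2.0, -1.0, 5.0]))
print(minimum([0.0, -0.0]))
```

Negation maps NaN to NaN and flips the order of everything else, including the signed zeros, which is why a max that puts NaN last can also serve as a NaN-propagating min in this model.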
Am I missing something here, or could we be doing much better?