-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
type-stable inner loop for sqrtm #20214
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
abe1129
type-stable inner loop for sqrtm
felixrehren a18f8da
dispatch sqrtm on real-or-not bool
felixrehren a538250
remove call site type assertion
felixrehren 5718a1c
more type stability
felixrehren 60fcb28
casting changes
felixrehren ce830fe
one -> two
felixrehren a7e7ce8
div 2 -> mult 1/2
felixrehren 075fd38
typo
felixrehren 03ef1b4
algebraicness, unwrapping
felixrehren 7b990f1
sylvester for numbers [ci skip]
felixrehren e3cf0a9
move sylvester
felixrehren File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we could replace this with
in terms of the abovementioned method. Given a
Val
method that is type-stable, the only remaining reason to duplicate the code here seems to be to save thesqrt
call for the diagonal elements, but I doubt this is worth the trouble since there are only O(n) such calls.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will look at this tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since
UnitUpperTriangular{T}
is not a subtype ofUpperTriangular{T}
(??), I think it would beFor some reason this is faster than the present method, which confuses me. This version involves two additional conversions and some unnecessary arithmetic operations in the loop (of order O(n) as you mention)! Are some ops on
UnitUpperTriangular
slower than the same op onUpperTriangular
?Will make this change later unless someone can enlighten me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Julia doesn't support inheritance of concrete types.)
Indexing expressions
A[i,j]
are slower forUnitUpperTriangular
than forUpperTriangular
, and both are slower than forArray
, because it has to check ifi <= j
and (in the unit case) whetheri == j
.Since by construction you only access the upper triangle, I would do:
at the beginning of the function (for both the
UnitUpperTriangular
andUpperTriangular
methods), and then useB[i,j]
rather thanA[i,j]
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There might be other methods operating on triangular matrix types that might benefit from a similar change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
@inbounds
means that it never checks wherei
andj
are in1:n
. It still checks whetheri ≤ j
, since ifi > j
then it just returns zero rather than looking atA.data
.Timing such a small operation is tricky; I would normally recommend BenchmarkTools instead of
@time
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It may be that the effect on the
UpperTriangular
case is too small to measure, however. Or maybe even LLVM is smart enough to figure out thati ≤ j
is always true. But it wouldn't hurt to unwrap the type anyway.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After improving the type stability, the specialised implementation for
UnitUpperTriangular
is faster than converting.Now, on 0.5.0 using
@benchmark
ing, I see a slowdown when unwrapping the type first insqrtm
. Relatedly, on my machine (juliabox.com) there appears to be a slowdown for the atomic operations:(what does the
$
do here?)This persists when wrapping these calls in a function. All in all I am hesitant to make the unwrapping change for
sqrtm
(but not against)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you unwrap the type? I hope you didn't do
A = A.data
, which hurts type stability.I can't reproduce your benchmark results. Maybe the benchmarks aren't reliable for such a short operation, and one needs to put a bunch of indexing operations in a loop? (Probably you also want to time them with
@inbounds
). The$
splices the value of the variable directly into the expression, which gets rid of the penalty for benchmarking with a global variable (it means the compiler does not have to do runtime type inference to find the type ofA
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example: