Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More resizing for truncating return values from LAPACK #1190

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

BioTurboNick
Copy link
Contributor

Found additional locations that could benefit from the changes in #1176 , but I haven't tested these as yet.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 3, 2025

More to come. These two don't look all that much like an improvement... I wonder if it's because the wrapping to vec and then back to a matrix isn't that helpful. Can't create it as a vector because it's what's provided as input to the function...

EDIT 2/12/25: Determined I wasn't using the right inputs to test the new path. Using a 100x110 array to test, which triggers the truncation path:

using Random
Random.seed!(100)
A,tau = LAPACK.geqrf!(rand(elty,100,110))
function f(A, tau)
    for i = 1:100
        LAPACK.orgqr!(A,tau)
    end
end
@benchmark f(A, tau) seconds = 60

Random.seed!(100)
A,tau = LAPACK.geqlf!(rand(elty,100,110))
function f(A, tau)
    for i = 1:100
        LAPACK.orgql!(A,tau)
    end
end
@benchmark f(A, tau) seconds = 60

orgqr! Before:
image
After:
image

orgql! Before:
image
After:
image

(more to come)

@dkarrasch
Copy link
Member

Test failures may be real?

@ViralBShah
Copy link
Member

@BioTurboNick Looks like a nice PR but seems to break some tests.

@BioTurboNick
Copy link
Contributor Author

Thanks! I'm mystified why it would work on linux-i686 and macos-aarch64 but not linux-x86_64 or windows-x86_64, though it's suggestive.

Linux segfaults and isn't helpful in figuring out the cause, but Windows provides a ReadOnlyMemoryError that is converted to a Julia exception, with a nice stack trace.

The proximate cause is in reshape(resize!(Z, ldz * m[]), ldz, m[]) at the end of stegr!. Nearest I can figure is that m[] can sometimes be larger than the second dimension of Z or of the length of w, which would explain why the conditional was needed in the original code. Let's see if that fixes it...

But still, that should only produce garbage values in the extra part, not a segfault? And why only on x86_64?

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 70.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (443aa0f) to head (4835a0f).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
src/lapack.jl 70.00% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1190      +/-   ##
==========================================
- Coverage   91.89%   91.86%   -0.03%     
==========================================
  Files          34       34              
  Lines       15360    15373      +13     
==========================================
+ Hits        14115    14123       +8     
- Misses       1245     1250       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 12, 2025

Oy. Either the debug code caused it to go away, or the change in nightly from DEV.33 to DEV.38 did it. EDIT: Nope, not fixed by the newer nightlies. It's a Heisenbug.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Feb 12, 2025

Finally, a minimal reproducer:

a = SymTridiagonal(rand(12), rand(11)) # SymTridiagonal([1.0129540363709792, 0.5125544740879742, 1.4404024510136622, 0.9645500736079791, 0.8349894041911953, 1.0404785847552251, 0.9418430974856163, 0.1401583600348122, 0.38426770335791244, 0.4551768643291618, 0.4200328536585094, 1.8593998829530265e-7], [0.40739613651353346, 0.10643163002740841, 0.6222058356713083, 0.33352809216796453, 0.3799591152214755, 0.21073828255014115, 0.8158852900425236, 0.1683998539581063, 0.5002702262460241, 0.6123852845929411, 0.7701699043551441])"
A = copy(a); LAPACK.stegr!('V', 'A', A.dv, A.ev, 0.0, 0.0, 0, 0)

It seems that this is a bug in Julia (1.11 and nightly)? Seems to be producing undefined behavior. With various configurations of the end code I get a Julia crash with EXCEPTION_ACCESS_VIOLATION, or a Julia-handled ReadOnlyMemoryError(). And this is happening when the size actually isn't changing.

It seems some necessary ingredients are 1) initiating as a vector*; 2) passing the vector into the ccall; 3) reshaping the vector into a matrix (resize call not necessary, though can independently trigger a similar error).

If the array is initiated as a matrix, and then later reshaped with reshape(resize!(vec(Z), ldz * Zm), ldz, Zm), there's no problem at all.

And the issue goes away entirely if the offending line is wrapped in try/catch.

*As far as C is concerned, an array is just a pointer to the start of a chunk of memory, so should make no difference on the C side of things?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants