Move from 2d to 3d array operations #12
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️
Hey @paulmueller, when you get a chance, please go through this PR. I added docs, which you should look through. The main things left are "QLSI init - fix the instantiation" and updating the changelog.
I only have a few comments. It's not yet waterproof, but it's almost there.
Thanks!
data_3d_prep, _ = convert_data_to_3d_array_layout(data_2d)
data_3d_bg_prep, _ = convert_data_to_3d_array_layout(data_2d_bg)

for fft_interface in fft_interfaces:
The problem here is that you are not taking FFTW wisdom into account (https://en.wikipedia.org/wiki/FFTW). FFTW should be much faster than numpy, because it initially tests several FFTs on the input data shape and then picks the fastest one. The wisdom is forgotten unless you store it locally on disk every time you use pyfftw (#5).
I assume that if you added PyFFTW a second time to the fft_interfaces list, you would get faster results (because the wisdom is already there from the first run).
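To illustrate what storing wisdom locally on disk could look like, here is a minimal sketch. It only handles the file I/O; the helper names `save_wisdom` and `load_wisdom` and the pickle-based file format are assumptions made for illustration and are not part of qpretrieve or PyFFTW. The wisdom itself would come from `pyfftw.export_wisdom()` and be restored with `pyfftw.import_wisdom()`.

```python
import pickle
from pathlib import Path


def save_wisdom(wisdom, path):
    """Pickle a wisdom tuple, e.g. the result of pyfftw.export_wisdom().

    Hypothetical helper: name and file format are illustrative only.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fd:
        pickle.dump(wisdom, fd)


def load_wisdom(path):
    """Return the stored wisdom tuple, or None if nothing was saved yet.

    Hypothetical helper: pass a non-None result to pyfftw.import_wisdom().
    """
    path = Path(path)
    if not path.exists():
        return None
    with path.open("rb") as fd:
        return pickle.load(fd)
```

With PyFFTW installed, a benchmark script could call `load_wisdom(path)` at startup, hand any non-`None` result to `pyfftw.import_wisdom()`, and call `save_wisdom(pyfftw.export_wisdom(), path)` once the transforms have run. Whether this actually speeds up the comparison here would still need to be measured.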
I can add a third one without the initial wisdom
Something is not right here. FFTW with wisdom must always be faster than FFTW without wisdom. Did you measure the time with time.perf_counter()?
It could also be that we are not using PyFFTW correctly in qpretrieve. I always thought wisdom is handled automatically.
Isn't what we are seeing here that the wisdom is calculated during the first loop (batch size 8), and that PyFFTW then uses it for all further calculations?
PyFFTW should compute the wisdom for every batch size individually. For batch sizes 8 and 256 the one with wisdom is slower than the one without wisdom. I would not have expected that.
I think you did not use time.perf_counter in your script. Maybe you can try that. It could explain this, since you are normalizing by batch size.
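For reference, a minimal sketch of per-image timing with `time.perf_counter`. The helper name `time_per_item` and its parameters are hypothetical and not taken from the benchmark script in this PR. The untimed warm-up call is meant to absorb one-off costs such as FFT planning, so they do not leak into the per-image number.

```python
import time


def time_per_item(func, batch, repeats=5, warmup=1):
    """Return the best wall-clock time per batch item, in seconds.

    Hypothetical helper: `func` is any callable that processes `batch`.
    """
    for _ in range(warmup):
        func(batch)  # untimed: absorbs one-off setup such as FFT planning
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(batch)
        best = min(best, time.perf_counter() - start)
    # Normalize by batch size so different batch sizes are comparable
    return best / len(batch)
```

Taking the minimum over several repeats reduces noise from other processes; setting `warmup=0` would instead include the first-call planning cost in the measurement.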
About
We should allow stacks of 2D arrays as inputs. This will likely speed up processing of large datasets, and certainly will when we add Cupy as an FFTFilter (#10). Several things to note:
To do
run_pipeline steps for OAH and QLSI should also work with 3D arrays.
test_fft_comparison_data_input_fmt
get_array_with_input_layout in code reference
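The idea of accepting stacks of 2D arrays can be sketched with plain NumPy. The helpers `as_image_stack` and `fft2_stack` below are hypothetical and are not the qpretrieve implementation; they assume a `(n_images, y, x)` layout and only show how a single image can be promoted to a stack, transformed in one call, and returned in its original layout.

```python
import numpy as np


def as_image_stack(data):
    """Return (stack, was_2d), where stack has shape (n_images, y, x).

    Hypothetical helper: the (n_images, y, x) layout is an assumption.
    """
    data = np.asarray(data)
    if data.ndim == 2:
        # Promote a single image to a stack of length one
        return data[np.newaxis, ...], True
    if data.ndim == 3:
        return data, False
    raise ValueError(f"expected a 2D or 3D array, got {data.ndim}D")


def fft2_stack(data):
    """Apply a 2D FFT to every image, preserving the input layout."""
    stack, was_2d = as_image_stack(data)
    # Transform the last two axes only, so all images are handled in one call
    spectrum = np.fft.fft2(stack, axes=(-2, -1))
    return spectrum[0] if was_2d else spectrum
```

Returning the flag alongside the stack lets the caller restore the input layout at the end of a pipeline, so existing 2D callers would see no change in output shape.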