Single-threaded performance improvements in forward DWT for 5-3 and 9-7 (and other improvements) #1253
Commits on May 20, 2020
Add multithreading support in the T1 (entropy phase) encoder
- API-wise, opj_codec_set_threads() can be used on the encoding side
- opj_compress has a -threads switch similar to opj_uncompress
(commit 97eb7e0)
Add multithreaded support in the DWT encoder.
Update the bench_dwt utility to have a -decode/-encode switch.

Measured performance gains for the DWT encoder on an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper-threaded).

Encoding time:
$ ./bin/bench_dwt -encode -num_threads 1
time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
$ ./bin/bench_dwt -encode -num_threads 2
time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
$ ./bin/bench_dwt -encode -num_threads 4
time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
$ ./bin/bench_dwt -encode -num_threads 8
time for dwt_encode: total = 30.024 s, wallclock = 4.064 s

Scaling is probably limited by memory access patterns, which make memory access the bottleneck. The slightly worse results with threads==8 than with threads==4 are due to hyperthreading not being appropriate here.
(commit 07d1f77)
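The wallclock figures above can be turned into speedup and parallel efficiency to make the scaling limit easier to see. The following is a minimal sketch; the helper names `speedup` and `efficiency` are hypothetical and are not part of bench_dwt or OpenJPEG.

```c
#include <assert.h>

/* Speedup of an N-thread run relative to the single-threaded wallclock time.
 * Hypothetical helper, for illustration only. */
static double speedup(double wall_1thread, double wall_n)
{
    assert(wall_1thread > 0.0 && wall_n > 0.0);
    return wall_1thread / wall_n;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
static double efficiency(double wall_1thread, double wall_n, int threads)
{
    assert(threads > 0);
    return speedup(wall_1thread, wall_n) / (double)threads;
}
```

With the numbers reported in the commit message, the speedup is roughly 1.7x with 2 threads, 2.5x with 4 threads and 2.1x with 8 threads, so efficiency drops from about 85% at 2 threads to about 63% at 4 threads.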
dwt.c: change sign of constants to match standard and compensate (no functional change)
(commit 99107d5)
(commit 00cff6f)
(commit 3d35d0f)
(commit c2b9d09)
Testing: revise testing of lossy encoding by comparing PEAK and MSE with original image
(commit fe4c15f)
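To illustrate the two metrics named in this commit title, here is a minimal sketch that computes the mean square error and the peak (largest absolute) error between an original and a decoded sample buffer. The `error_stats` type and `compare_samples` function are hypothetical names; the test suite's actual comparison tool may differ in details such as per-component handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: not taken from the OpenJPEG test tools. */
typedef struct {
    double mse;      /* mean of squared sample differences */
    long long peak;  /* largest absolute sample difference */
} error_stats;

/* Compare n samples of the original image with the decoded image. */
static error_stats compare_samples(const int *orig, const int *decoded, size_t n)
{
    error_stats st = {0.0, 0};
    double sum_sq = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        /* Widen before subtracting so extreme int values cannot overflow. */
        long long diff = (long long)orig[i] - (long long)decoded[i];
        long long abs_diff = diff < 0 ? -diff : diff;

        sum_sq += (double)diff * (double)diff;
        if (abs_diff > st.peak) {
            st.peak = abs_diff;
        }
    }
    st.mse = sum_sq / (double)n;
    return st;
}
```

A lossy-encoding test can then pass when both values stay under chosen thresholds, instead of requiring a bit-exact match against a reference output.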
(commit c6a413a)
opj_j2k_setup_encoder(): add validation of tile width and height to avoid potential division by zero
(commit 4ab2ed0)
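The kind of check described in this commit title can be sketched as follows. This is not the code of opj_j2k_setup_encoder(): the function names are hypothetical, and tile grid origin offsets are ignored for brevity. The point is only that the tile dimensions are rejected before they are used as divisors.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for unsigned values; the caller guarantees b != 0. */
static uint32_t ceil_div_u32(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return (uint32_t)(((uint64_t)a + b - 1) / b);
}

/* Hypothetical helper: returns 1 and fills *tiles_x and *tiles_y on success,
 * or returns 0 when a tile dimension is zero. */
static int count_tiles(uint32_t image_w, uint32_t image_h,
                       uint32_t tile_w, uint32_t tile_h,
                       uint32_t *tiles_x, uint32_t *tiles_y)
{
    if (tile_w == 0 || tile_h == 0) {
        return 0; /* reject before any division happens */
    }
    *tiles_x = ceil_div_u32(image_w, tile_w);
    *tiles_y = ceil_div_u32(image_h, tile_h);
    return 1;
}
```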
(commit e46e300)
Irreversible decoding: align code more closely to the standard by avoiding messing with the stepsize (no functional change)
(commit f38c069)
Irreversible compression/decompression DWT: use 1/K constant as per standard
The previous constant opj_c13318 was mysteriously equal to 2/K, and in the DWT, we had to divide K and opj_c13318 by 2... The issue was that the band->stepsize computation in tcd.c didn't take into account the log2gain of the band.
The effect of this change is expected to be mostly equivalent to the previous situation, except for some differences in rounding. But it leads to a dramatic reduction of the mean square error and peak error in the irreversible encoding of issue141.tif!
(commit 3cd1305)
Irreversible decoding: partially revert previous commit, to fix failures in test suite
(commit adccbc8)
(commit 0c09062)
Commits on May 21, 2020
(commit 47943da)
Remove useless + 5U margin in opj_dwt_decode_tile_97()
Neither code analysis nor the test suite shows that this margin is needed. It dates back to commit dbeebe7, where vector 9x7 decoding was introduced.
(commit 272b3e0)
Speed-up 9x7 IDWT by ~30% with OPJ_NUM_THREADS=2
"bench_dwt -I" time goes from 2.2s to 1.5s
(commit 45a3522)
Commits on May 22, 2020
Forward DWT: small code refactoring to allow future improvements for the horizontal pass
(commit bd5f5ee)
Forward DWT 5x3: performance improvements in horizontal pass, and modest in vertical pass
(commit 97b384a)
(commit 33d3d0d)
Forward DWT: small code refactoring to allow future improvements for the vertical pass
(commit e69fa09)
Forward DWT 5-3: major speed up by vectorizing vertical pass
`bench_dwt -encode` time goes from 7.9s to 1.7s
(commit a38e970)
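The idea behind vectorizing the vertical pass can be sketched in plain C: instead of filtering one column at a time, each lifting step is applied to a whole row, so the inner loop walks contiguous memory and several columns can be handled per SIMD instruction. The sketch below is not the code from this commit. It assumes a row-major `int32_t` tile with an even height, leaves the result interleaved (even rows low-pass, odd rows high-pass), and relies on arithmetic right shift for negative values, which mainstream compilers provide. The function name `fwd53_vertical` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place forward 5-3 lifting along the vertical axis of a row-major tile.
 * Hypothetical sketch: even rows end up holding low-pass samples and odd
 * rows high-pass samples; no deinterleaving is performed. */
static void fwd53_vertical(int32_t *data, size_t w, size_t h)
{
    size_t r, c;

    assert(h >= 2 && h % 2 == 0);

    /* Predict step: each odd row minus the floored mean of its neighbours. */
    for (r = 1; r < h; r += 2) {
        const int32_t *above = data + (r - 1) * w;
        /* Symmetric extension: past the last row, mirror onto row r - 1. */
        const int32_t *below = (r + 1 < h) ? data + (r + 1) * w : above;
        int32_t *cur = data + r * w;
        for (c = 0; c < w; c++) {       /* contiguous, column-parallel loop */
            cur[c] -= (above[c] + below[c]) >> 1;
        }
    }

    /* Update step: each even row plus a rounded quarter of adjacent details. */
    for (r = 0; r < h; r += 2) {
        const int32_t *below = data + (r + 1) * w;
        /* Symmetric extension: before the first row, mirror onto row 1. */
        const int32_t *above = (r > 0) ? data + (r - 1) * w : below;
        int32_t *cur = data + r * w;
        for (c = 0; c < w; c++) {
            cur[c] += (above[c] + below[c] + 2) >> 2;
        }
    }
}
```

Because every iteration of the inner loops is independent across columns, a compiler may auto-vectorize them, or they can be rewritten with SSE2/AVX2 intrinsics that process a fixed number of columns at once.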
Forward DWT 9-7: major speed up by vectorizing vertical pass
`bench_dwt -I -encode` time goes from 8.6s to 2.1s
(commit 1e931fd)
Commits on May 24, 2020
T1 encoder: speed-up by aggressive inlining and more cache friendly data organization
~9% speed improvement seen on a 10980x10980 uint16 image, T36JTT_20160914T074612_B02.tif: opj_compress time goes from 17.2s to 15.8s
(commit 1c5627e)