Single-threaded performance improvements in forward DWT for 5-3 and 9-7 (and other improvements) #1253

Merged

Commits on May 20, 2020

  1. Add multithreading support in the T1 (entropy phase) encoder

    - API wise, opj_codec_set_threads() can be used on the encoding side
    - opj_compress has a -threads switch similar to opj_uncompress
    rouault committed May 20, 2020
    97eb7e0
  2. Add multithreaded support in the DWT encoder.

    Update the bench_dwt utility to have a -decode/-encode switch
    
    Measured performance gains for DWT encoder on an
    Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores, hyper-threaded)
    
    Encoding time:
    $ ./bin/bench_dwt -encode -num_threads 1
    time for dwt_encode: total = 8.348 s, wallclock = 8.352 s
    
    $ ./bin/bench_dwt -encode -num_threads 2
    time for dwt_encode: total = 9.776 s, wallclock = 4.904 s
    
    $ ./bin/bench_dwt -encode -num_threads 4
    time for dwt_encode: total = 13.188 s, wallclock = 3.310 s
    
    $ ./bin/bench_dwt -encode -num_threads 8
    time for dwt_encode: total = 30.024 s, wallclock = 4.064 s
    
    Scaling is probably limited by memory access patterns, which make
    memory access the bottleneck.
    The slightly worse results with threads==8 than with threads==4
    are due to hyperthreading not being appropriate here.
    rouault committed May 20, 2020
    07d1f77
  3. 99107d5
  4. 00cff6f
  5. tcd.c: add comment

    rouault committed May 20, 2020
    3d35d0f
  6. c2b9d09
  7. fe4c15f
  8. c6a413a
  9. opj_j2k_setup_encoder(): add validation of tile width and height to avoid potential division by zero

    rouault committed May 20, 2020
    4ab2ed0
  10. e46e300
  11. Irreversible decoding: align code more closely to the standard by avoiding messing with the stepsize (no functional change)

    rouault committed May 20, 2020
    f38c069
  12. Irreversible compression/decompression DWT: use 1/K constant as per standard
    
    The previous constant opj_c13318 was mysteriously equal to 2/K, and in
    the DWT, we had to divide K and opj_c13318 by 2... The issue was that the
    band->stepsize computation in tcd.c didn't take into account the log2gain of
    the band.
    
    The effect of this change is expected to be mostly equivalent to the previous
    situation, except for some differences in rounding. But it leads to a dramatic
    reduction of the mean square error and peak error in the irreversible encoding
    of issue141.tif!
    rouault committed May 20, 2020
    3cd1305
  13. adccbc8
  14. 0c09062
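The thread-scaling figures quoted in the "Add multithreaded support in the DWT encoder" commit can be turned into speedup and parallel-efficiency numbers. The following is only a small arithmetic sketch: the timings are copied from that commit message, while the `bench_run` type and the `speedup()` and `efficiency()` helpers are hypothetical names introduced for illustration and are not part of OpenJPEG.

```c
#include <stdio.h>

/* Timings (seconds) copied from the bench_dwt -encode runs quoted in the
 * commit message. The struct itself is a hypothetical helper. */
typedef struct {
    int threads;
    double total_s;      /* "total": time summed over all threads */
    double wallclock_s;  /* "wallclock": elapsed time */
} bench_run;

static const bench_run runs[] = {
    {1,  8.348, 8.352},
    {2,  9.776, 4.904},
    {4, 13.188, 3.310},
    {8, 30.024, 4.064},
};

/* Speedup of run r relative to the single-thread baseline. */
static double speedup(const bench_run *base, const bench_run *r)
{
    return base->wallclock_s / r->wallclock_s;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
static double efficiency(const bench_run *base, const bench_run *r)
{
    return speedup(base, r) / r->threads;
}

/* Print one line per run; call this from main() to see the table. */
static void print_scaling(void)
{
    size_t n = sizeof(runs) / sizeof(runs[0]);
    for (size_t i = 0; i < n; i++) {
        printf("threads=%d speedup=%.2f efficiency=%.2f\n",
               runs[i].threads,
               speedup(&runs[0], &runs[i]),
               efficiency(&runs[0], &runs[i]));
    }
}
```

With these figures the speedup is roughly 1.7x with 2 threads, 2.5x with 4 threads and 2.1x with 8 threads, which matches the commit's observation that 8 threads is slower than 4 on this 4-core machine.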

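The relation between the old and new constants in the "use 1/K constant as per standard" commit can be checked numerically. This is a sketch under stated assumptions, not code from dwt.c: the value of K used below (1.230174105) is the 9-7 scaling constant as commonly quoted for JPEG 2000 and is not given in the commit message, and reading the name opj_c13318 as 13318 in 13-bit fixed point (13318/8192) is a guess made here for illustration. The helper names are hypothetical.

```c
#include <stdio.h>

/* ASSUMPTION: K for the irreversible 9-7 filter, as commonly quoted for
 * JPEG 2000. The commit message does not state the value. */
static const double K = 1.230174105;

/* ASSUMPTION: the name opj_c13318 is read here as 13318 / 2^13. */
static double old_constant_guess(void)
{
    return 13318.0 / 8192.0;
}

/* The constant the commit switches to: 1/K. */
static double inv_K(void)
{
    return 1.0 / K;
}

/* The value the commit says the old constant was "mysteriously equal to". */
static double two_over_K(void)
{
    return 2.0 / K;
}

/* Print the three values side by side; call this from main(). */
static void print_constants(void)
{
    printf("1/K         = %.9f\n", inv_K());
    printf("2/K         = %.9f\n", two_over_K());
    printf("13318/8192  = %.9f\n", old_constant_guess());
}
```

Under these assumptions the old constant agrees with 2/K to about four decimal places, so dividing it by 2 in the DWT gives approximately 1/K, which is consistent with the commit's statement that the change is mostly equivalent apart from rounding.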
Commits on May 21, 2020

  1. Speed-up 9x7 IDWT by ~20%

    "bench_dwt -I" time goes from 2.8s to 2.2s
    rouault committed May 21, 2020
    47943da
  2. Remove useless + 5U margin in opj_dwt_decode_tile_97()

    Neither code analysis nor the test suite shows that this margin is
    needed.
    It dates back to commit dbeebe7,
    where vector 9x7 decoding was introduced.
    rouault committed May 21, 2020
    272b3e0
  3. Speed-up 9x7 IDWT by ~30% with OPJ_NUM_THREADS=2

    "bench_dwt -I" time goes from 2.2s to 1.5s
    rouault committed May 21, 2020
    45a3522

Commits on May 22, 2020

  1. bd5f5ee
  2. 97b384a
  3. dwt.c: remove unused typedef

    rouault committed May 22, 2020
    33d3d0d
  4. e69fa09
  5. Forward DWT 5-3: major speed up by vectorizing vertical pass

    `bench_dwt -encode` time goes from 7.9s to 1.7s
    rouault committed May 22, 2020
    a38e970
  6. Forward DWT 9-7: major speed up by vectorizing vertical pass

    `bench_dwt -I -encode` time goes from 8.6s to 2.1s
    rouault committed May 22, 2020
    1e931fd
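The two "vectorizing vertical pass" commits do not show how the vectorization is done. One plausible reading, sketched below, is that the vertical 5-3 lifting steps are applied to whole rows at once so that the inner loop walks contiguous memory across columns, which is a shape compilers and SIMD units handle well. This is an illustrative sketch only: the function names are hypothetical, the result is left interleaved (no split into low-pass and high-pass halves), and the actual dwt.c code may differ, for example by using explicit SIMD intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* Note: >> on negative values is assumed to be an arithmetic shift,
 * as it is on common compilers. */

/* Reference: forward 5-3 vertical lifting, one column at a time.
 * Every access jumps by a full row (stride w), which is cache-unfriendly. */
static void fwd53_vertical_by_column(int32_t *img, int w, int h)
{
    if (h < 2) return;
    for (int c = 0; c < w; c++) {
        for (int i = 1; i < h; i += 2) {           /* predict odd rows */
            int down = (i + 1 < h) ? i + 1 : i - 1; /* mirror at the edge */
            img[(size_t)i * w + c] -=
                (img[(size_t)(i - 1) * w + c] + img[(size_t)down * w + c]) >> 1;
        }
        for (int i = 0; i < h; i += 2) {           /* update even rows */
            int up = (i > 0) ? i - 1 : i + 1;
            int down = (i + 1 < h) ? i + 1 : i - 1;
            img[(size_t)i * w + c] +=
                (img[(size_t)up * w + c] + img[(size_t)down * w + c] + 2) >> 2;
        }
    }
}

/* Same arithmetic, but each lifting step handles a whole row: the inner
 * loop runs over adjacent columns, so loads and stores are contiguous. */
static void fwd53_vertical_by_rows(int32_t *img, int w, int h)
{
    if (h < 2) return;
    for (int i = 1; i < h; i += 2) {               /* predict odd rows */
        int32_t *cur = img + (size_t)i * w;
        const int32_t *up = cur - w;
        const int32_t *down = (i + 1 < h) ? cur + w : up;
        for (int c = 0; c < w; c++) {
            cur[c] -= (up[c] + down[c]) >> 1;
        }
    }
    for (int i = 0; i < h; i += 2) {               /* update even rows */
        int32_t *cur = img + (size_t)i * w;
        const int32_t *up = (i > 0) ? cur - w : cur + w;
        const int32_t *down = (i + 1 < h) ? cur + w : cur - w;
        for (int c = 0; c < w; c++) {
            cur[c] += (up[c] + down[c] + 2) >> 2;
        }
    }
}
```

Both functions produce the same coefficients; only the loop order changes. The second form never strides through memory by `w`, which fits the memory-access bottleneck mentioned in the earlier multithreading commit.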

Commits on May 24, 2020

  1. T1 encoder: speed-up by aggressive inlining and more cache friendly data organization
    
    ~9% speed improvement seen on a 10980x10980 uint16 image, T36JTT_20160914T074612_B02.tif:
    opj_compress time goes from 17.2s to 15.8s
    rouault committed May 24, 2020
    1c5627e