core: further vectorize copyTo with mask #27145

Merged: asmorkalov merged 7 commits into opencv:4.x from fengyuentau:4x/core/copyMask-simd on Apr 7, 2025
fengyuentau (Member) commented Mar 25, 2025

Merge with opencv/opencv_extra#1247.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

fengyuentau (Member, Author) commented Mar 25, 2025

Updated: dropped 32SC2 kernel.
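
The type names in the benchmark rows encode a bit depth, a depth kind (U, S or F) and a channel count. For example, 32SC2 is a 32-bit signed integer type with two channels (8 bytes per pixel), which is the layout of cv::Vec2i. The helper below is a hypothetical illustration of that naming scheme, not an OpenCV API, and it handles only the `<bits><U|S|F>C<channels>` form seen in this thread:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OpenCV): bytes per pixel for a type
// name such as "8UC3" or "32SC2", computed as (depth bits / 8) * channels.
inline std::size_t elemSizeFromTypeName(const std::string& name)
{
    std::size_t pos = 0;

    // Leading digits give the bit depth of one channel.
    std::size_t bits = 0;
    while (pos < name.size() && std::isdigit(static_cast<unsigned char>(name[pos])))
        bits = bits * 10 + static_cast<std::size_t>(name[pos++] - '0');
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);

    // One letter for the depth kind: U (unsigned), S (signed) or F (float).
    assert(pos < name.size() &&
           (name[pos] == 'U' || name[pos] == 'S' || name[pos] == 'F'));
    ++pos;

    // The channel marker 'C' followed by the channel count.
    assert(pos < name.size() && name[pos] == 'C');
    ++pos;
    std::size_t channels = 0;
    while (pos < name.size() && std::isdigit(static_cast<unsigned char>(name[pos])))
        channels = channels * 10 + static_cast<std::size_t>(name[pos++] - '0');
    assert(channels >= 1);

    return (bits / 8) * channels;
}
```

With this helper, `elemSizeFromTypeName("32SC2")` evaluates to 8 and `elemSizeFromTypeName("8UC3")` to 3.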

Performance results:

i7 12700K, GCC 12.3.0

Name of Test                                         copyMask-base copyMask-patch x-factor (copyMask-patch vs copyMask-base)
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC1)         0.000         0.000           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC1)        0.000         0.000           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC1)        0.001         0.001           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC2)         0.000         0.000           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC2)        0.002         0.001           1.46
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC3)         0.001         0.001           0.97
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC3)        0.001         0.001           1.00
Mat_CopyToWithMask::Size_MatType::(127x61, 32FC4)        0.002         0.002           1.00
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC1)      0.102         0.103           0.99
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC1)     0.163         0.163           1.00
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32SC1)     0.339         0.340           1.00
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC2)      0.181         0.165           1.10
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC3)      0.237         0.240           0.99
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC3)     0.672         0.686           0.98
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32FC4)     3.529         3.527           1.00

M2, Apple Clang 16.0.0

Name of Test                                         copyMask-base copyMask-patch x-factor (copyMask-patch vs copyMask-base)
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC1)         0.000         0.000           0.98
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC1)        0.001         0.001           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC1)        0.003         0.001           3.04
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC2)         0.001         0.001           1.04
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC2)        0.003         0.002           1.77
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC3)         0.003         0.001           3.98
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC3)        0.003         0.002           2.08
Mat_CopyToWithMask::Size_MatType::(127x61, 32FC4)        0.003         0.003           0.99
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC1)      0.119         0.116           1.03
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC1)     0.151         0.149           1.01
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32SC1)     0.797         0.270           2.95
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC2)      0.156         0.155           1.01
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC3)      0.916         0.226           4.05
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC3)     0.974         0.464           2.10
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32FC4)     1.139         1.138           1.00

Orin, GCC 11.4.0

Name of Test                                         copyMask-base copyMask-patch x-factor (copyMask-patch vs copyMask-base)
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC1)         0.001         0.001           0.98
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC1)        0.001         0.001           0.98
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC1)        0.006         0.003           2.06
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC2)         0.001         0.001           0.99
Mat_CopyToWithMask::Size_MatType::(127x61, 32SC2)        0.006         0.006           0.98
Mat_CopyToWithMask::Size_MatType::(127x61, 8UC3)         0.010         0.004           2.76
Mat_CopyToWithMask::Size_MatType::(127x61, 16UC3)        0.010         0.007           1.39
Mat_CopyToWithMask::Size_MatType::(127x61, 32FC4)        0.010         0.010           1.05
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC1)      0.345         0.329           1.05
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC1)     0.783         0.776           1.01
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32SC1)     2.018         1.769           1.14
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC2)      0.779         0.791           0.98
Mat_CopyToWithMask::Size_MatType::(1920x1080, 8UC3)      2.820         1.261           2.24
Mat_CopyToWithMask::Size_MatType::(1920x1080, 16UC3)     3.233         2.734           1.18
Mat_CopyToWithMask::Size_MatType::(1920x1080, 32FC4)     6.006         6.000           1.00
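
The x-factor column is consistent with the baseline time divided by the patched time, so values above 1.00 mean the patch is faster. A minimal sketch of that arithmetic follows; the function name is hypothetical, and because the printed times are rounded to three decimals, recomputing from them only approximates the printed x-factor (most visibly for the 127x61 rows):

```cpp
#include <cassert>

// Hypothetical helper: speedup of the patched build over the baseline.
// A result above 1.0 means the patched build took less time.
inline double xFactor(double base_time, double patch_time)
{
    assert(patch_time > 0.0);  // a zero time cannot be used as a divisor
    return base_time / patch_time;
}
```

For the M2 row (1920x1080, 8UC3), `xFactor(0.916, 0.226)` is about 4.05, and for the Orin row (1920x1080, 8UC3), `xFactor(2.820, 1.261)` is about 2.24, matching the tables.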

asmorkalov self-assigned this Mar 25, 2025

fengyuentau (Member, Author) commented

Decided to drop the copyMask_<Vec2i> kernel (the 32SC2 case), as its performance is not consistent across platforms.
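
For readers unfamiliar with the operation, a masked copy writes src[i] to dst[i] only where mask[i] is non-zero and leaves dst untouched elsewhere. The sketch below is an assumption-laden illustration, not the code from this PR: `Int2` is a hypothetical stand-in for a two-channel 32-bit pixel, `maskedCopyScalar` is a plain reference loop, and `maskedCopySelect32` shows the branch-free select pattern that a SIMD kernel would typically apply to whole vectors at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a two-channel 32-bit integer pixel.
struct Int2 { std::int32_t x, y; };

// Scalar reference: copy src[i] into dst[i] wherever mask[i] is non-zero.
template <typename T>
void maskedCopyScalar(const T* src, const std::uint8_t* mask, T* dst, std::size_t n)
{
    assert(src != nullptr && mask != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        if (mask[i])
            dst[i] = src[i];
}

// Branch-free variant for 32-bit lanes: widen each mask byte to an
// all-ones or all-zeros word, then blend src and dst with bitwise ops.
inline void maskedCopySelect32(const std::uint32_t* src, const std::uint8_t* mask,
                               std::uint32_t* dst, std::size_t n)
{
    assert(src != nullptr && mask != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::uint32_t m = mask[i] ? 0xFFFFFFFFu : 0u;
        dst[i] = (src[i] & m) | (dst[i] & ~m);
    }
}
```

Both functions produce the same result for 32-bit data; whether the select form is faster depends on the compiler and target, which is in line with the platform-dependent results reported in this thread.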

fengyuentau changed the title from "core: vectorize copyTo with mask" to "core: further vectorize copyTo with mask" Mar 28, 2025
asmorkalov merged commit 1b3db54 into opencv:4.x Apr 7, 2025
26 of 28 checks passed
fengyuentau deleted the 4x/core/copyMask-simd branch April 22, 2025 08:05
asmorkalov mentioned this pull request Apr 29, 2025