Merged
Conversation
asmorkalov
requested changes
Feb 26, 2025
Comment on lines +52 to +64
/**
 * @brief Matrix multiplication of two float type matrices
 *        R = a*A*B + b*C where A,B,C,R are matrices and a,b are constants
 *        It is optimized for Qualcomm's processors
 * @param src1 First source matrix of type CV_32F
 * @param src2 Second source matrix of type CV_32F with same rows as src1 cols
 * @param dst Resulting matrix of type CV_32F
 * @param alpha multiplying factor for src1 and src2
 * @param src3 Optional third matrix of type CV_32F to be added to matrix product
 * @param beta multiplying factor for src3
 */
CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, OutputArray dst, float alpha = 1.0,
                       InputArray src3 = noArray(), float beta = 0.0);
Contributor
OpenCV HAL has GEMM options:
/**
The function performs generalized matrix multiplication similar to the gemm functions in BLAS level 3:
\f$D = \alpha*AB+\beta*C\f$
@param src1 pointer to input \f$M\times N\f$ matrix \f$A\f$ or \f$A^T\f$ stored in row major order.
@param src1_step number of bytes between two consequent rows of matrix \f$A\f$ or \f$A^T\f$.
@param src2 pointer to input \f$N\times K\f$ matrix \f$B\f$ or \f$B^T\f$ stored in row major order.
@param src2_step number of bytes between two consequent rows of matrix \f$B\f$ or \f$B^T\f$.
@param alpha \f$\alpha\f$ multiplier before \f$AB\f$
@param src3 pointer to input \f$M\times K\f$ matrix \f$C\f$ or \f$C^T\f$ stored in row major order.
@param src3_step number of bytes between two consequent rows of matrix \f$C\f$ or \f$C^T\f$.
@param beta \f$\beta\f$ multiplier before \f$C\f$
@param dst pointer to input \f$M\times K\f$ matrix \f$D\f$ stored in row major order.
@param dst_step number of bytes between two consequent rows of matrix \f$D\f$.
@param m number of rows in matrix \f$A\f$ or \f$A^T\f$, equals to number of rows in matrix \f$D\f$
@param n number of columns in matrix \f$A\f$ or \f$A^T\f$
@param k number of columns in matrix \f$B\f$ or \f$B^T\f$, equals to number of columns in matrix \f$D\f$
@param flags algorithm options (combination of CV_HAL_GEMM_1_T, ...).
*/
//! @addtogroup core_hal_interface_matrix_multiplication Matrix multiplication
//! @{
inline int hal_ni_gemm32f(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64f(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm32fc(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64fc(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
//! @}
I propose implementing these HAL functions rather than adding an extension.
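For illustration only, here is a minimal sketch of what a function with the same parameter list as `hal_ni_gemm32f` could look like. It is a plain reference loop, not the FastCV implementation. The names `sketch_gemm32f`, `sketch_row`, `SKETCH_HAL_OK` and `SKETCH_HAL_NOT_IMPLEMENTED` are hypothetical stand-ins introduced here; a real HAL would use the return codes from the OpenCV HAL headers and would be registered through the HAL replacement mechanism, which is not shown. The sketch assumes `dst` does not alias `src1` or `src2`, and it declines any transposition flags.

```cpp
#include <cstddef>

// Local stand-ins (assumptions) for the HAL return codes.
static const int SKETCH_HAL_OK = 0;
static const int SKETCH_HAL_NOT_IMPLEMENTED = 1;

// Pointer to the start of row i when consecutive rows are 'step' bytes apart.
static inline const float* sketch_row(const float* base, std::size_t step, int i)
{
    return reinterpret_cast<const float*>(
        reinterpret_cast<const unsigned char*>(base) + static_cast<std::size_t>(i) * step);
}

// Hypothetical replacement: D = alpha*A*B + beta*C for row-major float data.
// A is m x n, B is n x k, C (optional, may be null) and D are m x k.
inline int sketch_gemm32f(const float* src1, std::size_t src1_step,
                          const float* src2, std::size_t src2_step,
                          float alpha,
                          const float* src3, std::size_t src3_step, float beta,
                          float* dst, std::size_t dst_step,
                          int m, int n, int k, int flags)
{
    // Transposed operands are not handled in this sketch; decline them so
    // the caller can fall back to its default code path.
    if (flags != 0)
        return SKETCH_HAL_NOT_IMPLEMENTED;

    for (int i = 0; i < m; ++i)
    {
        const float* a = sketch_row(src1, src1_step, i);
        const float* c = src3 ? sketch_row(src3, src3_step, i) : nullptr;
        float* d = const_cast<float*>(sketch_row(dst, dst_step, i));
        for (int j = 0; j < k; ++j)
        {
            float acc = 0.0f;
            for (int p = 0; p < n; ++p)
                acc += a[p] * sketch_row(src2, src2_step, p)[j];
            // Read c[j] before writing d[j], so dst may alias src3.
            d[j] = alpha * acc + (c ? beta * c[j] : 0.0f);
        }
    }
    return SKETCH_HAL_OK;
}
```

Returning a "not implemented" code for unsupported cases is what would let the library keep using its generic path for them, so an optimized backend only needs to cover the configurations it actually accelerates.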
Comment on lines +17 to +38
/**
 * @brief Creates one multi-channel mat out of several single-channel CV_8U mats.
 *        Optimized for Qualcomm's processors
 * @param mv input vector of matrices to be merged; all the matrices in mv must be of CV_8UC1 and have the same size
 *           Note: numbers of mats can be 2,3 or 4.
 * @param dst output array of depth CV_8U and same size as mv[0]; The number of channels
 *            will be the total number of matrices in the matrix array
 */
CV_EXPORTS_W void merge(InputArrayOfArrays mv, OutputArray dst);

//! @}

//! @addtogroup fastcv
//! @{

/**
 * @brief Splits an CV_8U multi-channel mat into several CV_8UC1 mats
 *        Optimized for Qualcomm's processors
 * @param src input 2,3 or 4 channel mat of depth CV_8U
 * @param mv output vector of size src.channels() of CV_8UC1 mats
 */
CV_EXPORTS_W void split(InputArray src, OutputArrayOfArrays mv);
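To make the documented semantics concrete, the following is a minimal reference sketch of merging and splitting 8-bit planes on plain buffers. It is not the FastCV code path and does not use any OpenCV types; `sketch_merge` and `sketch_split` are hypothetical names, and each plane is assumed to be a flat vector of pixels of equal length.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Interleave N single-channel planes into one N-channel buffer:
// dst[p*N + c] = planes[c][p].
inline std::vector<std::uint8_t> sketch_merge(
    const std::vector<std::vector<std::uint8_t>>& planes)
{
    const std::size_t cn = planes.size();
    const std::size_t npix = cn ? planes[0].size() : 0;
    for (const auto& plane : planes)
        if (plane.size() != npix)
            throw std::invalid_argument("all planes must have the same size");

    std::vector<std::uint8_t> dst(cn * npix);
    for (std::size_t p = 0; p < npix; ++p)
        for (std::size_t c = 0; c < cn; ++c)
            dst[p * cn + c] = planes[c][p];
    return dst;
}

// Inverse operation: de-interleave a cn-channel buffer into cn planes.
inline std::vector<std::vector<std::uint8_t>> sketch_split(
    const std::vector<std::uint8_t>& src, std::size_t cn)
{
    if (cn == 0 || src.size() % cn != 0)
        throw std::invalid_argument("buffer size must be a multiple of cn");

    const std::size_t npix = src.size() / cn;
    std::vector<std::vector<std::uint8_t>> planes(
        cn, std::vector<std::uint8_t>(npix));
    for (std::size_t p = 0; p < npix; ++p)
        for (std::size_t c = 0; c < cn; ++c)
            planes[c][p] = src[p * cn + c];
    return planes;
}
```

Splitting the result of a merge with the same channel count returns the original planes, which is a convenient property to check when comparing an optimized backend against a reference.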
Contributor
The same question on HAL.
Contributor
Author
We are still experimenting with optimizations to achieve consistent performance across various targets, so for now we have added these functions as an extension.
asmorkalov
requested changes
Mar 5, 2025
asmorkalov
approved these changes
Mar 19, 2025
Merged
Adding FastCV extensions for merge, split, gemm and arithm APIs add, subtract
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.