[POC] Adds TH_TENSOR_APPLY2_PARALLEL #395

Open
fmassa wants to merge 1 commit into master

Conversation

@fmassa (Contributor) commented on Oct 4, 2015

Following the discussion in #323, I've tried to add a macro TH_TENSOR_APPLY2_PARALLEL, which uses OpenMP if both tensors are contiguous. For the moment, I haven't set a threshold for deciding whether or not to use OpenMP.

As a proof of concept, I added it to the unary operations implemented by LAB_IMPLEMENT_BASIC_FUNCTION (like abs, tan, etc.).
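
For reference, here is a minimal sketch of the idea (not the macro from this commit; the macro name, helper variable names, and field accesses are assumptions based on the TH tensor layout of that era): when both tensors are contiguous, the strided traversal of TH_TENSOR_APPLY2 collapses into a flat loop that OpenMP can split across threads.

```c
/* Illustrative sketch only -- not the macro added in this commit.
 * Assumes 2015-era TH tensor fields (storage, storageOffset, nDimension,
 * size) and the TH_TENSOR_APPLY2 convention that CODE refers to the
 * current elements through TENSOR##_data pointers. */
#define TH_TENSOR_APPLY2_CONTIG_OMP(TYPE1, TENSOR1, TYPE2, TENSOR2, CODE) \
{ \
  TYPE1 *TENSOR1##_base = TENSOR1->storage->data + TENSOR1->storageOffset; \
  TYPE2 *TENSOR2##_base = TENSOR2->storage->data + TENSOR2->storageOffset; \
  long TH_omp_size = 1; \
  long TH_omp_iter; \
  int TH_omp_dim; \
  for (TH_omp_dim = 0; TH_omp_dim < TENSOR1->nDimension; TH_omp_dim++) \
    TH_omp_size *= TENSOR1->size[TH_omp_dim]; \
  _Pragma("omp parallel for") \
  for (TH_omp_iter = 0; TH_omp_iter < TH_omp_size; TH_omp_iter++) \
  { \
    TYPE1 *TENSOR1##_data = TENSOR1##_base + TH_omp_iter; \
    TYPE2 *TENSOR2##_data = TENSOR2##_base + TH_omp_iter; \
    CODE \
  } \
}
```

A function generated by LAB_IMPLEMENT_BASIC_FUNCTION could then pass the same kind of body it already gives TH_TENSOR_APPLY2 (roughly *r__data = CFUNC(*t_data);), with a contiguity check deciding which macro to use.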

Any thoughts?
cc @dominikgrewe

@fmassa (Contributor, Author) commented on Oct 5, 2015

Just a quick heads-up on this PR.
A simple benchmark of the log function on a 2-core/4-thread machine gave a speedup of 2.5x over the non-threaded version, for a contiguous double vector of 2e8 elements.
This could be especially useful for pointwise operations in nn, since, as of now, the parallelism in some of the transfer functions has been removed (torch/nn#403).
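
For anyone who wants to reproduce a ballpark number, here is a standalone sketch of that kind of measurement (plain C with OpenMP, independent of TH; the absolute timings and the speedup will depend on the machine and compiler flags, e.g. gcc -O2 -fopenmp bench_log.c -lm).

```c
/* Standalone benchmark sketch, independent of TH: elementwise log over
 * 2e8 doubles, serial vs. "#pragma omp parallel for".
 * Note: the two buffers need roughly 3.2 GB of memory. */
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  const long n = 200000000L;                 /* 2e8 elements */
  double *src = malloc(n * sizeof(double));
  double *dst = malloc(n * sizeof(double));
  double t0, t_serial, t_parallel;
  long i;

  if (!src || !dst)
    return 1;
  for (i = 0; i < n; i++)
    src[i] = 1.0 + (double)(i % 1000);

  t0 = omp_get_wtime();
  for (i = 0; i < n; i++)
    dst[i] = log(src[i]);
  t_serial = omp_get_wtime() - t0;

  t0 = omp_get_wtime();
#pragma omp parallel for
  for (i = 0; i < n; i++)
    dst[i] = log(src[i]);
  t_parallel = omp_get_wtime() - t0;

  printf("serial %.3fs, parallel %.3fs, speedup %.2fx\n",
         t_serial, t_parallel, t_serial / t_parallel);
  free(src);
  free(dst);
  return 0;
}
```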

@dominikgrewe (Member) commented on the diff, on the closing lines of the new macro (the trailing break; and closing braces):

Feels like there's a lot of code duplication here. Can't we just call THTensor_isContiguous? Same for counting the number of elements. We'd need the tensor type as a parameter to the macro (in addition to the value type), but that can be easily added.
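
A rough illustration of what that refactoring could look like (the macro name and helper variable names are assumptions, not the final code): with the tensor type passed in, the contiguity and size checks can reuse the existing TH functions, and the non-contiguous case can simply defer to TH_TENSOR_APPLY2.

```c
/* Rough illustration of the suggestion above, not the final code.
 * TENSORTYPE would be e.g. THDoubleTensor, so TENSORTYPE##_isContiguous,
 * TENSORTYPE##_nElement and TENSORTYPE##_data expand to the existing
 * THDoubleTensor_isContiguous / _nElement / _data functions. */
#define TH_TENSOR_APPLY2_PARALLEL_SKETCH(TENSORTYPE, TYPE1, TENSOR1, TYPE2, TENSOR2, CODE) \
{ \
  if (TENSORTYPE##_isContiguous(TENSOR1) && TENSORTYPE##_isContiguous(TENSOR2) && \
      TENSORTYPE##_nElement(TENSOR1) == TENSORTYPE##_nElement(TENSOR2)) \
  { \
    TYPE1 *TENSOR1##_base = TENSORTYPE##_data(TENSOR1); \
    TYPE2 *TENSOR2##_base = TENSORTYPE##_data(TENSOR2); \
    long TH_omp_size = TENSORTYPE##_nElement(TENSOR1); \
    long TH_omp_iter; \
    _Pragma("omp parallel for") \
    for (TH_omp_iter = 0; TH_omp_iter < TH_omp_size; TH_omp_iter++) \
    { \
      TYPE1 *TENSOR1##_data = TENSOR1##_base + TH_omp_iter; \
      TYPE2 *TENSOR2##_data = TENSOR2##_base + TH_omp_iter; \
      CODE \
    } \
  } \
  else \
  { \
    TH_TENSOR_APPLY2(TYPE1, TENSOR1, TYPE2, TENSOR2, CODE) \
  } \
}
```

Whether the extra tensor-type argument is acceptable is essentially the interface question discussed further down in the thread.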

@dominikgrewe (Member) commented

Sorry for the delay in getting back. It looks good to me in principle, but we should try to reduce the amount of code duplication.

@fmassa (Contributor, Author) commented on Oct 23, 2015

Hi Dominik,
Thanks for the comments.
I agree that there is a lot of code duplication; it actually comes from the original TH_TENSOR_APPLY2 macro.
Since I wanted to make this apply function as similar as possible to the non-OpenMP one, I copied the TH_TENSOR_APPLY2 macro and added a few lines to make it work with OpenMP; the main difference is here.
When I found I needed an isContiguous check, I realised it was already computed in the original macro (though not explicitly set up, see here for example), so I just added the isContiguous variable in this part.

I'll replace those repetitive parts with their respective function calls.

@fmassa (Contributor, Author) commented on Nov 16, 2015

Hi @dominikgrewe,

I was thinking about this again, and I'm not sure we would want an interface different from the one in TH_TENSOR_APPLY2.
My initial goal with this PR was to modify TH_TENSOR_APPLY2 itself, but then I realised that there are some use cases where this parallelisation wouldn't work (because of reductions or breaks in the middle of the for loop; two such patterns are sketched below). I thus decided to create a new macro (following your idea in #323), while keeping the interface the same so that TH_TENSOR_APPLY2 can easily be replaced by the new one where applicable (including in nn).
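
To make that constraint concrete, here are two loop patterns (illustrative C, not actual TH code) that occur behind TH_TENSOR_APPLY2 call sites and that a naive #pragma omp parallel for cannot handle without restructuring:

```c
#include <stddef.h>

/* 1. Early exit: "break" is not permitted inside an OpenMP worksharing
 *    loop, so a traversal that stops at the first mismatch cannot simply
 *    be tagged with the pragma. */
static int all_equal(const double *a, const double *b, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (a[i] != b[i])
      return 0;   /* early exit, i.e. the "break" case */
  return 1;
}

/* 2. Reduction: a shared accumulator races across threads unless it is
 *    declared as an OpenMP reduction(+:sum), which a generic elementwise
 *    macro cannot add on the caller's behalf. */
static double dot(const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  size_t i;
  for (i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```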

I agree, though, that there is a lot of code duplication, which makes the compilation time quite long.

What do you think? @soumith @andresy @koraykv @dominikgrewe
