cuda : use fast copy when src and dst are of different type and contiguous #16789
Though, does the shape matter here? We already assert that it's the same number of elements... Also, probably should use |
If the tensors are contiguous, did you try just using |
That surely only works when types are equal, which is caught at the top.
Ah, you're right, sorry. Do you need me to click the merge button or do you have the permissions to do it yourself?
I have the power. :)
…guous (ggml-org#16789) * use fast copy when src and dst are contiguous and same shape * use int64_t ne and ignore shape
Before:
After:
Note/Edit: I fudged the permuted tests by making them contiguous (and changing the type) just to verify that different shapes are OK; normally they would not be faster.
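To illustrate why shape can be ignored once both tensors are contiguous, here is a minimal host-side C++ sketch. It is not the code from this PR: `Tensor` and `copy_contiguous` are hypothetical names made up for illustration, and the loop stands in for what a CUDA kernel would presumably do with one thread per element. The only precondition it checks is the one mentioned in the discussion, that both tensors hold the same number of elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor: flat contiguous storage plus a 4-D shape.
template <typename T>
struct Tensor {
    std::vector<T> data; // contiguous storage
    int64_t ne[4];       // number of elements in each dimension

    int64_t nelements() const { return ne[0] * ne[1] * ne[2] * ne[3]; }
};

// Converting copy for contiguous tensors of different element types.
// Because both buffers are contiguous, element i of src maps to element i
// of dst regardless of how the dimensions are laid out, so only the total
// element count (an int64_t) has to match.
template <typename Src, typename Dst>
void copy_contiguous(const Tensor<Src> & src, Tensor<Dst> & dst) {
    const int64_t ne = src.nelements();
    assert(ne == dst.nelements()); // same element count; shapes may differ

    for (int64_t i = 0; i < ne; ++i) {
        // On a GPU, each i would typically be handled by its own thread.
        dst.data[i] = static_cast<Dst>(src.data[i]);
    }
}
```

Under these assumptions, copying a 2x3 `float` tensor into a 3x2 `int32_t` tensor works with the same flat loop, which matches the observation above that different shapes are fine as long as both sides are contiguous.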