Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miopenGetRNNTrainingReserveSize is not rounded to a convenient boundary #2529

Open
amberhassaan opened this issue Nov 15, 2023 · 16 comments
Open
Assignees

Comments

@amberhassaan
Copy link
Contributor

Test code is having to round up the reserve space size obtained from miopenGetRNNWorkspaceSize to the next multiple of 4. This should be done inside the library call. See for example test/gru_common.hpp and search for the above api call. This is also applicable to lstm_common.hpp and rnn_vanilla_common.hpp.

CC: @JehandadKhan , @junliume

@shurale-nkn
Copy link
Contributor

No, miopenGetRNNTrainingReserveSize returns number of bytes not floats, it may not be a multiple of 4. This is problem of testing file to correctly match API limitations.

@amberhassaan
Copy link
Contributor Author

I'm asking us to round up to a multiple of 4 or 8 to be safe so that it's not needed in the tests that use float (and perhaps double if we ever choose to support it). I think it's common sense. @JehandadKhan : what do you think?

@shurale-nkn
Copy link
Contributor

This is not correct to work with this buffer as float or double. It contains not only one type. Bigger allocation is not more safe, it will just miss out-of-bounds array errors more often.

@amberhassaan
Copy link
Contributor Author

I don't understand. The code (for example test/gru_common.hpp) is setting the RNN descriptor data type as float then the reserve space size should be a multiple of sizeof(float). Otherwise, I'd think it's a bug. What's the use if I have to round it up in the test code?

@shurale-nkn
Copy link
Contributor

This is incorrect allocation in GRU test.

@shurale-nkn
Copy link
Contributor

the authors decided to allocate float to simplify their work, but this does not mean that this is the correct solution.

@amberhassaan
Copy link
Contributor Author

That little rounding up seems crucial because the tests failed when I took it out. Regardless, the library should return a workspace size that is a multiple of 8 (or 4 at least). The expectation is that the user will allocate memory for that size and almost all allocators already align allocations to 8 byte boundary at least.

@shurale-nkn
Copy link
Contributor

nobody expects that. This is not correct, user can make bigger allocation, but this is not expected or required.

@amberhassaan
Copy link
Contributor Author

I disagree but let's leave it at that.

@JehandadKhan
Copy link
Collaborator

JehandadKhan commented Nov 16, 2023

@shurale-nkn Is the size returned from the API usable directly or does it need to updated for the resulting buffer to be correct ?

@shurale-nkn
Copy link
Contributor

@JehandadKhan User can directly use byte size returned from API. This is a correct and valid value, it does not require modification.

@amberhassaan
Copy link
Contributor Author

@JehandadKhan : here's one of many examples where the user code doesn't work if we take out this "round up to next multiple of sizeof(T)" : https://github.com/ROCmSoftwarePlatform/MIOpen/blob/60cf9c095356a58d9ef369856b585ac1f4d39946/test/lstm_common.hpp#L918

@shurale-nkn
Copy link
Contributor

@amberhassaan This is not example how it doesn't work. This is example how it used in our test, this does not mean that the user cannot use this value directly or computation fails without modification. Please do not mislead others.

In PR#2493 you can discover how the test and the user can perfectly use this value without any modifications and everything works correctly.

@amberhassaan
Copy link
Contributor Author

All I know is that 1) our tests fail when the size is not rounded up 2) I fail to understand how we return a size that's not multiple of floats when we work with floats.

@CAHEK7
Copy link
Contributor

CAHEK7 commented Nov 19, 2023

As far as I can see 1) the test should be fixed and 2) std::vector for the workspace must not be T, it's just a memory holder and it's not even used anywhere else (and must not be used).
So if the vector will be std::byte the problem will vanish.

@CAHEK7
Copy link
Contributor

CAHEK7 commented Nov 22, 2023

@amberhassaan
The last argument is that - what do you think about the case when malloc suddenly started to demand size to be, for example, 8 bytes aligned. The arguments are the same - malloc internally allocated 8byte chunks, so why would we requested less then 8?
What shall we do if at some point such malloc function decided to allocate 256bytes chunks since it's more efficient?

That's exactly the case for this ticket - the library which allocates the memory (test wrapper) started to demand specific allocation size from the user (MIOpen algorithm) who knows exact amount of memory what he needs and don't care about any sort of internals of allocation routines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants