This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix/public internal header #12374
Merged
anirudh2290
merged 19 commits into
apache:master
from
azai91:fix/public-internal-header
Sep 13, 2018
Merged
Fix/public internal header #12374
Changes from all commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
dd06489
move random-gen header to internal code
azai91 cb2cf5c
uncomment header
azai91 9662fe7
remove public method from cc
azai91 6f92bf4
move cuda randgen logic to cc
azai91 9371d4c
remove private files
azai91 eff6d60
include correct header in cu random_gen
azai91 cb8652a
fix include in cu random-gen
azai91 57a211a
fix template
azai91 d48c9b1
remove static kw
azai91 29194f3
add missing template expression
azai91 c0f7e51
change dtype to float
azai91 4346698
remove old header
azai91 661f827
fix lint
azai91 aa95f42
fix imports
azai91 46d07cc
fix space
azai91 d73fffc
fix template
azai91 19fcae6
fix template
azai91 2a714c3
add todo
azai91 e6fb503
fix lint
azai91 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,8 +23,8 @@ | |
* \brief gpu implements for parallel random number generator. | ||
*/ | ||
|
||
#include <mxnet/random_generator.h> | ||
#include <algorithm> | ||
#include "./random_generator.h" | ||
#include "../operator/mxnet_op.h" | ||
|
||
namespace mxnet { | ||
|
@@ -59,6 +59,17 @@ void RandGenerator<gpu, float>::Seed(mshadow::Stream<gpu> *s, uint32_t seed) { | |
s->Wait(); | ||
} | ||
|
||
template<> | ||
void RandGenerator<gpu, float>::AllocState(RandGenerator<gpu> *inst) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. doesnt this remove support for other DTypes like half_t ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the random generator always generates floats between 0 and 1, we scale them as needed in wherever the random generator is used |
||
CUDA_CALL(cudaMalloc(&inst->states_, | ||
kNumRandomStates * sizeof(curandStatePhilox4_32_10_t))); | ||
} | ||
|
||
template<> | ||
void RandGenerator<gpu, float>::FreeState(RandGenerator<gpu> *inst) { | ||
CUDA_CALL(cudaFree(inst->states_)); | ||
} | ||
|
||
} // namespace random | ||
} // namespace common | ||
} // namespace mxnet |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do this differently with CPU implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean? we are calling cudaMalloc in this case instead of just "new"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is not well phrased. What I meant is why move out from header file while leaving the same function for CPU in the header
See line: https://github.com/apache/incubator-mxnet/pull/12374/files#diff-ba5bcd7d0b76b85a2df1f793dc4d3302R82
Aside from that, I think these functions are inside the inner class Impl which is supposed to handle all the implementation. Therefore I think it is very logical to leave them here in the header file. Not to mention the performance advantage of calling inline function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. these cuda calls depends on cuda_utils.h which is in the commons folder. I am not sure if we want to expose those. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apeforest are you indicating about perf advantages during compile time ? I think its okay to put the definition in random_generator.cu since we don't want to expose cuda_utils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apeforest any thoughts on what to revise? if we want to be consistent we could create a random_generator.cc file and put the non-cuda implementations in there, but I personally think that is unnecessary for such small functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@azai91 Thanks for your efforts in refactoring this. Since the main purpose of this PR is refactoring, I hope we can do it in the most elegant way. If we are making this file a public header, I would extract the implementation class Impl to another internal header file so that we do not expose the internal implementation details. By doing that, we can have put the implementation details for both CPU and GPU there. Please let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that they way we are using the random_generator, we expect the developer to be able to access the internal Impl class (https://github.com/azai91/incubator-mxnet/blob/6f7254c91709904a9fb6290f1998fcf2da818d0e/src/operator/random/unique_sample_op.h#L118).
the class is is public as well. I may be interpretting the developers intentions incorrectly though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the developers did not intend to directly access internal Impl class but was rather lacking proper interface API. In fact, throughout the repository, I only found one place that uses
RandGenerator<cpu,
GType>::Impl` directly, and therefore I think it's not a big overhead to refactor that one line of code. Regarding how to separating Impl class from the interface, you might find this reference helpful: https://cpppatterns.com/patterns/pimpl.html