Allocate temp data on the fly for some casting operations #149

Merged: 1 commit, Aug 9, 2017
17 changes: 13 additions & 4 deletions src/operator/tensor/cast_storage-inl.cuh

@@ -188,6 +188,15 @@ struct FillRspValsKernel {
}
};

template<typename xpu, int ndim, typename DType>
inline mshadow::Tensor<xpu, ndim, DType> AllocateTempDataForCast(const OpContext& op_ctx,
                                                                 const mshadow::Shape<ndim>& shape) {
  Resource rsc = ResourceManager::Get()->Request(op_ctx.run_ctx.ctx,
                                                 ResourceRequest(ResourceRequest::kTempSpace));
  mshadow::Stream<xpu> *stream = op_ctx.run_ctx.get_stream<xpu>();
  return rsc.get_space_typed<xpu, ndim, DType>(shape, stream);
};
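For orientation before the review thread, a minimal sketch of how a call site uses this helper; `nbytes` is a placeholder, not from the diff, and the two real call sites appear in the hunks further down:

```cpp
// Request a 1-D byte workspace on the fly; the caller carves up the raw
// buffer itself (e.g. into row flags plus cub temp storage, as below).
auto workspace = AllocateTempDataForCast<gpu, 1, char>(ctx, Shape1(nbytes));
char* d_temp_storage = workspace.dptr_;
```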
Collaborator commented:

1. I feel GetTempSpace is a better name.

2. Why is this function defined in .cuh but still using xpu? If it serves a general purpose, maybe we can put it somewhere accessible to all op files, such as "mxnet_op.h".

3. A template function does not need to be explicitly marked "inline".

Author (@cjolivier01) commented:

I intentionally named it after the Cast usage in order to discourage it from being used in a general way to circumvent the proper temp resource allocation mechanism.
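For contrast, a minimal sketch of the standard mechanism the author refers to, in which the operator declares a kTempSpace requirement up front and consumes it from ctx.requested; this is the pattern the removed lines in the hunks below use (`nbytes` is a placeholder):

```cpp
// Conventional temp-space allocation: the request is declared when the op is
// registered, so the engine can account for it; the kernel only consumes it.
mshadow::Stream<gpu> *s = ctx.run_ctx.get_stream<gpu>();
mshadow::Tensor<gpu, 1, char> workspace =
    ctx.requested[0].get_space_typed<gpu, 1, char>(Shape1(nbytes), s);
```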

Author commented:

'inline' asks the compiler to in-line the function. It is not just a linkage directive.
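A minimal illustration of the two positions (not from the PR): a function template in a header needs no 'inline' for linkage, but the keyword can still act as an optimization hint:

```cpp
// Fine in a header without 'inline': function templates are exempt from
// multiple-definition errors at link time.
template <typename T>
T square(T x) { return x * x; }

// 'inline' adds nothing for linkage here, but remains a non-binding hint to
// the compiler's inliner, which is the rationale given above.
template <typename T>
inline T cube(T x) { return x * x * x; }
```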

Author commented:

1. It is not gpu-specific, and we may find it necessary for CPU casts as well, in which case it can be moved. That being said, it is a temporary hack and, as the first item stated, I am trying to limit its use to this file and to Cast only. Putting it in some general place with a general name would cause people to use it for general-purpose temp allocation.
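Because the helper is templated on xpu, a hypothetical CPU cast could reuse it unchanged if it were ever moved out of this .cuh file; a sketch, with ctx and nbytes assumed from a CPU caller:

```cpp
// Same definition instantiated for the CPU: stream and memory now come from
// the cpu side. Hypothetical; no CPU call site exists in this PR.
auto cpu_workspace = AllocateTempDataForCast<cpu, 1, char>(ctx, Shape1(nbytes));
```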


/*!
* \brief GPU implementation of casting a dns tensor to rsp type.
*/
@@ -226,8 +235,8 @@ inline void CastStorageDnsRspImpl(const OpContext& ctx,
       mshadow::Stream<gpu>::GetStream(s));

   // Allocate temp storage for marking non-zero rows and for cub's prefix sum
-  mshadow::Tensor<gpu, 1, char> workspace = ctx.requested[0]
-      .get_space_typed<gpu, 1, char>(Shape1(num_rows*sizeof(RType)+temp_storage_bytes), s);
+  auto workspace = AllocateTempDataForCast<gpu, 1, char>(ctx, Shape1(num_rows*sizeof(RType)
+                                                                     + temp_storage_bytes));
   row_flg = reinterpret_cast<RType*>(workspace.dptr_);
   d_temp_storage = workspace.dptr_ + num_rows*sizeof(RType);

@@ -633,8 +642,8 @@ inline void CastStorageDnsCsrImpl(const OpContext& ctx,
       mshadow::Stream<gpu>::GetStream(s));

   // Allocate temporary storage
-  mshadow::Tensor<gpu, 1, char> workspace = ctx.requested[0]
-      .get_space_typed<gpu, 1, char>(Shape1(temp_storage_bytes), s);
+  auto workspace = AllocateTempDataForCast<gpu, 1, char>(ctx, Shape1(temp_storage_bytes));

   d_temp_storage = workspace.dptr_;

   // Compute indptr through inclusive prefix sum