N: batch size
C: Channel / convolution feature_map
H: Height
W: Width
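For concreteness, here is a minimal sketch (plain C; the helper names are made up for illustration, not taken from any framework) of how the flat memory offset of element (n, c, h, w) is computed under each of the layouts discussed below:

```c
#include <stddef.h>

/* NCHW: w is innermost (stride 1), n is outermost. */
static size_t nchw_offset(size_t n, size_t c, size_t h, size_t w,
                          size_t C, size_t H, size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

/* CHWN: n is innermost (stride 1), c is outermost. */
static size_t chwn_offset(size_t n, size_t c, size_t h, size_t w,
                          size_t N, size_t H, size_t W) {
    return ((c * H + h) * W + w) * N + n;
}

/* HWCN: n is innermost (stride 1), h is outermost. */
static size_t hwcn_offset(size_t n, size_t c, size_t h, size_t w,
                          size_t N, size_t C, size_t W) {
    return ((h * W + w) * C + c) * N + n;
}
```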
NCHW
NCHW is the most widespread format and is the default format in:
CuDNN, Torch, PyTorch, Chainer
N as the first index is the layout most familiar to data scientists. However, it presents optimization challenges even on the Nvidia side; see soumith/convnet-benchmarks#93.
CHWN
CHWN is the format used by Neon and by the pioneering (now dead) cuda-convnet.
It is ideal for Winograd convolution. The main issue is that having N on the right is unfamiliar, and for RNNs it might be worse to have the batch as the innermost dimension.
Some models also concatenate feature maps along the channel dimension, which CHWN supports directly, similar to (C1 + C2 + C3)HWN.
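Since C is the outermost (slowest-varying) dimension in CHWN, concatenating along C amounts to appending the buffers back to back. A minimal sketch, with hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Concatenate two CHWN tensors (c1 and c2 channels) along C.
   Because C is outermost, each input is one contiguous block and
   the concatenation is two plain memcpy calls. */
void concat_chwn(float *dst,
                 const float *a, size_t c1,
                 const float *b, size_t c2,
                 size_t H, size_t W, size_t N) {
    size_t plane = H * W * N;  /* elements per channel */
    memcpy(dst,              a, c1 * plane * sizeof(float));
    memcpy(dst + c1 * plane, b, c2 * plane * sizeof(float));
}
```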
CuDNN supports it according to the documentation, but for CuDNN v7, cudnn.h only has this:
```c
typedef enum
{
    CUDNN_TENSOR_NCHW        = 0, /* row major (wStride = 1, hStride = w) */
    CUDNN_TENSOR_NHWC        = 1, /* feature maps interleaved (cStride = 1) */
    CUDNN_TENSOR_NCHW_VECT_C = 2  /* each image point is a vector of C elements; the vector length is carried by the data type */
} cudnnTensorFormat_t;
```
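So NCHW and NHWC can be requested directly through the enum, while CHWN can still be described with explicit strides via cudnnSetTensor4dDescriptorEx; whether a given CuDNN routine actually accepts such a strided layout is a separate question. A sketch (shape numbers arbitrary, error handling omitted):

```c
#include <cudnn.h>

int main(void)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);

    /* NCHW (or NHWC) via the format enum. */
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               /*n=*/32, /*c=*/64, /*h=*/56, /*w=*/56);

    /* CHWN is absent from cudnnTensorFormat_t, but the layout itself is
       expressible with explicit strides: n innermost, c outermost. */
    cudnnSetTensor4dDescriptorEx(desc, CUDNN_DATA_FLOAT,
                                 /*n=*/32, /*c=*/64, /*h=*/56, /*w=*/56,
                                 /*nStride=*/1,
                                 /*cStride=*/56 * 56 * 32,
                                 /*hStride=*/56 * 32,
                                 /*wStride=*/32);

    cudnnDestroyTensorDescriptor(desc);
    return 0;
}
```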
HWCN
HWCN is the format used by Tensorflow. It is also better than NCHW for Winograd (though not as good as CHWN), and it is the best format for implementing "Memory Efficient Convolution" (#131).
Format conversion
Converting between NCHW and CHWN can be done very efficiently by treating it as a transposition between an [N, CHW] matrix and a [CHW, N] matrix.
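A naive reference for that transposition (a real implementation would use a tiled kernel, as in the NVIDIA post linked below):

```c
#include <stddef.h>

/* NCHW -> CHWN: transpose the [N, CHW] matrix into [CHW, N].
   Element (n, c, h, w) moves from src[n*CHW + k] to dst[k*N + n],
   where k = (c*H + h)*W + w. */
void nchw_to_chwn(float *dst, const float *src,
                  size_t N, size_t C, size_t H, size_t W) {
    size_t CHW = C * H * W;
    for (size_t n = 0; n < N; ++n)        /* rows of [N, CHW] */
        for (size_t k = 0; k < CHW; ++k)  /* cols of [N, CHW] */
            dst[k * N + n] = src[n * CHW + k];
}
```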
Implementation by Neon: NervanaSystems/neon@682dde6
Implementation by NVIDIA: https://devblogs.nvidia.com/parallelforall/efficient-matrix-transpose-cuda-cc/
Paper: Optimizing memory efficiency for DCNN on GPUs