This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

v1.8.x MKLDNN BatchNorm with even number of channels - backward call crashes with MXNetError: Check failed: !is_view #20858

Closed
Adam1105 opened this issue Jan 28, 2022 · 4 comments

Comments

@Adam1105

Description

I am using the latest release of MXNet v1.8.x installed with pip (mxnet-1.8.0.post0-cp39-cp39-macosx_10_13_x86_64.whl); more details are in the Environment section. When MKLDNN and the NaiveEngine are used, a model whose BatchNorm layer has an even number of channels crashes in the backward call with an "MXNetError: Check failed: !is_view" error.

This seems very similar to a previously reported bug; apparently, that one was fixed only for the forward pass.
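As a possible temporary workaround (my assumption, not a confirmed fix), disabling the MKLDNN backend via the documented MXNET_MKLDNN_ENABLED environment variable should make BatchNorm fall back to the native CPU implementation:

```shell
# Hypothetical workaround: turn off the MKLDNN backend so BatchNorm
# backward uses the native CPU path instead of the oneDNN one.
# MXNET_MKLDNN_ENABLED is documented in the MXNet environment
# variable reference; ./code.py is the reproduction script below.
MXNET_MKLDNN_ENABLED=0 MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py
```

This trades away the MKLDNN speedup, so it is only useful until the backward-pass fix lands.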

Error Message

Is MKLDNN enabled: True
input channel of 45
[15:54:53] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
input channel of 45, (1, 45, 8, 80, 80)
input channel of 64
Traceback (most recent call last):
File "/Users/gabrysa/./buggy_model.py", line 67, in <module>
l.backward()
File "/usr/local/lib/python3.9/site-packages/mxnet/ndarray/ndarray.py", line 2864, in backward
check_call(_LIB.MXAutogradBackwardEx(
File "/usr/local/lib/python3.9/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "../src/ndarray/ndarray.cc", line 650
MXNetError: Check failed: !is_view:

To Reproduce

code

from mxnet import init
from mxnet.context import cpu
from mxnet.gluon import nn, loss, Trainer
from mxnet.gluon.block import HybridBlock
from mxnet.gluon.nn import BatchNorm

import mxnet as mx

class BuggyModel(HybridBlock):

    def __init__(
        self,
        channels,
        norm_layer=BatchNorm,
        norm_kwargs=None,
        in_channels=3,
        **kwargs
    ):
        super(BuggyModel, self).__init__(**kwargs)
        self.in_channels = in_channels
        with self.name_scope():
            self.conv1 = nn.Conv3D(
                    in_channels=self.in_channels,
                    channels=channels,
                    kernel_size=(1, 7, 7),
                    strides=(1, 2, 2),
                    padding=(0, 3, 3),
                    use_bias=False,
                    )
            self.bn1 = norm_layer(in_channels=channels, **({} if norm_kwargs is None else norm_kwargs))

    def hybrid_forward(self, F, x):
        """Hybrid forward of R2+1D net"""
        x = self.conv1(x)
        x = self.bn1(x)
        return x

print(f"Is MKLDNN enabled: {mx.runtime.Features().is_enabled('MKLDNN')}")

print("input channel of 45")
net = BuggyModel(channels=45)
net.initialize(init=init.Constant(1))
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()

print(f"input channel of 45, {output.shape}")

print("input channel of 64")
net = BuggyModel(channels=64)
net.initialize(init=init.Constant(1))
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()
print(f"input channel of 64, {output.shape}")

Steps to reproduce

  1. Paste the above code into ./code.py
  2. Run it with MKLDNN using the MXNet Naive Engine: MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py

Environment

Environment Information
----------Python Info----------
Version      : 3.9.6
Compiler     : Clang 12.0.5 (clang-1205.0.22.9)
Build        : ('default', 'Jun 29 2021 05:25:02')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 21.1.3
Directory    : /usr/local/lib/python3.9/site-packages/pip
----------MXNet Info-----------
Version      : 1.8.0
Directory    : /usr/local/lib/python3.9/site-packages/mxnet
Commit Hash   : 891d36c2d1c28f9486ec34ce4a7812e27896acef
Library      : ['/usr/local/lib/python3.9/site-packages/mxnet/libmxnet.dylib']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✔ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"
KMP_INIT_AT_FORK="FALSE"

@anko-intel
Contributor

anko-intel commented Jan 31, 2022

@mxnet-label-bot add [MKLDNN]

@anko-intel
Contributor

@Adam1105 could you close the issue?

@Adam1105
Author

Thanks for fixing this! Closing the issue.
