This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

v1.8.x MKLDNN BatchNorm with even number of channels - backward call crashes with MXNetError: Check failed: !is_view #20858

Closed
Adam1105 opened this issue Jan 28, 2022 · 4 comments

Comments

@Adam1105

Description

I am using the latest release of MXNet v1.8.x installed with pip (mxnet-1.8.0.post0-cp39-cp39-macosx_10_13_x86_64.whl); more details are in the Environment section. When MKLDNN and the NaiveEngine are used, a model whose BatchNorm layer has an even number of channels crashes in the backward call with an "MXNetError: Check failed: !is_view" error.

This seems very similar to a previously reported bug; apparently, that one was fixed only for the forward pass.
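As a possible temporary workaround (my assumption, not a confirmed fix), disabling the MKLDNN backend via the documented MXNET_MKLDNN_ENABLED environment variable should make BatchNorm fall back to the native CPU implementation:

```shell
# Hypothetical workaround: turn off the MKLDNN backend so BatchNorm
# backward uses the native CPU path instead of the oneDNN one.
# MXNET_MKLDNN_ENABLED is documented in the MXNet environment
# variable reference; ./code.py is the reproduction script below.
MXNET_MKLDNN_ENABLED=0 MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py
```

This trades away the MKLDNN speedup, so it is only useful until the backward-pass fix lands.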

Error Message

Is MKLDNN enabled: True
input channel of 45
[15:54:53] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
input channel of 45, (1, 45, 8, 80, 80)
input channel of 64
Traceback (most recent call last):
File "/Users/gabrysa/./buggy_model.py", line 67, in <module>
l.backward()
File "/usr/local/lib/python3.9/site-packages/mxnet/ndarray/ndarray.py", line 2864, in backward
check_call(_LIB.MXAutogradBackwardEx(
File "/usr/local/lib/python3.9/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "../src/ndarray/ndarray.cc", line 650
MXNetError: Check failed: !is_view:

To Reproduce

code

from mxnet import init
from mxnet.context import cpu
from mxnet.gluon import nn, loss, Trainer
from mxnet.gluon.block import HybridBlock
from mxnet.gluon.nn import BatchNorm

import mxnet as mx

class BuggyModel(HybridBlock):

    def __init__(
        self,
        channels,
        norm_layer=BatchNorm,
        norm_kwargs=None,
        in_channels=3,
        **kwargs
    ):
        super(BuggyModel, self).__init__(**kwargs)
        self.in_channels = in_channels
        with self.name_scope():
            self.conv1 = nn.Conv3D(
                    in_channels=self.in_channels,
                    channels=channels,
                    kernel_size=(1, 7, 7),
                    strides=(1, 2, 2),
                    padding=(0, 3, 3),
                    use_bias=False,
                    )
            self.bn1 = norm_layer(in_channels=channels, **({} if norm_kwargs is None else norm_kwargs))

    def hybrid_forward(self, F, x):
        """Hybrid forward of R2+1D net"""
        x = self.conv1(x)
        x = self.bn1(x)
        return x

print(f"Is MKLDNN enabled: {mx.runtime.Features().is_enabled('MKLDNN')}")

print("input channel of 45")
net = BuggyModel(channels=45)
net.initialize(init=init.Constant(1))
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()

print(f"input channel of 45, {output.shape}")

print("input channel of 64")
net = BuggyModel(channels=64)
net.initialize(init=init.Constant(1))
l2_loss = loss.L2Loss()
trainer = Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

input_data = mx.nd.zeros((1, 3, 8, 160, 160), ctx=mx.cpu())
with mx.autograd.record():
    output = net(input_data)
    target_data = mx.nd.ones(output.shape, ctx=mx.cpu())
    l = l2_loss(output, target_data)
l.backward()
print(f"input channel of 64, {output.shape}")

Steps to reproduce

  1. Paste the above code into ./code.py
  2. Run it with MKLDNN using the MXNet Naive Engine: MXNET_ENGINE_TYPE=NaiveEngine python3 ./code.py

Environment

Environment Information
----------Python Info----------
Version      : 3.9.6
Compiler     : Clang 12.0.5 (clang-1205.0.22.9)
Build        : ('default', 'Jun 29 2021 05:25:02')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 21.1.3
Directory    : /usr/local/lib/python3.9/site-packages/pip
----------MXNet Info-----------
Version      : 1.8.0
Directory    : /usr/local/lib/python3.9/site-packages/mxnet
Commit Hash   : 891d36c2d1c28f9486ec34ce4a7812e27896acef
Library      : ['/usr/local/lib/python3.9/site-packages/mxnet/libmxnet.dylib']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✖ CPU_SSE4_2
✖ CPU_SSE4A
✖ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✔ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"
KMP_INIT_AT_FORK="FALSE"

@anko-intel
Contributor

anko-intel commented Jan 31, 2022

@mxnet-label-bot add [MKLDNN]

@anko-intel
Contributor

@Adam1105 could you close the issue?

@Adam1105
Author

Thanks for fixing this! Closing the issue.
