This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Floating point exception in mxnet.ndarray.op.SequenceLast #18938

Closed
leeyeetonn opened this issue Aug 16, 2020 · 5 comments · Fixed by #19833
Labels
Bug C++ Related to C++ good first issue Operator v1.x Targeting v1.x branch

Comments

@leeyeetonn

Description

(A clear and concise description of what the bug is.)
mxnet.ndarray.op.SequenceLast crashes with a floating point exception when the shape of the input data contains a 0. See the code below for an example.

Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)

Floating point exception (core dumped)

To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

import mxnet
import numpy as np
data = mxnet.nd.array(np.random.rand(1,0,0))
mxnet.ndarray.op.SequenceLast(data)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. run the provided code in python interpreter or as a script

What have you tried to solve it?

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here

Got 404 when trying to get the script.

Some environment information:

  • OS: ubuntu 18.04
  • Python: 3.7.6
  • pip: 20.0.2
  • numpy: 1.18.5
  • mxnet: 1.6.0
@szha
Member

szha commented Aug 21, 2020

So here's the problem:

% DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18938.py
(lldb) target create "python3.7"
Current executable set to 'python3.7' (x86_64).
(lldb) settings set -- target.run-args  "test_18938.py"
(lldb) run
Process 45668 launched: '/usr/local/bin/python3.7' (x86_64)
Process 45668 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100006000 dyld`_dyld_start
dyld`_dyld_start:
->  0x100006000 <+0>: popq   %rdi
    0x100006001 <+1>: pushq  $0x0
    0x100006003 <+3>: movq   %rsp, %rbp
    0x100006006 <+6>: andq   $-0x10, %rsp
(lldb) cont
Process 45668 resuming
[23:29:59] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
Process 45668 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
    frame #0: 0x0000000115e85453 libmxnet.dylib`mxnet::op::SequenceLastOp<mshadow::cpu, float, int>::Forward(this=0x000000010066a490, ctx=0x00007ffeefbfc2b0, in_data=0x00000001006dcf08, req=0x00007ffeefbfc310, out_data=0x00000001006dcf50, aux_args=0x00000001006dcf38) at sequence_last-inl.h:158
   155
   156 	    auto batch = (axis != 0) ? d0 : d1;
   157 	    auto max_seq_len = in_data[seq_last::kData].size(axis);
-> 158 	    auto rest_size = dsize / (d0 * d1);
   159
   160 	    Tensor<xpu, 3, DType> data =
   161 	        in_data[seq_last::kData].get_with_shape<xpu, 3, DType>(

https://github.com/apache/incubator-mxnet/blob/9bdd4d6347c284770ee5bfe5ae98f1dabc283829/src/operator/sequence_last-inl.h#L158

The code needs to guard against a zero-size array, which makes the right operand of / zero. We should also add a smoke test for this op to guard against such problems, similar to https://github.com/apache/incubator-mxnet/pull/18972/files
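To make the failure mode concrete, here is a minimal Python sketch that mirrors the arithmetic on line 158 of sequence_last-inl.h. It is not the actual fix from #19833. The function names and the choice to return None for zero-size input are assumptions made for illustration only. In C++ the integer division by zero raises a hardware exception, while in Python the same arithmetic raises ZeroDivisionError.

```python
def num_elements(shape):
    """Total element count of an array with the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def rest_size_unguarded(shape):
    """Mirror of `dsize / (d0 * d1)`; fails when d0 or d1 is 0."""
    d0, d1 = shape[0], shape[1]
    dsize = num_elements(shape)
    return dsize // (d0 * d1)


def rest_size_guarded(shape):
    """Hypothetical guard: report zero-size input instead of dividing.

    Returning None is an assumption of this sketch; a real operator
    could return early or raise a descriptive error instead.
    """
    d0, d1 = shape[0], shape[1]
    if d0 * d1 == 0:
        return None
    return num_elements(shape) // (d0 * d1)


# Smoke test in the spirit of the suggestion above: shapes with a zero
# dimension must not raise.
for shape in [(1, 0, 0), (0, 2, 3), (2, 0, 3), (2, 3, 0), (2, 3, 4)]:
    rest_size_guarded(shape)
```

With the shape from the report, (1, 0, 0), d0 * d1 is 0, so the unguarded version divides by zero while the guarded version returns before the division.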

r3stl355 pushed a commit to r3stl355/incubator-mxnet that referenced this issue Feb 3, 2021
fix mlkdnn version

fix apache#18938
r3stl355 pushed a commit to r3stl355/incubator-mxnet that referenced this issue Feb 5, 2021
@r3stl355
Contributor

r3stl355 commented Feb 6, 2021

This was fixed on master; the PR is waiting to be reviewed: #19833

@r3stl355
Contributor

r3stl355 commented Feb 6, 2021

An additional caveat for this one: what is the expected return of SequenceLast if only the first dimension of the input is zero-sized? With the current fix, it returns a non-zero-sized array in that case. Should it throw an exception instead? E.g. if the input shape is [0, 2, 3], the fix returns an array of shape [2, 3] with some random values.

szha pushed a commit that referenced this issue Feb 7, 2021
@szha
Member

szha commented Feb 7, 2021

@r3stl355 thanks for the fix! Since any array with a 0-dim is an empty array, I think the expected return array would still be of shape [0, 2, 3] without any data in it.

@r3stl355
Contributor

r3stl355 commented Feb 7, 2021

Hey @szha, thank you for merging this. As for the expected shape of the returned array: SequenceLast by design returns an array with one dimension fewer than the input array, e.g. if the input has shape [2, 3, 4], SequenceLast returns an array of shape [3, 4]. This becomes tricky when the first dimension of the input array is zero, because by the same logic SequenceLast returns a non-empty array for an empty input, e.g. given an array with shape [0, 3, 4], it still returns an array of shape [3, 4].
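The shape rule described here can be sketched in plain Python. The helper names below are hypothetical and not part of the MXNet API. The sketch assumes the sequence axis is 0, as in the examples in this thread, and only computes shapes, not values.

```python
def num_elements(shape):
    """Total element count of an array with the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def sequence_last_output_shape(shape, axis=0):
    """Drop the sequence axis, keeping every other dimension.

    Hypothetical helper for illustration; `axis=0` is an assumed default.
    """
    return tuple(d for i, d in enumerate(shape) if i != axis)


for in_shape in [(2, 3, 4), (0, 3, 4)]:
    out_shape = sequence_last_output_shape(in_shape)
    print(in_shape, num_elements(in_shape), "->",
          out_shape, num_elements(out_shape))
```

For the input shape (0, 3, 4) the input holds 0 elements but the output shape (3, 4) holds 12, which is the mismatch raised in this comment: there is no last sequence step to copy those 12 values from.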

r3stl355 pushed a commit to r3stl355/incubator-mxnet that referenced this issue Feb 9, 2021
szha pushed a commit that referenced this issue Apr 30, 2021
* fix #18938

* fix #18939, #18940

* fix #18936 and #18937

Co-authored-by: r3stl355 <[email protected]>