This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Floating point exception in mxnet.ndarray.op.random_pdf_poisson #18937

Closed
leeyeetonn opened this issue Aug 16, 2020 · 5 comments

@leeyeetonn

Description

(A clear and concise description of what the bug is.)
mxnet.ndarray.op.random_pdf_poisson crashes with a floating point exception when lam has shape (0,). See the code snippet below for a reproduction.

Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)

Floating point exception (core dumped)

To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

import mxnet
import numpy as np
lam = mxnet.nd.array(np.random.rand(0))
sample = mxnet.nd.array(np.random.rand(2))
mxnet.ndarray.op.random_pdf_poisson(sample=sample, lam=lam)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. run the provided code in python interpreter or as a script

What have you tried to solve it?

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here

Got 404 when trying to get the script.

Some environment information:

  • OS: ubuntu 18.04
  • Python: 3.7.6
  • pip: 20.0.2
  • numpy: 1.18.5
  • mxnet: 1.6.0
@szha
Member

szha commented Aug 21, 2020

Here's the problem:

% DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18937.py
(lldb) target create "python3.7"
Current executable set to 'python3.7' (x86_64).
(lldb) settings set -- target.run-args  "test_18937.py"
(lldb) run
Process 36591 launched: '/usr/local/bin/python3.7' (x86_64)
Process 36591 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100006000 dyld`_dyld_start
dyld`_dyld_start:
->  0x100006000 <+0>: popq   %rdi
    0x100006001 <+1>: pushq  $0x0
    0x100006003 <+3>: movq   %rsp, %rbp
    0x100006006 <+6>: andq   $-0x10, %rsp
(lldb) cont
Process 36591 resuming
[23:22:22] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
[23:22:22] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
Process 36591 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
    frame #0: 0x0000000115bca540 libmxnet.dylib`mxnet::op::PdfCaller<mshadow::cpu, float, mxnet::op::PDF_Poisson<false>, 1, false>::op(inputs=0x00007ffeefbfcb50, outputs=0x00007ffeefbfcb30, s=0x00000001272d39c9) at pdf_op.h:469
   466 	  static void op(const std::vector<TBlob>& inputs,
   467 	                 const std::vector<TBlob>& outputs,
   468 	                 mshadow::Stream<xpu> *s) {
-> 469 	    CHECK_EQ(inputs[0].Size()%inputs[1].Size(), 0);
   470 	    CHECK_EQ(inputs[0].Size()%outputs[0].Size(), 0);
   471 	    index_t num_samples(inputs[0].Size() / inputs[1].Size());
   472 	    mxnet_op::Kernel<LaunchExWrapper<pdf>, xpu>::LaunchEx(s, outputs[0].Size(), num_samples,

https://github.com/apache/incubator-mxnet/blob/9bdd4d6347c284770ee5bfe5ae98f1dabc283829/src/operator/random/pdf_op.h#L469

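The code needs to guard against a zero-size array as the right operand of %: with the zero-size lam from the repro, inputs[1].Size() is 0, so the modulo on line 469 is an integer division by zero. We should also add a smoke test to guard against this problem in this op, similar to https://github.com/apache/incubator-mxnet/pull/18972/files.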

@szha
Member

szha commented Aug 21, 2020

@xidulu Same question as in #18936 (comment): since we are deprecating ndarray in favor of np/npx, do we need to register an alias of this op in np/npx? (Or is it already registered?)

@xidulu
Contributor

xidulu commented Aug 21, 2020

@szha
As far as I know, pdf ops are not registered under npx yet, and I don't think it's that necessary because:

  1. This series of ops does not have very good support for numpy's broadcasting and zero-dim/zero-size semantics (as shown in this issue), which could cause great confusion for users.
  2. If users need pdf ops, they can now use the pdf operator inside the probability module.

@szha szha added the v1.x Targeting v1.x branch label Aug 21, 2020
@xidulu
Contributor

xidulu commented Aug 21, 2020

Btw, a possible solution for this bug could be adding a zero-size check (e.g. https://github.com/apache/incubator-mxnet/blob/master/src/operator/numpy/random/np_normal_op.h#L320) before the kernel launch: https://github.com/apache/incubator-mxnet/blob/master/src/operator/random/pdf_op.h#L514
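
For anyone stuck on a build without the fix (the report is against mxnet 1.6.0), the same zero-size check can be applied at the call site before the operator is reached. This is a caller-side workaround sketch, not the C++ change suggested above: `call_pdf_op` is a hypothetical helper, it assumes only that the arrays expose a `.size` attribute (as NDArray does), and raising `ValueError` is an illustrative choice.

```python
def call_pdf_op(op, sample, lam):
    """Invoke a pdf operator only if `lam` is not a zero-size array.

    `op` is passed in so the helper does not depend on a particular
    operator; it is called with the same keyword arguments as in the repro.
    """
    if lam.size == 0:
        # Refuse early: a zero-size lam is what triggers the crash in the op.
        raise ValueError("lam is a zero-size array; refusing to call the pdf op")
    return op(sample=sample, lam=lam)


# Usage with the operator from this issue (requires mxnet, shown as a comment):
#   import mxnet
#   call_pdf_op(mxnet.ndarray.op.random_pdf_poisson, sample, lam)
```

Taking the operator as an argument keeps the check reusable for the other pdf ops, which may share the same code path.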

r3stl355 pushed a commit to r3stl355/incubator-mxnet that referenced this issue Feb 10, 2021
szha pushed a commit that referenced this issue Apr 30, 2021
* fix #18938

* fix #18939, #18940

* fix #18936 and #18937

Co-authored-by: r3stl355 <[email protected]>
@szha
Member

szha commented Apr 30, 2021

Fixed in the above PR.

@szha szha closed this as completed Apr 30, 2021