-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have unit test for it?
@stu1130 I dont think theres an easy way to have a failing unit test. I tried making a small graph with the same structure, but it didnt error out. I dont think it would be good to use the faster-rcnn model (that produces this error as in #14727) as a unit test. @reminisce or @zheng-da or @ZhennanQin can you help us come up with a failing unit test for this problem? |
@samskalicky I think you can use the diagram drawn in this picture as a unit test. The nodes just represent unary or binary ops. You can use some concrete operators such as |
@reminisce heres what i tried, to reproduce the example graph from the issue, but it does not fail:
|
@samskalicky You just need to check the order of input names. See the following script import os
import ctypes
import mxnet as mx
from collections import namedtuple
from mxnet.base import _LIB, check_call, c_str, mx_uint, c_str_array, SymbolHandle
Batch = namedtuple('Batch', ['data'])
#setup whitelist
op_names = ['elemwise_mul', '_plus', '_Plus']
#os.environ['MXNET_SUBGRAPH_BACKEND'] = 'default'
#setup graph
e = mx.sym.var('e')
f = mx.sym.var('f')
c = mx.sym.var('c')
h = mx.sym.var('h')
k = mx.sym.var('k')
d = e * f
b = c * d
g = h * d
j = k + g
a = b + j
sym = a
print(a.list_inputs())
# result: ['c', 'e', 'f', 'k', 'h']
subgraph_backend = 'default'
out = SymbolHandle()
check_call(_LIB.MXBuildSubgraphByOpNames(sym.handle, c_str(subgraph_backend), mx_uint(len(op_names)),
c_str_array(op_names), ctypes.byref(out)))
psym = mx.sym.Symbol(out)
print(psym.list_inputs())
# result: ['c', 'e', 'f', 'h', 'k'] |
Thanks @reminisce ! thats a better way to check. But I just realized that my test code was missing the whole point of the problem (shapes mismatching) because I made all the data shapes (1)! Heres code with different shapes that does indeed fail:
The failure is this message:
|
@samskalicky You are right. I missed the point too in the first place. You can add the failing test case to the unit test set to validate your fix. Thanks. |
@reminisce @stu1130 @ZhennanQin im unable to make a unit test for this. The code I showed above is erroring out irrespective of the bug/fix. Shape propagation in general is extremely restrictive and not does not result in the shapes im looking for. Can someone help provide a working unit test? |
@samskalicky What's the error message did you still see with this fix? |
1 similar comment
@samskalicky What's the error message did you still see with this fix? |
@reminisce @pinaraws its the same error message as above. The simple example doesnt work because shape propagation cannot infer the different shapes correctly. With such a simple example there is no way to infer the specific shapes we're looking for. So rather than hold up this PR further, I just added the fasterRCNN test as the unit test for this issue. |
@samskalicky What's the path forward for this PR ? |
@reminisce is this PR good to go? |
@samskalicky Could you please address the review comments? Thanks! |
@samskalicky Bouncing for an update ... |
@samskalicky Gentle ping .. |
1c4f86a
to
16d2c08
Compare
@pengzhao-intel Im getting a weird failure for the MKL test_subgraph.py test, but all the other tests are passing. Heres one of the failing tests (from the unix-cpu job)
Can someone from the Intel team help debug? |
Yes, we are on the vocation and will take a look next week. |
@samskalicky I believe #15518 is able to address this issue. Can you try nightly build to see if the original network still has this trouble? Thanks. |
@ZhennanQin does #15518 address |
@ChaiBapchya |
Description
fixes #14727
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments