Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Gluon export changing params #19354

Closed
samskalicky opened this issue Oct 15, 2020 · 4 comments
Closed

Gluon export changing params #19354

samskalicky opened this issue Oct 15, 2020 · 4 comments
Labels

Comments

@samskalicky
Copy link
Contributor

Description

When exporting a model from Gluon, some param values can be modified:
https://github.com/apache/incubator-mxnet/blob/ce1e68260eb3cf9219ae4d59df8bdde7361802ec/python/mxnet/gluon/block.py#L1354
The _reduce function here actually performs a sum/division:
https://github.com/apache/incubator-mxnet/blob/ce1e68260eb3cf9219ae4d59df8bdde7361802ec/python/mxnet/gluon/parameter.py#L406

If the data type is floating point, then the resulting IEEE FP standard calls for a rounding operation. Even though summing by 1 and dividing by 1 should result in the same value, it may be slightly altered.

If specific binary data is encoded within a param, then this operation will modify that data and cause corruption.

Possible solution

I propose checking the length of the block, and if it is 1 then skipping the sum/divide operation.

@samskalicky
Copy link
Contributor Author

samskalicky commented Oct 15, 2020

This problem is compounded by the bug in the lambda function used to allocate new tensors for graph passes:
https://github.com/apache/incubator-mxnet/blob/6729cf3bf4edd6837b0feb6417691ce2e00dbcee/src/c_api/c_api.cc#L1417-L1419
where its calling the NDArray constructor:
https://github.com/apache/incubator-mxnet/blob/6729cf3bf4edd6837b0feb6417691ce2e00dbcee/include/mxnet/ndarray.h#L95-L96
and its passing in the dtype to the 3rd argument which is delay_alloc. the 4th argument is actually the dtype and it should be:

NDArray* arr = new NDArray(shape, ctx, false, dtype);

@samskalicky
Copy link
Contributor Author

samskalicky commented Oct 15, 2020

It further compounds in Gluon where creating the new Parameter doesnt set the dtype explicitly:
https://github.com/apache/incubator-mxnet/blob/6729cf3bf4edd6837b0feb6417691ce2e00dbcee/python/mxnet/gluon/block.py#L1002-L1003
This leads to the default dtype mx_real_t which is float32:
https://github.com/apache/incubator-mxnet/blob/ce1e68260eb3cf9219ae4d59df8bdde7361802ec/python/mxnet/gluon/parameter.py#L106
with this error:

Traceback (most recent call last):
  File "test_subgraph.py", line 155, in <module>
    test("myProp")
  File "test_subgraph.py", line 113, in test
    clear=False, backend_opts={'dedup_subgraph':True})
  File "/home/ubuntu/incubator-mxnet/python/mxnet/gluon/block.py", line 1119, in optimize_for
    self._build_cache(x, *args)
  File "/home/ubuntu/incubator-mxnet/python/mxnet/gluon/block.py", line 1003, in _build_cache
    param._load_init(param_data, args[0].context)
  File "/home/ubuntu/incubator-mxnet/python/mxnet/gluon/parameter.py", line 302, in _load_init
    self.name, str(self.dtype), str(data.dtype))
AssertionError: Failed loading Parameter '_op0_input' from saved params: dtype incompatible expected <class 'numpy.float32'> vs saved <class 'numpy.float16'>. Set cast_dtype=True to cast the dtype of saved params.

Instead we should be creating the Parameter like:

param = Parameter(name, dtype=param_data.dtype)

@samskalicky
Copy link
Contributor Author

samskalicky commented Oct 15, 2020

@szha
Copy link
Member

szha commented Feb 8, 2021

@samskalicky thanks for the fix!

@szha szha closed this as completed Feb 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants