This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Gluon] [Fix] Fix HybridBlock when hybridize is not called #16465

Merged
merged 1 commit into from
Oct 20, 2019

Conversation

sxjscience
Member

@sxjscience sxjscience commented Oct 13, 2019

Description

When .hybridize() is not called, the forward function of HybridBlock supports a broader range of inputs, for example a mix of NDArrays and scalar values. The following code runs without error:

import mxnet as mx
from mxnet import gluon
class FooHybrid(gluon.HybridBlock):
    def hybrid_forward(self, F, a, b):
        if isinstance(a, (list, tuple)):
            a = sum(a)
        if isinstance(b, (list, tuple)):
            b = sum(b)
        return a + b
foo_hybrid = FooHybrid()
a = foo_hybrid(mx.nd.ones((10,)), 1)
b = foo_hybrid(mx.nd.ones((10,)), mx.nd.ones((10,)))

However, after hybridize() is called, the second call triggers an error:

import mxnet as mx
from mxnet import gluon
class FooHybrid(gluon.HybridBlock):
    def hybrid_forward(self, F, a, b):
        if isinstance(a, (list, tuple)):
            a = sum(a)
        if isinstance(b, (list, tuple)):
            b = sum(b)
        return a + b
foo_hybrid = FooHybrid()
foo_hybrid.hybridize()
a = foo_hybrid(mx.nd.ones((10,)), 1)
# line below triggers the error
b = foo_hybrid(mx.nd.ones((10,)), mx.nd.ones((10,)))

#16280 tries to make sure that HybridBlock raises an error in both cases. However, that makes the code backward-incompatible. This PR fixes the backward incompatibility and adds new tests for this behavior.
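The intended dispatch rule can be sketched in plain Python. This is illustrative only: FakeNDArray and HybridBlockSketch are hypothetical stand-ins, not MXNet classes. When not hybridized, arguments pass through unchanged, so scalars are accepted; once hybridized, every input must be an array (or None) so it can be traced symbolically.

```python
class FakeNDArray:
    """Hypothetical stand-in for mx.nd.NDArray in this sketch."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other = other.value if isinstance(other, FakeNDArray) else other
        return FakeNDArray(self.value + other)


class HybridBlockSketch:
    """Minimal sketch of the dispatch rule: strict only when hybridized."""
    def __init__(self):
        self._active = False

    def hybridize(self):
        self._active = True

    def __call__(self, *args):
        if self._active:
            # Hybridized path: inputs must be arrays (or None) to build a graph.
            for a in args:
                if a is not None and not isinstance(a, FakeNDArray):
                    raise TypeError('hybridized blocks accept only '
                                    'NDArray inputs, got %r' % type(a))
        return self.hybrid_forward(*args)

    def hybrid_forward(self, a, b):
        return a + b
```

Under this rule, calling the block with a scalar succeeds before hybridize() and raises TypeError after it, matching the behavior the PR restores.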

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Make HybridBlock behave the same as Block when not hybridized; add tests

@sxjscience sxjscience requested a review from szha as a code owner October 13, 2019 23:07
@sxjscience sxjscience changed the title [Gluon] [Fix] Fix HybridBlock when hybridize is not called [Gluon] [Fix] [WIP] Fix HybridBlock when hybridize is not called Oct 14, 2019
@sxjscience
Member Author

@leezu

.format([type(ele) for ele in flatten_args]))
is_ndarray = True
exist_sym_nd = True
ctx = ele.context
Contributor

This line, introduced in #16280, is not compatible with the previous context handling. Previously, x.context was always used as the default context. https://github.com/apache/incubator-mxnet/pull/16280/files#diff-29da832c2145752f3906a2f71c7b63baL982
Does it matter?

Member Author

It is chosen like this because x can be None now.

Contributor

Then it should be set to the first non-None argument, not the last?

Member Author

Actually, all ctxs are supposed to be the same; for example, we should not allow mixing cpu and gpu contexts. However, we currently allow it because we need to mix cpu, cpu_pinned, and cpu_shared.

Contributor

Thus we should use the first non-None argument, so as not to break backwards compatibility? cpu, cpu_pinned, and cpu_shared are different contexts, after all.

Member Author

@sxjscience sxjscience Oct 15, 2019

@leezu I think using the first or last non-None argument does not matter much here. Our goal is to make sure that we finally pick a meaningful context for the parameters. In fact, the previous implementation did not check whether the contexts of the arguments are valid.

Contributor

As the previous implementation didn't enforce all contexts being equal, we shouldn't start picking a different array to determine the context. As you stated above, it's valid to use a mix of cpu, cpu_pinned, and cpu_shared contexts.
For example, after your change, cpu_pinned or cpu_shared may be picked as the default context instead of cpu if the user passed a cpu_pinned or cpu_shared array as the last argument. The extra overhead could cause a performance regression (all parameters will be made available under the default context).
No need to risk this given there is no advantage?

Member Author

@leezu It's also possible that previously cpu_pinned was picked as the default context and, after the change, the correct cpu context is picked instead. My point is that we probably need to give special treatment to cpu, cpu_pinned, and cpu_shared. What's your opinion?

Member Author

@leezu I agree that the backward-compatibility issue is valid. Let me first make it backward-compatible. However, this does not fix the issue of the cpu, cpu_pinned, cpu_shared combination.

Contributor

We should get rid of choosing one array and using its context as the default context. For parameters, users should get the array via self.weight.data(ctx). For the time being I suggest not breaking the behaviour, to avoid unintended consequences.

raise ValueError('In HybridBlock, there must be one NDArray or one Symbol in the input.'
' Please check the type of the args.\n')
if has_ndarray:
ctx = next(iter(ctx_set))
Contributor

set is unordered, thus next(iter(ctx_set)) may pick a different context across runs?

Member Author

Nice catch, should sort it beforehand.
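One way to make the pick deterministic, as suggested in this exchange (a minimal sketch, not the actual patch; it assumes contexts can be ordered by their string form, modelled here as plain strings):

```python
def pick_ctx(ctx_set):
    """Pick one context deterministically from an unordered set
    by sorting on the string representation first."""
    return sorted(ctx_set, key=str)[0]
```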

@sxjscience sxjscience changed the title [Gluon] [Fix] [WIP] Fix HybridBlock when hybridize is not called [Gluon] [Fix] Fix HybridBlock when hybridize is not called Oct 18, 2019
fix bug

fix

fix

fix lint

fix

backward-compatible for inferencing the ctx

fix lint

try to improve

try to fix

Update block.py

Revert "Update block.py"

This reverts commit bbcf41f.

Revert "try to fix"

This reverts commit 7d7f35c.

Revert "try to improve"

This reverts commit f510132.

Revert "Revert "try to improve""

This reverts commit 872a7abe34c9afa97eb631fab8c3ce8558a22af8.

Revert "Revert "try to fix""

This reverts commit 48e235dd4c9ee6a88d1a2515a8a1f3e57319c217.

Revert "Revert "Update block.py""

This reverts commit e0d3949245050a4c60c4834db73c764d87fde6f7.

fix

fix lint
@sxjscience sxjscience merged commit 5b67a69 into apache:master Oct 20, 2019
@sxjscience sxjscience deleted the fix_hybrid_block branch October 20, 2019 23:16