BUG #8013: Remove register_alter_op_layout example from dev/use_pass_infra.py #9076

Merged (2 commits) on Sep 23, 2021

mbs-octoml (Contributor) commented Sep 22, 2021

This tutorial registers a global layout transformation for conv2d that applies to
all targets and is not well-formed. Later uses of conv2d in other tutorials
pick up that layout and then fail an assertion in the conv2d type relation.
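To make the failure mode concrete, here is a minimal C++ sketch of a process-wide registry, assuming a single table keyed only by op name, similar in spirit to what `register_alter_op_layout` writes to. `AlterLayoutRegistry`, `ChooseLayout`, and `RunTutorialThatRegisters` are hypothetical names, not TVM APIs, and the layout strings are placeholders. Once one "tutorial" registers a conv2d hook without restricting the target, every later conv2d lookup sees it, whatever the target.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a global, process-wide op attribute registry.
// Illustrative only; these are not TVM APIs.
using AlterLayoutFn = std::function<std::string(const std::string& target)>;

std::map<std::string, AlterLayoutFn>& AlterLayoutRegistry() {
  static std::map<std::string, AlterLayoutFn> registry;  // lives for the whole process
  return registry;
}

// Layout that any later "tutorial" would see for `op` on `target`.
std::string ChooseLayout(const std::string& op, const std::string& target) {
  auto& reg = AlterLayoutRegistry();
  auto it = reg.find(op);
  if (it == reg.end()) return "NCHW";  // default when nothing is registered
  return it->second(target);           // a registered hook wins for every target
}

// "Tutorial 1": registers a conv2d rewrite without restricting the target.
void RunTutorialThatRegisters() {
  AlterLayoutRegistry()["nn.conv2d"] = [](const std::string&) {
    return std::string("NCHW16c");
  };
}
```

Calling `RunTutorialThatRegisters()` once is enough for `ChooseLayout("nn.conv2d", "cuda")` to switch from `"NCHW"` to `"NCHW16c"` for the rest of the process.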

A better fix would be to register the transformation for an entirely fake target, but
that is beyond my current level of expertise.
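One way to picture the "fake target" idea, sketched under the assumption that registrations could be keyed by (op, target) rather than by op alone. `LayoutOverrides`, `RegisterLayoutOverride`, `ChooseLayout`, and the `fake_target` string are all hypothetical; TVM's actual target-dispatch mechanism is not shown here.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical registry keyed by (op name, target kind) instead of op name alone.
// Illustrative only; not TVM's registration API.
using Key = std::pair<std::string, std::string>;

std::map<Key, std::string>& LayoutOverrides() {
  static std::map<Key, std::string> overrides;
  return overrides;
}

// Records a layout override that applies only to the given target.
void RegisterLayoutOverride(const std::string& op, const std::string& target,
                            const std::string& layout) {
  LayoutOverrides()[{op, target}] = layout;
}

// Falls back to the default layout unless this exact (op, target) was registered.
std::string ChooseLayout(const std::string& op, const std::string& target) {
  auto it = LayoutOverrides().find({op, target});
  return it == LayoutOverrides().end() ? "NCHW" : it->second;
}
```

With this keying, a registration for `fake_target` leaves lookups for real targets such as `llvm` on the default layout, so a tutorial could demonstrate the mechanism without affecting anything that runs after it.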

In general, our use of sphinx/sphinx_gallery for running and rendering the
tutorials is highly suspect, since there is no isolation between examples:

  • Examples using tensorflow grab GPU memory and never release it.
  • Any example that uses any of the (many!) global registration mechanisms
    must ensure that the registration is safe across all tutorials.

I recall seeing a sphinx_gallery thread in which the maintainers said they would prefer
not to work on process-level isolation, but it's probably worth pinging them again.
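A minimal POSIX sketch of the process-level isolation discussed above, assuming each example can be wrapped in a callable that returns an exit code. `GlobalRegistry` and `RunIsolated` are hypothetical names. The example runs in a `fork()`ed child, so anything it registers lives only in the child's copy of memory and disappears at `_exit`; the parent's registry is never touched. Resources the child holds, such as GPU memory, would likewise be reclaimed when the child process exits.

```cpp
#include <map>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical global table standing in for any process-wide registration mechanism.
std::map<std::string, std::string>& GlobalRegistry() {
  static std::map<std::string, std::string> registry;
  return registry;
}

// Runs `example` in a forked child and returns the child's exit status,
// or -1 if the fork or wait fails or the child does not exit normally.
template <typename Fn>
int RunIsolated(Fn example) {
  pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed
  if (pid == 0) {
    // Child: mutations made here affect only the child's copy of memory.
    int rc = example();
    _exit(rc);  // leave immediately so the child never runs the caller's remaining code
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child can still report success or failure through its exit code, which is roughly what a gallery runner needs in order to mark an example as broken.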

While digging into this I noticed an object-slicing cast in AlterOpLayout, caused by
a class derived from ObjectRef that introduced virtual methods. I moved the virtuals to
the corresponding Node classes. We got away with it here only because the
ObjectRef happened not to be copied, but we were on very thin ice.
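A simplified C++ sketch of the slicing hazard and of the fix described above. `Node`, `Ref`, and the `Rewriter*` classes are hypothetical stand-ins for TVM's Object/ObjectRef split, not the actual AlterOpLayout code. Copying a `ConvRewriterRef` into a `RewriterRef` value keeps only the base part, so the override is silently lost; once the virtual lives on the node, copying the handle only copies the pointer and dispatch still reaches `ConvRewriterNode`.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for the Object / ObjectRef split:
// a "ref" is a value-type handle sharing ownership of a heap-allocated "node".
struct Node {
  virtual ~Node() = default;
};
struct Ref {
  std::shared_ptr<Node> data;  // no virtuals here, mirroring ObjectRef
};

// Fragile pattern: virtuals introduced on a subclass of the handle type.
struct RewriterRef : Ref {
  virtual ~RewriterRef() = default;
  virtual std::string Kind() const { return "generic"; }
};
struct ConvRewriterRef : RewriterRef {
  std::string Kind() const override { return "conv2d"; }
};

// Taking the handle by base-class value copies only the RewriterRef part (slicing).
std::string KindByValue(RewriterRef r) { return r.Kind(); }

// Safer pattern: virtuals live on the node; the handle stays a plain pointer wrapper.
struct RewriterNode : Node {
  virtual std::string Kind() const { return "generic"; }
};
struct ConvRewriterNode : RewriterNode {
  std::string Kind() const override { return "conv2d"; }
};

// Copying the handle copies the pointer, so dispatch still reaches the real node.
// Assumes `r.data` actually points at a RewriterNode.
std::string KindViaNode(Ref r) {
  return static_cast<const RewriterNode*>(r.data.get())->Kind();
}
```

Keeping handles free of virtuals means any code path that copies them, including containers of the base handle type, preserves behaviour.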

I should have run this locally; there go 6 hours of CI.

jroesch (Member) commented Sep 23, 2021

@mbs-octoml, can we just add a backlog item to fix the tutorial? Going to merge this to unblock CI.

junrushao merged commit e887286 into apache:main on Sep 23, 2021

junrushao (Member)

Thanks @mbs-octoml @jroesch!

mbs-octoml deleted the mbs-issue-9013 branch on September 23, 2021 21:03
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…_pass_infra.py (apache#9076)

* BUG apache#8013: Remove register_alter_op_layout example from dev/use_pass_infra.py

* [checkpoint] Woops, forgot there was an extra AlterOpLayout

I should have run this locally; there go 6 hours of CI.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
3 participants