This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Upgrade archive utility and add back FC improvement #15171

Merged (10 commits) on Jun 17, 2019

Conversation

anirudh2290
Member

@anirudh2290 anirudh2290 commented Jun 7, 2019

This reverts commit 6543488.
Also upgrades archive utility and fixes: #15084

Description

  • Archive utility (ar) version 2.26 doesn't support archives larger than 4 GB. With this FC PR and the compile flags mentioned in the issue, the libmxnet.a archive grows beyond 4 GB.
  • This PR uses a newer version of the archive utility, which handles such archive sizes. The current version won't support them; please see: https://sourceware.org/bugzilla/show_bug.cgi?id=14625
  • Upgrading the archive utility means that we are forcing users to use this version of the utility when they build MXNet with make and this set of build flags.
  • As discussed offline with @larroy, there is a bigger problem related to code bloat that should be addressed, and I have opened an issue for it: Reduce MXNet Code bloat #15182
  • Having said that, I suggest that we update the archive utility and update our docs. The users affected are all developers and end users who build from source with this combination of flags; they have to upgrade their ar utility. On Ubuntu 16.04 this may not be easy (building from source is one way). I am not aware of a better short-term solution that lets us continue adding to and modifying MXNet code. Other suggestions are welcome.
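
As a rough illustration of the size limit described above, the following shell sketch reports whether a static archive has crossed a byte threshold. It is not part of this PR: the function name `archive_exceeds`, the default path `lib/libmxnet.a`, and the reading of "4 GB" as 2^32 bytes are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: not taken from the PR.

# Assumption: "4 GB" is read as 2^32 bytes.
LIMIT_BYTES=$((4 * 1024 * 1024 * 1024))

# archive_exceeds FILE [LIMIT]
# Succeeds (exit status 0) when FILE is larger than LIMIT bytes.
# Returns 2 when FILE cannot be read.
archive_exceeds() {
    local file="$1" limit="${2:-$LIMIT_BYTES}" size
    size=$(wc -c < "$file") || return 2
    [ "$size" -gt "$limit" ]
}

# Hypothetical archive location; override with ARCHIVE=/path/to/libmxnet.a
archive="${ARCHIVE:-lib/libmxnet.a}"
if [ -f "$archive" ]; then
    if archive_exceeds "$archive"; then
        echo "$archive is larger than $LIMIT_BYTES bytes; ar older than 2.27 may fail on it"
    else
        echo "$archive is within the $LIMIT_BYTES byte limit"
    fi
else
    echo "no archive found at $archive; nothing to check"
fi
```

Such a check could only run after the archive has been produced, so it is a diagnostic aid rather than a way to prevent the failure.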

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@anirudh2290 anirudh2290 changed the title from "Upgrade archive utility and add back FC improvement" to "[WIP] Upgrade archive utility and add back FC improvement" Jun 7, 2019
@anirudh2290 anirudh2290 marked this pull request as ready for review June 7, 2019 21:44
@anirudh2290 anirudh2290 requested a review from szha as a code owner June 7, 2019 21:44
@piyushghai
Contributor

@mxnet-label-bot Add [pr-work-in-progress, Backend]

@marcoabreu marcoabreu added the Backend and pr-work-in-progress labels Jun 7, 2019
@anirudh2290 anirudh2290 changed the title from "[WIP] Upgrade archive utility and add back FC improvement" to "Upgrade archive utility and add back FC improvement" Jun 7, 2019
@anirudh2290 anirudh2290 added the pr-awaiting-review label and removed the pr-work-in-progress label Jun 10, 2019
@anirudh2290
Member Author

@larroy @samskalicky @sandeep-krishnamurthy @Chancebair please help review.

Contributor

@samskalicky samskalicky left a comment

LGTM, thanks @anirudh2290!

@@ -180,6 +180,8 @@ More information on turning these features on or off are found in the following
There is a configuration file for make,
[`make/config.mk`](https://github.com/apache/incubator-mxnet/blob/master/make/config.mk), that contains all the compilation options. You can edit it and then run `make` or `cmake`. `cmake` is recommended for building MXNet (and is required to build with MKLDNN), however you may use `make` instead. For building with Java/Scala/Clojure, only `make` is supported.

**NOTE:** With certain combinations of build flags, the MXNet archive grows to more than 4 GB. Because the MXNet build uses the `ar` archive utility internally, it runs into an `ar` bug ("File Truncated": [bugreport](https://sourceware.org/bugzilla/show_bug.cgi?id=14625)) for archives larger than 4 GB. Please use ar version 2.27 or greater to avoid this bug. Please see https://github.com/apache/incubator-mxnet/issues/15084 for more details.
Contributor

Can we explicitly list the flags that cause such bloat, so that not all users building from source need to worry about a new archive utility?

Member

the most significant part comes from the GPU code, and it's already an explicit switch

Member Author

Yes, the most significant part comes from the GPU code. If I list the set of flags that fail today, there is no guarantee that the code will not bloat further tomorrow and fail for a different set of flags (one that includes USE_CUDA=1 and USE_CUDNN=1), and then these instructions may quickly become outdated.

Contributor

Is there a way to detect this case so users are not faced with a strange error message but are specifically prompted to upgrade ar? I'm afraid that users might not see this specific part of our documentation and just get frustrated. If our build pipeline automatically notified them (ideally ahead of time, or else after the error occurred), that would benefit the user experience.

Member Author

@marcoabreu I have added a warning message for users if the ar version is lower than 2.27
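
One way such a check could look in shell is sketched below. This is not the Makefile logic merged in this PR: the helper names `parse_ar_version` and `version_lt` are hypothetical, and the sketch assumes that GNU `ar --version` prints its version as the last dotted number on the first line and that `sort -V` (GNU coreutils) is available.

```shell
#!/usr/bin/env bash
# Sketch only: not the check merged in this PR.

MIN_AR_VERSION="2.27"

# parse_ar_version LINE
# Prints the last dotted number found in LINE, e.g. "2.26.1".
parse_ar_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | tail -n 1
}

# version_lt A B
# Succeeds when version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Honor a user-chosen AR, falling back to plain "ar".
ar_bin="${AR:-ar}"
if command -v "$ar_bin" >/dev/null 2>&1; then
    current=$(parse_ar_version "$("$ar_bin" --version 2>/dev/null | head -n 1)")
    if [ -n "$current" ] && version_lt "$current" "$MIN_AR_VERSION"; then
        echo "WARNING: $ar_bin $current is older than $MIN_AR_VERSION;" \
             "archives larger than 4 GB may fail with 'File Truncated'." >&2
    fi
fi
```

A version-aware sort is used here because a plain numeric comparison cannot handle strings such as 2.26.1 and would order 2.9 after 2.27.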

@edisongustavo
Contributor

I don't think it's a good idea to download a custom binutils and use it on CI. It might break in many unexpected ways :/

It also means that MXNet has to stop stating that it supports Ubuntu 16.04. The requirement should then be raised to at least 18.04.

If the community agrees, then we probably should embark on the quest of upgrading the AMI to the latest Ubuntu 18.04.

@marcoabreu you might want to look at this.

@samskalicky
Contributor

@edisongustavo We cannot drop support for 16.04 since its LTS period ends April 2021: https://ubuntu.com/about/release-cycle

@anirudh2290
Member Author

@edisongustavo As I suggested, this is a short-term solution to unblock the PRs. I imagine that upgrading the AMI would take longer and also needs more thought. Note that every day lost here is going to cause more pain for devs pushing code to MXNet.

@anirudh2290
Member Author

@szha @marcoabreu @sandeep-krishnamurthy Is this good to merge?

@anirudh2290
Member Author

Thanks @marcoabreu. Will merge unless someone has objections.

@anirudh2290 anirudh2290 merged commit cab1dfa into apache:master Jun 17, 2019
For more info see: https://github.com/apache/incubator-mxnet/issues/15084)
$(shell sleep 5)
endif
endif
Contributor

What about the corresponding modification to CMake?

Member Author

We don't yet need it for cmake. We have encountered this issue only with make.

Contributor

That's weird. Thanks for your answer. I would expect a similar amount of object code with both builds.

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Upgrade archive utility and add back FC improvement

This reverts commit 6543488.

* Change permissions for Ubuntu AR

* Extract and cd into binutils dir

* Allow AR path to be chosen by user

* Add AR path to build

* Fix AR paths

* Revert AR flag in makefile

* Build from source doc updated

* Add comment

* Add warning for smaller ar versions, add set -ex
Labels
Backend Issues related to the backend of MXNet pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MXNet GPU MKLDNN Build failure
8 participants