Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-43] Fix Jetson compilation #13532

Merged
merged 1 commit into from
Dec 17, 2018
Merged

Conversation

larroy
Copy link
Contributor

@larroy larroy commented Dec 4, 2018

Description

Fix NVidia Jetson builds and add to CI

Was disabled in PR:
#13402

Fixes #13440

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@larroy
Copy link
Contributor Author

larroy commented Dec 4, 2018

@KellenSunderland

@larroy larroy changed the title Fix Jetson compilation [MXNET-43] Fix Jetson compilation Dec 4, 2018
@@ -77,6 +77,7 @@ RUN JETPACK_DOWNLOAD_PREFIX=https://developer.download.nvidia.com/devzone/devcen
dpkg -i --force-architecture $ARM_NVINFER_INSTALLER_PACKAGE && \
dpkg -i --force-architecture $ARM_NVINFER_DEV_INSTALLER_PACKAGE && \
apt update -y || true && apt install -y cuda-libraries-dev-9-0 libcudnn7-dev libnvinfer-dev
RUN cd /usr/include/aarch64-linux-gnu/ && ln -s cudnn_v7.h cudnn.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create the symlink without changing the current work directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that adds additional side effects to this command. If there's no reason to introduce side effects, you generally would want to avoid it. For that symlink, I don't see much reason.
Otherwise, follow up commands might now write to that new path since they assume an unchanged path.

Copy link
Contributor Author

@larroy larroy Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked "why not?" meaning, sure why not. I agreed with you, sorry for the misunderstanding, my bad.

@@ -57,10 +57,10 @@ DEBUG = 0
USE_SIGNAL_HANDLER = 1

# the additional link flags you want to add
ADD_LDFLAGS = -L${CROSS_ROOT}/lib
ADD_LDFLAGS = -L${CROSS_ROOT}/lib -L/usr/lib/aarch64-linux-gnu/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strange that this is needed, ${CROSS_ROOT} should be enough... What's the value for ${CROSS_ROOT}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's different, I don't remember. If it wasn't needed I woulnd't have put it there. I checked what you said before I had the same thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be 2 options: host libraries and cross compile libraries. By mixing both we might get in trouble. I suggest to find out why ${CROSS_ROOT}/lib is not working (maybe try without /lib?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like that thought. I believe you Pedro that you checked that, but I think it's still a good idea to have your investigation documented because I also struggled at finding the right headers. Describing your thoughts helps others to understand how and what you did and might also offer the possibility to get maybe improve the solution :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lebeg As I said, CROSS_ROOT != /usr/lib/aarch64-linux-gnu/ I checked before, why do you want me to investigate again? /usr/lib/aarch64-linux-gnu/ is where nvidia packages go, there's no need to investigate further in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/lib/aarch64-linux-gnu/ is where nvidia packages go, there's no need to investigate further in my opinion.

I think that nvidia packages (if they are not in ${CROSS_ROOT}/lib) should be installed or symlinked there instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/usr/lib/aarch64-linux-gnu/ is too generic (based on the name alone) to contain only CUDA libraries. Therefore it's confusing and is complicating the build in my opinion, which can lead to host/target library mixtures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I would suggest you send a PR a posteriori improving this. I think the Jetson & dockcross build is a bit hacky anyway. My PR fixes cross compilation, further refactor and improvements are out scope of this fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcoabreu what should I document? tomorrow the nvidia packages will put the library in some other folder and this will break again and the comment will be outdated. You go into the container and look for the library there's not much to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's get @KellenSunderland input as he is more knowledgeable of Jetson and dockcross.

Copy link
Contributor

@KellenSunderland KellenSunderland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you rebase it once more to make sure it's up-to-date with CI changes. I'll keep an eye on it and merge when it passes.

@larroy
Copy link
Contributor Author

larroy commented Dec 13, 2018

Done. Can we merge?

@larroy
Copy link
Contributor Author

larroy commented Dec 16, 2018

@marcoabreu can we merge? @KellenSunderland

@KellenSunderland KellenSunderland merged commit 48e25c4 into apache:master Dec 17, 2018
KellenSunderland added a commit that referenced this pull request Dec 17, 2018
aaronmarkham pushed a commit that referenced this pull request Dec 18, 2018
* Revert "remove omp which can cause ssd accuracy variance (#13622)"

This reverts commit 655f1c6.

* Revert "Fix Jetson compilation (#13532)"

This reverts commit 48e25c4.
mseth10 pushed a commit to mseth10/incubator-mxnet that referenced this pull request Dec 18, 2018
mseth10 pushed a commit to mseth10/incubator-mxnet that referenced this pull request Dec 18, 2018
* Revert "remove omp which can cause ssd accuracy variance (apache#13622)"

This reverts commit 655f1c6.

* Revert "Fix Jetson compilation (apache#13532)"

This reverts commit 48e25c4.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nvidia Jetson compilation in CI is broken
5 participants