Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Dynamic Library Loading Support #15489

Merged
merged 38 commits into from
Aug 5, 2019
Merged

Conversation

mseth10
Copy link
Contributor

@mseth10 mseth10 commented Jul 8, 2019

Description

This PR is part of a larger effort of adding new functionality to MXNet that was compiled separately from MXNet. See here and here for more info.

This PR is the first step that enables libraries to be dynamically loaded into MXNet at runtime .

Features:

  • Dynamic library loading functions for both Linux and Windows to load libraries and access its functions
  • MXLoadLib that loads and initializes a library
  • A new Python API: mx.library.load that calls the new C-API
  • Enhancements to both the Makefile and CMake flows to link with the dynamic loader library (dl)

This PR includes the following:

  • An example, showing how to write a sample library in C++ (mylib.cc) with its own Makefile, a unittest to verify correctness outside of MXNet (libtest.cc), and a test script showing how to load the library in MXNet (test.py).
  • A unit test that loads a precompiled library into MXNet and initializes it. It runs as part of the CI for both Linux and Windows.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mseth10 mseth10 force-pushed the acc_api branch 8 times, most recently from cc55e0d to 94fda6e Compare July 10, 2019 00:50
@mseth10 mseth10 requested a review from szha as a code owner July 11, 2019 01:02
@mseth10 mseth10 force-pushed the acc_api branch 14 times, most recently from 671df6f to ddd0e59 Compare July 12, 2019 21:59
@mseth10 mseth10 requested a review from rahul003 as a code owner July 12, 2019 21:59
@mseth10 mseth10 force-pushed the acc_api branch 3 times, most recently from 448e950 to c490c23 Compare July 15, 2019 09:18
@karan6181
Copy link
Contributor

@mxnet-label-bot add [backend, C API, pr-work-in-progress]

@samskalicky
Copy link
Contributor

samskalicky commented Aug 3, 2019

@leleamol Can you please review again to see if the implementation of the dlclose matches what you were thinking?

@anirudh2290 Thanks for the help with the engine destructor info, can you take a look at the changes to the engines:
https://github.com/apache/incubator-mxnet/blob/134cc37d6b306514fde1a405d3b87cc946a56bd1/src/engine/naive_engine.cc#L88-L91
https://github.com/apache/incubator-mxnet/blob/134cc37d6b306514fde1a405d3b87cc946a56bd1/src/engine/threaded_engine.h#L335-L338

@szha szha dismissed their stale review August 4, 2019 01:25

concern no longer applies

Copy link
Contributor

@leleamol leleamol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

LocalFree(err_msg);
}
#else
*func = dlsym(handle, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.. Sounds good.

Copy link
Contributor

@rondogency rondogency left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eric-haibin-lin eric-haibin-lin merged commit 3112893 into apache:master Aug 5, 2019
marcoabreu added a commit that referenced this pull request Aug 5, 2019
elif (os.name=='nt'):
lib = 'mylib.dll'

fname = mx.test_utils.download('https://mxnet-demo-models.s3.amazonaws.com/lib_binary/'+lib)
Copy link
Contributor

@marcoabreu marcoabreu Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mseth10

Unfortunately, this test introduces a security vulnerability. Please replace it as soon as possible as this poses a security threat to our system as well as our users since this allows arbitrary code execution. I'd appreciate it if you would provide a replacement within the next 24h. #15755 is the corresponding revert PR.

Copy link
Contributor

@larroy larroy Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested before to build this stub during the build process instead of loading from S3. See previous comment above: https://github.com/apache/incubator-mxnet/pull/15489/files#r308932209

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@samskalicky
Copy link
Contributor

samskalicky commented Aug 5, 2019

Thanks @marcoabreu for the catch. As we discussed offline, the problem is not with the feature in this PR but with how test_library_loading is pulling from the S3 bucket in the CI.

We'll work to build the library as part of the build so that it can be used internally in testing the CI without security risks. We'll move the test pulling from the S3 bucket to the examples directory so that it doesnt run in the CI.

We may need some help @marcoabreu to get this working in the CI, since we'll need to build this library and then find the path where it lives during the test so we can pass the path to the load API.

In the meantime, we'll make these changes and then resubmit this PR.

@marcoabreu
Copy link
Contributor

We'll move the test pulling from the S3 bucket to the examples directory so that it doesnt run in the CI.

Could you elaborate?

@samskalicky
Copy link
Contributor

samskalicky commented Aug 5, 2019

@marcoabreu we added an example directory at: example/lib_api. Currently the test.py includes a test that uses the locally built library (built using the included Makefile). But this only provides an example for how to build a library.

We'll create another example python script that pulls from the S3 bucket and loads that one. But it will just be an example, not a unittest that will run in the CI.

Do you have other ideas/suggestions on how we can improve testing or provide more examples?

@marcoabreu
Copy link
Contributor

No pulling of any prebuilt binaries due to the same reasons, please. I understand you trying to make it more convenient, but I think that we're stepping over a line in that case. Give instructions how to compile the example and the user should be fine.

Just to confirm: we will have a unit test running in ci at the end of the day, correct?

@samskalicky
Copy link
Contributor

@marcoabreu I think we're in agreement here. We will be distributing some pre-built libraries in an S3 bucket as examples. We will provide documented examples for how to pull and load these binaries. But these will not be tests that will be automatically run. They will be examples.

Yes, we will have the unit test running in the CI that uses a library that is built in the CI.

wkcn pushed a commit that referenced this pull request Aug 7, 2019
* Revert "Dynamic Library Loading Support (#15489)"

This reverts commit 3112893.

* Retrigger CI

* CI

* retrigger CI

* retrigger CI

* retrigger CI
@wkcn
Copy link
Member

wkcn commented Aug 7, 2019

I have a suggestion. We can check the sha1_hash code of pre-built binary files in the S3 bucket before executing them.

@marcoabreu
Copy link
Contributor

marcoabreu commented Aug 7, 2019

While that eliviates the problem from a ci perspective (if the bucket would be owned by the CI owners), it's still a red flag for our users.

I'm afraid that companies prefer to stay away from the execution of an arbitrary native library that is downloaded from the internet - no matter whether it is backed by a hash or not. It's inevitable in some cases, but we want to reduce the possiblity of compliance scanners flagging us and thus making the usage of the MXNet more complicated than it has to be.

I think the only acceptable way is if the library is getting produced as part of the build process that is entirely retrieved from source. These are also the tenants Apache employs - Apache project must not make any binary releases in any form. Thus let's best stick to that as well :)

mseth10 added a commit to mseth10/incubator-mxnet that referenced this pull request Aug 7, 2019
* Accelerator APIs header file

* adding example to test accelerator loading

* adding c_api function header

* modifying header file path

* creating templates for call to lib fns

* modifying target to mxnet_static for libdl

* rebaseing with master

* returning nullptr handle if library not loaded

* refactoring code to load libraries dynamically

* addressing review comments

* using static cast

* pylint fix

* moving library.h file to src/common/

* adding dynamic loading support for windows

* updating header guard

* fixing headers

* fixing windows casting error

* declaring library functions for windows

* adding library testing module in examples

* adding unit test to test library loading

* correcting file names

* updating error messages

* getting error message from DL library

* adding unit test to gpu suite

* correcting windows pointer

* requiring absolute path to library

* changing file description

* addressing review comments - adding more docs, windows error msg

* addressing PR comments

* checking machine type for unit test

* “re-trigger”

* added map to store loaded libraries

* added dlclose calls in naive & threaded engines

* removed library map declaration in cc file

* added windows free

* fixed formatting

* added cast to HMODULE for void* for windows

* retrigger CI for flaky unix_cpu
zachgk pushed a commit that referenced this pull request Aug 8, 2019
* Dynamic Library Loading Support (#15489)

* Accelerator APIs header file

* adding example to test accelerator loading

* adding c_api function header

* modifying header file path

* creating templates for call to lib fns

* modifying target to mxnet_static for libdl

* rebaseing with master

* returning nullptr handle if library not loaded

* refactoring code to load libraries dynamically

* addressing review comments

* using static cast

* pylint fix

* moving library.h file to src/common/

* adding dynamic loading support for windows

* updating header guard

* fixing headers

* fixing windows casting error

* declaring library functions for windows

* adding library testing module in examples

* adding unit test to test library loading

* correcting file names

* updating error messages

* getting error message from DL library

* adding unit test to gpu suite

* correcting windows pointer

* requiring absolute path to library

* changing file description

* addressing review comments - adding more docs, windows error msg

* addressing PR comments

* checking machine type for unit test

* “re-trigger”

* added map to store loaded libraries

* added dlclose calls in naive & threaded engines

* removed library map declaration in cc file

* added windows free

* fixed formatting

* added cast to HMODULE for void* for windows

* retrigger CI for flaky unix_cpu

* build library and stash it in CI

* modifying unittest to use CI built binary

* Retrigger CI

* adding dlclose to initialize destructor

* adding sample_lib target to all in Makefile
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
* Accelerator APIs header file

* adding example to test accelerator loading

* adding c_api function header

* modifying header file path

* creating templates for call to lib fns

* modifying target to mxnet_static for libdl

* rebaseing with master

* returning nullptr handle if library not loaded

* refactoring code to load libraries dynamically

* addressing review comments

* using static cast

* pylint fix

* moving library.h file to src/common/

* adding dynamic loading support for windows

* updating header guard

* fixing headers

* fixing windows casting error

* declaring library functions for windows

* adding library testing module in examples

* adding unit test to test library loading

* correcting file names

* updating error messages

* getting error message from DL library

* adding unit test to gpu suite

* correcting windows pointer

* requiring absolute path to library

* changing file description

* addressing review comments - adding more docs, windows error msg

* addressing PR comments

* checking machine type for unit test

* “re-trigger”

* added map to store loaded libraries

* added dlclose calls in naive & threaded engines

* removed library map declaration in cc file

* added windows free

* fixed formatting

* added cast to HMODULE for void* for windows

* retrigger CI for flaky unix_cpu
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
* Revert "Dynamic Library Loading Support (apache#15489)"

This reverts commit 3112893.

* Retrigger CI

* CI

* retrigger CI

* retrigger CI

* retrigger CI
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
* Dynamic Library Loading Support (apache#15489)

* Accelerator APIs header file

* adding example to test accelerator loading

* adding c_api function header

* modifying header file path

* creating templates for call to lib fns

* modifying target to mxnet_static for libdl

* rebaseing with master

* returning nullptr handle if library not loaded

* refactoring code to load libraries dynamically

* addressing review comments

* using static cast

* pylint fix

* moving library.h file to src/common/

* adding dynamic loading support for windows

* updating header guard

* fixing headers

* fixing windows casting error

* declaring library functions for windows

* adding library testing module in examples

* adding unit test to test library loading

* correcting file names

* updating error messages

* getting error message from DL library

* adding unit test to gpu suite

* correcting windows pointer

* requiring absolute path to library

* changing file description

* addressing review comments - adding more docs, windows error msg

* addressing PR comments

* checking machine type for unit test

* “re-trigger”

* added map to store loaded libraries

* added dlclose calls in naive & threaded engines

* removed library map declaration in cc file

* added windows free

* fixed formatting

* added cast to HMODULE for void* for windows

* retrigger CI for flaky unix_cpu

* build library and stash it in CI

* modifying unittest to use CI built binary

* Retrigger CI

* adding dlclose to initialize destructor

* adding sample_lib target to all in Makefile
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request Sep 25, 2019
* Dynamic Library Loading Support (apache#15489)

* Accelerator APIs header file

* adding example to test accelerator loading

* adding c_api function header

* modifying header file path

* creating templates for call to lib fns

* modifying target to mxnet_static for libdl

* rebaseing with master

* returning nullptr handle if library not loaded

* refactoring code to load libraries dynamically

* addressing review comments

* using static cast

* pylint fix

* moving library.h file to src/common/

* adding dynamic loading support for windows

* updating header guard

* fixing headers

* fixing windows casting error

* declaring library functions for windows

* adding library testing module in examples

* adding unit test to test library loading

* correcting file names

* updating error messages

* getting error message from DL library

* adding unit test to gpu suite

* correcting windows pointer

* requiring absolute path to library

* changing file description

* addressing review comments - adding more docs, windows error msg

* addressing PR comments

* checking machine type for unit test

* “re-trigger”

* added map to store loaded libraries

* added dlclose calls in naive & threaded engines

* removed library map declaration in cc file

* added windows free

* fixed formatting

* added cast to HMODULE for void* for windows

* retrigger CI for flaky unix_cpu

* build library and stash it in CI

* modifying unittest to use CI built binary

* Retrigger CI

* adding dlclose to initialize destructor

* adding sample_lib target to all in Makefile
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Backend Issues related to the backend of MXNet C API
Projects
None yet
Development

Successfully merging this pull request may close these issues.