Relocation truncation issues #17045

leezu · 2019-12-11T07:38:47Z

Description

Depending on the compile options, libmxnet.so can grow so large that linking fails. This was observed before on CI with test coverage enabled (#15971), but it also happens with non-test-coverage builds, such as a -DUSE_INT64_TENSOR_SIZE=ON build.

I first observed this in #17031 (http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0), but can easily reproduce it on the master branch when building with GCC 7.4.

Error Message

From the CI

/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libmxnet.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
libmxnet.so: PC-relative offset overflow in PLT entry for `_ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT_'
collect2: error: ld returned 1 exit status
FAILED: : && /tmp/ccache-redirects/g++  -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -std=c++11 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fopenmp -std=c++0x -O3 -DNDEBUG   tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/thread_local_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/threaded_engine_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/kvstore/gpu_topology_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/libinfo_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/activation_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/batchnorm_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/coreop_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/dropout_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/fully_conn_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/krprod_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_operator_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/mkldnn_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/runner/core_op_runner_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/slice_channel_perf.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/operator/tune/operator_tune_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/storage/storage_test.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o tests/CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o  -o tests/mxnet_unit_tests -L/usr/local/cuda/lib64  -L/work/build/3rdparty/tvm  -L/usr/local/cuda/targets/x86_64-linux/lib -Wl,-rpath,/usr/local/cuda/lib64:/work/build/3rdparty/openmp/runtime/src:/work/build/3rdparty/tvm lib/libgtest.a -Wl,--whole-archive libmxnet.a -Wl,--no-whole-archive 3rdparty/dmlc-core/libdmlc.a /usr/local/cuda/lib64/libnvToolsExt.so 
/usr/lib/libopenblas.so /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9 /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.2.4.9 3rdparty/openmp/runtime/src/libomp.so -lpthread -llapack /usr/lib/x86_64-linux-gnu/libjemalloc.so /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so 3rdparty/ps-lite/libpslite.a -lprotobuf -lrt -lpthread -llapack /usr/lib/x86_64-linux-gnu/libcudnn.so -lcublas -lcufft -lcusolver -lcurand -lnvrtc -lcuda /usr/lib/x86_64-linux-gnu/libprotobuf.so /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libzmq.so -lprotobuf -ltvm_runtime /usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4.9 -ldl -lpthread -lcudadevrt -lcudart_static -lrt -lpthread -ldl && :
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `register_tm_clones':
crtstuff.c:(.text+0x49): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x82): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x95): relocation truncated to fit: R_X86_64_PC32 against `.bss'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/engine/engine_shutdown_test.cc.o: In function `EngineShutdown_stop_without_crashing_Test::TestBody()':
engine_shutdown_test.cc:(.text+0xf8): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x130): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x137): relocation truncated to fit: R_X86_64_PC32 against `.bss'
engine_shutdown_test.cc:(.text+0x15d): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
engine_shutdown_test.cc:(.text+0x18d): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `nvrtcGetPTX@@libnvrtc.so.10.1'
collect2: error: ld returned 1 exit status

Compiling the master version with GCC on Ubuntu 18.04 (Deep Learning AMI) gives an equivalent error message (with slightly different wording due to GCC vs Clang).

To Reproduce

cmake -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52,70 -DUSE_INT64_TENSOR_SIZE=ON ..

on Ubuntu 18.04 (gcc 7.4, ld 2.3), where the CMake options here are taken from the build_ubuntu_gpu_large_tensor CI run.

Environment

Environment used for reproducing the error with master version of MXNet.

----------Python Info----------
Version      : 3.8.0
Compiler     : GCC 7.4.0
Build        : ('default', 'Dec  8 2019 08:07:09')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.2.3
Directory    : /home/ubuntu/.pyenv/versions/3.8.0/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ubuntu/src/mxnet-dc/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1056-aws-x86_64-with-glibc2.27
system       : Linux
node         : ip-172-31-26-35
release      : 4.15.0-1056-aws
version      : #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3600.024
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0021 sec, LOAD: 0.3891 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3134 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0450 sec, LOAD: 0.0738 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0034 sec, LOAD: 0.0103 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0159 sec, LOAD: 0.1406 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0432 sec, LOAD: 0.3530 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0021 sec, LOAD: 0.0701 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0313 sec, LOAD: 0.1727 sec.

leezu · 2019-12-11T07:48:38Z

To solve this, I think we can instruct the compiler to always use 64-bit relocations instead of 32-bit relocations (which may overflow), ~~use -O2 (or in the extreme case -Os) instead of -O3 to reduce code bloat~~ [1], or use some linker relaxation techniques.

[1]: Still happens with -O2
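To make the overflow concrete: an R_X86_64_PC32 relocation stores the value S + A - P (symbol address plus addend minus the patched location) in a signed 32-bit field, so the reference must stay within roughly 2 GiB of its target. The following is a minimal sketch of that range check; the addresses are hypothetical and chosen only for illustration, and the helper names are not part of any linker API.

```python
# Signed 32-bit range available to a PC-relative relocation field.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def pc32_value(symbol_addr: int, addend: int, place: int) -> int:
    """Value stored for R_X86_64_PC32: S + A - P."""
    return symbol_addr + addend - place


def fits_pc32(symbol_addr: int, addend: int, place: int) -> bool:
    """True if the computed value fits the signed 32-bit field."""
    return INT32_MIN <= pc32_value(symbol_addr, addend, place) <= INT32_MAX


if __name__ == "__main__":
    place = 0x1000                 # hypothetical location being patched
    near = place + 0x7000_0000     # about 1.75 GiB away
    far = place + 0x9000_0000      # about 2.25 GiB away
    print(fits_pc32(near, -4, place))  # True
    print(fits_pc32(far, -4, place))   # False: "relocation truncated to fit"
```

Once code and data in one binary span more than about 2 GiB, some references inevitably fall into the second case, which matches the errors reported above.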

junrushao · 2019-12-11T08:31:50Z

My personal experience is that using 64bit relocation is fine on x86-64, so I am in favor of such change :-)

leezu · 2019-12-11T12:13:18Z

Linking master works fine when using ninja instead of make. I'm not sure why.

leezu · 2019-12-11T14:07:47Z

Looking at the cmake -GNinja -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_TVM_OP=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DUSE_INT64_TENSOR_SIZE=ON .. build with #17031, I made the following observations:

"By default" it fails like

libmxnet.a(utils.cc.o): In function `mxnet::common::ExecuteMonInputCallback(nnvm::IndexedGraph const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, unsigned long, std::function<void (char const*, char const*, void*)> const&)':
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xa6c): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xb48): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
utils.cc:(.text+0xd86): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0xeab): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__pthread_key_create@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libpthread.so.0
utils.cc:(.text+0x1665): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x169d): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x16e0): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1724): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
utils.cc:(.text+0x1742): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

With -mcmodel=large enabled to use 64-bit relocations, the failure moves to a later stage:

libmxnet.a(utils.cc.o):(.eh_frame+0x6c): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0xb8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPfPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.1'
libmxnet.a(utils.cc.o):(.eh_frame+0xe8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPfPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.2'
libmxnet.a(utils.cc.o):(.eh_frame+0x118): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x164): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPdPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.4'
libmxnet.a(utils.cc.o):(.eh_frame+0x194): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPdPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.5'
libmxnet.a(utils.cc.o):(.eh_frame+0x1e4): relocation truncated to fit: R_X86_64_PC32 against `.text'
libmxnet.a(utils.cc.o):(.eh_frame+0x21c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.7'
libmxnet.a(utils.cc.o):(.eh_frame+0x24c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPNS5_4half6half_tEPlSC_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.8'
libmxnet.a(utils.cc.o):(.eh_frame+0x27c): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax

And when setting -Wl,--no-relax, we get back to the state reported by CI at http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0 (which builds with clang, unlike my build here with gcc).

/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_fini' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x19): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `__libc_csu_init' defined in .text section in /usr/lib/x86_64-linux-gnu/libc_nonshared.a(elf-init.oS)
(.text+0x20): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `main' defined in .text.startup section in tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o
(.text+0x26): relocation truncated to fit: R_X86_64_GOTPCRELX against symbol `__libc_start_main@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text._ZNKSt5ctypeIcE8do_widenEc'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x48): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0x5c): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN7testing8internal15TestFactoryImplI38ContextHashTest_ContextHashUnique_TestED0Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xc0): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN38ContextHashTest_ContextHashUnique_TestD2Ev'
tests/CMakeFiles/mxnet_unit_tests.dir/cpp/misc/base.cc.o:(.eh_frame+0xdc): additional relocation overflows omitted from the output
tests/mxnet_unit_tests: PC-relative offset overflow in PLT entry for `cudnnBatchNormalizationForwardInference@@libcudnn.so.7'
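The three logs above differ mainly in which relocation types overflow. A small script can tally them when comparing build configurations. This is only a sketch: the function name is made up for illustration, the sample lines are copied from the first log above, and because ld omits additional overflows from its output, the counts are a lower bound.

```python
import re
from collections import Counter

# Matches the relocation type in ld's "relocation truncated to fit" messages.
RELOC_RE = re.compile(r"relocation truncated to fit: (R_[A-Z0-9_]+) against")


def tally_relocations(log_text: str) -> Counter:
    """Count how often each relocation type is reported as truncated."""
    return Counter(RELOC_RE.findall(log_text))


# Three lines taken from the "by default" failure above.
SAMPLE = """\
utils.cc:(.text+0xa5d): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xa6c): relocation truncated to fit: R_X86_64_PC32 against `.bss'
utils.cc:(.text+0xb48): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vsnprintf@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
"""

if __name__ == "__main__":
    for reloc_type, count in tally_relocations(SAMPLE).most_common():
        print(reloc_type, count)
```

The same pattern also matches the R_AARCH64_CALL26 messages reported later in this thread.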

hubutui · 2020-03-14T15:46:55Z

Any updates? I ran into similar issues recently.

sxjscience · 2020-03-23T06:29:34Z

I ran into a similar issue with the latest master.

schliffen · 2020-04-02T17:14:12Z

I run into the same issue with the latest master.

BogdanovKirill · 2020-05-26T07:05:08Z

Same issue

leezu · 2020-05-26T17:31:04Z

@ptrendx is working on a fix (cf #18280 (comment))

ghost · 2020-05-31T14:26:05Z

Same issue on the latest master branch.

armdebugger · 2020-06-02T14:07:51Z

I met the same issue.
Is there any fix or workaround for it?
I tried the master branch, v1.4.x, and v1.5.x, and got the same result.
Environment:
Ubuntu 18.04
GCC 7.6
CUDA 10.2
CUDNN 7.6.5

leezu · 2020-06-02T17:33:44Z

Set -DMXNET_CUDA_ARCH=7.0 (or whatever arch you're targeting) as a workaround.
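Restricting the target architectures presumably helps because less GPU code is embedded in the library, which keeps the binary smaller. One way to check this is to compare section sizes before and after the change, for example from the output of GNU binutils `size -A libmxnet.so`. The sketch below parses that SysV-style output; the helper names are invented for illustration, and the sample text is hypothetical with made-up numbers, not measurements from an MXNet build.

```python
def parse_size_sysv(output: str) -> dict:
    """Parse `size -A` (SysV format) output into {section: size_in_bytes}."""
    sections = {}
    for line in output.splitlines():
        fields = line.split()
        # Section rows have exactly three fields: name, size, address.
        if len(fields) != 3:
            continue
        name, size, addr = fields
        # Skips the "section size addr" header, whose fields are not numeric.
        if size.isdigit() and addr.isdigit():
            sections[name] = int(size)
    return sections


def largest_sections(sections: dict, count: int = 3) -> list:
    """Return the `count` largest sections as (name, size) pairs."""
    ordered = sorted(sections.items(), key=lambda item: item[1], reverse=True)
    return ordered[:count]


# Hypothetical output with made-up numbers, for illustration only.
HYPOTHETICAL_OUTPUT = """\
libexample.so  :
section        size     addr
.text          4096     4096
.nv_fatbin    65536    16384
.data           512    90112
Total         70144
"""

if __name__ == "__main__":
    for name, size in largest_sections(parse_size_sysv(HYPOTHETICAL_OUTPUT), 2):
        print(name, size)
```

Running the parser on real output from two builds (all architectures versus a single one) would show which sections shrink.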

armdebugger · 2020-06-03T02:06:31Z

Thanks, leezu.
The build succeeded after setting the CUDA_ARCH.

zasdfgbnm · 2020-06-17T20:05:45Z

We recently hit the same issue in PyTorch with CUDA 11: pytorch/pytorch#39968

eric-haibin-lin · 2020-08-09T04:31:09Z

Happened again for the cu101 build: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1525/execution/node/177/log/

szha · 2020-08-09T04:54:10Z

@eric-haibin-lin that pipeline isn't the one that produces the nightly builds. ~~Currently the nightly builds for cu101 has stopped because MXNet follows the NVIDIA's supporting strategy on CUDA, which is only the latest two major and minor versions.~~
The nightly build was failing due to a recent change, which has been reverted.

wms2537 · 2020-10-03T12:20:03Z

The problem still exists when building on Jetson NX

[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/matrix_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ordering_op.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/ravel.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/sparse_retain.cu.o
[ 97%] Building CUDA object CMakeFiles/mxnet.dir/src/operator/tensor/square_sum.cu.o
[ 97%] Linking CXX shared library libmxnet.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x1c4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x204): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_handler<void (unsigned int, std::ostream&), nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_invoke(std::_Any_data const&, unsigned int&&, std::ostream&)':
print_graph_ir.cc:(.text+0x2b4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_bad_function_call()@@GLIBCXX_3.4.14' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}> const&, std::_Manager_operation)':
print_graph_ir.cc:(.text+0x318): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x324): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x34c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x414): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator new(unsigned long)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x46c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x478): relocation truncated to fit: R_AARCH64_CALL26 against symbol `operator delete(void*, unsigned long)@@CXXABI_1.3.9' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x480): relocation truncated to fit: R_AARCH64_CALL26 against symbol `_Unwind_Resume@@GCC_3.0' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libgcc_s.so
print_graph_ir.cc:(.text+0x4a0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
CMakeFiles/mxnet.dir/build.make:9471: recipe for target 'libmxnet.so' failed
make[2]: *** [libmxnet.so] Error 1
CMakeFiles/Makefile2:740: recipe for target 'CMakeFiles/mxnet.dir/all' failed
make[1]: *** [CMakeFiles/mxnet.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

Here's my cmake config

set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(CUDACXX "/usr/local/cuda-10.2/bin/nvcc" CACHE STRING "Cuda compiler")
set(MXNET_CUDA_ARCH "7.2" CACHE STRING "Cuda architectures")

leezu · 2020-10-05T17:00:52Z

@wms2537 did you include #19123 ?

wms2537 · 2020-10-05T22:40:40Z

Isn't it turned on by default? I used the code pulled from master, and the problem still exists. I can compile it on a normal PC but not on the Jetson.

leezu · 2020-10-05T23:09:08Z

Please paste the full cmake configure log. Also note that your Jetson uses the AARCH64 architecture, not X86. Its code memory model differs from X86, and compiler support is generally much worse than on X86 (for example, if position-independent code is required, gcc / clang may not implement anything but the default model, which limits the size of the binary and causes the relocation issue above).

We do test compiling MXNet on the Jetson AARCH64 architecture (https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.jetson), so in principle things should work and we just need to figure out how your environment differs from the tested one.
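For context on the R_AARCH64_CALL26 errors: an AArch64 `bl` instruction encodes its target as a signed 26-bit offset counted in 4-byte instructions, so a direct call reaches only about 128 MiB in either direction, far less than the 2 GiB of the x86-64 PC32 case. The following is a minimal sketch of that range check; the addresses are hypothetical and the helper is illustrative, not part of any toolchain.

```python
# Byte offsets encodable by a signed 26-bit immediate scaled by 4.
CALL26_MIN = -(2**27)      # -128 MiB
CALL26_MAX = 2**27 - 4     # largest forward offset, just under +128 MiB


def fits_call26(target: int, place: int) -> bool:
    """True if a `bl` at `place` can reach `target` directly."""
    offset = target - place
    # Instructions are 4-byte aligned, so the offset must be a multiple of 4.
    return offset % 4 == 0 and CALL26_MIN <= offset <= CALL26_MAX


if __name__ == "__main__":
    MIB = 1024 * 1024
    place = 0x1000                                  # hypothetical call site
    print(fits_call26(place + 64 * MIB, place))     # True
    print(fits_call26(place + 200 * MIB, place))    # False
```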

wms2537 · 2020-10-06T08:52:24Z

Here's the cmake output:

-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Unix Makefiles'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_INTEL_LP64_LIBRARY MKL_INTEL_THREAD_LIBRARY MKL_CORE_LIBRARY IOMP_LIBRARY) 
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- Found OpenCV: /usr (found version "4.1.1") found components: core highgui imgproc imgcodecs 
-- OpenCV 4.1.1 found (/usr/lib/aarch64-linux-gnu/cmake/opencv4)
--  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.


-- Found PythonInterp: /usr/bin/python (found version "2.7.17") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found GTest: gtest  
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so  
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_72,code=sm_72
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89") 
-- Found NVML: /usr/local/cuda/include  
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter 
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build

leezu · 2020-10-06T17:50:00Z

Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)

https://github.com/apache/incubator-mxnet/blob/db171a89c5e7e0d651ae1578bd8ae8da953417cc/ci/docker/runtime_functions.sh#L140-L155

I.e., our test suite builds for Jetson without the OpenCV and LAPACK features. You may also want to make sure that you specify the cmake -DCMAKE_BUILD_TYPE=Release option when configuring the build.

wms2537 · 2020-10-07T10:28:45Z

Still the same:

io.cc:(.text+0xa8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@@GLIBCXX_3.4.21' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0xc0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `memcpy@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xd8): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__stack_chk_fail@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0xe4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__throw_logic_error(char const*)@@GLIBCXX_3.4' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
3rdparty/dmlc-core/libdmlc.a(io.cc.o): In function `dmlc::io::FileSystem::GetInstance(dmlc::io::URI const&)':
io.cc:(.text+0x118): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x168): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_acquire@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x18c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_guard_release@@CXXABI_1.3' defined in .text section in /usr/lib/gcc/aarch64-linux-gnu/7/libstdc++.so
io.cc:(.text+0x1a4): relocation truncated to fit: R_AARCH64_CALL26 against symbol `__cxa_atexit@@GLIBC_2.17' defined in .text section in /lib/aarch64-linux-gnu/libc.so.6
io.cc:(.text+0x1c0): relocation truncated to fit: R_AARCH64_CALL26 against symbol `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(char const*) const' defined in .text._ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc[_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/core/symbolic.cc.o
io.cc:(.text+0x1dc): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dmlc::LogMessageFatal::LogMessageFatal(char const*, int)' defined in .text._ZN4dmlc15LogMessageFatalC2EPKci[_ZN4dmlc15LogMessageFatalC5EPKci] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/c_api/c_api_graph.cc.o
io.cc:(.text+0x1f0): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
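As background on the message itself: R_AARCH64_CALL26 is the relocation used for AArch64 `BL`/`B` instructions, which encode a signed 26-bit offset counted in 4-byte words, so a direct call can only reach about ±128 MiB from the call site. The sketch below just does that arithmetic; the addresses are made-up illustrations, not values taken from this build.

```python
# AArch64 BL/B encode a signed 26-bit word offset, so the byte offset
# must be 4-byte aligned and lie in [-2**27, 2**27 - 4].
CALL26_MIN = -(1 << 27)
CALL26_MAX = (1 << 27) - 4


def fits_call26(call_site, target):
    """Return True if a direct branch from call_site can reach target."""
    offset = target - call_site
    return offset % 4 == 0 and CALL26_MIN <= offset <= CALL26_MAX


if __name__ == "__main__":
    mib = 1 << 20
    # Hypothetical addresses: a target 100 MiB away is reachable,
    # one 200 MiB away is not.
    print(fits_call26(0x1000, 0x1000 + 100 * mib))
    print(fits_call26(0x1000, 0x1000 + 200 * mib))
```

When the offset does not fit and the linker cannot bridge the gap for that call, it reports the "relocation truncated to fit" error shown above.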

Here's my cmake log:

-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_PROCESSOR aarch64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.17.3' using generator 'Ninja'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- OpenCV Disabled
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.


-- Found PythonInterp: /usr/bin/python (found version "2.7.17") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found GTest: gtest  
-- Found CUDNN: /usr/lib/aarch64-linux-gnu/libcudnn.so  
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- /home/chkl/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_52,code=sm_52
-- Found CUDAToolkit: /usr/local/cuda/include (found version "10.2.89") 
-- Found NVML: /usr/local/cuda/include  
-- Found NVML (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/stubs/libnvidia-ml.so)
-- Found Python3: /usr/bin/python3.6 (found version "3.6.9") found components: Interpreter 
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
CMake Warning at CMakeLists.txt:839 (message):
  OpenCV_VERSION_MAJOR: , version 3 with imgcodecs is required for im2rec,
  im2rec will not be available


-- Configuring done
-- Generating done
-- Build files have been written to: /home/chkl/mxnet/build

leezu · 2020-10-07T17:15:55Z

Please ensure your system toolchain is up to date (i.e. that it includes the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1243559).

You may also simply use the cross-compilation option by installing the cross-toolchain on your host system, analogous to

https://github.com/apache/incubator-mxnet/blob/95f5cc60904a2d88d4861fff0f6dbad15f8cdbe3/ci/docker/Dockerfile.build.jetson#L41-L91

wms2537 · 2020-10-08T13:57:15Z

I think my system toolchain is up to date; I am using JetPack 4.3. If it is not, how do I update the system toolchain?

leezu · 2020-10-08T15:55:24Z

binutils is not part of JetPack; it is part of the operating system. You can check which package version is provided by the operating system used by your device.

With respect to JetPack, we recommend you update to 4.4, as this is the version tested by our CI.
If you still face problems, I really recommend you follow the cross-compilation approach, as it is much faster and is tested by our CI server.
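One way to see which linker version the device actually has is to read the banner printed by `ld --version`. The sketch below is only an illustration: `parse_ld_version` and `installed_ld_version` are hypothetical helper names, and the thread does not state which binutils version carries the fix, so the result still has to be compared against the linked bug report by hand.

```python
import re
import subprocess


def parse_ld_version(banner):
    """Return the version on the first line of `ld --version` as a tuple of ints."""
    lines = banner.splitlines()
    first_line = lines[0] if lines else ""
    # Take the last dotted number on the line, e.g. "2.30" or "2.35.2".
    matches = re.findall(r"\d+(?:\.\d+)+", first_line)
    if not matches:
        raise ValueError(f"no version found in: {first_line!r}")
    return tuple(int(part) for part in matches[-1].split("."))


def installed_ld_version():
    """Run the system linker and parse its version banner."""
    result = subprocess.run(
        ["ld", "--version"], check=True, capture_output=True, text=True
    )
    return parse_ld_version(result.stdout)


if __name__ == "__main__":
    print(installed_ld_version())
```

The distribution's package manager will report the same information together with the package revision, which is usually what a bug tracker entry refers to.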

leezu · 2020-10-08T18:41:47Z

cc @TristonC @mseth10 Do you have any recommendations for @wms2537's issues on the Jetson NX device?

wms2537 · 2020-10-09T01:56:42Z

After some testing, I finally managed to build it. I updated ccache and openblas similar to

Please ensure your system toolchain is up to date (includes https://bugzilla.redhat.com/show_bug.cgi?id=1243559 fix)

You may also simply use the cross-compilation option by installing the cross-toolchain on your host system analogous to

https://github.com/apache/incubator-mxnet/blob/95f5cc60904a2d88d4861fff0f6dbad15f8cdbe3/ci/docker/Dockerfile.build.jetson#L41-L91

Then, I restarted the jetson and built it with these commands

Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)

https://github.com/apache/incubator-mxnet/blob/db171a89c5e7e0d651ae1578bd8ae8da953417cc/ci/docker/runtime_functions.sh#L140-L155

Ie. our test suite builds for jetson without opencv and without lapack feature. You may also want to try ensure that you specify the cmake -DCMAKE_BUILD_TYPE=Release option when configuring the build.

I also added an 8GB swap so that I can build with all 6 cores.

Given all the changes above, I don't know which one actually solved the issue. Thanks @leezu for your help.
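The swap remark above is about memory pressure during parallel compilation. As a rough sketch of that trade-off, the helper below caps the number of parallel jobs by the memory available; the 2 GiB-per-job figure is a placeholder guess, not a measured number for MXNet, and both function names are hypothetical.

```python
import os


def parallel_jobs(mem_bytes, cores, per_job_bytes):
    """Cap the job count so that jobs * per_job_bytes fits into mem_bytes."""
    by_memory = mem_bytes // per_job_bytes
    return max(1, min(cores, by_memory))


def read_meminfo_kib(field, path="/proc/meminfo"):
    """Read one field (reported in KiB) from /proc/meminfo on Linux."""
    with open(path) as meminfo:
        for line in meminfo:
            name, _, rest = line.partition(":")
            if name == field:
                return int(rest.split()[0])
    raise KeyError(field)


if __name__ == "__main__":
    gib = 1 << 30
    ram_and_swap = (
        read_meminfo_kib("MemTotal") + read_meminfo_kib("SwapTotal")
    ) * 1024
    # 2 GiB per job is an assumed placeholder; tune it for your build.
    print(parallel_jobs(ram_and_swap, os.cpu_count() or 1, 2 * gib))
```

The resulting number can be passed to the build tool's job flag (for example `ninja -j N`); with more swap the cap rises, at the cost of slower compilation once the build starts paging.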

cloudlakecho · 2023-02-25T19:00:53Z

@wms2537 Thanks for sharing your tip.
You listed several changes. Were they applied at the cmake step or the make step?

Would you mind sharing your "CMakeLists.txt" (if you modified it) or the modified command at the make step?

I tried to build on an Nvidia Jetson (AGX Orin) and am also getting the same error

additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/mxnet.dir/build.make:11134: libmxnet.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:645: CMakeFiles/mxnet.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
at 98% of the 'make' step.

Runtime environment: Ubuntu 20.04, JetPack 5.0 (R 34),  CUDA 11.4

leezu added the Bug label Dec 11, 2019

szha mentioned this issue Apr 3, 2020

CI fails with PC-relative offset overflow in PLT entry #15971

Closed

Jopyth mentioned this issue Apr 6, 2020

libmxnet.so: PC-relative offset overflow in PLT entry for `_Z hpi-xnor/BMXNet-v2-wiki#2

Closed

leezu assigned ptrendx May 26, 2020

zasdfgbnm mentioned this issue Jun 12, 2020

/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax pytorch/pytorch#39968

Closed

ptrendx mentioned this issue Jun 25, 2020

Use RTC for elementwise and broadcast ops #18622

Merged

7 tasks

leezu mentioned this issue Aug 6, 2020

MXNet nightly build linker error #18861

Closed

DickJC123 mentioned this issue Sep 11, 2020

Add cmake flag USE_FATBIN_COMPRESSION, ON by default #19123

Merged

dmikushin mentioned this issue Jul 13, 2022

Relocation truncated to fit: R_X86_64_PC32 against symbol defined in .hipFatBinSegment section ROCm/ROCm#1765

Closed
