This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MXNet Master build for CUDA with DEBUG=1 failing #14263

Open
access2rohit opened this issue Feb 27, 2019 · 16 comments

Comments

@access2rohit
Contributor

access2rohit commented Feb 27, 2019

Description

MXNet Master build for CUDA with DEBUG=1 failing

Environment info (Required)

AWS Base DLAMI (ubuntu 16.04) on a p2.8xlarge

ubuntu@ip-172-31-82-110 ~/Workspace/mxnet (master) $ python tools/diagnose.py
----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.0
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Version      : 1.5.0
Directory    : /home/ubuntu/Workspace/mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1075-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-82-110
release      : 4.4.0-1075-aws
version      : #85-Ubuntu SMP Thu Jan 17 17:15:12 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2702.320
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.14
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0023 sec, LOAD: 0.5347 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0829 sec, LOAD: 0.0508 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0959 sec, LOAD: 0.4202 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0093 sec, LOAD: 0.5746 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0023 sec, LOAD: 0.0993 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0016 sec, LOAD: 0.0276 sec.

Output of nvidia-smi:

ubuntu@ip-172-31-82-110 ~/Workspace/mxnet (master) $ nvidia-smi
Wed Feb 27 18:49:07 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:17.0 Off |                    0 |
| N/A   30C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 00000000:00:18.0 Off |                    0 |
| N/A   26C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 00000000:00:19.0 Off |                    0 |
| N/A   31C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 00000000:00:1A.0 Off |                    0 |
| N/A   29C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   32C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   30C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   28C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


Package used (Python/R/Scala/Julia):
Python

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
gcc 5.4
g++ 5.4

MXNet commit hash:
0af40f7

Build config:
USE_CUDA=1
USE_CUDNN=1
USE_LAPACK=1
USE_BLAS = openblas
USE_OPENCV=1
USE_CUDA_PATH = /usr/local/cuda
DEBUG=1

/usr/local/cuda -> /usr/local/cuda-9.0

Error Message:

src/lib/x86_64-linux-gnu/libopencv_ocl.so -lopencv_ocl /usr/lib/x86_64-linux-gnu/libopencv_photo.so -lopencv_photo /usr/lib/x86_64-linux-gnu/libopencv_stitching.so -lopencv_stitching /usr/lib/x86_64-linux-gnu/libopencv_superres.so -lopencv_superres /usr/lib/x86_64-linux-gnu/libopencv_ts.so /usr/lib/x86_64-linux-gnu/libopencv_video.so -lopencv_video /usr/lib/x86_64-linux-gnu/libopencv_videostab.so -lopencv_videostab -llapack -lcudnn  -lcufft -lcuda -lnvrtc -L/usr/local/cuda/lib64/stubs
ar: lib/libmxnet.a: File truncated
Makefile:513: recipe for target 'lib/libmxnet.a' failed
make: *** [lib/libmxnet.a] Error 1
make: *** Waiting for unfinished jobs....
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libmxnet.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in lib/libmxnet.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
lib/libmxnet.so: PC-relative offset overflow in PLT entry for `_ZNSt10_Iter_baseIPaLb0EE7_S_baseES0_'
collect2: error: ld returned 1 exit status
Makefile:517: recipe for target 'lib/libmxnet.so' failed
make: *** [lib/libmxnet.so] Error 1
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `register_tm_clones':
crtstuff.c:(.text+0x49): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x82): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x95): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/home/ubuntu/Workspace/mxnet/3rdparty/dmlc-core/libdmlc.a(io.o): In function `dmlc::io::FileSystem::GetInstance(dmlc::io::URI const&)':
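
For triage, the linker output above can be condensed by relocation type. In general, "relocation truncated to fit" means the value the linker computed does not fit in the relocation's field (32 bits for `R_X86_64_PC32`), which typically shows up when a linked image becomes very large; whether that is the root cause here is not confirmed in this thread. The following is a minimal sketch, assuming the build log is available as text. `summarize_relocation_overflows` is a hypothetical helper, not part of MXNet or binutils, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches GNU ld messages such as:
#   crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
_OVERFLOW = re.compile(r"relocation truncated to fit: (R_[A-Z0-9_]+)")


def summarize_relocation_overflows(log_text):
    """Count 'relocation truncated to fit' messages per relocation type."""
    return Counter(m.group(1) for m in _OVERFLOW.finditer(log_text))


# A few lines copied from the error message above.
sample = """\
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.tm_clone_table'
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
"""

counts = summarize_relocation_overflows(sample)
for reloc, n in counts.most_common():
    print(reloc, n)
```

Feeding the full `make` output through the same function gives a quick view of which relocation kinds overflow and how often, which makes it easier to compare a DEBUG=1 build with a DEBUG=0 build.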

Minimum reproducible example

Steps to reproduce

  1. Use Ubuntu 16.04 with a K80 or V100 GPU, CUDA 9.0 or 9.2, and cuDNN 7.5.
  2. Follow the steps at https://mxnet.incubator.apache.org/versions/master/install/ubuntu_setup.html#build-mxnet-from-source for "GPU OpenCV and OpenBLAS" to build from source, with DEBUG=1 set as in the build config above.

What have you tried to solve it?

  1. Tried a different compiler version (gcc 4.8 and g++ 4.8).
  2. Tried using OpenCV 3.4.

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Cuda, Build

@aaronmarkham
Contributor

I tried this on a p3.2xlarge and a p3.16xlarge using a DLAMI Base and saw the same error both times. With DEBUG turned off, the build succeeds.

@MaxBareiss

I have the same issue on PPC64LE using master (c319ae5), built with CMake on Linux. I don't have this issue with 1.4.0.

@access2rohit
Contributor Author

access2rohit commented Mar 4, 2019

Other approaches tried:

  1. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 4.8 and g++ 4.8
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=1
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.0

  2. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 5.4 and g++ 5.4
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=1
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.0

  3. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 4.8 and g++ 4.8
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=1
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.2

  4. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 5.4 and g++ 5.4
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=1
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.2

  5. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 4.8 and g++ 4.8
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=0
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.0

  6. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 5.4 and g++ 5.4
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=0
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.0

  7. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 4.8 and g++ 4.8
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=0
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.2

  8. ubuntu 16.04 GPU MXNet 1.4.x using Make, gcc 5.4 and g++ 5.4
    USE_CUDA=1
    USE_CUDNN=1
    USE_LAPACK=1
    USE_BLAS = openblas
    USE_OPENCV=0
    USE_CUDA_PATH = /usr/local/cuda
    DEBUG=1
    /usr/local/cuda -> /usr/local/cuda-9.2
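
The eight attempts above are the full cross product of two compiler versions, two CUDA versions, and two `USE_OPENCV` settings, all with DEBUG=1. A minimal sketch that enumerates the same matrix in the same order; `build_matrix` and `describe` are hypothetical helpers for illustration, not part of the MXNet build system.

```python
from itertools import product

# Values taken from the eight attempts listed above.
OPENCV_FLAGS = [1, 0]            # USE_OPENCV settings tried
CUDA_VERSIONS = ["9.0", "9.2"]   # target of the /usr/local/cuda symlink
GCC_VERSIONS = ["4.8", "5.4"]    # gcc and g++ were always the same version


def build_matrix():
    """Return one dict per attempt, ordered as in the list above."""
    matrix = []
    # product() varies the last iterable fastest, so the compiler alternates
    # first, then the CUDA version, then the OpenCV flag.
    for use_opencv, cuda, gcc in product(OPENCV_FLAGS, CUDA_VERSIONS, GCC_VERSIONS):
        matrix.append({"gcc": gcc, "cuda": cuda, "USE_OPENCV": use_opencv, "DEBUG": 1})
    return matrix


def describe(index, cfg):
    """Format one attempt as a single summary line."""
    return "{}. gcc/g++ {}, CUDA {}, USE_OPENCV={}, DEBUG={}".format(
        index, cfg["gcc"], cfg["cuda"], cfg["USE_OPENCV"], cfg["DEBUG"]
    )


matrix = build_matrix()
for i, cfg in enumerate(matrix, start=1):
    print(describe(i, cfg))
```

Writing the matrix this way makes it clear that DEBUG=1 is the only setting held constant across every attempt.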

@access2rohit
Contributor Author

@mxnet-label-bot add [bug][build]

@marcoabreu marcoabreu added the Bug label Mar 5, 2019
@access2rohit
Contributor Author

@mxnet-label-bot add [build]

@sheep94lion

I ran into the same problem.

----------Python Info----------
('Version      :', '2.7.16')
('Compiler     :', 'GCC 7.3.0')
('Build        :', ('default', 'Mar 14 2019 21:00:58'))
('Arch         :', ('64bit', ''))
------------Pip Info-----------
('Version      :', '19.1.1')
('Directory    :', '/home/yizhao/anaconda3/envs/python27/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
An error occured trying to import mxnet.
This is very likely due to missing missing or incompatible library files.
Traceback (most recent call last):
  File "diagnose.py", line 103, in check_mxnet
    import mxnet
  File "/home/yizhao/Code/mxnet-dev/python/mxnet/__init__.py", line 24, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/home/yizhao/Code/mxnet-dev/python/mxnet/context.py", line 24, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/home/yizhao/Code/mxnet-dev/python/mxnet/base.py", line 213, in <module>
    _LIB = _load_lib()
  File "/home/yizhao/Code/mxnet-dev/python/mxnet/base.py", line 203, in _load_lib
    lib_path = libinfo.find_lib_path()
  File "/home/yizhao/Code/mxnet-dev/python/mxnet/libinfo.py", line 74, in find_lib_path
    'List of candidates:\n' + str('\n'.join(dll_path)))
RuntimeError: Cannot find the MXNet library.
List of candidates:
libmxnet.so
/home/yizhao/Code/mxnet/3rdparty/mkldnn/external/mklml_lnx_2019.0.5.20190502/lib/libmxnet.so
/home/yizhao/Code/mxnet_pop/3rdparty/mkldnn/build/install/lib/libmxnet.so
/usr/lib/cuda/lib64/libmxnet.so
/home/yizhao/Code/mxnet-dev/python/mxnet/libmxnet.so
/home/yizhao/Code/mxnet-dev/python/mxnet/../../lib/libmxnet.so
/home/yizhao/Code/mxnet-dev/python/mxnet/../../build/libmxnet.so
../../../libmxnet.so

----------System Info----------
('Platform     :', 'Linux-4.18.0-21-generic-x86_64-with-debian-buster-sid')
('system       :', 'Linux')
('node         :', 'pop-os')
('release      :', '4.18.0-21-generic')
('version      :', '#22-Ubuntu SMP Wed May 15 13:13:21 UTC 2019')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Stepping:            10
CPU MHz:             3700.339
CPU max MHz:         4100.0000
CPU min MHz:         800.0000
BogoMIPS:            4416.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            9216K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0048 sec, LOAD: 1.6375 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0068 sec, LOAD: 5.5277 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1981 sec, LOAD: 2.0450 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.1868 sec, LOAD: 1.2388 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.6114 sec, LOAD: 1.8466 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 1.0332 sec, LOAD: 1.8352 sec.
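
The traceback in the diagnostic output above shows `find_lib_path` failing because none of the listed candidates holds a `libmxnet.so`, presumably because the link step never produced the library. A minimal sketch for checking such a candidate list by hand; `existing_libraries` is a hypothetical helper rather than MXNet's own code, and the temporary directory only stands in for a real build tree.

```python
import os
import tempfile


def existing_libraries(candidates):
    """Return the candidate paths that exist as regular files, in order."""
    return [path for path in candidates if os.path.isfile(path)]


# Demonstration: a temporary directory stands in for the source checkout.
with tempfile.TemporaryDirectory() as root:
    built = os.path.join(root, "lib", "libmxnet.so")
    missing = os.path.join(root, "build", "libmxnet.so")

    # Simulate a successful build that wrote lib/libmxnet.so only.
    os.makedirs(os.path.dirname(built))
    open(built, "wb").close()

    found = existing_libraries([missing, built])
    found_names = [os.path.relpath(path, root) for path in found]

print(found_names)
```

Against a real checkout, passing the paths from the "List of candidates" section and getting an empty list back would confirm that the import error is a symptom of the failed link rather than a separate problem.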

My config.mk:

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 1

# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER = 1

USE_PROFILER = 1

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 1

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = /usr/lib/cuda

# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1

# whether use CuDNN R3 library
USE_CUDNN = 1

The output of make:

Makefile:219: "USE_LAPACK disabled because libraries were not found"
Makefile:345: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
INFO: nvcc was not found on your path
INFO: Using /usr/lib/cuda/bin/nvcc as nvcc path
Running CUDA_ARCH: -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=[sm_70,compute_70] --fatbin-options -compress-all
cd /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core; make libdmlc.a USE_SSE=1 config=/home/yizhao/Code/mxnet-dev/config.mk; cd /home/yizhao/Code/mxnet-dev
make[1]: Entering directory '/home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core'
make[1]: 'libdmlc.a' is up to date.
make[1]: Leaving directory '/home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core'
g++ -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -g -O0 -D_GLIBCXX_ASSERTIONS -I/home/yizhao/Code/mxnet-dev/3rdparty/mshadow/ -I/home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/include -fPIC -I/home/yizhao/Code/mxnet-dev/3rdparty/tvm/nnvm/include -I/home/yizhao/Code/mxnet-dev/3rdparty/dlpack/include -I/home/yizhao/Code/mxnet-dev/3rdparty/tvm/include -Iinclude -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -I/usr/lib/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_SIGNAL_HANDLER=1 -DMXNET_USE_OPENCV=0 -fopenmp -DMXNET_USE_OPERATOR_TUNING=1 -DMSHADOW_INT64_TENSOR_SIZE=0 -DMSHADOW_USE_CUDNN=1  -I/home/yizhao/Code/mxnet-dev/3rdparty/nvidia_cub -DMXNET_ENABLE_CUDA_RTC=1 -DMXNET_USE_NCCL=0 -DMXNET_USE_LIBJPEG_TURBO=0 -shared -o lib/libmxnet.so build/src/operator/nn/mkldnn/mkldnn_pooling.o build/src/operator/nn/mkldnn/mkldnn_convolution.o build/src/operator/nn/mkldnn/mkldnn_concat.o build/src/operator/nn/mkldnn/mkldnn_base.o build/src/operator/nn/mkldnn/mkldnn_slice.o build/src/operator/nn/mkldnn/mkldnn_reshape.o build/src/operator/nn/mkldnn/mkldnn_act.o build/src/operator/nn/mkldnn/mkldnn_softmax.o build/src/operator/nn/mkldnn/mkldnn_deconvolution.o build/src/operator/nn/mkldnn/mkldnn_copy.o build/src/operator/nn/mkldnn/mkldnn_softmax_output.o build/src/operator/nn/mkldnn/mkldnn_fully_connected.o build/src/operator/nn/mkldnn/mkldnn_transpose.o build/src/operator/nn/mkldnn/mkldnn_sum.o build/src/operator/nn/cudnn/cudnn_algoreg.o build/src/operator/nn/cudnn/cudnn_batch_norm.o build/src/operator/quantization/mkldnn/mkldnn_quantized_elemwise_add.o build/src/operator/quantization/mkldnn/mkldnn_quantized_conv.o build/src/operator/quantization/mkldnn/mkldnn_quantized_act.o build/src/operator/quantization/mkldnn/mkldnn_quantized_fully_connected.o build/src/operator/quantization/mkldnn/mkldnn_quantized_pooling.o 
build/src/operator/quantization/mkldnn/mkldnn_quantized_concat.o build/src/operator/subgraph/mkldnn/mkldnn_subgraph_property.o build/src/operator/subgraph/mkldnn/mkldnn_conv.o build/src/operator/subgraph/mkldnn/mkldnn_fc.o build/src/operator/subgraph/tensorrt/tensorrt.o build/src/operator/subgraph/tensorrt/onnx_to_tensorrt.o build/src/operator/subgraph/tensorrt/nnvm_to_onnx.o build/src/operator/nnpack/nnpack_util.o build/src/operator/custom/native_op.o build/src/operator/custom/ndarray_op.o build/src/operator/custom/custom.o build/src/operator/image/crop.o build/src/operator/image/image_random.o build/src/operator/image/resize.o build/src/operator/contrib/multibox_target.o build/src/operator/contrib/dgl_graph.o build/src/operator/contrib/count_sketch.o build/src/operator/contrib/nnz.o build/src/operator/contrib/gradient_multiplier_op.o build/src/operator/contrib/adamw.o build/src/operator/contrib/optimizer_op.o build/src/operator/contrib/bilinear_resize.o build/src/operator/contrib/multibox_detection.o build/src/operator/contrib/roi_align.o build/src/operator/contrib/deformable_psroi_pooling.o build/src/operator/contrib/fft.o build/src/operator/contrib/multibox_prior.o build/src/operator/contrib/hawkes_ll.o build/src/operator/contrib/quadratic_op.o build/src/operator/contrib/transformer.o build/src/operator/contrib/all_finite.o build/src/operator/contrib/index_array.o build/src/operator/contrib/multi_proposal.o build/src/operator/contrib/index_copy.o build/src/operator/contrib/krprod.o build/src/operator/contrib/bounding_box.o build/src/operator/contrib/rpn_inv_normalize_op.o build/src/operator/contrib/proposal.o build/src/operator/contrib/amp_graph_pass.o build/src/operator/contrib/boolean_mask.o build/src/operator/contrib/psroi_pooling.o build/src/operator/contrib/deformable_convolution.o build/src/operator/contrib/ifft.o build/src/operator/contrib/sync_batch_norm.o build/src/operator/contrib/adaptive_avg_pooling.o 
build/src/operator/random/sample_multinomial_op.o build/src/operator/random/multisample_op.o build/src/operator/random/unique_sample_op.o build/src/operator/random/sample_op.o build/src/operator/random/shuffle_op.o build/src/operator/tensor/elemwise_binary_broadcast_op_extended.o build/src/operator/tensor/square_sum.o build/src/operator/tensor/elemwise_binary_op_basic.o build/src/operator/tensor/dot.o build/src/operator/tensor/init_op.o build/src/operator/tensor/elemwise_sum.o build/src/operator/tensor/la_op.o build/src/operator/tensor/histogram.o build/src/operator/tensor/broadcast_reduce_op_index.o build/src/operator/tensor/elemwise_binary_op.o build/src/operator/tensor/elemwise_binary_scalar_op_basic.o build/src/operator/tensor/elemwise_scatter_op.o build/src/operator/tensor/elemwise_binary_scalar_op_extended.o build/src/operator/tensor/elemwise_binary_broadcast_op_basic.o build/src/operator/tensor/elemwise_unary_op_basic.o build/src/operator/tensor/sparse_retain.o build/src/operator/tensor/amp_cast.o build/src/operator/tensor/ordering_op.o build/src/operator/tensor/indexing_op.o build/src/operator/tensor/elemwise_binary_broadcast_op_logic.o build/src/operator/tensor/broadcast_reduce_op_value.o build/src/operator/tensor/elemwise_binary_op_logic.o build/src/operator/tensor/control_flow_op.o build/src/operator/tensor/elemwise_binary_op_extended.o build/src/operator/tensor/matrix_op.o build/src/operator/tensor/diag_op.o build/src/operator/tensor/ravel.o build/src/operator/tensor/elemwise_binary_scalar_op_logic.o build/src/operator/tensor/cast_storage.o build/src/operator/tensor/elemwise_unary_op_trig.o build/src/operator/nn/moments.o build/src/operator/nn/pooling.o build/src/operator/nn/deconvolution.o build/src/operator/nn/activation.o build/src/operator/nn/upsampling.o build/src/operator/nn/ctc_loss.o build/src/operator/nn/fully_connected.o build/src/operator/nn/convolution.o build/src/operator/nn/softmax.o build/src/operator/nn/lrn.o 
build/src/operator/nn/layer_norm.o build/src/operator/nn/concat.o build/src/operator/nn/softmax_activation.o build/src/operator/nn/batch_norm.o build/src/operator/nn/dropout.o build/src/operator/quantization/quantized_elemwise_add.o build/src/operator/quantization/dequantize.o build/src/operator/quantization/quantized_conv.o build/src/operator/quantization/quantize_graph_pass.o build/src/operator/quantization/quantized_flatten.o build/src/operator/quantization/quantized_fully_connected.o build/src/operator/quantization/quantized_pooling.o build/src/operator/quantization/quantize_v2.o build/src/operator/quantization/quantized_concat.o build/src/operator/quantization/requantize.o build/src/operator/quantization/quantized_activation.o build/src/operator/quantization/quantize.o build/src/operator/subgraph/build_subgraph.o build/src/operator/subgraph/default_subgraph_property_v2.o build/src/operator/subgraph/default_subgraph_property.o build/src/executor/inplace_addto_detect_pass.o build/src/executor/infer_graph_attr_pass.o build/src/executor/graph_executor.o build/src/executor/attach_op_execs_pass.o build/src/executor/attach_op_resource_pass.o build/src/io/image_aug_default.o build/src/io/io.o build/src/io/iter_csv.o build/src/io/iter_image_det_recordio.o build/src/io/image_io.o build/src/io/image_det_aug_default.o build/src/io/iter_image_recordio.o build/src/io/iter_mnist.o build/src/io/iter_image_recordio_2.o build/src/io/iter_libsvm.o build/src/common/utils.o build/src/common/rtc.o build/src/nnvm/gradient.o build/src/nnvm/legacy_op_util.o build/src/nnvm/tvm_bridge.o build/src/nnvm/graph_editor.o build/src/nnvm/legacy_json_util.o build/src/nnvm/plan_memory.o build/src/imperative/cached_op.o build/src/imperative/imperative_utils.o build/src/imperative/imperative.o build/src/ndarray/ndarray_function.o build/src/ndarray/ndarray.o build/src/operator/instance_norm.o build/src/operator/subgraph_op_common.o build/src/operator/grid_generator.o build/src/operator/leaky_relu.o 
build/src/operator/operator_tune.o build/src/operator/rnn.o build/src/operator/crop.o build/src/operator/spatial_transformer.o build/src/operator/convolution_v1.o build/src/operator/regression_output.o build/src/operator/pad.o build/src/operator/bilinear_sampler.o build/src/operator/loss_binary_op.o build/src/operator/svm_output.o build/src/operator/softmax_output.o build/src/operator/roi_pooling.o build/src/operator/batch_norm_v1.o build/src/operator/cross_device_copy.o build/src/operator/swapaxis.o build/src/operator/l2_normalization.o build/src/operator/sequence_reverse.o build/src/operator/c_lapack_api.o build/src/operator/correlation.o build/src/operator/identity_attach_KL_sparse_reg.o build/src/operator/make_loss.o build/src/operator/operator.o build/src/operator/optimizer_op.o build/src/operator/slice_channel.o build/src/operator/sequence_last.o build/src/operator/pooling_v1.o build/src/operator/sequence_mask.o build/src/operator/control_flow.o build/src/operator/operator_util.o build/src/engine/naive_engine.o build/src/engine/openmp.o build/src/engine/threaded_engine_pooled.o build/src/engine/engine.o build/src/engine/threaded_engine.o build/src/engine/threaded_engine_perdevice.o build/src/storage/storage.o build/src/c_api/c_api_symbolic.o build/src/c_api/c_api_profile.o build/src/c_api/c_api_ndarray.o build/src/c_api/c_api_test.o build/src/c_api/c_api_executor.o build/src/c_api/c_predict_api.o build/src/c_api/c_api_function.o build/src/c_api/c_api.o build/src/c_api/c_api_error.o build/src/profiler/profiler.o build/src/profiler/aggregate_stats.o build/src/profiler/nvtx.o build/src/profiler/vtune.o build/src/kvstore/gradient_compression.o build/src/kvstore/kvstore_utils.o build/src/kvstore/kvstore.o build/src/resource.o build/src/libinfo.o build/src/initialize.o /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/libdmlc.a build/src/operator/nn/cudnn/cudnn_batch_norm_gpu.o build/src/operator/subgraph/tensorrt/tensorrt_gpu.o 
build/src/operator/custom/native_op_gpu.o build/src/operator/image/resize_gpu.o build/src/operator/image/image_random_gpu.o build/src/operator/contrib/rpn_inv_normalize_op_gpu.o build/src/operator/contrib/bilinear_resize_gpu.o build/src/operator/contrib/optimizer_op_gpu.o build/src/operator/contrib/deformable_psroi_pooling_gpu.o build/src/operator/contrib/boolean_mask_gpu.o build/src/operator/contrib/psroi_pooling_gpu.o build/src/operator/contrib/ifft_gpu.o build/src/operator/contrib/multibox_detection_gpu.o build/src/operator/contrib/adaptive_avg_pooling_gpu.o build/src/operator/contrib/multibox_target_gpu.o build/src/operator/contrib/proposal_gpu.o build/src/operator/contrib/index_array_gpu.o build/src/operator/contrib/count_sketch_gpu.o build/src/operator/contrib/gradient_multiplier_op_gpu.o build/src/operator/contrib/bounding_box_gpu.o build/src/operator/contrib/sync_batch_norm_gpu.o build/src/operator/contrib/dgl_graph_gpu.o build/src/operator/contrib/hawkes_ll_gpu.o build/src/operator/contrib/fft_gpu.o build/src/operator/contrib/multibox_prior_gpu.o build/src/operator/contrib/adamw_gpu.o build/src/operator/contrib/quadratic_op_gpu.o build/src/operator/contrib/transformer_gpu.o build/src/operator/contrib/all_finite_gpu.o build/src/operator/contrib/index_copy_gpu.o build/src/operator/contrib/deformable_convolution_gpu.o build/src/operator/contrib/roi_align_gpu.o build/src/operator/contrib/multi_proposal_gpu.o build/src/operator/random/shuffle_op_gpu.o build/src/operator/random/sample_multinomial_op_gpu.o build/src/operator/random/multisample_op_gpu.o build/src/operator/random/sample_op_gpu.o build/src/operator/tensor/indexing_op_gpu.o build/src/operator/tensor/elemwise_binary_scalar_op_basic_gpu.o build/src/operator/tensor/amp_cast_gpu.o build/src/operator/tensor/elemwise_binary_scalar_op_extended_gpu.o build/src/operator/tensor/ordering_op_gpu.o build/src/operator/tensor/matrix_op_gpu.o build/src/operator/tensor/elemwise_unary_op_trig_gpu.o 
build/src/operator/tensor/control_flow_op_gpu.o build/src/operator/tensor/elemwise_binary_broadcast_op_basic_gpu.o build/src/operator/tensor/elemwise_binary_op_extended_gpu.o build/src/operator/tensor/elemwise_sum_gpu.o build/src/operator/tensor/init_op_gpu.o build/src/operator/tensor/cast_storage_gpu.o build/src/operator/tensor/histogram_gpu.o build/src/operator/tensor/broadcast_reduce_op_index_gpu.o build/src/operator/tensor/dot_gpu.o build/src/operator/tensor/elemwise_binary_scalar_op_logic_gpu.o build/src/operator/tensor/elemwise_unary_op_basic_gpu.o build/src/operator/tensor/ravel_gpu.o build/src/operator/tensor/broadcast_reduce_op_value_gpu.o build/src/operator/tensor/elemwise_binary_op_basic_gpu.o build/src/operator/tensor/elemwise_binary_broadcast_op_extended_gpu.o build/src/operator/tensor/elemwise_scatter_op_gpu.o build/src/operator/tensor/square_sum_gpu.o build/src/operator/tensor/elemwise_binary_broadcast_op_logic_gpu.o build/src/operator/tensor/la_op_gpu.o build/src/operator/tensor/elemwise_binary_op_logic_gpu.o build/src/operator/tensor/diag_op_gpu.o build/src/operator/tensor/sparse_retain_gpu.o build/src/operator/nn/dropout_gpu.o build/src/operator/nn/fully_connected_gpu.o build/src/operator/nn/softmax_activation_gpu.o build/src/operator/nn/lrn_gpu.o build/src/operator/nn/moments_gpu.o build/src/operator/nn/pooling_gpu.o build/src/operator/nn/softmax_gpu.o build/src/operator/nn/deconvolution_gpu.o build/src/operator/nn/activation_gpu.o build/src/operator/nn/ctc_loss_gpu.o build/src/operator/nn/convolution_gpu.o build/src/operator/nn/upsampling_gpu.o build/src/operator/nn/batch_norm_gpu.o build/src/operator/nn/layer_norm_gpu.o build/src/operator/nn/concat_gpu.o build/src/operator/quantization/requantize_gpu.o build/src/operator/quantization/quantize_gpu.o build/src/operator/quantization/dequantize_gpu.o build/src/operator/quantization/quantized_conv_gpu.o build/src/operator/quantization/quantized_flatten_gpu.o 
build/src/operator/quantization/quantized_fully_connected_gpu.o build/src/operator/quantization/quantized_pooling_gpu.o build/src/operator/quantization/quantize_v2_gpu.o build/src/common/utils_gpu.o build/src/common/random_generator_gpu.o build/src/ndarray/ndarray_function_gpu.o build/src/operator/optimizer_op_gpu.o build/src/operator/slice_channel_gpu.o build/src/operator/instance_norm_gpu.o build/src/operator/pad_gpu.o build/src/operator/correlation_gpu.o build/src/operator/make_loss_gpu.o build/src/operator/grid_generator_gpu.o build/src/operator/convolution_v1_gpu.o build/src/operator/softmax_output_gpu.o build/src/operator/rnn_gpu.o build/src/operator/crop_gpu.o build/src/operator/sequence_reverse_gpu.o build/src/operator/identity_attach_KL_sparse_reg_gpu.o build/src/operator/leaky_relu_gpu.o build/src/operator/swapaxis_gpu.o build/src/operator/sequence_mask_gpu.o build/src/operator/bilinear_sampler_gpu.o build/src/operator/spatial_transformer_gpu.o build/src/operator/pooling_v1_gpu.o build/src/operator/loss_binary_op_gpu.o build/src/operator/roi_pooling_gpu.o build/src/operator/batch_norm_v1_gpu.o build/src/operator/svm_output_gpu.o build/src/operator/regression_output_gpu.o build/src/operator/l2_normalization_gpu.o build/src/operator/sequence_last_gpu.o build/src/kvstore/gradient_compression_gpu.o build/src/kvstore/kvstore_utils_gpu.o -pthread -lm -lcudart -lcublas -lcurand -lcusolver -L/usr/lib/cuda/lib64 -L/usr/lib/cuda/lib -lopenblas -fopenmp -lrt -lcudnn  -lcufft -lcuda -lnvrtc -L/usr/local/cuda/lib64/stubs \
-Wl,--whole-archive /home/yizhao/Code/mxnet-dev/3rdparty/tvm/nnvm/lib/libnnvm.a -Wl,--no-whole-archive
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/bin/ld: /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/libdmlc.a(io.o): in function `std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >::basic_istringstream(std::string const&, std::_Ios_Openmode) [clone .constprop.213]':
io.cc:(.text+0x23): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0x7c): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0xa7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0xee): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0x114): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0x173): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: io.cc:(.text+0x1cb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so
/usr/bin/ld: /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/libdmlc.a(io.o): in function `dmlc::io::FileSystem::GetInstance(dmlc::io::URI const&)':
io.cc:(.text+0x22d): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `guard variable for dmlc::io::LocalFileSystem::GetInstance()::instance' defined in .bss._ZGVZN4dmlc2io15LocalFileSystem11GetInstanceEvE8instance[_ZGVZN4dmlc2io15LocalFileSystem11GetInstanceEvE8instance] section in /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/libdmlc.a(io.o)
/usr/bin/ld: io.cc:(.text+0x23d): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `dmlc::io::LocalFileSystem::GetInstance()::instance' defined in .bss._ZZN4dmlc2io15LocalFileSystem11GetInstanceEvE8instance[_ZZN4dmlc2io15LocalFileSystem11GetInstanceEvE8instance] section in /home/yizhao/Code/mxnet-dev/3rdparty/dmlc-core/libdmlc.a(io.o)
/usr/bin/ld: io.cc:(.text+0x34b): additional relocation overflows omitted from the output
lib/libmxnet.so: PC-relative offset overflow in PLT entry for `_ZNSt10_Iter_baseIPaLb0EE7_S_baseES0_'
collect2: error: ld returned 1 exit status
make: *** [Makefile:572: lib/libmxnet.so] Error 1

@larroy
Contributor

larroy commented Aug 8, 2019

Having the same issue.

@yuxihu
Member

yuxihu commented Aug 30, 2019

A fix PR has been merged. Could you please try to verify if it fixes your problem?

@access2rohit
Contributor Author

@yuxihu Still not fixed.

@hzfan
Contributor

hzfan commented Oct 12, 2019

Try replacing
ar crv $@ $(filter %.o, $?)
with
ar Scrv $@ $(filter %.o, $?)

in the Makefile. It worked for me.

The root cause may be the 4 GB size limit on static libraries generated by ar; the S modifier tells ar not to write the archive symbol table. See the section on the 64-bit variant in the wiki.
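
To check whether the 4 GB limit is plausibly being hit, one can look for static archives in the build tree that are larger than a 32-bit offset can address. This is a minimal sketch, assuming the archives are `.a` files under the directory passed in; `find_large_archives` and the `build` path in the usage lines are hypothetical names, not part of MXNet:

```python
import os

# Largest offset that a 32-bit archive symbol table entry can represent.
LIMIT_32BIT = 2**32 - 1


def find_large_archives(root, limit=LIMIT_32BIT):
    """Return sorted (path, size) pairs for .a files under root larger than limit bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip non-archives and dangling symlinks.
            if name.endswith(".a") and os.path.isfile(path):
                size = os.path.getsize(path)
                if size > limit:
                    hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    # "build" is an assumed location; point this at your own build tree.
    for path, size in find_large_archives("build"):
        print(f"{size / 2**30:.2f} GiB  {path}")
```

If nothing is reported, the archive size limit is probably not what is being hit in that build tree.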

@access2rohit
Contributor Author

@hzfan let me try that. Thanks for the suggestion!

@leezu
Contributor

leezu commented Dec 12, 2019

@access2rohit Regarding ar and building the more recent version on CI: the line at https://github.com/apache/incubator-mxnet/blob/cab1dfad37f044d691e7c4ea81d73463cfcf0c8d/ci/docker/install/ubuntu_ar.sh#L35
should get an extra --enable-64-bit-archive option.

See https://sourceware.org/bugzilla/show_bug.cgi?id=14625
And based on the associated patch it seems there's no runtime switch: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blobdiff;f=bfd/archive.c;h=6fc5f1d80f9ef8be456c45b35bcea42ff3436086;hp=53e295eb26c0b66741803400e92df496b096b527;hb=e6cc316af931911da20249e19f9342e5cf8aeeff;hpb=b95a0a3177bcf797c8f5ad6a7d276fb6275352b7

However, in my local tests that didn't fix the issue when using cmake to build. (Likely cmake doesn't pick up the updated ar.)
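
One way to check which ar a build is likely to pick up is to see which binary is first on PATH and what version it reports. This is a rough sketch; `describe_ar` is a hypothetical helper, and it assumes the tool accepts `--version` (GNU ar does). It only inspects PATH, so it does not show what cmake has cached in its own `CMAKE_AR` variable:

```python
import shutil
import subprocess


def describe_ar(name="ar"):
    """Return (path, first line of --version output) for name, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return path, (lines[0] if lines else "")


if __name__ == "__main__":
    info = describe_ar()
    print("ar not found on PATH" if info is None else f"{info[0]}: {info[1]}")
```

If the reported path is not the freshly built ar, pointing cmake at it explicitly through `CMAKE_AR` may be worth trying; whether that resolves this issue is untested here.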

@schliffen

The same problem for me:
verflows omitted from the output libmxnet.so: PC-relative offset overflow in PLT entry for ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

@leezu
Contributor

leezu commented Apr 2, 2020

You should be able to avoid this issue by building for just a single CUDA architecture. Look into specifying -DMXNET_CUDA_ARCH=7.0 for the cmake build, etc.
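
Building for fewer architectures shrinks the GPU code embedded in the library, which matters because the "relocation truncated to fit" and "PC-relative offset overflow" errors above arise when a signed 32-bit PC-relative displacement (about ±2 GiB on x86-64) cannot reach its target. This is a rough sketch for comparing a built library against that threshold. `exceeds_pc_relative_reach` is a hypothetical helper, and file size is only a proxy for the loaded image size, since sections such as debug info are not mapped at run time:

```python
import os

# Reach of a signed 32-bit PC-relative displacement on x86-64, in bytes.
REACH_32BIT = 2**31


def exceeds_pc_relative_reach(path, reach=REACH_32BIT):
    """Return True if the file at path is larger than reach bytes."""
    return os.path.getsize(path) > reach


if __name__ == "__main__":
    # lib/libmxnet.so is the output path shown in the Makefile error above.
    lib = "lib/libmxnet.so"
    if os.path.exists(lib):
        size_gib = os.path.getsize(lib) / 2**30
        print(f"{lib}: {size_gib:.2f} GiB, over 2 GiB: {exceeds_pc_relative_reach(lib)}")
    else:
        print(f"{lib} not found")
```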

@ghost

ghost commented May 31, 2020

@leezu Following your advice, the size of the generated .so was reduced by 3/4. Thanks.
