device_worker_test compilation error #36179

Closed
zlsh80826 opened this issue Sep 28, 2021 · 8 comments
@zlsh80826
Collaborator

  • Version and environment information:
  1. PaddlePaddle version: develop branch, or any commit after fbbc339
  2. CPU: all
  3. GPU: all
  4. System environment:
OS: Ubuntu 18.04
Python version: 3.6.9

CUDA version: 11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
  • Installation method:
    1) Docker installation
    2) Built in Docker:
FROM nvcr.io/nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04

RUN export DEBIAN_FRONTEND=noninteractive \
 && apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        python3-pip \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev \
        graphviz \
        net-tools \
        curl \
        zlib1g-dev \
        git \
        wget

RUN cd /opt && git clone https://github.com/PaddlePaddle/Paddle.git
WORKDIR /opt/Paddle

RUN pip3 install --no-cache-dir -r python/requirements.txt
RUN pip3 install wheel

RUN mkdir build \
 && cd build \
 && cmake .. \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_INSTALL_PREFIX:PATH=/usr/local \
          -DWITH_GPU=ON \
          -DWITH_TENSORRT=OFF \
          -DWITH_ROCM=OFF \
          -DWITH_RCCL=OFF \
          -DWITH_DISTRIBUTE=ON \
          -DWITH_MKL=OFF \
          -DWITH_AVX=OFF \
          -DCUDA_ARCH_NAME=Auto \
          -DWITH_PYTHON=ON \
          -DCUDNN_ROOT=/usr \
          -DWITH_TESTING=ON \
          -DWITH_COVERAGE=OFF \
          -DWITH_INCREMENTAL_COVERAGE=OFF \
          -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
          -DWITH_CONTRIB=ON \
          -DWITH_INFERENCE_API_TEST=ON \
          -DINFERENCE_DEMO_INSTALL_DIR=/root/.cache/inference_demo \
          -DPY_VERSION=3.6 \
          -DWITH_PSCORE=ON \
          -DWITH_GLOO=ON \
          -DWITH_LITE=OFF \
          -DWITH_XPU=OFF \
          -DWITH_STRIP=ON \
          -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` \
 && make -j`nproc` install
  • Reproduction: build the Dockerfile above
docker build -f Dockerfile -t paddle .
  • Problem description:
  1. Error message
/usr/bin/ld: ../../../third_party/install/brpc/lib/libbrpc.a(iobuf.cpp.o): undefined reference to symbol 'BIO_fd_non_fatal_error@@OPENSSL_1_1_0'
//usr/lib/x86_64-linux-gnu/libcrypto.so.1.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/fluid/framework/device_worker_test] Error 1
  2. Root cause
    When WITH_DISTRIBUTE=ON, device_worker_test links against brpc, and brpc in turn needs ssl and crypto. The dependency list, however, links only brpc and not ssl or crypto, so the libssl and libcrypto functions cannot be found when building device_worker_test.
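
For anyone hitting a similar failure, the linker output above can be scanned mechanically. The following is only a sketch: `missing_dsos` and `undefined_symbols` are hypothetical helper names, not part of Paddle, and they assume the build output was saved to a file (or is piped in) in the same format as the error message shown in this issue.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Paddle) for reading a saved build log.

# Print each shared library that ld reports as
# "error adding symbols: DSO missing from command line".
missing_dsos() {
    # Keep only the path before the message, then de-duplicate.
    sed -n 's/^\(.*\): error adding symbols: DSO missing from command line$/\1/p' "$@" | sort -u
}

# Print each symbol named in an "undefined reference to symbol '...'" line.
undefined_symbols() {
    sed -n "s/.*undefined reference to symbol '\([^']*\)'.*/\1/p" "$@" | sort -u
}
```

Typical use would be `make 2>&1 | tee build.log` followed by `missing_dsos build.log`. Each library it prints (libcrypto here) is one that the failing target needs on its own link line, which matches the root cause described above.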
@paddle-bot-old

Hi! We have received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also check the API documentation, the FAQ, past GitHub issues, and the AI community for an answer. Have a nice day!

@qili93
Contributor

qili93 commented Sep 28, 2021

@zlsh80826 Hello, we recommend using Paddle's official development image:

docker pull paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82

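If you need to build your own image, please refer to this Dockerfile to install the required dependencies, such as libssl-dev, which your error suggests is missing: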

https://github.com/PaddlePaddle/Paddle/blob/develop/tools/dockerfile/Dockerfile.ubuntu18

@zlsh80826
Collaborator Author

Hi @qili93,

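Please test the Dockerfile below. Building on paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82 hits the same problem. libssl-dev is also installed in the Dockerfile I provided; if it were missing, the build would already fail at the cmake configure stage.

A workaround is in PR #36181. Could the relevant people please reproduce this and follow up on the issue?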

FROM paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82

RUN export DEBIAN_FRONTEND=noninteractive \
 && apt-get update \
 && apt-get install -y --no-install-recommends \
        cmake \
        patchelf \
        python3-dev \
        python3-pip \
        unzip \
        gcc-8 \
        g++-8 \
        libgl1 \
        libssl-dev \
        graphviz \
        net-tools \
        curl \
        zlib1g-dev \
        git \
        wget

RUN cd /opt && git clone https://github.com/PaddlePaddle/Paddle.git
WORKDIR /opt/Paddle

RUN pip3 install --no-cache-dir -r python/requirements.txt
RUN pip3 install wheel

RUN mkdir build \
 && cd build \
 && cmake .. \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_INSTALL_PREFIX:PATH=/usr/local \
          -DWITH_GPU=ON \
          -DWITH_TENSORRT=OFF \
          -DWITH_ROCM=OFF \
          -DWITH_RCCL=OFF \
          -DWITH_DISTRIBUTE=ON \
          -DWITH_MKL=OFF \
          -DWITH_AVX=OFF \
          -DCUDA_ARCH_NAME=Auto \
          -DWITH_PYTHON=ON \
          -DCUDNN_ROOT=/usr \
          -DWITH_TESTING=ON \
          -DWITH_COVERAGE=OFF \
          -DWITH_INCREMENTAL_COVERAGE=OFF \
          -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
          -DWITH_CONTRIB=ON \
          -DWITH_INFERENCE_API_TEST=ON \
          -DINFERENCE_DEMO_INSTALL_DIR=/root/.cache/inference_demo \
          -DPY_VERSION=3.6 \
          -DWITH_PSCORE=ON \
          -DWITH_GLOO=ON \
          -DWITH_LITE=OFF \
          -DWITH_XPU=OFF \
          -DWITH_STRIP=ON \
          -DCMAKE_C_COMPILER=`which gcc-8` -DCMAKE_CXX_COMPILER=`which g++-8` \
 && make -j`nproc` install

@qili93
Contributor

qili93 commented Sep 29, 2021

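Hi, I confirmed on the official CUDA 11.2 image that the build succeeds. My guess is that the failure you see on the official image is caused by your cmake options. Please refer to the cmake command in the steps below: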

docker pull paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82

nvidia-docker run -it --name dev-cuda112 \
  --network=host --shm-size=128G --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82 /bin/bash

git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle

mkdir build && cd build
cmake .. -DPY_VERSION=3.7 -DWITH_GPU=ON -DWITH_NCCL=ON -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON \
         -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MKL=ON

make -j16

@qili93
Contributor

qili93 commented Sep 29, 2021

Also, for reference, the following Dockerfile builds successfully as well:

FROM paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82

RUN cd /opt && git clone https://github.com/PaddlePaddle/Paddle.git
WORKDIR /opt/Paddle

RUN mkdir build && cd build \
 && cmake .. -DPY_VERSION=3.7 -DWITH_GPU=ON -DWITH_NCCL=ON -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON \
         -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MKL=ON \
 && make -j16

The build command is:

docker build -f Dockerfile.test \
       --build-arg http_proxy=${proxy} \
       --build-arg https_proxy=${proxy} \
       --build-arg ftp_proxy=${proxy} \
       -t qili93/test .

@zlsh80826
Collaborator Author

Hi @qili93,

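I tested each cmake option one by one and found that the failure is caused by specifying the g++ version. I need to specify the g++ version because I have to build on Ubuntu 20.04, and Paddle currently does not support building with g++ 9. Once g++-8 is specified, the error described above appears. Is there an official Ubuntu 20.04 image?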
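
The one-by-one test of cmake options can be scripted roughly as follows. This is a sketch only: `each_without_one` is a hypothetical helper, not the procedure actually used in this thread, and it merely prints candidate option sets rather than running cmake.

```shell
#!/bin/sh
# Hypothetical helper: for every option passed in, print one line naming the
# dropped option followed by the remaining options. Each line is a candidate
# configuration to try in a fresh build directory.
each_without_one() {
    for skip in "$@"; do
        rest=""
        for opt in "$@"; do
            # Keep every option except the one being dropped this round.
            [ "$opt" = "$skip" ] || rest="$rest $opt"
        done
        printf 'without %s:%s\n' "$skip" "$rest"
    done
}
```

For example, `each_without_one -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON -DWITH_TESTING=ON` prints three lines, one per dropped option. If the link error disappears for exactly one of the resulting configurations, the option dropped in that configuration is the likely trigger.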

@pangyoki
Contributor

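There is currently no official Ubuntu 20.04 image. You can try building with g++ 9; a few compile options may need adjusting to work around minor issues, but the build can succeed.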

@zlsh80826
Collaborator Author

Hi @pangyoki @qili93

The problem has been fixed in #37064.
