Add ROCm runner #1193

Closed
wants to merge 96 commits into from
Changes from 94 commits

Commits
96 commits
59267a9
Hipification of fbgemm for AMD GPUs/CPUs (#4)
jithunnair-amd Jan 25, 2022
a223936
Use SHEFL_SYNC_MACRO to replace __shfl() and __shfl_sync()
liligwu Jan 26, 2022
4610075
Merge pull request #6 from ROCmSoftwarePlatform/rocm4.3/develop
liligwu Jan 26, 2022
a506c52
Change the hipify dependency to hipify_torch (#7)
liligwu Jan 31, 2022
f596bde
IFU, merge from upstream commit c6df576 to main. (#8)
liligwu Feb 14, 2022
0cfb792
Enable `split_table_batched_embeddings_test.py` (#10)
liligwu Mar 2, 2022
f13af44
*Enable use_cache. *Enable split_embedding_inference_converter_test.p…
liligwu Mar 7, 2022
25e5b71
Skip use_cpu.
liligwu Mar 7, 2022
dcbe19f
Enable test_nbit_cache_pipeline and test_cache_miss_counter.
liligwu Mar 7, 2022
fda048e
Enable quantize_ops_test.py
liligwu Mar 7, 2022
00abba1
Merge branch 'main' into use_cache_enabled
liligwu Mar 7, 2022
cf307b6
Remove @skipIfRocm for test_nbit_cache_pipeline and test_cache_miss_c…
liligwu Mar 7, 2022
2d66ea8
*Uncondition use_cache in split_table_batched_embeddings_test.py *Rem…
liligwu Mar 7, 2022
958679b
Merge pull request #11 from ROCmSoftwarePlatform/use_cache_enabled
amathews-amd Mar 8, 2022
e642a48
Fix backward tests and test_cache_pipeline in split_table_batched_emb…
liligwu Mar 8, 2022
d0d294a
A minor change of removing a commented line.
liligwu Mar 8, 2022
146f2df
Remove skipIfRocm import in split_table_batched_embeddings_test.py.
liligwu Mar 8, 2022
eb0cf36
Merge pull request #12 from ROCmSoftwarePlatform/fix_backward
amathews-amd Mar 8, 2022
0c86f2b
*Removed post_hipify logic in setup.py. *Removed two header files that…
liligwu Mar 11, 2022
6e7f13e
Merge pull request #16 from ROCmSoftwarePlatform/remove_post_hipify
amathews-amd Mar 11, 2022
edd3306
Pointing hipify_torch to the newer commit.
liligwu Mar 14, 2022
9a45f4a
Merge pull request #17 from ROCmSoftwarePlatform/pointing_hipify_torc…
amathews-amd Mar 14, 2022
309a3a1
Fixing #include <ATen/CUDAGeneratorImpl.h> by defining NEW_GENERATOR_…
liligwu Mar 16, 2022
358eaf5
Disabling all use_cpu in the tests. (#20)
liligwu Mar 16, 2022
3a915a8
Change py3.8 syntax to py3.7 syntax (#18)
pruthvistony Mar 16, 2022
40928ba
Match upstream setup (#21)
liligwu Mar 31, 2022
69abf78
Enable merge_pooled_embeddings op. in ROCm (#15)
reza-amd Apr 1, 2022
5c0096e
Merge remote-tracking branch 'upstream/main' into IFU-main-2022-04-07
liligwu Apr 14, 2022
bfac874
Fixing test_lxu_cache_lookup in AMD devices where warp_size=64
liligwu Apr 14, 2022
1cf7e84
* Enabling the specification of hip architecture by using PYTORCH_RO…
liligwu Apr 15, 2022
5b33287
*Fixing the unit tests in sparse_ops_test.py. *Fixing the path of Ato…
liligwu Apr 19, 2022
2c514c5
Merge pull request #23 from ROCmSoftwarePlatform/IFU-main-2022-04-07
pruthvistony Apr 19, 2022
0d5a012
Enable use_cpu in the tests.
liligwu Apr 20, 2022
ae14a47
Merge remote-tracking branch 'upstream/main' into IFU-main-2022-04-20
liligwu Apr 20, 2022
1718605
*Taking @skipIfRocm back in the test_utils.py. *Fixing cublasGemmStri…
liligwu Apr 20, 2022
bc902a3
Cleaning up the code.
liligwu Apr 20, 2022
0d95948
Merge pull request #24 from ROCmSoftwarePlatform/IFU-main-2022-04-20
pruthvistony Apr 21, 2022
9a5a33b
Enabling cuda (#25)
liligwu Apr 21, 2022
6490dbc
Enabling cuda (#25)
liligwu Apr 21, 2022
77627ae
Merge branch 'main' of https://github.com/ROCmSoftwarePlatform/FBGEMM…
liligwu Apr 22, 2022
18b48e9
Merge remote-tracking branch 'upstream/main' into IFU-main-2022-05-02
liligwu May 2, 2022
99a70e1
Merge pull request #2 from ROCmSoftwarePlatform/IFU-main-2022-05-02
liligwu May 4, 2022
fed56ff
Merge branch 'main' into rocm_changes
liligwu May 4, 2022
4b39a70
Merge branch 'upstream_main' into rocm_changes
liligwu May 5, 2022
785afb8
Removing building and testing bash scripts.
liligwu May 5, 2022
bbd0ad1
* Addressing the comments in PR review ROCm changes #1102. * Reorganiz…
liligwu May 9, 2022
9db83d8
Minor changes that minimize the difference to upstream.
liligwu May 9, 2022
eabd0a8
A minor change on a blank line.
liligwu May 9, 2022
2038008
Fixing indentation and commented code in CMakeLists.txt
liligwu May 10, 2022
0202078
Removing build script.
liligwu May 10, 2022
9cf8856
Addressing the second batch of comments of https://github.com/pytorch…
liligwu May 11, 2022
b885322
* Removing the condition on c++ standard * An indentation correction
liligwu May 12, 2022
0e3dfdb
* Changing the logic of detecting GPU vendor, making CUDA as default.…
liligwu May 13, 2022
1f926e9
Merge remote-tracking branch 'upstream/main' into IFU-2022-05-23
liligwu May 24, 2022
adefcc0
fix enum macro to avoid missing symbols
jeffdaily May 24, 2022
b96bd9a
- Changing detection of ROCm to /opt/rocm. - Skipping 4 unit tests fo…
liligwu May 26, 2022
3a1c2a3
Cherry-pick 33c5e061e7aa47b8efbcb7dee83580b3844f6d67
amathews-amd May 23, 2022
f664267
Resolve the conflict in quantize_ops_benchmark.py
liligwu May 26, 2022
d37293d
Merge pull request #3 from ROCmSoftwarePlatform/IFU-main-2022-05-23
liligwu May 31, 2022
a986aec
Merge remote-tracking branch 'upstream/main' into main
liligwu Jun 24, 2022
2cc3656
add rocm runner
liligwu Jun 29, 2022
b9ed7da
remove cd fbgemm_gpu
liligwu Jun 29, 2022
d4ebb6d
remove sudo
liligwu Jun 29, 2022
6552b13
switch to docker container
liligwu Jun 30, 2022
cc8a0c2
add docker login
liligwu Jun 30, 2022
3d714f9
change password file
liligwu Jun 30, 2022
a2998a6
clone repo to baremetal
liligwu Jun 30, 2022
6c2daa4
migrate to rocm/pytorch image, which is public
liligwu Jul 5, 2022
528c0aa
add pre-checkout to help actions/checkout
liligwu Jul 5, 2022
ae1a87f
fix checkout script
liligwu Jul 5, 2022
21888fc
fix baremetal path
liligwu Jul 5, 2022
c5b158d
fix baremetal path
liligwu Jul 5, 2022
888550e
checkout submodules
liligwu Jul 5, 2022
6103266
upgrade git
liligwu Jul 5, 2022
51f90ec
upgrade git
liligwu Jul 5, 2022
b5d9d4f
upgrade git
liligwu Jul 5, 2022
213667a
add logic that checks workspace directory
liligwu Jul 5, 2022
12dec21
checkout submodules
liligwu Jul 5, 2022
c74102f
debug thirdparty
liligwu Jul 6, 2022
a639c59
checkout branch
liligwu Jul 6, 2022
d5e2685
checkout branch
liligwu Jul 6, 2022
a13b45e
change working dir
liligwu Jul 6, 2022
4aad1e9
change working dir
liligwu Jul 6, 2022
1993520
checkout current branch
liligwu Jul 6, 2022
4e86fc9
move build_and_run script to FBGEMM repo
liligwu Jul 7, 2022
bd39c03
remove SCRIPT_DIR_BAREMETAL
liligwu Jul 7, 2022
a6e2458
change build_and_run permission
liligwu Jul 7, 2022
85ac1e8
Merge remote-tracking branch 'upstream/main' into add_rocm_runner
liligwu Jul 7, 2022
6c5ccc7
fix the data type matching issue
liligwu Jul 7, 2022
dc7d7ca
fix indentation
liligwu Jul 7, 2022
14f2f41
recover the changes in split_table_batched_embeddings_ops.py
liligwu Jul 8, 2022
1fcaff5
change docker image of ROCm CI to staging_base
liligwu Jul 11, 2022
b73c3a9
run docker container as root in ROCm CI
liligwu Jul 11, 2022
0c4b962
remove CXX=hipcc in ROCm CI
liligwu Jul 12, 2022
f7a0c68
enable more tests in ROCm CI
liligwu Jul 13, 2022
54da556
Merge remote-tracking branch 'upstream/main' into add_rocm_runner
liligwu Jul 14, 2022
48 changes: 48 additions & 0 deletions .github/workflows/fbgemmci.yml
@@ -308,6 +308,54 @@ jobs:
        python3 -c "import fbgemm_gpu"
        python3 -c "import fbgemm_gpu.split_embedding_codegen_lookup_invokers"

  test_amd_gpu:
    runs-on: rocm
    strategy:
      matrix:
        os: [ubuntu-latest]

    steps:
    - name: pre-checkout
      shell: bash
      run: |
        if [ -d ${{ github.workspace }} ]
        then
          sudo chown -R $USER:$USER ${{ github.workspace }}
        fi
        sudo add-apt-repository ppa:git-core/ppa
        sudo apt update
        sudo apt -y install --only-upgrade git

    - uses: actions/checkout@v2
      with:
        ref: ${{ github.ref }}
        submodules: 'true'

    - name: build fbgemm_gpu and test
      shell: bash
      run: |
        set -eux
        env
        ls -l
        DOCKER_IMAGE=rocm/pytorch:rocm5.1.1_ubuntu20.04_py3.7_pytorch_staging_base
        docker pull $DOCKER_IMAGE
        JENKINS_REPO_DIR=fbgemm-private-jenkins
        JENKINS_REPO_DIR_BAREMETAL=$PWD
        JENKINS_REPO_DIR_DOCKER=/workspace/$JENKINS_REPO_DIR
        DOCKER_OPTIONS="\
          --user 0 \
          --network=host \
          --ipc=host \
          --shm-size 16G \
          --group-add video \
          --cap-add=SYS_PTRACE \
          --security-opt seccomp=unconfined \
          --device=/dev/kfd \
          --device=/dev/dri \
          -v $JENKINS_REPO_DIR_BAREMETAL:$JENKINS_REPO_DIR_DOCKER
          "
        docker run $DOCKER_OPTIONS $DOCKER_IMAGE $JENKINS_REPO_DIR_DOCKER/.jenkins/rocm/build_and_test.sh $JENKINS_REPO_DIR_DOCKER

  build_cpu_only:
    runs-on: ${{ matrix.os }}
    strategy:
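The `test_amd_gpu` job passes `/dev/kfd` and `/dev/dri` into the container with `--device`, so if either node is missing on the runner, the failure only surfaces from `docker run` itself. A minimal pre-flight check could fail the step earlier with a clearer message. This is a sketch under stated assumptions and is not part of this PR: the function name `check_rocm_devices` is hypothetical, and its optional root-prefix argument exists only so the check can be exercised against a scratch directory.

```bash
# Hypothetical pre-flight check (not part of this PR): confirm that the ROCm
# device nodes handed to `docker run --device=...` exist on the host.
# $1 is an assumed optional root prefix (defaults to "", i.e. the real root),
# used only so the check can be tested without a GPU.
check_rocm_devices() {
  local dev_root="${1:-}"
  local missing=0
  # /dev/kfd: the amdkfd (Kernel Fusion Driver) compute interface.
  if [ ! -e "${dev_root}/dev/kfd" ]; then
    echo "missing ${dev_root}/dev/kfd" >&2
    missing=1
  fi
  # /dev/dri: directory holding the per-GPU render nodes.
  if [ ! -d "${dev_root}/dev/dri" ]; then
    echo "missing ${dev_root}/dev/dri" >&2
    missing=1
  fi
  # Non-zero status if anything was missing, so callers can `|| exit 1`.
  return "$missing"
}
```

In the workflow step it would be called as `check_rocm_devices || exit 1` just before `docker run`.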
58 changes: 58 additions & 0 deletions .jenkins/rocm/build_and_test.sh
@@ -0,0 +1,58 @@
#!/bin/bash

# exit immediately on failure or if an undefined variable is used, and print each command before running it (-x)
set -eux

FBGEMM_REPO_DIR=${1:-/workspace/FBGEMM}

git config --global --add safe.directory $FBGEMM_REPO_DIR
git config --global --add safe.directory $FBGEMM_REPO_DIR/third_party/asmjit
git config --global --add safe.directory $FBGEMM_REPO_DIR/third_party/cpuinfo
git config --global --add safe.directory $FBGEMM_REPO_DIR/third_party/googletest
git config --global --add safe.directory $FBGEMM_REPO_DIR/third_party/hipify_torch
shintaro-iwasaki marked this conversation as resolved.

# Install dependencies
apt-get update --allow-insecure-repositories && \
apt-get install -y --allow-unauthenticated \
git \
jq \
sshfs \
sshpass \
unzip

apt-get install -y locales
locale-gen en_US.UTF-8

pip3 install click
pip3 install jinja2
pip3 install ninja
pip3 install scikit-build
pip3 install --upgrade hypothesis
pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.1.1/

pip3 list

# Build fbgemm_gpu
cd $FBGEMM_REPO_DIR/fbgemm_gpu
shintaro-iwasaki marked this conversation as resolved.
export MAX_JOBS=`nproc`
shintaro-iwasaki marked this conversation as resolved.
export PYTORCH_ROCM_ARCH="gfx908"
python setup.py build develop

export FBGEMM_TEST_WITH_ROCM=1

# Test fbgemm_gpu
cd test

python layout_transform_ops_test.py --verbose

python permute_pooled_embedding_modules_test.py --verbose

python sparse_ops_test.py --verbose

python merge_pooled_embeddings_test.py --verbose

python quantize_ops_test.py --verbose

python split_embedding_inference_converter_test.py --verbose

python split_table_batched_embeddings_test.py --verbose
Contributor

This was raised by other FBGEMM developers: would you mind running the following tests too?

batched_unary_embeddings_test.py
input_combine_test.py
jagged_tensor_ops_test.py
metric_ops_test.py
uvm_test.py

See this for a complete list: https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/test

Contributor Author

Hi @shintaro-iwasaki, I'll enable these tests.
Another question: is it okay to open tracking issues in this repo for the skipped ROCm unit tests?

Contributor

I might not fully understand your question, but please create issues in this pytorch/fbgemm repository to keep track of skipped unit tests. Unlike PyTorch, though, we cannot skip tests via GitHub issues.
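The review asks for more test files to be run, and build_and_test.sh lists each one explicitly; under `set -e` that list also stops at the first failing file. One possible alternative, sketched here and not taken from the PR, is to loop over every `*_test.py` in the test directory and report all failures at the end. The function name `run_all_tests` and the `PYTHON` override are assumptions for illustration.

```bash
# Sketch (not part of this PR): run every *_test.py in a directory and report
# all failures at the end instead of stopping at the first one.
# PYTHON is an assumed override for the interpreter (defaults to python3).
run_all_tests() {
  local test_dir="${1:-.}"
  local python="${PYTHON:-python3}"
  local failed=()
  local t
  for t in "$test_dir"/*_test.py; do
    # If the glob matched nothing, bash leaves the literal pattern; skip it.
    [ -e "$t" ] || continue
    # `if !` keeps a failing test from aborting a caller running under set -e.
    if ! "$python" "$t" --verbose; then
      failed+=("$(basename "$t")")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed: ${failed[*]}" >&2
    return 1
  fi
  return 0
}
```

From `fbgemm_gpu/test` it would be called as `run_all_tests .`. Running every file also picks up tests that currently fail on ROCm, so this only makes sense once those are skipped or fixed.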

2 changes: 1 addition & 1 deletion fbgemm_gpu/CMakeLists.txt
@@ -74,7 +74,7 @@ if(USE_ROCM)
include(Hipify)

message("${message_line}")
-message(STATUS "hip found ${ROCM_FOUND}")
+message(STATUS "hip found ${HIP_FOUND}")
endif()

#
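build_and_test.sh hard-codes `PYTORCH_ROCM_ARCH="gfx908"`, which targets only gfx908 devices such as the MI100. A sketch of deriving the target from the runner instead, assuming `rocminfo` is available in the container and prints agent lines such as `Name: gfx908`. The hypothetical helper `detect_rocm_arch` reads that output on stdin (so it can be tested without a GPU), returns the first `gfx` target name it finds, and falls back to `gfx908` when none is found. It is not part of this PR.

```bash
# Hypothetical helper (not part of this PR): pick a PYTORCH_ROCM_ARCH value
# from `rocminfo` output supplied on stdin. Assumes GPU agents appear as
# lines like "Name:   gfx908"; takes the first match only.
detect_rocm_arch() {
  local arch
  # -E: extended regex; -o: print only the matched text;
  # -m 1: stop after the first matching line.
  # "|| true" keeps a no-match result from tripping `set -e`.
  arch=$(grep -Eo -m 1 'gfx[0-9a-f]+' || true)
  # Fall back to the target the PR hard-codes when no GPU is reported.
  echo "${arch:-gfx908}"
}
```

It would be used as `export PYTORCH_ROCM_ARCH=$(rocminfo | detect_rocm_arch)`. Taking only the first match assumes a runner with a single GPU model.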