Refactor RedisAI Build #669

Merged
merged 80 commits into from
Sep 19, 2024
Changes from 56 commits
Commits
e590ba0
Fine Grain GPU Build Options (#609)
MattToast Jun 10, 2024
cd35301
Add Dockerfiles with CUDA support (#611)
ashao Jun 11, 2024
ad06abe
Add 'f' to f-strs (#618)
MattToast Jun 17, 2024
c89e0ec
Rebase: Intermediate work
ashao Aug 1, 2024
795cdf5
Remove builder from setup.py
ashao Aug 1, 2024
11e3c0b
Update changelog
ashao Aug 1, 2024
64ef934
Merge branch 'remove_builder_from_setup' into add_rocm
ashao Aug 1, 2024
da8ed01
intermediate work
ashao Aug 1, 2024
259441c
Finish platform reader
ashao Aug 1, 2024
ecf242e
Begin refactor of builder
ashao Aug 8, 2024
f0bd8e1
Merge branch 'add_rocm' of https://github.com/ashao/SmartSim into add…
ashao Aug 8, 2024
b7675c5
Plumb in RedisAIBuilder to CLI
ashao Aug 9, 2024
7e47f99
Merge branch 'develop' of github.com:CrayLabs/SmartSim into add_rocm
ashao Aug 9, 2024
0bfe5bd
Ensure RAI builder works
ashao Aug 9, 2024
cf69490
Confirm smart build works on Mac
ashao Aug 13, 2024
1ba2e13
Update configs for linux and darwin cpu
ashao Aug 13, 2024
b078040
Refactor python package checking
ashao Aug 14, 2024
305e4b5
Last touches for verbose and non-verbose builds
ashao Aug 14, 2024
6085881
Fix typehints and style
ashao Aug 14, 2024
83d7b95
Add docstrings
ashao Aug 15, 2024
b7e392d
Merge branch 'develop' of https://github.com/CrayLabs/SmartSim into c…
ashao Aug 15, 2024
c253eb9
Merge branch 'cuda-12-support' of https://github.com/CrayLabs/SmartSi…
ashao Aug 15, 2024
2107dc3
Fix tests for X86_64
ashao Aug 20, 2024
9122ed8
Add Darwin arm64 config file
ashao Aug 20, 2024
fcff751
Try to get tests to pass
ashao Aug 21, 2024
0ffbfea
Respond to Matt review
ashao Aug 22, 2024
9cd0237
Last bits
ashao Aug 22, 2024
5a0056e
Modify ignores because of different Python versions
ashao Aug 22, 2024
eb04b89
Fix backend test
ashao Aug 23, 2024
44b7bf8
Update the run_tests action
ashao Aug 23, 2024
4956f19
Fix RedisAIBuilder test
ashao Aug 27, 2024
654baf7
Add CUDA configurations
ashao Aug 28, 2024
32ad750
Fix tests and pydantic errors
ashao Aug 30, 2024
8bdcd04
Merge branch 'develop' of https://github.com/CrayLabs/SmartSim into r…
ashao Sep 2, 2024
4372c71
Last fixes for ROCm
ashao Sep 3, 2024
118a008
Style fixes
ashao Sep 3, 2024
20281f4
add ROCm config file
ashao Sep 3, 2024
fa36b8a
Make the redisai shared lib executable for linux
ashao Sep 3, 2024
ace7414
Update tensorflow requirement for doc
ashao Sep 3, 2024
7b17470
Workaround RedisAI LFS limits
ashao Sep 4, 2024
33ec8ff
in progress table
juliaputko Sep 6, 2024
63d5210
Address most review feedback
ashao Sep 9, 2024
7857d99
in progress tables
juliaputko Sep 9, 2024
839d58b
indent change
juliaputko Sep 9, 2024
d86d523
All comments but the regex
MattToast Sep 9, 2024
527ea1c
update spacing
juliaputko Sep 10, 2024
c1df0f3
table format
juliaputko Sep 11, 2024
7558268
table format
juliaputko Sep 11, 2024
9cefd75
More robust version checking of python packages
MattToast Sep 11, 2024
8942653
Update parsing test
MattToast Sep 11, 2024
78d2556
Add suppressions for ARM MacOS w/o TF
MattToast Sep 12, 2024
860723a
Merge branch 'doctable' of https://github.com/juliaputko/SmartSim int…
ashao Sep 13, 2024
091ec4e
Update instructions and support matrix in docs
ashao Sep 13, 2024
e3dd420
Merge branch 'refactor_rai_builder' of https://github.com/ashao/Smart…
ashao Sep 14, 2024
7544954
Update support matrix table and add variants to pytorch config
ashao Sep 14, 2024
c7b8f15
Last touchups
ashao Sep 16, 2024
8b5e16f
Update links to backends
ashao Sep 16, 2024
675d81f
Set default ROCm ARCH and hard pin versions
ashao Sep 16, 2024
746d231
regex handles extras
MattToast Sep 16, 2024
706b9c5
Merge branch 'refactor_rai_builder' of https://github.com/ashao/Smart…
MattToast Sep 16, 2024
df09ce0
Remove extras from package name
MattToast Sep 17, 2024
e69c271
Update version of Torch compatible with cuDNN 8
ashao Sep 17, 2024
33f1b2e
Update CUDA 12 dependencies
ashao Sep 18, 2024
745465d
Update CUDA 12 dependencies
ashao Sep 18, 2024
a617a2c
Remove minor versions from device
ashao Sep 18, 2024
515591c
Add instructions for perlmutter
ashao Sep 18, 2024
1072250
Add ROCM6
ashao Sep 18, 2024
ff60f33
Use official tensorflow 2.15 for CUDA 12
ashao Sep 18, 2024
dc5a44d
Update perlmutter.rst
amandarichardsonn Sep 18, 2024
2b16b57
Update perlmutter.rst
amandarichardsonn Sep 18, 2024
316e41f
Patches for rocm
MattToast Sep 19, 2024
db545b2
Merge branch 'refactor_rai_builder' of https://github.com/ashao/Smart…
MattToast Sep 19, 2024
37190bf
Fix torch, drop onnx
MattToast Sep 19, 2024
666bafa
Add Frontier instructions
ashao Sep 19, 2024
0b7662c
Fix last style errors
ashao Sep 19, 2024
a7d7161
Fix tests
ashao Sep 19, 2024
6388ab2
Update frontier instructions
ashao Sep 19, 2024
24e9d57
fix logic for default PYTORCH_ROCM_ARCH
ashao Sep 19, 2024
30601aa
Last doc touchups
ashao Sep 19, 2024
979ffc4
Remove extraneous test
ashao Sep 19, 2024
16 changes: 4 additions & 12 deletions .github/workflows/run_tests.yml
@@ -49,7 +49,7 @@ env:

 jobs:
   run_tests:
-    name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}, RedisAI ${{ matrix.rai }}
+    name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
@@ -63,9 +63,6 @@ jobs:
           - os: macos-14
             py_v: "3.9"

-    env:
-      SMARTSIM_REDISAI: ${{ matrix.rai }}
-
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
@@ -109,15 +106,10 @@ jobs:
       - name: Install SmartSim (with ML backends)
         run: |
           python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
-          python -m pip install .[dev,mypy,ml]
-
-      - name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
-        if: contains( matrix.os, 'ubuntu' ) || contains( matrix.os, 'macos-12')
-        run: smart build --device cpu --onnx -v
+          python -m pip install .[dev,mypy]

-      - name: Install ML Runtimes with Smart (no ONNX,TF on Apple Silicon)
-        if: contains( matrix.os, 'macos-14' )
-        run: smart build --device cpu --no_tf -v
+      - name: Install ML Runtimes
+        run: smart build --device cpu -v

       - name: Run mypy
         run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -12,6 +12,7 @@ tests/test_output
 # Dependencies
 smartsim/_core/.third-party
 smartsim/_core/.dragon
+smartsim/_core/build

 # Docs
 _build
4 changes: 2 additions & 2 deletions README.md
@@ -643,11 +643,11 @@ from C, C++, Fortran and Python with the SmartRedis Clients:
 <tr>
 <td rowspan="3">1.2.7</td>
 <td>PyTorch</td>
-<td>2.0.1</td>
+<td>2.1.0</td>
 </tr>
 <tr>
 <td>TensorFlow\Keras</td>
-<td>2.13.1</td>
+<td>2.15.0</td>
 </tr>
 <tr>
 <td>ONNX</td>
33 changes: 33 additions & 0 deletions doc/changelog.md
@@ -9,6 +9,39 @@ Jump to:

 ## SmartSim

+### CUDA 12 and ROCm support branch
+
+To be merged into `develop` at some future point in time
+
+Description
+
+- Refactor the RedisAI build to allow more flexibility in versions
+  and sources of ML backends
+- Add Dockerfiles with GPU support
+- Fine-grained build support for GPUs
+- Update Torch to 2.1.0 and TensorFlow to 2.15.0
+- Better error messages in the build process
+
+Detailed Notes
+
+- The RedisAIBuilder class was completely overhauled to allow users to
+  express a wider range of support for hardware/software stacks. This
+  will be extended to support ROCm, CUDA 11, and CUDA 12.
+- Versions for each of these packages are no longer specified in an
+  internal class. Instead, a default set of JSON files specifies the
+  sources and versions. Users can supply their own custom specifications
+  at `smart build` time.
+- Two new Dockerfiles are now provided (one each for CUDA 11.8 and 12.1) that
+  can be used to build a container to run the tutorials. No HPC support
+  should be expected at this time.
+- SmartSim can now be built using CUDA 11.8 or CUDA 12.1 by specifying
+  `smart build --device=cuda118` or `smart build --device=cuda121`. The
+  original `smart build --device=gpu` will default to using CUDA 11.8.
+- As a result of the previous change, SmartSim now requires C++17 and a
+  minimum CUDA version of 11.8 in order to build Torch 2.1.0.
+- Error messages were not being interpolated correctly. This has been
+  addressed to provide more context when exposing error messages to users.
+
 ### Development branch

 To be released at some future point in time
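
Reviewer note: the changelog entry in this diff says that `smart build --device=gpu` falls back to CUDA 11.8, while `cuda118` and `cuda121` select a version explicitly. The rule can be sketched roughly as below. This is only an illustration of the behavior the changelog describes, not the code in this PR: `SUPPORTED_DEVICES`, `DEVICE_ALIASES`, and `resolve_device` are hypothetical names, and the device strings are taken from the changelog text only.

```python
# Hypothetical sketch of the alias rule described in the changelog entry.
# None of these names come from SmartSim itself.
SUPPORTED_DEVICES = ("cpu", "cuda118", "cuda121")
DEVICE_ALIASES = {"gpu": "cuda118"}  # changelog: `gpu` defaults to CUDA 11.8


def resolve_device(requested: str) -> str:
    """Map a requested device string to a concrete build target."""
    normalized = requested.strip().lower()
    # Replace an alias with its target; concrete names pass through unchanged.
    resolved = DEVICE_ALIASES.get(normalized, normalized)
    if resolved not in SUPPORTED_DEVICES:
        accepted = sorted(SUPPORTED_DEVICES + tuple(DEVICE_ALIASES))
        raise ValueError(
            f"Unsupported device {requested!r}; expected one of {accepted}"
        )
    return resolved
```

Keeping the alias table separate from the list of concrete targets means a later change to the default (for example, pointing `gpu` at a newer CUDA version) would touch a single entry.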
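
Reviewer note: the changelog entry states that error messages were not being interpolated correctly, and one commit in this PR is titled "Add 'f' to f-strs (#618)". The usual form of that bug in Python is a string literal that contains `{...}` placeholders but lacks the `f` prefix. The sketch below shows the difference; the function names and the message text are made up for illustration and are not taken from the SmartSim sources.

```python
def describe_failure_broken(backend: str, version: str) -> str:
    # Missing `f` prefix: the braces are returned literally, so the
    # user never sees which backend or version failed.
    return "Could not build {backend} version {version}"


def describe_failure_fixed(backend: str, version: str) -> str:
    # With the `f` prefix, the placeholders are replaced by the
    # values of the arguments.
    return f"Could not build {backend} version {version}"
```

Calling both with `("torch", "2.1.0")` shows the contrast: the first returns the template text unchanged, while the second names the backend and version.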