
Support CUDA on Julia 1.9+ via a package extension. #32

Merged (9 commits) on Oct 17, 2023

Conversation

GunnarFarneback
Collaborator

@GunnarFarneback GunnarFarneback commented Oct 11, 2023

This PR adds a package extension for CUDA on Julia 1.9+, with support for CUDA.jl versions 4 and 5. It does not change anything for older Julia, where CUDA support remains implemented through the Requires package. It drops support for Julia < 1.9 and CUDA.jl < 4.

The file test/test_cuda_extension.jl, not included from runtests.jl, tries to verify that the CUDA extension works for a variety of Julia/CUDA.jl/CUDA runtime versions. Possibly it should use a more demanding ONNX file, in case that affects the set of needed libraries.

Fixes #27.
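For readers unfamiliar with the mechanism, a Julia 1.9+ package extension of this kind is declared in Project.toml roughly as follows. This is a sketch assembled from the package names and UUIDs appearing later in this thread, not the repository's actual file:

```toml
# Weak dependencies: recorded here but only loaded if the user's
# environment loads them itself.
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"

# The extension module (ext/CUDAExt.jl) is loaded automatically
# once both weak dependencies are present.
[extensions]
CUDAExt = ["CUDA", "cuDNN"]

[compat]
CUDA = "4, 5"
julia = "1.9"
```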

@codecov-commenter

codecov-commenter commented Oct 12, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (dfb67d6) 89.60% compared to head (bd6a35b) 88.18%.


Additional details and impacted files
@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
- Coverage   89.60%   88.18%   -1.42%     
==========================================
  Files           3        4       +1     
  Lines         375      381       +6     
==========================================
  Hits          336      336              
- Misses         39       45       +6     
Files               Coverage Δ
src/ONNXRunTime.jl  100.00% <ø> (ø)
ext/CUDAExt.jl      0.00% <0.00%> (ø)
src/highlevel.jl    84.52% <0.00%> (-6.51%) ⬇️

... and 1 file with indirect coverage changes


@GunnarFarneback
Collaborator Author

GunnarFarneback commented Oct 12, 2023

The tests fail on Julia 1.9 and later because the test environment specifies CUDA 3 and the package extension has a CUDA 4, 5 compat. There are some different ways forward but it depends on the support ambition for Julia and CUDA versions.

  • Most radical is to make a breaking release and drop support for CUDA < 4 and Julia < 1.9. This allows all the Requires stuff to be removed.
  • Adding support for CUDA 3 in the package extension seems difficult since the cuDNN package isn't compatible with any CUDA version < 4. Possibly a second extension that only depends on CUDA could be implemented, with the logic for handling different CUDA versions sorted out there.
  • Adding support for CUDA 4+ in the Requires approach doesn't seem impossible but on the other hand nothing much has happened in CUDA 4.0 compatibility #27 over time. It should be easier if CUDA 3 support is dropped at the same time.
  • To keep using Requires in Julia 1.9+ is not advisable since it doesn't play nicely at all with PrecompileTools.
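For contrast with the extension approach, the Requires-based conditional loading that the last bullet advises against looks roughly like this (a sketch of the general pattern, not this package's actual code; the included file name is hypothetical):

```julia
# Pre-1.9 style conditional loading with Requires.jl. The @require body is
# evaluated at runtime when CUDA is first loaded, so none of it can be
# precompiled — which is why it interacts badly with PrecompileTools.
using Requires

function __init__()
    @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
        include("cuda_support.jl")  # hypothetical file with the CUDA-specific code
    end
end
```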

@jw3126
Owner

jw3126 commented Oct 12, 2023

I think we should just drop pre 1.9 julia support and also drop old CUDA.jl.

@GunnarFarneback
Collaborator Author

Done in second commit. I need to do some tests on my real use case and it needs README updates and a breaking version bump, either to 0.4 or 1.0. I would recommend taking the opportunity to jump to 1.0, regardless of package maturity, but that's your call.

Review comments on src/highlevel.jl and test/test_cuda_extension.jl (outdated, resolved)
Comment on lines 1 to 6
# This file is not included from `runtests.jl` nor run in CI.
#
# Run it with `julia tests/test_cuda_extension.jl`. This requires that
# Julia is installed with juliaup and will involve downloading of a
# lot of big artifacts. The output will contain lots of error messages
# from caught errors; what matters is that all testsets pass.
Owner


This file looks pretty useful, thanks!

@jw3126
Owner

jw3126 commented Oct 12, 2023

Done in second commit. I need to do some tests on my real use case and it needs README updates and a breaking version bump, either to 0.4 or 1.0. I would recommend taking the opportunity to jump to 1.0, regardless of package maturity, but that's your call.

Thanks, 1.0 is a good idea!

@GunnarFarneback
Collaborator Author

From my point of view this is good to go. The CI failure on nightly is a CUDA.jl issue; it doesn't load in isolation either.

Oh, there's a docs directory as well. Clearly index.md needs some of the updates from README.md. Should some of the new material only go into one of the files?

@jw3126
Owner

jw3126 commented Oct 12, 2023

From my point of view this is good to go. The CI failure on nightly is a CUDA.jl issue; it doesn't load in isolation either.

Oh, there's a docs directory as well. Clearly index.md needs some of the updates from README.md. Should some of the new material only go into one of the files?

In theory that should be automatic

cp(joinpath(@__DIR__, "..", "README.md"), joinpath(@__DIR__, "src", "index.md"), force=true)

Of course feel free to do any other additions to the docs, but it is not required from my side.

@GunnarFarneback
Collaborator Author

Ok, then I'm happy with it as it is.

@jw3126
Owner

jw3126 commented Oct 12, 2023

I tried to play with your branch locally. I set up an environment with

]dev ONNXRunTime
]add CUDA cuDNN
import CUDA; CUDA.set_runtime_version!(v"11.8")
exit()

However, when I do import CUDA I get

 Downloading artifact: CUDA_Runtime
    Downloading [==================================>      ]  82.6 %

And after that hits 99.1% it just starts again at 0% in an infinite loop. Any ideas?

@jw3126
Owner

jw3126 commented Oct 12, 2023

With CUDA runtime version v12.2 I don't get stuck in an infinite loop, but I get the following:

julia> include("test/runtests.jl")
[ Info: Precompiling ONNXRunTime [e034b28e-924e-41b2-b98f-d2bbeb830c6a]
Test Summary: | Pass  Total  Time
high level    |  110    110  0.8s
Test Summary: | Pass  Total  Time
Session       |   25     25  0.3s
Test Summary:    | Pass  Total  Time
tensor roundtrip |    9      9  0.1s
[ Info: Precompiling cuDNN [02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd]
┌ Error: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.9.3)
└ @ cuDNN ~/.julia/packages/cuDNN/rSQYc/src/cuDNN.jl:172
[ Info: Precompiling CUDAExt [d886cdb6-706a-5e22-bdf1-b04a849cf272]
┌ Warning: The :cuda execution provider requires a CUDA runtime version of at least 11.8 but less than 12. See `CUDA.set_runtime_version!`.
└ @ ONNXRunTime ~/.julia/dev/ONNXRunTime/src/highlevel.jl:86
increment2x3.onnx: Error During Test at /home/jan/.julia/dev/ONNXRunTime/test/test_cuda.jl:13
  Got exception outside of a @test
  /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1131 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

@GunnarFarneback
Collaborator Author

GunnarFarneback commented Oct 12, 2023

And after that hits 99.1% it just starts again at 0% in an infinite loop. Any ideas?

It's nothing I've run into but I'm trying it out in a clean depot. If it's reproducible I'm sure the Pkg developers would love a bug report.

Update: Failed to reproduce the problem.

With CUDA runtime version v12.2 I don't get stuck in an infinite loop, but I get the following:

This is expected since you don't get v11 .so files with CUDA runtime v12, and ONNXRunTime tries to load v11.

@GunnarFarneback
Collaborator Author

And after that hits 99.1% it just starts again at 0% in an infinite loop. Any ideas?

Could it be a network or package server issue? Kind of sounds like the artifact transmission gets interrupted and something decides to start over. Might be worth trying to switch to another package server or disable it, to see if something changes. Or just try again and hope it was a transient problem.

@GunnarFarneback
Collaborator Author

With CUDA runtime version v12.2 I don't get stuck in an infinite loop, but I get the following:

This is expected since you don't get v11 .so files with CUDA runtime v12, and ONNXRunTime tries to load v11.

Ok, let's make this stricter. The latest commit gives an error outside the officially supported range of CUDA runtime versions. It also centralizes the version data and adds consistency checks within the package.
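The centralized version check described here can be sketched as a small pure function. The bounds below are assumptions taken from the warning message quoted earlier in the thread (at least 11.8, less than 12), not the package's actual version table:

```julia
# Sketch of a centralized supported-range check for the CUDA runtime.
# Bounds are illustrative, taken from the warning quoted above.
const CUDA_RUNTIME_LO = v"11.8"
const CUDA_RUNTIME_HI = v"12"   # exclusive upper bound

is_supported_cuda_runtime(v::VersionNumber) = CUDA_RUNTIME_LO <= v < CUDA_RUNTIME_HI
```

An out-of-range runtime (e.g. v"12.2") can then raise a hard error instead of the earlier warning.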

@stemann

stemann commented Oct 13, 2023

What’s the rush with 1.0?

What would be reasonable, achievable goals for a first not-supposed-to-need-breaking changes version?

@stemann

stemann commented Oct 13, 2023

Will restricting CUDA support to v11+ (via restrictions on CUDA.jl) rule out support for ONNXRuntime on Jetson Nano devices, which only support CUDA 10.2?

It might be that Nvidia has released a newer version with Jetson Orin Nano (at 5x the price), but I think dropping support for a quite popular runtime platform from a quite popular runtime package sounds like a bad idea.

@GunnarFarneback
Collaborator Author

The main point of going to 1.0 is to get all three digits of semantic versioning and to be able to distinguish between new features and bugfixes.

Does the current version work with Jetson Nano?

@jw3126
Owner

jw3126 commented Oct 14, 2023

And after that hits 99.1% it just starts again at 0% in an infinite loop. Any ideas?

Could it be a network or package server issue? Kind of sounds like the artifact transmission gets interrupted and something decides to start over. Might be worth trying to switch to another package server or disable it, to see if something changes. Or just try again and hope it was a transient problem.

The issue disappeared, but now I get with CUDA_Runtime_jll v11.8:

  Got exception outside of a @test
  /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1131 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

@jw3126
Owner

jw3126 commented Oct 14, 2023

If I try to import it directly, I get:

julia> import cuDNN
┌ Error: cuDNN is not available for your platform (x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-cuda+none-julia_version+1.9.3)

CUDA itself seems to work with v11.8:

julia> import CUDA; CUDA.randn(3)
3-element CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
  0.34151646
 -1.027569
  0.46423724

@GunnarFarneback
Collaborator Author

GunnarFarneback commented Oct 14, 2023

My best guess is that your artifacts are messed up. I've seen similar things at work occasionally and then it has turned out that there is some empty artifact directory, possibly a result of some Pkg operation being interrupted at a bad time. Just removing the empty directory has solved the problem since Pkg in that case automatically downloads the missing artifact. A more brutal approach is to just delete all of .julia/artifacts and reinstantiate everything.

A more sophisticated approach is to try to find the exact artifact where libcudnn.so.8 should be located. With Julia 1.9.3 on Linux, doing

$ mkdir /tmp/testenv
$ julia --project=/tmp/testenv
pkg> add CUDA cuDNN
pkg> status
julia> import CUDA
julia> CUDA.set_runtime_version!(v"11.8")
julia> exit()
$ julia --project=/tmp/testenv
julia> import CUDA, cuDNN
julia> cuDNN.libcudnn
pkg> status

I get

julia> import CUDA, cuDNN

julia> cuDNN.libcudnn
"/home/gunnar/.julia/artifacts/330d3b99898bb6899242acecc7fefe96ce279acc/lib/libcudnn.so"

(testenv) pkg> status
Status `/tmp/testenv/Project.toml`
  [052768ef] CUDA v5.0.0
  [02a925ec] cuDNN v1.2.0

@jw3126
Owner

jw3126 commented Oct 15, 2023

@GunnarFarneback Thanks a lot, my artifacts are now fixed and this works for me locally.
@stemann I have no idea whether the current version runs on Jetson Nano; I would guess you are the only person who has even dared to run it in that setup. Would this PR make life harder for you? Are you opposed to parts of this PR, or only to the 1.0 tag? It would be super cool if someday we could use proper artifacts and no longer depend on CUDA.jl.
In my view, tagging 1.0 does not mean we are opposed to supporting Jetson Nano or any other platform.
This package has not had breaking changes for a long time and I don't foresee any after this PR (except bumping the CUDA.jl compat from time to time), so in my view 1.0 is appropriate.

@stemann

stemann commented Oct 15, 2023

The current version of ONNXRunTime.jl does not (I believe) run on the Jetson Nano, but ONNXRuntime does and the ONNXRuntime JLL should - that was a primary goal of JuliaPackaging/Yggdrasil#4386 (#12) and #19.

All that has been left (for ages, I agree!) is, I believe, to implement a similar approach to the CUDARuntime JLL - using e.g. Preferences.jl to select among the execution providers (and hence the artifacts) at Pkg.add-time. Please feel free to comment e.g. in #19 or #12.

I agree that compatibility with recent CUDA.jl versions should be an important goal for this package (I believe that is what this PR achieves…?).

But it would be nice if there were still room to get the Jetson Nano etc. support completed. Preferably without having to go to a 2.x version of this package, I would say.

So yes, I guess my main concern with this is bumping the version to 1.0.

P.S. Inspired by this PR, I raced to the office to get the old Jetson Nano fired up and ready for testing - hoping to find some time soon for pitching in…

@GunnarFarneback
Collaborator Author

The main point of this PR is to make the package play well with Julia 1.9+ precompilation and PrecompileTools, which is accomplished by replacing Requires with a package extension. Making it compatible with recent CUDA.jl is more of a bonus since it was easier to do that than to stay with the old version.

There's no harm done in going to major version 2, or more, if it turns out that further breaking changes are needed.

@jw3126
Owner

jw3126 commented Oct 16, 2023

So yes, I guess my main concern with this is bumping the version to 1.0.

P.S. Inspired by this PR, I raced to the office to get the old Jetson Nano fired up and ready for testing - hoping to find some time soon for pitching in…

Cool! How about this? We merge this PR with v0.4 and release v1 on December 1st. If you find the time to add artifact support before then, @stemann, that is awesome.
If not, it's also no big deal IMO to release a v2 later.

@stemann

stemann commented Oct 16, 2023

The no-later-than-December plan sounds great.

@stemann

stemann commented Oct 16, 2023

Have you considered using https://github.com/cjdoris/PackageExtensionCompat.jl to keep 1.6 compatibility?

Review comment on Project.toml (outdated, resolved)
@jw3126
Owner

jw3126 commented Oct 16, 2023

Have you considered using https://github.com/cjdoris/PackageExtensionCompat.jl to keep 1.6 compatibility?

CUDA v4 would be compatible with Julia 1.6; CUDA v5 needs Julia 1.8. Personally I would keep it simple and just drop 1.6 compat, but if anyone wants to add it using PackageExtensionCompat, I am also fine with that.

@GunnarFarneback
Collaborator Author

I did some git commit rewriting to change the version bump to 0.4.

@GunnarFarneback
Collaborator Author

Have you considered using https://github.com/cjdoris/PackageExtensionCompat.jl to keep 1.6 compatibility?

It's not entirely that simple since it doesn't help with the Base.get_extension-based diagnostics.
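For context, the Base.get_extension-based diagnostics mentioned here query at runtime whether the extension module actually loaded, roughly like this (a hypothetical sketch; the function name and error text are illustrative, not the package's actual code):

```julia
# Base.get_extension(pkg, :ExtName) returns the extension module, or `nothing`
# if the weak dependencies have not been loaded. The function only exists on
# Julia 1.9+, so a pre-1.9 compatibility shim does not cover this part.
function cuda_extension_or_error(pkg::Module)
    ext = Base.get_extension(pkg, :CUDAExt)
    ext === nothing && error("The :cuda execution provider requires `import CUDA, cuDNN`.")
    return ext
end
```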

Review comment on README.md (outdated, resolved)
Co-authored-by: Jan Weidner <[email protected]>
Owner

@jw3126 jw3126 left a comment


LGTM, thanks a lot this is a great PR! If there are no further objections, I will merge today.

@GunnarFarneback
Collaborator Author

Remember to register as well. I'd love to get back to a registered version instead of depending on this branch.

@jw3126 jw3126 merged commit 4688a1a into jw3126:main Oct 17, 2023
7 of 12 checks passed
@GunnarFarneback GunnarFarneback deleted the cuda_package_extension branch October 18, 2023 14:06
@jw3126 jw3126 mentioned this pull request Dec 28, 2023

Successfully merging this pull request may close these issues.

CUDA 4.0 compatibility
4 participants