Cassette overhead #17

Closed
carstenbauer opened this issue Mar 29, 2023 · 4 comments


@carstenbauer

Hey, great effort! I see that you're using Cassette to instrument libraries like Base.Threads and Distributed. This introduces a runtime overhead that, as I've been told by knowledgeable people, is non-negligible. Do you perhaps have an idea of the magnitude of this instrumentation overhead?

@carstenbauer
Author

carstenbauer commented Mar 29, 2023

Perhaps worth noting: in my draft package https://github.com/pc2/MPITape.jl I've taken a similar approach, whereas in https://github.com/JuliaPerf/ScoreP.jl the user must manually mark code regions (except for external libraries like MPI, which have a dedicated profiling API that can be used). The latter is obviously not ideal, but it has the advantage that source code information (like file name and line number) can appear in the profiling/tracing output.

@clasqui
Contributor

clasqui commented Mar 29, 2023

This is definitely something we have to tackle at some point. When we started, we considered several strategies with overhead in mind.

One of them was this simpler one, where the user has to add events manually. This strategy could be used right now with the current implementation of the package, since we already have the interface to the Extrae API. For the MPI part, we have not tested it, but we expect it to work straightforwardly once the library is loaded, since MPI tracing is done by intercepting the library calls, and this is no different in Julia.

But if we want to trace the built-in parallel programming models (like Distributed, Threads, and all the asynchronous functionality) together with their internal functions, we needed another strategy, and that's why we went with Cassette.

Personally, I think there's a third alternative that would allow us to trace everything we want without overhead: manually instrumenting the Distributed and Threads stdlibs. But this opens up another set of problems. If we ever want to consider this, I think the best thing to do would be to design from the ground up a new API, provided by the Julia runtime, that allows attaching any desired tool (something like OMPT for OpenMP).

@mofeing
Member

mofeing commented Mar 29, 2023

CassetteOverlay.jl could be tried in order to reduce the overhead. People in the field of Automatic Differentiation have been trying different methods, but zero overhead seems possible only with compiler plugins.

In any case, I agree that an API provided by Julia would be the best.

@mofeing
Member

mofeing commented May 7, 2024

This will be tackled when Tracepoints.jl gets to production.

@mofeing mofeing closed this as completed May 7, 2024
3 participants