
[Inference] limit inference timing recording to NativeInterpreter only #49391

Merged: 4 commits into master from vc/inf_timings, Apr 25, 2023

Conversation

@vchuravy (Member) commented Apr 17, 2023

Allows consumers to separate native inference from inference coming
from a custom AbstractInterpreter.

x-ref: timholy/SnoopCompile.jl#338

I chose `typeof(interp)` since the interp may be arbitrarily heavy and we might not want to hold on to it.
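
To illustrate what this enables on the consumer side, here is a minimal sketch, assuming each recorded `Timing` carries the interpreter type in a field (the field name `interp_type` is hypothetical; the PR records `typeof(interp)` rather than the interpreter object itself):

const CC = Core.Compiler

# Partition recorded timings into native vs. foreign inference work.
native_timings(timings)  = filter(t -> t.interp_type === CC.NativeInterpreter, timings)
foreign_timings(timings) = filter(t -> t.interp_type !== CC.NativeInterpreter, timings)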

@vchuravy added the `compiler:inference`, `backport 1.9`, and `caching` labels on Apr 17, 2023
@maleadt (Member) left a comment:

SGTM

@aviatesk (Member)

Alternatively, we can do something like:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1eec73d043..eb88a2081e 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -205,8 +205,7 @@ __set_measure_typeinf(onoff::Bool) = __measure_typeinf__[] = onoff
 const __measure_typeinf__ = fill(false)
 
 # Wrapper around _typeinf that optionally records the exclusive time for each invocation.
-function typeinf(interp::AbstractInterpreter, frame::InferenceState)
-    interp = switch_from_irinterp(interp)
+function typeinf(interp::NativeInterpreter, frame::InferenceState)
     if __measure_typeinf__[]
         Timings.enter_new_timer(frame)
         v = _typeinf(interp, frame)
@@ -216,6 +215,8 @@ function typeinf(interp::AbstractInterpreter, frame::InferenceState)
         return _typeinf(interp, frame)
     end
 end
+# disable recording timings for external `AbstractInterpreter`s
+typeinf(interp::AbstractInterpreter, frame::InferenceState) = _typeinf(interp, frame)
 
 function finish!(interp::AbstractInterpreter, caller::InferenceResult)
     # If we didn't transform the src for caching, we may have to transform
@@ -242,6 +243,7 @@ function finish!(interp::AbstractInterpreter, caller::InferenceResult)
 end
 
 function _typeinf(interp::AbstractInterpreter, frame::InferenceState)
+    interp = switch_from_irinterp(interp)
     typeinf_nocycle(interp, frame) || return false # frame is now part of a higher cycle
     # with no active ip's, frame is done
     frames = frame.callers_in_cycle

?

@vchuravy (Member, Author)

> Alternatively, we can do something like:

I thought about that, but I actually want to be able to see how much time it took to infer CUDA code.

@vchuravy added the `gpu` label on Apr 18, 2023
@aviatesk (Member)

> I thought about that, but I actually want to be able to see how much time it took to infer CUDA code.

Couldn't GPUCompiler implement its own measurement mechanism for that?

I'm not opposed to the current approach, but it seems somewhat risky to have Core.Compiler.Timings support external AbstractInterpreters. The reason being that the external AbstractInterpreter's compilation pipeline may trigger native compilation, which could invalidate the current assumption of being able to find children/parent frames. So I believe it would be cleaner and more correct to have Core.Compiler.Timings only support NativeInterpreter by default, and have any external AbstractInterpreter optionally use the data structure of Core.Compiler.Timings to inspect its own compilation performance.

@vchuravy (Member, Author)

The major reason is that it would be great if SnoopCompile could be extended to custom abstract interpreters.
For me, that's the next frontier in TTFX: Oceananigans has 10+ minutes of compilation time.

@vchuravy (Member, Author)

> The reason being that the external AbstractInterpreter's compilation pipeline may trigger native compilation, which could invalidate the current assumption of being able to find children/parent frames.

I don't see why that would be the case; I would say it is correct? You might simply have arbitrary interleavings?

@aviatesk (Member) commented Apr 18, 2023

> I don't see why that would be the case; I would say it is correct? You might simply have arbitrary interleavings?

It's possible that there are type-unstable parts within some external AbstractInterpreter implementation; invoking the external compilation pipeline may then trigger dynamic dispatch, which native-compiles parts of the compilation pipeline itself.
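
As a self-contained illustration of that hazard (hypothetical code, not GPUCompiler's actual pipeline): the vector below has an abstract element type, so every call through it is a dynamic dispatch, and the callee must be inferred by the NativeInterpreter at that moment, even if this runs in the middle of an external interpreter's work:

# Abstract element type => `f(x)` below is a dynamic dispatch.
const handlers = Function[]

register_handler!(f) = push!(handlers, f)

function run_pipeline(x)
    for f in handlers
        x = f(x)  # callee unknown statically; inferred natively on first call
    end
    return x
end

register_handler!(x -> x + 1)
run_pipeline(0)  # triggers native inference of the closure mid-pipeline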

@aviatesk (Member)

> The major reason is that it would be great if SnoopCompile could be extended to custom abstract interpreters.
> For me, that's the next frontier in TTFX: Oceananigans has 10+ minutes of compilation time.

Yeah, I agree. But wouldn't it be cleaner if an external AbstractInterpreter used SnoopCompile utilities like this?

const CC = Core.Compiler

const gpu_timings = CC.Timings.Timing[]

function CC.typeinf(interp::GPUInterpreter, frame::CC.InferenceState)
    # ... push the timing for this frame into `gpu_timings` ...
    @invoke CC.typeinf(interp::CC.AbstractInterpreter, frame::CC.InferenceState)
end

macro snoopi_deep_gpu(ex) _snoopi_deep_gpu(ex) end

function _snoopi_deep_gpu(cmd::Expr)
    return quote
        start_gpu_deep_timing()     # enable recording into `gpu_timings`
        try
            $(esc(cmd))
        finally
            stop_gpu_deep_timing()  # disable recording again
        end
        CC.Timings.InferenceTimingNode(gpu_timings[1])
    end
end
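
A hypothetical invocation of the sketch above (`compile_gpu_code()` stands in for whatever drives `GPUInterpreter` inference, and `start_gpu_deep_timing`/`stop_gpu_deep_timing` are assumed helpers that toggle recording into `gpu_timings`):

tinf = @snoopi_deep_gpu compile_gpu_code()
# `tinf` could then be inspected with the usual SnoopCompile analyses.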

@vchuravy (Member, Author) commented Apr 18, 2023 via email

@aviatesk (Member)

Leaving a comment from Slack for future reference:

Shuhei Kadowaki:
I think it is problematic that NativeInterpreter and external AbstractInterpreters share the same data structure (Core.Compiler.Timings._timings), since Core.Compiler.Timings seems to assume that the inference graph is constructed by the same interpreter.

E.g., this snippet:

# Prepare to unwind one level of the stack and record in the parent
parent_timer = _timings[end]

finds the parent frame just by looking at `_timings[end]`, and currently we may get an inference graph like:

Timing(::GPUInterpreter, MethodInstance for xxx, ...)
 Timing(::GPUInterpreter, MethodInstance for yyy, ...)
  Timing(::NativeInterpreter, MethodInstance for GPUCompiler.abstract_eval_zzz, ...)
   Timing(::NativeInterpreter, MethodInstance for GPUCompiler.some_utility_for_abstract_eval_zzz, ...)
  Timing(::GPUInterpreter, MethodInstance for zzz, ...)

This seems a bit tricky to handle?

If we isolate Core.Compiler.Timings._timings to NativeInterpreter and make external AbstractInterpreters maintain their own timings, we will get cleaner graphs like

Timing(::GPUInterpreter, MethodInstance for xxx, ...)
 Timing(::GPUInterpreter, MethodInstance for yyy, ...)
  Timing(::GPUInterpreter, MethodInstance for zzz, ...)

and

Timing(::NativeInterpreter, MethodInstance for GPUCompiler.abstract_eval_zzz, ...)
 Timing(::NativeInterpreter, MethodInstance for GPUCompiler.some_utility_for_abstract_eval_zzz, ...)

vchuravy and others added 4 commits April 25, 2023 15:41
Allows consumers to separate native inference from inference coming
from a custom AbstractInterpreter.

Co-authored-by: Shuhei Kadowaki <[email protected]>
@aviatesk changed the title from "[Inference] Mark timing with typeof(interp)" to "[Inference] enable inference timing recording only for NativeInterpreter" on Apr 25, 2023
@aviatesk changed the title from "[Inference] enable inference timing recording only for NativeInterpreter" to "[Inference] limit inference timing recording to NativeInterpreter only" on Apr 25, 2023
@aviatesk merged commit 3db036e into master on Apr 25, 2023
@aviatesk deleted the vc/inf_timings branch on April 25, 2023 at 11:17
vchuravy added a commit that referenced this pull request Apr 25, 2023
[Inference] limit inference timing recording to NativeInterpreter only (#49391)

The logic of `Core.Compiler.Timings` assumes that the whole recorded inference graph is constructed by the same interpreter, so we should limit inference timing recording to `NativeInterpreter` only.

External `AbstractInterpreter`s can implement their own recording logic, likely by reusing the existing `Core.Compiler.Timings` utilities, in a way that does not interfere with the recording for the native compilation pipeline.

---------

Co-authored-by: Shuhei Kadowaki <[email protected]>
(cherry picked from commit 3db036e)
@KristofferC mentioned this pull request May 8, 2023
@KristofferC removed the `backport 1.9` label May 28, 2023
vchuravy added a commit that referenced this pull request Apr 15, 2024
…reters (#54069)

Partially reverts #49391

PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338).

After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline.

Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools.
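
For context, a minimal sketch of the cache-owner mechanism that commit message refers to, based on Julia 1.11-era internals around #52233/#53336 (`GPUInterpreter` and `GPUCacheToken` are hypothetical names, and the exact hook may differ between versions):

struct GPUCacheToken end  # distinct token identifying this interpreter's cache

# Stub subtype for illustration only; a real interpreter implements more of
# the AbstractInterpreter interface.
struct GPUInterpreter <: Core.Compiler.AbstractInterpreter end

# NativeInterpreter's cache owner is `nothing`; returning a distinct owner
# keeps foreign CodeInstances separate from native ones.
Core.Compiler.cache_owner(::GPUInterpreter) = GPUCacheToken()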
KristofferC pushed a commit that referenced this pull request Apr 17, 2024
…reters (#54069)

(cherry picked from commit c0611e8)