Compile (with a Cassette Pass) until fire breakpoint is hit #306

Open
oxinabox opened this issue Jul 26, 2019 · 11 comments

@oxinabox
Contributor

The idea that @MikeInnes and I came up with last night:

User actions:

  1. set 1 or more breakpoints
  2. run @compiled_run foo()
  3. the code runs as compiled until the function containing the breakpoint is hit.
  4. Now we are in the interpreter for all stepping etc. until we step out; then we are compiled again.

Now to do this we need a Cassette style pass,
that replaces every function call (in the compiled section) with:
Pseudocode:

if contains_breakpoint(f, args...)
    interpret(f, args...)
else
    recurse(ctx, f, args...)
end
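
A minimal sketch of that dispatch decision in plain Julia, without Cassette. Every name here (`BREAKPOINTS`, `TRACE`, `run_compiled`, `mixed_call`) is hypothetical, and `interpret` is a stand-in that merely records which path was taken; a real pass would apply this check at every call inside the compiled section and hand off to the actual interpreter.

```julia
# Hypothetical registry of functions that currently carry a breakpoint.
const BREAKPOINTS = Set{Any}()

# Stand-in check: a real version would also inspect the method selected by `args`.
contains_breakpoint(f, args...) = f in BREAKPOINTS

# Stand-ins for the two execution paths; they only record which one was chosen.
const TRACE = Symbol[]
interpret(f, args...) = (push!(TRACE, :interpreted); f(args...))
run_compiled(f, args...) = (push!(TRACE, :compiled); f(args...))

# The decision from the pseudocode above, applied to a single call.
function mixed_call(f, args...)
    if contains_breakpoint(f, args...)
        interpret(f, args...)
    else
        run_compiled(f, args...)
    end
end

double(x) = 2x

mixed_call(double, 3)        # no breakpoint yet: compiled path
push!(BREAKPOINTS, double)   # "set a breakpoint" on double
mixed_call(double, 3)        # now the interpreted path is taken
```

After both calls, `TRACE` holds the compiled path followed by the interpreted path.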

Furthermore, when Continue is run we can switch back into compiled mode
(possibly as a separate "ContinueCompiled" command).
The cost of this is that you can't StepOut into compiled code,
unless the compiled section actually has the full instrumentation of MagneticReadHead.
But the advantage of this simpler compiler pass over MagneticReadHead
is that it doesn't add anywhere near as many statements to the Code IR.
So while everything still needs to recompile, it doesn't hit those compile-time black holes that MRH has.

@KristofferC
Member

I don't really think this is the right way to go. It wouldn't handle cases like

...
big_loop()
breakpoint

People would need to start splitting up their functions into "work functions" and the part they want to debug etc. Also, having to run Cassette on everything has a lot of drawbacks so I don't think this is the solution.

@oxinabox
Contributor Author

oxinabox commented Sep 4, 2019

I think this is a solution worth investigating.
Can we leave this open until I get around to running those investigations?
Maybe add a [speculative] tag?

I started to write code for it, but I am still getting familiar with how JuliaInterpreter works,
so I haven't done it yet.

@pfitzseb
Member

The Cassette-less alternative to this (at least for file breakpoints) would be some kind of Infiltrator/Debugger hybrid. Setting a breakpoint in whatever UI would recompile the relevant method with an @enter spliced in, basically.

So basically if you have

function foo(x)
    y = sin(x)

    while x > y
        x -= sin(x)
    end
    x
end

and set a breakpoint on line 3 you'd end up with

function foo(x)
    y = sin(x)

    Main.Debugger.@enter((() -> begin
        while x > y
            x -= sin(x)
        end
        x
    end)())
end

or something close to that. You also wouldn't need to prefix your function invocation with @run, which is also something people have complained about.

@oxinabox
Contributor Author

oxinabox commented Sep 24, 2019

Yeah, I initially thought that was what Infiltrator did, and was going to just insert Infiltrator's code.

I think just making an Infiltrator-style version of this would be pretty solid.

@oxinabox
Contributor Author

Benchmarking

With code written to always call continue when a breakpoint is hit.

Using code from
https://github.com/oxinabox/MixedModeDebugger.jl/blob/541f63b8a61afb5a6d3a8473ae092fe304bfc5a0/src/proto.jl

New benchmarking function winter

function winter(A)
   s = zero(eltype(A))
   return winter_s1(s,A)
end
function winter_s1(s, A)
   for a in A
       s += exp(a)
   end
   return winter_s1s1(s)
end
function winter_s1s1(s)
   return s + s
end

Trial is with const x = rand(1_000, 500)
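
A minimal sketch of how the native timings could be reproduced, assuming they were taken with `@time` (the output format suggests so, but that is an assumption). `run_interpretted` and `run_mixedmode` come from the linked proto.jl and are not reproduced here.

```julia
# Benchmark functions as defined above.
function winter(A)
    s = zero(eltype(A))
    return winter_s1(s, A)
end
function winter_s1(s, A)
    for a in A
        s += exp(a)
    end
    return winter_s1s1(s)
end
winter_s1s1(s) = s + s

const x = rand(1_000, 500)

# The first call includes compilation; later calls measure run time only.
for i in 1:3
    @time winter(x)
end
```

Timing three runs separates the one-off compile cost (1st run) from the steady-state run time (2nd and 3rd runs).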

Native: (no breakpoints)

winter(x)

  • 1st Run: 0.046176 seconds (76.61 k allocations: 4.036 MiB)
  • 2nd Run: 0.004589 seconds (5 allocations: 176 bytes)
  • 3rd Run: 0.004865 seconds (5 allocations: 176 bytes)

No breakpoints

run_interpretted(winter, x)

  • 1st Run: 28.640364 seconds (275.66 M allocations: 7.673 GiB, 2.57% gc time)
  • 2nd Run: 24.058352 seconds (269.59 M allocations: 7.384 GiB, 1.47% gc time)
  • 3rd Run: 23.998528 seconds (269.59 M allocations: 7.384 GiB, 1.48% gc time)

run_mixedmode(winter, x)

  • 1st Run: 1.085078 seconds (3.91 M allocations: 204.089 MiB, 2.51% gc time)
  • 2nd Run: 0.003748 seconds (5 allocations: 176 bytes)
  • 3rd Run: 0.004449 seconds (5 allocations: 176 bytes)

Runtime for mixed mode is the clear winner here, since it runs at native speed
when there are no breakpoints.

The mixed-mode first run, which includes compilation, was roughly 24x slower than the native first run (1.085 s vs 0.046 s).
There is some fiddling that one can do with the compile time, with regard to what code gets generated.

Breakpoint on winter_s1

This has the same performance as putting a breakpoint on any line in winter_s1.

run_interpretted(winter, x)

  • 1st Run: 29.471830 seconds (275.64 M allocations: 7.672 GiB, 2.47% gc time)
  • 2nd Run: 25.295872 seconds (269.60 M allocations: 7.384 GiB, 1.56% gc time)
  • 3rd Run: 25.142979 seconds (269.60 M allocations: 7.384 GiB, 1.61% gc time)

run_mixedmode(winter, x)

  • 1st Run: 24.457268 seconds (272.33 M allocations: 7.520 GiB, 1.69% gc time)
  • 2nd Run: 23.362447 seconds (269.60 M allocations: 7.384 GiB, 1.60% gc time)
  • 3rd Run: 25.369876 seconds (269.60 M allocations: 7.384 GiB, 1.57% gc time)

So in this case it falls back to the same performance as interpreting.
This is unsurprising, since winter_s1 is where all the work is actually done,
so that whole function is getting interpreted.

Breakpoint on winter_s1s1

This has the same performance as putting a breakpoint on any line in winter_s1s1.

run_interpretted(winter, x)

  • 1st Run: 28.968437 seconds (275.56 M allocations: 7.670 GiB, 2.54% gc time)
  • 2nd Run: 24.877208 seconds (269.51 M allocations: 7.382 GiB, 1.51% gc time)
  • 3rd Run: 24.963561 seconds (269.51 M allocations: 7.382 GiB, 1.61% gc time)

run_mixedmode(winter, x)

  • 1st Run: 1.144460 seconds (3.96 M allocations: 207.329 MiB, 2.82% gc time)
  • 2nd Run: 0.004416 seconds (185 allocations: 11.656 KiB)
  • 3rd Run: 0.004966 seconds (185 allocations: 11.656 KiB)

So here mixed mode gets native speed,
since almost none of the work is done in winter_s1s1;
by the time it is called, the loop is over.

Breakpoint on exp

This is the same as putting a breakpoint inside Base.exp.

run_interpretted(winter, x)

  • 1st Run: 30.114210 seconds (278.19 M allocations: 7.741 GiB, 2.94% gc time)
  • 2nd Run: 26.520003 seconds (272.12 M allocations: 7.452 GiB, 1.46% gc time)
  • 3rd Run: 26.604521 seconds (272.12 M allocations: 7.452 GiB, 1.57% gc time)

This doesn't terminate within 10 minutes for mixed mode.
Here we hit the breakpoint on every iteration of the loop,
so we have to keep switching into interpreted mode,
which I guess has some startup overhead.

I don't think it is a particularly realistic case,
since this is hitting that breakpoint 500_000 times,
whereas normally a programmer would hit it once or twice,
then disable it / stop debugging.

But it does highlight that we need to make sure that when a breakpoint is disabled,
the overdubs that switch it into interpreted mode are also removed.
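
One way to picture that requirement is a toy model in which only enabled breakpoints can force the switch. This is a sketch under assumptions, not how the prototype works: `ToyBreakpoint`, `TOY_BREAKPOINTS`, `needs_interpreter` and `execution_mode` are all hypothetical names.

```julia
# Hypothetical breakpoint record: a target function plus an enabled flag.
mutable struct ToyBreakpoint
    f::Any
    enabled::Bool
end

const TOY_BREAKPOINTS = ToyBreakpoint[]

# Only enabled breakpoints should force a switch to the interpreter.
needs_interpreter(f) = any(bp -> bp.enabled && bp.f === f, TOY_BREAKPOINTS)

# Stand-in for the mixed-mode dispatch; reports which path would be taken.
execution_mode(f) = needs_interpreter(f) ? :interpreted : :compiled

bp = ToyBreakpoint(exp, true)
push!(TOY_BREAKPOINTS, bp)
mode_enabled = execution_mode(exp)     # breakpoint active

bp.enabled = false                     # disabling must also undo the switch
mode_disabled = execution_mode(exp)
```

Checking the flag at dispatch time means a disabled breakpoint costs nothing beyond the lookup; a real pass might instead invalidate and regenerate the affected overdubs.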

@oxinabox
Contributor Author

oxinabox commented Oct 27, 2019

There is a case where the breakpoint on exp is a problem:
if it is a conditional breakpoint.
Since mixed mode switches to interpreted mode regardless of whether the breakpoint is conditional or not,
that case can be hit.
There are workarounds, like setting a (conditional or otherwise) breakpoint on the method that calls the function with the conditional breakpoint, if it is in a tight inner loop, so that execution is already in interpreted mode.
Also, we can probably make interpreted mode much cheaper to start than it is right now.
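
The effect of that workaround can be sketched with a toy that only counts how often execution would switch into the interpreter. All names (`SWITCH_BREAKPOINTS`, `SWITCHES`, `IN_INTERPRETER`, `counted_call`, `inner`, `outer`, `count_switches`) are hypothetical, and "interpreting" is modelled by a flag rather than by a real interpreter.

```julia
const SWITCH_BREAKPOINTS = Set{Any}()
const SWITCHES = Ref(0)            # how many times we enter the "interpreter"
const IN_INTERPRETER = Ref(false)  # true while already interpreting

# Switch only when compiled code reaches a function carrying a breakpoint.
function counted_call(f, args...)
    if !IN_INTERPRETER[] && f in SWITCH_BREAKPOINTS
        SWITCHES[] += 1
        IN_INTERPRETER[] = true
        try
            return f(args...)
        finally
            IN_INTERPRETER[] = false
        end
    end
    return f(args...)
end

inner(a) = exp(a)

# A tight loop calling `inner`, analogous to winter_s1 calling exp.
function outer(A)
    s = 0.0
    for a in A
        s += counted_call(inner, a)
    end
    return s
end

# Count switches when the only breakpoint is on `target`.
function count_switches(target, A)
    empty!(SWITCH_BREAKPOINTS)
    push!(SWITCH_BREAKPOINTS, target)
    SWITCHES[] = 0
    counted_call(outer, A)
    return SWITCHES[]
end

A = rand(1_000)
switches_inner = count_switches(inner, A)  # breakpoint inside the loop
switches_outer = count_switches(outer, A)  # breakpoint on the caller
```

With the breakpoint on `inner` the count equals `length(A)`; with it on `outer` there is a single switch, at the price of interpreting the whole loop.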

@oxinabox
Contributor Author

"You also wouldn't need to prefix your function invocation with @run, which is also something people have complained about."

Getting a bit ahead of myself here, but this kind of mixed mode would also solve that in Juno.
Since there is basically no overhead when no breakpoints are set,
one can just have the Juno REPL always run as a debugger in mixed mode.
A bit of a scary idea, but possible.

@oxinabox
Contributor Author

@timholy @KristofferC what's next for this idea?
If I put in the work to get it into JuliaInterpreter, would that be a thing we would like?
Shall we wait and talk about it at JuliaCon?

@timholy
Member

timholy commented Jan 30, 2020

I'm a bit swamped now, maybe talk at JuliaCon? Or in a month and a half, I may be able to carve out some time around the mid-March timeframe.

@oxinabox
Contributor Author

I'm happy to hold until JuliaCon.
