Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Unrooted" backtraces #19

Closed
timholy opened this issue Jun 19, 2013 · 4 comments
Closed

"Unrooted" backtraces #19

timholy opened this issue Jun 19, 2013 · 4 comments

Comments

@timholy
Copy link
Owner

timholy commented Jun 19, 2013

I've seen something odd with the sampling profiler for some time, and finally got around to doing a little debugging. At least when run through the profiler, rec_backtrace sometimes return an "unrooted" backtrace. Here's a simple example file that one can use to illustrate this:

# A helpful utility for debugging
lookup(ip) = ccall(:jl_lookup_code_address, Any, (Ptr{Void}, Bool), ip, true)

# Now we get to code we'll be profiling
# Here's a much faster variant of iceil for when you know everything is finite (would be even faster inlined...)
function unsafe_iceil{T<:FloatingPoint}(x::T)
    ix = itrunc(x)
    fx = convert(T,ix)
    ix + (x>fx)
end

function doiceil(x::Vector)
    sc = 0
    for i = 1:length(x)
        sc += unsafe_iceil(x[i])
    end
end

If you're digging into this, one can optionally modify profile.c to print a little information each time a backtrace is captured:

        int n = rec_backtrace(bt_data_prof+bt_size_cur, bt_size_max-bt_size_cur-1);
        printf("Size is %d\n", n);
        printf("3rd is %lx\n", bt_data_prof[bt_size_cur+2]);
//         if (n == 3)
            printf("1st is %lx, and second is %lx\n", bt_data_prof[bt_size_cur], bt_data_prof[bt_size_cur+1]);
        bt_size_cur += n;

However, this isn't required if you just want to run @sprofile normally.

Now if we say x = linspace(0,10,10^8); and then profile doiceil(x), we find some curious observations:

  • Some backtraces are of length 19, and the rest are of length 3. One can deduce this via analyzing the data from sprofile_get() or via the printf statements in the profile.c patch above.
  • In all cases, the first two instruction pointers (ip) correspond (via lookup above) to (:???,:/home/tim/src/julia-modules/Profile.jl/deps/libprofile.so,0) and (:???,:/lib/x86_64-linux-gnu/libc.so.6,0). The also have the same pointer address each time. These are the ips generated from the signal (so this makes perfect sense).
  • For backtraces of length 3, one often finds the third ip corresponds to (:unsafe_iceil,:/tmp/unrooted.jl,5). That itself is very sensible. What's weird is that it claims not to be being called by doiceil (it is "unrooted")---it's as if unsafe_iceil is sitting at the head of the stack.
  • For backtraces of length 19, sometimes the third ip corresponds to something perfectly-sensible like (:unsafe_iceil,:/tmp/unrooted.jl,5). In such cases the pointer is in the same general memory region as for backtraces of length 3 that end up with the same lookup result.
  • For other backtraces of length 19, sometimes one gets (:???,:???,0) for the third ip. In such cases the pointer is relatively far away from locations that are typical of the sensible ones (examples: 0x7f70a7487a20 for the nonsensical one, whereas unsafe_iceil had one at 0x7f70a49f24c0 and doiceil had one at 0x7f70a49f2470). "Good" pointers have many different values, whereas it seems that any time the third pointer was not in the 0x7f70a49f region, it was always at this same 0x7f70a7487a20. However, the ips after this weird third one seem fine.

The full sprofile_tree report looks like this:

46  /tmp/unrooted.jl; unsafe_iceil; line: 4
12  /tmp/unrooted.jl; unsafe_iceil; line: 5
89  /tmp/unrooted.jl; unsafe_iceil; line: 6
246 julia; ???; line: 0
 246 /lib/x86_64-linux-gnu/libc.so.6; __libc_start_main; line: 237
  246 julia; ???; line: 0
   246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; julia_trampoline; line: 131
    246 julia; ???; line: 0
     246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
      246 ???; ???; line: 0
       246 client.jl; _start; line: 341
        246 client.jl; run_repl; line: 143
         246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
          246 ???; ???; line: 0
           246 client.jl; eval_user_input; line: 68
            246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
             246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
              246 /home/tim/src/julia-modules/Profile.jl/src/sprofile.jl; anonymous; line: 315
               246 /home/tim/src/julia/usr/bin/../lib/libjulia-release.so; ???; line: 0
                34  /tmp/unrooted.jl; doiceil; line: 11
                138 /tmp/unrooted.jl; doiceil; line: 12
                42  /tmp/unrooted.jl; unsafe_iceil; line: 4
                32  ???; ???; line: 0

Assuming we ignore the pointer value and just consider its lookup, there are 7 types of backtraces represented here:

  • The first three lines are "unrooted" (these are the ones I'm most concerned about)
  • There are 4 types of length-19 backtraces. The ones that terminate in /tmp/unrooted.jl are reasonably normal, and how I'd like all of them to behave. The one oddity is that the 42 samples ending in unsafe_iceil seem as if they should be of length 20, not 19, and descend from doiceil. However, I've checked and there really are only 2 distinct lengths to the backtraces. (Or, perhaps this represents inlining, and the backtrace behavior has changed relative to earlier versions of julia.)
  • The ??? one, on the last line, are the length-19 backtraces with weird 3rd pointer.

At least for this example, the proportion of unrooted backtraces is disturbingly large.

@timholy
Copy link
Owner Author

timholy commented Jun 19, 2013

I have also verified (by jl_printf in rec_backtrace) that there's nothing weird about the maxsize that profile.c feeds to rec_backtrace. In other words, this very likely is a libunwind bug. Alternatively, might there be anything weird about Julia's stack frames?

@timholy
Copy link
Owner Author

timholy commented Jun 19, 2013

CC @vtjnash, @ViralBShah, @JeffBezanson, @StefanKarpinski. This issue is nowhere near as serious as #14 (which is OSX-specific) and #16 (which Jameson seems to have fixed, once it gets applied to master), but it's worth knowing about in terms of planning for SProfile's eventual inclusion into base Julia.

I've verified that this issue is unchanged by changing our task.c:rec_backtrace from

 while (unw_step(&cursor) && n < maxsize) {

to

 while (unw_step(&cursor)>0 && n < maxsize) {

(although the latter seems technically more correct, so perhaps we should anyway).

I've glanced at the code for unw_step, enough to know that it is non-trivial (more than I'm willing to tackle today, that's for sure).

@blakejohnson
Copy link
Contributor

In trying to fix #14, I have made exactly the change Tim suggests to rec_backtrace. Not that it fixed the problem, but it is more correct.

@timholy
Copy link
Owner Author

timholy commented Jun 19, 2013

I have a hunch now that this isn't confined just to Profile. Occasionally I've seen BoundsError() fail to give a complete backtrace: it only shows ERROR: BoundsError() without any information about the line that caused the error (or anything that in turn called that line). This seems identical to these unrooted backtraces. My half-hearted attempts to come up with a simple test case have not yet yielded a positive result, but when it comes up again I'll be more energetic.

Has anyone else seen this phenomenon?

This was referenced Jun 20, 2013
@timholy timholy closed this as completed Nov 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants