Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More instrumentation (memory allocation, also fixes #7259) #7464

Merged
merged 1 commit into from
Jul 9, 2014

Conversation

timholy
Copy link
Sponsor Member

@timholy timholy commented Jun 30, 2014

This adds the ability to track the amount of memory allocated by each line of code. It mimics the --code-coverage functionality, writing a filename.jl.mlc (mlc = malloc, perhaps there's a better extension?) listing the number of bytes allocated by each line.

It's already quite useful, but not perfect, and I confess to not really understanding what's wrong. Here's a test script:

function alloc1()
    s = 0
    for x = 1:3
        s += x
    end
    A = zeros(4000, 7000)
    s2 = sum(A)
    nothing
end

function alloc2()
    s = 0
    for x = 1:3
        s += x/2  # deliberately type-unstable
    end
    s
end

function runner()
    alloc1()
    alloc2()
end

runner()
println("Something")
println("Clearing allocation data to eliminate memory allocated during JITting")
clear_malloc_data()
runner()

which you use like this

julia --track-allocation=user logallocation.jl

Here's the resulting .mlc file: (EDIT: the off-by-one bug has been fixed)

       - function alloc1()
        0     s = 0
        0     for x = 1:3
        0         s += x
        -     end
        0     A = zeros(4000, 7000)
224000048     s2 = sum(A)
        0     nothing
        - end
        - 
        - function alloc2()
        0     s = 0
        0     for x = 1:3
       64         s += x/2  # deliberately type-unstable
        -     end
       32     s
        - end
        - 
        - function runner()
      192     alloc1()
        0     alloc2()
        - end
        - 
        - runner()
        - println("Something")
        - println("Clearing allocation data to eliminate memory allocated during JITting")
        - clear_malloc_data()
        - runner()

The core algorithm is to look for a change in jl_gc_total_bytes since the last line.

Observations:

  • There seems to be an off-by-one bug even though I attempted to insert the memory-check instructions after each line had already been built.
  • Anything run from the global scope (for example, a test script in a package's tests/) shows allocation on the first line of the function.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 15c9ef3 on teh/malloclog into * on master*.

@IainNZ
Copy link
Member

IainNZ commented Jul 1, 2014

Wait --code-coverage works for Base?
Also, this is a really cool idea

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 09adfd5 on teh/malloclog into * on master*.

@jakebolewski
Copy link
Member

This is cool but could you turn off the coveralls reporting on every pull request? It seems like it is going to generate a lot of noise.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 1, 2014

Wait --code-coverage works for Base?

In this PR it does 😄. But there are still issues I'm working out.

This is cool but could you turn off the coveralls reporting on every pull request? It seems like it is going to generate a lot of noise.

I think coverage-testing is going to have to be a separate event. I'm pretty sure you're going to have to test with a single core (at least, until someone implements a way to transport coverage data from one process to another), and I kinda suspect we may even need to disable inlining. If so, it's going to be so freaking slow that we don't want to subject every PR to it.

So, my modifications to .travis.yml are just for my testing (that commit is titled "Dirty hack" for precisely that reason), and won't be part of the final PR. There are still some oddities I'm working out. I apologize in advance for any extra noise.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 2, 2014

I'm finding myself in a nasty situation where all tests pass on my laptop, they pass on a second machine in my lab (with a different distro and architecture), but I get this segfault when running on Travis. Frustratingly, this is the second time this has happened to me. I'm taking the mildly-heroic step of running the entire test suite using a single core under valgrind. That run will probably take something like 24 hours; in the meantime, does anyone have any ideas? This one is not some kind of file permission error, is it?

@carnaval
Copy link
Contributor

carnaval commented Jul 2, 2014

In my experience, a polite mail to support@travis will get you ssh credentials to an identical travis vm for 24h, this could help you debugging if the segfault is decently reproducible. Nice piece of work btw.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 2, 2014

Ooh, that's a great tip, many thanks. Since I'm about to head out of town for the 4th and will have only spotty internet access, I'll wait to use those precious 24hrs until I get back.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 8, 2014

Got a Makefile/installation problem. Looking for advice from folks like @staticfloat, @tkelman, @ViralBShah, or anyone else who knows more about these matters than me, esp. across platforms.

Bottom line: for code_coverage=all to work, it needs to know where to find base/. What would be the best way to communicate to runtime julia where base/ is? The options I see are built on the observation JULIA_HOME is known at runtime:

  • In JULIA_HOME put a symlink to base/.
  • Write a file called $JULIA_HOME/path_to_base that contains a string describing the full path to base/
  • Compile the value into the julia executable. (This option seems bad because what if someone moves things around after compilation?)

Any advice about which to choose? Or alternative suggestions? Presumably this should be implemented in the Makefile?

(Note: the location of base/, relative to the julia binary, can vary depending on whether we do a make install step. On Travis the binary gets installed in /tmp/julia/bin and the code for base is in /tmp/julia/share/julia/base. That's different from the case where you build & run from the cloned directory, where the binary is in $CLONED_DIR/usr/bin and base is in $CLONED_DIR/base.)

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 8, 2014

(Oh, and thanks to @carnaval about the suggestion to request temporary Travis access. Very helpful here and in #6877.)

@tkelman
Copy link
Contributor

tkelman commented Jul 8, 2014

It looks like the makefiles are setting up a set of symlinks (or NTFS junctions for anyone who builds from source on Windows) well before make install, so $CLONED_DIR/usr/share/julia/base is a symlink to $CLONED_DIR/base. I think it would be safe to assume base is always at $JULIA_HOME/../share/julia/base.

@ViralBShah
Copy link
Member

Given that we do have code in our runtime to locate where the .jl files in base are, we probably can use the same directory lookup logic to find the location of base.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 9, 2014

Ok, that's very helpful.

We have our first test coverage number: 81%. But that number is probably a significant overestimate, see #7541.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 9, 2014

Hmm, Coveralls prefers to display master, for which we got what was probably a partial result earlier. The actual number seems to be 91%.

@timholy timholy changed the title WIP: More instrumentation (memory allocation, also fixes #7259) More instrumentation (memory allocation, also fixes #7259) Jul 9, 2014
@timholy
Copy link
Sponsor Member Author

timholy commented Jul 9, 2014

Working, cleaned up, and ready to merge when the time is right.

@StefanKarpinski
Copy link
Sponsor Member

That's amazingly high. Like unbelievably high – even given that this is the percentage of coverage of functions that we actually call while testing. I think that a good way to break this down might be to combine this number with the percentage of functions that we actually test.

@StefanKarpinski
Copy link
Sponsor Member

This seems pretty low risk and like it should be merged as soon as is reasonable.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 9, 2014

This seems pretty low risk and like it should be merged as soon as is reasonable.

Anything that touches the compiler seems like it shouldn't be "low risk," but here I know what you mean.

That's amazingly high. Like unbelievably high

Indeed. I think the main value is for a human looking over the results manually; one can quickly find a bunch of functions that should probably get tests.

I think soon we should implement turning off inlining, but I'm inclined to merge as-is so we can get at least one night's run by PackageEval before fiddling with this more.

Besides, I think the most exciting part of this is the part on memory allocation, and that's not really influenced by all the difficult issues surrounding code-coverage.

timholy added a commit that referenced this pull request Jul 9, 2014
More instrumentation (memory allocation, also fixes #7259)
@timholy timholy merged commit a60de70 into master Jul 9, 2014
@IainNZ
Copy link
Member

IainNZ commented Jul 10, 2014

So @timholy we/I should probably add support to this for Coverage.jl, I'm thinking maybe in a way that sorts lines by memory allocation?

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 10, 2014

@IainNZ, I'll submit a PR.

One thing that never happened was bikeshedding on the name of the output files. Are people happy with the *.mlc extension? Would *.mem be better? Certainly not too late to change at this point.

@StefanKarpinski
Copy link
Sponsor Member

I like .mem.

@timholy
Copy link
Sponsor Member Author

timholy commented Jul 12, 2014

Changed to .mem.

@timholy timholy deleted the teh/malloclog branch July 24, 2014 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants