-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UBSAN issue with sanitizers #229
Comments
The error reported is: But NimArrType is a base class to NimArr<2, double>, and the call is to a virtual member function. So it is complaining about a call to a base class virtual member function via a base class pointer that actually points to a derived class, as it should. Should be ok, and the code has worked for a long time and is used a lot in the system. It looks like a bit of work to set up Dirk’s docker image for re-creating the AddressSanitizer result. If you get it working I could try to exploring more. Or, can we just remove the @example (or set it to not run) and move on? I think this is a non-issue. |
I don't know how CRAN will react if we simply remove the @example. And I'm also not sure if they will balk at the warning related to issue #228, which also shows up in the output from our examples when run under R-devel. I suppose I could email Kurt back and ask for the configuration they are using to try to replicate this. More info on the address sanitizer is here I tried putting this in .R/Makevars: But now I get this error with the nfVar example instead:
More generally when I add the "-fsanitize=undefined" (this seems to correspond to the UBSAN checking) I also get an error in loading our .so's (the nfVar example or more generally in trying to compileNimble or even load the nimble package) with a similar error to the above: I can't figure out Dirk's docker container. It seems to be advertized as using clang, but when I run it, it seems to use gcc. Plus it doesn't seem to be doing the address/ubsan checking - NIMBLE installs fine and the nfVar example is fine. |
The undefined symbols might be because R has to be built with certain -f flags related to the address/ubsan stuff. I'll monkey around with that a bit more. |
Hey @paciorek , just to double check, you are using (apologies for the out-of-the-blue comment, I don't normally pay attention to any nimble issues but I do work with Dirk on rocker images; though admittedly ubsan containers are entirely his baby) |
My experience thus far with CRAN is that they will not care: there are I've had a surprisingly high amount of support from them when there is a
Under normal circumstances, they certainly would. On Tue, Sep 13, 2016 at 8:30 AM, Christopher Paciorek <
|
The undefined symbol looks like something from the addressSanitizer that is getting lost in nimble's on-the-fly compilation. I am trying to create, outside of R, a boiled down case of the class hierarchy and compilation scheme involved and see if I can re-create it that way. @paciorek if you want to keep pushing on re-creating CRAN's result, I suspect we need those flags in src/Makevars and inst/make/Makevars.in, so they will be picked up in both package building and on-the-fly compilation. Thanks @pistacliffcho. I think it is worth giving this some further attention but then planning to just remove it from the example in the Rd. We can include a note to CRAN. |
So here is what I suspect is going on: It may be an issue created by nimble.so and libnimble.a having some of the same code compiled into them, which is for simplicity in our build process but does create some duplication in binary libraries. I think a pointer to an object of a derived class (NimArr<2, double>) created in code linked against libnimble.a is then being passed as a void* (since R external pointers end up cast to void*) into a function in nimble.so (not libnimble.a) that casts it back to something meaningful and uses it. But it casts it to a base class pointer, and everything should be completely identical to what is in libnimble.a, but it may not be recognized by the address sanitizer as part of the same class hierarchy since they are compiled twice and appear in binary libraries redundantly. We have never had a problem from this. This guess at what's happening could be wrong. If it's right, it involves some of the same C++ code organization issues around the CRAN system that we've had challenges with before. I'll think about if there are easy workarounds, but just dropping the example still seems like a good path. |
Thanks, @cboettig ! I hadn't realized RD was there and was very confused about the container being for clang+Rdevel but not seeing R-devel with clang there. I'm still not sure how to run check.r such that it uses RD rather than R. Can you clarify? |
@pistacliffcho when you say run through valgrind, I'm not sure how that is related to the memtest UBSAN thing. Do you know how they do the valgrind checking? |
I wasn't specifically referring to memtest UBSAN, but just in general the My understanding of they do is just something like this: R -d valgrind --vanilla < mypkg-Ex.R (example lifted from Writing R Extensions 4.3.2). I had a conversation with if(c_vector[i] > 0 && i < c_vector.size() ) {...} I'm not a valgrind expert at all, but I have heard that it runs much On Tue, Sep 13, 2016 at 11:42 AM, Christopher Paciorek <
|
I haven't proven if it's the double compilation, but it's still my best guess, and I have confirmed that a derived object created from one compilation is being handled in code from the other, which is as expected. If we want to address this underlying issue, I think it would take some careful following of threads about which functions are called in which contexts and could be separated into only libnimble and kept out of nimble.so and called via R's dll information so that when multiple projects are compiled we only use the utility functions built (redundantly) into that compilation unit. This has been on the to-do list for a while and would be a good idea, but since things haven't been breaking, it hasn't been the top priority. |
I asked Kurt H for more info on reproducing how CRAN is generating the Let's see what I hear and then make a decision about simply removing ?nfVar. On Tue, Sep 13, 2016 at 1:40 PM, perrydv [email protected] wrote:
|
I think I've fixed the situation for the particular cpp function in question. Questions are: Am I right about the underlying problem, since we haven't been able to reproduce it in the first place? And how much pre-emptive modifications of other functions in similar compilation situations is worth it? I'm going to start looking at some other cases. |
Ok, hopefully we get more info from Kurt and then we can reproduce the original error and see if this fixes it. If we don't get more info or still can't reproduce the original error, then I think we can say we've done our due diligence with the -fsanitize=undefined and just resubmit. |
here's the configuration info from Kurt, so it's basically what I've been trying but I was using an older clang. clang-3.9 is not available via an ubuntu package so I may try to test with clang-3.8 instead. Perry, what branch has your fix to the cpp function used in the nfVar example? ====== from Kurt H. ======================== On Debian testing I configure with CC="clang-3.9 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero,vptr -fno-omit-frame-pointer" and similarly for CXX. Not sure about clang-3.5, in general you'd want The .Rout file at RcppUtils.cpp:1044:20: runtime error: member call on address 0x60d000002018 which does not point to an object of type 'NimArrType' |
compilation_overlaps At the moment it has some other changes and it outputs some diagnostic messages. I've made more progress isolating use of some functions just to nimble.so or libnimble. Partly this is pre-emptive against further addressSanitizer risks, and partly it is because it is a good idea anyway. |
For future ref, here's what seems to work to build R-devel with *SAN support: ./configure --prefix=/usr/local/R-devel_2016-09-12-clang38 'CC=clang-3.8 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero,vptr -fno-omit-frame-pointer' 'MAIN_LDFLAGS=-fsanitize=undefined,address -fno-sanitize=float-divide-by-zero,vptr -fno-omit-frame-pointer' 'CFLAGS=-pipe -std=gnu99 -Wall -pedantic -g' 'F77=gfortran' 'FFLAGS=-pipe -Wall -pedantic -g' 'CXX=clang++-3.8 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero,vptr -fno-omit-frame-pointer' 'CXXFLAGS=-pipe -Wall -pedantic -g' 'CXX1X=clang++-3.8 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero,vptr -fno-omit-frame-pointer' 'FC=gfortran' 'FCFLAGS=-pipe -Wall -pedantic -g' --with-cairo --with-jpeglib --with-readline --with-tcltk --with-lapack --enable-R-profiling --enable-R-shlib Also note to self - need to turn vm.overcommit_memory to 2 on my Linux box or building R will fail because conftest will not run as ASAN tries to allocate a ridiculous amount of memory. |
I'm getting this when I run R CMD check (without the *SAN business) on nimble_0.6-1.tar.gz from compilation_overlaps.
Not sure if compilation_overlaps is obsolete - I was hoping to test just the nfVar fix... |
Ok, well, I have R-devel built with clang-3.8 with ASAN and UBSAN checking working in Ubuntu but it's not reproducing the clang-UBSAN error for nfVar reported by CRAN. Frustrating. Perhaps because CRAN may be using clang-3.9 but that feels a bit unlikely. |
R-devel CMD check on compilation_overlaps with ASAN/UBSAN checking is giving the following. I'm also filing a separate note on R-devel CMD check on compilation_overlaps3. makeNimbleFxnCppCopyTypes: Error while checking: C stack usage 8011696 ==13564==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1368 byte(s) in 3 object(s) allocated from: |
here's what I get running R-devel with clang and *SAN checking on compilation_overlaps3:
==========/var/tmp/paciorek/nimble-dev/nimble/packages/nimble.Rcheck/00install.out===========
|
No, compiler_overlaps3 isn't supposed to work yet. I'm not sure what the status of compiler_overlaps was for the version you had. I suspect all the warnings of memory leaks from the sanitizer checks are for similar reasons that I suspect (but we can't confirm) for the first address sanitizer warning we saw: In some cases we have relied on duplicate compilations of exactly the same code and counted on the ability to instantiate an object from one compilation unit and then pass a pointer to it to a function in the other compilation unit and have that function use the pointed-to class correctly because both were compiled identically. It's annoying to do it this way but we've never had a problem. It was essentially a workaround to the difficulty of building and distributing a so/dll via CRAN that our on-the-fly compilation could link to. The message about the C stack usage being too close to the limit is from a pure R function that is not doing anything with our compiled code, so on first glance it would seem to involve R itself and not our compiled code. Does CRAN actually run the LeakSanitizer? That output does not easily reveal what is at issue, and I have no way to know if it can see R and objects from our compiled code interacting, such as if R has an external pointer to an object from our compiled code. |
Yeah, not clear on what CRAN runs (Kurt indicated both ASAN and UBSAN but the naming of the testing that we failed was clang-UBSAN). And if they do run LeakSanitizer, which seems to be the case, why didn't our submission fail based on that... and why can't we reproduce the nfVar error... arggh. |
I did things more carefully and verified that with UBSAN (i.e., no -fsanitize=address) then R CMD check with R-devel and clang-3.9 does not give any memory leak issues, so that suggests that is what CRAN is running. That said, I can't get the nfVar error to show up. I've just been running the nfVar example directly. I wonder if somehow running all the examples together would do it, but I presume that happens in R CMD check, and we're not seeing an error report from there. |
Could it have to do with the environment in which the examples are actually run during building of the Rd pages?
|
my running of R CMD check creates nimble.Rcheck/nimble-Ex.Rout that has the same format as we see in the clang-UBSAN file posted by CRAN, so that suggests it is being created as part of R CMD check. |
branch compilation_overlaps3 is now ready to be tested. specifically @paciorek please check it with your UBSAN setup. all other testing of people's code is also welcome on this branch. the changes represent an overdue cleanup and reorganization of which compilation unit has which built-in interface function and how the .Call()s in the R code look up the compiled functions they need. I've run through all our testing, so the remaining risk would be any cases not represented in the test suite. A new module for the test suite could systematically check every feature of compilation and ensuing build-in interface calls, but that will be a separate issue. By every feature of compilation, I don't mean every keyword but rather every use of every interface functions via the situations in which they are needed. |
I'm seeing the following when running R CMD check using ASAN+UBSAN. Note that all the address/memory stuff showing up when checking examples is coming from teh ASAN checking. www.stat.berkeley.edu/~paciorek/transfer/00check.log Also the following was printed after the output of R CMD check:
|
I can't do anything with this output unless you can isolate the example or call each one comes from as specifically as possible. |
Ok, now I see the output at the link you gave. That is a little more useful. Let's talk tomorrow. |
Ok, I think this long saga is nearly dealt with. On branch compilation_overlaps4:
|
What's the issue with the roxygen for |
I'm closing this now, as we think the issue is dealt with. It's difficult to tell since we couldn't reproduce it. (@danielturek, I think the runMCMC roxygen issue may have been that Rd was out of date with roxygen and something wasn't happy, but it seems to resolve itself when Chris rebuilt the Rd files) |
Good to know. If that was at some point assigned to me, then I had lost track of it anyway. |
CRAN folks noted that we have another problem that we hadn't noticed on our page of checking results for nimble:
https://cran.r-project.org/web/checks/check_results_nimble.html
Memtest notes: clang-UBSAN
See: https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/nimble-Ex.Rout
search for "runtime error" and you'll see an issue with compilation of our nfVar() example.
This post from Dirk seems useful and I will try to run his test on 0.6-1 in his docker container and see what I discover.
https://www.r-bloggers.com/running-ubsan-tests-via-clang-with-rocker/
Note that issue #228 is also revealed in the checking of nimble's examples... Not sure if CRAN will insist we fix that.
The text was updated successfully, but these errors were encountered: