Impute srcrefs for subexpressions #154

krlmlr · 2016-03-09T15:33:06Z

This allows collecting coverage for non-braced subexpressions in if, for, and while clauses. It works by imputing missing srcrefs on the fly (in trace_calls()) from the output of getParseData().
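A minimal sketch of the imputation idea follows. It assumes the column layout of utils::getParseData() output (line1, col1, line2, col2, id, parent, token) and uses base::srcref() to build a srcref from a position. child_srcrefs() is a hypothetical helper, not the code in this PR: it finds every if node in the parse data and turns its expr children (the condition and the unbraced body) into srcrefs. Columns are passed where srcref() expects byte offsets, so the sketch is only reliable for ASCII source.

```r
src <- c(
  "f <- function() {",
  "  if (FALSE)",
  "    FALSE",
  "}"
)
sf <- srcfilecopy("<example>", src)
invisible(parse(text = src, srcfile = sf, keep.source = TRUE))
pd <- getParseData(sf)

# Hypothetical helper: srcrefs for the `expr` children of each `if` node
# (the condition and the body, plus the else branch if there is one)
child_srcrefs <- function(pd, sf, keyword = "IF") {
  if_nodes <- pd$parent[pd$token == keyword]
  kids <- pd[pd$parent %in% if_nodes & pd$token == "expr", ]
  kids <- kids[order(kids$line1, kids$col1), ]
  lapply(seq_len(nrow(kids)), function(i) {
    # srcref() expects bytes in positions 2 and 4; columns equal bytes
    # only for ASCII text
    srcref(sf, c(kids$line1[i], kids$col1[i], kids$line2[i], kids$col2[i]))
  })
}

refs <- child_srcrefs(pd, sf)
vapply(refs, as.character, character(1))
```

The second srcref points at the unbraced body on line 3, which is exactly the piece that has no srcref of its own when the body is not wrapped in braces.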

Caveats:

This doesn't work for me in package_coverage(), and I don't know why. Perhaps package_coverage() somehow needs to know all srcrefs beforehand?
Probably broken for non-ASCII characters.
The coverage percentage changes in one of the tests.

Fixes #39.

Test file:

f <- function() {
  if (FALSE)
    FALSE

  for (i in character())
    FALSE

  while (FALSE)
    FALSE

  repeat
    break
}

cv <- covr::function_coverage(f, f())
covr::shine(cv)

jimhester · 2016-03-09T16:04:30Z

This is awesome, thank you for working on it! I had a couple of aborted attempts, and this has been a nagging problem I have been wanting to fix.

There shouldn't be any difference in srcrefs for package_coverage().

However, package_coverage() does use a subprocess, so you need to actually install the package with your changes before they take effect.

krlmlr · 2016-03-09T22:17:03Z

package_coverage(): It seems that the package loading mechanism does something "evil": it concatenates all R files into one and adds #line directives. This breaks getParseData(), on which my code relies. devtools::load_all() loads the code from the original files; would you mind using that instead of loadNamespace()?

krlmlr · 2016-03-10T11:11:46Z

Using load_all() breaks the S4 tests. Bummer...

jimhester · 2016-03-10T12:58:24Z

I originally used load_all(), but as you saw, S4 doesn't work correctly with it. I also ran into the line directive issue when getting the source of each file. Here is the function that actually installs the code when you run R CMD INSTALL (https://github.com/wch/r-source/blob/b156e3a711967f58131e23c1b1dc1ea90e2f0c43/src/library/tools/R/admin.R#L205-L335).

To get around this, I think we could just re-parse the text in the parent source reference to get the token data. Something like:

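txt <- as.character(parent_ref)  # source text covered by the parent srcref
sf <- srcfilecopy("<parent>", txt)  # srcfile() takes a file name, not text
parse(text = txt, srcfile = sf, keep.source = TRUE)  # attaches parse data to sf
pd <- getParseData(sf)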

Then you need to shift the line and column numbers of the tokens by the position where the if block starts.
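One way to do that adjustment, sketched under two assumptions: parent_ref is a standard 8-element srcref (first_line, first_byte, last_line, last_byte, first_column, last_column, first_parsed, last_parsed), and as.character() on it drops the text before the first column of its first line, so only tokens on the re-parsed first line need a column shift. shift_parse_data() is a hypothetical name, and the toy data below stands in for real getParseData() output.

```r
# Hypothetical helper: map positions from a re-parsed snippet back to the
# coordinates of the file that `ref` points into.
shift_parse_data <- function(pd, ref) {
  line_offset <- ref[[1]] - 1L  # snippet line 1 is file line ref[[1]]
  col_offset <- ref[[5]] - 1L   # only the snippet's first line was truncated
  first1 <- pd$line1 == 1L
  first2 <- pd$line2 == 1L
  pd$col1[first1] <- pd$col1[first1] + col_offset
  pd$col2[first2] <- pd$col2[first2] + col_offset
  pd$line1 <- pd$line1 + line_offset
  pd$line2 <- pd$line2 + line_offset
  pd
}

# Toy parse data and a srcref-like vector starting at line 10, column 4
pd <- data.frame(line1 = c(1L, 2L), col1 = c(1L, 3L),
                 line2 = c(1L, 2L), col2 = c(5L, 7L))
ref <- c(10L, 4L, 11L, 7L, 4L, 7L, 10L, 11L)
shift_parse_data(pd, ref)
```

Tokens on later lines keep their columns because those lines are re-parsed in full; with non-ASCII text the column and byte positions can diverge, which is the encoding caveat raised elsewhere in this thread.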

krlmlr · 2016-03-10T13:28:43Z

An alternative would be to fix getParseData() in this case. The "srcfile" attribute is an environment which contains an "original" object; I think the parse data belongs there, but is instead attached to the last file in the package.

Test package: https://github.com/krlmlr/covr.dummy
Example output: http://rpubs.com/krlmlr/getParseData

Just posted to r-devel.

krlmlr · 2016-03-10T13:30:31Z

Until this is fixed in R, I suggest we tweak the package loading process: Create a "last" file ourselves with an otherwise useless function, grab the parseData from there and put it where it belongs.

jimhester · 2016-03-10T13:58:02Z

From your report it looks like the last srcref will always have the parse data attached. Could we just find that srcref and copy the parse data to the other srcrefs?

Figuring out the last file is slightly complicated due to Collate directives, locale issues etc.

krlmlr · 2016-03-10T14:05:09Z

What if there are no objects in the last file? Think good ol' zzz.R.

krlmlr · 2016-03-10T15:40:20Z

jimhester · 2016-03-10T15:53:49Z

OK, so we can get the tokens of the entire package, including line directives, from any srcref in the package with the following:

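get_tokens <- function(srcref) {
  # `original` holds all package R files concatenated, with #line directives
  lines <- attr(getSrcref(srcref), "srcfile")$original$lines
  getParseData(attr(parse(text = lines, keep.source = TRUE), "srcfile"))
}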

Then you would need to fix the line numbers based on the line directives.
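A minimal sketch of that line-number fix, assuming directives of the form #line N "file", where the line right after a directive is line N of the named file (verify this against the output of R CMD INSTALL for your R version). locate_line() is a hypothetical helper, not covr's code.

```r
# Hypothetical helper: translate a line number in the concatenated package
# source into the file and line it came from, using #line directives.
locate_line <- function(lines, n) {
  pattern <- '^#line ([0-9]+) "(.*)"$'
  directives <- grep(pattern, lines)
  above <- directives[directives < n]
  if (length(above) == 0L) {
    return(NULL)  # line n precedes the first directive
  }
  d <- max(above)  # closest directive above line n
  start <- as.integer(sub(pattern, "\\1", lines[d]))
  list(
    file = sub(pattern, "\\2", lines[d]),
    line = start + (n - d - 1L)  # the line after the directive is line `start`
  )
}

src <- c(
  '#line 1 "R/a.R"',
  "a <- function() 1",
  '#line 1 "R/b.R"',
  "b <- function() {",
  "  2",
  "}"
)
locate_line(src, 5)  # file "R/b.R", line 2
```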

This works because the expression returned by parse() carries a srcfile with the parse data attached.

krlmlr · 2016-03-10T16:11:25Z

Much easier: we don't need to reparse at all. I'm working on a custom version of getParseData() to make it a tad faster. The functionality is already in place and tests pass; only lintr keeps complaining.

krlmlr · 2016-03-10T16:53:31Z

Tests pass (AppVeyor requires an appveyor.yml file; if you delete the project there, it shouldn't show up here anymore). Character encoding might still be an issue (if we ever look at the exact column positions of the imputed srcrefs).

There will be warnings if parse data cannot be repaired.

jimhester · 2016-03-10T17:26:34Z

R/parse_data.R

+        make_srcref(5),
+        make_srcref(6, 7)
+      )
+      src_ref[seq_along(x)]


We need this to handle ifs without elses, right? It might be worth adding a comment so it is clear why the if case is different from for and while.
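Why the if case needs the extra truncation can be shown with base R alone: an if call has a different length depending on whether it has an else branch, while for and while calls always have the same shape. The slots list below is a hypothetical stand-in for the make_srcref() results in the diff; make_srcref()'s actual definition is not shown here.

```r
no_else <- quote(if (cond) a)
with_else <- quote(if (cond) a else b)
length(no_else)    # 3: `if`, condition, body
length(with_else)  # 4: `if`, condition, body, else branch

# `for` and `while` calls always have a fixed shape
length(quote(for (i in x) a))  # 4: `for`, variable, sequence, body
length(quote(while (cond) a))  # 3: `while`, condition, body

# Hypothetical: one entry per possible `if` slot, truncated to the
# slots the call actually has
slots <- list("if", "condition", "body", "else branch")
slots[seq_along(no_else)]
```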

jimhester · 2016-03-10T17:31:09Z

R/parse_data.R

+    get_parse_data(x$original)
+  else if (exists("covr_parse_data", x))
+    x$covr_parse_data
+  else if (!is.null(data <- x[["parseData"]])) {


Could you put braces around the bodies of these conditionals? I prefer to be explicit, to avoid issues in the future when statements are added and someone forgets to add braces.

Like this:

if (inherits(x, "srcref")) { get_parse_data(attr(x, "srcfile")) } else if (exists("original", x)) {

?

krlmlr · 2016-03-10T18:05:18Z

Done. Should we wait for community feedback before merging? The changes here are rather invasive.

jimhester · 2016-03-10T18:18:19Z

Well, the current behavior is clearly wrong, so if we wait, it shouldn't be for long.


* split_on_line_directives: guard against input without a directive

get_parse_data extracts lines from the input srcfile object and feeds them to split_on_line_directives, which expects the lines to be a concatenation of all the package R files, separated by #line directives. With how get_parse_data is currently called, that expectation is met. get_parse_data is called only if utils::getParseData returns NULL, and getParseData doesn't return NULL for any of the cases where the input does _not_ have line directives (i.e. entry points other than package_coverage). An upcoming commit is going to move the get_parse_data call in front of the getParseData call, so update split_on_line_directives to detect the "no directives" case. Without this guard, the mapply call in split_on_line_directives would error under an R version before 4.2; with R 4.2 or later, split_on_line_directives returns empty.

* split_on_line_directives: fix handling of single-file package case

split_on_line_directives breaks the input at #line directives and returns a named list of lines for each file. For a package with a single file under R/, there is one directive. The bounds calculation is still correct for that case. However, the return value is incorrectly a matrix rather than a list because the mapply call simplifies the result. At this point, this bug is mostly [*] unexposed because this code path is only triggered if utils::getParseData returns NULL, and it should always return a non-NULL result for the single-file package case. The next commit will reorder things, exposing the bug. Tell mapply to not simplify the result.

[*] The simplification to a matrix could also happen for multi-file packages in the unlikely event that all files have the same number of lines.

* parse_data: promote custom parse logic for R 4.4 compatibility

utils::getParseData has a longstanding bug: for an installed package, parse data is available only for the last file [1]. To work around that, the get_tokens helper first calls getParseData and then falls back to custom logic that extracts the concatenated source lines, splits them on #line directives, and calls getParseData on each file's lines.

The getParseData bug was fixed in R 4.4.0 (r84538). Unfortunately, that change causes at least two issues (for some subset of packages): a substantial performance regression [2] and an error when applying exclusions [3]. As a result of that change, getParseData always returns non-NULL under R 4.4 when calculating package coverage (in other words, the get_parse_data fallback is _not_ triggered). The slowdown is partially due to the parse data no longer being cached across get_tokens calls. Another relevant aspect, for both the slowdown and the error applying exclusions, is likely that the new getParseData returns data for the entire package rather than the per-file parse data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of getParseData when running under R 4.4.0 or later. Instead, go with a simpler and more minimal fix: reorder the calls so that the get_parse_data call, which we know has been the primary code path for package coverage before R 4.4.0, is tried first. Leave getParseData as the fallback to handle the non-package coverage cases.

[1] #154 https://bugs.r-project.org/show_bug.cgi?id=16756
[2] As an extreme case, calling package_coverage on R.utils goes from under 15 minutes to over 6 hours.
[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of packages that hit this error.

Closes #576
Closes #579
Re: #567
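The split_on_line_directives behavior described in these commits (a named list of lines per file, a guard for input with no directives, and SIMPLIFY = FALSE so mapply never collapses the result into a matrix) can be sketched as follows. This is a hypothetical reimplementation named split_at_directives(), not covr's code; the directive format #line N "file" and returning NULL when no directive is found are assumptions.

```r
split_at_directives <- function(lines) {
  pattern <- '^#line [0-9]+ "(.*)"$'
  starts <- grep(pattern, lines)
  if (length(starts) == 0L) {
    return(NULL)  # guard: input is not a concatenated package source
  }
  ends <- c(starts[-1L] - 1L, length(lines))
  files <- mapply(
    function(from, to) lines[seq.int(from + 1L, length.out = to - from)],
    starts, ends,
    SIMPLIFY = FALSE  # keep a list even if all files have equal line counts
  )
  names(files) <- sub(pattern, "\\1", lines[starts])
  files
}

src <- c('#line 1 "R/a.R"', "a <- 1", "b <- 2",
         '#line 1 "R/b.R"', "c <- 3", "d <- 4")
str(split_at_directives(src))
```

Both files in the example have two lines, which is exactly the case where the default SIMPLIFY = TRUE would return a 2 x 2 matrix instead of a list.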

Kirill Müller added 2 commits March 9, 2016 16:27

first stab at imputing srcrefs for subexpressions

72ad35c

- percentage changes in one of the tests

don't include text

70823d1

use srcref()

f6d9d1c

Kirill Müller added 3 commits March 10, 2016 12:11

use devtools::load_all() instead of loadNamespace()

4982d94

fix coverage results

48fa2c6

lintr

12cde4e

Kirill Müller added 3 commits March 10, 2016 15:06

Revert "use devtools::load_all() instead of loadNamespace()"

954437c

This reverts commit 4982d94.

repair parse data

3a8b454

make use of get_parse_data()

1eab556

also be aware of line offsets due to #line directives

Kirill Müller added 2 commits March 10, 2016 16:41

move code

b79cd73

fix tests

bf38a63

Kirill Müller added 6 commits March 10, 2016 17:14

how it works

4e15c61

lint

8e75384

test functionality

6fb1120

faster version of getParseData()

8c694ea

- avoids sorting - could use data.table for lookup performance here

corrected values

e1ce0c9

lintr

c2984f9

don't treat repeat, usually needs compound statement anyway

e74d24c

safeguard

da897b7

seems to happen (at least) with the :: operator

krlmlr changed the title ~~WIP: Impute srcrefs for subexpressions~~ Impute srcrefs for subexpressions Mar 10, 2016

NEWS

8ce080c

krlmlr mentioned this pull request Mar 10, 2016

Finer coverage analysis tidyverse/tibble#37

Merged

jimhester reviewed Mar 10, 2016

comment

687d7a5

jimhester reviewed Mar 10, 2016

add braces

6442dd8

krlmlr mentioned this pull request Mar 11, 2016

NULL issues #156

Merged

jimhester merged commit 6442dd8 into r-lib:master Mar 17, 2016

krlmlr deleted the feature/39-impute-srcref branch March 17, 2016 13:12

MichaelChirico mentioned this pull request May 18, 2020

[BUGZILLA #16756] Parse data not available in package MichaelChirico/r-bugs#6130

Open

kyleam mentioned this pull request Oct 31, 2024

Applying exclusions raise error due to missing srcref lines #579

Closed

kyleam mentioned this pull request Nov 15, 2024

parse_data: Fix compatibility with R 4.4 #588

Merged
