Impute srcrefs for subexpressions #154

Merged
merged 22 commits into from
Mar 17, 2016

krlmlr
Member

@krlmlr krlmlr commented Mar 9, 2016

This allows collecting coverage for non-braced subexpressions in if, for, and while clauses. Works by imputing missing srcrefs on the fly (in trace_calls()) by analyzing getParseData() output.

Caveats:

  • This doesn't work for me in package_coverage(), and I don't know why. Perhaps package_coverage() somehow needs to know all srcrefs beforehand?
  • Probably broken for non-ASCII characters.
  • Percentage changes in one of the tests.

Fixes #39.

Test file:

f <- function() {
  if (FALSE)
    FALSE

  for (i in character())
    FALSE

  while (FALSE)
    FALSE

  repeat
    break
}

cv <- covr::function_coverage(f, f())
covr::shine(cv)
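
To illustrate the gap described above, here is a minimal sketch (not code from this PR) showing that the parser attaches srcrefs to the statements of a braced block but not to an unbraced `if` branch, while getParseData() still records the branch's position. The variable names are illustrative only.

```r
# Parse a small function with keep.source = TRUE so that srcrefs and parse
# data are retained even in a non-interactive session.
code <- c(
  "f <- function(x) {",
  "  if (x)",
  "    'yes'",
  "}"
)
exprs <- parse(text = code, keep.source = TRUE)
f <- eval(exprs[[1]])

# The braced body carries one srcref per element of the `{` call ...
body_refs <- attr(body(f), "srcref")

# ... but the unbraced `if` call inside it has no srcref attribute, so its
# branch cannot be located from srcrefs alone.
if_call <- body(f)[[2]]
has_srcref <- !is.null(attr(if_call, "srcref"))

# The parse data still knows where the branch `'yes'` sits in the source.
pd <- utils::getParseData(exprs)
branch <- pd[pd$token == "STR_CONST", c("line1", "col1", "line2", "col2", "text")]
print(branch)
```

The positions in `branch` are the kind of information from which a missing srcref could be imputed.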

screenshot from 2016-03-09 16-32-53

Kirill Müller added 2 commits March 9, 2016 16:27
@jimhester
Member

This is awesome, thank you for working on it. I had a couple of aborted attempts, and this has been a nagging problem I have been wanting to fix.

There shouldn't be any difference in srcrefs for package_coverage().

However package_coverage() does use a subprocess, so you need to actually install the package with your changes before they will be used.

@krlmlr
Member Author

krlmlr commented Mar 9, 2016

package_coverage(): It seems that the package loading mechanism does something "evil": It concatenates all R files to one, and adds #line directives. This breaks getParseData() on which my code relies. devtools::load_all() loads the code from the original files; would you mind using that instead of loadNamespace?

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

Using load_all() breaks the S4 tests. Bummer...

@jimhester
Member

I originally used load_all(), but as you saw S4 doesn't work correctly with it. I ran into the line directive issue for getting the source of each file as well. Here is the function which actually installs the code when you run R CMD INSTALL (https://github.com/wch/r-source/blob/b156e3a711967f58131e23c1b1dc1ea90e2f0c43/src/library/tools/R/admin.R#L205-L335).

To get around this I think we could just re-parse the text in the parent source reference to get the token data. Something like

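txt <- as.character(parent_ref)
# srcfile() expects a file name, so wrap the text in a srcfilecopy instead
sf <- srcfilecopy("<text>", txt)
parse(text = txt, srcfile = sf, keep.source = TRUE)
pd <- getParseData(sf)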

Then you need to offset the tokens' line and column numbers by the position where the if block starts.
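
A minimal sketch of that re-parse-and-offset idea, assuming ASCII source (columns and bytes coincide) and a parser-generated srcref with eight elements. The helper name `reparse_with_offset` is hypothetical and not part of covr.

```r
# Hypothetical helper: re-parse the text of a parent srcref and translate the
# token positions back into the coordinates of the original source.
reparse_with_offset <- function(parent_ref) {
  txt <- as.character(parent_ref)
  pd <- utils::getParseData(parse(text = txt, keep.source = TRUE))

  # as.character() trims the first line to where the srcref starts, so only
  # positions on the first re-parsed line need a column shift.
  col_offset <- parent_ref[[5L]] - 1L
  first1 <- pd$line1 == 1L
  first2 <- pd$line2 == 1L
  pd$col1[first1] <- pd$col1[first1] + col_offset
  pd$col2[first2] <- pd$col2[first2] + col_offset

  # Every token moves down by the number of lines preceding the srcref.
  line_offset <- parent_ref[[1L]] - 1L
  pd$line1 <- pd$line1 + line_offset
  pd$line2 <- pd$line2 + line_offset
  pd
}

code <- c("f <- function() {", "  if (FALSE)", "    FALSE", "}")
exprs <- parse(text = code, keep.source = TRUE)
f <- eval(exprs[[1]])

# srcref of the whole `if` statement inside the braced body
parent_ref <- attr(body(f), "srcref")[[2]]
shifted <- reparse_with_offset(parent_ref)
orig <- utils::getParseData(exprs)
```

After shifting, the terminal tokens in `shifted` should sit at the same positions as the corresponding tokens in `orig`.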

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

An alternative would be to fix getParseData() in this case. The "srcfile" attribute is an environment which contains an "original" object; I think the parse data belongs there, but is instead attached to the last file in the package.

Test package: https://github.com/krlmlr/covr.dummy
Example output: http://rpubs.com/krlmlr/getParseData

Just posted to r-devel.

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

Until this is fixed in R, I suggest we tweak the package loading process: Create a "last" file ourselves with an otherwise useless function, grab the parseData from there and put it where it belongs.

@jimhester
Member

From your report it looks like the last srcref will always have the parse data attached. Could we just find that srcref and copy the parse data to the other srcrefs?

Figuring out the last file is slightly complicated due to Collate directives, locale issues etc.

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

What if there are no objects in the last file? Think good ol' zzz.R.

Kirill Müller added 3 commits March 10, 2016 15:06
also be aware of line offsets due to #line directives
@krlmlr
Member Author

krlmlr commented Mar 10, 2016

screenshot from 2016-03-10 16-38-48

Kirill Müller added 2 commits March 10, 2016 16:41
@jimhester
Member

OK, so we can get the tokens of the entire package, including line directives, from any srcref in the package with the following.

get_tokens <- function(srcref) {
  getParseData(attr(parse(text = attr(getSrcref(srcref), "srcfile")$original$lines), "srcfile"))
}

Then you would need to fix the line numbers based on the line directives.

This works because the object returned by parse() carries a srcfile with the parse data attached.
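
One way the line-directive step could look, as a rough sketch: split the concatenated lines at each directive and parse every file separately, so that line numbers restart at 1 per file. This assumes directives of the form `#line N "path"`; the helper name `split_lines_on_directives` and the sample file names are made up for illustration and are not covr's implementation.

```r
# Hypothetical helper: split concatenated package source at `#line` directives
# and return a named list with one character vector of lines per file.
split_lines_on_directives <- function(lines) {
  pattern <- "^#line [0-9]+ \"(.*)\"$"
  starts <- grep(pattern, lines)
  if (length(starts) == 0L) {
    return(list())  # no directives: nothing to split
  }
  ends <- c(starts[-1L] - 1L, length(lines))
  # Map() never simplifies, so the result is always a list.
  out <- Map(function(from, to) lines[seq_len(to - from) + from], starts, ends)
  names(out) <- sub(pattern, "\\1", lines[starts])
  out
}

concatenated <- c(
  "#line 1 \"R/a.R\"",
  "a <- 1",
  "b <- 2",
  "#line 1 \"R/zzz.R\"",
  "z <- 3"
)
per_file <- split_lines_on_directives(concatenated)

# Parsing each file on its own yields line numbers relative to that file.
per_file_pd <- lapply(per_file, function(l) {
  utils::getParseData(parse(text = l, keep.source = TRUE))
})
```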

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

Much easier, don't need to reparse at all. Working on a custom version of getParseData() to make it a tad faster. Functionality is in place already, tests pass, only lintr keeps complaining.

Kirill Müller added 6 commits March 10, 2016 17:14
@krlmlr
Member Author

krlmlr commented Mar 10, 2016

Tests pass (AppVeyor requires an appveyor.yml file; if you delete the project there, it shouldn't show up here anymore). Character encoding still might be an issue (if we're ever looking at exact column position of the imputed srcrefs).

There will be warnings if parse data cannot be repaired.

seems to happen (at least) with the :: operator
@krlmlr krlmlr changed the title WIP: Impute srcrefs for subexpressions Impute srcrefs for subexpressions Mar 10, 2016
make_srcref(5),
make_srcref(6, 7)
)
src_ref[seq_along(x)]
Member

We need this to handle if's without else's right? Might be worth adding a comment so it is clear why the if case is different than for and while.

get_parse_data(x$original)
else if (exists("covr_parse_data", x))
x$covr_parse_data
else if (!is.null(data <- x[["parseData"]])) {
Member

Could you put braces around the bodies of these conditionals? I prefer to be explicit, to avoid issues in the future when statements are added and someone forgets to add braces.

Member Author

Like this:

  if (inherits(x, "srcref")) {
    get_parse_data(attr(x, "srcfile"))
  } else if (exists("original", x)) {

?

Member

yeah

@krlmlr
Member Author

krlmlr commented Mar 10, 2016

Done. Should we wait for community feedback before merging? The changes here are rather invasive.

@jimhester
Member

Well, the current behavior is clearly wrong, so if we wait, it shouldn't be for long.

@krlmlr krlmlr mentioned this pull request Mar 11, 2016
@jimhester jimhester merged commit 6442dd8 into r-lib:master Mar 17, 2016
@krlmlr krlmlr deleted the feature/39-impute-srcref branch March 17, 2016 13:12
kyleam added a commit to kyleam/covr that referenced this pull request Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that run into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
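
The reordering described in this commit message can be pictured with a small sketch. It is not covr's code: `tokens_for` and its `custom` argument are hypothetical stand-ins for get_tokens and get_parse_data, and only utils::getParseData is a real function here.

```r
# Hypothetical sketch: try the custom per-file logic first and fall back to
# utils::getParseData() only when the custom logic has nothing to offer.
tokens_for <- function(srcref, custom = function(x) NULL) {
  pd <- custom(srcref)  # package coverage path, tried first
  if (is.null(pd)) {
    pd <- utils::getParseData(srcref)  # fallback for non-package entry points
  }
  pd
}

exprs <- parse(text = "x <- 1", keep.source = TRUE)
ref <- attr(exprs, "srcref")[[1]]

from_fallback <- tokens_for(ref)
from_custom <- tokens_for(ref, custom = function(x) "custom result")
```

With this ordering, the behaviour of getParseData() under a given R version only matters when the custom logic returns NULL.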
jimhester pushed a commit that referenced this pull request Nov 19, 2024
* split_on_line_directives: guard against input without a directive

get_parse_data extracts lines from the input srcfile object and feeds
them to split_on_line_directives, which expects the lines to be a
concatenation of all the package R files, separated by #line
directives.

With how get_parse_data is currently called, that expectation is met.
get_parse_data is called only if utils::getParseData returns NULL, and
getParseData doesn't return NULL for any of the cases where the input
does _not_ have line directives (i.e. entry points other than
package_coverage).

An upcoming commit is going to move the get_parse_data call in front
of the getParseData call, so update split_on_line_directives to detect
the "no directives" case.

Without this guard, the mapply call in split_on_line_directives would
error under an R version before 4.2; with R 4.2 or later,
split_on_line_directives returns empty.

* split_on_line_directives: fix handling of single-file package case

split_on_line_directives breaks the input at #line directives and
returns a named list of lines for each file.

For a package with a single file under R/, there is one directive.
The bounds calculation is still correct for that case.  However, the
return value is incorrectly a matrix rather than a list because the
mapply call simplifies the result.

At this point, this bug is mostly [*] unexposed because this code path
is only triggered if utils::getParseData returns NULL, and it should
always return a non-NULL result for the single-file package case.  The
next commit will reorder things, exposing the bug.

Tell mapply to not simplify the result.

[*] The simplification to a matrix could also happen for multi-file
    packages in the unlikely event that all files have the same number
    of lines.
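
The simplification issue can be reproduced in isolation. The following is a minimal sketch with made-up input, not the covr function itself: when every result has the same length, mapply() collapses the results into a matrix unless SIMPLIFY = FALSE is given.

```r
# One directive followed by two lines of code, as in a single-file package.
lines <- c("#line 1 \"R/only.R\"", "a <- 1", "b <- 2")
starts <- 2L
ends <- 3L
slice <- function(from, to) lines[from:to]

# Default behaviour: equal-length results are simplified to a matrix.
simplified <- mapply(slice, starts, ends)

# With SIMPLIFY = FALSE the result stays a list with one element per file.
kept <- mapply(slice, starts, ends, SIMPLIFY = FALSE)

print(is.matrix(simplified))
print(is.list(kept))
```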

* parse_data: promote custom parse logic for R 4.4 compatibility

utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] #154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that run into this error.

Closes #576
Closes #579
Re: #567