Impute srcrefs for subexpressions #154
Conversation
- percentage changes in one of the tests
This is awesome, thank you for working on it. I had a couple of aborted attempts, and this has been a nagging problem I have been wanting to fix. There shouldn't be any difference in srcrefs for … However …
package_coverage(): It seems that the package loading mechanism does something "evil": it concatenates all R files into one and adds #line directives.
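For readers unfamiliar with that mechanism, a hedged illustration of what such a concatenated file looks like; the file names and function bodies below are invented, only the `#line <n> "<path>"` directive format is the point:

```r
# Hypothetical shape of the concatenated source the loader produces.
# File names and bodies are made up; each #line directive tells the
# parser which original file (and line) the following chunk came from.
#line 1 "R/aaa.R"
f <- function(x) {
  x + 1
}
#line 1 "R/bbb.R"
g <- function(x) if (x > 0) "pos" else "non-pos"
```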
Using load_all() breaks the S4 tests. Bummer...
I originally used … To get around this I think we could just re-parse the text in the parent source reference to get the token data. Something like:

```r
txt <- as.character(parent_ref)
sf <- srcfile(txt)
parse(text = txt, srcfile = sf)
pd <- getParseData(sf)
```

Then you need to fix the line and column numbers from the tokens with those from where the `if` block starts.
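For concreteness, a rough sketch of that adjustment (not covr's actual code): it assumes `parent_ref` is the parent srcref (with 6 or 8 elements) and `pd` is the parse data from the re-parse above, and shifts token positions from being relative to the re-parsed text to being relative to the original file.

```r
# Sketch only: shift re-parsed token positions by the start of the
# parent srcref. A srcref stores c(first_line, first_byte, last_line,
# last_byte, first_column, last_column, ...).
start      <- as.integer(parent_ref)
first_line <- start[1]
first_col  <- start[5]

pd$line1 <- pd$line1 + first_line - 1L
pd$line2 <- pd$line2 + first_line - 1L

# Columns only need shifting for tokens that start or end on the first
# line of the re-parsed text (which is now first_line after the shift).
starts_first <- pd$line1 == first_line
ends_first   <- pd$line2 == first_line
pd$col1[starts_first] <- pd$col1[starts_first] + first_col - 1L
pd$col2[ends_first]   <- pd$col2[ends_first]   + first_col - 1L
```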
An alternative would be to fix getParseData() in this case. The "srcfile" attribute is an environment which contains an "original" object; I think the parse data belongs there, but it is instead attached to the last file in the package. Test package: https://github.com/krlmlr/covr.dummy. Just posted to r-devel.
Until this is fixed in R, I suggest we tweak the package loading process: create a "last" file ourselves with an otherwise useless function, grab the parseData from there, and put it where it belongs.
From your report it looks like the last srcref will always have the parse data attached. Could we just find that srcref and copy the parse data to the other srcrefs? Figuring out the last file is slightly complicated due to Collate directives, locale issues, etc.
What if there are no objects in the last file? Think good ol' …
This reverts commit 4982d94.
also be aware of line offsets due to #line directives
OK, so we can get the tokens of the entire package, including line directives, from any srcref in the package with the following:

```r
get_tokens <- function(srcref) {
  getParseData(attr(parse(text = attr(getSrcref(srcref), "srcfile")$original$lines), "srcfile"))
}
```

Then you would need to fix the line numbers based on the line directives. This works because the object returned by parse() carries a srcfile with the parse data attached.
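To make the "fix the line numbers" step concrete, here is a rough sketch (not covr's implementation). It assumes the parse-data line numbers refer to the concatenated text, as the comment above suggests, that the concatenated source starts with a directive, and that every directive is of the form `#line 1 "<path>"`; `remap_lines` is a hypothetical helper name.

```r
# Sketch: remap parse-data line numbers from the concatenated file back
# to the original per-file numbering using the #line directives.
remap_lines <- function(pd, lines) {
  is_dir <- grepl('^#line \\d+ "', lines)

  # For every line of the concatenated file, find the governing
  # directive and the file name it names.
  block    <- cumsum(is_dir)            # which #line block each line is in
  dir_at   <- which(is_dir)[block]      # position of that block's directive
  files    <- sub('^#line \\d+ "(.*)".*$', "\\1", lines[dir_at])
  new_line <- seq_along(lines) - dir_at # assumes every directive is "#line 1"

  pd$file <- files[pd$line1]
  old1 <- pd$line1
  old2 <- pd$line2
  pd$line1 <- new_line[old1]
  pd$line2 <- new_line[old2]
  pd
}

# Usage sketch, building on get_tokens() above:
# lines <- attr(getSrcref(srcref), "srcfile")$original$lines
# pd    <- remap_lines(get_tokens(srcref), lines)
```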
Much easier, no need to reparse at all. Working on a custom version of getParseData() to make it a tad faster. Functionality is in place already and tests pass; only lintr keeps complaining.
- avoids sorting
- could use data.table for lookup performance here
Tests pass (AppVeyor requires an appveyor.yml file; if you delete the project there, it shouldn't show up here anymore). Character encoding might still be an issue (if we're ever looking at the exact column position of the imputed srcrefs). There will be warnings if parse data cannot be repaired.
```r
  make_srcref(5),
  make_srcref(6, 7)
)
src_ref[seq_along(x)]
```
We need this to handle `if`s without `else`s, right? Might be worth adding a comment so it is clear why the `if` case is different from the `for` and `while` cases.
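For context, a small illustration (not covr code) of the asymmetry being referenced: an `if` call has a variable number of subexpressions depending on whether an `else` branch is present, whereas `for` and `while` calls have a fixed shape, which is why only the first `seq_along(x)` imputed srcrefs apply in the `if` case.

```r
# Call lengths of the relevant control-flow constructs:
length(quote(if (cond) yes))          # 3: `if`, cond, yes
length(quote(if (cond) yes else no))  # 4: `if`, cond, yes, no
length(quote(for (i in s) body))      # 4: `for`, i, s, body  (fixed)
length(quote(while (cond) body))      # 3: `while`, cond, body (fixed)
# An `if` without an `else` needs one fewer imputed srcref, hence the
# subsetting with src_ref[seq_along(x)].
```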
```r
  get_parse_data(x$original)
else if (exists("covr_parse_data", x))
  x$covr_parse_data
else if (!is.null(data <- x[["parseData"]])) {
```
Could you put braces around the bodies of these conditionals? I prefer to be explicit, to avoid issues in the future when statements are added and someone forgets to add braces.
Like this:

```r
if (inherits(x, "srcref")) {
  get_parse_data(attr(x, "srcfile"))
} else if (exists("original", x)) {
```

?
yeah
Done. Should we wait for community feedback before merging? The changes here are rather invasive.
Well, the current behavior is clearly wrong, so if we wait it shouldn't need to be long.
* split_on_line_directives: guard against input without a directive

  get_parse_data extracts lines from the input srcfile object and feeds them to split_on_line_directives, which expects the lines to be a concatenation of all the package R files, separated by #line directives. With how get_parse_data is currently called, that expectation is met: get_parse_data is called only if utils::getParseData returns NULL, and getParseData doesn't return NULL for any of the cases where the input does _not_ have line directives (i.e. entry points other than package_coverage).

  An upcoming commit is going to move the get_parse_data call in front of the getParseData call, so update split_on_line_directives to detect the "no directives" case. Without this guard, the mapply call in split_on_line_directives would error under an R version before 4.2; with R 4.2 or later, split_on_line_directives returns empty.

* split_on_line_directives: fix handling of single-file package case

  split_on_line_directives breaks the input at #line directives and returns a named list of lines for each file. For a package with a single file under R/, there is one directive, and the bounds calculation is still correct for that case. However, the return value is incorrectly a matrix rather than a list because the mapply call simplifies the result.

  At this point, this bug is mostly [*] unexposed, because this code path is only triggered if utils::getParseData returns NULL, and it should always return a non-NULL result for the single-file package case. The next commit will reorder things, exposing the bug. Tell mapply to not simplify the result.

  [*] The simplification to a matrix could also happen for multi-file packages in the unlikely event that all files have the same number of lines.

* parse_data: promote custom parse logic for R 4.4 compatibility

  utils::getParseData has a longstanding bug: for an installed package, parse data is available only for the last file [1]. To work around that, the get_tokens helper first calls getParseData and then falls back to custom logic that extracts the concatenated source lines, splits them on #line directives, and calls getParseData on each file's lines.

  The getParseData bug was fixed in R 4.4.0 (r84538). Unfortunately, that change causes at least two issues (for some subset of packages): a substantial performance regression [2] and an error when applying exclusions [3]. Under R 4.4, getParseData always returns non-NULL as a result of that change when calculating package coverage (in other words, the get_parse_data fallback is _not_ triggered). The slowdown is partially due to the parse data no longer being cached across get_tokens calls. Another relevant aspect, for both the slowdown and the error applying exclusions, is likely that the new getParseData returns data for the entire package rather than the per-file parse data the downstream covr code expects.

  One solution would be to adapt covr's caching and handling of getParseData when running under R 4.4.0 or later. Instead, go with a simpler and more minimal fix: reorder the calls so that the get_parse_data call, which has been the primary code path for package coverage before R 4.4.0, is the first call tried, and leave getParseData as the fallback to handle the non-package coverage cases.

  [1] #154, https://bugs.r-project.org/show_bug.cgi?id=16756
  [2] As an extreme case, calling package_coverage on R.utils goes from under 15 minutes to over 6 hours.
  [3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of packages that hit this error.

Closes #576
Closes #579
Re: #567
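As a hedged sketch of the reordering the last commit describes (simplified, not the actual get_tokens code in covr; `get_parse_data()` is covr's internal helper that splits on #line directives, and the input is assumed to be a srcref):

```r
# Simplified sketch of the call order described above.
get_tokens_sketch <- function(srcref) {
  src_file <- attr(getSrcref(srcref), "srcfile")

  # Primary path: the custom per-file parsing used for package coverage.
  pd <- get_parse_data(src_file)

  # Fallback for non-package coverage, where getParseData() is reliable.
  if (is.null(pd) || NROW(pd) == 0L) {
    pd <- utils::getParseData(src_file)
  }
  pd
}
```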
This allows collecting coverage for non-braced subexpressions in `if`, `for`, and `while` clauses. It works by imputing missing srcrefs on the fly (in `trace_calls()`) by analyzing `getParseData()` output.

Caveats:

- Doesn't work with `package_coverage()`, I don't know why. Perhaps `package_coverage()` somehow needs to know all srcrefs beforehand?

Fixes #39.

Test file: