Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying exclusions raise error due to missing srcref lines #579

Closed
dgkf opened this issue Oct 22, 2024 · 3 comments · Fixed by #588
Closed

Applying exclusions raise error due to missing srcref lines #579

dgkf opened this issue Oct 22, 2024 · 3 comments · Fixed by #588

Comments

@dgkf
Copy link
Contributor

dgkf commented Oct 22, 2024

Identified this odd bug when trying to run covr against diffobj. It's unclear to me why this is popping up for this package.

library(covr)  # using main @c0c88f1
package_coverage("./diffobj")

Coverage tallying finishes successfully, but when applying exclusions will produce the error:

Error in seq.default(df[i, "first_line"], df[i, "last_line"]) :
  'from' must be a finite number
> traceback()
9: stop("'from' must be a finite number")
8: seq.default(df[i, "first_line"], df[i, "last_line"])
7: seq(df[i, "first_line"], df[i, "last_line"])
6: seq(df[i, "first_line"], df[i, "last_line"]) %in% excl[[file]] at exclusions.R#90
5: FUN(X[[i]], ...)
4: vapply(X, FUN, ..., FUN.VALUE = logical(1)) at utils.R#346
3: vlapply(seq_len(NROW(df)), function(i) {
       file <- df[i, "full_name"]
       which_exclusion <- match(file, names(excl))
       !is.na(which_exclusion) && (identical(excl[[which_exclusion]],
           Inf) || all(seq(df[i, "first_line"], df[i, "last_line"]) %in%
           excl[[file]]))
   }) at exclusions.R#85
2: exclude(coverage, line_exclusions = line_exclusions, function_exclusions = function_exclusions,
       path = root) at covr.R#544
1: covr::package_coverage("diffobj"

There's a single trace in the coverage trace data frame with a NA first_line (among other columns), and this can also be see in the pre-transformed coverage object. When debugging the coverage traces before applying exclusions, one trace contains:

$`capt.R:NA:NA:NA:NA:NA:NA:NA:NA`
$`capt.R:NA:NA:NA:NA:NA:NA:NA:NA`$value
[1] 100

$`capt.R:NA:NA:NA:NA:NA:NA:NA:NA`$srcref
<srcref: file "diffobj/R/capt.R" chars NA:NA to NA:NA>

$`capt.R:NA:NA:NA:NA:NA:NA:NA:NA`$functions
[1] "capt_chr"
@kyleam
Copy link
Contributor

kyleam commented Oct 31, 2024

@dgkf Are you running R 4.4?

I've been troubleshooting what looks to be the same failure for diffobj as well as for a few other packages (data.table, mrgsolve, and nanotime). I hit into these when running the checks for https://mpn.metworx.com packages under R 4.4 rather of than R 4.3.

Based on bisecting with an R source build, I'm pretty confident that this is an interaction with R's revision 84538: 990fe3d8a1 (getParseData() - for first-file objects, 2023-06-14). That was part of the R 4.4.0 release.

Here are some links associated with that commit:

[ +cc @krlmlr ]

@dgkf
Copy link
Contributor Author

dgkf commented Nov 3, 2024

@kyleam Thanks for sharing! Yeah, I was testing against 4.4.1 specifically.

I've had limited bandwidth to pursue this bug further, but since filing I've successfully reproduced old covr results on older R versions indicating that this is likely an artifact of a base R change. You definitely seem to be on the right track here.

@kyleam
Copy link
Contributor

kyleam commented Nov 4, 2024

Yeah, I was testing against 4.4.1 specifically.

Thanks for confirming.

It looks like @MichaelChirico reported the same failure in gh-576.

And there's the related gh-567, where @mcol pointed to the same R 4.4.0 change as the culprit (revision 84538, 990fe3d8a1).

kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that hit into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
kyleam added a commit to kyleam/covr that referenced this issue Nov 15, 2024
utils::getParseData has a longstanding bug: for an installed package,
parse data is available only for the last file [1].  To work around
that, the get_tokens helper first calls getParseData and then falls
back to custom logic that extracts the concatenated source lines,
splits them on #line directives, and calls getParseData on each file's
lines.

The getParseData bug was fixed in R 4.4.0 (r84538).  Unfortunately
that change causes at least two issues (for some subset of packages):
a substantial performance regression [2] and an error when applying
exclusions [3].

Under R 4.4, getParseData always returns non-NULL as a result of that
change when calculating package coverage (in other words, the
get_parse_data fallback is _not_ triggered).  The slowdown is
partially due to the parse data no longer being cached across
get_tokens calls.  Another relevant aspect, for both the slowdown and
the error applying exclusions, is likely that the new getParseData
returns data for the entire package rather than the per-file parse
data the downstream covr code expects.

One solution would be to adapt covr's caching and handling of the
getParseData when running under R 4.4.0 or later.  Instead go with a
simpler and more minimal fix.  Reorder the calls so that the
get_parse_data call, which we know has been the primary code path for
package coverage before R 4.4.0, is the first call tried.  Leave
getParseData as the fallback to handle the non-package coverage cases.

[1] r-lib#154
    https://bugs.r-project.org/show_bug.cgi?id=16756

[2] As an extreme case, calling package_coverage on R.utils goes from
    under 15 minutes to over 6 hours.

[3] nanotime (v0.3.10) and diffobj (v0.3.5) are two examples of
    packages that hit into this error.

Closes r-lib#576
Closes r-lib#579
Re: r-lib#567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants