Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subset() does not return error message properly #745

Closed
LudvigOlsen opened this issue Mar 24, 2020 · 21 comments
Closed

subset() does not return error message properly #745

LudvigOlsen opened this issue Mar 24, 2020 · 21 comments

Comments

@LudvigOlsen
Copy link

When attempting to select a column that is not in a tibble with subset(), I cannot capture the error message. It works with regular data frames.

I'm using the newest GitHub version of tibble (master branch). R version 3.6.1.

library(tidyverse)
df <- data.frame("a" = c(1,2,3))
tf <- tibble("a" = c(1,2,3))

# Regular data frame
df %>% subset(select = "b")
#> Error in `[.data.frame`(x, r, vars, drop = drop): undefined columns selected
testthat::capture_error(df %>% subset(select = "b"))$message
#> [1] "undefined columns selected"

# tibble
tf %>% subset(select = "b")
#> Error: Can't subset columns that don't exist.
#> x The column `b` doesn't exist.
testthat::capture_error(tf %>% subset(select = "b"))$message
#> [1] ""

I've tried with tryCatch as well.

@krlmlr
Copy link
Member

krlmlr commented Mar 24, 2020

Can you use conditionMessage() or rlang::cnd_message()?

rlang::cnd_message(testthat::capture_error(
  subset(tibble::tibble(a = 1), select = "b")
))
#> [1] "Can't subset columns that don't exist.\nx The column `b` doesn't exist."

Created on 2020-03-24 by the reprex package (v0.3.0)

It's much safer to test for the class of the condition. What's your use case?

CC @lionel-.

@LudvigOlsen
Copy link
Author

xpectr generates tests for errors and I use capture_error to catch it. This is the first time I've encountered an empty message like this. Is this a common thing? (Didn't happen with my previous version of tibble)

Do you think it would be safe to always use rlang::cnd_message to extract the message? Or perhaps check if $message is empty and use it if so?

Thanks!

@LudvigOlsen
Copy link
Author

It seems that testthat is using cnd_message to get messages for its capture_messages() and capture_warnings() functions, so I will assume it's the right thing to do.

@krlmlr
Copy link
Member

krlmlr commented Mar 24, 2020

conditionMessage() always used the message component by default:

conditionMessage.condition
#> function (c) 
#> c$message
#> <bytecode: 0x56488f229120>
#> <environment: namespace:base>

Created on 2020-03-24 by the reprex package (v0.3.0)

@krlmlr
Copy link
Member

krlmlr commented Mar 24, 2020

The errors are new because we only started to use classed error messages in tibble 3.0.0. It should be safe to use cnd_message(), it's much safer to compare the class attribute though. (Note that condition classes in tibble currently don't come with stability guarantees, it's safe to check for "tibble_error" though.)

@LudvigOlsen
Copy link
Author

I'll probably have it generate both a test of the message and the class then.

I don't see the "tibble_error", btw:

class(testthat::capture_error(
      subset(tibble::tibble(a = 1), select = "b")
    ))
#> [1] "vctrs_error_subscript_oob" "vctrs_error_subscript"    
#> [3] "rlang_error"               "error"                    
#> [5] "condition"

Created on 2020-03-24 by the reprex package (v0.3.0)

@krlmlr
Copy link
Member

krlmlr commented Mar 24, 2020

Some errors are generated by vctrs.

@LudvigOlsen
Copy link
Author

Ah, okay, so what you're saying is that if the "tibble_error" is currently part of the class attribute, it will likely remain so?

I aim to have xpectr generate the tests without such package specific knowledge though, but good to know :)

Thanks for the clarifications :)

@lionel-
Copy link
Member

lionel- commented Mar 25, 2020

Empty message fields will become more prevalent as we move towards lazy generation of error messages.

I'd use conditionMessage() instead of cnd_message(), for simplicity. The former calls the latter for rlang errors.

@krlmlr krlmlr closed this as completed Mar 26, 2020
@LudvigOlsen
Copy link
Author

Let me know if this needs a separate issue, but it's pretty related.

Based on #410
I've disabled the ansi escape sequences in the messages, but the x remains. On Travis CI the x is not there though.

1/1 mismatches
x[1]: "Must extract column with a single valid subscript The subscript NA cant be NA"
y[1]: "Must extract column with a single valid subscriptx The subscript NA cant be NA"
                                                       ---

I'm using this:

cnd_message <- function(x){
  withr::local_options(c(rlang_backtrace_on_error = "none", 
                         crayon.enabled = FALSE))
  conditionMessage(x)
}

I could keep crayon enabled and remove the x with

gsub("\033[31mx\033[39m", "", error_msg, fixed = TRUE)

Seems like a hack though and in case other packages than tibble uses crayon to style errors, it might be preferable to disable it.

Any tips?

@krlmlr
Copy link
Member

krlmlr commented Apr 4, 2020

Can you disable crayon also for the code that triggers the error message? tibble currently doesn't do lazy generation of error messages.

@LudvigOlsen
Copy link
Author

LudvigOlsen commented Apr 4, 2020

I can set the crayon.enabled option before evaluating the code, but the x is still there.

> conditionMessage(testthat::capture_error(df[[NA]]))
[1] "Must extract column with a single valid subscript.\n\033[31mx\033[39m The subscript `NA` can't be `NA`."
>
> options(crayon.enabled = F)
> conditionMessage(testthat::capture_error(df[[NA]]))
[1] "Must extract column with a single valid subscript.\nx The subscript `NA` can't be `NA`."

Now, I get that the ansi code is there to style the x, hence it makes sense that the x would still be there without the styling. What I don't get is why it's not there in Travis?

@krlmlr
Copy link
Member

krlmlr commented Apr 4, 2020

Can you clear the package cache on Travis?

@LudvigOlsen

This comment has been minimized.

@LudvigOlsen
Copy link
Author

LudvigOlsen commented Apr 4, 2020

It fails for all but this branch:

- name: Strict Latin-1 locale
    r: release
    before_script:
      - sudo locale-gen en_US
      - export LC_ALL=en_US

Given that it's for xpectr, my solution needs to work for all types of errors a user might generate a test for. Would it be reasonable to set the locale locally (assuming it can be done with withr) before evaluating the code and catching the error? And would the best locale be Latin-1 or utf8?

Let me know if I'm venturing a bit too far off tibble territory here. This is very helpful for me :)

@krlmlr
Copy link
Member

krlmlr commented Apr 4, 2020

All conditions in tibble now have a class, so that you don't need to verify error messages at all but can compare the class. This is much more robust, much more can go wrong when comparing textual output.

Perhaps xpectr should do the same and generate tests that only compare the class attribute?

@LudvigOlsen
Copy link
Author

One of my own uses is to go through the generated tests (gxs_function() is sort of fuzz testing) to check that the errors thrown by the function are meaningful to the user. So I would prefer to keep that test. Also, I usually use checkmate to check inputs, which just throws simpleErrors.

Currently a generated test look something like this:

output_10856 <- capture_side_effects(df[[NA]])
expect_equal(
    strip(output_10856[["error"]]),
    strip("Must extract column with a single valid subscript.\nx The subscript `NA` can't be `NA`."),
    fixed = TRUE)
expect_equal(
    output_10856[["error_class"]],
    c("vctrs_error_subscript_type", "vctrs_error_subscript", "rlang_error",
      "error", "condition"),
    fixed = TRUE)

The strip() function removes non-alphanumeric characters and simplifies white spaces. That makes them easier to compare. It can also remove numbers and soon ansi (using a regex from the crayon source code).

A user can remove the test of the message, if it fails for some reason. My goal is to make them useful in as many cases as possible. So if setting a locale before evaluating the code could make it work in a lot more instances, that seems like a good approach?

@krlmlr
Copy link
Member

krlmlr commented Apr 5, 2020

You can use withr::with_locale() for temporary setting the locale during capturing and evaluation, and also fansi::strip_ctl() to strip all control sequences.

Have you considered generating verify_output() calls so that the user sees the actual output in a text file?

@LudvigOlsen
Copy link
Author

Hmm, setting the locale is confusing. with_locale() can't set LC_ALL and I'm not sure which locale to pick, such that it will work on all platforms. Also, it doesn't seem to work:

It seems the locale when defining f() decides the encoding of the string?

# Set locale
# If I set this to en_US.ISO8859-1 it changes the encoding to latin1
Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8")
#> [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

f <- function(lcl){
    withr::with_locale(
        new = c(LC_MESSAGES = lcl, LC_COLLATE = lcl, LC_CTYPE = lcl, LC_MONETARY = lcl, 
                LC_TIME = lcl),
        {s <- "æøå1€%§"
        print(Sys.getlocale())
        Encoding(s)}
    )
}

f("en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1"
#> [1] "UTF-8"
Sys.setlocale(category = "LC_ALL", locale = "en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.UTF-8"
f("en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1"
#> [1] "UTF-8"

Created on 2020-04-05 by the reprex package (v0.3.0

If the process that generates the message doesn't use the local locale, I would expect errors from that?

and also fansi::strip_ctl() to strip all control sequences.

Great, I'll have a look at that :)

Have you considered generating verify_output() calls so that the user sees the actual output in a text file?

This seems like an option just waiting to happen. I could perhaps create a gxs_output() function for generating that kind of test. Or have it as an (optional) extra test among the other tests (redundant?).

Could also add a gxs_file() for going the other way and generating tests for the content in a file. Not sure if that would be useful, but I think it would be reasonably easy to implement. It would only work when the content can be evaluated of course.

@LudvigOlsen
Copy link
Author

LudvigOlsen commented Apr 5, 2020

I had a look at reprex:::reprex_locale, which uses withr::local_envvar instead, but I get the same encoding for the string.

f <- function(lcl){
    withr::local_envvar(c(LANGUAGE = "en"))
    withr::local_envvar(c(LC_MESSAGES = lcl, LC_COLLATE = lcl, 
                LC_CTYPE = lcl, LC_MONETARY = lcl, 
                LC_TIME = lcl))
    s <- "æøå1€%§"    # For some reason copied as "ʯÂ1<U+20AC>%ß"
    print(Sys.getlocale())
    Encoding(s)
}

Created on 2020-04-05 by the reprex package (v0.3.0

@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2021

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants