subset() does not return error message properly #745

LudvigOlsen · 2020-03-24T18:07:02Z

When attempting to select a column that is not in a tibble with subset(), I cannot capture the error message. It works with regular data frames.

I'm using the newest GitHub version of tibble (master branch). R version 3.6.1.

library(tidyverse)
df <- data.frame("a" = c(1,2,3))
tf <- tibble("a" = c(1,2,3))

# Regular data frame
df %>% subset(select = "b")
#> Error in `[.data.frame`(x, r, vars, drop = drop): undefined columns selected
testthat::capture_error(df %>% subset(select = "b"))$message
#> [1] "undefined columns selected"

# tibble
tf %>% subset(select = "b")
#> Error: Can't subset columns that don't exist.
#> x The column `b` doesn't exist.
testthat::capture_error(tf %>% subset(select = "b"))$message
#> [1] ""

I've tried with tryCatch as well.

The text was updated successfully, but these errors were encountered:

krlmlr · 2020-03-24T18:21:25Z

Can you use conditionMessage() or rlang::cnd_message()?

rlang::cnd_message(testthat::capture_error(
  subset(tibble::tibble(a = 1), select = "b")
))
#> [1] "Can't subset columns that don't exist.\nx The column `b` doesn't exist."

^{Created on 2020-03-24 by the reprex package (v0.3.0)}

It's much safer to test for the class of the condition. What's your use case?

CC @lionel-.

LudvigOlsen · 2020-03-24T18:46:58Z

xpectr generates tests for errors and I use capture_error to catch it. This is the first time I've encountered an empty message like this. Is this a common thing? (Didn't happen with my previous version of tibble)

Do you think it would be safe to always use rlang::cnd_message to extract the message? Or perhaps check if $message is empty and use it if so?

Thanks!

LudvigOlsen · 2020-03-24T19:10:49Z

It seems that testthat is using cnd_message to get messages for its capture_messages() and capture_warnings() functions, so I will assume it's the right thing to do.

krlmlr · 2020-03-24T19:21:56Z

conditionMessage() always used the message component by default:

conditionMessage.condition
#> function (c) 
#> c$message
#> <bytecode: 0x56488f229120>
#> <environment: namespace:base>

^{Created on 2020-03-24 by the reprex package (v0.3.0)}

krlmlr · 2020-03-24T19:23:24Z

The errors are new because we only started to use classed error messages in tibble 3.0.0. It should be safe to use cnd_message(), it's much safer to compare the class attribute though. (Note that condition classes in tibble currently don't come with stability guarantees, it's safe to check for "tibble_error" though.)

LudvigOlsen · 2020-03-24T19:32:58Z

I'll probably have it generate both a test of the message and the class then.

I don't see the "tibble_error", btw:

class(testthat::capture_error(
      subset(tibble::tibble(a = 1), select = "b")
    ))
#> [1] "vctrs_error_subscript_oob" "vctrs_error_subscript"    
#> [3] "rlang_error"               "error"                    
#> [5] "condition"

^{Created on 2020-03-24 by the reprex package (v0.3.0)}

krlmlr · 2020-03-24T19:40:52Z

Some errors are generated by vctrs.

LudvigOlsen · 2020-03-24T19:45:33Z

Ah, okay, so what you're saying is that if the "tibble_error" is currently part of the class attribute, it will likely remain so?

I aim to have xpectr generate the tests without such package specific knowledge though, but good to know :)

Thanks for the clarifications :)

lionel- · 2020-03-25T07:05:51Z

Empty message fields will become more prevalent as we move towards lazy generation of error messages.

I'd use conditionMessage() instead of cnd_message(), for simplicity. The former calls the latter for rlang errors.

LudvigOlsen · 2020-04-03T22:39:59Z

Let me know if this needs a separate issue, but it's pretty related.

Based on #410
I've disabled the ansi escape sequences in the messages, but the x remains. On Travis CI the x is not there though.

1/1 mismatches
x[1]: "Must extract column with a single valid subscript The subscript NA cant be NA"
y[1]: "Must extract column with a single valid subscriptx The subscript NA cant be NA"
                                                       ---

I'm using this:

cnd_message <- function(x){
  withr::local_options(c(rlang_backtrace_on_error = "none", 
                         crayon.enabled = FALSE))
  conditionMessage(x)
}

I could keep crayon enabled and remove the x with

gsub("\033[31mx\033[39m", "", error_msg, fixed = TRUE)

Seems like a hack though and in case other packages than tibble uses crayon to style errors, it might be preferable to disable it.

Any tips?

krlmlr · 2020-04-04T02:57:28Z

Can you disable crayon also for the code that triggers the error message? tibble currently doesn't do lazy generation of error messages.

LudvigOlsen · 2020-04-04T03:43:48Z

I can set the crayon.enabled option before evaluating the code, but the x is still there.

> conditionMessage(testthat::capture_error(df[[NA]]))
[1] "Must extract column with a single valid subscript.\n\033[31mx\033[39m The subscript `NA` can't be `NA`."
>
> options(crayon.enabled = F)
> conditionMessage(testthat::capture_error(df[[NA]]))
[1] "Must extract column with a single valid subscript.\nx The subscript `NA` can't be `NA`."

Now, I get that the ansi code is there to style the x, hence it makes sense that the x would still be there without the styling. What I don't get is why it's not there in Travis?

krlmlr · 2020-04-04T03:55:38Z

Can you clear the package cache on Travis?

LudvigOlsen · 2020-04-04T11:37:58Z

It fails for all but this branch:

- name: Strict Latin-1 locale
    r: release
    before_script:
      - sudo locale-gen en_US
      - export LC_ALL=en_US

Given that it's for xpectr, my solution needs to work for all types of errors a user might generate a test for. Would it be reasonable to set the locale locally (assuming it can be done with withr) before evaluating the code and catching the error? And would the best locale be Latin-1 or utf8?

Let me know if I'm venturing a bit too far off tibble territory here. This is very helpful for me :)

krlmlr · 2020-04-04T15:04:36Z

All conditions in tibble now have a class, so that you don't need to verify error messages at all but can compare the class. This is much more robust, much more can go wrong when comparing textual output.

Perhaps xpectr should do the same and generate tests that only compare the class attribute?

LudvigOlsen · 2020-04-04T15:47:39Z

One of my own uses is to go through the generated tests (gxs_function() is sort of fuzz testing) to check that the errors thrown by the function are meaningful to the user. So I would prefer to keep that test. Also, I usually use checkmate to check inputs, which just throws simpleErrors.

Currently a generated test look something like this:

output_10856 <- capture_side_effects(df[[NA]])
expect_equal(
    strip(output_10856[["error"]]),
    strip("Must extract column with a single valid subscript.\nx The subscript `NA` can't be `NA`."),
    fixed = TRUE)
expect_equal(
    output_10856[["error_class"]],
    c("vctrs_error_subscript_type", "vctrs_error_subscript", "rlang_error",
      "error", "condition"),
    fixed = TRUE)

The strip() function removes non-alphanumeric characters and simplifies white spaces. That makes them easier to compare. It can also remove numbers and soon ansi (using a regex from the crayon source code).

A user can remove the test of the message, if it fails for some reason. My goal is to make them useful in as many cases as possible. So if setting a locale before evaluating the code could make it work in a lot more instances, that seems like a good approach?

krlmlr · 2020-04-05T04:16:38Z

You can use withr::with_locale() for temporary setting the locale during capturing and evaluation, and also fansi::strip_ctl() to strip all control sequences.

Have you considered generating verify_output() calls so that the user sees the actual output in a text file?

LudvigOlsen · 2020-04-05T15:30:09Z

Hmm, setting the locale is confusing. with_locale() can't set LC_ALL and I'm not sure which locale to pick, such that it will work on all platforms. Also, it doesn't seem to work:

It seems the locale when defining f() decides the encoding of the string?

# Set locale
# If I set this to en_US.ISO8859-1 it changes the encoding to latin1
Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8")
#> [1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

f <- function(lcl){
    withr::with_locale(
        new = c(LC_MESSAGES = lcl, LC_COLLATE = lcl, LC_CTYPE = lcl, LC_MONETARY = lcl, 
                LC_TIME = lcl),
        {s <- "æøå1€%§"
        print(Sys.getlocale())
        Encoding(s)}
    )
}

f("en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1"
#> [1] "UTF-8"
Sys.setlocale(category = "LC_ALL", locale = "en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.UTF-8"
f("en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1"
#> [1] "UTF-8"

^{Created on 2020-04-05 by the reprex package (v0.3.0}

If the process that generates the message doesn't use the local locale, I would expect errors from that?

and also fansi::strip_ctl() to strip all control sequences.

Great, I'll have a look at that :)

Have you considered generating verify_output() calls so that the user sees the actual output in a text file?

This seems like an option just waiting to happen. I could perhaps create a gxs_output() function for generating that kind of test. Or have it as an (optional) extra test among the other tests (redundant?).

Could also add a gxs_file() for going the other way and generating tests for the content in a file. Not sure if that would be useful, but I think it would be reasonably easy to implement. It would only work when the content can be evaluated of course.

LudvigOlsen · 2020-04-05T15:44:23Z

I had a look at reprex:::reprex_locale, which uses withr::local_envvar instead, but I get the same encoding for the string.

f <- function(lcl){
    withr::local_envvar(c(LANGUAGE = "en"))
    withr::local_envvar(c(LC_MESSAGES = lcl, LC_COLLATE = lcl, 
                LC_CTYPE = lcl, LC_MONETARY = lcl, 
                LC_TIME = lcl))
    s <- "æøå1€%§"    # For some reason copied as "Ê¯Â1<U+20AC>%ß"
    print(Sys.getlocale())
    Encoding(s)
}

^{Created on 2020-04-05 by the reprex package (v0.3.0}

github-actions · 2021-04-06T00:15:44Z

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

krlmlr closed this as completed Mar 26, 2020

This comment has been minimized.

Sign in to view

github-actions bot locked and limited conversation to collaborators Apr 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subset() does not return error message properly #745

subset() does not return error message properly #745

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

lionel- commented Mar 25, 2020

LudvigOlsen commented Apr 3, 2020

krlmlr commented Apr 4, 2020

LudvigOlsen commented Apr 4, 2020 •

edited

Loading

krlmlr commented Apr 4, 2020

This comment has been minimized.

LudvigOlsen commented Apr 4, 2020 •

edited

Loading

krlmlr commented Apr 4, 2020

LudvigOlsen commented Apr 4, 2020

krlmlr commented Apr 5, 2020

LudvigOlsen commented Apr 5, 2020

LudvigOlsen commented Apr 5, 2020 •

edited

Loading

github-actions bot commented Apr 6, 2021

subset() does not return error message properly #745

subset() does not return error message properly #745

Comments

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

krlmlr commented Mar 24, 2020

LudvigOlsen commented Mar 24, 2020

lionel- commented Mar 25, 2020

LudvigOlsen commented Apr 3, 2020

krlmlr commented Apr 4, 2020

LudvigOlsen commented Apr 4, 2020 • edited Loading

krlmlr commented Apr 4, 2020

This comment has been minimized.

LudvigOlsen commented Apr 4, 2020 • edited Loading

krlmlr commented Apr 4, 2020

LudvigOlsen commented Apr 4, 2020

krlmlr commented Apr 5, 2020

LudvigOlsen commented Apr 5, 2020

LudvigOlsen commented Apr 5, 2020 • edited Loading

github-actions bot commented Apr 6, 2021

LudvigOlsen commented Apr 4, 2020 •

edited

Loading

LudvigOlsen commented Apr 4, 2020 •

edited

Loading

LudvigOlsen commented Apr 5, 2020 •

edited

Loading