Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ignore_class argument and ignore grouped_df if check_groups = FALSE #109

Merged
merged 10 commits into from
Feb 12, 2022

Conversation

rossellhayes
Copy link
Contributor

@rossellhayes rossellhayes commented Feb 4, 2022

  • Adds argument ignore_class to class checks (and passed by table, vector, and column checks), specifying class differences to ignore.
    • Unnamed elements are dropped from the list of classes for the purpose of comparison: e.g. ignore_class = "glue" will ignore cases where .result is glue/character and .solution is character, or vice versa.
    • Named elements are treated as equivalent to their names: e.g. ignore_class = c("integer" = "numeric") will ignore cases where .result is integer and .solution is numeric, or vice versa.
  • Table checking now ignores the grouped_df class if check_groups = FALSE.
  • Classes are now checked with setequal() instead of identical(), so objects with the same classes in different orders will not trigger a class problem.
library(dplyr)
library(tblcheck)

.result   <- data.frame(a = 1, b = 2)
.solution <- tibble(a = 1, b = 2)

tbl_check_class(ignore_class = c("tbl_df", "tbl"))
# No output

.result   <- 1L
.solution <- 1

vec_check_class(ignore_class = c("integer" = "numeric"))
# No output

.result   <- tibble(a = 1, b = 2)
.solution <- tibble(a = 1, b = 2) %>% group_by(b)

tbl_check(check_groups = FALSE)
# No output

Created on 2022-02-08 by the reprex package (v2.0.1)

Closes #108.

@rossellhayes rossellhayes added the bug Something isn't working label Feb 4, 2022
@rossellhayes rossellhayes self-assigned this Feb 4, 2022
@gadenbuie
Copy link
Member

I bumped into this unusual behavior, I don't think it's intended.

library(dplyr)
library(tblcheck)

.result <- tibble(a = 1L, b = "2", c = "3")
.solution <- tibble(a = 1, b = 2, c = 3L)

tbl_check(ignore_class = c("numeric" = "integer", "character" = "numeric"))
#> <tblcheck problem>
#> Your `a` column should be a number (class `numeric`), but it is an integer (class `integer`).
#> $ type           : chr "class"
#> $ expected       : chr "numeric"
#> $ actual         : chr "integer"
#> $ expected_length: int 1
#> $ actual_length  : int 1
#> $ location       : chr "column"
#> $ column         : chr "a"

tbl_check(ignore_class = c("numeric" = "integer", "numeric" = "character"))
#> <tblcheck problem>
#> Your `b` column should be a number (class `numeric`), but it is a text string (class `character`).
#> $ type           : chr "class"
#> $ expected       : chr "numeric"
#> $ actual         : chr "character"
#> $ expected_length: int 1
#> $ actual_length  : int 1
#> $ location       : chr "column"
#> $ column         : chr "b"

Also after thinking about this more, I wonder if we should skip class checking altogether if, after removing the ignored classes from the set of classes on .solution, we have no classes left. That would let us use ignore_class = "numeric" to ignore instances where the .solution is numeric without having to specify each class/pair that might go with a numeric. (i.e. skip a and b)

.result <- tibble(a = 1L, b = "2", c = "3")
.solution <- tibble(a = 1, b = 2, c = 3L)

tbl_check(ignore_class = "numeric")

R/check_class.R Outdated
Comment on lines 81 to 91
# Replace classes that match named elements of `ignore_class` with the
# element's name. This allows us to ignore differences between the element
# class and the name class.
obj_class_ignored[obj_class_ignored %in% ignore_class[paired]] <-
names(ignore_class[paired])[
na.omit(match(obj_class_ignored, ignore_class[paired]))
]
exp_class_ignored[exp_class_ignored %in% ignore_class[paired]] <-
names(ignore_class[paired])[
na.omit(match(exp_class_ignored, ignore_class[paired]))
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this in a loop to avoid problem of ignore_class("integer" = "numeric", "numeric" = "character")

@rossellhayes
Copy link
Contributor Author

@gadenbuie

  • 4b5d6aa should resolve the bug you found
  • 85af52e skips class checking if all classes of expected are included in ignore_class

Copy link
Member

@gadenbuie gadenbuie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🥳

@rossellhayes rossellhayes merged commit 1e28a16 into main Feb 12, 2022
@rossellhayes rossellhayes deleted the ignore-class branch February 12, 2022 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tbl_check(check_groups = FALSE) still finds problem if one table is grouped and the other is not
2 participants