saveData throws NAs in train_data #28

Closed
cebarboza opened this issue Sep 30, 2024 · 0 comments · Fixed by #27

Hi,

I noticed that saveData() has changed compared to the version I was using in tests. If the data is TRUE/FALSE, it no longer converts the values to 0 and 1 as it did before; every value ends up as NA instead. Here is a reprex:

train_data <- data.frame(check.names = FALSE,
                         outcomeCount = c(FALSE,FALSE,FALSE,
                                          FALSE,FALSE,TRUE),
                         `198124209` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,TRUE),
                         `316139209` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,FALSE),
                         `316139210` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,FALSE)
)

binary_cols <- sapply(1:ncol(train_data), function(c) all(train_data[[c]] %in% 0:1))
train_data[binary_cols] <- lapply(colnames(train_data[binary_cols]), function(c) factor(train_data[[c]], levels=c("0","1"), labels=c(0,1)))
  
print(train_data[binary_cols])
#>   outcomeCount 198124209 316139209 316139210
#> 1         <NA>      <NA>      <NA>      <NA>
#> 2         <NA>      <NA>      <NA>      <NA>
#> 3         <NA>      <NA>      <NA>      <NA>
#> 4         <NA>      <NA>      <NA>      <NA>
#> 5         <NA>      <NA>      <NA>      <NA>
#> 6         <NA>      <NA>      <NA>      <NA>
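
My reading of why this happens, as a small sketch rather than a statement about the package internals: `%in%` coerces logicals to integers, so the binary check passes for TRUE/FALSE columns, but `factor()` compares `as.character(x)` against the levels, and "FALSE"/"TRUE" never match "0"/"1". The variable names below are only for illustration.

```r
# Illustration only: a logical vector like the columns in train_data
x <- c(FALSE, FALSE, TRUE)

# The binary check passes, because FALSE/TRUE are coerced to 0/1 for matching
passes_check <- all(x %in% 0:1)

# factor() matches on the character representation of x
as_text <- as.character(x)

# "FALSE"/"TRUE" are not among the levels "0"/"1", so every entry becomes NA
broken <- factor(x, levels = c("0", "1"), labels = c(0, 1))

print(passes_check)
print(as_text)
print(is.na(broken))
```

So the column is flagged as binary but then loses all of its values at the factor() step.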

Maybe something like this, to account for TRUE and FALSE?

train_data <- data.frame(check.names = FALSE,
                         outcomeCount = c(FALSE,FALSE,FALSE,
                                          FALSE,FALSE,TRUE),
                         `198124209` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,TRUE),
                         `316139209` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,FALSE),
                         `316139210` = c(FALSE,FALSE,FALSE,
                                         FALSE,FALSE,FALSE)
)

binary_cols <- sapply(train_data, function(col) all(col %in% c(0, 1, TRUE, FALSE)))

# Convert TRUE/FALSE to 1/0 and create factors
train_data[binary_cols] <- lapply(train_data[binary_cols], function(col) {
  col <- as.numeric(as.logical(col))  # Convert TRUE/FALSE to 1/0
  factor(col, levels = c(0, 1), labels = c(0, 1))  # Convert to factors
})

print(train_data[binary_cols])
#>   outcomeCount 198124209 316139209 316139210
#> 1            0         0         0         0
#> 2            0         0         0         0
#> 3            0         0         0         0
#> 4            0         0         0         0
#> 5            0         0         0         0
#> 6            1         1         0         0

Created on 2024-09-30 with reprex v2.1.1
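
To guard against this regressing, a check along these lines might be useful in the tests. It is only a sketch: `convert_binary_cols()` is a hypothetical helper that mirrors the suggestion above, not a function in the package, and the `mixed` data frame is made up for the example.

```r
# Hypothetical helper (not part of the package): convert 0/1 and TRUE/FALSE
# columns to factors with levels "0" and "1", leaving other columns untouched.
convert_binary_cols <- function(df) {
  # %in% coerces logicals to numbers, so TRUE/FALSE columns count as binary too
  is_binary <- vapply(df, function(col) all(col %in% c(0, 1)), logical(1))
  df[is_binary] <- lapply(df[is_binary], function(col) {
    # as.integer() maps FALSE/TRUE to 0/1 before factor() matches on characters
    factor(as.integer(col), levels = c(0, 1), labels = c(0, 1))
  })
  df
}

# Made-up data: one logical column, one numeric 0/1 column, one continuous column
mixed <- data.frame(
  flag_lgl = c(FALSE, TRUE, FALSE),
  flag_num = c(0, 1, 1),
  age      = c(34, 51, 47)
)

converted <- convert_binary_cols(mixed)
str(converted)

# The conversion should never introduce NAs
stopifnot(!any(is.na(converted)))
```

The final `stopifnot()` is the part that would have caught the NAs reported here.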

@cebarboza cebarboza linked a pull request Sep 30, 2024 that will close this issue