aorsf with doParallel cluster has interesting error #85

Closed
frankiethull opened this issue Jun 27, 2024 · 6 comments · Fixed by #86
Comments

@frankiethull

Hi Simon -

TLDR: I think doParallel clusters are causing an issue with aorsf engine.

I was eager to test the new aorsf random forest engine after your blog post & new bonsai release to CRAN!

My goal was to benchmark it against some bagged and boosted trees for one of my projects. When I swapped rand_forest over to the aorsf engine, I kept getting a weird error: "parsnip could not locate an implementation for rand_forest regression model specifications using the aorsf engine." But I had updated bonsai and parsnip, and even switched R versions, as I thought I was losing my mind. To make matters more confusing, when I set up a simple reprex yesterday, it worked (?!). So I thought something was going on with function masking or my environment, and kept refreshing it and testing both my main script and the reproducible example.

The one thing I didn't have in my reprex yesterday was the pre-training setup: (cluster <- makePSOCKcluster(8); registerDoParallel(cluster)). Once I initialize this cluster in the reprex, there seems to be an issue with training the aorsf model (I think?). It's the only way I was able to reproduce the error in the reprex.

I am wondering if I should go about this differently: not run a cluster for aorsf? Maybe it's compatible with a different cluster library? I initialize clusters for the bagged and boosted trees, so there will be one in my environment unless I run aorsf in a different script altogether (I am currently running a Quarto code chunk for each model). Open to feedback, a solution, or confirmation that I'm just losing my mind. Each time it says parsnip could not locate the engine, show_engines() shows that it is in fact there.
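In the meantime, one way to keep aorsf out of the cluster without moving it to a separate script is to drop back to sequential execution around just that fit. This is a sketch, not a confirmed fix: registerDoSEQ() comes from foreach, and the tuning calls are placeholders.

```r
library(doParallel)

cluster <- makePSOCKcluster(8)
registerDoParallel(cluster)

# ... tune the bagged/boosted models in parallel here ...

# drop to sequential execution just for the aorsf model,
# then re-register the cluster for the remaining models
registerDoSEQ()
# ... tune the aorsf model here ...
registerDoParallel(cluster)

# once everything is done
stopCluster(cluster)
registerDoSEQ()
```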

# setup libs, data, recipe ----------------------------------------------------------

library(doParallel) # the issue arose after creating a doParallel cluster (I think?)
#> Warning: package 'doParallel' was built under R version 4.4.1
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.4.1
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.4.1
#> Loading required package: parallel

library(parsnip)
#> Warning: package 'parsnip' was built under R version 4.4.1


library(rsample)
#> Warning: package 'rsample' was built under R version 4.4.1
library(tune)
#> Warning: package 'tune' was built under R version 4.4.1
library(yardstick)
#> Warning: package 'yardstick' was built under R version 4.4.1
library(workflows)
#> Warning: package 'workflows' was built under R version 4.4.1
library(recipes)
#> Warning: package 'recipes' was built under R version 4.4.1
#> Loading required package: dplyr
#> Warning: package 'dplyr' was built under R version 4.4.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(dplyr)
library(bonsai)
#> Warning: package 'bonsai' was built under R version 4.4.1
#library(aorsf)
library(tidymodels)   # is the issue an underlying library mask? I don't think so
#> Warning: package 'tidymodels' was built under R version 4.4.1
#> Warning: package 'broom' was built under R version 4.4.1
#> Warning: package 'dials' was built under R version 4.4.1
#> Warning: package 'scales' was built under R version 4.4.1
#> Warning: package 'ggplot2' was built under R version 4.4.1
#> Warning: package 'infer' was built under R version 4.4.1
#> Warning: package 'modeldata' was built under R version 4.4.1
#> Warning: package 'purrr' was built under R version 4.4.1
#> Warning: package 'tibble' was built under R version 4.4.1
#> Warning: package 'tidyr' was built under R version 4.4.1
#> Warning: package 'workflowsets' was built under R version 4.4.1
library(finetune)
#> Warning: package 'finetune' was built under R version 4.4.1
#library(dplyr)

training <- ChickWeight |> ungroup() |> tibble::as_tibble() |> mutate(Chick = as.numeric(Chick))

folds <- vfold_cv(data = training, v = 10)

rm_smth <- "Diet"

model_recipe <- 
   recipe(weight ~ ., training) |>
    step_rm(any_of(!!rm_smth)) |>
    step_dummy(all_nominal_predictors()) |>
    step_YeoJohnson(all_numeric_predictors()) # after step_dummy() no nominal predictors remain; Yeo-Johnson applies to numeric columns


# random forest spec and grid ----------------------------------------
forest_grid <-  expand.grid(
  trees = c(20, 50),
  mtry = c(2, 5, 7)
)

orf_spec <- rand_forest(
  trees = tune(),
  mtry = tune()) |>
  set_engine("aorsf") |> # issues arise for aorsf:
  set_mode("regression")


# pre training settings ---
cluster <- makePSOCKcluster(8)
registerDoParallel(cluster)

# model creation -------------------------------------------
orf_results <-
  finetune::tune_race_anova(
    workflow() |>
      add_recipe(model_recipe) |>
      add_model(orf_spec),
    resamples = folds,
    grid = forest_grid,
    control = control_race(),
    metrics = metric_set(yardstick::rmse)
  )
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
# post training settings ---
stopCluster(cluster)
registerDoSEQ()


show_notes(.Last.tune.result)
#> unique notes:
#> ──────────────────────────────────────────────────────────────
#> Error:
#> ! parsnip could not locate an implementation for `rand_forest`
#>   regression model specifications using the `aorsf` engine.
parsnip::show_engines("rand_forest")
#> # A tibble: 10 × 2
#>    engine       mode          
#>    <chr>        <chr>         
#>  1 ranger       classification
#>  2 ranger       regression    
#>  3 randomForest classification
#>  4 randomForest regression    
#>  5 spark        classification
#>  6 spark        regression    
#>  7 partykit     regression    
#>  8 partykit     classification
#>  9 aorsf        classification
#> 10 aorsf        regression
# select_best(orf_results)

Created on 2024-06-27 with reprex v2.1.0

@simonpcouch
Contributor

Hey @frankiethull! The parallelism + parsnip extension package interaction can be dizzying to debug, for sure. Sorry that sent you down such a rabbit hole.

For now, confirming that I can reproduce your issue (albeit with a different error).

library(tidymodels)
library(bonsai)

training <- as_tibble(ChickWeight) |> mutate(Chick = as.numeric(Chick))

folds <- vfold_cv(data = training, v = 5)

spec_rf <- 
  rand_forest(trees = 100, mtry = tune()) %>%
  set_mode("regression")
  
spec_orf <- spec_rf %>% set_engine("aorsf")
spec_pk <- spec_rf %>% set_engine("partykit")

res_orf_seq <- tune_grid(spec_orf, weight ~ ., resamples = folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
res_pk_seq <- tune_grid(spec_pk, weight ~ ., resamples = folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
show_notes(res_orf_seq)
#> Great job! No notes to show.
show_notes(res_pk_seq)
#> Great job! No notes to show.
library(doParallel)
#> Loading required package: foreach
#> 
#> Attaching package: 'foreach'
#> The following objects are masked from 'package:purrr':
#> 
#>     accumulate, when
#> Loading required package: iterators
#> Loading required package: parallel
cluster <- makePSOCKcluster(8)
registerDoParallel(cluster)

res_orf_par <- tune_grid(spec_orf, weight ~ ., resamples = folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
res_pk_par <- tune_grid(spec_pk, weight ~ ., resamples = folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
show_notes(res_orf_par)
#> unique notes:
#> ────────────────────────────────────────────────────────────────────────────────
#> Error in `pull_workflow_spec_encoding_tbl()`:
#> ! Exactly 1 model/engine/mode combination must be located.
#> ℹ This is an internal error that was detected in the workflows package.
#>   Please report it at <https://github.com/tidymodels/workflows/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
show_notes(res_pk_par)
#> Great job! No notes to show.

Created on 2024-06-28 with reprex v2.1.0

I was curious whether this was some sort of issue with our support for cluster sockets generally with parsnip extensions, or with the aorsf package specifically. Looks like this has something to do with aorsf specifically, possibly an issue with its model registration code. I am experiencing Friday afternoon brain fry, so will revisit next week. :)

@simonpcouch
Contributor

Two more data points:

  1. I see the same aorsf-specific error with socket clusters via future, i.e. library(future); plan(multisession)
  2. I do not see the same error with forking, i.e.
library(future)
rlang::local_options(parallelly.fork.enable = TRUE)
plan(multicore)

...works fine. So, if you're running some models this weekend, maybe the recommendation for now is to briefly transition to forking, if possible 🙃
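Putting the forking workaround together with the tuning setup from the reprexes above gives something like the following. This is a sketch only: the worker count is arbitrary, and multicore (forking) is unavailable on Windows.

```r
library(tidymodels)
library(bonsai)
library(future)

# forked workers inherit the parent process's loaded packages, so
# bonsai's engine registrations are visible to them (not on Windows)
rlang::local_options(parallelly.fork.enable = TRUE)
plan(multicore, workers = 4)

spec_orf <- rand_forest(trees = 100, mtry = tune()) |>
  set_mode("regression") |>
  set_engine("aorsf")

folds <- vfold_cv(
  as_tibble(ChickWeight) |> dplyr::mutate(Chick = as.numeric(Chick)),
  v = 5
)

res_orf <- tune_grid(spec_orf, weight ~ ., resamples = folds)
```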

@frankiethull
Author

No worries at all on the rabbit hole, I just wanted to give you some info on my side quest!

I had tried dev versions of parsnip and bonsai too, to see if something would change, and tested partykit as well; partykit works for me too. I think you are right that it may be narrowed down to the aorsf library. tbf, this is the first time I'd run that engine, and I haven't trained a random forest in a while, so I tried a lot of stuff: changing column types, recipe steps, dropping NAs (I typically leave those in for bagging/boosting unless running CQR via probably), removing the tidymodels meta-package, etc. So I'm glad it isn't just me!

Lovely to see that future with forking works. I will have to give that a try. Have a great weekend and happy to test stuff on my end next week, or whenever -- no rush of course. Appreciate all you do!

@VIRADUS

VIRADUS commented Jul 2, 2024

Hi @simonpcouch,

Is there a way to implement the parallelization on Windows as well? I understand that the plan(multicore) method does not work on Windows systems.

Thanks!

@simonpcouch
Contributor

> Is there a way to implement the parallelization on Windows as well? I understand that the plan(multicore) method does not work on Windows systems.

You're correct. There are some other experimental future backends, but I'd likely recommend just waiting until I've managed to figure this bug out—I'm focused here this afternoon. :)


A couple more bits of information as I debug:

  • If I mock into workflows:::pull_workflow_spec_encoding_tbl to print out the encodings, only the "plain" parsnip model environment is there (i.e. bonsai has not been loaded in the worker, so neither the registrations for aorsf nor those for e.g. partykit are present)
  • partykit (another engine that requires an extension package) resamples just fine. If I resample a partykit model in those workers, it will effectively load bonsai, and then I can go back and fit aorsf just fine.
library(tidymodels)
library(bonsai)
library(future)

plan(multisession, workers = 5)

spec_rf <- 
  rand_forest(trees = 100, mtry = tune()) %>%
  set_mode("regression") %>% 
  set_engine("aorsf")

folds <- vfold_cv(sim_regression(200)[1:5])

res_orf <- tune_grid(spec_rf, outcome ~ ., folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> x Fold01: preprocessor 1/1:
#>   Error in `pull_workflow_spec_encoding_tbl()`:
#>   ! Exactly 1 model/engine/mode combination must be located.
#>   ℹ This is an internal error that was detected in the workflows package.
#>     Please report it at <https://github.com/tidymodels/workflows/issues>...
#> (identical error repeated for Fold02 through Fold10)
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.

# resample another model that also needs bonsai
res_pk <- tune_grid(spec_rf %>% set_engine("partykit"), outcome ~ ., folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry

# same code works okay now...
res_orf <- tune_grid(spec_rf, outcome ~ ., folds)
#> i Creating pre-processing data to finalize unknown parameter: mtry

Created on 2024-07-02 with reprex v2.1.0

Definitely smells like a model registration issue to me at this point.
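For context on why registration is the suspect (a sketch of the mechanism, not the confirmed root cause): parsnip keeps engine registrations in a model environment that extension packages such as bonsai populate at load time, so a worker that has never loaded bonsai sees no aorsf rows. You can inspect this with parsnip::get_from_env():

```r
library(parsnip)
# engines registered before any extension package is loaded
get_from_env("rand_forest")

library(bonsai)  # loading bonsai registers the aorsf and partykit engines
# the same table now includes the aorsf rows
get_from_env("rand_forest")
```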

@simonpcouch
Contributor

Aha, got it! :)
