- fixed an issue where omitting NA values would cause an error in regression forests.
- `orsf_vs` now returns a column that contains non-reference coded variable names (see #52).
- `orsf_vs` no longer throws an error when `n_predictor_min = 1` is used (see #58).
- `orsf_summarize_uni` now allows specification of a class to summarize for oblique classification forests (see #57).
- Fixed an issue where `orsf` would throw an uninformative error when all predictors were categorical (see #56).
- Oblique random forests can now compute out-of-bag predictions on modified versions of their training data (see #54).
- Setting `oobag_pred_type` to `'none'` when growing a forest no longer necessitates the specification of `pred_type` when calling `predict` later (see #48).
- Setting `sample_fraction` to 1 will no longer result in empty `oobag_rows` in the forest object (this would cause R to crash when the forest was passed to C++; see #48).
- Re-worked the creation and maintenance of `oobag_denom` in C++ routines (see #48).
- Restricted mean survival time is now used for `pred_type = 'time'` instead of median survival time (see #46).
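
To illustrate the difference between the two summaries named in the last entry above, here is a minimal base R sketch. It is not the package's implementation: the survival curve values and the helper names `rmst()` and `median_surv()` are hypothetical and chosen only for illustration. One commonly cited reason to prefer the restricted mean is that the median is undefined whenever the predicted curve never drops to 0.5.

```r
# Hypothetical step survival curve: S(t) drops at each event time.
event_times <- c(1, 2, 4, 7)
surv_probs  <- c(0.9, 0.7, 0.4, 0.3)

# Restricted mean survival time: area under the step curve from 0 to `horizon`.
rmst <- function(times, surv, horizon) {
  keep   <- times < horizon
  grid   <- c(0, times[keep], horizon)  # interval boundaries
  height <- c(1, surv[keep])            # value of S(t) on each interval
  sum(diff(grid) * height)
}

# Median survival time: first time S(t) is at or below 0.5 (NA if never).
median_surv <- function(times, surv) {
  hit <- which(surv <= 0.5)
  if (length(hit) == 0) NA_real_ else times[hit[1]]
}

rmst_value   <- rmst(event_times, surv_probs, horizon = 10)
median_value <- median_surv(event_times, surv_probs)

# A curve that stays above 0.5 has no median, but its restricted mean exists.
no_median <- median_surv(c(1, 2), c(0.9, 0.8))
flat_rmst <- rmst(c(1, 2), c(0.9, 0.8), horizon = 10)
```
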
- minor changes to partial dependence vignette to resolve code sanitization errors.
- Allowed option `"time"` for `pred_type` in `predict` and partial dependence to predict survival time (see #37).
- Added `pred_spec_auto()` for more convenient specification of variables for partial dependence.
- Partial dependence now runs much faster with multiple threads.
- Added `orsf_vint()` to compute variable interaction scores using partial dependence.
- Added `orsf_update()`, which can copy and modify an `obliqueForest` or modify it in place.
- Added `orsf_control` functions for classification, regression, and survival (#25).
- Optimization implemented for matrix multiplication during prediction (#20).
- Fixed an uninitialized value for `pd_type`.
- Fixed various issues related to memory leaks.
- Re-worked internal C++ routines following the design of `ranger`.
- Re-worked how progress is printed to console when `verbose_progress` is `TRUE`, following the design of `ranger`. Messages now indicate the action being taken, the % complete, and the approximate time until finishing the action.
- Improved variable importance, following the design of `ranger`. Importance is now computed tree-by-tree instead of by aggregate. Additionally, mortality is the type of prediction used for importance with survival trees, since mortality does not depend on `pred_horizon`.
- Allowed multi-threading to be performed in `orsf()`, `predict.orsf_fit()`, and functions in the `orsf_vi()` and `orsf_pd()` family.
- Allowed sampling without replacement and sampling a specific fraction of observations in `orsf()`.
- Included Harrell's C-statistic as an option for assessing goodness of splits while growing trees.
- Fixed an issue where an uninformative error message would occur when `pred_horizon` was > max(time) for `orsf_summarize_uni`. Thanks to @JyHao1 and @DustinMLong for finding this!
- Additional changes in internal testing to avoid problems with ATLAS
- Minor fix for internal tests that were failing when run on ATLAS
- `orsf()` no longer throws errors or warnings when you try to give it a single predictor. A note was added to the documentation in the details of `?orsf` that explains why using a single predictor with `orsf()` is somewhat useless. This was done to resolve mlr-org/mlr3extralearners#259.
- `predict.orsf_fit` now accepts `pred_horizon = 0` and returns sensible values. Thanks to @mattwarkentin for the feature request.
Added a function to perform variable selection, `orsf_vs()`.
- Made variable importance consistent with respect to `group_factors`. Originally, the output from `orsf` would have ungrouped VI values while `orsf_vi` would have grouped values. With this update, `orsf` defaults to grouped values. The ungrouped values can still be recovered.
- Fixed an issue in `orsf_pd` functions where output data were not being returned on the original scale.
- `orsf` formulas now accept `Surv` objects (see #11).
- Added `verbose_progress` input to `orsf`, which prints messages to console indicating progress.
- Allowed missing values in `orsf`. Mean and mode imputation is performed for observations with missing data. These values can also be used to impute new data with missing values.
- Centering and scaling of predictors is now done prior to growing the forest.
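
The mean and mode imputation described in the entries above can be sketched in base R as follows. This is only an illustration of the general idea (learn one fill-in value per column from the training data, then reuse it for new data), not the package's internal code; the helper names `learn_impute_values()` and `apply_impute_values()` and the toy data are hypothetical.

```r
# Learn one fill-in value per column from the training data:
# the mean for numeric columns, the most frequent level otherwise.
learn_impute_values <- function(data) {
  lapply(data, function(x) {
    if (is.numeric(x)) {
      mean(x, na.rm = TRUE)
    } else {
      names(which.max(table(x)))  # table() ignores NA by default
    }
  })
}

# Replace missing entries in each column with the learned value.
apply_impute_values <- function(data, values) {
  for (col in names(values)) {
    is_missing <- is.na(data[[col]])
    data[[col]][is_missing] <- values[[col]]
  }
  data
}

train <- data.frame(
  age = c(50, 60, NA, 70),
  sex = factor(c("f", "m", "f", NA))
)
new_data <- data.frame(
  age = NA_real_,
  sex = factor(NA, levels = c("f", "m"))
)

values     <- learn_impute_values(train)
train_full <- apply_impute_values(train, values)
# New data are filled with the values learned from training, not their own.
new_full   <- apply_impute_values(new_data, values)
```
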
- Included rOpenSci reviewers Christopher Jackson, Marvin N Wright, and Lukas Burk in `DESCRIPTION` as reviewers. Thank you!
- Added clarification to docs about pros/cons of different variable importance techniques.
- Added regression tests for `aorsf` versus `obliqueRSF` (they should be similar).
- Additional support and tests for functions with long right hand sides.
- Updated out-of-bag vignette with more appropriate custom functions.
- Allow status values in input data to be more general, i.e., not just 0 and 1.
- Allow missing values in `predict` functions, including partial dependence.
- Modified unit tests for compatibility with extra checks run through CRAN.
- Added `orsf_control_custom()`, which allows users to submit custom functions for identifying linear combinations of inputs while growing oblique decision trees.
- Added `weights` input to `orsf`, allowing users to over or under fit `orsf` to specific data in their training set.
- Added `chf` and `mort` options to `predict.orsf_fit()`. Mortality predictions are not fully implemented yet; they are not supported in partial dependence or out-of-bag error estimates. These features will be added in a future update.
- Core features implemented: fit, interpret, and predict using oblique random survival forests.
- Vignettes + Readme covering usage of core features.
- Website hosted through GitHub pages, managed with `pkgdown`.