-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiclass support for caretList and caretStack #260
Conversation
… for binary classification in caretEnsemble
Awesome! Thank you for the PR. I will review. |
Sorry for the delay. I plan to review this week! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple small comments, but it generally looks good.
Did you use an automatic linter to add all the code style stuff (e.g. spaces around "=")? In the future it's a good practice to make a PR that applies your delinting first, and then follow up with a second PR that makes your code changes.
I appreciate the delinting! It's just a good practice to separate style changes from code changes.
Thank you again! And please see PR comments for specifics changes requested.
tests/testthat/test-ensemble.R
Outdated
expect_identical(pred.classa, pred.classb) | ||
expect_less_than(abs(0.9749462 - pred.classc), 0.01) | ||
expect_equal(pred.classc[, 1], 0.9489, tol = 0.0001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on my machine (mac, apple silicon, R 4.3.2, caret caret_6.0-94). Can you loosen the tolerance a bit, perhaps to 0.001
?
expect_equal(pred.classc[, 1], 0.9489, tol = 0.0001) | |
expect_equal(pred.classc[, 1], 0.9489, tol = 0.001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will keep the previously established tolerance (0.01).
) | ||
|
||
#Order, and return | ||
overall <- norm_to_100(apply(dat, 1, weighted.mean, w = weights)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, why'd you make this one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because it was easier for me to understand this part of the code. Perhaps it was more convenient to leave the line as it was at the beginning. But I just realized that there is an error in the line if (grepl("_", name)) sub("_[^_]*$", "", name) else name
. If the class name contains an '_', it produces an unexpected result.
) | ||
lintr::expect_lint_free(linters=my_linters) | ||
}) | ||
# context("Code is high quality and lint free") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why'd you comment this out? I'd like to keep this test please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, the test was not working. Then I made some changes, and it started working. However, the tests were failing because the code is not lint-free (for example, commas_linter
is giving a lot of warnings). If you prefer, I can fix this in another pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment out just the linters that fail, e.g. commas_linter
?
And then make an issue for each commented out linter in https://github.com/zachmayer/caretEnsemble/issues and at some point in the future we can fix them.
Yes, I use an automatic linter. Initially, I tried not to use it in this project, but eventually I did and didn't realize it. Next time, I will separate it into two different PRs! Regarding the linter test, I will remove the comments and address the warnings in another PR. |
Awesome, thank you for the updates! I think we're almost there! we can sort out the lint in a different PR. I actually appreciate the de-linting. I should use the same automatic tool: what do you use to lint your code as you write it? |
@antongomez remind me where we're at here. Did you respond to all my feedback? If so I'll do one last review and merge. (we can deal with linting in another PR once this is in) |
Crap. I delete the branch by accident. Let me see how I can restore it |
Ok I think I did it. Sorry. Please let me know if you need anything else before we merge this! |
Ok, we have a lot of merge conflicts. I'll work on this. |
Sorry for taking so long to respond. I'm going to add a test to add a unit test to expect an error from caret when model names in Regarding to the automatic tool that I'm using to format automatically the code, I'm using vscode and I added the extension for R. |
No worries! A lot of people were off this week.
I’ve already delinted the repo and added 100% test coverage (see my most
recent PRs).
I need to rebase my multiclass PR— then could you rebase your PR?
Sorry this got complicated, but I think it’s close to done!
|
Here is the delint PR: #273 @antongomez I think what I'll do is merge your PR into #271, which I will then need to rebase and have you review. |
Okay, I saw your delint PR and the test coverage PR. But what exactly do you need from me now? @zachmayer I'm a little bit lost. |
I’ll take it from here
|
15f1664
to
b8ec20c
Compare
PR here to merge to main: #280 |
Basically, the problem was that this branch was off the 8-year old multiclass branch, and was therefore missing all the changes I've made since then. I have now rebased and merged my original multiclass branch, and will work on getting your commits in too. |
Remove duplicated argument "size" in plot.caretEnsemble Rewrite condition on predict.caretList Solve test warnings and fails (skip test-lintr) Modify tests to support multiclass Multiclass suport for caretList and caretStack Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble Add unity tests for multiclass Let the user choose which class to exclude in careStack train and predict Solved bug for class weights when calculate varImp in caretEnsemble Change cheks in check_binary_classification and other minor changes
Remove duplicated argument "size" in plot.caretEnsemble Rewrite condition on predict.caretList Solve test warnings and fails (skip test-lintr) Modify tests to support multiclass Multiclass suport for caretList and caretStack Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble Add unity tests for multiclass Let the user choose which class to exclude in careStack train and predict Solved bug for class weights when calculate varImp in caretEnsemble Change cheks in check_binary_classification and other minor changes
REBASED BY ZACH DEANE MAYER Remove duplicated argument "size" in plot.caretEnsemble Rewrite condition on predict.caretList Solve test warnings and fails (skip test-lintr) Modify tests to support multiclass Multiclass suport for caretList and caretStack Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble Add unity tests for multiclass Let the user choose which class to exclude in careStack train and predict Solved bug for class weights when calculate varImp in caretEnsemble Change cheks in check_binary_classification and other minor changes
* SQUASH OF THE COMMITS FROM #260 REBASED BY ZACH DEANE MAYER Remove duplicated argument "size" in plot.caretEnsemble Rewrite condition on predict.caretList Solve test warnings and fails (skip test-lintr) Modify tests to support multiclass Multiclass suport for caretList and caretStack Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble Add unity tests for multiclass Let the user choose which class to exclude in careStack train and predict Solved bug for class weights when calculate varImp in caretEnsemble Change cheks in check_binary_classification and other minor changes * try to get to 100% * rebuild * fix lint * readd * Multiclass pr (#281) * Add tests to check that the character | is not allowed in variable na… (#282) --------- Co-authored-by: antongomez <[email protected]> Co-authored-by: Zach Deane-Mayer <[email protected]>
The following changes were made:
predict.caretList
now returns a matrix of probabilities for classification problems.makePredObsMatrix
uses probabilities for all but one class.predict.caretStack
now returns a matrix of probabilities.setMulticlassExcludedLevel
andgetMulticlassExcludedLevel
can be used to determine which class to exclude. If a class outside the range 1,...,num_classes
is provided,makePredObsMatrix
andpredict.caretStack
will use probabilities for ALL classes and a warning will be displayed, as this may cause collinearity issues.caretEnsemble
still works only for binary classification problems. This is due to the method used,glm
, which only works with binary class problems. Instead of this method,caretStack
can be used with other methods, such asmultinom
, as suggested here.