Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiclass support for caretList and caretStack #260

Closed
wants to merge 11 commits into from

Conversation

antongomez
Copy link
Contributor

The following changes were made:

  • predict.caretList now returns a matrix of probabilities for classification problems.
  • Internally, makePredObsMatrix uses probabilities for all but one class.
  • predict.caretStack now returns a matrix of probabilities.
  • Functions setMulticlassExcludedLevel and getMulticlassExcludedLevel can be used to determine which class to exclude. If a class outside the range 1,...,num_classes is provided, makePredObsMatrix and predict.caretStack will use probabilities for ALL classes and a warning will be displayed, as this may cause collinearity issues.
  • All tests have passed, and specific tests for multiclass functionality were added.

caretEnsemble still works only for binary classification problems. This is due to the method used, glm, which only works with binary class problems. Instead of this method, caretStack can be used with other methods, such as multinom, as suggested here.

@zachmayer
Copy link
Owner

Awesome! Thank you for the PR. I will review.

@zachmayer
Copy link
Owner

Sorry for the delay. I plan to review this week!

Copy link
Owner

@zachmayer zachmayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple small comments, but it generally looks good.

Did you use an automatic linter to add all the code style stuff (e.g. spaces around "=")? In the future it's a good practice to make a PR that applies your delinting first, and then follow up with a second PR that makes your code changes.

I appreciate the delinting! It's just a good practice to separate style changes from code changes.

Thank you again! And please see PR comments for specifics changes requested.

expect_identical(pred.classa, pred.classb)
expect_less_than(abs(0.9749462 - pred.classc), 0.01)
expect_equal(pred.classc[, 1], 0.9489, tol = 0.0001)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails on my machine (mac, apple silicon, R 4.3.2, caret caret_6.0-94). Can you loosen the tolerance a bit, perhaps to 0.001?

Suggested change
expect_equal(pred.classc[, 1], 0.9489, tol = 0.0001)
expect_equal(pred.classc[, 1], 0.9489, tol = 0.001)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep the previously established tolerance (0.01).

)

#Order, and return
overall <- norm_to_100(apply(dat, 1, weighted.mean, w = weights))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why'd you make this one line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because it was easier for me to understand this part of the code. Perhaps it was more convenient to leave the line as it was at the beginning. But I just realized that there is an error in the line if (grepl("_", name)) sub("_[^_]*$", "", name) else name. If the class name contains an '_', it produces an unexpected result.

R/caretList.R Show resolved Hide resolved
R/caretList.R Show resolved Hide resolved
R/caretList.R Show resolved Hide resolved
tests/testthat/test-ensembleMethods.R Show resolved Hide resolved
tests/testthat/test-ensembleMethods.R Show resolved Hide resolved
tests/testthat/test-ensembleMethods.R Show resolved Hide resolved
tests/testthat/test-ensembleMethods.R Show resolved Hide resolved
)
lintr::expect_lint_free(linters=my_linters)
})
# context("Code is high quality and lint free")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why'd you comment this out? I'd like to keep this test please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, the test was not working. Then I made some changes, and it started working. However, the tests were failing because the code is not lint-free (for example, commas_linter is giving a lot of warnings). If you prefer, I can fix this in another pull request.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment out just the linters that fail, e.g. commas_linter?

And then make an issue for each commented out linter in https://github.com/zachmayer/caretEnsemble/issues and at some point in the future we can fix them.

@antongomez antongomez closed this Jun 20, 2024
@antongomez antongomez reopened this Jun 20, 2024
@antongomez
Copy link
Contributor Author

I have a couple small comments, but it generally looks good.

Did you use an automatic linter to add all the code style stuff (e.g. spaces around "=")? In the future it's a good practice to make a PR that applies your delinting first, and then follow up with a second PR that makes your code changes.

I appreciate the delinting! It's just a good practice to separate style changes from code changes.

Thank you again! And please see PR comments for specifics changes requested.

Yes, I use an automatic linter. Initially, I tried not to use it in this project, but eventually I did and didn't realize it. Next time, I will separate it into two different PRs!

Regarding the linter test, I will remove the comments and address the warnings in another PR.

This was referenced Jun 21, 2024
@zachmayer
Copy link
Owner

Awesome, thank you for the updates! I think we're almost there!

we can sort out the lint in a different PR. I actually appreciate the de-linting. I should use the same automatic tool: what do you use to lint your code as you write it?

@zachmayer
Copy link
Owner

@antongomez remind me where we're at here. Did you respond to all my feedback? If so I'll do one last review and merge.

(we can deal with linting in another PR once this is in)

@zachmayer zachmayer deleted the branch zachmayer:multiclass June 26, 2024 01:07
@zachmayer zachmayer closed this Jun 26, 2024
@zachmayer
Copy link
Owner

Crap. I delete the branch by accident. Let me see how I can restore it

@zachmayer zachmayer reopened this Jun 26, 2024
@zachmayer
Copy link
Owner

Ok I think I did it. Sorry. Please let me know if you need anything else before we merge this!

@zachmayer zachmayer mentioned this pull request Jun 26, 2024
@zachmayer
Copy link
Owner

Ok, we have a lot of merge conflicts. I'll work on this.

@antongomez
Copy link
Contributor Author

Sorry for taking so long to respond. I'm going to add a test to add a unit test to expect an error from caret when model names in caretList contains character |. Also, I've prepared the lint PR, so when we finish with this one, I will make it.

Regarding to the automatic tool that I'm using to format automatically the code, I'm using vscode and I added the extension for R.

@zachmayer
Copy link
Owner

zachmayer commented Jul 5, 2024 via email

@zachmayer
Copy link
Owner

zachmayer commented Jul 5, 2024

Here is the delint PR: #273
Here is the 100% test coverage PR: #275

@antongomez I think what I'll do is merge your PR into #271, which I will then need to rebase and have you review.

@antongomez
Copy link
Contributor Author

antongomez commented Jul 8, 2024

Here is the delint PR: #273 Here is the 100% test coverage PR: #275

@antongomez I think what I'll do is merge your PR into #271, which I will then need to rebase and have you review.

Okay, I saw your delint PR and the test coverage PR. But what exactly do you need from me now? @zachmayer I'm a little bit lost.

@zachmayer
Copy link
Owner

zachmayer commented Jul 8, 2024 via email

@zachmayer zachmayer force-pushed the multiclass branch 2 times, most recently from 15f1664 to b8ec20c Compare July 9, 2024 20:12
@zachmayer zachmayer deleted the branch zachmayer:multiclass July 9, 2024 20:22
@zachmayer zachmayer closed this Jul 9, 2024
@zachmayer
Copy link
Owner

PR here to merge to main: #280

@zachmayer
Copy link
Owner

Basically, the problem was that this branch was off the 8-year old multiclass branch, and was therefore missing all the changes I've made since then. I have now rebased and merged my original multiclass branch, and will work on getting your commits in too.

zachmayer pushed a commit that referenced this pull request Jul 9, 2024
Remove duplicated argument "size" in plot.caretEnsemble

Rewrite condition on predict.caretList

Solve test warnings and fails (skip test-lintr)

Modify tests to support multiclass

Multiclass suport for caretList and caretStack

Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble

Add unity tests for multiclass

Let the user choose which class to exclude in careStack train and predict

Solved bug for class weights when calculate varImp in caretEnsemble

Change cheks in check_binary_classification and other minor changes
zachmayer pushed a commit that referenced this pull request Jul 9, 2024
Remove duplicated argument "size" in plot.caretEnsemble

Rewrite condition on predict.caretList

Solve test warnings and fails (skip test-lintr)

Modify tests to support multiclass

Multiclass suport for caretList and caretStack

Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble

Add unity tests for multiclass

Let the user choose which class to exclude in careStack train and predict

Solved bug for class weights when calculate varImp in caretEnsemble

Change cheks in check_binary_classification and other minor changes
zachmayer pushed a commit that referenced this pull request Jul 9, 2024
REBASED BY ZACH DEANE MAYER

Remove duplicated argument "size" in plot.caretEnsemble

Rewrite condition on predict.caretList

Solve test warnings and fails (skip test-lintr)

Modify tests to support multiclass

Multiclass suport for caretList and caretStack

Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble

Add unity tests for multiclass

Let the user choose which class to exclude in careStack train and predict

Solved bug for class weights when calculate varImp in caretEnsemble

Change cheks in check_binary_classification and other minor changes
@antongomez antongomez deleted the multiclass branch July 10, 2024 17:40
zachmayer added a commit that referenced this pull request Jul 10, 2024
* SQUASH OF THE COMMITS FROM #260

REBASED BY ZACH DEANE MAYER

Remove duplicated argument "size" in plot.caretEnsemble

Rewrite condition on predict.caretList

Solve test warnings and fails (skip test-lintr)

Modify tests to support multiclass

Multiclass suport for caretList and caretStack

Fix problems with underscores in method and class names and add check for binary classification in caretEnsemble

Add unity tests for multiclass

Let the user choose which class to exclude in careStack train and predict

Solved bug for class weights when calculate varImp in caretEnsemble

Change cheks in check_binary_classification and other minor changes

* try to get to 100%

* rebuild

* fix lint

* readd

* Multiclass pr (#281)

* Add tests to check that the character | is not allowed in variable na… (#282)

---------

Co-authored-by: antongomez <[email protected]>
Co-authored-by: Zach Deane-Mayer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants