Cross validation feature #839
I think that the cleaner way would be some abstraction above the `Trainer`.
@Borda, I don't have a plan for how to implement it yet because I haven't worked on it until now. If I have any questions I will post them here; if not, I will make a PR directly.
What if we just integrate with sklearn cross-validation? This can be the start of supporting sklearn interop.
How would you propose that, @williamFalcon? In my "own" library I split the dataset into K folders using my own script (you can use k-fold, stratified k-fold, or any of the scikit-learn methods):

dataset/k_0/train
dataset/k_1/train

Then I trained and evaluated K neural networks, and finally I just grabbed all the results and saved out the mean of accuracy, F1, and the other metrics. That of course means you waste disk space equal to (K-1) * the size of the dataset, so we shouldn't implement that approach. I think we should add a new parameter to the Trainer, something like GridSearchCV in scikit-learn.
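A minimal sketch of the disk-friendly alternative, assuming a map-style torch dataset `full_dataset`, a hypothetical `make_model` factory, and a logged `val_acc` metric (none of which are in the thread): index-based splits via `Subset` avoid the (K-1) * dataset-size duplication entirely.

```python
import numpy as np
import pytorch_lightning as pl
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for train_idx, valid_idx in kfold.split(np.arange(len(full_dataset))):
    # Subset reuses the same underlying data; nothing is copied to disk
    train_loader = DataLoader(Subset(full_dataset, train_idx), batch_size=32, shuffle=True)
    valid_loader = DataLoader(Subset(full_dataset, valid_idx), batch_size=32)

    model = make_model()  # hypothetical factory returning a fresh LightningModule
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model, train_loader, valid_loader)
    fold_scores.append(trainer.callback_metrics["val_acc"].item())

print(f"mean val_acc over {kfold.n_splits} folds: {np.mean(fold_scores):.4f}")
```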
@williamFalcon skorch has a nice implementation: https://github.com/skorch-dev/skorch/blob/f94466e272f6f325898359fecb9a7c004354af7f/skorch/dataset.py#L212
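A rough sketch of the idea in the linked skorch code (not skorch's actual API): normalize a `cv` argument, which may be an int or any sklearn splitter, into a splitter object before producing folds.

```python
from sklearn.model_selection import BaseCrossValidator, KFold

def resolve_cv(cv):
    """Turn `cv` into an sklearn splitter: an int becomes KFold(n_splits=cv)."""
    if isinstance(cv, int):
        return KFold(n_splits=cv)
    if isinstance(cv, BaseCrossValidator):
        return cv
    raise TypeError(f"unsupported cv argument: {cv!r}")
```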
Check the use case in #1393.
By passing data loaders directly to the trainer, you can do something like this:

```python
import os

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger

for fold, (train_idx, valid_idx) in enumerate(kfold.split(train_df)):
    train_loader = create_dataloader(train_df.iloc[train_idx])
    valid_loader = create_dataloader(train_df.iloc[valid_idx])

    # Folder hack: give every fold its own logger version so runs don't overwrite each other
    tb_logger = TensorBoardLogger(save_dir=OUTPUT_PATH, name=f'{args.model_name}', version=f'fold_{fold + 1}')
    os.makedirs(OUTPUT_PATH / f'{args.model_name}', exist_ok=True)
    checkpoint_callback = ModelCheckpoint(filepath=tb_logger.log_dir + "/{epoch:02d}-{val_metric:.4f}",
                                          monitor='val_metric', mode='max')
    early_stop_callback = EarlyStopping(monitor='val_metric', mode='max')

    model = YourPLModule(args)
    trainer = pl.Trainer(logger=tb_logger, early_stop_callback=early_stop_callback, checkpoint_callback=checkpoint_callback)
    trainer.fit(model, train_dataloader=train_loader, val_dataloaders=valid_loader)
```

Note that the folder hack is from #1207.
It could be a nice feature, as we now have the LR finder...
I wouldn't integrate this into `fit` or the Trainer init, but into a separate function that internally calls `fit`.
I agree, that's why I proposed doing it similarly to the LR finder... lol
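A hypothetical sketch of what such an LR-finder-style entry point could look like (none of these names are actual Lightning API): a standalone helper that builds a fresh `Trainer` and calls `fit` once per fold.

```python
import pytorch_lightning as pl

def cross_validate(model_fn, fold_loaders, **trainer_kwargs):
    """Hypothetical helper: `model_fn` builds a fresh LightningModule,
    `fold_loaders` yields (train_loader, valid_loader) pairs."""
    results = []
    for train_loader, valid_loader in fold_loaders:
        trainer = pl.Trainer(**trainer_kwargs)  # fresh trainer state per fold
        trainer.fit(model_fn(), train_loader, valid_loader)
        results.append({k: v.item() for k, v in trainer.callback_metrics.items()})
    return results
```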
We should also somehow include the CV results in TensorBoard, to give scientists an easy way to check the quality of their models. I don't know much about TensorBoard, so I don't know whether that's possible. Or we should at least save the final results to a JSON/pickle file.
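Both are possible. A minimal sketch, assuming `fold_metrics` is a list of per-fold dicts such as `{'val_acc': 0.91}` (an assumed name, not from the thread): log one scalar per fold to TensorBoard and dump the means to JSON.

```python
import json

import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="tb_logs/cv_summary")
for fold, metrics in enumerate(fold_metrics):
    for name, value in metrics.items():
        writer.add_scalar(f"cv/{name}", value, global_step=fold)  # one point per fold
writer.close()

# Aggregate across folds and persist the summary
means = {name: float(np.mean([m[name] for m in fold_metrics])) for name in fold_metrics[0]}
with open("cv_results.json", "w") as f:
    json.dump(means, f, indent=2)
```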
Is there any news on this?
@axkoenig how would you do it? Write a wrapper over the Trainer and perform the fold splitting followed by train-test?
I think we could have something like that in bolts, but it is very hard to generalize this, since it always depends on how you want to split your data.
I think we could provide two options:
1. Lightning does the splitting for the user; simple random splitting to begin with.
2. The user provides their own dataloaders, one pair per fold.
@SkafteNicki I think this would be a good starting point. However, we might also want stratified splitting and not just random splitting, which may become more difficult, since we would have to assume things (like structure, dtype, etc.) about the batches. In general, we should also keep in mind that we may not want to split only into train and test but also into validation sets/dataloaders.
@justusschock completely agree. I think that v1 of this feature should be very simple: just random splitting. My proposed option 2 would allow the user to provide their own stratified dataloaders. In v2 we can begin to figure out how to do more advanced stuff/better integration. The main problem (in my view) is that we are working with dataloaders and not datasets, so to get dataset statistics (like class balance for stratified splitting) we need to explicitly run over the dataset and enforce a lot of structure in the batches (as you mention).
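A sketch of what option 2 could look like on the user side, assuming the labels are available up front as an array (which sidesteps iterating over the dataloader); `stratified_fold_loaders` is a hypothetical helper name, and `StratifiedKFold` preserves class balance in every fold.

```python
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset

def stratified_fold_loaders(dataset, labels, n_splits=5, batch_size=32):
    """Yield (train_loader, valid_loader) pairs with class-balanced splits."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, valid_idx in skf.split(labels, labels):
        yield (
            DataLoader(Subset(dataset, train_idx), batch_size=batch_size, shuffle=True),
            DataLoader(Subset(dataset, valid_idx), batch_size=batch_size),
        )
```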
Worth noting that metrics from all folds are logged against the same global step, so epoch-level metrics come out as a zigzag/sharktooth. This is after the proposed changes by @ltx-dan. If I find anything that fixes this I will let you know. (I assume @ltx-dan that with your setup you have the epoch metric logging as a zigzag/sharktooth against global step for each fold?)
Yepp, that's what I have @jameschapman19. But that works for me because I wanted to ensure that the `max_epochs` param is respected per fold, and I don't really care about the total epochs passed across the folds.
Thanks @ltx-dan - nice job with the changes by the way! |
Hey @ltx-dan, did you make a PR with the changes?
No, I'm afraid I haven't gotten around to it yet. But all the mods I had to do are outlined above, so feel free to add them to your PR.
Really hope this is still planned for milestone 1.6 as suggested by @awaelchli :) |
Fixed the example recently. Are you guys still facing issues with this?
For single-device training, the strategy will be `SingleDeviceStrategy`. It can never be null.
Hello @rohitgr7, may I ask if there's a plan to adopt the example implementation officially into the library, or will it just be left as an example (with copy & paste for usage)? Thanks!
Hey @function2-llx, for now, AFAIK, it will stay as an example since the Loop API is still experimental.
I agree with @williamFalcon; for the case of datasets in memory we should really just have a function (probably in bolts) that wraps sklearn's cross-validation.
I'm getting an error when I try to use the KFold example. I'm using PL 1.5.10 and PyTorch 1.10.2.
Hi @rohitgr7, FYI, we just discovered that this updated version of the example (running on master) does not work at all if early-stopping callbacks are used: after the first fold, the subsequent ones don't run at all. Do you have any idea why that might be? What I found was that the early-stopping state is not reset between folds, so once early stopping has triggered in the first fold, the stopping condition carries over to the subsequent ones.
@chanshing The example code on the master branch is inconsistent with tag 1.5.10; if you're using 1.5.10 you should refer to the code at that tag.
I noticed this too, with early stopping. The subsequent fold training loops retain state from the first fold, so the behavior is as if the early stopping condition is already satisfied, and hence they don't run. Even for the MNIST example given, the `max_epochs` stopping condition is already met after the first fold. (All of this refers to the version pointed out by @function2-llx above.) There needs to be a way to truly reset the fit_loop state as each fold's training starts (so that callbacks like EarlyStopping and ModelCheckpoint, and stopping conditions like `max_epochs`, function properly). I tried doing
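One workaround consistent with this discussion (a sketch, not an official fix, and it sidesteps the Loop-based example entirely): re-instantiate the stateful callbacks and the Trainer for every fold, since `EarlyStopping` keeps its patience counter and best score as instance state. `fold_loaders` is an assumed iterable of per-fold loader pairs; `YourPLModule` is the placeholder module from the example above.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

for fold, (train_loader, valid_loader) in enumerate(fold_loaders):
    # Fresh callback and trainer instances per fold: EarlyStopping's
    # wait_count/best_score and the fit loop's epoch/global-step counters
    # would otherwise leak from one fold into the next
    early_stop = EarlyStopping(monitor="val_loss", patience=3, mode="min")
    checkpoint = ModelCheckpoint(monitor="val_loss", mode="min")
    trainer = pl.Trainer(max_epochs=20, callbacks=[early_stop, checkpoint])
    trainer.fit(YourPLModule(), train_loader, valid_loader)
```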
Hi, I would like to know the state of the KFold feature in Lightning, since I am really interested, but the example provided in the official documentation is no longer available.
Hey @YerePhy, to my knowledge, the best version of KFold is this repo: https://github.com/SkafteNicki/pl_cross. Best,
Thanks @tchaton, looks great, I will give it a try for sure.
Hey, |
@rohitgr7 could this issue be reopened? |
Hello, |
Any updates on this feature? I can't currently find any information on whether there is an officially supported way to do this in Lightning 2.0; it would be nice to know before I get stuck into a hacky way of doing it for my project.
I would also like to know if there are any updates on this feature? |
🚀 Feature
Cross-validation is a crucial model validation technique for assessing how a model generalizes to new data.
Motivation
Research papers usually require cross-validation. From my point of view, this kind of feature would simplify the work of researchers.
Pitch
I want to pass a parameter to the Trainer object to specify that I want to train the model on K folds.
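Purely to illustrate the pitch, a hypothetical call; `num_folds` is not a real Trainer argument.

```python
import pytorch_lightning as pl

# Hypothetical API sketch only; `num_folds` does NOT exist in pl.Trainer.
trainer = pl.Trainer(max_epochs=10, num_folds=5)
trainer.fit(model, datamodule=dm)
```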
In case nobody wants to make a PR, I can start working on it.