incorrect batch_sizes when Dataloader returns a dict with multiple tensors. #3668
Comments
I think doing just |
This should probably catch most things. Might be a bit much though.

```python
from collections.abc import Iterable

import torch


def unpack_batchsize(sample):
    """Recursively unpack `sample` to find a torch.Tensor.

    Returns len(tensor) when found, or 1 when it hits an empty or non-iterable value.
    (Maybe add this as a staticmethod on the Result object?)
    """
    if isinstance(sample, torch.Tensor):
        sample = len(sample)
    elif isinstance(sample, dict):
        sample = next(iter(sample.values()), 1)
    elif isinstance(sample, Iterable):
        sample = next(iter(sample), 1)
    else:
        sample = 1
    if isinstance(sample, int):
        return sample
    return unpack_batchsize(sample)


# call site inside the train/eval loop:
if is_result_obj:
    result_obj.track_batch_size(unpack_batchsize(batch))
```
|
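For example, applied to a dict batch the proposal above recovers the real batch size (a quick illustrative check):

```python
import torch

batch = {'batchA': torch.rand(16, 3), 'batchB': torch.rand(16, 5)}
print(len(batch))               # 2  -> what is currently used
print(unpack_batchsize(batch))  # 16 -> the actual batch size
```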
I suggest adding a function to the LightningModule |
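Presumably something like an overridable hook. A minimal sketch, where the method name get_batch_size and its wiring are purely hypothetical and not an actual Lightning API:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # hypothetical hook: let the user report the true batch size for unusual batch structures
    def get_batch_size(self, batch):
        return len(batch['batchA'])
```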
Exactly what I had in mind @carmocca. Or maybe simply ask users to put `.log('some_metric', metric_value, on_epoch=True, batch_size=batch_size)` and `.log('some_metric', metric_value, on_epoch=False)` |
Lightning currently defaults to `len(batch)`. If this is the desired behaviour, I think Lightning should at least attempt to get a reasonable estimate for the batch size. In most use cases the dataloader will return multiple tensors, resulting in an incorrect batch estimate if `len(batch)` is used. This could still be done using |
this should work too. Probably default to 1 if not provided since |
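For context, this is roughly the kind of weighted epoch-end reduction the tracked batch sizes feed into. A minimal illustrative sketch, not Lightning's actual implementation; note that if every batch size defaults to 1 it degenerates to a plain mean:

```python
import torch


def weighted_mean(values, batch_sizes):
    # weight each step's metric by the number of samples it was computed over
    values = torch.tensor(values, dtype=torch.float32)
    weights = torch.tensor(batch_sizes, dtype=torch.float32)
    return (values * weights).sum() / weights.sum()


step_metric = [0.9, 0.6, 0.3]
print(weighted_mean(step_metric, [32, 32, 8]))  # weighted by the true batch sizes
print(weighted_mean(step_metric, [1, 1, 1]))    # all sizes 1 -> plain mean
```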
@gerardsn I have a problem exactly with this: it gives an error when Lightning tries to reduce on epoch end. |
@fogside in your example result is a tensor so |
But it's a list with a tensor inside. |
You're referring to this, right? |
Sorry, I just realized that I was mistaken. And the second time it gives me the error. |
are you logging non-tensor values? maybe doing |
Yes, I was calculating precision in numpy. Isn't it possible to log non-tensor values? |
No. Also, to calculate precision or any other metric you can try the pl.metrics package, which computes these metrics on the current device itself. Or you can just do |
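A rough sketch of that suggestion, assuming the Precision metric is available under pl.metrics (the exact import path has changed across Lightning versions and later moved to torchmetrics):

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # the metric object lives on the module, so it follows the model's device
        self.val_precision = pl.metrics.Precision()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        # the metric returns a tensor on the current device, so logging it works
        self.log('val_precision', self.val_precision(logits, y))
```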
I see. Thank you! |
Already working on top-k accuracy. Maybe I will add top-k precision and recall to pl.metrics as well. Can you point me to the implementation of top-k precision in |
It's great! |
So I guess 2 things should be fixed: |
This is fixed on master. |
OK, making changes to this today. What do we want as the default behavior? Doesn't the custom reduce function solve the problem of custom batches, etc.? |
The batch sizes are not tracked correctly. |
🐛 Bug
Tracked batch sizes in the result object are incorrect when a DataLoader returns a dict with multiple tensors.
To Reproduce
Create a data loader that returns a dict, e.g. `batch = {'batchA': tensor_A, 'batchB': tensor_B}`.
Both entries have batch size N with N != 2.
For this example, a batch size of 2 will be logged, since `len(batch) == 2`.
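A minimal sketch of such a setup (names are illustrative):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class DictDataset(Dataset):
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        # two entries per sample -> the collated batch dict always has 2 keys
        return {'batchA': torch.rand(8), 'batchB': torch.rand(4)}


batch = next(iter(DataLoader(DictDataset(), batch_size=32)))
print(len(batch))            # 2  -> what gets tracked as the batch size
print(len(batch['batchA']))  # 32 -> the actual batch size N
```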
https://github.com/PyTorchLightning/pytorch-lightning/blob/05e5f03fd7c851b06ca5e34b39eb660857b8f00c/pytorch_lightning/trainer/evaluation_loop.py#L147-L150
https://github.com/PyTorchLightning/pytorch-lightning/blob/05e5f03fd7c851b06ca5e34b39eb660857b8f00c/pytorch_lightning/trainer/training_loop.py#L304-L306
Expected behavior
Log the correct batch size.
I'm not sure what should count as the 'correct' batch size when there are multiple tensors, but I expect every tensor in the dict to have the same batch_size. So, maybe something like:
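The snippet that followed is missing from this extraction; presumably it was along the lines of the unpack_batchsize proposal earlier in the thread, e.g. taking the length of the first tensor in the dict:

```python
# illustrative only: assumes every entry shares the same leading (batch) dimension
batch_size = len(next(iter(batch.values())))
```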