[Question] Return tensors #894
What is the best pattern for metrics that, unlike accuracy, can't be accumulated (e.g. roc_auc_score)? Is this pattern OK: return tensors?
I am asking because the validation takes 2x the time my training loop takes for the same number of samples. Maybe related: shouldn't the code be wrapped with torch.no_grad()?
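A minimal sketch of what that wrapping might look like (my illustration, not code from the repo; model and loader as in the MNIST test script):

```python
import torch

model.eval()
with torch.no_grad():  # no autograd graph is recorded inside this block
  for _, (data, target) in loader:
    output = model(data)  # forward pass only; grads are never needed in eval
```
|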
I found two problems. The tensors should live on the device.
|
I think the first version needs to initialize tensors on the device. I'm trying a variant of this on the diff:

```diff
--- a/test/test_train_mnist.py
+++ b/test/test_train_mnist.py
@@ -113,24 +113,39 @@ def train_mnist():
                                                         tracker.rate()))
 
   def test_loop_fn(model, loader, device, context):
-    total_samples = 0
-    correct = 0
+    y_trues = torch.zeros(1152, device=device)
+    y_preds = torch.zeros(1152, device=device)
     model.eval()
-    for x, (data, target) in loader:
-      output = model(data)
-      pred = output.max(1, keepdim=True)[1]
-      correct += pred.eq(target.view_as(pred)).sum().item()
-      total_samples += data.size()[0]
-
-    print('[{}] Accuracy={:.2f}%'.format(device,
-                                         100.0 * correct / total_samples))
-    return correct / total_samples
+    #size = 0
+    for k, (X, y_true) in loader:
+      #size += X.shape[0]  # 1152
+      #continue
+      output = model(X)
+      y_pred = output.argmax(-1)
+      #print(torch_xla._XLAC._xla_metrics_report())
+      y_preds[k * 128: (k + 1) * 128] = y_pred
+      y_trues[k * 128: (k + 1) * 128] = y_true
+      #print(y_preds.sum())  # this changes every step as one would expect
+      #print('-' * 80)
+      #print(torch_xla._XLAC._xla_metrics_report())
+      #import pdb; pdb.set_trace()
+
+    #print(size)
+    #return
+    #print(y_trues.max().item(), y_preds.max().item())
+    #y_trues = y_trues.cpu()
+    #y_preds = y_preds.cpu()
+    #print(y_trues.max().item(), y_preds.max().item())
+    return y_trues, y_preds
 
-  accuracy = 0.0
   for epoch in range(1, FLAGS.num_epochs + 1):
-    model_parallel(train_loop_fn, train_loader)
-    accuracies = model_parallel(test_loop_fn, test_loader)
-    accuracy = sum(accuracies) / len(accuracies)
+    #model_parallel(train_loop_fn, train_loader)
+    out = model_parallel(test_loop_fn, test_loader)
+    print('device 0, ytrues ypreds :')
+    print('{}'.format(out[0]))
+    print('device 1, ytrues ypreds :')
+    print('{}'.format(out[1]))
+    import pdb; pdb.set_trace()
     if FLAGS.metrics_debug:
       print(torch_xla._XLAC._xla_metrics_report())
```

output:
Btw, if you wanted to compute ROC (assuming you have a binary classification task), you may want the scores out instead of the argmax. So here, I commented out training, and my predictions are mostly 1, but not always.
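A minimal sketch of keeping scores instead of the argmax (my illustration, not code from the thread; y_scores is a hypothetical buffer allocated like y_preds above):

```python
# Keep the positive-class probability per example for later ROC computation.
# Assumes `output` holds raw logits of shape [batch, 2], as in a binary task.
probs = torch.softmax(output, dim=-1)[:, 1]
y_scores[k * 128: (k + 1) * 128] = probs
```
|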
Ok, thanks a lot for helping me here. Accuracy or ROC is just an example, but I think the pattern is quite common. I am using the
1.) I get
output:
2.) This works: I move the
output:
3.) If I remove
output:
4.) Regarding the speed: the validation now takes as long as the training, which is still surprising. Shouldn't skipping the backward pass be much faster? I can try to create an example with fake data; I am seeing this with real data.
5.) Unrelated: I often get:
And I am disconnected from my instance. Any help is appreciated. |
5/ You can use screen or tmux and reattach to your working session when disconnected. When was the last time you |
I ran |
So I think in 1/, you have tensors that you're concatenating which live on different devices. That's not possible. Moving the .cpu() inside takes care of this issue. Alternatively, you can move to CPU outside test_loop_fn first, then concat.
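A minimal sketch of that second option (my illustration, reusing names from the diff above):

```python
# Move each per-device result to CPU first, then concatenate; concatenating
# tensors that still live on different XLA devices is what fails.
out = model_parallel(test_loop_fn, test_loader)  # [(y_trues, y_preds), ...]
y_trues = torch.cat([t.cpu() for t, _ in out])
y_preds = torch.cat([p.cpu() for _, p in out])
```
|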
I'm able to repro. 4/ My validation speed is not that slow.
|
I think you are exposing a behavior here where we don't update the data of the CPU counterpart of a TPU tensor unless some item() call happens. To compute accuracies we don't need this, as you can compute on-device accuracies and average them appropriately to get the total accuracy; however, to compute roc_auc or a similarly complicated metric, we would need to go back to CPU without loss of data.
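A minimal sketch of the on-device route for accuracy (my illustration; signatures follow the test script above):

```python
# Reduce to scalar counters on each device; only .item() scalars cross to CPU.
def test_loop_fn(model, loader, device, context):
  correct, total = 0, 0
  model.eval()
  for _, (data, target) in loader:
    pred = model(data).argmax(-1)
    correct += (pred == target).sum().item()
    total += target.size(0)
  return correct, total

results = model_parallel(test_loop_fn, test_loader)  # [(correct, total), ...]
accuracy = sum(c for c, _ in results) / sum(t for _, t in results)
```
|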
Yeah, I initialized y_preds = torch.ones instead of zeros, commented out the .max, and accuracy is now 0% |
Hm, I think I might be bound by the data loading. To be clear, it's not related to these MNIST examples; they can't be used to benchmark. I will investigate this more. |
Why do you need an |
Some metrics such as the ROC curve, precision-recall curve, AUC, etc. require all the predictions in hand before starting to compute (you sort w.r.t. the prediction score, then count, etc.), as opposed to something like accuracy, where you can reduce to counters first and then combine. I was running on nightly.
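For instance (my illustration, assuming y_trues and y_scores have already been gathered to CPU as in the sketches above):

```python
# AUC needs every score at once (examples are sorted by score internally),
# unlike accuracy, which reduces to running counters per batch.
from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_trues.numpy(), y_scores.numpy())
```
|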
What is the issue with storing all the tensors in a list? |
So we don't normally call item() on those tensors; it's just a weird workaround that @see-- found. |
The training loop is per-device, so you can have a per-device list.
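A minimal sketch of that per-device-list pattern (my illustration, not code from the thread):

```python
# Collect batch results in a plain Python list per device, concatenate once.
def test_loop_fn(model, loader, device, context):
  trues, preds = [], []
  model.eval()
  for _, (data, target) in loader:
    preds.append(model(data).argmax(-1))
    trues.append(target)
  return torch.cat(trues), torch.cat(preds)
```
|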
Hmm, @see--, did you try this? First send to CPU, then concat, all outside the test_loop_fn:
|
It seems I have the same problem: https://github.com/PyTorchLightning/pytorch-lightning/pull/2632/checks?check_run_id=891921103
but when I print their devices
|
@Borda could you open a new issue with (small) repro instructions? A lot has changed since this issue was reported/closed, and it's likely to be a different issue. |