fixing TPU tests #2632

Borda · 2020-07-17T13:22:05Z

What does this PR do?

Fixes #2124
Fixes #1956

Before submitting

Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2020-07-17T13:31:15Z

Codecov Report

Merging #2632 into master will increase coverage by 0%.
The diff coverage is 49%.

@@          Coverage Diff           @@
##           master   #2632   +/-   ##
======================================
  Coverage      91%     91%           
======================================
  Files          82      82           
  Lines        6770    6784   +14     
======================================
+ Hits         6127    6151   +24     
+ Misses        643     633   -10

Borda · 2020-07-17T22:08:54Z

GPU available: False, used: False
TPU available: True, using: 8 TPU cores
training on 8 TPU cores
Exception in device=TPU:0: Invalid device string: 'xla:EvalModelTemplate(
  (c_d1): Linear(in_features=784, out_features=1000, bias=True)
  (c_d1_bn): BatchNorm1d(1000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (c_d1_drop): Dropout(p=0.2, inplace=False)
  (c_d2): Linear(in_features=1000, out_features=10, bias=True)
)'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 330, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 324, in _start_fn
    fn(gindex, *args)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/distrib_parts.py", line 195, in tpu_train
    self._device = xm.xla_device(tpu_core_idx) if tpu_core_idx is not None else xm.xla_device()
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 239, in xla_device
    torch_xla._XLAC._xla_set_default_device(device)
RuntimeError: Invalid device string: 'xla:EvalModelTemplate(
  (c_d1): Linear(in_features=784, out_features=1000, bias=True)
  (c_d1_bn): BatchNorm1d(1000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (c_d1_drop): Dropout(p=0.2, inplace=False)
  (c_d2): Linear(in_features=1000, out_features=10, bias=True)
)'
Traceback (most recent call last):
  File "/content/pytorch-lightning/tests/base/develop_utils.py", line 102, in inner_f
    func(**kwargs)
  File "/content/pytorch-lightning/tests/models/test_tpu.py", line 240, in test_dataloaders_passed_to_fit
    val_dataloaders=model.val_dataloader(),
  File "/content/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1029, in fit
    start_method=start_method,
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
    start_method=start_method)
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 0 terminated with exit code 17

Borda · 2020-07-21T09:12:28Z

it seems that first argument of tpu_train is ignored so model was wrongly treated as index...
Actual failed is:

Traceback (most recent call last):
  File "/content/pytorch-lightning/tests/base/develop_utils.py", line 102, in inner_f
    func(**kwargs)
  File "/content/pytorch-lightning/tests/models/test_tpu.py", line 39, in test_model_tpu_cores_1
    tpipes.run_model_test(trainer_options, model, on_gpu=False, with_hpc=False)
  File "/content/pytorch-lightning/tests/base/develop_pipelines.py", line 50, in run_model_test
    result = trainer.fit(model)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1029, in fit
    start_method=start_method,
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 387, in spawn
    _start_fn(0, pf_cfg, fn, args)
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 324, in _start_fn
    fn(gindex, *args)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/distrib_parts.py", line 223, in tpu_train
    self.run_pretrain_routine(model)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1193, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1220, in _run_sanity_check
    False)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/evaluation_loop.py", line 294, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/content/pytorch-lightning/pytorch_lightning/trainer/evaluation_loop.py", line 481, in evaluation_forward
    output = model.validation_step(*args)
  File "/content/pytorch-lightning/tests/base/model_valid_steps.py", line 25, in validation_step
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:991 : Check failed: device == xrt_data.device() (TPU:0 vs. CPU:0)
*** Begin stack trace ***
	tensorflow::CurrentStackTrace[abi:cxx11]()
	xla::XrtComputationClient::GetArgumentsInputs(absl::lts_2020_02_25::Span<std::shared_ptr<xla::ComputationClient::Data> const>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
	xla::XrtComputationClient::CreateExecuteOps(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, xla::XrtSessionCache::Ref, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, xla::XrtSessionCache::Ref> > >*, xla::XrtComputationClient::XrtComputation const&, std::vector<std::vector<std::shared_ptr<xla::ComputationClient::Data>, std::allocator<std::shared_ptr<xla::ComputationClient::Data> > >, std::allocator<std::vector<std::shared_ptr<xla::ComputationClient::Data>, std::allocator<std::shared_ptr<xla::ComputationClient::Data> > > > > const&, bool, absl::lts_2020_02_25::Span<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const>, std::unordered_map<tensorflow::Output, tensorflow::Input::Initializer, tensorflow::OutputHash, std::equal_to<tensorflow::Output>, std::allocator<std::pair<tensorflow::Output const, tensorflow::Input::Initializer> > >*)
	xla::XrtComputationClient::ExecuteComputation(xla::ComputationClient::Computation const&, absl::lts_2020_02_25::Span<std::shared_ptr<xla::ComputationClient::Data> const>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, xla::ComputationClient::ExecuteComputationOptions const&)	
	clone
*** End stack trace ***

it is interesting as both are in the same device before...

>>> model.device
xla:0
>>> y.device
xla:1
>>> labels_hat.device
xla:1

similar to pytorch/xla#894

@dlibenzi do you have advice what am I doing wrong?

Borda · 2020-07-21T12:24:29Z

Explanation of why the error raise when .item() is called...

XLA Tensors are Lazy

CPU and CUDA tensors launch operations immediately or eagerly. XLA tensors, on the other hand, are lazy. They record operations in a graph until the results are needed. Deferring execution like this lets XLA optimize it.

even with device = xm.xla_device()

  File "/content/pytorch-lightning/tests/base/model_valid_steps.py", line 43, in validation_step
    torch.sum(y.to(device) == labels_hat.to(device)).item()
RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:991 : Check failed: device == xrt_data.device() (TPU:0 vs. CPU:0)

davidel · 2020-07-21T13:17:17Z

As I explained countless times, you cannot do stuff like this:

https://github.com/PyTorchLightning/pytorch-lightning/pull/2632/files#diff-d673a98c14aa5cd59610bcc447178852R197

You should get rid of that TPU IDs, like you can control which device ordinal you get within a process. You cannot.
In multi-processing mode (xmp.spawn()) you should simply call xm.xla_device() and get the only device it is assigned to the calling process.
You cannot control the TPU ID, so you should just get rid of that API which try to control the TPU IDs.

If you want to have some sort of information about which ordinal within the distributed system the calling process has, you can use the xm.get_ordinal() API.

williamFalcon · 2020-07-21T13:28:20Z

@davidel, I understand... can you explain what @lezwon did in the kaggle kernel then? he seems to have run 8 copies of a model each on its own TPU core. Or is that incorrect @lezwon ?

Borda · 2020-07-21T13:28:26Z

@davidel I am going to change the TPU id in next step, but not I am getting this error in test_model_tpu_cores_1 with just xm.xla_device() without setting id... so is possible that the data are not pushed to the device as it is "lazy"?

Borda · 2020-07-21T13:29:57Z

pytorch_lightning/trainer/distrib_parts.py

        # call setup after the ddp process has connected
        self.setup('fit')
        if self.is_function_implemented('setup', model):
            model.setup('fit')

        # put model on tpu
-        self._device = xm.xla_device(self.tpu_id) if self.tpu_id is not None else xm.xla_device()
+        xm.get_ordinal()
+        # TODO, wrong definition of TPU index


this will be changed in the next step of corrections... cc: @davidel

lezwon · 2020-07-21T13:46:42Z

@davidel, I understand... can you explain what @lezwon did in the kaggle kernel then? he seems to have run 8 copies of a model each on its own TPU core. Or is that incorrect @lezwon ?

@davidel @williamFalcon When TPU ID is provided, the training does not run in multi-processing mode i.e xmp.spawn(). I just fetch the device using xm.xla_device() and train the model on it. I assumed this enabled us to train a separate model on every core as explained here pytorch/xla#2041 (comment). It seemed to work too. Did we miss something?

davidel · 2020-07-21T14:02:51Z

I will be flying today. Will get back to this once on the other side of the pond 😄

zcain117 · 2020-07-22T20:43:45Z

@davidel, I understand... can you explain what @lezwon did in the kaggle kernel then? he seems to have run 8 copies of a model each on its own TPU core. Or is that incorrect @lezwon ?

@davidel @williamFalcon When TPU ID is provided, the training does not run in multi-processing mode i.e xmp.spawn(). I just fetch the device using xm.xla_device() and train the model on it. I assumed this enabled us to train a separate model on every core as explained here pytorch/xla#2041 (comment). It seemed to work too. Did we miss something?

I had some follow-up questions:

Do you mean 1 independent copy of the same model on each of the 8 cores? Or different types of models training simultaneously, each using 1 core?
If you mean the former, what is the use case? I think it would be better to have the 8 cores working together by sharing the gradients/weights between processes using xmp.spawn given that the smallest TPU a user can have is 8-cores, so it's not like a user can save on cost by using only 1 core

zcain117 · 2020-07-22T20:51:53Z

@Borda I see this as the latest error in the Github Actions run for the 8-core test:

AttributeError: Can't pickle local object 'test_model_16bit_tpu_cores_8.<locals>.long_train_loader'

I talked with Davide about this last week and here is an explanation of what is happening:

As you know, when you use xmp.spawn you pass in some kind of mp_fn. That function is the code that will run on each TPU core.
If you have Python objects that are defined outside that function, they will be pickled and sent over the wire to the TPU cores. I think this is what is happening with your dataloader in this case.
If you create the dataloader within the mp_fn, then the TPU core will start up, run the code, and create the dataloader and there will be no need for pickling. This is what we recommend, otherwise you'll need some kind of dataloader that is able to be pickled.

Here are 2 examples of defining the dataloader inside the mp_fn:

Our canonical example. Note that the dataloader is instantiated within train_mnist here, which is part of the mp_fn code here
This Kaggle example. Note that the dataloader is instantiated within run(), which is then wrapped in mp_fn

Borda · 2020-07-22T20:53:30Z

Do you mean 1 independent copy of the same model on each of the 8 cores? Or different types of models training simultaneously, each using 1 core?

yes, the point is to speed-up training

If you mean the former, what is the use case? I think it would be better to have the 8 cores working together by sharing the gradients/weights between processes using xmp.spawn given that the smallest TPU a user can have is 8-cores, so it's not like a user can save on cost by using only 1 core

I agree, the trainer is teaching just one model so the case with selecting core was about multiple users can train their own model/trainer and share one physical device

Borda · 2020-07-22T20:55:37Z

If you create the dataloader within the mp_fn, then the TPU core will start up, run the code, and create the dataloader and there will be no need for pickling. This is what we recommend, otherwise you'll need some kind of dataloader that is able to be pickled.

in such case, it is very similar to DDP when we run a python script on beach GPU separately, right?

zcain117 · 2020-07-22T21:12:57Z

If you create the dataloader within the mp_fn, then the TPU core will start up, run the code, and create the dataloader and there will be no need for pickling. This is what we recommend, otherwise you'll need some kind of dataloader that is able to be pickled.

in such case, it is very similar to DDP when we run a python script on beach GPU separately, right?

I'm not familiar with DDP but I think generally this is similar to the multi-GPU case. Each TPU core is running it's own copy of the same code, then after each backprop, we synchronize the gradients using xm.optimizer_step, e.g. here

See here for xm.optimizer_step code but it's basically just sharing gradients from backprop between TPU cores.

This means that each TPU core should end up with the same weights and it means that the TPU cores are working together to make training progress rather than having independent copies of your model training on each core.

Note that we are sharing gradients but each process instantiates its own weights. One pitfall is if each process ends up with different initial weights. It's important to set the same seed between processes so that initial weights are the same, like we do here in our example. Note that we set the seed immediately. This is important since any code that runs before setting the seed can change the seed for that process and lead to different weights per process, which can lead to e.g. model failing to converge.

williamFalcon · 2020-07-22T21:37:26Z

yes, this is exactly how ddp works and what we are set up to do. This is how we setup the training.
In fact, even in our tests the seed is being set. (we don’t set seeds for users, so this is set by us for test purpose only)

lezwon · 2020-07-23T00:47:53Z

Do you mean 1 independent copy of the same model on each of the 8 cores? Or different types of models training simultaneously, each using 1 core?

Yes, An independent instance of a model on each core with a different fold.

If you mean the former, what is the use case? I think it would be better to have the 8 cores working together by sharing the gradients/weights between processes using xmp.spawn given that the smallest TPU a user can have is 8-cores, so it's not like a user can save on cost by using only 1 core

The use case is K-fold training. The user can train K-models simultaneously with a different fold on every core.

zcain117 · 2020-07-23T01:07:51Z

Do you mean 1 independent copy of the same model on each of the 8 cores? Or different types of models training simultaneously, each using 1 core?

Yes, An independent instance of a model on each core with a different fold.

If you mean the former, what is the use case? I think it would be better to have the 8 cores working together by sharing the gradients/weights between processes using xmp.spawn given that the smallest TPU a user can have is 8-cores, so it's not like a user can save on cost by using only 1 core

The use case is K-fold training. The user can train K-models simultaneously with a different fold on every core.

I see. I think we can make this work. I'm assuming you would also want to write out 8 different model weights.

For your normal 8-core case, you'll want to use xm.save as shown in the example Kaggle script

If you want every core to write its weights, you'll want to call xm.save(..., master_only=False, ...) (see method here)

lezwon · 2020-07-23T02:39:16Z

I'm not sure if xm.save is being used in Lightning right now. In my kernel here: https://www.kaggle.com/lezwon/parallel-kfold-training-on-tpu-using-pytorch-li, I created a separate trainer instance for each core with a checkpoint callback. Seems to have worked.

davidel · 2020-07-23T19:35:51Z

@davidel, I understand... can you explain what @lezwon did in the kaggle kernel then? he seems to have run 8 copies of a model each on its own TPU core. Or is that incorrect @lezwon ?

@davidel @williamFalcon When TPU ID is provided, the training does not run in multi-processing mode i.e xmp.spawn(). I just fetch the device using xm.xla_device() and train the model on it. I assumed this enabled us to train a separate model on every core as explained here pytorch/xla#2041 (comment). It seemed to work too. Did we miss something?

If it does not run as multi-core it might be OK ... but honestly specifying the core ID makes no sense from an API standpoint.

Borda · 2020-07-25T22:20:02Z

checkpoint loading, maybe similar to #2700

Traceback (most recent call last):
  File "/content/pytorch-lightning/tests/base/develop_utils.py", line 102, in inner_f
    func(**kwargs)
  File "/content/pytorch-lightning/tests/models/test_tpu.py", line 100, in test_model_tpu_cores_8
    tpipes.run_model_test(trainer_options, model, on_gpu=False, with_hpc=False)
  File "/content/pytorch-lightning/tests/base/develop_pipelines.py", line 56, in run_model_test
    pretrained_model = load_model_from_checkpoint(logger, trainer.checkpoint_callback.best_model_path)
  File "/content/pytorch-lightning/tests/base/develop_utils.py", line 64, in load_model_from_checkpoint
    trained_model = module_class.load_from_checkpoint(root_weights_dir)
  File "/content/pytorch-lightning/pytorch_lightning/core/saving.py", line 142, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/content/pytorch-lightning/pytorch_lightning/utilities/cloud_io.py", line 10, in load
    return torch.load(path_or_url, map_location=map_location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: ''

EDIT: it seems that no checkpoint is saved during training
EDIT2: it seems that the checkpoint from spawn is not updated back to the main process so global trainer does not know about children checkpoints...

cc: @lezwon @zcain117

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Borda · 2020-07-27T22:46:54Z

@williamFalcon lets fix the master... :]
cc: @PyTorchLightning/core-contributors

Borda added bug Something isn't working ci Continuous Integration labels Jul 17, 2020

Borda added this to the 0.8.x milestone Jul 17, 2020

mergify bot requested a review from a team July 17, 2020 13:22

Borda mentioned this pull request Jul 17, 2020

terminate called after throwing an instance of 'std::runtime_error pytorch/xla#2338

Closed

Borda commented Jul 21, 2020

View reviewed changes

williamFalcon added the allowed_pre_1.0 label Jul 22, 2020

Borda force-pushed the tpu/fix-tests branch from d0bed90 to b0860b0 Compare July 23, 2020 10:11

Borda force-pushed the tpu/fix-tests branch from 5aebae9 to 838436a Compare July 25, 2020 22:01

Borda and others added 22 commits July 27, 2020 23:59

tests

22dd8c0

tests

e4f2088

tests

d89de9b

tests

1614724

tests

def846b

tests

f6060ec

tests

f0a6174

tests

bdaaaef

tests

458471d

tests

6e6588a

tests

6dc0e85

tests

6196724

tests

183fcc2

tests

e7a5295

tests

1999198

tests

9579bce

docs

fde01de

docs

6abee20

Apply suggestions from code review

bf6ac74

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Apply suggestions from code review

6f30bc5

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

docs

45233a5

Apply suggestions from code review

6c9495e

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Borda force-pushed the tpu/fix-tests branch from 498cf3a to 6c9495e Compare July 27, 2020 22:00

Borda added the ready PRs ready to be merged label Jul 27, 2020

williamFalcon merged commit 0fe933e into master Jul 27, 2020

Borda deleted the tpu/fix-tests branch July 28, 2020 05:03

Borda mentioned this pull request Jul 29, 2020

Very slow training on colab with TPU #2148

Closed

Borda mentioned this pull request Sep 21, 2020

wrap tpu tests with process decorator #2582

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixing TPU tests #2632

fixing TPU tests #2632

Borda commented Jul 17, 2020 •

edited

Loading

codecov bot commented Jul 17, 2020 •

edited

Loading

Borda commented Jul 17, 2020 •

edited

Loading

Borda commented Jul 21, 2020 •

edited

Loading

Borda commented Jul 21, 2020 •

edited

Loading

davidel commented Jul 21, 2020

williamFalcon commented Jul 21, 2020

Borda commented Jul 21, 2020

Borda Jul 21, 2020

lezwon commented Jul 21, 2020

davidel commented Jul 21, 2020

zcain117 commented Jul 22, 2020

zcain117 commented Jul 22, 2020

Borda commented Jul 22, 2020

Borda commented Jul 22, 2020

zcain117 commented Jul 22, 2020

williamFalcon commented Jul 22, 2020

lezwon commented Jul 23, 2020 •

edited by Borda

Loading

zcain117 commented Jul 23, 2020 •

edited by Borda

Loading

lezwon commented Jul 23, 2020

davidel commented Jul 23, 2020

Borda commented Jul 25, 2020 •

edited

Loading

Borda commented Jul 27, 2020

fixing TPU tests #2632

fixing TPU tests #2632

Conversation

Borda commented Jul 17, 2020 • edited Loading

What does this PR do?

Before submitting

PR review

Did you have fun?

codecov bot commented Jul 17, 2020 • edited Loading

Codecov Report

Borda commented Jul 17, 2020 • edited Loading

Borda commented Jul 21, 2020 • edited Loading

Borda commented Jul 21, 2020 • edited Loading

XLA Tensors are Lazy

davidel commented Jul 21, 2020

williamFalcon commented Jul 21, 2020

Borda commented Jul 21, 2020

Borda Jul 21, 2020

Choose a reason for hiding this comment

lezwon commented Jul 21, 2020

davidel commented Jul 21, 2020

zcain117 commented Jul 22, 2020

zcain117 commented Jul 22, 2020

Borda commented Jul 22, 2020

Borda commented Jul 22, 2020

zcain117 commented Jul 22, 2020

williamFalcon commented Jul 22, 2020

lezwon commented Jul 23, 2020 • edited by Borda Loading

zcain117 commented Jul 23, 2020 • edited by Borda Loading

lezwon commented Jul 23, 2020

davidel commented Jul 23, 2020

Borda commented Jul 25, 2020 • edited Loading

Borda commented Jul 27, 2020

Borda commented Jul 17, 2020 •

edited

Loading

codecov bot commented Jul 17, 2020 •

edited

Loading

Borda commented Jul 17, 2020 •

edited

Loading

Borda commented Jul 21, 2020 •

edited

Loading

Borda commented Jul 21, 2020 •

edited

Loading

lezwon commented Jul 23, 2020 •

edited by Borda

Loading

zcain117 commented Jul 23, 2020 •

edited by Borda

Loading

Borda commented Jul 25, 2020 •

edited

Loading