
xm.mesh_reduce results in RuntimeError concerning message size #1924

@ronakice

Description

🐛 Bug

torch_xla.core.xla_model.mesh_reduce(...) results in a RuntimeError:

tensorflow/compiler/xla/xla_client/mesh_service.cc:243 : Failed to meet rendezvous 'eval_lguids': Received message larger than max (5602816 vs. 4194304) (8)

For context, I'm adapting @jysohn23's run_glue_tpu.py to MS-MARCO's passage ranking dataset (much larger than the individual GLUE datasets). I believe the equivalent lines of code in run_glue_tpu.py are 271-272:

preds = xm.mesh_reduce("eval_preds", preds, np.concatenate)
out_label_ids = xm.mesh_reduce("eval_out_label_ids", out_label_ids, np.concatenate)

This probably has to do with gRPC's default maximum send/receive message size of 4 MB (4194304 bytes). Adding grpc.max_send_message_length=1000000000,grpc.max_receive_message_length=1000000000 to os.environ['TF_GRPC_DEFAULT_OPTIONS'] in _setup_grpc() in torch_xla/__init__.py might help.
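Concretely, the idea is for TF_GRPC_DEFAULT_OPTIONS to end up containing the larger limits in addition to whatever _setup_grpc() already sets. A minimal sketch of that (untested; the 1 GB values are my guess at a "large enough" limit, and I'm not reproducing the actual _setup_grpc() body here):

import os

# Sketch only: append larger gRPC message-size limits to whatever options
# are already configured. The 1 GB values are assumed, not verified.
extra = ("grpc.max_send_message_length=1000000000,"
         "grpc.max_receive_message_length=1000000000")
current = os.environ.get("TF_GRPC_DEFAULT_OPTIONS", "")
os.environ["TF_GRPC_DEFAULT_OPTIONS"] = current + "," + extra if current else extra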

Building from source doesn't work on Colab, so I wasn't able to test whether that change helps. I'm also not sure gRPC will even accept limits as large as 1 GB (note that I used a subset of the dataset, which is what produces the 5602816 bytes in the error, so a limit of at least that size would be needed, and the full dataset would need more), or whether there is another way to circumvent this issue; one idea is sketched below. Either way, a 4 MB limit seems a bit too small for larger datasets. Thanks!
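One way I could imagine circumventing it without touching the gRPC limits (completely untested; the helper and the chunk size below are hypothetical) is to run mesh_reduce over row-wise slices small enough that each rendezvous message stays under the 4 MB cap:

import numpy as np
import torch_xla.core.xla_model as xm

def chunked_mesh_reduce(tag, array, chunk_rows=10000):
    # Hypothetical helper: reduce the per-core array in row-wise slices, one
    # mesh_reduce call (with a distinct tag) per slice, so each message stays
    # well below the 4194304-byte default. Assumes every core produces the
    # same number of slices; otherwise the rendezvous would block.
    pieces = []
    for i, start in enumerate(range(0, array.shape[0], chunk_rows)):
        chunk = array[start:start + chunk_rows]
        pieces.append(xm.mesh_reduce("{}_{}".format(tag, i), chunk, np.concatenate))
    return np.concatenate(pieces)

preds = chunked_mesh_reduce("eval_preds", preds)
out_label_ids = chunked_mesh_reduce("eval_out_label_ids", out_label_ids)

The resulting row order differs from a single mesh_reduce (slices are interleaved across cores rather than whole per-core arrays), but since preds and labels are sliced identically, the pairing is preserved for metric computation.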

Environment

  • Reproducible on XLA backend [CPU/TPU]: TPU
  • torch_xla version: latest nightly
  • Running on Google Colab
