This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

ValueError when predicting with pretrained models #150

Closed
iocuydi opened this issue Mar 22, 2021 · 13 comments
Labels
bug Something isn't working.

Comments

@iocuydi

iocuydi commented Mar 22, 2021

Describe the bug
When using GPT3XL to perform inference with the --predict flag as shown in the examples, the following error is thrown:

ValueError: Argument not a list with same length as devices arg=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255] devices=['device:GPU:0']

This is with a single GTX 1070 GPU.

Both of the following commands produced this error:
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt --gpu_ids=['device:GPU:0']

@iocuydi iocuydi added the bug Something isn't working. label Mar 22, 2021
@StellaAthena
Member

StellaAthena commented Mar 22, 2021

This code was designed for TPUs, and although it should work on GPUs, that's not something we officially support. We recommend using the GPT-NeoX repo instead for GPU training.

That said, it seems like it's having trouble identifying your GPU. Try running nvidia-smi and check what your GPU's device ID number is.

Also, does your machine have 255 GPUs? Otherwise I have no idea where it's getting that number from...
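(For anyone following along: checking from inside TensorFlow rather than via nvidia-smi also works. A quick sketch, assuming a TF 2.x install:)

# List the GPUs visible to TensorFlow; a single card should show up as GPU:0.
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))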

@iocuydi
Author

iocuydi commented Mar 22, 2021

No, my machine has only 1 GPU lol. I haven't used Mesh TensorFlow before, but I found this issue:
google-research/text-to-text-transfer-transformer#334
in which it seemed to be an issue with the mesh shape. I notice that the mesh shape is Shape[x=128, y=2] when running the above commands, so perhaps it has to do with this?

The device appears to be registered as device 0, other TensorFlow models pick it up as 0, and when first loading TensorFlow I see the typical "Adding visible gpu devices: 0 ... Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6709 MB memory) -> physical GPU (device: 0 ..." messages.

@iocuydi
Author

iocuydi commented Mar 22, 2021

I got around this error by setting params['mesh_shape'] = []. Not sure if this broke something else, because now I'm getting the error:
'tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Rsqrt' used by {{node gpt2/h0/norm_1/rsqrt/parallel_0/Rsqrt}} with these attrs: [T=DT_BFLOAT16]'

although it appeared to build the model properly before displaying this

@iocuydi
Author

iocuydi commented Mar 22, 2021

The issue was XLA devices not being enabled. Setting the mesh shape to 1x1 and adding

os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

should make this work for GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer appeared.)
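
A minimal sketch of this workaround, assuming the gpt3xl/config.json path from the commands above (the flag must be set before TensorFlow is imported, e.g. at the very top of main.py):

# Enable XLA devices so bfloat16 ops like Rsqrt get a registered GPU kernel.
# Must run before any "import tensorflow" line.
import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

# Shrink the TPU-sized mesh (x:128,y:2) to a single device in the config.
import json
with open('gpt3xl/config.json') as f:
    params = json.load(f)
params['mesh_shape'] = 'x:1,y:1'
with open('gpt3xl/config.json', 'w') as f:
    json.dump(params, f, indent=2)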

@StellaAthena
Member

StellaAthena commented Mar 22, 2021

> The issue was XLA devices not being enabled. Setting the mesh shape to 1x1 and adding
>
> os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
>
> should make this work for GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer appeared.)

Great to know! Thanks for chasing this down for us.

I’m going to leave this open as a reminder to make sure this is in the next update.

@soneo1127

Thanks, that worked for me on 1 GPU.

How do I set up a mesh for a multi-GPU system?
(I want to predict on 2 GPUs).

@GenTxt

GenTxt commented Mar 24, 2021

Hello. I have the same error and would appreciate knowing which files to edit to add the above solutions:

  1. Setting mesh shape to 1x1

model_fns.py

mesh_shape = mtf.convert_to_shape(params["mesh_shape"])

mesh_shape = mtf.convert_to_shape(params["1x1"]) = error

config.json (GPT3_XL_Pile)

"mesh_shape" : "x:128,y:2", (here? "1x1")

Another .py file?

  2. os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'

add to 'main.py' and/or 'model_fns.py' (???)

import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
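
A sketch of the placement that matches the rest of this thread; the mesh_shape string format follows the existing config entries, and whether model_fns.py also needs edits is not confirmed here:

# In config.json (GPT3_XL_Pile), change the existing entry
# (model_fns.py reads it via params["mesh_shape"], so it can stay as-is):
#   "mesh_shape" : "x:1,y:1",

# At the very top of main.py, before any "import tensorflow" line:
import os
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'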

As a sidebar, I managed to convert both models using your 'convert_gpt.py' repo script. Keep in mind to change the Hugging Face config.json files to match the same "n_head" values, or it will generate gibberish.

Able to sample from transformer-based repos with:

"n_head" : 16, (GPT3_XL)

"n_head" : 20, (GPT3_2.7B)

Cheers

@StellaAthena
Member

StellaAthena commented Mar 24, 2021

@GenTxt I haven't actually run the model on GPU, so I'll leave that question for @iocuydi or @soneo1127 to answer. I am intrigued by your sidebar though.

Did you test your converted model on long inputs? We are under the impression that that script doesn't work as-is, because our model uses local and global attention. Specifically, we think that for short contexts (less than 256 char, if I recall correctly) it works fine, but for full contexts it does not. HF is working on what we think is the problem on their end, but it would be a big win if that turned out to be extraneous.

@StellaAthena
Member

@soneo1127 I'm going to recommend you check out the Mesh TF documentation for further info.

@GenTxt

GenTxt commented Mar 25, 2021

Have tested both models with long inputs; max good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together, similar to low-epoch character-based LSTM training.

The good output from both 2.7B and XL is on par with, and often better than, 1558M GPT-2.

Doesn't go beyond the default 1024 even after editing transformer files. Will wait for proper HF conversion of the models, which will hopefully solve all those issues.

In the meantime, I'd appreciate the requested info from others in this thread.

Cheers,

@StellaAthena
Member

StellaAthena commented Mar 25, 2021

> Have tested both models with long inputs; max good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together, similar to low-epoch character-based LSTM training.
>
> The good output from both 2.7B and XL is on par with, and often better than, 1558M GPT-2.
>
> Doesn't go beyond the default 1024 even after editing transformer files. Will wait for proper HF conversion of the models, which will hopefully solve all those issues.

I just double-checked, and it's actually ~512 where performance should jump off a cliff. For prompts of length 400-512, I would expect the initial tokens to be good but the output to devolve into gibberish as the model goes on. Is that what you see?

It’s good to see that the model is often better than 1.5B GPT-2: that’s what our preliminary testing has shown too. The next update to the README will include the following table:

| Model | Pile BPB | Pile PPL | Lambada Acc. | Lambada PPL | Wikitext PPL |
|---|---|---|---|---|---|
| GPT-Neo XL (1.3B) | 0.7527 | 6.159 | 64.73% | 5.04 | 13.10 |
| GPT-3 XL (1.3B) | ----- | ----- | 63.6% | 5.44 | ----- |
| GPT-2 (1.5B) | 1.0468 | ----- | 63.24% | 8.63 | 17.48 |
| GPT-Neo Alan (2.7B) | 0.7165 | 5.646 | 68.83% | 4.137 | 11.39 |
| GPT-3 Ada (2.7B) | 0.9631 | ----- | 67.1% | 4.60 | ----- |
| GPT-3 DaVinci (175B) | 0.7177 | ----- | 76.2% | 3.00 | ----- |

@StellaAthena
Member

StellaAthena commented Mar 26, 2021

@GenTxt FYI, I have created an issue to serve as the canonical reference for the conversion script problem: #174. Please direct any future queries about the conversion script there.

@jaehyunshinML

> Thanks, that worked for me on 1 GPU.
>
> How do I set up a mesh for a multi-GPU system?
> (I want to predict on 2 GPUs).

Hi,

I think the easiest way to use multiple GPUs is to change the mesh_shape: set x to 1 and set y to the number of GPUs in your config file. For example, if you have 4 GPUs:

"mesh_shape" : "x:1,y:4",
