ValueError when predicting with pretrained models #150
This code was designed for TPUs, and although it should work on GPUs, that's not something we officially support. We recommend using the GPT-NeoX repo instead for GPU training. That said, it seems like it's having trouble identifying your GPU. Try the command
Also, does your machine have 255 GPUs? Otherwise I have no idea where it's getting that number from...
No, my machine has only 1 GPU lol. I haven't used Mesh TensorFlow before, but I found this issue: the device appears to be registered as device 0, other TensorFlow models pick it up as 0, and when first loading TensorFlow I see the typical "Adding visible gpu devices: 0 ... Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6709 MB memory) -> physical GPU (device: 0 ..." messages
I got around this error by setting params['mesh_shape'] = []. Not sure if this broke something else, because now I'm getting the error:
(although it appeared to build the model properly before displaying this)
The issue was XLA devices not being enabled. Setting the mesh shape to 1x1 and adding
should make this work for GPUs. (I still couldn't run it because my GPU is too small, but the above errors no longer appeared.)
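The exact snippet referenced above was not preserved. As an assumption (not the commenter's verified code): TensorFlow 2.4 stopped creating XLA:CPU/XLA:GPU devices by default, and a common way to re-enable them is to set the TF_XLA_FLAGS environment variable before TensorFlow is imported:

```python
# Assumed fix: re-enable XLA devices, which TF 2.4+ no longer creates
# by default. This must run before `import tensorflow`.
import os

os.environ["TF_XLA_FLAGS"] = "--tf_xla_enable_xla_devices"

# import tensorflow as tf  # import TF only after the flag is set
```

If placed at the top of main.py, this takes effect for everything the script subsequently imports.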
Great to know! Thanks for chasing this down for us. I’m going to leave this open as a reminder to make sure this is in the next update. |
Thanks, that worked for me on 1 GPU. How do I set up a mesh for a multi-GPU system?
Hello. I have the same error and would appreciate knowing which files to edit to apply the above solutions:

- model_fns.py: changing `mesh_shape = mtf.convert_to_shape(params["mesh_shape"])` to `mesh_shape = mtf.convert_to_shape(params["1x1"])` produces an error
- config.json (GPT3_XL_Pile): `"mesh_shape" : "x:128,y:2",` (is this where "1x1" goes?)
- another .py file?
- `import os`: add to 'main.py' and/or 'model_fns.py' (???)

As a sidebar, I managed to convert both models using your 'convert_gpt.py' repo script. Keep in mind to change the huggingface config.json files to match the same "n_head" values, or it will generate gibberish. I was able to sample from transformer-based repos with `"n_head" : 16,` (GPT3_XL) and `"n_head" : 20,` (GPT3_2.7B). Cheers
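For what it's worth, a plausible single-GPU edit (a sketch, assuming the `"mesh_shape"` key in config.json is what model_fns.py reads into `params["mesh_shape"]`; all other keys unchanged) would shrink the mesh to a trivial single-device one:

```json
{
  "mesh_shape": "x:1,y:1"
}
```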
@GenTxt I haven't actually run the model on GPU, so I'll leave that question for @iocuydi or @soneo1127 to answer. I am intrigued by your sidebar, though. Did you test your converted model on long inputs? We are under the impression that that file doesn't work as-is, because our model uses both local and global attention. Specifically, we think that for short contexts (less than 256 characters, if I recall correctly) it works fine, but for full contexts it does not. HF is working on (what we think is) the problem on their end, but it'd be a big win if that turned out to be unnecessary.
@soneo1127 I'm going to recommend you check out the Mesh TF documentation for further info. |
Have tested both models with long inputs, and the max good output is around 400-500+ before the text turns to gibberish. For some reason it starts jamming fragments of words and letters together, similar in appearance to low-epoch character-based LSTM training. The good output from both 2.7B and XL is on par with, and often better than, 1558M GPT-2. It doesn't go beyond the default 1024 even after editing the transformers files. Will wait for a proper HF conversion of the models, which will hopefully solve all of those issues. In the meantime, I would appreciate the requested info from others in this thread. Cheers,
I just double-checked, and it's actually ~512 where performance should fall off a cliff. For prompts of length 400-512, I would expect the initial tokens to be good, but as the model goes on it devolves into gibberish. Is that what you see? It's good to hear that the model is often better than 1.5B GPT-2: that's what our preliminary testing has shown too. The next update to the README will include the following table:
Hi, I think the easiest way to use multiple GPUs is to change mesh_shape.
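As an illustrative sketch only (hypothetical values; the axis names, and which model dimensions get split across them, depend on the layout rules in the rest of the config), a 2-GPU machine might split the mesh along one axis in config.json:

```json
{
  "mesh_shape": "x:1,y:2"
}
```

The product of the mesh dimensions should match the number of devices, and the config's layout must map at least one model dimension onto the split axis for the extra GPU to be used.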
Describe the bug
When using GPT3XL to perform inference with the --predict flag as shown in the examples, the following error is thrown:
This is with a single GTX 1070 GPU.
The commands that both produced this error were:
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt
python main.py --model=gpt3xl/config.json --predict --prompt=prompt.txt --gpu_ids=['device:GPU:0']