codegen-16B-mono (Salesforce) fails to load tokenizer and model #17954

@weidotwisc

Description

System Info

  • transformers version: 4.20.1
  • Platform: Linux-4.18.0-193.19.1.el8_2.x86_64-x86_64-with-glibc2.28
  • Python version: 3.9.10
  • Huggingface_hub version: 0.8.1
  • PyTorch version (GPU?): 1.7.1+cu110 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: <N/A>
  • Using distributed or parallel set-up in script?: <N/A>

Who can help?

Per https://huggingface.co/Salesforce/codegen-16B-mono?text=What+is+projection+matrix, I should be able to load the CodeGen tokenizer and model with:
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")
@SaulLu When I run tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono"), I get this error:
"...huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 576, in from_pretrained
raise ValueError(
ValueError: Tokenizer class CodeGenTokenizer does not exist or is not currently imported.

@LysandreJik When I run model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono"), I get this error:
File "...huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 725, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/Volume0/userhomes/weiz/venvs/huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 432, in __getitem__
    raise KeyError(key)
KeyError: 'codegen'

Thanks!
Wei
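The KeyError suggests the installed transformers build (4.20.1, per the system info above) has no "codegen" entry in its model-type registry, so a version check before loading would catch this early. A minimal sketch, assuming CodeGen support first ships in transformers 4.21.0 (the exact release is an assumption; the report only confirms 4.20.1 lacks it):

```python
# Guard sketch: refuse to attempt loading CodeGen on a transformers build
# that predates its registration. MIN_CODEGEN_VERSION is an assumption.
MIN_CODEGEN_VERSION = "4.21.0"

def version_tuple(version):
    # Compare the first three numeric components, e.g. "4.20.1" -> (4, 20, 1).
    return tuple(int(part) for part in version.split(".")[:3])

def supports_codegen(installed_version):
    return version_tuple(installed_version) >= version_tuple(MIN_CODEGEN_VERSION)

print(supports_codegen("4.20.1"))  # the reporter's version -> False
print(supports_codegen("4.21.0"))  # -> True
```

In practice this check would gate the AutoTokenizer/AutoModelForCausalLM calls, printing an "upgrade transformers" hint instead of surfacing the raw KeyError.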

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Follow the model card at https://huggingface.co/Salesforce/codegen-16B-mono?text=What+is+projection+matrix
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-16B-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-16B-mono")

I then get the following errors:

File "...huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 576, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class CodeGenTokenizer does not exist or is not currently imported.

File "...huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 725, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/Volume0/userhomes/weiz/venvs/huggingface_py3.9/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 432, in __getitem__
    raise KeyError(key)
KeyError: 'codegen'

Expected behavior

The tokenizer and model should load successfully.
