
Conversation

@lewtun (Member) commented Feb 11, 2022

What does this PR do?

This PR addresses an edge case introduced by #13831 where the ONNX export fails if:

  • Both torch and tensorflow are installed in the same environment
  • The user tries to export a pure TensorFlow model (i.e. a model repo without PyTorch weights)

Here is an example that fails to export on the master branch:

python -m transformers.onnx --model=keras-io/transformers-qa onnx/
Traceback
404 Client Error: Entry Not Found for url: https://huggingface.co/keras-io/transformers-qa/resolve/main/pytorch_model.bin
Traceback (most recent call last):
  File "/Users/lewtun/git/transformers/src/transformers/modeling_utils.py", line 1358, in from_pretrained
    resolved_archive_file = cached_path(
  File "/Users/lewtun/git/transformers/src/transformers/file_utils.py", line 1904, in cached_path
    output_path = get_from_cache(
  File "/Users/lewtun/git/transformers/src/transformers/file_utils.py", line 2108, in get_from_cache
    _raise_for_status(r)
  File "/Users/lewtun/git/transformers/src/transformers/file_utils.py", line 2031, in _raise_for_status
    raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.file_utils.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/keras-io/transformers-qa/resolve/main/pytorch_model.bin

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/lewtun/git/transformers/src/transformers/onnx/__main__.py", line 77, in <module>
    main()
  File "/Users/lewtun/git/transformers/src/transformers/onnx/__main__.py", line 51, in main
    model = FeaturesManager.get_model_from_feature(args.feature, args.model)
  File "/Users/lewtun/git/transformers/src/transformers/onnx/features.py", line 307, in get_model_from_feature
    return model_class.from_pretrained(model)
  File "/Users/lewtun/git/transformers/src/transformers/models/auto/auto_factory.py", line 447, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/Users/lewtun/git/transformers/src/transformers/modeling_utils.py", line 1394, in from_pretrained
    raise EnvironmentError(
OSError: keras-io/transformers-qa does not appear to have a file named pytorch_model.bin but there is a file for TensorFlow weights. Use `from_tf=True` to load this model from those weights.

This fails because the FeaturesManager.get_model_class_for_feature() method uses the _TASKS_TO_AUTOMODELS mapping to determine which autoclass (e.g. AutoModel vs TFAutoModel) to return for a given task. This mapping relies on the following branching logic:

if is_torch_available():
    _TASKS_TO_AUTOMODELS = {
        "default": AutoModel, ...
    }
elif is_tf_available():
    _TASKS_TO_AUTOMODELS = {
        "default": TFAutoModel, ...
    }
else:
    _TASKS_TO_AUTOMODELS = {}

As a result, if a user has both torch and tensorflow installed, we return an AutoModel class instead of the desired TFAutoModel class. In particular, Colab users cannot export pure TensorFlow models because torch is installed there by default.

Proposal

To address this issue, I've introduced a new --framework argument in the ONNX CLI and extended _TASKS_TO_AUTOMODELS to be a nested dict when both frameworks are installed. With this change, one can now export pure TensorFlow models with:

python -m transformers.onnx --model=keras-io/transformers-qa --framework=tf onnx/

Similarly, pure PyTorch models can be exported as follows:

python -m transformers.onnx --model=lewtun/bert-finetuned-squad --framework=pt onnx/

And checkpoints with both sets of weights also work:

python -m transformers.onnx --model=distilbert-base-uncased onnx/

Although the implementation works, I'm not entirely happy with it because _TASKS_TO_AUTOMODELS changes shape (flat vs. nested) depending on the installation environment, which feels hacky.
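
For concreteness, here is a minimal sketch of what the nested mapping could look like when both frameworks are installed (the "pt"/"tf" keys and the exact shape are illustrative assumptions on my part, not the actual diff):

# Illustrative sketch only: the framework keys and the shape are assumptions.
if is_torch_available() and is_tf_available():
    _TASKS_TO_AUTOMODELS = {
        # one entry per supported task, keyed by framework
        "default": {"pt": AutoModel, "tf": TFAutoModel},
    }

The --framework value would then select the inner entry, e.g. _TASKS_TO_AUTOMODELS[task][framework].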

Alternative solution 1

Thanks to a tip from @stas00, one solution is to change nothing and have the user specify which framework they're using via environment variables, e.g.

USE_TORCH=0 USE_JAX=0 USE_TF=1 python -m transformers.onnx --model=keras-io/transformers-qa onnx/

If we adopt this approach, we could provide a warning when both torch and tensorflow are installed and suggest an example like the one above.
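
A rough sketch of what such a warning could look like (the wording and where it would live are assumptions, not something in this PR):

# Sketch only: emit a warning when both frameworks are present.
from transformers.file_utils import is_tf_available, is_torch_available
from transformers.utils import logging

logger = logging.get_logger(__name__)

if is_torch_available() and is_tf_available():
    logger.warning(
        "Both PyTorch and TensorFlow are installed, so the export defaults to the PyTorch weights. "
        "To force a TensorFlow export, run e.g. "
        "USE_TORCH=0 USE_JAX=0 USE_TF=1 python -m transformers.onnx --model=keras-io/transformers-qa onnx/"
    )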

Alternative solution 2

It occurred to me that we can solve this with a simple try/except in FeaturesManager.get_model_from_feature() as follows:

def get_model_from_feature(feature: str, model: str) -> Union[PreTrainedModel, TFPreTrainedModel]:
    # By default we return `AutoModel` if `torch` and `tensorflow` are installed
    model_class = FeaturesManager.get_model_class_for_feature(feature)
    try:
        model = model_class.from_pretrained(model)
    except OSError:
        # Load the TensorFlow weights in `AutoModel`
        model = model_class.from_pretrained(model, from_tf=True)
    return model

The user will still see a 404 error in the logs

python -m transformers.onnx --model=keras-io/transformers-qa onnx/
# 404 Client Error: Entry Not Found for url: https://huggingface.co/keras-io/transformers-qa/resolve/main/pytorch_model.bin

but the conversion to ONNX will work once the TensorFlow weights are loaded in the AutoModel instance. Note: this solution seems to be similar to the one adopted in the pipeline() function, e.g.

from transformers import pipeline

# Load a pure TensorFlow model => see 404 Client Error in logs, but pipeline loads fine
p = pipeline("question-answering", model="keras-io/transformers-qa")

The advantage of this approach is that the user doesn't have to manually specify a --framework arg, i.e. it "just works". The only drawback I see is that there might be differences between the torch.onnx and tf2onnx packages used for the ONNX export, and by using torch.onnx as the default we may mislead users on where to debug their exports. However, this is probably a rare case and could be revisited if users report problems.

Feedback on which approach is preferred is much appreciated!

@HuggingFaceDocBuilder

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@lewtun lewtun requested a review from LysandreJik February 11, 2022 08:51
logger = logging.get_logger(__name__) # pylint: disable=invalid-name

-if is_torch_available():
+if is_torch_available() and not is_tf_available():
Member Author

This extra condition is used to check if we're in a pure torch environment


class FeaturesManager:
-if is_torch_available():
+if is_torch_available() and not is_tf_available():
Member Author

There's a bit of duplicate logic in this module - perhaps the autoclass imports above should be moved directly within FeaturesManager?

@sgugger (Collaborator) left a comment

I personally think solution 2 would be better for the user, as it "just works". We can investigate the provenance of the error log and try to remove it if it's an issue, but it would be better than adding a new arg :-)

AutoModelForTokenClassification,
)
-elif is_tf_available():
+elif is_tf_available() and not is_torch_available():
Collaborator

I think the whole logic of having three tests can be simplified if you just change that elif to a simple if.


class FeaturesManager:
-if is_torch_available():
+if is_torch_available() and not is_tf_available():
Collaborator

Same here: instead of having three tests, why not always have _TASKS_TO_AUTOMODELS be a nested dict keyed by framework, and then fill in each framework when it is available?
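
For example, something along these lines (the "pt"/"tf" keys are just placeholders, not a concrete proposal):

# Sketch only: always nest by framework and fill in whatever is installed.
_TASKS_TO_AUTOMODELS = {"pt": {}, "tf": {}}

if is_torch_available():
    _TASKS_TO_AUTOMODELS["pt"]["default"] = AutoModel  # plus the other tasks
if is_tf_available():
    _TASKS_TO_AUTOMODELS["tf"]["default"] = TFAutoModel  # plus the other tasks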

Member Author

That's a nice idea - thanks! In the end we may not need this if we adopt solution 2 :)

@lewtun (Member Author) commented Feb 11, 2022

> I personally think solution 2 would be better for the user, as it "just works". We can investigate the provenance of the error log and try to remove it if it's an issue, but it would be better than adding a new arg :-)

Thanks for the feedback @sgugger ❤️! Having thought about it a bit more, I agree that solution 2 is the simplest and least error-prone: I've opened a PR for it here: #15625
