Conversation

Collaborator

@sgugger sgugger commented Dec 13, 2022

What does this PR do?

This PR reworks the automatic code sample docstrings in two ways:

First, use the auto-classes for the preprocessing. As was decided internally, we want to document the model class used, but use the auto classes for preprocessing so users are not confused when a given model uses the tokenizer/feature extractor/image processor/processor of another.

Second, we don't want to showcase hf-internal-testing models in the docstrings. Those are tiny random models and it confuses users more than it helps. However when using the standard checkpoint we get doctest problems, so this PR removes the output/loss from the code example when it shouldn't be tested.

Two examples are shown with BERT and DeBERTaV2; I can add more models to the PR if it suits everyone.
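As a rough, hypothetical sketch of the mechanism involved (the template text and names below are illustrative, not the PR's actual templates): the code samples are placeholder-based strings, so switching preprocessing to the auto classes amounts to filling `{processor_class}` with an Auto class name while `{model_class}` stays model-specific.

```python
# Illustrative template mirroring the pattern this PR adopts: an Auto class
# for preprocessing, the concrete model class for the model itself.
template = (
    '>>> from transformers import {processor_class}, {model_class}\n'
    '>>> tokenizer = {processor_class}.from_pretrained("{checkpoint}")\n'
    '>>> model = {model_class}.from_pretrained("{checkpoint}")\n'
)

# Fill the placeholders the way the docstring machinery would.
filled = template.format(
    processor_class="AutoTokenizer",
    model_class="BertForSequenceClassification",
    checkpoint="bert-base-uncased",
)
print(filled)
```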

```python
>>> model.config.id2label[predicted_class_id]
{expected_output}
```

```python
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze() > 0.5]
```
Collaborator Author

Here the code sample did not make any sense for multi-label classification, so changed it.
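For context, the new expression keeps every label whose sigmoid probability exceeds 0.5, which is the standard decision rule for multi-label classification. A dependency-free sketch of the same logic (the logit values below are made up purely for illustration):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative logits for a 4-label multi-label head; values are made up.
logits = [2.0, -1.5, 0.3, -3.0]

# Same idea as the torch expression: keep every label index whose
# sigmoid probability exceeds 0.5 (equivalently, whose logit is positive).
predicted_class_ids = [i for i, l in enumerate(logits) if sigmoid(l) > 0.5]
print(predicted_class_ids)  # labels 0 and 2 pass the 0.5 threshold
```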

```python
>>> image = dataset["test"]["image"][0]
>>> feature_extractor = {processor_class}.from_pretrained("{checkpoint}")
>>> image_processor = ImageProcessor.from_pretrained("{checkpoint}")
```
Collaborator Author


Adapted names in the vision examples.

HuggingFaceDocBuilderDev commented Dec 13, 2022

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@ydshieh ydshieh left a comment


LGTM, but maybe I'm missing something: why don't we add `processor_class="AutoTokenizer"` for most of them?

```python
>>> from transformers import {processor_class}, {model_class}
>>> from transformers import ImageProcessor, {model_class}
```
Collaborator


We have AutoImageProcessor. Any reason not to use it? Or is it just because that PR was merged after you started on this one?

Collaborator Author


Just a typo, I forgot the Auto here.

```python
@add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings(
    processor_class=_TOKENIZER_FOR_DOC,
    checkpoint=_CHECKPOINT_FOR_DOC,
```
Collaborator


Not sure why this is not added as, say, the last argument; same question below.

Collaborator Author


Because there is no processor_class in the code sample used by this model anymore. It's only in the base model, and I have left it there in case it's used by other modalities.
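For readers unfamiliar with the decorator being discussed, here is a highly simplified, hypothetical sketch of how a code-sample decorator can inject an example into a docstring and why `processor_class` can simply be omitted. The real `add_code_sample_docstrings` in transformers is considerably more involved; the names and behavior below are assumptions for illustration only.

```python
def add_code_sample_docstrings(checkpoint, processor_class=None, **kwargs):
    """Toy stand-in for the real decorator: append a code sample to __doc__."""
    def decorator(fn):
        lines = [f'>>> model = {{model_class}}.from_pretrained("{checkpoint}")']
        if processor_class is not None:
            # Only emit a preprocessing line when a processor class is given,
            # so callers that don't need one can just leave the argument out.
            lines.insert(0, f'>>> processor = {processor_class}.from_pretrained("{checkpoint}")')
        fn.__doc__ = (fn.__doc__ or "") + "\n\nExample:\n" + "\n".join(lines)
        return fn
    return decorator

@add_code_sample_docstrings(checkpoint="bert-base-uncased")
def forward(inputs):
    """Toy forward pass."""
    return inputs

print(forward.__doc__)
```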

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM, it simplifies the addition of model-specific heads without checkpoints a lot.
Are you also planning to update previously added models?

Contributor

@amyeroberts amyeroberts left a comment


Thanks for adding! Looks a lot tidier 🧹 🧹 🧹

Member

@LysandreJik LysandreJik left a comment


Nice! Thanks!

@sgugger sgugger merged commit c8f35a9 into main Jan 14, 2023
@sgugger sgugger deleted the docstring_examples branch January 14, 2023 08:49
@ydshieh ydshieh mentioned this pull request Jan 18, 2023
@sgugger sgugger mentioned this pull request Jan 20, 2023