Rework automatic code samples in docstrings #20757
Conversation
    >>> model.config.id2label[predicted_class_id]
    {expected_output}
    ```
    >>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze() > 0.5]
Here the code sample did not make any sense for multi-label classification, so I changed it.
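For context, a minimal sketch of how the multi-label prediction could look end to end; the checkpoint, prompt, and 0.5 threshold here are illustrative choices, not necessarily what the docstring template generates:

```python
>>> import torch
>>> from transformers import AutoTokenizer, BertForSequenceClassification

>>> # illustrative checkpoint; a real multi-label checkpoint would be fine-tuned with problem_type set
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForSequenceClassification.from_pretrained(
...     "bert-base-uncased", problem_type="multi_label_classification"
... )

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # multi-label: keep every class whose sigmoid score exceeds the threshold
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]
>>> predicted_labels = [model.config.id2label[class_id] for class_id in predicted_class_ids.tolist()]
```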
src/transformers/utils/doc.py (outdated)
    >>> image = dataset["test"]["image"][0]
    >>> feature_extractor = {processor_class}.from_pretrained("{checkpoint}")
    >>> image_processor = ImageProcessor.from_pretrained("{checkpoint}")
Adapted names in the vision examples.
The documentation is not available anymore as the PR was closed or merged.
ydshieh left a comment:
LGTM, but maybe I'm missing something - why don't we add `processor_class="AutoTokenizer"` for most of them?
src/transformers/utils/doc.py (outdated)
    ```python
    >>> from transformers import {processor_class}, {model_class}
    >>> from transformers import ImageProcessor, {model_class}
We have AutoImageProcessor. Any reason not to use it? Or is it just because that PR was merged after you started on this one?
Just a typo, I forgot the Auto here.
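For reference, with the typo fixed the vision sample would presumably read along these lines; the dataset and checkpoint below are illustrative stand-ins for the template's `{checkpoint}` placeholder:

```python
>>> from datasets import load_dataset
>>> from transformers import AutoImageProcessor, ViTForImageClassification

>>> # illustrative dataset and checkpoint; the template substitutes model-specific values
>>> dataset = load_dataset("huggingface/cats-image")
>>> image = dataset["test"]["image"][0]

>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
>>> model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

>>> inputs = image_processor(image, return_tensors="pt")
```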
    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        processor_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
Not sure why this is not added as, say, the last argument - same question below.
Because there is no `processor_class` in the code sample used by this model anymore. It's only in the base model now, and I've left it there in case it's used by other modalities.
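A rough sketch of what a head-specific decorator call could then look like without `processor_class`; the constants and the stand-in class below are illustrative, only `add_code_sample_docstrings` itself is the real utility:

```python
from transformers.modeling_outputs import SequenceClassifierOutput
from transformers.utils import add_code_sample_docstrings

# Illustrative module-level constants, mirroring the pattern used in modeling files
_CHECKPOINT_FOR_DOC = "bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"


class BertForSequenceClassification:  # simplified stand-in for the real model class
    @add_code_sample_docstrings(
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=SequenceClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(self, input_ids=None, attention_mask=None, labels=None):
        ...
```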
ArthurZucker left a comment:
LGTM, it simplifies the addition of specific heads without checkpoints a lot.
Are you also planning on updating previously added models?
amyeroberts left a comment:
Thanks for adding! Looks a lot tidier 🧹 🧹 🧹
LysandreJik left a comment:
Nice! Thanks!
What does this PR do?
This PR reworks the automatic code sample docstrings in two ways:
First, use the auto classes for preprocessing. As was decided internally, we want to document the model class used, but use the auto classes for preprocessing so users are not confused when a given model uses the tokenizer/feature extractor/image processor/processor of another model.
Second, we don't want to showcase hf-internal-testing models in the docstrings. Those are tiny random models, and they confuse users more than they help. However, when using the standard checkpoint we get doctest problems, so this PR removes the output/loss from the code example when it shouldn't be tested.

Two examples are shown with BERT and DeBERTaV2; I can add more models to the PR if it suits everyone.
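To illustrate the first point, the sample generated for a BERT sequence-classification head should read roughly as follows; the checkpoint and prompt are placeholders, and the printed label is omitted as described above:

```python
>>> import torch
>>> from transformers import AutoTokenizer, BertForSequenceClassification

>>> # preprocessing uses the Auto class, the model uses the documented class
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
```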