Models docstring #21225

sgugger · 2023-01-20T22:06:59Z

What does this PR do?

This PR cleans up all docstrings, following up on #20757 and #21199. It removes the need for the `processor_class` in the TensorFlow and Flax generic examples by setting it in the examples, as #20757 did for PyTorch, then makes a full pass across all models to clean up the docstrings (removing the `processor_class` in the `add_code_sample` decorator, removing random outputs, and using the auto classes for preprocessing).

Note that in some cases we can't use the auto-classes for preprocessing: when linking to the __call__ method of a processor or image processor, we need the actual class (cc @amyeroberts I changed a couple of things you did here).
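
As a rough illustration of the change described above, here is a minimal sketch of a sample template that names `AutoTokenizer` directly instead of taking a `processor_class` placeholder. The template text, the `add_sample` decorator, and the placeholder `TFBertModel` class are hypothetical stand-ins written for this note, not the actual `transformers` implementation.

```python
# Hypothetical, simplified stand-in for a code-sample decorator.
# The template names AutoTokenizer directly, so callers pass no processor class.
SAMPLE_TEMPLATE = '''
    Example:

    >>> from transformers import AutoTokenizer, {model_class}

    >>> tokenizer = AutoTokenizer.from_pretrained("{checkpoint}")
    >>> model = {model_class}.from_pretrained("{checkpoint}")
'''


def add_sample(checkpoint):
    """Append a formatted usage example to the decorated function's docstring."""

    def decorator(fn):
        # Derive the model class name from the qualified name, e.g. "TFBertModel.call".
        model_class = fn.__qualname__.split(".")[0]
        fn.__doc__ = (fn.__doc__ or "") + SAMPLE_TEMPLATE.format(
            model_class=model_class, checkpoint=checkpoint
        )
        return fn

    return decorator


class TFBertModel:  # placeholder class for illustration, not the real model
    @add_sample(checkpoint="bert-base-cased")
    def call(self, input_ids):
        """Run the forward pass."""
```

With this shape, only the checkpoint varies per model; the preprocessing class no longer has to be threaded through every decorator call.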

HuggingFaceDocBuilderDev · 2023-01-20T22:27:01Z

The documentation is not available anymore as the PR was closed or merged.

ydshieh · 2023-01-23T13:33:28Z

Thank you @sgugger for cleaning this up. With all ~250 files, I will trust you instead of looking line by line, except for one question below.

I would definitely prefer to run the doctests offline before merging this PR, which I can launch on my side. Previous PRs have shown there are always some surprises. I will launch the doctest CI when all reviewers give their approval.

So here is my question:

Note that in some cases we can't use the auto-classes for preprocessing: when linking to the __call__ method of a processor or image processor, we need the actual class (cc @amyeroberts I changed a couple of things you did here).

I see even in such places, we still have

Pixel values can be obtained using [`AutoImageProcessor`].  See [`ConvNextImageProcessor.__call__`] for details.

I don't have much context and prior knowledge, but is it true we want to use AutoImageProcessor but ConvNextImageProcessor.__call__ in such cases?

sgugger · 2023-01-23T14:49:51Z

With all ~250 files, I will trust you instead of looking line by line.

A review would still be much appreciated, as it could catch accidental typos.

I would definitely prefer to run the doctests offline before merging this PR, which I can launch on my side. Previous PRs have shown there are always some surprises. I will launch the doctest CI when all reviewers give their approval.

Sure, we can wait for that as long as the results are available before the release branch is cut.

I don't have much context and prior knowledge, but is it true we want to use AutoImageProcessor but ConvNextImageProcessor.__call__ in such cases?

Yes.
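
One way to picture why both names appear in the same sentence: an auto class acts as a factory that returns a concrete processor, so the `__call__` documentation a link can point to lives on the concrete class. The two classes below are hypothetical stand-ins for that relationship, not the real `AutoImageProcessor` or `ConvNextImageProcessor`.

```python
class ConcreteImageProcessor:
    """Stand-in for a model-specific image processor."""

    def __call__(self, images):
        """Preprocess images into pixel values (the docstring a doc link would target)."""
        return {"pixel_values": list(images)}


class AutoProcessorFactory:
    """Stand-in for an auto class: it only dispatches to a concrete class."""

    # Hypothetical registry mapping a model type to its processor class
    _registry = {"convnext": ConcreteImageProcessor}

    @classmethod
    def from_name(cls, name):
        # Instantiate and return the concrete processor registered under `name`.
        return cls._registry[name]()


processor = AutoProcessorFactory.from_name("convnext")
```

The factory defines no `__call__` of its own, so a link such as `[AutoProcessorFactory.__call__]` would have nothing to resolve to, while loading through the factory remains the recommended entry point.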

ydshieh · 2023-01-23T15:00:12Z

I triggered the doctest CI against the latest commit in this PR so far. I will take a look at the PR changes too :-)

run page

LysandreJik

Impressive PR! Only found a couple of typos.

LysandreJik · 2023-01-23T16:08:44Z

src/transformers/models/blenderbot/modeling_tf_blenderbot.py

+        tokenizer = AutoTokenizer.from_pretrained(mname) >>> UTTERANCE = "My friends are cool but they eat too many
+        carbs." >>> print("Human: ", UTTERANCE) >>> inputs = tokenizer([UTTERANCE], return_tensors='tf') >>> reply_ids
+        = model.generate(**inputs) >>> print("Bot: ", tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])



There's an issue in this docstring's format
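
The formatting issue in the snippet above is that several `>>>` prompts ended up on the same line. A heuristic check for this could look like the sketch below; the function name and the regular expression are assumptions made for illustration and are not part of the repository's tooling.

```python
import re

# A doctest prompt should start a line; ">>> " following other text on the
# same line suggests that several statements were merged by a formatter.
COLLAPSED_PROMPT = re.compile(r"\S\s+>>> ")


def find_collapsed_prompts(docstring):
    """Return the 1-based line numbers on which a '>>>' prompt appears mid-line."""
    return [
        number
        for number, line in enumerate(docstring.splitlines(), start=1)
        if COLLAPSED_PROMPT.search(line)
    ]


broken = """
    tokenizer = AutoTokenizer.from_pretrained(mname) >>> UTTERANCE = "My friends are cool"
    >>> print("Human: ", UTTERANCE)
"""
```

Calling `find_collapsed_prompts(broken)` flags the line where two statements share one row. Being a plain text heuristic, it would also flag a string literal that happens to contain `>>> `.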

LysandreJik · 2023-01-23T16:09:04Z

src/transformers/models/blenderbot_small/modeling_flax_blenderbot_small.py

+        >>> from transformers import AutoTokenizer, FlaxBlenderbotSmallForConditionalGeneration >>> tokenizer =
+        AutoTokenizer.from_pretrained('facebook/blenderbot_small-90M') >>> TXT = "My friends are <mask> but they eat
+        too many carbs."


LysandreJik · 2023-01-23T16:09:19Z

src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py

-        >>> from transformers import BlenderbotSmallTokenizer, TFBlenderbotSmallForConditionalGeneration >>> mname =
+        >>> from transformers import AutoTokenizer, TFBlenderbotSmallForConditionalGeneration >>> mname =
        'facebook/blenderbot_small-90M' >>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname) >>>
-        tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
+        tokenizer = AutoTokenizer.from_pretrained(mname)


ydshieh

Left some minor comments (one was already mentioned by Lysandre, I believe).
Thank you again!

ydshieh · 2023-01-23T15:36:17Z

src/transformers/models/bert/modeling_tf_bert.py

 _CHECKPOINT_FOR_QA = "ydshieh/bert-base-cased-squad2"
 _QA_EXPECTED_OUTPUT = "'a nice puppet'"
 _QA_EXPECTED_LOSS = 7.41
-_QA_TARGET_START_INDEX = 14


These are currently unused, so I understand why you removed them.

However, we should keep them here and pass them to add_code_sample_docstrings for BertForQuestionAnswering.

(without using it, it works currently due to the default values 14 and 15 specified in def add_code_sample_docstrings - this is not the best approach though)
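
A small sketch of the suggestion: keep the constants in the module and pass them explicitly rather than relying on the decorator's defaults. The `add_qa_sample` decorator below is a hypothetical stand-in, and the `_QA_TARGET_END_INDEX` name is an assumption; only the default values 14 and 15 come from the comment above.

```python
def add_qa_sample(checkpoint, qa_target_start_index=14, qa_target_end_index=15):
    """Hypothetical decorator that renders target indices into a QA example."""

    def decorator(fn):
        # Append example lines as text; nothing here is executed as a doctest.
        fn.__doc__ = (fn.__doc__ or "") + (
            f"\n    >>> # checkpoint: {checkpoint}"
            f"\n    >>> target_start_index = torch.tensor([{qa_target_start_index}])"
            f"\n    >>> target_end_index = torch.tensor([{qa_target_end_index}])\n"
        )
        return fn

    return decorator


# Module-level constants; the end-index constant name is assumed for illustration
_CHECKPOINT_FOR_QA = "ydshieh/bert-base-cased-squad2"
_QA_TARGET_START_INDEX = 14
_QA_TARGET_END_INDEX = 15


@add_qa_sample(
    checkpoint=_CHECKPOINT_FOR_QA,
    qa_target_start_index=_QA_TARGET_START_INDEX,
    qa_target_end_index=_QA_TARGET_END_INDEX,
)
def forward(input_ids):
    """Question-answering forward pass."""
```

Passing the values explicitly keeps the example correct even if the decorator's defaults change later, which is the concern raised about relying on them implicitly.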

ydshieh · 2023-01-23T15:45:48Z

src/transformers/models/blenderbot/modeling_blenderbot.py


 _CONFIG_FOR_DOC = "BlenderbotConfig"
-_TOKENIZER_FOR_DOC = "BlenderbotTokenizer"
-_CHECKPOINT_FOR_DOC = "facebook/blenderbot-400M-distill"


Just want to point out that this attribute is used in pipeline testing (to generate tests):

    if hasattr(module, "_CHECKPOINT_FOR_DOC"):
        return module._CHECKPOINT_FOR_DOC
    else:
        logger.warning(f"Can't retrieve checkpoint from {architecture.__name__}")

I am OK with this change though, as we plan to use new tiny models for pipeline tests (which don't need this attribute anymore).

Removing the _CHECKPOINT_FOR_DOC here is my mistake.
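
For context, the lookup quoted above can be written as a self-contained helper. This is a sketch under assumptions: the function name `get_checkpoint_for_doc` and the stand-in modules are invented for illustration, and only the attribute name and the warning text come from the quoted snippet.

```python
import logging
import types

logger = logging.getLogger(__name__)


def get_checkpoint_for_doc(module, architecture_name):
    """Return the module's documented checkpoint, or None (with a warning) if absent."""
    checkpoint = getattr(module, "_CHECKPOINT_FOR_DOC", None)
    if checkpoint is None:
        logger.warning("Can't retrieve checkpoint from %s", architecture_name)
    return checkpoint


# Stand-in "modules" built on the fly for illustration
with_attr = types.SimpleNamespace(_CHECKPOINT_FOR_DOC="facebook/blenderbot-400M-distill")
without_attr = types.SimpleNamespace()
```

A module that drops `_CHECKPOINT_FOR_DOC` makes such a helper fall through to the warning branch, which is why removing the attribute affects test generation.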

ydshieh · 2023-01-23T15:49:20Z

src/transformers/models/blenderbot_small/modeling_flax_blenderbot_small.py


    Summarization example:

-        >>> from transformers import BlenderbotSmallTokenizer, FlaxBlenderbotSmallForConditionalGeneration


The docstring in this block seems to have broken formatting, but this was not introduced in this PR. It would be good to fix it here if you find time.

ydshieh · 2023-01-23T15:51:10Z

src/transformers/models/blenderbot_small/modeling_tf_blenderbot_small.py

 BLENDERBOT_SMALL_GENERATION_EXAMPLE = r"""
    Conversation example::

-        >>> from transformers import BlenderbotSmallTokenizer, TFBlenderbotSmallForConditionalGeneration >>> mname =


doc example broken (not from this PR though)

ydshieh · 2023-01-23T16:06:10Z

src/transformers/models/clip/modeling_clip.py

        ```python
        >>> from PIL import Image
        >>> import requests
-        >>> from transformers import CLIPProcessor, CLIPVisionModel


(just for the record) For CLIP-like models, we sometimes use the Processor to prepare inputs for the text or vision models, but other times use the tokenizer and image processor to do this.

ydshieh · 2023-01-23T16:19:12Z

src/transformers/models/distilbert/modeling_flax_distilbert.py


 _CHECKPOINT_FOR_DOC = "distilbert-base-uncased"
 _CONFIG_FOR_DOC = "DistilBertConfig"
-_TOKENIZER_FOR_DOC = "DistilBertTokenizer"


can't comment on the exact line, but this file has

Indices can be obtained using [`BertTokenizer`]. See [`PreTrainedTokenizer.encode`]

ydshieh · 2023-01-23T16:28:41Z

src/transformers/models/gpt2/modeling_gpt2.py

        output_type=SequenceClassifierOutputWithPast,
        config_class=_CONFIG_FOR_DOC,
-        expected_output="'LABEL_0'",
-        expected_loss=5.28,


Why do we remove the expected outputs/loss here? Just because we don't want to see LABEL_0?

The checkpoint is a real one.

LABEL_0 is not informative at all and the loss seems very high, I don't think this is a model trained for sequence classification.

ydshieh · 2023-01-23T17:02:58Z

src/transformers/models/tapas/modeling_tapas.py


 _CONFIG_FOR_DOC = "TapasConfig"
-_TOKENIZER_FOR_DOC = "TapasTokenizer"
 _TOKENIZER_FOR_DOC = "google/tapas-base"


_TOKENIZER_FOR_DOC = "google/tapas-base" one line below should be deleted too.

Remove obsolete tokenizer doc from append_call_sample_docstring to match API changes in huggingface/transformers#21225

sgugger added 3 commits January 20, 2023 17:00

Clean all models

1c2c620

Style

d4a8858

Last to remove

0ba1a37

sgugger requested review from LysandreJik and ydshieh January 20, 2023 22:06

amyeroberts mentioned this pull request Jan 23, 2023

Update TF doc test template #21260

Closed

5 tasks

LysandreJik approved these changes Jan 23, 2023


sgugger and others added 2 commits January 23, 2023 11:43

address review comments

1c8d22a

Merge branch 'main' into models_docstring

d8d8ab2

ydshieh approved these changes Jan 23, 2023


Address review comments

e02b30e

sgugger merged commit fd5cdae into main Jan 23, 2023

sgugger deleted the models_docstring branch January 23, 2023 19:33

sgugger mentioned this pull request Jan 24, 2023

Add X-MOD #20939

Merged

5 tasks

This was referenced Jan 25, 2023

[Hubert] Fix Hubert processing auto #21299

Merged

[Doctest] Fix Blenderbot doctest #21297

Merged

[Doctest] Fix Perceiver doctest #21318

Merged

pziecina-nv added a commit to triton-inference-server/pytriton that referenced this pull request Feb 1, 2023

Fix Jax Opt multinode example

887d138

Remove obsolete tokenizer doc from append_call_sample_docstring to match API changes in huggingface/transformers#21225

