diff --git a/.ai/skills/model-doc/SKILL.md b/.ai/skills/model-doc/SKILL.md new file mode 100644 index 000000000000..b5d3f75f8d28 --- /dev/null +++ b/.ai/skills/model-doc/SKILL.md @@ -0,0 +1,251 @@ +--- +name: model-doc +description: Completes scaffolded model documentation pages at docs/source/en/model_doc/.md by resolving the `` sentinel tokens emitted by the `transformers add-new-model-like` CLI. Resolves paper links, Hub checkpoints, intro summaries, Quickstart inputs, and autodoc blocks against the model's source code under `src/transformers/models//`. Trigger when a contributor asks to "finish the docs for this new model", "fill in the model_doc template", or "complete the scaffolded docs for ". +--- + +# model-doc skill + +Resolve `` sentinel tokens in a scaffolded `docs/source/en/model_doc/.md` file. The source of truth is the codebase under `src/transformers/models//` (modular file, modeling file, config, tokenizer/processor). + +Use the [hf-papers](https://github.com/huggingface/skills/blob/main/skills/hf-cli/SKILL.md#hf-papers--interact-with-papers-on-the-hub), [hf-models](https://github.com/huggingface/skills/blob/main/skills/hf-cli/SKILL.md#hf-models--interact-with-models-on-the-hub), and [hf-collections](https://github.com/huggingface/skills/blob/main/skills/hf-cli/SKILL.md#hf-collections--interact-with-collections-on-the-hub) skills for Hub lookups, with web search as a fallback when those skills are unavailable or return no match. + +## Output contract + +Write the completed file and return nothing else — no summary, no resolver report, no warnings. Sentinels you cannot resolve stay in the file as `` tokens. A human reviewer will fill them in. + +## Scope rule + +The model_doc covers only what is unique to the Transformers integration of a model. Do not duplicate other sources of truth: + +| Info type | Lives in | +|---|---| +| Benchmarks, training data, license, intended use | Hub model card | +| Architecture details, ablations, paper abstract | Paper | +| Quantization backends, generation strategies, attention impls, cache classes | Central Transformers docs | +| Transformers API surface, gotchas specific to this model's integration, one runnable example | model_doc | + +## Sentinel vocabulary + +| Token | Filled with | Primary resolver | +|---|---|---| +| `` | Hub papers page | `hf-papers` | +| `` | 2–4 sentence factual summary | `hf-papers` abstract | +| `` | Hub org or user name | `hf-models` | +| `` | Collection, org page, or single-checkpoint URL | `hf-collections` + `hf-models` | +| `` | Canonical checkpoint ID for the Quickstart | `hf-models` | +| `` | Pipeline task string, e.g. `"text-generation"` | inferred from the public class list | +| `` | `AutoModelForCausalLM`, etc. | inferred from the public class list | +| `` | `AutoTokenizer`, `AutoProcessor`, `AutoImageProcessor`, or `AutoFeatureExtractor` | inferred from the processor class in `__all__`; replaces the hardcoded `AutoTokenizer` the scaffolder emits | +| `` | Modality-appropriate input literal | modality table below | +| `` | Concrete value to `print()` | modality table below | + +## Resolver rules + +### `` and `` + +Try `hf-papers` first, then a web search for an official technical report or release blog post; leave both sentinels in place if neither finds a source. + +When a source is found, write a 2–4 sentence intro under these constraints. + +- Name the most-distinguishing architectural choice the source actually states +- No benchmark claims. "State-of-the-art," "outperforms," "best," "leading" are banned — they rot fast +- No adjectives the source did not use. If the abstract does not call it "novel," neither do you +- No parameter count or training-token count unless the checkpoint ID already implies them +- Present tense, declarative, active voice. No marketing + +The first sentence starts: `[ModelName]() is` + +### `` and `` + +Use `hf-collections` to check whether a curated collection exists. Pick one variant: + +- Collection exists: `You can find all the [ModelName] checkpoints under the [Org](https://hf.co/collections/...) collection.` +- No collection, multiple checkpoints under a verified org: `You can find all the [ModelName] checkpoints on the [Org](https://hf.co/Org) page.` +- No collection, single checkpoint or user-account upload: `The [ModelName] checkpoint is [org/model](https://hf.co/org/model).` + +### `` + +Pick the smallest official checkpoint that produces a convincing Quickstart output. For generative modalities, bump up if the smallest is below the quality threshold: + +- LLMs: prefer ≥ 1B parameters +- ASR: skip `tiny`, start at `base` or equivalent +- Image classification, embeddings, non-generative heads: smallest is fine + +### `` and `` + +Pick one representative task head from the model's `__all__`. Common mappings: + +- `ForCausalLM` → `AutoModelForCausalLM` + `"text-generation"` +- `ForConditionalGeneration` (text-only) → `AutoModelForSeq2SeqLM` + `"text2text-generation"` +- `ForSequenceClassification` → `AutoModelForSequenceClassification` + `"text-classification"` +- `ForImageClassification` → `AutoModelForImageClassification` + `"image-classification"` +- `ForCTC` / `ForSpeechSeq2Seq` → corresponding Auto class + `"automatic-speech-recognition"` +- Multimodal `ForConditionalGeneration` → `AutoModelForImageTextToText` + `"image-text-to-text"` + +If the model is not registered in any `src/transformers/models/auto/*_mapping.py`, drop the Pipeline tab — emit only the AutoModel block, no `` wrapper. + +### `` and variable name + +| Processor suffix in `__all__` | `` | Variable name | +|---|---|---| +| `Tokenizer` / `TokenizerFast` | `AutoTokenizer` | `tokenizer` | +| `ImageProcessor` / `ImageProcessorFast` | `AutoImageProcessor` | `processor` | +| `FeatureExtractor` | `AutoFeatureExtractor` | `processor` | +| `Processor` (multimodal) | `AutoProcessor` | `processor` | + +When `` ≠ `AutoTokenizer`, also rename the `tokenizer` variable to `processor` everywhere in the Quickstart code block (import, instantiation, input prep, decode call). + +### Call style: generative vs non-generative + +Inspect the Quickstart task head (the single head class chosen for ``): + +- Generative (`ForCausalLM`, `ForConditionalGeneration`, `ForSeq2SeqLM`, `ForSpeechSeq2Seq`): replace `model(inputs)` with `model.generate(inputs, max_new_tokens=32)`. +- Non-generative (all other heads): keep `model(inputs)`. + +### Badge row + +Emit a badge row after the release date line and before the `# ModelName` H1. Determine which badges to include by grepping the model's modeling file: + +| Badge | Detection | +|---|---| +| FlashAttention | `_supports_flash_attn_2 = True` on any model class | +| SDPA | `_supports_sdpa = True` on any model class | +| Tensor parallelism | `_tp_plan` dict defined on any model class | + +```bash +grep -n "_supports_flash_attn_2\|_supports_sdpa\|_tp_plan" src/transformers/models//modeling_*.py +``` + +Emit only the badges that match: + +```html +
+
+ FlashAttention + SDPA + Tensor parallelism +
+
+``` + +### `` and `` + +Pick the row that matches the model's processor + task head. Emit exactly these values. + +| Signal | `` | `` | +|---|---|---| +| Tokenizer + `ForCausalLM` | `"Plants create energy through a process known as"` | `tokenizer.decode(outputs[0], skip_special_tokens=True)` | +| Tokenizer + `ForSeq2SeqLM` / `ForConditionalGeneration` | `"translate English to French: Hello, how are you?"` (or task-appropriate) | `tokenizer.decode(outputs[0], skip_special_tokens=True)` | +| Tokenizer + `ForSequenceClassification` | `"I love this movie!"` | `model.config.id2label[outputs.logits.argmax(-1).item()]` | +| Tokenizer + `ForTokenClassification` / `ForQuestionAnswering` | task-appropriate literal | `outputs.logits.shape` | +| ImageProcessor + `ForImageClassification` | `Image.open(BytesIO(httpx.get("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg").content))` | `model.config.id2label[outputs.logits.argmax(-1).item()]` | +| ImageProcessor + other vision heads | same image load | `outputs.last_hidden_state.shape` or task-appropriate | +| FeatureExtractor + `ForCTC` / `ForSpeechSeq2Seq` | `load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")[0]["audio"]` | `processor.batch_decode(outputs.logits.argmax(-1))[0]` for CTC, `processor.batch_decode(outputs, skip_special_tokens=True)[0]` for seq2seq | +| Processor (multimodal image+text) | `[{"role": "user", "content": [{"type": "image", "url": "..."}, {"type": "text", "text": "What is in this image?"}]}]` with `processor.apply_chat_template(..., add_generation_prompt=True, tokenize=True, return_tensors="pt")` | `processor.decode(outputs[0], skip_special_tokens=True)` | + +If the combination matches no row, leave both sentinels in place. + +## Code example conventions + +Apply to every code block in the doc — Quickstart, `## Usage examples`, and `## Quantization`. + +- Never set `dtype` or `torch_dtype`. `from_pretrained` and `pipeline` set it automatically. Strip it out +- Always pass `device_map="auto"` to model `from_pretrained` calls. This is what makes the code work unchanged on single GPU, multi-GPU, MPS, and CPU. In any block, not just the Quickstart +- For image loading from a URL, use `httpx` + `BytesIO` (never `requests`): + ```py + import httpx + from io import BytesIO + + with httpx.stream("GET", url) as response: + image = Image.open(BytesIO(response.read())) + ``` +- Always chain `.to(model.device)` on prepared inputs: + ```py + inputs = tokenizer("...", return_tensors="pt").to(model.device) + outputs = model(inputs) + ``` + This works on single GPU, multi-GPU, MPS, or CPU. + +The Pipeline tab uses `device=0` (the `pipeline()` API convention, not `device_map`). The dtype rule still applies — no `torch_dtype` on `pipeline()` either. + +## Prose conventions + +- Checkpoint IDs such as `Qwen/Qwen3.5-35B-A3B` are not code — write them as plain text or Hub links, never in backticks. Reserve backticks for code: class names, method names, kwarg names, and string literals that appear in code +- Reference Transformers classes and methods using doc-builder link syntax: [`~AutoProcessor.from_pretrained`], [`~pipeline`]. The `~` drops the module prefix so only the leaf name renders as the link text + +## Required sections (in this order) + +1. License + MDX header — scaffolder-emitted; do not edit +2. Release date metadata line — `*This model was released on YYYY-MM-DD and added to Hugging Face Transformers on YYYY-MM-DD.*`, placed above the H1. Tooling-managed — never write or edit the dates yourself +3. Badge row — see Badge row resolver rule above +4. `# ModelName` — H1 +5. Intro paragraph — resolves ``. Links the paper inline. Does not quote the abstract +6. Checkpoints line — one of the three variants from the `` / `` rule +7. Quickstart — two tabs using doc-builder MDX syntax, or a single AutoModel block when no pipeline task mapping exists: + ``` + + + + + ``` +8. `[[autodoc]]` blocks — one `## ` per public class, in the fixed order below + +## Optional sections + +Place between Quickstart and autodoc, in this order. Include only when the model genuinely needs them. + +- `## Usage tips and notes` — bullets for non-obvious transformers-specific gotchas. Each bullet must name a specific kwarg, class, or behavior. Examples: + - "Inputs should be padded on the right because the model uses absolute position embeddings" + - "The tokenizer does not add a BOS token by default — pass `add_special_tokens=True` when batching" + - "Static cache is not supported; use `DynamicCache` for generation" + - "Audio inputs must be mono and resampled to 16 kHz" + +- `## Usage examples` — only when the Quickstart cannot cover the case (multimodal batching, long-form audio windowing, fine-tuning recipes with non-obvious collators). + +## Autodoc coverage + +Public class = the union of every `__all__` list across Python files in `src/transformers/models//`, excluding any class whose name ends in `Output`. + +```bash +grep -h "^__all__" src/transformers/models//*.py +``` + +Order, top to bottom: + +1. Config class +2. Tokenizer (fast variant before slow) / Processor / ImageProcessor / FeatureExtractor +3. Base `Model` +4. `PreTrainedModel` — only if present in `__all__`; many models do not export it +5. Task heads in their `__all__` order + +Each autodoc block is formatted as: +``` +## ClassName + +[[autodoc]] ClassName + - forward +``` + +`- forward` vs `- generate`: every block gets `- forward`, except the single Quickstart task-head class — if it is generative (`ForCausalLM`, `ForConditionalGeneration`, `ForSeq2SeqLM`, `ForSpeechSeq2Seq`), it gets `- generate` instead. Exactly one `- generate` per doc. + +## Workflow + +Ground every resolution in `__all__` and the model's source files — do not rely on paper claims or Hub card descriptions without verifying in code. Apply the resolver rules above for each sentinel; leave unresolvable sentinels in place. Run the quality checklist before writing. Write the completed file and nothing else; never touch the release date line. + +## Quality checklist + +- [ ] Release date line is present above the H1, unedited +- [ ] Intro is 2–4 sentences, links the paper inline, makes no benchmark claims, uses no invented adjectives +- [ ] Checkpoints line matches one of the three variants and drops "official" when not a verified org +- [ ] Quickstart has Pipeline + AutoModel tabs in ``, or a single AutoModel block when no pipeline task exists +- [ ] AutoModel tab uses exactly one task-head class and prints a concrete output (not `outputs`) +- [ ] AutoModel tab imports `` (not hardcoded `AutoTokenizer`) and uses the correct variable name (`tokenizer` or `processor`) per the `` rule +- [ ] AutoModel tab uses `model.generate(inputs, max_new_tokens=32)` for generative task heads and `model(inputs)` for non-generative +- [ ] Every code block follows the conventions: no `torch_dtype` / `dtype`; `device_map="auto"` on every model `from_pretrained`; `.to(model.device)` on every prepared input +- [ ] `## Usage tips and notes` is absent or every bullet names a specific transformers API +- [ ] Badge row is present; FlashAttention/SDPA/Tensor parallelism badges match what `_supports_flash_attn_2`, `_supports_sdpa`, and `_tp_plan` indicate in the modeling file +- [ ] No abstract quote, contributor line, "click sidebar" tip, `## Overview` heading, or long-form checkpoints paragraph +- [ ] Checkpoint IDs in prose are plain text or Hub links, not backtick-wrapped +- [ ] Transformers classes and methods in prose use doc-builder link syntax: [`~ClassName.method`] +- [ ] Autodoc blocks cover the full `__all__` union (minus `*Output` classes) in the fixed order +- [ ] Exactly one autodoc block uses `- generate` — the Quickstart task head if generative diff --git a/src/transformers/cli/add_new_model_like.py b/src/transformers/cli/add_new_model_like.py index 3293f6a4ba8c..3ae2fb4347e9 100644 --- a/src/transformers/cli/add_new_model_like.py +++ b/src/transformers/cli/add_new_model_like.py @@ -272,10 +272,62 @@ def add_model_to_auto_mappings( ) +_PROCESSOR_SUFFIXES = ( + "TokenizerFast", + "Tokenizer", + "ImageProcessorFast", + "ImageProcessor", + "VideoProcessor", + "FeatureExtractor", + "Processor", +) + + +def _sort_public_classes(public_classes: list[str]) -> tuple[list[str], set[str]]: + """ + Sort public classes into the canonical autodoc order for the model_doc template: + + 1. Config class + 2. Tokenizer / Processor / ImageProcessor / FeatureExtractor + 3. Base `Model` + 4. `PreTrainedModel` + 5. Task heads (remaining classes), preserving `__all__` order + + Returns the ordered list and the set of classes that should emit a `- forward` + directive under their autodoc block (everything except config and processors). + """ + config, processor, base_model, pretrained, heads = [], [], [], [], [] + for cls in public_classes: + if cls.endswith("Config"): + config.append(cls) + elif cls.endswith(_PROCESSOR_SUFFIXES): + processor.append(cls) + elif cls.endswith("PreTrainedModel"): + pretrained.append(cls) + elif cls.endswith("Model"): + base_model.append(cls) + else: + heads.append(cls) + + ordered = config + processor + base_model + pretrained + heads + with_forward = set(base_model + pretrained + heads) + return ordered, with_forward + + def create_doc_file(new_paper_name: str, public_classes: list[str]): """ Create a new doc file to fill for the new model. + Emits a skeleton with `` sentinel tokens that the + transformers-model-doc skill (or a human contributor) resolves into the + final doc. Sentinels intentionally stay in the file if unresolved — they + are the signal that the doc is incomplete. + + Code-example conventions: + - No `dtype` / `torch_dtype`. + - `device_map="auto"` on every model `from_pretrained` call. + - Prepared inputs chain `.to(model.device)` before the model call. + Args: new_paper_name (`str`): The fully cased name (as in the official paper name) of the new model. @@ -289,37 +341,51 @@ def create_doc_file(new_paper_name: str, public_classes: list[str]): copyright_for_markdown = re.sub(r"# ?", "", COPYRIGHT).replace("coding=utf-8\n", " - # MyTest - ## Overview + [MyTest]() - The MyTest model was proposed in []() by . - + You can find all the [MyTest] checkpoints under the []() collection. - The abstract from the paper is the following: + ## Quickstart - + + - Tips: + ```py + from transformers import pipeline - + pipe = pipeline(task="", model="", device=0) + pipe("") + ``` - This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/). - The original code can be found [here](). + + - ## Usage examples + ```py + from transformers import AutoTokenizer, - + tokenizer = AutoTokenizer.from_pretrained("") + model = .from_pretrained("", device_map="auto") - ## MyTestConfig + inputs = tokenizer("", return_tensors="pt").to(model.device) + outputs = model(**inputs) + print() + ``` - [[autodoc]] MyTestConfig + + - ## MyTestForCausalLM + ## MyTestConfig - [[autodoc]] MyTestForCausalLM + [[autodoc]] MyTestConfig ## MyTestModel @@ -342,17 +350,25 @@ class MyTestForTokenClassification(LlamaForTokenClassification): [[autodoc]] MyTestPreTrainedModel - forward + ## MyTestForCausalLM + + [[autodoc]] MyTestForCausalLM + - forward + ## MyTestForSequenceClassification [[autodoc]] MyTestForSequenceClassification + - forward ## MyTestForQuestionAnswering [[autodoc]] MyTestForQuestionAnswering + - forward ## MyTestForTokenClassification [[autodoc]] MyTestForTokenClassification + - forward """ ) self.assertFileIsEqual(EXPECTED_DOC, os.path.join(self.DOC_PATH, "model_doc", "my_test.md")) @@ -721,28 +737,40 @@ class MyTest2Processor(Phi4MultimodalProcessor): --> - # MyTest2 - ## Overview + [MyTest2]() - The MyTest2 model was proposed in []() by . - + You can find all the [MyTest2] checkpoints under the []() collection. - The abstract from the paper is the following: + ## Quickstart - + + - Tips: + ```py + from transformers import pipeline - + pipe = pipeline(task="", model="", device=0) + pipe("") + ``` - This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/). - The original code can be found [here](). + + - ## Usage examples + ```py + from transformers import AutoTokenizer, - + tokenizer = AutoTokenizer.from_pretrained("") + model = .from_pretrained("", device_map="auto") + + inputs = tokenizer("", return_tensors="pt").to(model.device) + outputs = model(**inputs) + print() + ``` + + + ## MyTest2VisionConfig @@ -756,29 +784,26 @@ class MyTest2Processor(Phi4MultimodalProcessor): [[autodoc]] MyTest2Config - ## MyTest2AudioPreTrainedModel + ## MyTest2ImageProcessor - [[autodoc]] MyTest2AudioPreTrainedModel - - forward + [[autodoc]] MyTest2ImageProcessor - ## MyTest2AudioModel + ## MyTest2FeatureExtractor - [[autodoc]] MyTest2AudioModel - - forward + [[autodoc]] MyTest2FeatureExtractor - ## MyTest2VisionPreTrainedModel + ## MyTest2Processor - [[autodoc]] MyTest2VisionPreTrainedModel - - forward + [[autodoc]] MyTest2Processor - ## MyTest2VisionModel + ## MyTest2AudioModel - [[autodoc]] MyTest2VisionModel + [[autodoc]] MyTest2AudioModel - forward - ## MyTest2PreTrainedModel + ## MyTest2VisionModel - [[autodoc]] MyTest2PreTrainedModel + [[autodoc]] MyTest2VisionModel - forward ## MyTest2Model @@ -786,21 +811,25 @@ class MyTest2Processor(Phi4MultimodalProcessor): [[autodoc]] MyTest2Model - forward - ## MyTest2ForCausalLM + ## MyTest2AudioPreTrainedModel - [[autodoc]] MyTest2ForCausalLM + [[autodoc]] MyTest2AudioPreTrainedModel + - forward - ## MyTest2ImageProcessor + ## MyTest2VisionPreTrainedModel - [[autodoc]] MyTest2ImageProcessor + [[autodoc]] MyTest2VisionPreTrainedModel + - forward - ## MyTest2FeatureExtractor + ## MyTest2PreTrainedModel - [[autodoc]] MyTest2FeatureExtractor + [[autodoc]] MyTest2PreTrainedModel + - forward - ## MyTest2Processor + ## MyTest2ForCausalLM - [[autodoc]] MyTest2Processor + [[autodoc]] MyTest2ForCausalLM + - forward """ ) self.assertFileIsEqual(EXPECTED_DOC, os.path.join(self.DOC_PATH, "model_doc", "my_test2.md"))