# Merge pull request #985 from huggingface/v3-docs

Improve documentation (v3)
Showing 9 changed files with 332 additions and 69 deletions.
````diff
@@ -11,25 +11,19 @@
 </p>

 <p align="center">
-    <a href="https://www.npmjs.com/package/@huggingface/transformers">
-        <img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers">
-    </a>
-    <a href="https://www.npmjs.com/package/@huggingface/transformers">
-        <img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers">
-    </a>
-    <a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers">
-        <img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers">
-    </a>
-    <a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE">
-        <img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue">
-    </a>
-    <a href="https://huggingface.co/docs/transformers.js/index">
-        <img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online">
-    </a>
+    <a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers"></a>
+    <a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers"></a>
+    <a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers"></a>
+    <a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue"></a>
+    <a href="https://huggingface.co/docs/transformers.js/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online"></a>
 </p>

-State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
+<h3 align="center">
+    <p>State-of-the-art Machine Learning for the Web</p>
+</h3>
+
+Run 🤗 Transformers directly in your browser, with no need for a server!

 Transformers.js is designed to be functionally equivalent to Hugging Face's [transformers](https://github.com/huggingface/transformers) python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:
 - 📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
````
````diff
@@ -42,6 +36,22 @@ Transformers.js uses [ONNX Runtime](https://onnxruntime.ai/) to run models in th
 For more information, check out the full [documentation](https://huggingface.co/docs/transformers.js).

+## Installation
+
+To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run:
+```bash
+npm i @huggingface/transformers
+```
+
+Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
+```html
+<script type="module">
+    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';
+</script>
+```
+
+
 ## Quick tour
````
````diff
@@ -72,9 +82,9 @@ out = pipe('I love transformers!')
 import { pipeline } from '@huggingface/transformers';

 // Allocate a pipeline for sentiment-analysis
-let pipe = await pipeline('sentiment-analysis');
+const pipe = await pipeline('sentiment-analysis');

-let out = await pipe('I love transformers!');
+const out = await pipe('I love transformers!');
 // [{'label': 'POSITIVE', 'score': 0.999817686}]
 ```
````
````diff
@@ -86,29 +96,40 @@ let out = await pipe('I love transformers!');
 You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example:
 ```javascript
 // Use a different model for sentiment-analysis
-let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
+const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
 ```

+By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like
+to run the model on your GPU (via WebGPU), you can do this by setting `device: 'webgpu'`, for example:
+```javascript
+// Run the model on WebGPU
+const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
+    device: 'webgpu',
+});
+```
+
-## Installation
+For more information, check out the [WebGPU guide](https://huggingface.co/docs/transformers.js/guides/webgpu).

+> [!WARNING]
+> The WebGPU API is still experimental in many browsers, so if you run into any issues,
+> please file a [bug report](https://github.com/huggingface/transformers.js/issues/new?title=%5BWebGPU%5D%20Error%20running%20MODEL_ID_GOES_HERE&assignees=&labels=bug,webgpu&projects=&template=1_bug-report.yml).
-To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run:
-```bash
-npm i @huggingface/transformers
-```
-
-Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
-```html
-<script type="module">
-    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';
-</script>
-```
+In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of
+the model to lower bandwidth and optimize performance. This can be achieved by adjusting the `dtype` option,
+which allows you to select the appropriate data type for your model. While the available options may vary
+depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"`
+(default for WASM), and `"q4"`. For more information, check out the [quantization guide](https://huggingface.co/docs/transformers.js/guides/dtypes).
+```javascript
+// Run the model at 4-bit quantization
+const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
+    dtype: 'q4',
+});
+```

 ## Examples

-Want to jump straight in? Get started with one of our sample applications/templates:
+Want to jump straight in? Get started with one of our sample applications/templates, which can be found [here](https://github.com/huggingface/transformers.js-examples).

 | Name              | Description                      | Links                         |
 |-------------------|----------------------------------|-------------------------------|
````
`@@ -0,0 +1,130 @@` (new file)

# Using quantized models (dtypes)

Before Transformers.js v3, we used the `quantized` option to specify whether to use a quantized (q8) or full-precision (fp32) variant of the model by setting `quantized` to `true` or `false`, respectively. Now, we've added the ability to select from a much larger list with the `dtype` parameter.

The list of available quantizations depends on the model, but some common ones are: full-precision (`"fp32"`), half-precision (`"fp16"`), 8-bit (`"q8"`, `"int8"`, `"uint8"`), and 4-bit (`"q4"`, `"bnb4"`, `"q4f16"`).

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-dark.jpg" style="max-width: 100%;">
    <source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-light.jpg" style="max-width: 100%;">
    <img alt="Available dtypes for mixedbread-ai/mxbai-embed-xsmall-v1" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-dark.jpg" style="max-width: 100%;">
  </picture>
  <a href="https://huggingface.co/mixedbread-ai/mxbai-embed-xsmall-v1/tree/main/onnx">(e.g., mixedbread-ai/mxbai-embed-xsmall-v1)</a>
</p>
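For readers migrating from v2, here is a minimal sketch of the change (the model id is just an illustrative example, assuming it ships a q8 ONNX variant):

```js
import { pipeline } from "@huggingface/transformers";

// v2 (old API): a boolean `quantized` flag
// const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", { quantized: true });

// v3 (new API): name the quantization explicitly via `dtype`
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  dtype: "q8", // roughly the equivalent of the old `quantized: true`
});
```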
## Basic usage

**Example:** Run Qwen2.5-0.5B-Instruct in 4-bit quantization ([demo](https://v2.scrimba.com/s0dlcpv0ci))

```js
import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { dtype: "q4", device: "webgpu" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a funny joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```
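If you want to surface tokens as they are generated rather than waiting for the full response, you can attach a streamer. A minimal sketch using the library's `TextStreamer`, building on the `generator` and `messages` above (the callback body is just an illustration; adapt it to your UI):

```js
import { TextStreamer } from "@huggingface/transformers";

// Stream decoded text chunks as they are produced
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true, // don't re-emit the input messages
  callback_function: (text) => console.log(text), // called with each new chunk
});

const output = await generator(messages, { max_new_tokens: 128, streamer });
```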
## Per-module dtypes

Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially for the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype.

**Example:** Run Florence-2 on WebGPU ([demo](https://v2.scrimba.com/s0pdm485fo))

```js
import { Florence2ForConditionalGeneration } from "@huggingface/transformers";

const model = await Florence2ForConditionalGeneration.from_pretrained(
  "onnx-community/Florence-2-base-ft",
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
```
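In the mapping above, the token embeddings and vision encoder are kept at `fp16` while the text encoder and decoder weights are quantized more aggressively to `q4`, presumably trading download size and memory for precision where the model is most sensitive. The best per-module split varies by model, so some experimentation may be needed.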
<p align="middle">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/florence-2-webgpu.gif" alt="Florence-2 running on WebGPU" />
</p>

<details>
<summary>
See full code example
</summary>

```js
import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from "@huggingface/transformers";

// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
  skip_special_tokens: false,
})[0];

// Post-process the generated text
const result = processor.post_process_generation(
  generated_text,
  task,
  image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
```

</details>