Merge pull request #985 from huggingface/v3-docs
Improve documentation (v3)
xenova authored Oct 22, 2024
2 parents d61848e + 96b30ae commit e8c0f77
Showing 9 changed files with 332 additions and 69 deletions.
83 changes: 52 additions & 31 deletions README.md
@@ -11,25 +11,19 @@
</p>

<p align="center">
<a href="https://www.npmjs.com/package/@huggingface/transformers">
<img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers">
</a>
<a href="https://www.npmjs.com/package/@huggingface/transformers">
<img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers">
</a>
<a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers">
<img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers">
</a>
<a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE">
<img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue">
</a>
<a href="https://huggingface.co/docs/transformers.js/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online">
</a>
<a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers"></a>
<a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers"></a>
<a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers"></a>
<a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue"></a>
<a href="https://huggingface.co/docs/transformers.js/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online"></a>
</p>


State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
<h3 align="center">
<p>State-of-the-art Machine Learning for the Web</p>
</h3>

Run 🤗 Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to Hugging Face's [transformers](https://github.com/huggingface/transformers) Python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:
- 📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
@@ -42,6 +36,22 @@ Transformers.js uses [ONNX Runtime](https://onnxruntime.ai/) to run models in the
For more information, check out the full [documentation](https://huggingface.co/docs/transformers.js).


## Installation


To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run:
```bash
npm i @huggingface/transformers
```

Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
```html
<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';
</script>
```
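
Once imported this way, the library works the same as the bundled build. As a minimal sketch (using the default sentiment-analysis model; the input text and logged output are illustrative), a complete page could look like:
```html
<!-- Minimal sketch: the default sentiment-analysis pipeline via the CDN build.
     The input string and the exact score in the comment are illustrative. -->
<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';

    // Module scripts support top-level await
    const classifier = await pipeline('sentiment-analysis');
    const result = await classifier('Transformers.js is awesome!');
    console.log(result); // e.g. [{ label: 'POSITIVE', score: 0.99 }]
</script>
```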


## Quick tour


@@ -72,9 +82,9 @@ out = pipe('I love transformers!')
import { pipeline } from '@huggingface/transformers';

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');
const pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
const out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
```

@@ -86,29 +96,40 @@ let out = await pipe('I love transformers!');
You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example:
```javascript
// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
```

By default, when running in the browser, the model runs on your CPU (via WASM). To run the model on your
GPU (via WebGPU) instead, set `device: 'webgpu'`, for example:
```javascript
// Run the model on WebGPU
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device: 'webgpu',
});
```

## Installation
For more information, check out the [WebGPU guide](https://huggingface.co/docs/transformers.js/guides/webgpu).

> [!WARNING]
> The WebGPU API is still experimental in many browsers, so if you run into any issues,
> please file a [bug report](https://github.com/huggingface/transformers.js/issues/new?title=%5BWebGPU%5D%20Error%20running%20MODEL_ID_GOES_HERE&assignees=&labels=bug,webgpu&projects=&template=1_bug-report.yml).
To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run:
```bash
npm i @huggingface/transformers
```

Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with:
```html
<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';
</script>
In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of
the model to reduce bandwidth usage and improve performance. This can be achieved by adjusting the `dtype` option,
which allows you to select the appropriate data type for your model. While the available options may vary
depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"`
(default for WASM), and `"q4"`. For more information, check out the [quantization guide](https://huggingface.co/docs/transformers.js/guides/dtypes).
```javascript
// Run the model at 4-bit quantization
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  dtype: 'q4',
});
```


## Examples

Want to jump straight in? Get started with one of our sample applications/templates:
Want to jump straight in? Get started with one of our sample applications/templates, which can be found [here](https://github.com/huggingface/transformers.js-examples).

| Name | Description | Links |
|-------------------|----------------------------------|-------------------------------|
28 changes: 9 additions & 19 deletions docs/scripts/build_readme.py
@@ -13,33 +13,23 @@
</p>
<p align="center">
<a href="https://www.npmjs.com/package/@huggingface/transformers">
<img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers">
</a>
<a href="https://www.npmjs.com/package/@huggingface/transformers">
<img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers">
</a>
<a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers">
<img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers">
</a>
<a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE">
<img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue">
</a>
<a href="https://huggingface.co/docs/transformers.js/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online">
</a>
<a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM" src="https://img.shields.io/npm/v/@huggingface/transformers"></a>
<a href="https://www.npmjs.com/package/@huggingface/transformers"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/@huggingface/transformers"></a>
<a href="https://www.jsdelivr.com/package/npm/@huggingface/transformers"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/@huggingface/transformers"></a>
<a href="https://github.com/huggingface/transformers.js/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/huggingface/transformers.js?color=blue"></a>
<a href="https://huggingface.co/docs/transformers.js/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers.js/index.svg?down_color=red&down_message=offline&up_message=online"></a>
</p>
{intro}
## Quick tour
{quick_tour}
## Installation
{installation}
## Quick tour
{quick_tour}
## Examples
{examples}
6 changes: 5 additions & 1 deletion docs/snippets/0_introduction.snippet
@@ -1,5 +1,9 @@

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
<h3 align="center">
<p>State-of-the-art Machine Learning for the Web</p>
</h3>

Run 🤗 Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to Hugging Face's [transformers](https://github.com/huggingface/transformers) Python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:
- 📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
33 changes: 30 additions & 3 deletions docs/snippets/1_quick-tour.snippet
@@ -26,9 +26,9 @@ out = pipe('I love transformers!')
import { pipeline } from '@huggingface/transformers';
// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');
const pipe = await pipeline('sentiment-analysis');
let out = await pipe('I love transformers!');
const out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
```

@@ -40,5 +40,32 @@ let out = await pipe('I love transformers!');
You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example:
```javascript
// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
```

By default, when running in the browser, the model runs on your CPU (via WASM). To run the model on your
GPU (via WebGPU) instead, set `device: 'webgpu'`, for example:
```javascript
// Run the model on WebGPU
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device: 'webgpu',
});
```

For more information, check out the [WebGPU guide](/guides/webgpu).

> [!WARNING]
> The WebGPU API is still experimental in many browsers, so if you run into any issues,
> please file a [bug report](https://github.com/huggingface/transformers.js/issues/new?title=%5BWebGPU%5D%20Error%20running%20MODEL_ID_GOES_HERE&assignees=&labels=bug,webgpu&projects=&template=1_bug-report.yml).

In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of
the model to reduce bandwidth usage and improve performance. This can be achieved by adjusting the `dtype` option,
which allows you to select the appropriate data type for your model. While the available options may vary
depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"`
(default for WASM), and `"q4"`. For more information, check out the [quantization guide](/guides/dtypes).
```javascript
// Run the model at 4-bit quantization
const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  dtype: 'q4',
});
```
2 changes: 1 addition & 1 deletion docs/snippets/3_examples.snippet
@@ -1,4 +1,4 @@
Want to jump straight in? Get started with one of our sample applications/templates:
Want to jump straight in? Get started with one of our sample applications/templates, which can be found [here](https://github.com/huggingface/transformers.js-examples).

| Name | Description | Links |
|-------------------|----------------------------------|-------------------------------|
6 changes: 5 additions & 1 deletion docs/source/_toctree.yml
@@ -23,10 +23,14 @@
title: Server-side Inference in Node.js
title: Tutorials
- sections:
- local: guides/webgpu
title: Running models on WebGPU
- local: guides/dtypes
title: Using quantized models (dtypes)
- local: guides/private
title: Accessing Private/Gated Models
- local: guides/node-audio-processing
title: Server-side Audio Processing in Node.js
title: Server-side Audio Processing
title: Developer Guides
- sections:
- local: api/transformers
130 changes: 130 additions & 0 deletions docs/source/guides/dtypes.md
@@ -0,0 +1,130 @@
# Using quantized models (dtypes)

Before Transformers.js v3, the boolean `quantized` option specified whether to use a quantized (q8) or full-precision (fp32) variant of the model. Now, the `dtype` parameter lets you select from a much larger list of data types.

The list of available quantizations depends on the model, but some common ones are: full-precision (`"fp32"`), half-precision (`"fp16"`), 8-bit (`"q8"`, `"int8"`, `"uint8"`), and 4-bit (`"q4"`, `"bnb4"`, `"q4f16"`).
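
As a minimal before/after sketch (the model shown is illustrative; any pipeline-compatible model works the same way):

```js
import { pipeline } from "@huggingface/transformers";

// Before (v2): a boolean flag chose between q8 (true) and fp32 (false)
const pipe_v2 = await pipeline("sentiment-analysis", "Xenova/distilbert-base-uncased-finetuned-sst-2-english", {
  quantized: true,
});

// Now (v3): pick a specific data type with `dtype`
const pipe_v3 = await pipeline("sentiment-analysis", "Xenova/distilbert-base-uncased-finetuned-sst-2-english", {
  dtype: "q8",
});
```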

<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-dark.jpg" style="max-width: 100%;">
<source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-light.jpg" style="max-width: 100%;">
<img alt="Available dtypes for mixedbread-ai/mxbai-embed-xsmall-v1" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/dtypes-dark.jpg" style="max-width: 100%;">
</picture>
<a href="https://huggingface.co/mixedbread-ai/mxbai-embed-xsmall-v1/tree/main/onnx">(e.g., mixedbread-ai/mxbai-embed-xsmall-v1)</a>
</p>

## Basic usage

**Example:** Run Qwen2.5-0.5B-Instruct in 4-bit quantization ([demo](https://v2.scrimba.com/s0dlcpv0ci))

```js
import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
"text-generation",
"onnx-community/Qwen2.5-0.5B-Instruct",
{ dtype: "q4", device: "webgpu" },
);

// Define the list of messages
const messages = [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Tell me a funny joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```

## Per-module dtypes

Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially for the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype.

**Example:** Run Florence-2 on WebGPU ([demo](https://v2.scrimba.com/s0pdm485fo))

```js
import { Florence2ForConditionalGeneration } from "@huggingface/transformers";

const model = await Florence2ForConditionalGeneration.from_pretrained(
"onnx-community/Florence-2-base-ft",
{
dtype: {
embed_tokens: "fp16",
vision_encoder: "fp16",
encoder_model: "q4",
decoder_model_merged: "q4",
},
device: "webgpu",
},
);
```

<p align="middle">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/florence-2-webgpu.gif" alt="Florence-2 running on WebGPU" />
</p>

<details>
<summary>
See full code example
</summary>

```js
import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from "@huggingface/transformers";

// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
  skip_special_tokens: false,
})[0];

// Post-process the generated text
const result = processor.post_process_generation(
  generated_text,
  task,
  image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
```

</details>