Releases: huggingface/transformers.js
2.14.2
What's new?
- Add support for new Jina AI jina-embeddings-v2 models (jinaai/jina-embeddings-v2-base-zh and jinaai/jina-embeddings-v2-base-de) in #542.
- Add support for wav2vec2-bert in #544. See here for the full list of supported models.
- Add zero-shot classification demo in #519 (see online demo).
Full Changelog: 2.14.1...2.14.2
2.14.1
What's new?
- Add support for Depth Anything (#534). See here for the list of available models.
Example: Depth estimation with Xenova/depth-anything-small-hf.
import { pipeline } from '@xenova/transformers';
// Create depth-estimation pipeline
const depth_estimator = await pipeline('depth-estimation', 'Xenova/depth-anything-small-hf');
// Predict depth map for the given image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/bread_small.png';
const output = await depth_estimator(url);
// {
//   predicted_depth: Tensor {
//     dims: [350, 518],
//     type: 'float32',
//     data: Float32Array(181300) [...],
//     size: 181300
//   },
//   depth: RawImage {
//     data: Uint8Array(271360) [...],
//     width: 640,
//     height: 424,
//     channels: 1
//   }
// }
You can visualize the output with:
output.depth.save('depth.png');
Online demo: https://huggingface.co/spaces/Xenova/depth-anything-web
Example video:
depth-anything-demo-final.mp4
- Fix typo in tokenizers.js (#518)
- Return empty tokens array if text is empty after normalization (#535)
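For the Depth Anything example above, the `depth` image is an 8-bit rendering derived from the raw `predicted_depth` values. The sketch below shows one way to do such a conversion by hand with min-max scaling. `depthToUint8` is a hypothetical helper, not part of the library, and the library's own scaling may differ.

```javascript
// Hypothetical helper (not part of @xenova/transformers): min-max scale
// raw float depth values into the 0-255 range of an 8-bit grayscale image.
function depthToUint8(depthValues) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of depthValues) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid dividing by zero for a flat depth map
  const out = new Uint8Array(depthValues.length);
  for (let i = 0; i < depthValues.length; ++i) {
    out[i] = Math.round(((depthValues[i] - min) / range) * 255);
  }
  return out;
}

// Toy input standing in for `output.predicted_depth.data`.
const sampleDepth = new Float32Array([0.5, 1.0, 1.5, 2.5]);
const grayDepth = depthToUint8(sampleDepth);
console.log(grayDepth); // Uint8Array [ 0, 64, 128, 255 ]
```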
Full Changelog: 2.14.0...2.14.1
2.14.0
What's new?
🚀 Segment Anything Model (SAM)
The Segment Anything Model (SAM) can be used to generate segmentation masks for objects in a scene, given an input image and input points. See here for the full list of pre-converted models. Support for this model was added in #510.
Demo + source code: https://huggingface.co/spaces/Xenova/segment-anything-web
Example: Perform mask generation w/ Xenova/slimsam-77-uniform.
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]] // 2D localization of a window
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
// Tensor {
// dims: [ 1, 3, 410, 614 ],
// type: 'bool',
// data: Uint8Array(755220) [ ... ],
// size: 755220
// }
// ]
const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
// dims: [ 1, 1, 3 ],
// type: 'float32',
// data: Float32Array(3) [
// 0.8350210189819336,
// 0.9786665439605713,
// 0.8379436731338501
// ],
// size: 3
// }
You can then visualize the 3 predicted masks with:
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
(Images: input image and visualized output.)
Next, select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting this with the original image gives us an isolated version of the subject:
(Images: selected mask and intersected result.)
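The selection and intersection steps described above can be sketched in plain JavaScript. This is only an illustration: `argmax` and `intersectWithMask` are hypothetical helpers rather than library functions, and the pixel layout (interleaved RGBA, one mask entry per pixel) is an assumption.

```javascript
// Index of the largest value in an array-like of scores.
function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; ++i) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Hypothetical helper: keep RGBA pixels where the mask is set and make
// all other pixels fully transparent. `mask` holds one 0/1 entry per pixel.
function intersectWithMask(rgba, mask) {
  const out = new Uint8ClampedArray(rgba); // copies the pixel data
  for (let p = 0; p < mask.length; ++p) {
    if (!mask[p]) out[4 * p + 3] = 0; // zero the alpha channel
  }
  return out;
}

// IoU scores reported in the example above.
const iouScores = [0.8350210189819336, 0.9786665439605713, 0.8379436731338501];
const bestMaskIndex = argmax(iouScores);
console.log(bestMaskIndex); // 1 (the second mask)

// Tiny two-pixel demonstration: the second pixel is masked out.
const pixels = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const isolated = intersectWithMask(pixels, [1, 0]);
console.log(isolated); // Uint8ClampedArray [ 10, 20, 30, 255, 40, 50, 60, 0 ]
```

If the mask tensor with dims [1, 3, H, W] is stored contiguously in row-major order (an assumption), mask c would occupy the elements from c*H*W up to (c+1)*H*W of its data array.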
🛠️ Improvements
- Add support for processing non-square images w/ ConvNextFeatureExtractor in #503
- Encode revision in remote URL in #507
Full Changelog: 2.13.4...2.14.0
2.13.4
What's new?
- Add support for cross-encoder models (+fix token type ids) (#501)
Example: Information Retrieval w/ Xenova/ms-marco-TinyBERT-L-2-v2.
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';
const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const features = tokenizer(
  ['How many people live in Berlin?', 'How many people live in Berlin?'],
  {
    text_pair: [
      'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
      'New York City is famous for the Metropolitan Museum of Art.',
    ],
    padding: true,
    truncation: true,
  }
);
const { logits } = await model(features);
console.log(logits.data);
// quantized: [ 7.210887908935547, -11.559350967407227 ]
// unquantized: [ 7.235750675201416, -11.562294006347656 ]
Check out the list of pre-converted models here. We also put out a demo for you to try out.
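The logits in the cross-encoder example are raw relevance scores, one per query/passage pair. A minimal sketch of turning them into a ranking follows. `rankPassages` is a hypothetical helper, and squashing the logits with a sigmoid is a common convention for this kind of model, not something the release notes prescribe.

```javascript
// Logistic sigmoid: maps a raw logit into the (0, 1) range.
const sigmoidScore = (x) => 1 / (1 + Math.exp(-x));

// Hypothetical helper: pair each passage with its logit and sort so the
// most relevant passage comes first.
function rankPassages(passages, logits) {
  return passages
    .map((text, i) => ({ text, logit: logits[i], score: sigmoidScore(logits[i]) }))
    .sort((a, b) => b.logit - a.logit);
}

const passages = [
  'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
  'New York City is famous for the Metropolitan Museum of Art.',
];
// Quantized logits from the example above.
const ranked = rankPassages(passages, [7.210887908935547, -11.559350967407227]);
console.log(ranked[0].text); // the Berlin passage ranks first
```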
Full Changelog: 2.13.3...2.13.4
2.13.3
2.13.2
2.13.1
What's new?
- Improve typing of pipeline function in #485. Thanks to @wesbos for the suggestion! This also means when you hover over the class name, you'll get example code to help you out.
- Add phi-1_5 model in #493.
Example code:
import { pipeline } from '@xenova/transformers';
// Create a text-generation pipeline
const generator = await pipeline('text-generation', 'Xenova/phi-1_5_dev');
// Construct prompt
const prompt = `\`\`\`py
import math
def print_prime(n):
    """
    Print all primes between 1 and n
    """`;
// Generate text
const result = await generator(prompt, {
  max_new_tokens: 100,
});
console.log(result[0].generated_text);
Results in:
import math
def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n+1):
        is_prime = True
        for i in range(2, int(math.sqrt(num))+1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)
print_prime(20)
Running the code produces the correct result:
[2, 3, 5, 7, 11, 13, 17, 19]
Full Changelog: 2.13.0...2.13.1
2.13.0
What's new?
🎄 7 new architectures!
This release adds support for many new multimodal architectures, bringing the total number of supported architectures to 80! 🤯
1. VITS for multilingual text-to-speech across over 1000 languages! (#466)
import { pipeline } from '@xenova/transformers';
// Create English text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');
// Generate speech
const output = await synthesizer('I love transformers');
// {
// audio: Float32Array(26112) [...],
// sampling_rate: 16000
// }
mms-tts-eng.mp4
See here for the list of available models. To start, we've converted 12 of the ~1140 models on the Hugging Face Hub. If we haven't added the one you wish to use, you can make it web-ready using our conversion script.
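The `audio` field in the text-to-speech output is a raw Float32Array of samples, so it needs a container format before most players can open it. The sketch below encodes such samples as a 16-bit PCM mono WAV file. `encodeWav` is a hypothetical helper written for illustration, not a library function, and it assumes the samples lie in the range [-1, 1].

```javascript
// Hypothetical helper: encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + sample data
  const view = new DataView(buffer);
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; ++i) view.setUint8(offset + i, str.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                          // size of the fmt chunk
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; ++i) {
    const s = Math.max(-1, Math.min(1, samples[i]));     // clamp to [-1, 1]
    view.setInt16(44 + i * bytesPerSample, Math.round(s * 32767), true);
  }
  return new Uint8Array(buffer);
}

// Toy input standing in for `output.audio` and `output.sampling_rate`.
const wavBytes = encodeWav(new Float32Array([0, 0.5, -1]), 16000);
console.log(wavBytes.length); // 50
```

In Node.js, the returned bytes could then be written to disk with `fs.writeFileSync('out.wav', wavBytes)`.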
2. CLIPSeg for zero-shot image segmentation. (#478)
import { AutoTokenizer, AutoProcessor, CLIPSegForImageSegmentation, RawImage } from '@xenova/transformers';
// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clipseg-rd64-refined');
const processor = await AutoProcessor.from_pretrained('Xenova/clipseg-rd64-refined');
const model = await CLIPSegForImageSegmentation.from_pretrained('Xenova/clipseg-rd64-refined');
// Run tokenization
const texts = ['a glass', 'something to fill', 'wood', 'a jar'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Read image and run processor
const image = await RawImage.read('https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true');
const image_inputs = await processor(image);
// Run model with both text and pixel inputs
const { logits } = await model({ ...text_inputs, ...image_inputs });
// logits: Tensor {
// dims: [4, 352, 352],
// type: 'float32',
// data: Float32Array(495616)[ ... ],
// size: 495616
// }
You can visualize the predictions as follows:
const preds = logits
.unsqueeze_(1)
.sigmoid_()
.mul_(255)
.round_()
.to('uint8');
for (let i = 0; i < preds.dims[0]; ++i) {
const img = RawImage.fromTensor(preds[i]);
img.save(`prediction_${i}.png`);
}
(Images: original, and predictions for "a glass", "something to fill", "wood", and "a jar".)
See here for the list of available models.
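The CLIPSeg visualization chain above (sigmoid, scale by 255, round, cast to uint8) can be written out per element to show what happens to each logit. This is a plain-JavaScript sketch with a hypothetical `logitsToGray` helper. It does not use the library's tensor methods, and the library's rounding behaviour at exact halves may differ.

```javascript
// Hypothetical helper: per-element sigmoid, scaled to 0-255 and rounded,
// so each logit becomes one grayscale pixel value.
function logitsToGray(logits) {
  const out = new Uint8Array(logits.length);
  for (let i = 0; i < logits.length; ++i) {
    const probability = 1 / (1 + Math.exp(-logits[i]));
    out[i] = Math.round(probability * 255);
  }
  return out;
}

// Strongly negative, neutral, and strongly positive logits.
const grayPixels = logitsToGray([-20, 0, 20]);
console.log(grayPixels); // Uint8Array [ 0, 128, 255 ]
```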
3. SegFormer for semantic segmentation and image classification. (#480)
import { pipeline } from '@xenova/transformers';
// Create an image segmentation pipeline
const segmenter = await pipeline('image-segmentation', 'Xenova/segformer_b2_clothes');
// Segment an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/young-man-standing-and-leaning-on-car.jpg';
const output = await segmenter(url);
See output
[
{
score: null,
label: 'Background',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Hair',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Upper-clothes',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Pants',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-shoe',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-shoe',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Face',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-leg',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-leg',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-arm',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-arm',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
}
]
See here for the list of available models.
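One simple way to consume the SegFormer masks above is to measure how much of the image each label covers. The sketch assumes that a mask's `data` array holds a nonzero value for every pixel belonging to the segment, which the output above does not confirm. `maskCoverage` is a hypothetical helper.

```javascript
// Hypothetical helper: fraction of pixels that are set in a single-channel mask.
function maskCoverage(maskData) {
  if (maskData.length === 0) return 0;
  let count = 0;
  for (const v of maskData) {
    if (v > 0) ++count;
  }
  return count / maskData.length;
}

// Toy 8-pixel mask standing in for `output[i].mask.data`.
const toyMask = new Uint8ClampedArray([0, 255, 255, 0, 0, 0, 255, 0]);
const coverage = maskCoverage(toyMask);
console.log(coverage); // 0.375
```

Applied to the pipeline result, this could look like `output.map(({ label, mask }) => [label, maskCoverage(mask.data)])`.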
4. Table Transformer for table extraction from unstructured documents. (#477)
import { pipeline } from '@xenova/transformers';
// Create an object detection pipeline
const detector = await pipeline('object-detection', 'Xenova/table-transformer-detection', { quantized: false });
// Detect tables in an image
const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice-with-table.png';
const output = await detector(img);
// [{ score: 0.9967531561851501, label: 'table', box: { xmin: 52, ymin: 322, xmax: 546, ymax: 525 } }]
See here for the list of available models.
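The detection output above uses corner coordinates (xmin, ymin, xmax, ymax). If the goal is to crop the detected table, a position-plus-size form is often more convenient. A minimal sketch follows; `toXYWH` and the 0.9 threshold are illustrative assumptions, not part of the library.

```javascript
// Hypothetical helper: convert corner coordinates into x, y, width, height.
function toXYWH({ xmin, ymin, xmax, ymax }) {
  return { x: xmin, y: ymin, width: xmax - xmin, height: ymax - ymin };
}

// Detection copied from the example output above.
const detections = [
  { score: 0.9967531561851501, label: 'table', box: { xmin: 52, ymin: 322, xmax: 546, ymax: 525 } },
];

const minScore = 0.9; // hypothetical confidence threshold
const regions = detections
  .filter((d) => d.score >= minScore)
  .map((d) => ({ label: d.label, ...toXYWH(d.box) }));
console.log(regions);
// [ { label: 'table', x: 52, y: 322, width: 494, height: 203 } ]
```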
5. DiT for document image classification. (#474)
import { pipeline } from '@xenova/transformers';
// Create an image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dit-base-finetuned-rvlcdip');
// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/coca_cola_advertisement.png';
const output = await classifier(url);
// [{ label: 'advertisement', score: 0.9035086035728455 }]
See here for the list of available models.
6. SigLIP for zero-shot image classification. (#473)
import { pipeline } from '@xenova/transformers';
// Create a zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');
// Classify images according to provided labels
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
hypothesis_template: 'a photo of {}',
});
// [
// { score: 0.16770583391189575, label: '2 cats' },
// { score: 0.000022096000975579955, label: '2 dogs' }
// ]
See here for the list of available models.
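Note that the two SigLIP scores above do not sum to 1. That is consistent with each label being scored independently with a sigmoid rather than normalized across labels with a softmax, which is our understanding of how SigLIP differs from CLIP-style scoring. The sketch below contrasts the two on made-up logits; the values are hypothetical and were not produced by the model.

```javascript
// Independent per-label probability.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Probabilities normalized across all labels (numerically stable form).
function softmax(xs) {
  const maxValue = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - maxValue));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const exampleLogits = [-1.6, -10.7]; // hypothetical values, for illustration only
const independent = exampleLogits.map(sigmoid);
const normalized = softmax(exampleLogits);

const sum = (xs) => xs.reduce((a, b) => a + b, 0);
console.log(sum(independent) < 1);               // true: sigmoid scores need not sum to 1
console.log(Math.abs(sum(normalized) - 1) < 1e-9); // true: softmax scores always do
```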
7. RoFormer for masked language modelling, sequence classification, token classification, and question answering. (#464)
import { pipeline } from '@xenova/transformers';
// Create a masked language modelling pipeline
const pipe = await pipeline('fill-mask', 'Xenova/antiberta2');
// Predict missing token
const output = await pipe('Ḣ Q V Q ... C A [MASK] D ... T V S S');
See output
[
{
score: 0.48774364590644836,
token: 19,
token_str: 'R',
sequence: 'Ḣ Q V Q C A R D T V S S'
},
{
score: 0.2768442928791046,
token: 18,
token_str: 'Q...
2.12.1
What's new?
Patch for release 2.12.1, making @huggingface/jinja a dependency instead of a peer dependency. This also means apply_chat_template is now synchronous (and does not lazily load the module). In the future, we may want to add lazy loading back, but for now, it causes issues when loading from a CDN.
Example code:
import { AutoTokenizer } from "@xenova/transformers";
// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");
// Define chat messages
const chat = [
{ role: "user", content: "Hello, how are you?" },
{ role: "assistant", content: "I'm doing great. How can I help you today?" },
{ role: "user", content: "I'd like to show off how chat templating works!" },
]
const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, ...]
Full Changelog: 2.12.0...2.12.1
2.12.0
What's new?
💬 Chat templates!
This release adds support for chat templates, a highly requested feature that enables users to convert conversations (represented as a list of chat objects) into a single tokenizable string, in the format that the model expects. As you may know, chat templates can vary greatly across model types, so it was important to design a system that (1) supports complex chat templates, (2) is generalizable, and (3) is easy to use. So, how did we do it? 🤔
This is made possible with @huggingface/jinja, a minimalistic JavaScript implementation of the Jinja templating engine, that we created to align with how transformers handles templating. Although it was originally designed for parsing and rendering ChatML templates, we decided to separate out the templating logic into an external (optional) library due to its usefulness in other types of applications. Special thanks to @tlaceby for his amazing "Guide to Interpreters" series, which provided the basis for our implementation. 🤗
Anyway, let's take a look at an example:
import { AutoTokenizer } from "@xenova/transformers";
// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");
// Define chat messages
const chat = [
{ role: "user", content: "Hello, how are you?" },
{ role: "assistant", content: "I'm doing great. How can I help you today?" },
{ role: "user", content: "I'd like to show off how chat templating works!" },
]
const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
Notice how the entire chat is condensed into a single string. If you would instead like to return the tokenized version (i.e., a list of token IDs), you can use the following:
const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, 28737, 28742, 28719, 2548, 1598, 28723, 1602, 541, 315, 1316, 368, 3154, 28804, 2, 28705, 733, 16289, 28793, 315, 28742, 28715, 737, 298, 1347, 805, 910, 10706, 5752, 1077, 3791, 28808, 733, 28748, 16289, 28793]
For more information about chat templates, check out the transformers documentation.
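To make the output string above concrete, here is a hand-written sketch that reproduces it for this one conversation format. It is not how the library renders templates (that is done by @huggingface/jinja from the template shipped with the tokenizer). `formatMistralChat` is hypothetical, its token strings are read off the output above, and it only handles user and assistant roles.

```javascript
// Hypothetical, hard-coded equivalent of the instruction format shown above.
function formatMistralChat(messages, bosToken = '<s>', eosToken = '</s>') {
  let text = bosToken;
  for (const { role, content } of messages) {
    if (role === 'user') {
      text += `[INST] ${content} [/INST]`;
    } else if (role === 'assistant') {
      text += `${content}${eosToken} `;
    } else {
      throw new Error(`Unsupported role: ${role}`);
    }
  }
  return text;
}

const exampleChat = [
  { role: 'user', content: 'Hello, how are you?' },
  { role: 'assistant', content: "I'm doing great. How can I help you today?" },
  { role: 'user', content: "I'd like to show off how chat templating works!" },
];
const formatted = formatMistralChat(exampleChat);
console.log(formatted);
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
```

A hard-coded function like this only covers one model family, which is why a general template engine is the more practical design.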
🐛 Bug fixes
- Incorrect encoding/decoding of whitespace around special characters with Fast Llama tokenizers. These bugs will also soon be fixed in the transformers library. For backwards compatibility reasons, if the tokenizer was exported with the legacy behaviour, it will still act in the same way unless explicitly set otherwise. Newer exports won't be affected. If you wish to override this default, either to keep the legacy behaviour (for backwards compatibility reasons) or to upgrade to the fixed version, you can do so with:
import { AutoTokenizer } from '@xenova/transformers';
// Use the default behaviour (specified in tokenizer_config.json, which in this case is `{legacy: false}`).
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer');
const { input_ids } = tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
console.log(input_ids); // [1, 13]
// Use the legacy behaviour (separate variable names so both snippets can run in the same scope)
const legacy_tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer', { legacy: true });
const { input_ids: legacy_input_ids } = legacy_tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
console.log(legacy_input_ids); // [1, 29871, 13]
- Strip whitespace around special tokens for wav2vec tokenizers.
🔨 Improvements
- More comprehensive tokenizer test suite: including both static and dynamic tokenizer tests for encoding, decoding, and chat templates.
Full Changelog: 2.11.0...2.12.0