Run embedding models locally in Swift using MLTensor. Inspired by mlx-embeddings.
Some of the supported BERT models on Hugging Face:
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/msmarco-bert-base-dot-v5
- thenlper/gte-base
Some of the supported XLM-RoBERTa models on Hugging Face:
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
- tomaarsen/xlm-roberta-base-multilingual-en-ar-fr-de-es-tr-it
NOTE: for CLIP, only text encoding is supported for now. Supported CLIP models are available on Hugging Face.
NOTE: Word2Vec is a word embedding model. It loads and keeps the whole model in memory. For a more memory-efficient solution, you might want to use SQLiteVec. A loading sketch follows the model list below.
Some of the supported Word2Vec models on Hugging Face:
- jkrukowski/glove-twitter-25
- jkrukowski/glove-twitter-50
- jkrukowski/glove-twitter-100
- jkrukowski/glove-twitter-200
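The following is a minimal sketch of loading one of these GloVe models; it assumes the Word2Vec loader mirrors the `Bert.loadModelBundle(from:)` API used in the usage examples further down (the exact `Word2Vec` method names are an assumption, not confirmed here).

import Embeddings

// load a GloVe model from Hugging Face (the whole model is kept in memory)
// NOTE: Word2Vec.loadModelBundle(from:) is assumed to mirror the Bert API shown below
let modelBundle = try await Word2Vec.loadModelBundle(
    from: "jkrukowski/glove-twitter-25"
)

// look up the embedding vector for a single word
let result = modelBundle.encode("king")
print(result)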
Add the following to your Package.swift file. In the package dependencies add:
dependencies: [
    .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.7")
]
In the target dependencies add:
dependencies: [
    .product(name: "Embeddings", package: "swift-embeddings")
]
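Put together, a minimal Package.swift might look like the sketch below; the package/target name MyApp, the tools version, and the platform minimums are placeholders and assumptions (MLTensor needs a recent OS), not requirements stated by this library.

// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp",  // placeholder name
    // assumed platform minimums for MLTensor availability
    platforms: [.macOS(.v15), .iOS(.v18)],
    dependencies: [
        .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.7")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                .product(name: "Embeddings", package: "swift-embeddings")
            ]
        )
    ]
)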
import Embeddings
// load model and tokenizer from Hugging Face
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)
// encode text
let encoded = modelBundle.encode("The cat is black")
let result = await encoded.cast(to: Float.self).shapedArray(of: Float.self).scalars
// print result
print(result)
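The `result` above is the embedding as a flat array of `Float` values. As a quick sanity check you can compare two such embeddings with a few lines of plain Swift; the `cosineSimilarity` helper below is not part of the library, and for batches the MLTensorUtils helpers shown next are more convenient.

// plain-Swift cosine similarity between two embedding vectors
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0) { $0 + $1 * $1 }.squareRoot()
    return dot / (normA * normB)
}

let cat = await modelBundle.encode("The cat is black")
    .cast(to: Float.self).shapedArray(of: Float.self).scalars
let dog = await modelBundle.encode("The dog is black")
    .cast(to: Float.self).shapedArray(of: Float.self).scalars
print(cosineSimilarity(cat, dog))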
import Embeddings
import MLTensorUtils

let texts = [
    "The cat is black",
    "The dog is black",
    "The cat sleeps well"
]

// load model and tokenizer from Hugging Face
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)

// encode all texts in one batch
let encoded = modelBundle.batchEncode(texts)

// pairwise cosine distance between the embeddings
let distance = cosineDistance(encoded, encoded)
let result = await distance.cast(to: Float.self).shapedArray(of: Float.self).scalars
print(result)
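With three input texts, `result` holds the pairwise distances as a flat array; assuming `cosineDistance` produces the full matrix and `scalars` lays it out row by row (an assumption about the layout), it can be printed pair by pair with plain Swift:

// walk the flattened 3x3 distance matrix row by row (plain Swift, not part of the library)
let n = texts.count
for i in 0..<n {
    for j in 0..<n {
        print("\(texts[i]) <-> \(texts[j]): \(result[i * n + j])")
    }
}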
To run the command line demo, use the following command:
swift run embeddings-cli <subcommand> [--model-id <model-id>] [--model-file <model-file>] [--text <text>] [--max-length <max-length>]
Subcommands:
  bert          Encode text using BERT model
  clip          Encode text using CLIP model
  xlm-roberta   Encode text using XLMRoberta model
  word2vec      Encode word using Word2Vec model

Command line options:
  --model-id <model-id>        Id of the model to use
  --model-file <model-file>    Path to the model file (only for `Word2Vec`)
  --text <text>                Text to encode
  --max-length <max-length>    Maximum length of the input (not for `Word2Vec`)
  -h, --help                   Show help information.
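For example, encoding a sentence with one of the BERT models listed above could look like this (flags as in the synopsis above; the model id is just one of the supported models):

swift run embeddings-cli bert --model-id sentence-transformers/all-MiniLM-L6-v2 --text "The cat is black"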
This project uses swift-format. To format the code run:
swift format . -i -r --configuration .swift-format
This project is based on and uses some of the code from: