+ Lance provides a file format, table format, and catalog spec for multimodal AI, + allowing you to build a complete open lakehouse on top of object storage to power your AI workflows. + Lance brings high-performance vector search, full-text search, random access, and feature + engineering capabilities to the lakehouse, while retaining the existing lakehouse benefits + such as SQL analytics, ACID transactions, time travel, and integrations with open engines (Apache Spark, Ray, Trino, DuckDB, etc.) + and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino, Hive Metastore, etc.). +
+ Learn More ++ Lance enables powerful hybrid search combining vector similarity, full-text search, + and SQL analytics on the same dataset. All query types are accelerated by corresponding + secondary indices as part of the Lance specification. +
++ Run semantic search on embeddings, BM25 search on keywords, and apply complex SQL predicates - + all using a single table with a unified interface. +
+ Learn More ++ Lance delivers 100x faster random access compared to Parquet or Iceberg. With efficient + row-addressing, you can access individual records across multiple files instantly, + making it perfect for real-time ML serving, random sampling, and interactive applications. +
++ Unlike traditional columnar formats, Lance maintains high performance even when + randomly accessing scattered rows across your entire dataset. +
+ Learn More ++ Store images, videos, audio, text, and embeddings in a single unified format. + Lance's blob encoding efficiently handles large binary objects with lazy loading, + while optimized vector storage accelerates similarity search. +
++ Perfect for AI/ML workloads where you need to store raw data alongside embeddings + for multimodal retrieval and generation workflows. +
+ Learn More ++ Schema evolution in most open table formats is metadata-only and fast. + But backfilling column values into existing rows typically requires a full table rewrite. + Lance supports efficient schema evolution with backfill, making it well suited to ML + feature engineering and to embedding and media content management. +
++ Adding a new column with data is as simple as writing new Lance files to the Lance table - + no need to rewrite your entire dataset. +
+ Learn More ++ As an open format, Lance integrates seamlessly with the Python data ecosystem and modern data platforms. + Work with your favorite tools including Pandas, Polars, and PyTorch for data processing and machine learning. + Connect with leading query engines like Apache DataFusion, DuckDB, Apache Spark, Trino, and Apache Flink/Fluss + to run SQL analytics and distributed processing on your Lance datasets. +
+ View Integrations +
+
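The backfill idea described in the schema-evolution card can be illustrated with a small conceptual sketch. This is not Lance's API or on-disk layout: `DataFile`, `Fragment`, and `add_column` are hypothetical names, and the assumption that a table is split into row groups whose columns may live in several files is made here only for illustration. The point it shows is that a new column can be added by writing new files next to the existing ones, leaving the existing files untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataFile:
    """Hypothetical immutable file holding some columns for a group of rows."""
    columns: dict[str, list]


@dataclass
class Fragment:
    """Hypothetical group of rows whose columns may span several files."""
    num_rows: int
    files: list[DataFile] = field(default_factory=list)

    def column(self, name: str) -> list:
        # Look the column up in whichever file of this fragment holds it.
        for data_file in self.files:
            if name in data_file.columns:
                return data_file.columns[name]
        raise KeyError(name)


def add_column(
    fragments: list[Fragment],
    name: str,
    compute: Callable[[Fragment, int], object],
) -> None:
    """Backfill a new column by appending one new file per fragment.

    Existing DataFile objects are never modified or rewritten.
    """
    for frag in fragments:
        values = [compute(frag, i) for i in range(frag.num_rows)]
        frag.files.append(DataFile({name: values}))


# Usage: a two-fragment table with a single "text" column.
frags = [
    Fragment(2, [DataFile({"text": ["cat", "zebra"]})]),
    Fragment(1, [DataFile({"text": ["ox"]})]),
]
original_files = [frag.files[0] for frag in frags]

# Backfill a derived feature column without touching the original files.
add_column(frags, "text_len", lambda frag, i: len(frag.column("text")[i]))
```

In this sketch the cost of the backfill is proportional to the size of the new column rather than to the size of the whole table, which is the property the card contrasts with a full table rewrite.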
-
-*Lance is a modern columnar data format optimized for machine learning and AI applications. It efficiently handles diverse multimodal data types while providing high-performance querying and versioning capabilities.*
-
-[Quickstart Locally With Python](quickstart){ .md-button .md-button--primary } [Read the Format Specification](format){ .md-button .md-button } [Train Your LLM on a Lance Dataset](examples/python/llm_training){ .md-button .md-button--primary }
-
-## 🎯 How Does Lance Work?
-
-Lance is designed to be used with images, videos, 3D point clouds, audio and tabular data. It supports any POSIX file systems, and cloud storage like AWS S3 and Google Cloud Storage.
-
-This file format is particularly suited for [**vector search**](quickstart/vector-search), full-text search and [**LLM training**](examples/python/llm_training) on multimodal data. To learn more about how Lance works, [**read the format specification**](format).
-
-!!! info "Looking for LanceDB?"
- **This is the Lance table format project** - the open source core that powers LanceDB.
- If you want the complete vector database and multimodal lakehouse built on Lance, visit [lancedb.com](https://lancedb.com)
-
-## ⚡ Key Features of Lance Format
-
-| Feature | Description |
-|---------|-------------|
-| 🚀 **[High-Performance Random Access](guide/performance)** | 100x faster than Parquet for random access patterns |
-| 🔄 **[Zero-Copy Data Evolution](guide/data_evolution)** | Add, drop or update column data without rewriting the entire dataset |
-| 🎨 **[Multimodal Data](guide/blob)** | Natively store large text, images, videos, documents and embeddings |
-| 🔍 **[Vector Search](quickstart/vector-search)** | Find nearest neighbors in under 1 millisecond with IVF-PQ, IVF-SQ, HNSW |
-| 📝 **[Full-Text Search](guide/tokenizer)** | Fast search over text with inverted index, Ngram index plus tokenizers |
-| 💾 **[Row Level Transaction](format#conflict-resolution)** | Fully ACID transaction with row level conflict resolution |
-