Merged
1 change: 1 addition & 0 deletions .typos.toml
@@ -11,6 +11,7 @@ afe = "afe"
 typ = "typ"
 rabit = "rabit"
 flate = "flate"
+Ines = "Ines"

 [default.expect]
 nprobs = "nprobes"
18 changes: 16 additions & 2 deletions docs/mkdocs.yml
@@ -1,5 +1,5 @@
 site_name: Lance
-site_description: Modern columnar data format for ML and LLMs
+site_description: Open Lakehouse Format for Multimodal AI
 site_url: https://lancedb.github.io/lance/
 docs_dir: src

@@ -8,6 +8,7 @@ repo_url: https://github.com/lancedb/lance

 theme:
   name: material
+  custom_dir: overrides
   logo: logo/white.png
   favicon: logo/logo.png
   palette:
@@ -40,7 +41,11 @@ theme:
 markdown_extensions:
   - admonition
   - pymdownx.details
-  - pymdownx.superfences
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
   - pymdownx.highlight:
       anchor_linenums: true
       line_spans: __span
@@ -70,3 +75,12 @@ extra:
     - icon: fontawesome/brands/twitter
       link: https://twitter.com/lancedb

+copyright: © 2025 Lance Format. All rights reserved.
+
+extra_css:
+  - assets/stylesheets/termynal.css
+  - assets/stylesheets/home.css
+
+extra_javascript:
+  - assets/javascript/termynal.js

263 changes: 263 additions & 0 deletions docs/overrides/home.html
@@ -0,0 +1,263 @@
{% extends "main.html" %}

{% block tabs %}
{{ super() }}

<style>
/* Prevent horizontal overflow */
body {
overflow-x: hidden;
}

/* Hide main content for home page */
.md-content {
display: none;
}

/* Hide table of contents */
@media screen and (min-width: 60em) {
.md-sidebar--secondary {
display: none;
}
}

/* Hide navigation */
@media screen and (min-width: 76.25em) {
.md-sidebar--primary {
display: none;
}
}

/* Make header static */
.md-header {
position: initial;
}

.md-main__inner {
margin: 0;
}
</style>

<!-- Hero Section -->
<section class="mdx-container">
Contributor comment:
FYI I found this section didn't look good on mobile. Worth pulling up on a phone and trying it out.

<div class="container">
<div class="intro-message">
<div class="hero-logo">
<img src="logo/white.png" alt="Lance Logo">
<h1>Lance<sup>™</sup></h1>
</div>
<h3>The Open Lakehouse Format for Multimodal AI</h3>
<hr class="intro-divider" />
<ul class="list-inline">
<li>
<a href="quickstart" class="md-button md-button--primary">Get Started</a>
</li>
<li>
<a href="format" class="md-button">Read the Spec</a>
</li>
<li>
<a href="examples/python/llm_training" class="md-button">Train an LLM</a>
</li>
<li>
<a href="https://discord.gg/msY9kdwSYw" class="md-button" target="_blank" rel="noopener">Join Discord</a>
</li>
</ul>
</div>
Contributor comment:
One thing that would be cool here is to show the main integrations here (above the fold, on the homepage), showing a little button for

  • Python
  • Ray
  • Spark
  • Trino

Then people start to see Lance as something that's supported in multiple engines.

Contributor (author) reply:
Added one more feature to the home page.

</div>
</section>

<!-- What is Lance Section -->
<section class="lance-intro-section">
<div class="container">
<div class="lance-intro-content">
<h2>What is Lance<sup>™</sup>?</h2>
<p>
Lance comprises a file format, a table format, and a catalog spec for multimodal AI,
allowing you to build a complete open lakehouse on top of object storage to power your AI workflows.
Lance adds high-performance vector search, full-text search, random access, and feature
engineering capabilities to the lakehouse, while retaining the standard lakehouse benefits
such as SQL analytics, ACID transactions, time travel, and integrations with open engines (Apache Spark, Ray, Trino, DuckDB, etc.)
and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino, Hive Metastore, etc.).
</p>
<a href="quickstart" class="md-button md-button--primary">Learn More</a>
</div>
</div>
</section>

<!-- Feature 1: Expressive Hybrid Search -->
<section class="lance-feature-section">
<div class="container">
<div class="lance-feature-content">
<div class="lance-feature-text">
<h2>Expressive Hybrid Search</h2>
<p>
Lance enables hybrid search that combines vector similarity, full-text search,
and SQL analytics on the same dataset. Each query type is accelerated by a corresponding
secondary index defined in the Lance specification.
</p>
<p>
Run semantic search on embeddings, BM25 search on keywords, and complex SQL predicates,
all against a single table through a unified interface.
</p>
<a href="quickstart/vector-search" class="md-button">Learn More</a>
</div>
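The BM25 ranking mentioned above is not implemented in this diff. As a rough, non-authoritative illustration of what a BM25 keyword score computes, here is a minimal pure-Python Okapi BM25 scorer. The whitespace tokenizer and the defaults `k1=1.2`, `b=0.75` are conventional textbook choices assumed for the sketch, not values taken from Lance's full-text index.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25 (toy whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "machine learning on the lakehouse",
    "a guide to columnar storage",
    "deep learning for computer vision",
]
scores = bm25_scores(docs, "machine learning")
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

The document matching both query terms ranks first, the one matching only "learning" second, and the non-matching one last with a score of zero.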
<div class="lance-feature-demo">
<div id="termynal-hybrid-search"
data-termynal=""
data-ty-startdelay="500"
data-ty-typedelay="40"
data-ty-linedelay="700"
style="width: 500px;">
<span data-ty="input" data-ty-prompt=">>>">import lance</span>
<span data-ty="input" data-ty-prompt=">>>">ds = lance.dataset("s3://my-bucket/docs")</span>
<span data-ty="input" data-ty-prompt=">>>">ds.to_table(full_text_query="machine learning")</span>
<span data-ty="input" data-ty-prompt=">>>">ds.to_table(</span>
<span data-ty="input" data-ty-prompt="..."> nearest={"column": "embedding", "q": query_vec, "k": 10},</span>
<span data-ty="input" data-ty-prompt="..."> filter="year > 2020")</span>
</div>
</div>
</div>
</div>
</section>
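The `nearest=` plus `filter=` call in the demo asks for the k rows closest to a query vector among rows that satisfy a SQL predicate. Lance answers this with its vector index; the brute-force sketch below only illustrates the semantics, assuming a hypothetical in-memory `rows` list with `year` and `embedding` fields and L2 distance. It applies the predicate before ranking (prefiltering); whether Lance filters before or after the vector search is configurable, so check the `prefilter` option in the pylance documentation rather than relying on this sketch.

```python
import heapq
import math


def filtered_knn(rows, query_vec, k, predicate):
    """Brute-force k-nearest-neighbour search over rows that satisfy a predicate."""
    # Prefilter: only rows matching the predicate are considered at all.
    candidates = (row for row in rows if predicate(row))
    # Keep the k candidates with the smallest L2 distance to the query vector.
    return heapq.nsmallest(
        k, candidates, key=lambda row: math.dist(row["embedding"], query_vec)
    )


rows = [
    {"id": 1, "year": 2019, "embedding": [0.0, 0.1]},
    {"id": 2, "year": 2021, "embedding": [0.9, 0.9]},
    {"id": 3, "year": 2022, "embedding": [0.1, 0.0]},
    {"id": 4, "year": 2023, "embedding": [0.5, 0.5]},
]
hits = filtered_knn(rows, [0.0, 0.0], k=2, predicate=lambda r: r["year"] > 2020)
print([row["id"] for row in hits])  # [3, 4]
```

Row 1 is as close to the query as row 3 but is excluded by `year > 2020`, which is exactly the interaction between the vector and SQL parts of a hybrid query.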

<!-- Feature 2: Lightning-fast Random Access -->
<section class="lance-feature-section reverse">
<div class="container">
<div class="lance-feature-content">
<div class="lance-feature-text">
<h2>Lightning-fast Random Access</h2>
<p>
Lance delivers 100x faster random access than Parquet or Iceberg. Its row addressing lets you
fetch individual records spread across many files with low latency,
making it well suited to real-time ML serving, random sampling, and interactive applications.
</p>
<p>
Unlike traditional columnar formats, Lance maintains high performance even when
randomly accessing scattered rows across your entire dataset.
</p>
<a href="guide/read_and_write#random-access" class="md-button">Learn More</a>
</div>
<div class="lance-feature-demo">
<div id="termynal-random-access"
data-termynal=""
data-ty-startdelay="1500"
data-ty-typedelay="40"
data-ty-linedelay="700"
style="width: 500px;">
<span data-ty="input" data-ty-prompt=">>>">import lance</span>
<span data-ty="input" data-ty-prompt=">>>">dataset = lance.dataset("s3://my-bucket/embeddings")</span>
<span data-ty="input" data-ty-prompt=">>>">table = dataset.take([100, 5000, 1000000])</span>
<span data-ty="input" data-ty-prompt=">>>">dataset.take([0, 1], columns=["id", "vector"])</span>
</div>
</div>
</div>
</div>
</section>
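Random access across many files depends on mapping a row number to the file that holds it. As we understand the Lance format, a row address packs the fragment id into the upper 32 bits and the row offset within that fragment into the lower 32 bits. The sketch below shows that packing, plus a hypothetical `resolve_indices` helper that maps dataset-wide indices (like those passed to `take` in the demo) to `(fragment_id, offset)` pairs, assuming contiguous fragment ids, made-up per-fragment row counts, and no deletions.

```python
import bisect
from itertools import accumulate


def pack_row_address(fragment_id, offset):
    """Pack a fragment id (upper 32 bits) and a local row offset (lower 32 bits)."""
    return (fragment_id << 32) | offset


def unpack_row_address(addr):
    """Split a 64-bit row address back into (fragment_id, offset)."""
    return addr >> 32, addr & 0xFFFFFFFF


def resolve_indices(indices, fragment_row_counts):
    """Map dataset-wide row indices to (fragment_id, offset) pairs.

    Hypothetical helper: assumes fragments are numbered 0..n-1 and no rows are deleted.
    """
    # starts[i] is the first dataset-wide index stored in fragment i.
    starts = [0, *accumulate(fragment_row_counts)][:-1]
    total = sum(fragment_row_counts)
    result = []
    for idx in indices:
        if not 0 <= idx < total:
            raise IndexError(idx)
        # Binary search for the fragment whose range contains idx.
        frag = bisect.bisect_right(starts, idx) - 1
        result.append((frag, idx - starts[frag]))
    return result


counts = [4096, 1_000_000, 1_000_000]
pairs = resolve_indices([100, 5000, 1_000_000], counts)
print(pairs)  # [(0, 100), (1, 904), (1, 995904)]
print([pack_row_address(f, o) for f, o in pairs])  # [100, 4294968200, 4295963200]
```

Because each lookup is a binary search over fragment boundaries followed by a direct offset, the cost of locating a row does not grow with the number of rows scanned, which is the property the paragraph above relies on.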

<!-- Feature 3: Native Multimodal Data Support -->
<section class="lance-feature-section">
<div class="container">
<div class="lance-feature-content">
<div class="lance-feature-text">
<h2>Native Multimodal Data Support</h2>
<p>
Store images, videos, audio, text, and embeddings in a single unified format.
Lance's blob encoding efficiently handles large binary objects with lazy loading,
while optimized vector storage accelerates similarity search.
</p>
<p>
This makes Lance well suited to AI/ML workloads that store raw data alongside embeddings
for multimodal retrieval and generation workflows.
</p>
<a href="guide/blob" class="md-button">Learn More</a>
</div>
<div class="lance-feature-demo">
<div id="termynal-multimodal"
data-termynal=""
data-ty-startdelay="2500"
data-ty-typedelay="40"
data-ty-linedelay="700"
style="width: 500px;">
<span data-ty="input" data-ty-prompt=">>>">import lance</span>
<span data-ty="input" data-ty-prompt=">>>">import pyarrow as pa</span>
<span data-ty="input" data-ty-prompt=">>>">schema = pa.schema([</span>
<span data-ty="input" data-ty-prompt="..."> pa.field("video", pa.large_binary(),</span>
<span data-ty="input" data-ty-prompt="..."> metadata={"lance-encoding:blob": "true"}),</span>
<span data-ty="input" data-ty-prompt="..."> pa.field("embedding", pa.list_(pa.float32(), 128))])</span>
<span data-ty="input" data-ty-prompt=">>>">lance.write_dataset(table, "multimodal.lance", schema=schema)</span>
</div>
</div>
</div>
</div>
</section>
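In the demo, the `lance-encoding:blob` field metadata marks the `video` column for blob encoding. Lazy loading means a reader fetches only the bytes of the blobs it needs instead of materializing the whole column. The sketch below is a conceptual illustration only, not Lance's on-disk blob layout: it appends blobs to one buffer, records each blob's `(offset, size)`, and reads one back with a seek, standing in for a ranged read against object storage. Lance's actual blob-reading API is described in the blob guide linked above.

```python
import io


def write_blobs(blobs):
    """Append blobs to one buffer and return (buffer, [(offset, size), ...])."""
    buf = io.BytesIO()
    positions = []
    for blob in blobs:
        # Record where this blob starts and how long it is before writing it.
        positions.append((buf.tell(), len(blob)))
        buf.write(blob)
    return buf, positions


def read_blob(buf, positions, row):
    """Read a single blob by seeking to its recorded position (a stand-in for a range read)."""
    offset, size = positions[row]
    buf.seek(offset)
    return buf.read(size)


videos = [b"\x00" * 10, b"frame-data-for-row-1", b"\xff" * 5]
buf, positions = write_blobs(videos)
print(positions)  # [(0, 10), (10, 20), (30, 5)]
print(read_blob(buf, positions, 1))  # b'frame-data-for-row-1'
```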

<!-- Feature 4: Data Evolution -->
<section class="lance-feature-section reverse">
<div class="container">
<div class="lance-feature-content">
<div class="lance-feature-text">
<h2>Data Evolution > Schema Evolution</h2>
<p>
Schema evolution in most open table formats is metadata-only and fast.
But backfilling values for a new column in existing rows typically requires rewriting the full table.
Lance supports efficient schema evolution with backfill, making it well suited to ML
feature engineering and to managing embeddings and media content.
</p>
<p>
Adding a new column with data is as simple as writing new Lance data files for that column;
there is no need to rewrite your entire dataset.
</p>
<a href="guide/data_evolution" class="md-button">Learn More</a>
</div>
<div class="lance-feature-demo">
<div id="termynal-evolution"
data-termynal=""
data-ty-startdelay="3500"
data-ty-typedelay="40"
data-ty-linedelay="700"
style="width: 500px;">
<span data-ty="input" data-ty-prompt=">>>">import lance</span>
<span data-ty="input" data-ty-prompt=">>>">import pyarrow as pa</span>
<span data-ty="input" data-ty-prompt=">>>">dataset = lance.dataset("my_data.lance")</span>
<span data-ty="input" data-ty-prompt=">>>">@lance.batch_udf()</span>
<span data-ty="input" data-ty-prompt="...">def add_embeddings(batch):</span>
<span data-ty="input" data-ty-prompt="..."> vectors = model.encode(batch["text"].to_pylist())</span>
<span data-ty="input" data-ty-prompt="..."> return pa.RecordBatch.from_pydict({"embedding": list(vectors)})</span>
jackye1995 marked this conversation as resolved.
<span data-ty="input" data-ty-prompt=">>>">dataset.add_columns(add_embeddings)</span>
</div>
</div>
</div>
</div>
</section>
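Adding a populated column without a rewrite works because Lance stores each fragment as one or more data files, each holding a subset of the columns. The toy model below uses hypothetical `DataFile`, `Fragment`, and `add_column` names with plain Python lists standing in for file contents; it is a sketch of the idea behind `dataset.add_columns` in the demo, not Lance's implementation. Backfilling a derived `length` column writes one new file per fragment and leaves every existing file untouched.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    columns: list  # names of the columns stored in this file
    data: dict     # column name -> list of values (stand-in for on-disk contents)


@dataclass
class Fragment:
    files: list = field(default_factory=list)

    def read(self, column):
        """Return a column's values from whichever data file stores it."""
        for f in self.files:
            if column in f.columns:
                return f.data[column]
        raise KeyError(column)


def add_column(fragments, name, udf, source):
    """Backfill a new column by appending one new DataFile per fragment."""
    for frag in fragments:
        values = [udf(v) for v in frag.read(source)]
        frag.files.append(DataFile(columns=[name], data={name: values}))


fragments = [
    Fragment([DataFile(["text"], {"text": ["a", "bb"]})]),
    Fragment([DataFile(["text"], {"text": ["ccc"]})]),
]
original_files = [list(f.files) for f in fragments]
add_column(fragments, "length", len, source="text")
print([f.read("length") for f in fragments])  # [[1, 2], [3]]
```

The cost of the backfill is proportional to the new column's data, since the existing `text` files are neither read back into new files nor rewritten.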

<!-- Feature 5: Rich Ecosystem Integration -->
<section class="lance-feature-section">
<div class="container">
<div class="lance-feature-content">
<div class="lance-feature-text">
<h2>Rich Ecosystem Integrations</h2>
<p>
As an open format, Lance integrates seamlessly with the Python data ecosystem and modern data platforms.
Work with your favorite tools including Pandas, Polars, and PyTorch for data processing and machine learning.
Connect with leading query engines like Apache DataFusion, DuckDB, Apache Spark, Trino, and Apache Flink/Fluss
to run SQL analytics and distributed processing on your Lance datasets.
</p>
<a href="integrations/duckdb" class="md-button">View Integrations</a>
</div>
<div class="lance-feature-demo">
<img src="assets/images/ecosystem-integrations.png" alt="Lance Ecosystem Integrations" style="max-width: 500px; width: 100%; height: auto; border-radius: 8px;">
</div>
</div>
</div>
</section>

<script
src="assets/javascript/termynal.js"
data-termynal-container="#termynal-hybrid-search|#termynal-random-access|#termynal-multimodal|#termynal-evolution">
</script>

{% endblock %}

{% block content %}{% endblock %}
{% block footer %}
{{ super() }}
{% endblock %}
Binary file added docs/src/assets/images/lance-mj.png