docs: introduce lance as a lakehouse format #5209
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,263 @@ | ||
| {% extends "main.html" %} | ||
|
|
||
| {% block tabs %} | ||
| {{ super() }} | ||
|
|
||
| <style> | ||
| /* Prevent horizontal overflow */ | ||
| body { | ||
| overflow-x: hidden; | ||
| } | ||
|
|
||
| /* Hide main content for home page */ | ||
| .md-content { | ||
| display: none; | ||
| } | ||
|
|
||
| /* Hide table of contents */ | ||
| @media screen and (min-width: 60em) { | ||
| .md-sidebar--secondary { | ||
| display: none; | ||
| } | ||
| } | ||
|
|
||
| /* Hide navigation */ | ||
| @media screen and (min-width: 76.25em) { | ||
| .md-sidebar--primary { | ||
| display: none; | ||
| } | ||
| } | ||
|
|
||
| /* Make header static */ | ||
| .md-header { | ||
| position: initial; | ||
| } | ||
|
|
||
| .md-main__inner { | ||
| margin: 0; | ||
| } | ||
| </style> | ||
|
|
||
| <!-- Hero Section --> | ||
| <section class="mdx-container"> | ||
| <div class="container"> | ||
| <div class="intro-message"> | ||
| <div class="hero-logo"> | ||
| <img src="logo/white.png" alt="Lance Logo"> | ||
| <h1>Lance<sup>™</sup></h1> | ||
| </div> | ||
| <h3>The Open Lakehouse Format for Multimodal AI</h3> | ||
| <hr class="intro-divider" /> | ||
| <ul class="list-inline"> | ||
| <li> | ||
| <a href="quickstart" class="md-button md-button--primary">Get Started</a> | ||
| </li> | ||
| <li> | ||
| <a href="format" class="md-button">Read the Spec</a> | ||
| </li> | ||
| <li> | ||
| <a href="examples/python/llm_training" class="md-button">Train an LLM</a> | ||
| </li> | ||
| <li> | ||
| <a href="https://discord.gg/msY9kdwSYw" class="md-button" target="_blank" rel="noopener">Join Discord</a> | ||
| </li> | ||
| </ul> | ||
| </div> | ||
|
Contributor
One thing that would be cool here is to show the main integrations (above the fold, on the homepage), showing a little button for
Then people start to see Lance as something that's supported in multiple engines.
Contributor
Author
Added one more feature on the home page.
|
||
| </div> | ||
| </section> | ||
|
|
||
| <!-- What is Lance Section --> | ||
| <section class="lance-intro-section"> | ||
| <div class="container"> | ||
| <div class="lance-intro-content"> | ||
| <h2>What is Lance<sup>™</sup>?</h2> | ||
| <p> | ||
| Lance comprises a file format, a table format, and a catalog spec for multimodal AI, | ||
| allowing you to build a complete open lakehouse on top of object storage to power your AI workflows. | ||
| Lance brings high-performance vector search, full-text search, random access, and feature | ||
| engineering capabilities to the lakehouse, while retaining existing lakehouse benefits | ||
| such as SQL analytics, ACID transactions, time travel, and integrations with open engines (Apache Spark, Ray, Trino, DuckDB, etc.) | ||
| and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino, Hive Metastore, etc.). | ||
| </p> | ||
| <a href="quickstart" class="md-button md-button--primary">Learn More</a> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <!-- Feature 1: Expressive Hybrid Search --> | ||
| <section class="lance-feature-section"> | ||
| <div class="container"> | ||
| <div class="lance-feature-content"> | ||
| <div class="lance-feature-text"> | ||
| <h2>Expressive Hybrid Search</h2> | ||
| <p> | ||
| Lance enables powerful hybrid search combining vector similarity, full-text search, | ||
| and SQL analytics on the same dataset. Each query type is accelerated by a corresponding | ||
| secondary index defined in the Lance specification. | ||
| </p> | ||
| <p> | ||
| Run semantic search on embeddings, BM25 search on keywords, and apply complex SQL predicates, | ||
| all on a single table through a unified interface. | ||
| </p> | ||
| <a href="quickstart/vector-search" class="md-button">Learn More</a> | ||
| </div> | ||
| <div class="lance-feature-demo"> | ||
| <div id="termynal-hybrid-search" | ||
| data-termynal="" | ||
| data-ty-startdelay="500" | ||
| data-ty-typedelay="40" | ||
| data-ty-linedelay="700" | ||
| style="width: 500px;"> | ||
| <span data-ty="input" data-ty-prompt=">>>">import lance</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">ds = lance.dataset("s3://my-bucket/docs")</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">ds.to_table(full_text_query="machine learning")</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">ds.to_table(</span> | ||
| <span data-ty="input" data-ty-prompt="..."> nearest={"column": "embedding", "q": query_vec, "k": 10},</span> | ||
| <span data-ty="input" data-ty-prompt="..."> filter="year > 2020")</span> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <!-- Feature 2: Lightning-fast Random Access --> | ||
| <section class="lance-feature-section reverse"> | ||
| <div class="container"> | ||
| <div class="lance-feature-content"> | ||
| <div class="lance-feature-text"> | ||
| <h2>Lightning-fast Random Access</h2> | ||
| <p> | ||
| Lance delivers 100x faster random access compared to Parquet or Iceberg. With efficient | ||
| row-addressing, you can access individual records across multiple files instantly, | ||
| making it perfect for real-time ML serving, random sampling, and interactive applications. | ||
| </p> | ||
| <p> | ||
| Unlike traditional columnar formats, Lance maintains high performance even when | ||
| randomly accessing scattered rows across your entire dataset. | ||
| </p> | ||
| <a href="guide/read_and_write#random-access" class="md-button">Learn More</a> | ||
| </div> | ||
| <div class="lance-feature-demo"> | ||
| <div id="termynal-random-access" | ||
| data-termynal="" | ||
| data-ty-startdelay="1500" | ||
| data-ty-typedelay="40" | ||
| data-ty-linedelay="700" | ||
| style="width: 500px;"> | ||
| <span data-ty="input" data-ty-prompt=">>>">import lance</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">dataset = lance.dataset("s3://my-bucket/embeddings")</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">table = dataset.take([100, 5000, 1000000])</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">dataset.take([0, 1], columns=["id", "vector"])</span> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <!-- Feature 3: Native Multimodal Data Support --> | ||
| <section class="lance-feature-section"> | ||
| <div class="container"> | ||
| <div class="lance-feature-content"> | ||
| <div class="lance-feature-text"> | ||
| <h2>Native Multimodal Data Support</h2> | ||
| <p> | ||
| Store images, videos, audio, text, and embeddings in a single unified format. | ||
| Lance's blob encoding efficiently handles large binary objects with lazy loading, | ||
| while optimized vector storage accelerates similarity search. | ||
| </p> | ||
| <p> | ||
| Perfect for AI/ML workloads where you need to store raw data alongside embeddings | ||
| for multimodal retrieval and generation workflows. | ||
| </p> | ||
| <a href="guide/blob" class="md-button">Learn More</a> | ||
| </div> | ||
| <div class="lance-feature-demo"> | ||
| <div id="termynal-multimodal" | ||
| data-termynal="" | ||
| data-ty-startdelay="2500" | ||
| data-ty-typedelay="40" | ||
| data-ty-linedelay="700" | ||
| style="width: 500px;"> | ||
| <span data-ty="input" data-ty-prompt=">>>">import lance</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">import pyarrow as pa</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">schema = pa.schema([</span> | ||
| <span data-ty="input" data-ty-prompt="..."> pa.field("video", pa.large_binary(),</span> | ||
| <span data-ty="input" data-ty-prompt="..."> metadata={"lance-encoding:blob": "true"}),</span> | ||
| <span data-ty="input" data-ty-prompt="..."> pa.field("embedding", pa.list_(pa.float32(), 128))])</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">lance.write_dataset(table, "multimodal.lance", schema=schema)</span> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <!-- Feature 4: Data Evolution --> | ||
| <section class="lance-feature-section reverse"> | ||
| <div class="container"> | ||
| <div class="lance-feature-content"> | ||
| <div class="lance-feature-text"> | ||
| <h2>Data Evolution > Schema Evolution</h2> | ||
| <p> | ||
| Schema evolution in most open table formats is a fast, metadata-only operation. | ||
| But backfilling column values for existing rows typically requires a full table rewrite. | ||
| Lance supports efficient schema evolution with backfill, making it well suited to ML | ||
| feature engineering, embedding management, and media content management. | ||
| </p> | ||
| <p> | ||
| Adding a new column with data is as simple as writing new Lance files to the Lance table, | ||
| with no need to rewrite your entire dataset. | ||
| </p> | ||
| <a href="guide/data_evolution" class="md-button">Learn More</a> | ||
| </div> | ||
| <div class="lance-feature-demo"> | ||
| <div id="termynal-evolution" | ||
| data-termynal="" | ||
| data-ty-startdelay="3500" | ||
| data-ty-typedelay="40" | ||
| data-ty-linedelay="700" | ||
| style="width: 500px;"> | ||
| <span data-ty="input" data-ty-prompt=">>>">import lance</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">import pandas as pd</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">dataset = lance.dataset("my_data.lance")</span> | ||
| <span data-ty="input" data-ty-prompt=">>>">@lance.batch_udf()</span> | ||
| <span data-ty="input" data-ty-prompt="...">def add_embeddings(batch):</span> | ||
| <span data-ty="input" data-ty-prompt="..."> vectors = model.encode(batch["text"].to_pylist())</span> | ||
| <span data-ty="input" data-ty-prompt="..."> return pd.DataFrame({"embedding": list(vectors)})</span> | ||
|
|
||
| <span data-ty="input" data-ty-prompt=">>>">dataset.add_columns(add_embeddings)</span> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <!-- Feature 5: Rich Ecosystem Integration --> | ||
| <section class="lance-feature-section"> | ||
| <div class="container"> | ||
| <div class="lance-feature-content"> | ||
| <div class="lance-feature-text"> | ||
| <h2>Rich Ecosystem Integrations</h2> | ||
| <p> | ||
| As an open format, Lance integrates seamlessly with the Python data ecosystem and modern data platforms. | ||
| Work with your favorite tools including Pandas, Polars, and PyTorch for data processing and machine learning. | ||
| Connect with leading query engines like Apache DataFusion, DuckDB, Apache Spark, Trino, and Apache Flink/Fluss | ||
| to run SQL analytics and distributed processing on your Lance datasets. | ||
| </p> | ||
| <a href="integrations/duckdb" class="md-button">View Integrations</a> | ||
| </div> | ||
| <div class="lance-feature-demo"> | ||
| <img src="assets/images/ecosystem-integrations.png" alt="Lance Ecosystem Integrations" style="max-width: 500px; width: 100%; height: auto; border-radius: 8px;"> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| </section> | ||
|
|
||
| <script | ||
| src="assets/javascript/termynal.js" | ||
| data-termynal-container="#termynal-hybrid-search|#termynal-random-access|#termynal-multimodal|#termynal-evolution"> | ||
| </script> | ||
|
|
||
| {% endblock %} | ||
|
|
||
| {% block content %}{% endblock %} | ||
| {% block footer %} | ||
| {{ super() }} | ||
| {% endblock %} | ||
FYI I found this section didn't look good on mobile. Worth pulling up on a phone and trying it out.
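One possible cause, offered as an assumption that is not confirmed anywhere in this thread: each terminal demo container sets a fixed inline `style="width: 500px;"`, which is wider than most phone viewports, and because the template also sets `overflow-x: hidden` on `body`, the excess would be clipped rather than scrollable. A minimal sketch of an override that could be added to the page's `<style>` block follows. The 600px breakpoint is an arbitrary choice, and `!important` is used only because the width is set inline.

```css
/* Assumption: the fixed 500px inline width on the demo terminals
   is what overflows on narrow screens. */
@media screen and (max-width: 600px) {
  [data-termynal] {
    width: 100% !important;  /* override inline style="width: 500px;" */
    max-width: 100%;
    box-sizing: border-box;  /* keep padding inside the available width */
  }
}
```

Dropping the inline width in favor of `max-width: 500px; width: 100%;` (the pattern already used on the ecosystem image) might achieve the same result without `!important`.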