Manage Lance Tables in Any Catalog using Lance Namespace and Spark

August 8, 2025
Engineering

Data management in AI and analytics workflows often involves juggling multiple systems and formats.

Today, we’re excited to introduce Lance Namespace, an open specification that standardizes access to collections of Lance tables, making it easier than ever to integrate Lance with your existing data infrastructure.

What is Lance Namespace?

Lance Namespace is an open specification built on top of the storage-based Lance table and file format. It provides a standardized way for metadata services like Apache Hive MetaStore, Apache Gravitino, Unity Catalog, AWS Glue Data Catalog, and others to store and manage Lance tables. This means you can seamlessly use Lance tables alongside your existing data lakehouse infrastructure.

Why “Namespace” Instead of “Catalog”?

While the data lake world traditionally uses hierarchical structures with catalogs, databases, and tables, the ML and AI communities often prefer flatter organizational models like simple directories. Lance Namespace embraces this flexibility by providing a multi-level namespace abstraction that adapts to your data organization strategy, whether that’s a simple directory structure or a complex multi-level hierarchy.
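As a rough illustration of the multi-level idea, the sketch below treats a table identifier as a list of names: the last element is the table, and everything before it is the namespace path. The `TableId` class and the `parse_table_id` helper are hypothetical names invented for this example and are not part of the Lance Namespace specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableId:
    """Hypothetical identifier: a namespace path plus a table name."""
    namespace: Tuple[str, ...]
    name: str

    @property
    def depth(self) -> int:
        # 0 means a flat, directory-style layout; larger values mean a deeper hierarchy.
        return len(self.namespace)


def parse_table_id(dotted: str) -> TableId:
    """Split a dotted identifier such as 'prod.search.embeddings'."""
    parts = [p for p in dotted.split(".") if p]
    if not parts:
        raise ValueError("identifier must contain at least a table name")
    return TableId(namespace=tuple(parts[:-1]), name=parts[-1])


flat = parse_table_id("embeddings")                # flat directory of tables
nested = parse_table_id("prod.search.embeddings")  # multi-level hierarchy
print(flat)
print(nested)
```

The same identifier type covers both cases: a flat directory simply has an empty namespace path.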

Current Implementations and Building Your Own

Lance Namespace currently supports several implementations out of the box, including the directory, REST, and AWS Glue namespaces configured later in this post.

Building Custom Namespaces

You can build your own namespace implementation in two ways:

  1. REST Server: Implement the Lance REST Namespace OpenAPI specification to create a standardized server that any Lance tool can connect to
  2. Native Implementation: Build a direct implementation as a library
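To make the native route more concrete, here is a minimal in-memory sketch. The `InMemoryNamespace` class and its method names (`create_table`, `list_tables`, `describe_table`, `drop_table`) are assumptions chosen for illustration, as is the `<name>.lance` storage layout. The actual operations and their signatures are defined by the Lance Namespace specification, which this example does not attempt to reproduce.

```python
from typing import Dict, List


class InMemoryNamespace:
    """Hypothetical native namespace that maps table names to storage locations."""

    def __init__(self, root: str) -> None:
        self._root = root.rstrip("/")
        self._tables: Dict[str, str] = {}

    def create_table(self, name: str) -> str:
        if name in self._tables:
            raise ValueError(f"table already exists: {name}")
        # Assumed layout for this sketch: one '<name>.lance' directory under the root.
        location = f"{self._root}/{name}.lance"
        self._tables[name] = location
        return location

    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def describe_table(self, name: str) -> str:
        if name not in self._tables:
            raise KeyError(f"table not found: {name}")
        return self._tables[name]

    def drop_table(self, name: str) -> None:
        if name not in self._tables:
            raise KeyError(f"table not found: {name}")
        del self._tables[name]


ns = InMemoryNamespace("/path/to/lance/data")
ns.create_table("embeddings")
ns.create_table("documents")
print(ns.list_tables())
print(ns.describe_table("embeddings"))
```

A REST adapter would expose the same kind of operations over HTTP instead of as in-process method calls.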

When deciding between building an adapter (REST server proxying to your metadata service) versus a native implementation,
consider factors like multi-language support needs, tooling compatibility, security requirements, and performance sensitivity.
See the

Integration with Apache Spark

One of the most requested features in the Lance community, now enabled by Lance Namespace, is seamless integration with Apache Spark. Lance can serve not just as a data format plugin but as a complete Spark table catalog, so users can access and manage Lance tables in Spark, run SQL analytics, and use Spark MLlib in training pipelines. The rest of this post walks through how to do that.

Getting Started: A Practical Example

Let’s walk through a simple example of using Lance Namespace with Spark to manage and query Lance tables.

If you’d like to get started quickly without worrying about the setup, we’ve prepared a Docker image with everything pre-configured. Check out our Lance Spark Connector Quick Start guide to get up and running in minutes.

Step 1: Set Up Your Spark Session

First, configure Spark with the Lance Namespace catalog. Here’s an example using a directory-based namespace:

from pyspark.sql import SparkSession

# Create a Spark session with Lance catalog
spark = SparkSession.builder \
    .appName("lance-namespace-demo") \
    .config("spark.jars.packages", "com.lancedb:lance-spark-bundle-3.5_2.12:0.0.6") \
    .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
    .config("spark.sql.catalog.lance.impl", "dir") \
    .config("spark.sql.catalog.lance.root", "/path/to/lance/data") \
    .config("spark.sql.defaultCatalog", "lance") \
    .getOrCreate()

This creates a Spark catalog named lance that points at the directory /path/to/lance/data and sets it as the default catalog for the current Spark session.
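Every catalog setting in Step 1 shares the prefix spark.sql.catalog.<name>, so the entries can be generated rather than typed by hand. The helper below is a small sketch of that pattern. `lance_catalog_conf` is a hypothetical function written for this post, and only the config keys and catalog class shown in Step 1 are taken from the connector.

```python
from typing import Dict

# Catalog class name as used in the Step 1 session setup.
CATALOG_CLASS = "com.lancedb.lance.spark.LanceNamespaceSparkCatalog"


def lance_catalog_conf(catalog: str, impl: str, **options: str) -> Dict[str, str]:
    """Build Spark config entries for a Lance Namespace catalog (illustrative helper)."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {prefix: CATALOG_CLASS, f"{prefix}.impl": impl}
    for key, value in options.items():
        # Each option becomes 'spark.sql.catalog.<catalog>.<key>'.
        conf[f"{prefix}.{key}"] = value
    return conf


conf = lance_catalog_conf("lance", "dir", root="/path/to/lance/data")
for key, value in conf.items():
    print(f"{key} = {value}")
```

Each resulting pair can then be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.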

Step 2: Create and Manage Tables

With the catalog configured, you can now create and manage Lance tables using familiar SQL commands:

# Create a Lance table
spark.sql("""
    CREATE TABLE embeddings (
        id BIGINT,
        text STRING,
        embedding ARRAY<FLOAT>,
        timestamp TIMESTAMP
    )
    TBLPROPERTIES (
      'embedding.arrow.fixed-size-list.size'='3'
    )
""")

# Insert data into the table
spark.sql("""
    INSERT INTO embeddings 
    VALUES 
        (1, 'Hello world', array(0.1, 0.2, 0.3), current_timestamp()),
        (2, 'Lance and Spark', array(0.4, 0.5, 0.6), current_timestamp())
""")

Note that declaring the column as embedding ARRAY<FLOAT> together with the table property 'embedding.arrow.fixed-size-list.size'='3' creates a fixed-size vector column in the underlying Lance table, which is optimized for vector search performance.
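The property key follows the pattern <column>.arrow.fixed-size-list.size, so the clause can be generated when a table has several vector columns or the dimension comes from a model config. A minimal sketch, assuming a hypothetical helper named `vector_tblproperties` that only formats SQL text and never talks to Spark:

```python
from typing import Dict


def vector_tblproperties(dims: Dict[str, int]) -> str:
    """Render a TBLPROPERTIES clause that marks ARRAY<FLOAT> columns as fixed-size vectors."""
    entries = []
    for column, size in dims.items():
        if size <= 0:
            raise ValueError(f"vector size must be positive: {column}={size}")
        entries.append(f"'{column}.arrow.fixed-size-list.size'='{size}'")
    return "TBLPROPERTIES (\n  " + ",\n  ".join(entries) + "\n)"


clause = vector_tblproperties({"embedding": 3})
print(clause)
```

The returned string can be appended to a CREATE TABLE statement like the one in Step 2.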

Step 3: Query Your Data

Query Lance tables just like any other Spark table:

# Query using SQL
results = spark.sql("""
    SELECT id, text, size(embedding) as dim
    FROM embeddings
    WHERE id > 0
""")
results.show()

# Or use the DataFrame API
df = spark.table("embeddings")
filtered_df = df.filter(df.id > 0).select("id", "text")
filtered_df.show()

Step 4: Integration with ML Workflows

Lance’s columnar format and vector support make it ideal for ML workflows:

from pyspark.sql import functions as F

# Simulate generation of new embeddings
new_embeddings_df = spark.sql("""
    SELECT 
        3 as id,
        'Machine learning with Lance' as text,
        array(0.7, 0.8, 0.9) as embedding,
        current_timestamp() as timestamp
    UNION ALL
    SELECT 
        4 as id,
        'Vector databases are fast' as text,
        array(0.2, 0.4, 0.6) as embedding,
        current_timestamp() as timestamp
""")

# Append new embeddings to the Lance table
new_embeddings_df.writeTo("embeddings").append()

# Verify the combined dataset and compute embedding statistics
spark.sql("""
    SELECT 
        COUNT(*) as total_records,
        ROUND(AVG(sqrt(aggregate(embedding, 0D, (acc, x) -> acc + x * x))), 3) as avg_l2_norm,
        ROUND(MIN(embedding[0]), 2) as min_first_dim,
        ROUND(MAX(embedding[0]), 2) as max_first_dim
    FROM embeddings
""").show()
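The statistics in that query can be cross-checked locally. The plain-Python sketch below uses the four sample vectors from this walkthrough and computes both the sum of squared components and its square root (the L2 norm), so the two quantities are not confused. Spark stores the values as FLOAT, so its results may differ in the last digits.

```python
import math

# The four sample embeddings inserted in Steps 2 and 4.
vectors = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [0.2, 0.4, 0.6],
]

# Sum of squared components per vector (the squared L2 norm).
squared_norms = [sum(x * x for x in v) for v in vectors]
# The L2 norm is the square root of that sum.
l2_norms = [math.sqrt(s) for s in squared_norms]

avg_squared_norm = sum(squared_norms) / len(vectors)
avg_l2_norm = sum(l2_norms) / len(vectors)
first_dims = [v[0] for v in vectors]

print(round(avg_squared_norm, 3), round(avg_l2_norm, 3))
print(min(first_dims), max(first_dims))
```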

Advanced Namespace Configurations

Here are some other configuration examples for connecting to a few Lance namespace implementations:

Directory Namespace on S3 Cloud Storage

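# Directory namespace rooted at an S3 prefix.
# Reuses the SparkSession import and spark.jars.packages setup from Step 1.
# The key and secret below are placeholders; substitute your own values.
spark = SparkSession.builder \
    .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
    .config("spark.sql.catalog.lance.impl", "dir") \
    .config("spark.sql.catalog.lance.root", "s3://bucket/lance-data") \
    .config("spark.sql.catalog.lance.storage.access_key_id", "your-key") \
    .config("spark.sql.catalog.lance.storage.secret_access_key", "your-secret") \
    .getOrCreate()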
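In practice you may prefer not to hard-code credentials in the session setup. One option, sketched below, is to read them from the standard AWS environment variables and build the settings as a dictionary. `s3_dir_namespace_conf` is a hypothetical helper, and only the config keys shown above are assumed to be understood by the connector.

```python
import os
from typing import Dict, Mapping


def s3_dir_namespace_conf(root: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build directory-namespace settings for S3, reading credentials from the environment."""
    prefix = "spark.sql.catalog.lance"
    try:
        key_id = env["AWS_ACCESS_KEY_ID"]
        secret = env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing}") from None
    return {
        f"{prefix}.impl": "dir",
        f"{prefix}.root": root,
        f"{prefix}.storage.access_key_id": key_id,
        f"{prefix}.storage.secret_access_key": secret,
    }


# A fake environment is passed here so the example runs without real credentials.
demo_env = {"AWS_ACCESS_KEY_ID": "your-key", "AWS_SECRET_ACCESS_KEY": "your-secret"}
s3_conf = s3_dir_namespace_conf("s3://bucket/lance-data", env=demo_env)
for key in s3_conf:
    print(key)
```

Calling the helper without the `env` argument falls back to the real process environment and fails fast if either variable is missing.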

LanceDB Cloud REST Namespace

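# LanceDB Cloud through the REST namespace.
# Reuses the SparkSession import and spark.jars.packages setup from Step 1.
spark = SparkSession.builder \
    .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
    .config("spark.sql.catalog.lance.impl", "rest") \
    .config("spark.sql.catalog.lance.uri", "https://your-database.api.lancedb.com") \
    .config("spark.sql.catalog.lance.headers.x-api-key", "your-api-key") \
    .getOrCreate()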

AWS Glue Namespace

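# AWS Glue Data Catalog as the namespace, with table data stored under an S3 root.
# Reuses the SparkSession import and spark.jars.packages setup from Step 1.
spark = SparkSession.builder \
    .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
    .config("spark.sql.catalog.lance.impl", "glue") \
    .config("spark.sql.catalog.lance.region", "us-east-1") \
    .config("spark.sql.catalog.lance.root", "s3://your-bucket/lance") \
    .getOrCreate()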

Benefits for AI and Analytics Teams

Lance Namespace with Spark integration brings several key benefits:

  1. Unified Data Management: Manage Lance tables alongside your existing data assets
  2. Flexibility: Choose the namespace backend that fits your infrastructure
  3. Performance: Leverage Lance’s table and file format with Spark’s distributed processing
  4. Simplicity: Use familiar SQL and DataFrame APIs
  5. Scalability: Handle everything from local experiments to production workloads

For more information on LanceDB’s features and capabilities, check out our comprehensive documentation.

What’s Next?

Lance Namespace is designed to be extensible and community-driven. We’re actively working on:

  • Additional namespace implementations: Unity Catalog, Apache Gravitino, and Apache Polaris are in progress
  • Enhanced vector search capabilities within Spark
  • Tighter integration with ML frameworks through features like data evolution
  • Support for more compute engines beyond Spark

If you’re interested in getting started with LanceDB or exploring our enterprise features, we have comprehensive guides available.

Thank You to Our Contributors

We’d like to extend our heartfelt thanks to the community members who have contributed to making Lance Namespace and the Spark integration a reality:

  • Bryan Keller from Netflix
  • Drew Gallardo from AWS
  • Jinglun and Vino Yang from ByteDance

Your contributions have been instrumental in making Lance Namespace a robust solution for the community.

Get Involved

Lance Namespace is open source and we welcome all kinds of contributions! Whether you’re interested in adding new namespace implementations, improving the Spark connector, building integration with more engines, or just trying it out, we’d love to hear from you.

Conclusion

Lance Namespace bridges the gap between modern AI workloads and traditional data infrastructure. By providing a standardized way to manage Lance tables and seamless integration with Apache Spark, it makes it easier than ever to build scalable AI and analytics pipelines.

Try it out today and let us know what you think! Whether you’re building a recommendation system, managing embeddings for RAG applications, or analyzing large-scale datasets, Lance Namespace and Spark provide the foundation you need for success.

Jack Ye
Data engineer and table format specialist, working on distributed systems and modern data lake architectures.
