LanceDB Documentation Edit on GitHub

What is LanceDB?

LanceDB is a multimodal lakehouse that serves two different use cases, both built on the foundation of the powerful Lance format .

Vector Search and Generative AI
LanceDB can be used as a vector database to build production-ready AI applications. Vector search is available in OSS , Cloud , and Enterprise editions.
Training, Feature Engineering and Analytics
Our enterprise-grade platform enables ML engineers and data scientists to perform large-scale training, multimodal EDA and AI model experimentation. Lakehouse features are available in OSS and Enterprise editions.

Use Cases

Vector Search and Generative AI

LanceDB is the preferred choice for developers building production-ready search and generative AI applications, including e-commerce search, recommendation systems, RAG (Retrieval-Augmented Generation), and autonomous agents.

Acting as a vector database, LanceDB natively stores vectors alongside multiple data modalities (text, images, video, audio), serving as a unified data store that eliminates the need for separate databases to manage source data.

Feature	LanceDB OSS	LanceDB Cloud	LanceDB Enterprise
Search	✅ Local	✅ Managed	✅ Managed
Storage	✅ Local Disk + AWS S3, Azure Blob, GCS	✅ Managed	✅ Managed, with Caching
SQL	✅ Local, via DuckDB, Spark, Trino	✅ Managed	✅ Managed

LanceDB OSS gives you a free and embedded vector database for self-hosted deployments. Check the full feature list.
LanceDB Cloud provides a fully managed, serverless experience with automatic indexing, scaling and other quality of life features .
LanceDB Enterprise offers a distributed & managed database with all the same benefits of LanceDB Cloud and OSS, plus additional performance and security benefits .

Training, Feature Engineering and Analytics

Our multimodal lakehouse platform empowers ML engineers and data scientists to train and fine-tune custom models on petabyte-scale multimodal datasets.

The platform serves as a unified data hub for internal search, analytics, and model experimentation workflows. Enhanced with SQL analytics, training pipelines, and feature engineering capabilities to accelerate AI development.

Feature	LanceDB OSS	LanceDB Enterprise
Search	✅ Local	✅ Managed
Storage	✅ Local Disk + AWS S3, Azure Blob, GCS	✅ Managed, with Caching
SQL	✅ Local, via DuckDB, Spark, Trino	✅ Managed
Training	✅ Local, via PyTorch	✅ Managed, via PyTorch
Feature Engineering	✅ API-only (local compute, no caching)	✅ Managed, via Geneva

LanceDB OSS provides a free, self-hosted lakehouse platform that seamlessly works with training and analytics tools.
LanceDB Enterprise delivers a managed lakehouse with distributed architecture, accelleration through caching, and custom-built training and feature engineering support. Learn about Enterprise capabilities.

Vector Search and Storage

LanceDB is used as a vector database that’s designed to store and search data of different modalities. You can use LanceDB to build fast, scalable, and intelligent applications that rely on vector search and analytics.

It is ideal for powering semantic search engines , recommendation systems , and AI-driven applications (RAG, Agents) that require real-time insights.

1. Single Source Database

The Source of Truth: Most existing vector databases only store and search embeddings and their metadata. The original data is usually stored elsewhere, so you need another database as a source of truth. LanceDB can effortlessly store both the source data and its embeddings.
Technology: It is built on top of Lance , an open-source columnar data format designed for extreme storage, performant ML workloads and fast random access.

2. Broad Multimodal Support

Multimodal: You can store vectors, metadata, raw images, videos, text, audio files and more. All modalities are stored in the Lance format, which provides automatic data versioning and blazing fast retrievals and filtering.

3. Custom Query Engine

Indexing: By combining columnar storage with cutting-edge indexing techniques, LanceDB enables efficient querying of both structured and unstructured data.
Search-at-Scale: Columnar storage for read and write performance on large scale datasets, especially vector-heavy workloads.

4. Flexible Deployment

Embedded: LanceDB OSS database is a library that runs in-process in your app, making it simple and cheap to implement on top of multiple remote storage options (such as S3).
Serverless: LanceDB Cloud is a fully managed, serverless vector database that scales automatically with your storage or search needs, eliminating infrastructure management overhead.
Managed: LanceDB Enterprise offers a dedicated, enterprise-grade deployment with advanced security, compliance features, and dedicated support for mission-critical AI applications.

Training, Feature Engineering and Analytics

1. Distributed Architecture

Scalability: Optimized for performance at scale, the Enterprise edition supports a fully managed, horizontally scalable deployment that can handle billions of rows and petabyte-scale data volumes.
Caching for Performance: A distributed NVMe cache fleet enables high IOPS and throughput—up to 5M IOPS and 10+ GB/s—while reducing API calls to cloud object stores like S3, GCS, and Azure Blob. This dramatically lowers inference and training costs.

2. Scalable Experimentation

Resilience: Feature engineering pipelines include built-in checkpointing and automatic resumption, making workloads resilient to interruptions and suitable for preemptible (spot) instances.
Distributed Processing: Python user-defined functions (UDFs) orchestrate distributed data transformations across Ray or Kubernetes clusters, allowing fast, declarative feature creation and evolution.
ML Workflow Integration: Offers fast random access, named SQL views for training, and direct integration with PyTorch/JAX data loaders to streamline ML workflows.

3. Advanced Search

Multimodal Search: Enterprise deployments include full-text, vector, and hybrid search using secondary indices such as BTree, NGram, and vector indices—all backed by the Lance format for low-latency access.

4. Enterprise Support

Enterprise Security: Supports BYOC (Bring Your Own Cloud), integrating natively with cloud provider security (IAM, audit logs, encryption) and enables private connectivity via AWS PrivateLink or GCP Private Service Connect.
Production Ready: Includes telemetry pipelines, enterprise SLAs, and control plane integration for job scheduling and observability across training, analytics, and search.

Integrations and Compatibility

LanceDB integrates seamlessly with the modern AI ecosystem, providing connectors for popular frameworks, embedding models, and development tools. Read more about LanceDB Integrations.

Popular Integrations

Category	Integrations	Documentation
AI Frameworks	LangChain, LlamaIndex, Kiln	AI Frameworks
Embedding Models	OpenAI, Cohere, Hugging Face, Custom Models	Embedding Models
Reranking Models	BGE-reranker, Cohere Rerank, Custom Models	Reranking Models
Data Platforms	DuckDB, Pandas, Polars	Data Platforms

💡 Getting Started

Create a LanceDB Cloud account to get started in minutes! Follow our guided tutorials to: