LanceDB Blog | AI-Native Multimodal Lakehouse

Make Handwritten Notes Searchable: Optimizing an OCR Pipeline with LanceDB

Learn how to build an OCR pipeline for handwritten medical notes using DSPy, GEPA, and LanceDB to manage images, labels, outputs, metrics, and retrieval.

Rebuilding the Data Foundation for Embodied AI with Lance: From Long Videos to Random-Access-Friendly Multimodal Samples

Lance turns long robotics videos into random-access multimodal training data with 1.7–6× faster reads and 42% lower storage use.

RaBitQ Gets Faster: Higher Recall, Lower Latency, Query-Time Control

How LanceDB's latest IVF_RQ improvements raise recall, cut p99 latency, and let teams tune vector search at query time with approx_mode.

📊 Lance vs Delta vs Iceberg, 🔗 Lance Blob V2 Late Materialization, 🤖 Stable-Worldmodel Research Platform

Benchmark Lance vs Delta vs Iceberg on S3 metadata performance, update blob rows without reading bytes, and train world models directly from object storage, plus upcoming events and enterprise and community updates.

From Messy PDFs to Verifiable Answers with LiteParse and LanceDB

Turn information-rich PDFs into a local, inspectable, searchable evidence store with LlamaIndex's LiteParse library and LanceDB.

Faster VLM Fine-Tuning With Materialized Model Features in LanceDB

How Lance format and LanceDB's Enterprise feature engineering platform make VLM fine-tuning faster by materializing expensive multimodal features once and training from those columns.

Lance Blob V2: Late Materialization for Large Binary Data in Spark

How late materialization in Lance Spark keeps large binary data as lightweight references through query plans, materializing bytes only on write.

Semantic Memory for Hermes Agent with LanceDB

Introducing a new LanceDB-backed memory plugin that gives Hermes Agent durable, semantic recall across sessions, with benchmarks and a hands-on remember/recall/forget walkthrough.

A Metadata Benchmark of Lance, Delta Lake, and Iceberg on S3

A Rust benchmark comparison of Lance, Delta Lake, and Apache Iceberg on S3 and S3 Express, and why Lance is optimized for object storage metadata.

Scalable Feature Engineering on Multimodal Datasets

How LanceDB uses the Lance format's flexible data evolution features to enable scalable feature engineering for multimodal datasets.

Stable-Worldmodel: A High Performance Platform for Reproducible World Model Research

Introducing stable-worldmodel, an open-source platform for reproducible world model research, evaluation, and benchmarking under visual and physical distribution shifts.

Reproducible Data Curation In The Multimodal Lakehouse

Learn how LanceDB turns raw multimodal data into reproducible, training-ready datasets with search, filtering, deduplication, sampling, and versioned curation workflows.

🌍 Lance-Backed World Model Platform, 🦆 Multimodal SQL with Lance DuckDB Extension, 💰 LanceDB vs OpenSearch Cost Breakdown

stable-worldmodel standardizes world model pipelines on Lance, DuckDB Lance extension adds native multimodal SQL, and LanceDB benchmarks 100M vectors at ~$779/month, plus upcoming events, enterprise updates, and community updates.

⚡Vector Search at 10B Scale, 📊 Lance Format Benchmarks, 🚗 AV Pipelines at Scale

Distributed vector search at 10B scale, more efficient storage with Lance format v2.2, and production AV pipelines simplified, plus upcoming events and community updates.

How LanceDB Accelerates Vector Search at 10 Billion Scale

How LanceDB applies distributed indexing, distributed query execution, HNSW centroid routing, and fast RaBitQ rotation to scale search to 10B vectors and beyond.

OpenSearch vs LanceDB for Vector Search: Query Cost and Infrastructure

Choosing a vector database usually comes down to a tradeoff between a full search service and an in-process library. This post showcases benchmarks that compare OpenSearch and LanceDB on the COCO 2017 images embedded with SigLIP. We measure ingestion throughput, query cost, storage layout, and overall infra cost.

Volcano Engine LAS's Lance-Based PB-Scale Autonomous Driving Data Lake Solution

How ByteDance Volcano Engine LAS (Lake for AI Service) leverages Lance as its core storage format to rapidly construct a next-gen AI data lake that efficiently stores, manages, and processes multimodal data (text, images, audio/video).

Unifying the AV ML Stack: From Raw Data to Trained Model with LanceDB

A complete walkthrough of building an autonomous vehicle perception model training pipeline on top of LanceDB and the Multimodal Lakehouse.

Lance JSON Support: Why You Might Not Really Need Variant

Lance's JSONB storage, scalar indexing, data evolution, and full-text search already deliver what most users want from Variant — with explicit control, schema consistency, and no vendor lock-in.

Building A Storage Format For The Next Era of Biology

How Lance can serve as the foundation for AI on single-cell genomics atlases and a new generation of modeling in biology.

📄 Lance Blob V2, 🤗 Upload Lance Datasets to HF Hub, 🦞 LanceDB for OpenClaw's Memory

Lance Blob V2 introduces adaptive storage semantics, Lance datasets can now be uploaded easily to Hugging Face Hub, and OpenClaw establishes LanceDB as a default memory layer for agents, plus community and enterprise updates.

Smart Parsing Meets Sharp Retrieval: Combining LiteParse and LanceDB

Build a structure-aware PDF QA agent with LiteParse, LanceDB, and Claude to answer complex questions over visually rich documents.

Lance Format v2.2 Benchmarks: Half the Storage, None of the Slowdown

Benchmarks that show how Lance format v2.2 cuts storage by 50%+, beats Parquet on compression, and delivers up to 68x faster blob reads — while preserving the scan and random access patterns that multimodal training depends on.

Make your SQL Workflows Multimodal With LanceDB × DuckDB

A hands-on walkthrough showing how LanceDB and DuckDB work together to query multimodal data in SQL, join against multiple tables, and materialize results back into LanceDB.

Agentic Coding as Community Stewardship

Agentic coding lets upstream maintainers turn ecosystem-wide mechanical updates into fast, contextual, low-friction contributions for downstream users.

One System, Many Workloads: Rethinking What "Multimodal" Means for AI

A practical definition of multimodal complexity, and how LanceDB’s Multimodal Lakehouse is built to address these challenges.

The Future of AI-Native Development is Local: Inside Continue's LanceDB-Powered Evolution

Discover how Continue revolutionized AI-native development with LanceDB's embedded TypeScript library, enabling lightning-fast semantic code search while maintaining complete developer privacy and offline capability.

Lance File Format 2.2: Taming Complex Data

Lance file format 2.2 introduces Blob V2, nested schema evolution, native Map type support, and additional compression and performance improvements for AI/ML data workloads.

Lance Blob V2: Making Multimodal Data a First-Class Citizen in the Lakehouse

How we redesigned blob storage in Lance to make multimodal data a first-class citizen, with four storage semantics (Inline, Packed, Dedicated, External) that automatically adapt to your workload.

Why LanceDB Is the Most Natural Memory Layer for OpenClaw

OpenClaw and similar personal autonomous agents need a local-first long-term memory layer. LanceDB fits that role with embedded deployment, filesystem-native storage, and multimodal retrieval.

OpenClaw + LanceDB + Seed 2.0: Turn Visual Ideas into Reality, Fast!

OpenClaw and similar personal autonomous agents need a local-first long-term memory layer. LanceDB fits that role with embedded deployment, filesystem-native storage, and multimodal retrieval.

Memory for OpenClaw: From Zero to LanceDB Pro

Benchmarking three OpenClaw memory plugins on the LOCOMO dataset

A Guide to Uploading Lance Datasets on the Hugging Face Hub

Build a multimodal Lance dataset, publish it to the Hub, and query the precomputed vector + FTS indexes in LanceDB, without needing to download the dataset locally.

Zero Shot Image Classification with Vector Search

Learn about zero-shot image classification with vector search. Get practical steps, examples, and best practices you can use now.

WeRide's Data Platform Transformation: How LanceDB Fuels Model Development Velocity

Discover how WeRide, a leading autonomous driving company, leveraged LanceDB to revolutionize their data platform, achieving 90x improvement in ML developer productivity and reducing data mining time from 1 week to 1 hour.

Training a Variational AutoEncoder from Scratch with Lance File Format

Train a Variational Autoencoder end‑to‑end using Lance for fast, scalable data handling. You’ll set up the dataset, build the VAE in PyTorch, and run training, sampling, and reconstructions.

Track AI Trends: CrewAI Agents & RAG

This article will teach us how to make an AI Trends Searcher using CrewAI Agents and their Tasks. But before diving into that, let's first understand what CrewAI is and how we can use it for these applications.

Tokens per Second Is NOT All You Need

Explore why tokens per second is not all you need. Get practical steps, examples, and best practices you can use now.

The Future of Open Source Table Formats: Apache Iceberg and Lance

Explore the future of open source table formats, Apache Iceberg and Lance, with practical insights and expert guidance from the LanceDB team.

The Case for Random Access I/O

One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support for random access.

LanceDB Raises $30M Series A to Build the Multimodal Lakehouse

We have closed another funding round to accelerate development of the Multimodal Lakehouse - a unified platform for AI data infrastructure.

SemanticDotArt: Rethinking Art Discovery with LanceDB

SemanticDotArt turns art discovery into a multimodal search experience, matching feelings, phrases, and images with LanceDB's hybrid retrieval.

Second Dinner's Secret Weapon: LanceDB-Powered RAG for Faster, Smarter Game Development

Discover how Second Dinner, creators of Marvel Snap, leveraged LanceDB Cloud to transform game development workflows, reducing prototyping time from months to hours and automating QA test generation with 81% better results.

Search Within an Image with Segment Anything

Learn how to search within an image with Segment Anything. Get practical steps, examples, and best practices you can use now.

Scalable Computer Vision with LanceDB & Voxel51

Explore scalable computer vision with LanceDB and Voxel51. Get practical steps, examples, and best practices you can use now.

Rethinking Table File Paths with Uber: Lance’s Multi-Base Layout

A tour of Lance's file path design, and how Lance’s new multi-base layout enables multi-location datasets (such as Uber’s multi-bucket setup) with minimal metadata rewrites.

RAG Isn't One-Size-Fits-All: Here's How to Tune It for Your Use Case

Great RAG comes from a tight iteration loop. Learn how to systematically improve each layer of your RAG system using Kiln and LanceDB.

Python Package to convert image datasets to Lance type

Explore a Python package for converting image datasets to Lance, with practical insights and expert guidance from the LanceDB team.

The Quest for One Million IOPS: Benchmarking Storage at LanceDB

Learn how LanceDB benchmarks storage and how we achieved one million disk reads per second.

November Feature Roundup

Explore November feature roundup with practical insights and expert guidance from the LanceDB team.

🛡️ Newly Knighted Lancelot, ▶️ TwelveLabs Semantic Video Recommendations, 🧠 Cognee's AI Memory Layer with LanceDB

Our September newsletter welcomes new Lancelot members, highlights TwelveLabs semantic video recommendations, Cognee’s AI memory layer, and shares the latest product and community updates.

🎨 Semantic.Art, 💾 Stable Lance 2.1, 🎥 Ray+LanceDB powers Netflix

Our October newsletter highlights Semantic.Art, Lance File 2.1, RaBitQ Quantization, upcoming events, and the latest product and community updates.

🛡️Lance Community Governance, Lance + Iceberg 🧊, Netflix’s Multimodal Search Demo 🔍

Our November newsletter highlights Lance community governance, a deep dive on Lance and Iceberg, a demo of Netflix's multimodal search, previous talk recordings, and the latest product and community updates.

June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates

LanceDB's June 2025 newsletter covering latest company news, product updates, open source releases, and community highlights.

⚖️ Harvey’s Enterprise-Grade RAG on LanceDB, 💼 Dosu Case Study, Minimax&LumaLabs❤️Lance-Ray

Our August newsletter features a new case study with Dosu, recaps from events with Harvey and Databricks, and the latest product and community updates.

🦆 Lance x DuckDB SQL Retrieval, 🚗 Uber-Scale Storage, ⚡ 1.5M IOPS

Kicking off 2026 with Lance-native SQL retrieval via DuckDB, Uber-scale multi-bucket storage, 1.5M IOPS benchmarks, and continued OSS momentum across the Lance ecosystem.

🤗 Lance x Hugging Face, 🪾Git-Style Branching, 🏔️ Geospatial in Lance

Native Lance support on Hugging Face Hub, Git-style branching and shallow clone for AI data, and Arrow-native geospatial with R-Tree indexing, plus steady OSS and community momentum.

💾 Lance SDK v1.0.0, 🗓️ 1st Lance Community Sync, 🔍 Wikisearch

Our December newsletter highlights Lance SDK v1.0.0, our upcoming Lance community sync, Wikisearch demo, and the latest product and community updates.

Netflix’s Media Data Lake ❤️ LanceDB, CodeRabbit 💼 Case Study, Lance Namespace

Our August newsletter highlights LanceDB powering Netflix's Media Data Lake, a case study on CodeRabbit's AI-powered code reviews, and updates on Lance Namespace and Spark integration.

My Summer Internship Experience at LanceDB

I'm Raunak, a master's student at the University of Illinois, Urbana-Champaign. This summer, I had the opportunity to intern as a Software Engineer at LanceDB, an early-stage startup based in San Francisco.

My SIMD Is Faster than Yours

An untold story about how we make LanceDB vector search fast. Get practical steps and examples from 'My SIMD is faster than Yours'.

Multimodal Myntra Fashion Search Engine Using LanceDB

Build a multimodal fashion search engine with LanceDB and CLIP embeddings. Follow a step‑by‑step workflow to register embeddings, create the table, query by text or image, and ship a Streamlit UI.

What is the LanceDB Multimodal Lakehouse?

Introducing the Multimodal Lakehouse - a unified platform for managing AI data from raw files to production-ready features, now part of LanceDB Enterprise.

Multi Document Agentic RAG: a Walkthrough

Walk through multi-document agentic RAG. Get practical steps, examples, and best practices you can use now.

Modified RAG: Parent Document & Bigger Chunk Retriever

Learn about modified RAG with parent document and bigger chunk retrievers. Get practical steps, examples, and best practices you can use now.

MemGPT: OS Inspired LLMs That Manage Their Own Memory

Explore MemGPT: OS-inspired LLMs that manage their own memory. Get practical steps, examples, and best practices you can use now.

Late Interaction & Efficient Multi-modal Retrievers Need More Than a Vector Index

Explore why late interaction and efficient multi-modal retrievers need more than a vector index, with practical insights and expert guidance from the LanceDB team.

Developers, Ditch the Black Box: Welcome to Continue

Remember flipping through coding manuals? Those quickly became relics with the rise of Google and Stack Overflow, a one-stop shop for developer queries.

Lance × Hugging Face: A New Era of Sharing Multimodal Data on the Hub

Announcing native read support for Lance format on Hugging Face Hub. You can now distribute your large multimodal datasets as a single, searchable artifact (including blobs, embeddings and indexes) all in one place!

Lance × DuckDB: SQL for Retrieval on the Multimodal Lakehouse Format

Use the Lance format as your lakehouse layer for retrieval, RAG and more, with the native Lance extension for DuckDB

Lance, Windows. Windows, Lance

It was spring of 2012. After being an avid user for 2+ years, I finally decided to join Wes McKinney and work on pandas full time.

Lance v2: A New Columnar Container Format

Explore Lance v2, a new columnar container format, with practical insights and expert guidance from the LanceDB team.

Productionalize AI Workloads with Lance Namespace, LanceDB, and Ray

Learn how to productionalize AI workloads with Lance Namespace's enterprise stack integration and the scalability of LanceDB and Ray for end-to-end ML pipelines.

Lance File 2.1 is Now Stable

The 2.1 file version is now stable, learn what that means for you and what's coming next.

Lance File 2.1: Smaller and Simpler

Explore Lance File 2.1, smaller and simpler, with practical insights and expert guidance from the LanceDB team.

Introducing Lance Data Viewer: A Simple Way to Explore Lance Tables

A lightweight open source web UI for exploring Lance datasets, viewing schemas, and browsing table data with vector visualization support.

Building the Future Together: Introducing Lance Community Governance

Announcing the formal governance structure for the Lance community, establishing clear pathways for contribution and leadership with a three-tier system of contributors, maintainers and PMC.

Manage Lance Tables in Any Catalog using Lance Namespace and Spark

Access and manage your Lance tables in Hive, Glue, Unity Catalog, or any catalog service using Lance Namespace with the latest Lance Spark connector.

Implementing Corrective RAG in the Easiest Way

Even though text-generation models are good at generating content, they sometimes fall short at returning facts. This happens because of the way they are trained.

Hybrid Search: RAG for Real-Life Production-Grade Applications

Learn about hybrid search: RAG for real-life production-grade applications. Get practical steps, examples, and best practices you can use now.

Hybrid Search: Combining BM25 and Semantic Search for Better Results with Langchain

Have you ever thought about how search engines find exactly what you're looking for? They usually use a mix of matching specific words and understanding the meaning behind them.

Hybrid Search and Custom Reranking with LanceDB

Combine keyword and vector search for higher‑quality results with LanceDB. This post shows how to run hybrid search and compare rerankers (linear combination, Cohere, ColBERT) with code and benchmarks.

Reduce Hallucinations from LLM-Powered Agents Using Long-Term Memory

Understand how to reduce hallucinations from LLM-powered agents using long-term memory. Get practical steps, examples, and best practices you can use now.

Implement Contextual Retrieval and Prompt Caching with LanceDB

Learn how to implement contextual retrieval and prompt caching with LanceDB. Get practical steps, examples, and best practices you can use now.

RAG with GRPO Fine-Tuned Reasoning Model

Explore RAG with a GRPO fine-tuned reasoning model, with practical insights and expert guidance from the LanceDB team.

GraphRAG: Hierarchical Approach to Retrieval-Augmented Generation

Explore GraphRAG, a hierarchical approach to retrieval-augmented generation, with practical insights and expert guidance from the LanceDB team.

GPU-Accelerated Indexing in LanceDB

Speed up vector index training in LanceDB with CUDA or Apple Silicon (MPS). See how GPU‑accelerated IVF/PQ training compares to CPU and how to enable it in code.

How We Added Geospatial Support To Lance With No New Code

How Lance's Arrow-native architecture enables first-class geospatial support through extension types, GeoDataFusion integration, and R-Tree indexing.

Building Semantic Video Recommendations with TwelveLabs and LanceDB

Build semantic video recommendations using TwelveLabs embeddings, LanceDB storage, and Geneva pipelines with Ray.

LanceDB's Geneva: Scalable Feature Engineering

Learn how to build scalable feature engineering pipelines with Geneva and LanceDB. This demo transforms image data into rich features including captions, embeddings, and metadata using distributed Ray clusters.

From BI to AI: A Modern Lakehouse Stack with Lance and Iceberg

A comparison of where Iceberg and Lance sit in the modern lakehouse stack. We highlight emerging architectures that are bridging the worlds of analytics and AI/ML workloads using these two formats, while being built on the same data foundation.

Setup Real-Time Multimodal AI Analytics with Apache Fluss (incubating) and Lance

Learn how to build real-time multimodal AI analytics by integrating Apache Fluss streaming storage with Lance's AI-optimized lakehouse. This guide demonstrates streaming multimodal data processing for RAG systems and ML workflows.

Columnar File Readers in Depth: Parallelism without Row Groups

Explore columnar file readers in depth: column shredding with practical insights and expert guidance from the LanceDB team.

LanceDB's RaBitQ Quantization for Blazing Fast Vector Search

Introducing RaBitQ quantization in LanceDB for higher compression, faster indexing, and better recall on high‑dimensional embeddings.

LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs

No more Tantivy! We stress-tested native full-text search in our latest massive-scale search demo. Let's break down how it works and what we did to scale it.

Efficient RAG with Compression and Filtering

Discover efficient RAG with compression and filtering. Get practical steps, examples, and best practices you can use now.

Effortlessly Loading and Processing Images with Lance: a Code Walkthrough

Working with large image datasets in machine learning can be challenging, often requiring significant computational resources and efficient data-handling techniques.

Designing a Table Format for ML Workloads

Explore designing a table format for ML workloads with practical insights and expert guidance from the LanceDB team.

Custom Datasets for Efficient LLM Training Using Lance

See how to build custom datasets for efficient LLM training using Lance. Get practical steps, examples, and best practices you can use now.

Creating a FinTech AI Agent From Scratch

Explore building a FinTech AI agent from scratch, with practical insights and expert guidance from the LanceDB team.

Chat with Your Stats Using Langchain Dataframe Agent & LanceDB Hybrid Search

Columnar File Readers in Depth: APIs and Fusion

Benchmarking Cohere Rerankers with LanceDB