Privacy Settings
Privacy Policy
We use cookies and other technologies across the categories below. Toggle any category to accept or reject the related data collection.
Required
Essential
Necessary for basic website functionality.
Marketing
Helps deliver targeted ads that are relevant to you.
Analytics
Provides insights on your usage and site interaction.
Personalization
Tailors your experience by remembering your settings.
Existing Privacy Settings Found
Submitted On
Mar 26, 2025 – 14:37:22 UTC
Your Consent ID
This ID lets you retrieve or verify your consent settings. It stays saved for {0} months or until you clear cookies — keep a copy in case you need to reference it.
AMC89MQ329407NSADRFA09MA7SD
Required
Essential Consent
Off
On
Marketing Consent
Off
On
Analytics Consent
Off
On
Personalization Consent
LanceDB uses cookies and similar technologies to improve your experience, analyze traffic, and to show you relevant content and advertising.
  • You can accept all, reject all, or customize your privacy settings.
  • Non-essential cookies are disabled by default.
  • Closing this banner does not confirm any choice.
  • See our Privacy Policy for more information.
Do not sell or share my personal information
See Privacy Policy for more info.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Why Multimodal Data Needs a Better Lakehouse — Download the Research Study
Agentic Coding as Community Stewardship
Agentic coding lets upstream maintainers turn ecosystem-wide mechanical updates into fast, contextual, low-friction contributions for downstream users.
One System, Many Workloads: Rethinking What "Multimodal" Means for AI
A practical definition of multimodal complexity, and how LanceDB’s Multimodal Lakehouse is built to address these challenges.
The Future of AI-Native Development is Local: Inside Continue's LanceDB-Powered Evolution
Discover how Continue revolutionized AI-native development with LanceDB's embedded TypeScript library, enabling lightning-fast semantic code search while maintaining complete developer privacy and offline capability.
Lance File Format 2.2: Taming Complex Data
Lance file format 2.2 introduces Blob V2, nested schema evolution, native Map type support, and additional compression and performance improvements for AI/ML data workloads.
Lance Blob V2: Making Multimodal Data a First-Class Citizen in the Lakehouse
How we redesigned blob storage in Lance to make multimodal data a first-class citizen, with four storage semantics (Inline, Packed, Dedicated, External) that automatically adapt to your workload.
Why LanceDB Is the Most Natural Memory Layer for OpenClaw
OpenClaw and similar personal autonomous agents need a local-first long-term memory layer. LanceDB fits that role with embedded deployment, filesystem-native storage, and multimodal retrieval.
OpenClaw + LanceDB + Seed 2.0: Turn Visual Ideas into Reality, Fast!
Pair OpenClaw with LanceDB and Seed 2.0 to turn visual ideas into working results, fast.
Memory for OpenClaw: From Zero to LanceDB Pro
Benchmarking three OpenClaw memory plugins on the LOCOMO dataset
A Guide to Uploading Lance Datasets on the Hugging Face Hub
Build a multimodal Lance dataset, publish it to the Hub, and query the precomputed vector + FTS indexes in LanceDB, without needing to download the dataset locally.
Zero Shot Image Classification with Vector Search
Learn zero-shot image classification with vector search, with practical steps, examples, and best practices you can use now.
WeRide's Data Platform Transformation: How LanceDB Fuels Model Development Velocity
Discover how WeRide, a leading autonomous driving company, leveraged LanceDB to revolutionize their data platform, achieving 90x improvement in ML developer productivity and reducing data mining time from 1 week to 1 hour.
Training a Variational AutoEncoder from Scratch with Lance File Format
Train a Variational Autoencoder end‑to‑end using Lance for fast, scalable data handling. You’ll set up the dataset, build the VAE in PyTorch, and run training, sampling, and reconstructions.
Track AI Trends: CrewAI Agents & RAG
This article shows how to build an AI trends searcher using CrewAI agents and their tasks. But before diving into that, let's first understand what CrewAI is and how we can use it for these applications.
Tokens per Second Is NOT All You Need
Explore why tokens per second is not all you need, with practical steps, examples, and best practices you can use now.
The Future of Open Source Table Formats: Apache Iceberg and Lance
Explore the future of open source table formats, Apache Iceberg and Lance, with practical insights and expert guidance from the LanceDB team.
The Case for Random Access I/O
One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support for random access.
LanceDB Raises $30M Series A to Build the Multimodal Lakehouse
We have closed another funding round to accelerate development of the Multimodal Lakehouse - a unified platform for AI data infrastructure.
SemanticDotArt: Rethinking Art Discovery with LanceDB
SemanticDotArt turns art discovery into a multimodal search experience, matching feelings, phrases, and images with LanceDB's hybrid retrieval.
Second Dinner's Secret Weapon: LanceDB-Powered RAG for Faster, Smarter Game Development
Discover how Second Dinner, creators of Marvel Snap, leveraged LanceDB Cloud to transform game development workflows, reducing prototyping time from months to hours and automating QA test generation with 81% better results.
Search Within an Image with Segment Anything
Learn how to search within an image with Segment Anything, with practical steps, examples, and best practices you can use now.
Scalable Computer Vision with LanceDB & Voxel51
Explore scalable computer vision with LanceDB and Voxel51, with practical steps, examples, and best practices you can use now.
Rethinking Table File Paths with Uber: Lance’s Multi-Base Layout
A tour of Lance's file path design, and how Lance’s new multi-base layout enables multi-location datasets (such as Uber’s multi-bucket setup) with minimal metadata rewrites.
RAG Isn't One-Size-Fits-All: Here's How to Tune It for Your Use Case
Great RAG comes from a tight iteration loop. Learn how to systematically improve each layer of your RAG system using Kiln and LanceDB.
Python Package to convert image datasets to Lance type
Explore a Python package for converting image datasets to the Lance format, with practical insights and expert guidance from the LanceDB team.
The Quest for One Million IOPS: Benchmarking Storage at LanceDB
Learn how LanceDB benchmarks storage and how we achieved one million disk reads per second.
November Feature Roundup
Explore November feature roundup with practical insights and expert guidance from the LanceDB team.
🛡️ Newly Knighted Lancelot, ▶️ TwelveLabs Semantic Video Recommendations, 🧠 Cognee's AI Memory Layer with LanceDB
Our September newsletter welcomes new Lancelot members, highlights TwelveLabs semantic video recommendations, Cognee’s AI memory layer, and shares the latest product and community updates.
🎨 Semantic.Art, 💾 Stable Lance 2.1, 🎥 Ray+LanceDB powers Netflix
Our October newsletter highlights Semantic.Art, Lance File 2.1, RaBitQ Quantization, upcoming events, latest product and community updates.
🛡️Lance Community Governance, Lance + Iceberg 🧊, Netflix’s Multimodal Search Demo 🔍
Our November newsletter highlights Lance community governance, a deep dive on Lance and Iceberg, a demo of Netflix's multimodal search, previous talk recordings, and the latest product and community updates.
June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates
LanceDB's June 2025 newsletter covering latest company news, product updates, open source releases, and community highlights.
⚖️ Harvey’s Enterprise-Grade RAG on LanceDB, 💼 Dosu Case Study, Minimax&LumaLabs❤️Lance-Ray
Our August newsletter features a new case study with Dosu, recaps from events with Harvey and Databricks, and the latest product and community updates.
🦆 Lance x DuckDB SQL Retrieval, 🚗 Uber-Scale Storage, ⚡ 1.5M IOPS
Kicking off 2026 with Lance-native SQL retrieval via DuckDB, Uber-scale multi-bucket storage, 1.5M IOPS benchmarks, and continued OSS momentum across the Lance ecosystem.
🤗 Lance x Hugging Face, 🪾Git-Style Branching, 🏔️ Geospatial in Lance
Native Lance support on Hugging Face Hub, Git-style branching and shallow clone for AI data, and Arrow-native geospatial with R-Tree indexing, plus steady OSS and community momentum.
💾 Lance SDK v1.0.0, 🗓️ 1st Lance Community Sync, 🔍 Wikisearch
Our December newsletter highlights Lance SDK v1.0.0, our upcoming Lance community sync, Wikisearch demo, and the latest product and community updates.
Netflix’s Media Data Lake ❤️ LanceDB, CodeRabbit 💼 Case Study, Lance Namespace
Our August newsletter highlights LanceDB powering Netflix's Media Data Lake, a case study on CodeRabbit's AI-powered code reviews, and updates on Lance Namespace and Spark integration.
My Summer Internship Experience at LanceDB
I'm Raunak, a master's student at the University of Illinois, Urbana-Champaign. This summer, I had the opportunity to intern as a Software Engineer at LanceDB, an early-stage startup based in San Francisco.
My SIMD Is Faster than Yours
The untold story of how we made LanceDB vector search fast, with practical steps and examples.
Multimodal Myntra Fashion Search Engine Using LanceDB
Build a multimodal fashion search engine with LanceDB and CLIP embeddings. Follow a step‑by‑step workflow to register embeddings, create the table, query by text or image, and ship a Streamlit UI.
What is the LanceDB Multimodal Lakehouse?
Introducing the Multimodal Lakehouse - a unified platform for managing AI data from raw files to production-ready features, now part of LanceDB Enterprise.
Multi Document Agentic RAG: a Walkthrough
A walkthrough of multi-document agentic RAG, with practical steps, examples, and best practices you can use now.
Modified RAG: Parent Document & Bigger Chunk Retriever
Learn about modified RAG with parent document and bigger-chunk retrievers, with practical steps, examples, and best practices you can use now.
MemGPT: OS Inspired LLMs That Manage Their Own Memory
Explore MemGPT, OS-inspired LLMs that manage their own memory, with practical steps, examples, and best practices you can use now.
Late Interaction & Efficient Multi-modal Retrievers Need More Than a Vector Index
Explore late interaction & efficient multi-modal retrievers need more than a vector index with practical insights and expert guidance from the LanceDB team.
Developers, Ditch the Black Box: Welcome to Continue
Remember flipping through coding manuals? Those quickly became relics with the rise of Google and Stack Overflow, a one-stop shop for developer queries.
Lance × Hugging Face: A New Era of Sharing Multimodal Data on the Hub
Announcing native read support for Lance format on Hugging Face Hub. You can now distribute your large multimodal datasets as a single, searchable artifact (including blobs, embeddings and indexes) all in one place!
Lance × DuckDB: SQL for Retrieval on the Multimodal Lakehouse Format
Use the Lance format as your lakehouse layer for retrieval, RAG and more, with the native Lance extension for DuckDB
Lance, Windows. Windows, Lance
It was Spring of 2012. After being an avid user for 2+ years, I finally decided to join Wes Mckinney and work on pandas full time.
Lance v2: A New Columnar Container Format
Explore Lance v2, a new columnar container format, with practical insights and expert guidance from the LanceDB team.
Productionalize AI Workloads with Lance Namespace, LanceDB, and Ray
Learn how to productionalize AI workloads with Lance Namespace's enterprise stack integration and the scalability of LanceDB and Ray for end-to-end ML pipelines.
Lance File 2.1 is Now Stable
The 2.1 file version is now stable, learn what that means for you and what's coming next.
Lance File 2.1: Smaller and Simpler
Explore Lance File 2.1, smaller and simpler, with practical insights and expert guidance from the LanceDB team.
Introducing Lance Data Viewer: A Simple Way to Explore Lance Tables
A lightweight open source web UI for exploring Lance datasets, viewing schemas, and browsing table data with vector visualization support.
Building the Future Together: Introducing Lance Community Governance
Announcing the formal governance structure for the Lance community, establishing clear pathways for contribution and leadership with a three-tier system of contributors, maintainers and PMC.
Manage Lance Tables in Any Catalog using Lance Namespace and Spark
Access and manage your Lance tables in Hive, Glue, Unity Catalog, or any catalog service using Lance Namespace with the latest Lance Spark connector.
Implementing Corrective RAG in the Easiest Way
Even though text-generation models are good at generating content, they sometimes fall short on factual accuracy. This happens because of the way they are trained.
Hybrid Search: RAG for Real-Life Production-Grade Applications
Learn about hybrid search and RAG for real-life, production-grade applications, with practical steps, examples, and best practices you can use now.
Hybrid Search: Combining BM25 and Semantic Search for Better Results with Langchain
Have you ever thought about how search engines find exactly what you're looking for? They usually use a mix of matching specific words and understanding the meaning behind them.
Hybrid Search and Custom Reranking with LanceDB
Combine keyword and vector search for higher‑quality results with LanceDB. This post shows how to run hybrid search and compare rerankers (linear combination, Cohere, ColBERT) with code and benchmarks.
Reduce Hallucinations from LLM-Powered Agents Using Long-Term Memory
Learn how to reduce hallucinations from LLM-powered agents using long-term memory, with practical steps, examples, and best practices you can use now.
Implement Contextual Retrieval and Prompt Caching with LanceDB
Learn how to implement contextual retrieval and prompt caching with LanceDB, with practical steps, examples, and best practices you can use now.
RAG with GRPO Fine-Tuned Reasoning Model
Explore RAG with a GRPO fine-tuned reasoning model, with practical insights and expert guidance from the LanceDB team.
GraphRAG: Hierarchical Approach to Retrieval-Augmented Generation
Explore GraphRAG, a hierarchical approach to retrieval-augmented generation, with practical insights and expert guidance from the LanceDB team.
GPU-Accelerated Indexing in LanceDB
Speed up vector index training in LanceDB with CUDA or Apple Silicon (MPS). See how GPU‑accelerated IVF/PQ training compares to CPU and how to enable it in code.
How We Added Geospatial Support To Lance With No New Code
How Lance's Arrow-native architecture enables first-class geospatial support through extension types, GeoDataFusion integration, and R-Tree indexing.
Building Semantic Video Recommendations with TwelveLabs and LanceDB
Build semantic video recommendations using TwelveLabs embeddings, LanceDB storage, and Geneva pipelines with Ray.
LanceDB's Geneva: Scalable Feature Engineering
Learn how to build scalable feature engineering pipelines with Geneva and LanceDB. This demo transforms image data into rich features including captions, embeddings, and metadata using distributed Ray clusters.
From BI to AI: A Modern Lakehouse Stack with Lance and Iceberg
A comparison of where Iceberg and Lance sit in the modern lakehouse stack. We highlight emerging architectures that are bridging the worlds of analytics and AI/ML workloads using these two formats, while being built on the same data foundation.
Setup Real-Time Multimodal AI Analytics with Apache Fluss (incubating) and Lance
Learn how to build real-time multimodal AI analytics by integrating Apache Fluss streaming storage with Lance's AI-optimized lakehouse. This guide demonstrates streaming multimodal data processing for RAG systems and ML workflows.
Columnar File Readers in Depth: Parallelism without Row Groups
Explore columnar file readers in depth: parallelism without row groups, with practical insights and expert guidance from the LanceDB team.
LanceDB's RaBitQ Quantization for Blazing Fast Vector Search
Introducing RaBitQ quantization in LanceDB for higher compression, faster indexing, and better recall on high‑dimensional embeddings.
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs
No more Tantivy! We stress-tested native full-text search in our latest massive-scale search demo. Let's break down how it works and what we did to scale it.
Efficient RAG with Compression and Filtering
Discover efficient RAG with compression and filtering, with practical steps, examples, and best practices you can use now.
Effortlessly Loading and Processing Images with Lance: a Code Walkthrough
Working with large image datasets in machine learning can be challenging, often requiring significant computational resources and efficient data-handling techniques.
Designing a Table Format for ML Workloads
Explore designing a table format for ML workloads with practical insights and expert guidance from the LanceDB team.
Custom Datasets for Efficient LLM Training Using Lance
Learn how to build custom datasets for efficient LLM training using Lance, with practical steps, examples, and best practices you can use now.
Creating a FinTech AI Agent From Scratch
Explore building a FinTech AI agent from scratch, with practical insights and expert guidance from the LanceDB team.
Convert Any Image Dataset to Lance
In our article, we explored the remarkable capabilities of the Lance format, a modern, columnar data storage solution designed to revolutionize the way we work with large image datasets in machine learning.
Columnar File Readers in Depth: Structural Encoding
Deep dive into LanceDB's dual structural encoding approach - mini-block for small data types and full-zip for large multimodal data. Learn how this optimizes compression and random access performance compared to Parquet.
Columnar File Readers in Depth: Repetition & Definition Levels
Explore columnar file readers in depth: repetition & definition levels with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Compression Transparency
Explore columnar file readers in depth: compression transparency with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Column Shredding
Explore columnar file readers in depth: column shredding with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Backpressure
Streaming data applications can be tricky. When you can read data faster than you can process the data then bad things tend to happen. The various solutions to this problem are largely classified as backpressure.
Columnar File Readers in Depth: APIs and Fusion
The API used to read files has evolved over time, from simple full table reads to batch reads and eventually to iterative record batch readers. Lance takes this a step further to return a stream of read tasks.
Chunking Techniques with Langchain and LlamaIndex
In our last blog, we talked about chunking and why it is necessary for processing data through LLMs. We covered some simple techniques to perform text chunking.
Chunking Analysis: Which is the right chunking approach for your language?
Chunking analysis: which is the right chunking approach for your language? Practical insights and expert guidance from the LanceDB team.
Chat with Your Stats Using Langchain Dataframe Agent & LanceDB Hybrid Search
In this blog, we’ll explore how to build a chat application that interacts with CSV and Excel files using LanceDB’s hybrid search capabilities.
Netflix's Media Data Lake and the Rise of the Multimodal Lakehouse
How Netflix built a Media Data Lake powered by LanceDB and the Multimodal Lakehouse to unify petabytes of media assets for machine learning pipelines.
Case Study: Meet Dosu - the Intelligent Knowledge Base for Software Teams and Agents
How Dosu uses LanceDB to transform codebases into living knowledge bases with real-time search and versioning.
How Cognee Builds AI Memory Layers with LanceDB
How Cognee uses LanceDB to deliver durable, isolated, and low-ops AI memory from local development to managed production.
Case Study: How CodeRabbit Leverages LanceDB for AI-Powered Code Reviews
How CodeRabbit leverages LanceDB-powered context engineering to turn every review into a quality breakthrough.
Building RAG on codebases: Part 2
Building a Cursor-like @codebase RAG solution. Part 2 focuses on generating embeddings and the retrieval strategy, using a combination of techniques in LanceDB.
Building RAG on codebases: Part 1
Building a Cursor-like @codebase RAG solution. Part 1 focuses on indexing techniques, chunking strategies, and generating embeddings in LanceDB.
Branching and Shallow Cloning in Lance: Towards a "Git for AI Data"
A deep dive into how table formats handle version management for ML/AI experimentation, and how Lance unifies branching, tagging, and shallow clone on top of its multi-base architecture.
Better RAG with Active Retrieval Augmented Generation FLARE
By Akash A. Practical steps and examples for better RAG with active retrieval augmented generation (FLARE).
Benchmarking Random Access in Lance
In this short blog post we'll take you through some simple benchmarks to show the random access performance of the Lance format, with practical steps and examples.
Inverted File Product Quantization (IVF_PQ): Accelerate Vector Search by Creating Indices
Compress vectors with PQ and accelerate retrieval with IVF_PQ in LanceDB. The tutorial explains the concepts, memory savings, and a minimal implementation with search tuning knobs.
Benchmarking Cohere Rerankers with LanceDB
Improve retrieval quality by reranking LanceDB results with Cohere and ColBERT. You’ll plug rerankers into vector, FTS, and hybrid search and compare accuracy on real datasets.
AnythingLLM's Competitive Edge: LanceDB for Seamless RAG and Agent Workflows
Discover how AnythingLLM leveraged LanceDB's serverless architecture to eliminate vector database setup complexity, enabling seamless cross-platform RAG and agent workflows with zero configuration required.
Announcing Lance SDK 1.0.0: What This Milestone Means for the Community
We’re excited to announce that the core Rust SDK and the Python and Java binding SDKs are graduating to version 1.0.0, alongside a new, community-driven release strategy.
Agentic RAG Using LangGraph: Build an Autonomous Customer Support Agent
Build an autonomous customer support agent using LangGraph and LanceDB that automatically fetches, classifies, drafts, and responds to emails with RAG-powered policy retrieval.
David Myriel
Writer, Software Engineer

June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates · David Myriel · March 22, 2026 · Newsletter
What is the LanceDB Multimodal Lakehouse? · David Myriel · March 22, 2026 · Engineering
Building Semantic Video Recommendations with TwelveLabs and LanceDB · David Myriel · March 22, 2026 · Engineering
LanceDB's RaBitQ Quantization for Blazing Fast Vector Search · David Myriel, Yang Cen · March 22, 2026 · Engineering
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs · David Myriel · March 22, 2026 · Engineering
Netflix's Media Data Lake and the Rise of the Multimodal Lakehouse · David Myriel · March 22, 2026 · Case Study
How Cognee Builds AI Memory Layers with LanceDB · David Myriel, Vasilije Markovic · March 22, 2026 · Case Study
The AI-Native Multimodal Lakehouse. Built on top of the open source Lance format.
LanceDB
Blog · Careers
Resources
Documentation · Lance Format · Support
Legal
Terms · Policy · Security
© 2026 LanceDB Inc. All rights reserved.
Certifications:
AICPA SOC · GDPR · HIPAA