Privacy Settings
Privacy Policy
We use cookies and other technologies across the categories below. Toggle any category to accept or reject the related data collection.
Required
Essential
Necessary for basic website functionality.
Marketing
Helps deliver targeted ads that are relevant to you.
Analytics
Provides insights on your usage and site interaction.
Personalization
Tailors your experience by remembering your settings.
Existing Privacy Settings Found
Submitted On
Mar 26, 2025 – 14:37:22 UTC
Your Consent ID
This ID lets you retrieve or verify your consent settings. It stays saved for {0} months or until you clear cookies — keep a copy in case you need to reference it.
AMC89MQ329407NSADRFA09MA7SD
Required
Essential Consent
Off
On
Marketing Consent
Off
On
Analytics Consent
Off
On
Personalization Consent
LanceDB uses cookies and similar technologies to improve your experience, analyze traffic, and to show you relevant content and advertising.
  • You can accept all, reject all, or customize your privacy settings.
  • Non-essential cookies are disabled by default.
  • Closing this banner does not confirm any choice.
  • See our Privacy Policy for more information.
Do not sell or share my personal information
See Privacy Policy for more info.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Why Multimodal Data Needs a Better Lakehouse — Download the Research Study
Agentic Coding as Community Stewardship
Agentic coding lets upstream maintainers turn ecosystem-wide mechanical updates into fast, contextual, low-friction contributions for downstream users.
One System, Many Workloads: Rethinking What "Multimodal" Means for AI
A practical definition of multimodal complexity, and how LanceDB’s Multimodal Lakehouse is built to address these challenges.
The Future of AI-Native Development is Local: Inside Continue's LanceDB-Powered Evolution
Discover how Continue revolutionized AI-native development with LanceDB's embedded TypeScript library, enabling lightning-fast semantic code search while maintaining complete developer privacy and offline capability.
Lance File Format 2.2: Taming Complex Data
Lance file format 2.2 introduces Blob V2, nested schema evolution, native Map type support, and additional compression and performance improvements for AI/ML data workloads.
Lance Blob V2: Making Multimodal Data a First-Class Citizen in the Lakehouse
How we redesigned blob storage in Lance to make multimodal data a first-class citizen, with four storage semantics (Inline, Packed, Dedicated, External) that automatically adapt to your workload.
Why LanceDB Is the Most Natural Memory Layer for OpenClaw
OpenClaw and similar personal autonomous agents need a local-first long-term memory layer. LanceDB fits that role with embedded deployment, filesystem-native storage, and multimodal retrieval.
OpenClaw + LanceDB + Seed 2.0: Turn Visual Ideas into Reality, Fast!
Pair OpenClaw with LanceDB and Seed 2.0 to turn visual ideas into working results, fast.
Memory for OpenClaw: From Zero to LanceDB Pro
Benchmarking three OpenClaw memory plugins on the LOCOMO dataset
A Guide to Uploading Lance Datasets on the Hugging Face Hub
Build a multimodal Lance dataset, publish it to the Hub, and query the precomputed vector + FTS indexes in LanceDB, without needing to download the dataset locally.
Zero Shot Image Classification with Vector Search
Learn zero-shot image classification with vector search, with practical steps, examples, and best practices you can use now.
WeRide's Data Platform Transformation: How LanceDB Fuels Model Development Velocity
Discover how WeRide, a leading autonomous driving company, leveraged LanceDB to revolutionize their data platform, achieving 90x improvement in ML developer productivity and reducing data mining time from 1 week to 1 hour.
Training a Variational AutoEncoder from Scratch with Lance File Format
Train a Variational Autoencoder end‑to‑end using Lance for fast, scalable data handling. You’ll set up the dataset, build the VAE in PyTorch, and run training, sampling, and reconstructions.
Track AI Trends: CrewAI Agents & RAG
This article shows how to build an AI trends searcher using CrewAI agents and their tasks. But before diving into that, let's first understand what CrewAI is and how we can use it for these applications.
Tokens per Second Is NOT All You Need
Explore why tokens per second is not all you need, with practical steps, examples, and best practices you can use now.
The Future of Open Source Table Formats: Apache Iceberg and Lance
Explore the future of open source table formats, Apache Iceberg and Lance, with practical insights and expert guidance from the LanceDB team.
The Case for Random Access I/O
One of the reasons we started the Lance file format and have been investigating new encodings is because we wanted a format with better support for random access.
LanceDB Raises $30M Series A to Build the Multimodal Lakehouse
We have closed another funding round to accelerate development of the Multimodal Lakehouse - a unified platform for AI data infrastructure.
SemanticDotArt: Rethinking Art Discovery with LanceDB
SemanticDotArt turns art discovery into a multimodal search experience, matching feelings, phrases, and images with LanceDB's hybrid retrieval.
Second Dinner's Secret Weapon: LanceDB-Powered RAG for Faster, Smarter Game Development
Discover how Second Dinner, creators of Marvel Snap, leveraged LanceDB Cloud to transform game development workflows, reducing prototyping time from months to hours and automating QA test generation with 81% better results.
Search Within an Image with Segment Anything
Learn how to search within an image with Segment Anything, with practical steps, examples, and best practices you can use now.
Scalable Computer Vision with LanceDB & Voxel51
Explore scalable computer vision with LanceDB and Voxel51, with practical steps, examples, and best practices you can use now.
Rethinking Table File Paths with Uber: Lance’s Multi-Base Layout
A tour of Lance's file path design, and how Lance’s new multi-base layout enables multi-location datasets (such as Uber’s multi-bucket setup) with minimal metadata rewrites.
RAG Isn't One-Size-Fits-All: Here's How to Tune It for Your Use Case
Great RAG comes from a tight iteration loop. Learn how to systematically improve each layer of your RAG system using Kiln and LanceDB.
Python Package to convert image datasets to Lance type
Explore a Python package for converting image datasets to the Lance format, with practical insights and expert guidance from the LanceDB team.
The Quest for One Million IOPS: Benchmarking Storage at LanceDB
Learn how LanceDB benchmarks storage and how we achieved one million disk reads per second.
November Feature Roundup
Explore November feature roundup with practical insights and expert guidance from the LanceDB team.
🛡️ Newly Knighted Lancelot, ▶️ TwelveLabs Semantic Video Recommendations, 🧠 Cognee's AI Memory Layer with LanceDB
Our September newsletter welcomes new Lancelot members, highlights TwelveLabs semantic video recommendations, Cognee’s AI memory layer, and shares the latest product and community updates.
🎨 Semantic.Art, 💾 Stable Lance 2.1, 🎥 Ray+LanceDB powers Netflix
Our October newsletter highlights Semantic.Art, Lance File 2.1, RaBitQ Quantization, upcoming events, latest product and community updates.
🛡️Lance Community Governance, Lance + Iceberg 🧊, Netflix’s Multimodal Search Demo 🔍
Our November newsletter highlights Lance community governance, a deep dive on Lance and Iceberg, a demo of Netflix's multimodal search, previous talk recordings, and the latest product and community updates.
June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates
LanceDB's June 2025 newsletter covering latest company news, product updates, open source releases, and community highlights.
⚖️ Harvey’s Enterprise-Grade RAG on LanceDB, 💼 Dosu Case Study, Minimax&LumaLabs❤️Lance-Ray
Our August newsletter features a new case study with Dosu, recaps from events with Harvey and Databricks, and the latest product and community updates.
🦆 Lance x DuckDB SQL Retrieval, 🚗 Uber-Scale Storage, ⚡ 1.5M IOPS
Kicking off 2026 with Lance-native SQL retrieval via DuckDB, Uber-scale multi-bucket storage, 1.5M IOPS benchmarks, and continued OSS momentum across the Lance ecosystem.
🤗 Lance x Hugging Face, 🪾Git-Style Branching, 🏔️ Geospatial in Lance
Native Lance support on Hugging Face Hub, Git-style branching and shallow clone for AI data, and Arrow-native geospatial with R-Tree indexing, plus steady OSS and community momentum.
💾 Lance SDK v1.0.0, 🗓️ 1st Lance Community Sync, 🔍 Wikisearch
Our December newsletter highlights Lance SDK v1.0.0, our upcoming Lance community sync, Wikisearch demo, and the latest product and community updates.
Netflix’s Media Data Lake ❤️ LanceDB, CodeRabbit 💼 Case Study, Lance Namespace
Our August newsletter highlights LanceDB powering Netflix's Media Data Lake, a case study on CodeRabbit's AI-powered code reviews, and updates on Lance Namespace and Spark integration.
My Summer Internship Experience at LanceDB
I'm Raunak, a master's student at the University of Illinois, Urbana-Champaign. This summer, I had the opportunity to intern as a Software Engineer at LanceDB, an early-stage startup based in San Francisco.
My SIMD Is Faster than Yours
The untold story of how we made LanceDB vector search fast, with practical steps and examples.
Multimodal Myntra Fashion Search Engine Using LanceDB
Build a multimodal fashion search engine with LanceDB and CLIP embeddings. Follow a step‑by‑step workflow to register embeddings, create the table, query by text or image, and ship a Streamlit UI.
What is the LanceDB Multimodal Lakehouse?
Introducing the Multimodal Lakehouse - a unified platform for managing AI data from raw files to production-ready features, now part of LanceDB Enterprise.
Multi Document Agentic RAG: a Walkthrough
A walkthrough of multi-document agentic RAG, with practical steps, examples, and best practices you can use now.
Modified RAG: Parent Document & Bigger Chunk Retriever
Learn about modified RAG with parent document and bigger-chunk retrievers, with practical steps, examples, and best practices you can use now.
MemGPT: OS Inspired LLMs That Manage Their Own Memory
Explore MemGPT, OS-inspired LLMs that manage their own memory, with practical steps, examples, and best practices you can use now.
Late Interaction & Efficient Multi-modal Retrievers Need More Than a Vector Index
Explore late interaction & efficient multi-modal retrievers need more than a vector index with practical insights and expert guidance from the LanceDB team.
Developers, Ditch the Black Box: Welcome to Continue
Remember flipping through coding manuals? Those quickly became relics with the rise of Google and Stack Overflow, a one-stop shop for developer queries.
Lance × Hugging Face: A New Era of Sharing Multimodal Data on the Hub
Announcing native read support for Lance format on Hugging Face Hub. You can now distribute your large multimodal datasets as a single, searchable artifact (including blobs, embeddings and indexes) all in one place!
Lance × DuckDB: SQL for Retrieval on the Multimodal Lakehouse Format
Use the Lance format as your lakehouse layer for retrieval, RAG and more, with the native Lance extension for DuckDB
Lance, Windows. Windows, Lance
It was Spring of 2012. After being an avid user for 2+ years, I finally decided to join Wes Mckinney and work on pandas full time.
Lance v2: A New Columnar Container Format
Explore Lance v2, a new columnar container format, with practical insights and expert guidance from the LanceDB team.
Productionalize AI Workloads with Lance Namespace, LanceDB, and Ray
Learn how to productionalize AI workloads with Lance Namespace's enterprise stack integration and the scalability of LanceDB and Ray for end-to-end ML pipelines.
Lance File 2.1 is Now Stable
The 2.1 file version is now stable, learn what that means for you and what's coming next.
Lance File 2.1: Smaller and Simpler
Explore Lance File 2.1, smaller and simpler, with practical insights and expert guidance from the LanceDB team.
Introducing Lance Data Viewer: A Simple Way to Explore Lance Tables
A lightweight open source web UI for exploring Lance datasets, viewing schemas, and browsing table data with vector visualization support.
Building the Future Together: Introducing Lance Community Governance
Announcing the formal governance structure for the Lance community, establishing clear pathways for contribution and leadership with a three-tier system of contributors, maintainers and PMC.
Manage Lance Tables in Any Catalog using Lance Namespace and Spark
Access and manage your Lance tables in Hive, Glue, Unity Catalog, or any catalog service using Lance Namespace with the latest Lance Spark connector.
Implementing Corrective RAG in the Easiest Way
Even though text-generation models are good at generating content, they sometimes fall short on factual accuracy. This happens because of the way they are trained.
Hybrid Search: RAG for Real-Life Production-Grade Applications
Learn about hybrid search and RAG for real-life, production-grade applications, with practical steps, examples, and best practices you can use now.
Hybrid Search: Combining BM25 and Semantic Search for Better Results with Langchain
Have you ever thought about how search engines find exactly what you're looking for? They usually use a mix of matching specific words and understanding the meaning behind them.
Hybrid Search and Custom Reranking with LanceDB
Combine keyword and vector search for higher‑quality results with LanceDB. This post shows how to run hybrid search and compare rerankers (linear combination, Cohere, ColBERT) with code and benchmarks.
Reduce Hallucinations from LLM-Powered Agents Using Long-Term Memory
Learn how to reduce hallucinations from LLM-powered agents using long-term memory, with practical steps, examples, and best practices you can use now.
Implement Contextual Retrieval and Prompt Caching with LanceDB
Learn how to implement contextual retrieval and prompt caching with LanceDB, with practical steps, examples, and best practices you can use now.
RAG with GRPO Fine-Tuned Reasoning Model
Explore RAG with a GRPO fine-tuned reasoning model, with practical insights and expert guidance from the LanceDB team.
GraphRAG: Hierarchical Approach to Retrieval-Augmented Generation
Explore GraphRAG, a hierarchical approach to retrieval-augmented generation, with practical insights and expert guidance from the LanceDB team.
GPU-Accelerated Indexing in LanceDB
Speed up vector index training in LanceDB with CUDA or Apple Silicon (MPS). See how GPU‑accelerated IVF/PQ training compares to CPU and how to enable it in code.
How We Added Geospatial Support To Lance With No New Code
How Lance's Arrow-native architecture enables first-class geospatial support through extension types, GeoDataFusion integration, and R-Tree indexing.
Building Semantic Video Recommendations with TwelveLabs and LanceDB
Build semantic video recommendations using TwelveLabs embeddings, LanceDB storage, and Geneva pipelines with Ray.
LanceDB's Geneva: Scalable Feature Engineering
Learn how to build scalable feature engineering pipelines with Geneva and LanceDB. This demo transforms image data into rich features including captions, embeddings, and metadata using distributed Ray clusters.
From BI to AI: A Modern Lakehouse Stack with Lance and Iceberg
A comparison of where Iceberg and Lance sit in the modern lakehouse stack. We highlight emerging architectures that are bridging the worlds of analytics and AI/ML workloads using these two formats, while being built on the same data foundation.
Setup Real-Time Multimodal AI Analytics with Apache Fluss (incubating) and Lance
Learn how to build real-time multimodal AI analytics by integrating Apache Fluss streaming storage with Lance's AI-optimized lakehouse. This guide demonstrates streaming multimodal data processing for RAG systems and ML workflows.
Columnar File Readers in Depth: Parallelism without Row Groups
Explore columnar file readers in depth: parallelism without row groups, with practical insights and expert guidance from the LanceDB team.
LanceDB's RaBitQ Quantization for Blazing Fast Vector Search
Introducing RaBitQ quantization in LanceDB for higher compression, faster indexing, and better recall on high‑dimensional embeddings.
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs
No more Tantivy! We stress-tested native full-text search in our latest massive-scale search demo. Let's break down how it works and what we did to scale it.
Efficient RAG with Compression and Filtering
Discover efficient RAG with compression and filtering, with practical steps, examples, and best practices you can use now.
Effortlessly Loading and Processing Images with Lance: a Code Walkthrough
Working with large image datasets in machine learning can be challenging, often requiring significant computational resources and efficient data-handling techniques.
Designing a Table Format for ML Workloads
Explore designing a table format for ML workloads with practical insights and expert guidance from the LanceDB team.
Custom Datasets for Efficient LLM Training Using Lance
Learn how to build custom datasets for efficient LLM training using Lance, with practical steps, examples, and best practices you can use now.
Creating a FinTech AI Agent From Scratch
Explore building a FinTech AI agent from scratch, with practical insights and expert guidance from the LanceDB team.
Convert Any Image Dataset to Lance
In our article, we explored the remarkable capabilities of the Lance format, a modern, columnar data storage solution designed to revolutionize the way we work with large image datasets in machine learning.
Columnar File Readers in Depth: Structural Encoding
Deep dive into LanceDB's dual structural encoding approach - mini-block for small data types and full-zip for large multimodal data. Learn how this optimizes compression and random access performance compared to Parquet.
Columnar File Readers in Depth: Repetition & Definition Levels
Explore columnar file readers in depth: repetition & definition levels with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Compression Transparency
Explore columnar file readers in depth: compression transparency with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Column Shredding
Explore columnar file readers in depth: column shredding with practical insights and expert guidance from the LanceDB team.
Columnar File Readers in Depth: Backpressure
Streaming data applications can be tricky. When you can read data faster than you can process the data then bad things tend to happen. The various solutions to this problem are largely classified as backpressure.
Columnar File Readers in Depth: APIs and Fusion
The API used to read files has evolved over time, from simple full table reads to batch reads and eventually to iterative record batch readers. Lance takes this a step further to return a stream of read tasks.
Chunking Techniques with Langchain and LlamaIndex
In our last blog, we talked about chunking and why it is necessary for processing data through LLMs. We covered some simple techniques to perform text chunking.
Chunking Analysis: Which is the right chunking approach for your language?
Chunking analysis: which is the right chunking approach for your language? Practical insights and expert guidance from the LanceDB team.
Chat with Your Stats Using Langchain Dataframe Agent & LanceDB Hybrid Search
In this blog, we’ll explore how to build a chat application that interacts with CSV and Excel files using LanceDB’s hybrid search capabilities.
Netflix's Media Data Lake and the Rise of the Multimodal Lakehouse
How Netflix built a Media Data Lake powered by LanceDB and the Multimodal Lakehouse to unify petabytes of media assets for machine learning pipelines.
Case Study: Meet Dosu - the Intelligent Knowledge Base for Software Teams and Agents
How Dosu uses LanceDB to transform codebases into living knowledge bases with real-time search and versioning.
How Cognee Builds AI Memory Layers with LanceDB
How Cognee uses LanceDB to deliver durable, isolated, and low-ops AI memory from local development to managed production.
Case Study: How CodeRabbit Leverages LanceDB for AI-Powered Code Reviews
How CodeRabbit leverages LanceDB-powered context engineering to turn every review into a quality breakthrough.
Building RAG on codebases: Part 2
Building a Cursor-like @codebase RAG solution. Part 2 focuses on generating embeddings and the retrieval strategy, using a combination of techniques in LanceDB.
Building RAG on codebases: Part 1
Building a Cursor-like @codebase RAG solution. Part 1 focuses on indexing techniques, chunking strategies, and generating embeddings in LanceDB.
Branching and Shallow Cloning in Lance: Towards a "Git for AI Data"
A deep dive into how table formats handle version management for ML/AI experimentation, and how Lance unifies branching, tagging, and shallow clone on top of its multi-base architecture.
Better RAG with Active Retrieval Augmented Generation FLARE
By Akash A. Practical steps and examples for better RAG with active retrieval augmented generation (FLARE).
Benchmarking Random Access in Lance
In this short blog post we'll take you through some simple benchmarks to show the random access performance of the Lance format, with practical steps and examples.
Inverted File Product Quantization (IVF_PQ): Accelerate Vector Search by Creating Indices
Compress vectors with PQ and accelerate retrieval with IVF_PQ in LanceDB. The tutorial explains the concepts, memory savings, and a minimal implementation with search tuning knobs.
Benchmarking Cohere Rerankers with LanceDB
Improve retrieval quality by reranking LanceDB results with Cohere and ColBERT. You’ll plug rerankers into vector, FTS, and hybrid search and compare accuracy on real datasets.
AnythingLLM's Competitive Edge: LanceDB for Seamless RAG and Agent Workflows
Discover how AnythingLLM leveraged LanceDB's serverless architecture to eliminate vector database setup complexity, enabling seamless cross-platform RAG and agent workflows with zero configuration required.
Announcing Lance SDK 1.0.0: What This Milestone Means for the Community
We’re excited to announce that the core Rust SDK and the Python and Java binding SDKs are graduating to version 1.0.0, alongside a new, community-driven release strategy.
Agentic RAG Using LangGraph: Build an Autonomous Customer Support Agent
Build an autonomous customer support agent using LangGraph and LanceDB that automatically fetches, classifies, drafts, and responds to emails with RAG-powered policy retrieval.
David Myriel
Writer, Software Engineer

June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates · David Myriel · March 22, 2026 · Newsletter
What is the LanceDB Multimodal Lakehouse? · David Myriel · March 22, 2026 · Engineering
Building Semantic Video Recommendations with TwelveLabs and LanceDB · David Myriel · March 22, 2026 · Engineering
LanceDB's RaBitQ Quantization for Blazing Fast Vector Search · David Myriel, Yang Cen · March 22, 2026 · Engineering
LanceDB WikiSearch: Native Full-Text Search on 41M Wikipedia Docs · David Myriel · March 22, 2026 · Engineering
Netflix's Media Data Lake and the Rise of the Multimodal Lakehouse · David Myriel · March 22, 2026 · Case Study
How Cognee Builds AI Memory Layers with LanceDB · David Myriel, Vasilije Markovic · March 22, 2026 · Case Study
The AI-Native Multimodal Lakehouse. Built on top of the open source Lance format.
LanceDB
Blog · Careers
Resources
Documentation · Lance Format · Support
Legal
Terms · Policy · Security
© 2026 LanceDB Inc. All rights reserved.
Certifications:
AICPA SOC · GDPR · HIPAA