🦆 Lance x DuckDB SQL Retrieval, 🚗 Uber-Scale Storage, ⚡ 1.5M IOPS

February 9, 2026
Newsletter

🦆 Lance x DuckDB: SQL for Retrieval on the Multimodal Lakehouse Format

The Lance extension for DuckDB turns DuckDB into a SQL compute engine over Lance datasets, exposing vector, full-text, and hybrid retrieval as SQL table functions. This enables fully composable retrieval workflows — joins with eval data, reproducible top-k slicing, SQL-based debugging, and materialization back into Lance.

This extension bridges traditional SQL analytics with multimodal retrieval on a single open dataset format.

Read more →

🚗 Rethinking Table File Paths with Uber: Lance's Multi-Base Layout

Working with Uber's AI Infrastructure team, Lance introduced a multi-base layout to support production systems that need a single dataset to span multiple S3 buckets for parallel reads and writes.

By separating storage bases from file references, Lance enables multi-bucket and multi-region layouts with compact, relocatable metadata — allowing Uber to scale training and retrieval workloads without fragmenting datasets or rewriting metadata.
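
To make the idea of separating storage bases from file references concrete, here is a minimal sketch. It is not Lance's actual manifest schema: the `Manifest` class, its fields, and the bucket URIs are all hypothetical names chosen for illustration. It only shows why storing each base once and pointing files at it by id keeps metadata compact and relocatable.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Toy manifest (hypothetical): bases are stored once, files refer to them by id."""

    bases: dict[int, str] = field(default_factory=dict)          # base_id -> root URI
    files: list[tuple[int, str]] = field(default_factory=list)   # (base_id, relative path)

    def add_file(self, base_id: int, relative_path: str) -> None:
        # A file reference is only valid if its base is registered.
        if base_id not in self.bases:
            raise KeyError(f"unknown base id {base_id}")
        self.files.append((base_id, relative_path))

    def resolve(self) -> list[str]:
        """Expand every file reference into a full URI."""
        return [f"{self.bases[b].rstrip('/')}/{p}" for b, p in self.files]


# Illustrative bucket names, not real locations.
manifest = Manifest(bases={0: "s3://bucket-a/dataset", 1: "s3://bucket-b/dataset"})
manifest.add_file(0, "data/part-0.lance")
manifest.add_file(1, "data/part-1.lance")
manifest.add_file(1, "data/part-2.lance")

before = manifest.resolve()

# Relocating a base touches one entry; no file reference is rewritten.
manifest.bases[1] = "s3://bucket-c/dataset"
after = manifest.resolve()
```

In this toy model, moving every file under base `1` to another bucket is a single-entry change, which is the property the paragraph above describes as relocatable metadata.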

Read more →

📍 The Quest for One Million IOPS: Benchmarking Storage at Lance

Recent storage benchmarks in Lance reached up to 1.5 million IOPS by combining a scheduler rework with io_uring, showing that high random-access throughput depends more on reducing CPU overhead and context switching than on single-read latency.

This blog explains how this design better drives modern NVMe hardware for vector, text, and key-based lookups, and contrasts embedded and disaggregated architectures to show how LanceDB scales from single-process deployments to large, distributed systems.
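
One way to see why throughput is not simply a function of single-read latency is Little's law, which relates throughput, latency, and the number of requests in flight. The sketch below uses the 1.5 million IOPS figure from the post; the 100 microsecond read latency is an assumed, illustrative value and not a number from the benchmark.

```python
def required_in_flight(target_iops: float, latency_seconds: float) -> float:
    """Little's law: requests in flight = throughput * time each request takes."""
    return target_iops * latency_seconds


def achievable_iops(in_flight: int, latency_seconds: float) -> float:
    """The same law rearranged: throughput = requests in flight / latency."""
    return in_flight / latency_seconds


ASSUMED_LATENCY = 100e-6  # 100 microseconds per read (assumption, for illustration only)

# Roughly 150 concurrent reads are needed to sustain 1.5M IOPS at that latency.
needed = required_in_flight(1_500_000, ASSUMED_LATENCY)

# One read at a time tops out near 10,000 IOPS at the same latency.
serial = achievable_iops(1, ASSUMED_LATENCY)
```

Under these assumptions, reaching the target is a matter of keeping many reads in flight cheaply, which is where per-request CPU cost and context switching become the limiting factor.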

Read more →

📅 Upcoming Events

February Open Data + AI Meetup - Peninsula, Bay Area Edition — Thursday, February 12

Hear from speakers from LanceDB, Fivetran, Dremio, and typedef about what they're building and how they're defining the future of open data and AI.

Register →

NYC Lakehouse Meetup — Tuesday, February 17

We're bringing together the Apache Iceberg, Lance, and Apache DataFusion communities in NYC to chat about all things open lakehouse and data infrastructure at Cloudflare's NYC office!

Register →

🏗️ LanceDB Enterprise Updates

Feature Description
Add page cache prewarm API
  • Users can prewarm LanceDB tables using a LanceDB administrative API. Prewarming can also be limited to selected columns.
  • This is useful when data must be in the page cache before a specific workload runs. It is also useful for benchmarking.
Admission Control for Feature Engineering Jobs
  • Avoids deadlocks by rejecting jobs when the cluster does not have enough resources to execute them.
Adaptive Batch Sizing for Feature Engineering Job Checkpoints
  • Backfill jobs now adjust checkpoint size based on UDF execution time. Internal benchmarks show up to 2x performance improvements.
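
The admission control and adaptive batch sizing ideas above can be sketched in a few lines. This is a hypothetical illustration, not the LanceDB Enterprise implementation: `Resources`, `admit`, `next_checkpoint_size`, and the 60-second target are all assumed names and values.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpus: int
    memory_gb: int


def admit(job: Resources, free: Resources) -> bool:
    """Reject up front when the job cannot fit, instead of letting it wait indefinitely."""
    return job.cpus <= free.cpus and job.memory_gb <= free.memory_gb


def next_checkpoint_size(
    current_rows: int,
    seconds_per_row: float,
    target_seconds: float = 60.0,  # assumed target duration for one checkpoint
    min_rows: int = 1,
    max_rows: int = 100_000,
) -> int:
    """Pick a checkpoint size so one checkpoint takes roughly target_seconds."""
    if seconds_per_row <= 0:
        # No timing signal yet: grow cautiously from the current size.
        return min(max_rows, current_rows * 2)
    ideal = int(target_seconds / seconds_per_row)
    return max(min_rows, min(max_rows, ideal))


# A job asking for 8 CPUs is rejected when only 4 are free.
rejected = admit(Resources(cpus=8, memory_gb=32), Resources(cpus=4, memory_gb=64))

# A slow UDF (0.5 s per row) gets small checkpoints; a fast one hits the cap.
slow_udf_rows = next_checkpoint_size(1_000, seconds_per_row=0.5)
fast_udf_rows = next_checkpoint_size(1_000, seconds_per_row=0.0001)
```

In this sketch a slow UDF checkpoints often so little work is lost on failure, while a fast UDF amortizes checkpoint overhead across many rows.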

🌟 Open Source Releases

Project Description
Lance v1.0.1 - v1.0.4
Release notes
  • Multi-base storage layouts enabling a single dataset to span multiple buckets or regions for parallel reads and writes (#5790, #5801)
  • Faster query execution via tighter WAND block score bounds and reduced per-query overhead (#5668, #5696)
LanceDB v0.26 - v0.28
Release notes
  • DuckDB-powered SQL retrieval with vector, FTS, and hybrid search exposed as composable table functions (#2946, #2957)
  • Expanded embedding support (VoyageAI v4, multimodal) and improved ingestion robustness via parallel embedding computation and better remote query cancellation (#2959, #2887, #2896, #2913)
lance-graph v0.4.0 - v0.5.0
Release notes
  • Significantly expanded Cypher expressiveness with WITH clause chaining, COLLECT, and COUNT(DISTINCT …) support (#86, #85, #116)
  • Integrated vector search and similarity UDFs into graph queries, with improved execution efficiency on object stores (#80, #81, #83, #89, #96)
lance-context v0.2.0 - v0.2.1
Release notes
  • Core context store APIs for append, search, and versioned checkout across Python and Rust (#6, #11, #12, #24)
  • Improved runtime behavior with multimodal context support, background compaction, and reduced Python-side blocking during remote I/O (#9, #28, #29)
lance-duckdb v0.4.1 - v0.5.0
Release notes
  • Improved DuckDB integration with global aggregate pushdown and expanded vector search ergonomics, including ARRAY-based query vectors and tuning controls (#124, #119, #120)
lance-namespace v0.4.4 - v0.4.5
Release notes
  • New Lance partitioning specification for defining and operating on partitioned datasets (#279, #297)
lance-ray v0.1.0 - v0.2.0
Release notes
  • Distributed Ray-based IVF_SQ / PQ / FLAT index builder for scalable, parallel index creation (#67)
lance-spark v0.2.0
Release notes
  • Spark MERGE INTO support for upserts and deletes, plus vector search and distributed index creation for large-scale Spark pipelines (#172, #189, #171)

🫶 Community Contributions

Thank you to contributors from Uber, Netflix, Hugging Face, Bytedance, Huawei, Tencent, and Alibaba for improvements across embeddings, query robustness, storage compatibility, distributed indexing, Spark integration, and core format reliability in LanceDB, Lance, lance-spark, and lance-ray.

Notable contributions this month:

- [@fzowl](https://github.com/fzowl) — Added support for VoyageAI v4 and multimodal models, expanding first-class embedding options in LanceDB.
- [@dcfocus](https://github.com/dcfocus) — Delivered major Cypher features in lance-graph, including `COLLECT` aggregation, `WITH` clause query chaining, and foundational context APIs.
- [@ChunxuTang](https://github.com/ChunxuTang) — Expanded Cypher query capabilities with `COUNT(DISTINCT …)`, case-insensitive matching, and vector search operators.
- [@beinan](https://github.com/beinan) — Improved execution efficiency and deployability across lance-graph and lance-context, enabling more scalable production deployments.
- [@jja725](https://github.com/jja725) — Implemented background compaction for Lance fragments, improving long-running system performance.
- [@ex172000](https://github.com/ex172000) — Improved performance and correctness through executor fixes and parallelized embedding computation.
- [@fatelei](https://github.com/fatelei) — Prevented Python-side blocking by releasing the GIL during remote storage operations.
- [@wojiaodoubao](https://github.com/wojiaodoubao) — Introduced the Lance partitioning specification, enabling native support for partitioned datasets.
- [@chenghao-guo](https://github.com/chenghao-guo) — Implemented a Ray-based distributed IVF index builder, enabling scalable index construction.
- [@nyl3532016](https://github.com/nyl3532016) — Added vector search support to `lance-spark`, enabling similarity search in Spark pipelines.
- [@jiaoew1991](https://github.com/jiaoew1991) — Built a fragment-aware join optimizer to improve Spark query performance on Lance datasets.
- [@jtuglu1](https://github.com/jtuglu1) — Implemented distributed full-text search index creation in `lance-spark`.
- [@bryanck](https://github.com/bryanck) — Improved stability of `lance-spark` by fixing Kryo serialization and classloader issues.
- [@zhangyue19921010](https://github.com/zhangyue19921010) — Implemented Spark `MERGE INTO` support for upsert and delete operations on Lance tables.

We want to especially highlight the initial release of lance-context contributed by Uber.

A heartfelt thank you to our community contributors of Lance and LanceDB this past month:

@fzowl @dcfocus @ChunxuTang @beinan @jja725 @ex172000 @hushengquan @fatelei @ddupg @Mesut-Doner @amanharshx @Angryrou @youssef-tharwat @leiyuou @prrao87 @fenfeng9 @chyyran @camilesing @zhangyue19921010 @touch-of-grey @fredlarochelle @LuciferYang @lhoestq @majin1102 @yanghua @wojiaodoubao @lichuang @Ke-Wang @niebayes @HaochengLIU @markmcd @chenghao-guo @nyl3532016 @jiaoew1991 @jtuglu1 @bryanck @fangbo @majian1998 @hamersaw

🤝 Lance Community Sync Recap

In January, we held two Lance Community Syncs focused on the upcoming Lance 2.0.0 release (now at RC4 and approaching final community vote), growing ecosystem integrations with DuckDB, Polaris, and Hugging Face, and the formalization of lance-context and lance-graph as official sub-projects.

We also discussed recent performance work across Spark, vector indexing, and WAL/mem-table updates, alongside forward-looking proposals covering schema semantics, metadata visibility, clustering strategies, and a new Incubator governance stage for emerging projects.

The next Lance Community Sync will take place on Thursday, February 12, 2026.

ChanChan Mao
Developer Relations @ LanceDB
