
Lance File 2.1 is Now Stable

October 3, 2025
Engineering

The 2.1 version of the Lance file format has been in beta for a while now and we’re excited to announce that it is now stable. This means we’ve documented the spec, we are committing to backwards compatibility for 2.1, and any potential breaking changes will now be part of 2.2.

Compression Without Impacting Random Access

A recent paper measured random access performance and stated:

"Lance is the fastest because it does not have cascading encoding or compression like the others, enabling direct offset calculation for certain types (e.g., integers) and minimizing read amplification."

This lack of compression was a significant limitation of 2.0. That’s why the primary goal of the 2.1 format has been to introduce cascading encoding and compression without sacrificing random access performance. We wrote more about the structural encodings that enable this feature and studied this in more depth in our research paper. I’m happy to report that we were able to achieve our goal and avoid impacts to random access performance.

I guess this means we’re still the fastest.

Other Benefits

In addition to the compression benefits, there are a few other minor benefits added in the 2.1 format:

  • Fewer IOPS when reading nested data (lists and structs).
  • Support for distinguishing between null structs and null values.
  • Optional repetition index caching to further reduce IOPS of variable-width data at the expense of more memory usage.

How to Upgrade

The data storage version is set on a per-dataset basis. If you are happy with your dataset performance with 2.0, there is no pressure to upgrade. We will of course continue to maintain 2.0. If you would like to take advantage of the new features in 2.1 then you will need to make a copy of your dataset. The simplest way to do this is to read the existing dataset and write it back out with the new version:

import lance

ds = lance.dataset("my_2_0_dataset")
lance.write_dataset(ds, "my_2_1_dataset", data_storage_version="2.1")

Should I Upgrade?

Some workflows will not significantly benefit. Vector embeddings, images, and audio are all pre-compressed and often make up the majority of the data in a dataset, so there may not be much total impact. Compressing the smaller columns will still speed up scans of those columns but not all workflows rely on scans if they make good use of secondary indices.

The most likely workflows to benefit will be those that scan smaller columns, as these workloads are typically bound by disk bandwidth. You may want to try converting a subset of your data to see if there is a meaningful reduction in size or improvement in performance.
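A minimal sketch of that subset experiment, assuming the `lance` Python package (0.38.0 or higher) is installed. The helper names `dir_size_bytes` and `compare_storage_versions`, the sample size, and the output paths are illustrative assumptions and are not part of Lance itself.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def compare_storage_versions(source_uri, out_dir, sample_rows=100_000):
    """Write the same sample as 2.0 and as 2.1 and return the on-disk sizes.

    Hypothetical helper: assumes `source_uri` is an existing Lance dataset
    and `out_dir` is an empty local directory.
    """
    import lance  # imported lazily; needs a release that supports 2.1

    # Take the first `sample_rows` rows as an in-memory Arrow table.
    sample = lance.dataset(source_uri).head(sample_rows)

    sizes = {}
    for version in ("2.0", "2.1"):
        target = os.path.join(out_dir, "sample_" + version.replace(".", "_"))
        lance.write_dataset(sample, target, data_storage_version=version)
        sizes[version] = dir_size_bytes(target)
    return sizes
```

Calling `compare_storage_versions("my_2_0_dataset", "/tmp/lance_sample")` would return a dictionary of byte counts keyed by storage version. The first rows of a dataset are not always representative, so a random sample may give a better estimate. Size is only half of the picture: timing a few representative scans against both copies covers the performance side.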

Ensuring a Smooth Transition (Even if you Don’t Upgrade)

The 0.38.0 release of Lance is the first release to fully support reading 2.1 files. You could potentially run into trouble if you are reading a dataset with older versions of Lance while writing 2.1 files with newer versions of Lance.

As a result, we recommend upgrading all of your software to 0.38.0 or higher before you start writing 2.1 files. To facilitate this, we are not making 2.1 the default file format in 0.38.0. You will still need to opt-in to 2.1 when writing a dataset:

# In 0.38.0 you still need to opt-in to 2.1
lance.write_dataset(data, "my_2_1_dataset", data_storage_version="2.1")

We will be changing the default in the near future (potentially the next release). If you are planning on keeping old versions of Lance around for some time you should ensure you explicitly set the 2.0 version:

# You should explicitly set the data storage version to 2.0 if you
# plan on running a mixed environment with both older and newer versions
# of Lance.
lance.write_dataset(data, "my_2_0_dataset", data_storage_version="2.0")

In both cases this is only a concern for creating new datasets. Adding data to an existing dataset or updating data will always use the data storage version of the dataset.
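The rollout advice in this section can be turned into a small guard. The sketch below assumes version strings of the form `major.minor.patch` with an optional suffix, and picks 2.1 only when every Lance release in the environment is 0.38.0 or higher. The names `parse_version`, `can_read_2_1`, and `choose_storage_version` are hypothetical and are not part of Lance.

```python
import re

# First Lance release that fully supports reading 2.1 files (per this post).
MIN_VERSION_FOR_2_1 = (0, 38, 0)


def parse_version(version):
    """Parse the leading 'major.minor.patch' of a version string into a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def can_read_2_1(version):
    """Return True if the given release is 0.38.0 or higher.

    Pre-release suffixes (e.g. '-beta.1') are ignored, which is a simplification.
    """
    return parse_version(version) >= MIN_VERSION_FOR_2_1


def choose_storage_version(reader_versions):
    """Pick '2.1' only if every listed reader can read it, otherwise '2.0'."""
    if all(can_read_2_1(v) for v in reader_versions):
        return "2.1"
    return "2.0"
```

The result can be passed straight to the writer, for example `lance.write_dataset(data, "my_dataset", data_storage_version=choose_storage_version(versions))`, where `versions` is a list you maintain of the Lance releases deployed across your readers (the local one is available as `lance.__version__`).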

What’s Next?

Work on 2.2 has already begun. It is too early to say what will be in it for sure but we are excited to share some previews of ideas we are working on.

We want to make it simpler to migrate

We expect 2.1 to be the last version that will require a dataset copy to upgrade. 2.1 has established an overall structure for file readers that will be consistent regardless of what new encodings are added. As a result, we are hoping to support mixed-version datasets by the time we release 2.2.

Some cases need better compression

Our goal in 2.1 was to establish the overall strategy for compression and define nice easy-to-implement traits for compression algorithms. We also implemented a number of popular lightweight compression techniques. However, there are still gaps in our compression coverage that we hope to fill in 2.2. If you love columnar compression and are interested in contributing, then a lot of these gaps might be nice starting issues. Look for some of the good_first_issue tags in the 2.2 milestone.

We want to fully support struct packing

We have supported struct packing for fixed-width fields for a while now. However, without support for variable-width fields it is difficult to use the packing feature to its fullest potential. We hope to add support for variable-width fields soon. This will allow for flexible trade-offs between row-major and column-major storage. This is important for use cases like model training from cloud storage, which can be dominated by random access read patterns on smaller materialized subsets of the data.

We plan to investigate better JSON encoding

We’ve been ramping up our JSON support in the table format with the addition of JSON indexes. We are also exploring how we can best store JSON data in the file format. Existing approaches we are looking at include JSONB and the new Parquet Variant data type. We hope to have more details on this in the future.

Join the Conversation

We’re always happy to chat more on our Discord and on GitHub. Feel free to ask for more details or help us find better ways to do things! If something doesn’t work or could be faster then let us know too.

Weston Pace
Data engineer from the open source space, working on LanceDB, Arrow, Substrait.
