Prashanth Rao
Ayush Chaurasia

Faster VLM Fine-Tuning With Materialized Model Features in LanceDB

June 24, 2026
Engineering

Fine-tuning a vision-language model (VLM) is often viewed as purely a modeling problem: pick a base model and fine-tuning technique (e.g., QLoRA), point it at your data, and train. But once you’ve gone through this workflow end to end, you realize that most of the friction lives one layer below, in the data pipeline.

During training, two problems show up time and again. The first is wasted compute. Most pipelines recompute, on every training step, image embeddings that only ever needed to be computed once.

Those embeddings come out of the vision tower: the image-processing side of a VLM. It takes the image pixels, runs them through a vision encoder, and projects the result into the same hidden space the language model uses for text tokens. In this QLoRA setup, that vision tower stays frozen during fine-tuning, so it returns the exact same output for a given image on every epoch.

The second problem is data sprawl. In traditional pipelines, derived features get relegated to standalone files scattered around the dataset, where they drift away from the rows they describe. That makes it hard to reproduce experiments reliably and to know what produced a given feature.

The cleaner shape is to keep the raw data and the features derived from it together as columns in place, which is LanceDB’s approach. Instead of trying to coordinate images, prompts, labels, embeddings, and metadata across separate files, every training example becomes a queryable row inside a Lance table, with the raw inputs and materialized features living side by side.

In this post, we’ll illustrate that fine-tuning a VLM is as much a data-management problem as it is a modeling one, and once we view it that way, it will become clear how the features of Lance (the format) and LanceDB (the platform) make this whole process simpler.

The key idea: Materialize the expensive part, once

The solution follows straight from the problem: if the vision tower returns the same embeddings every epoch, we should compute them once and store them. So we run each image through the vision tower a single time and write the result into a column on the same table, vision_tower_hiddens.

Our running example throughout is Qwen2.5-VL-3B, fine-tuned with QLoRA on TextVQA, a benchmark that asks the model to answer questions about text written inside an image. At that model’s 560px input size, each image reduces to 400 tokens at the language model’s hidden size of 2048, so the materialized column holds a fixed-size fp16[400, 2048] tensor per row.
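
The fixed shape follows from simple arithmetic. The sketch below is a back-of-the-envelope check, and it assumes two details this post doesn't state: that the vision encoder uses 14px patches and that a 2x2 patch merger sits in front of the language model. The constant names are ours, purely for illustration.

```python
# Back-of-the-envelope check of the fp16[400, 2048] shape.
# Assumptions (not stated in the post): 14 px ViT patches and a 2x2 patch merger.
IMAGE_SIDE_PX = 560    # square input size, from the post
PATCH_PX = 14          # assumed ViT patch size
MERGE = 2              # assumed spatial merge factor (2x2 patches -> 1 token)
HIDDEN_SIZE = 2048     # language model hidden size, from the post
BYTES_PER_FP16 = 2

patches_per_side = IMAGE_SIDE_PX // PATCH_PX        # patches along one edge
tokens_per_side = patches_per_side // MERGE         # tokens along one edge after merging
tokens_per_image = tokens_per_side ** 2

values_per_row = tokens_per_image * HIDDEN_SIZE     # fp16 values stored per row
bytes_per_row = values_per_row * BYTES_PER_FP16

print(tokens_per_image, values_per_row, bytes_per_row)
print(f"{bytes_per_row / 2**20:.2f} MiB per image")
```

Under those assumptions each row carries about 1.6 MB of embeddings, which is why reading only the rows a batch needs matters later on.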

With the embeddings materialized, the training loop never touches the vision tower. Instead of decoding an image and running the encoder on every step, it reads them straight off the table and splices them into the language model’s input at the image-token positions. We can drop the vision tower from the model entirely, leaving nothing in the loop but the language model’s own forward and backward pass.

Without materialization, every training step decodes the image and runs the frozen vision tower before the language model. With it, the loop reads precomputed embeddings from the vision_tower_hiddens column and runs only the language model.

While the idea above is straightforward, it might be challenging to implement using a traditional data stack. When precomputed features are scattered across flat files, a metadata database, and object storage, they’re hard to stream to the GPU fast enough to keep it sufficiently fed, and every new experiment slows down because the data is not in one place.

Lance and LanceDB provide three specific benefits, making this approach both fast and practical:

Benefit | Enabled by | Details
Creating the feature cheaply | Lance format | Data evolution appends vision_tower_hiddens and the tokenized text columns as new columns, with no dataset rewrite and no extra files to keep in sync.
Reading it efficiently | Lance format | Fixed-size list columns decode efficiently, and the Permutation API serves the shuffled, random-access reads training needs every epoch without dragging along data the batch doesn't use.
Iterating rapidly | Geneva (LanceDB Enterprise) | Express each feature as a plain Python UDF and backfill it across the entire corpus, scaling the same code from a one-line CPU function to a GPU pass over every image.

We’ll unpack these one at a time.

Benefit 1: Creating the feature is cheap

When operating over Parquet-based table formats, adding a new column means rewriting the dataset (or large parts of it), because of Parquet’s row-group design and how data is laid out on disk. For a large multimodal corpus where every row already stores image bytes, that’s an enormous amount of data to rewrite just to add one derived feature column, which may be tiny compared with the full table.

To avoid this problem, teams typically leave the original data alone, write the new feature to separate files, and join everything back at load time. Even though the write stays cheap, there’s now a coordination problem to manage, which can slow down experimentation over time, as ad hoc scripts need to be written to manage and track what changed.

Lance is built for data that grows in two dimensions, rows and columns, a capability it calls “data evolution”. Appending a column writes only that column’s data plus a new version of the table’s manifest; the existing columns are never rewritten. The new feature lands in the same table as the data it describes, with versioning and provenance baked into the whole design.

This is the capability we need to materialize a feature column. We connect through LanceDB, open the table, and grow its schema with add_columns. For a column the database can compute on its own, we hand it a SQL expression and Lance materializes the values into a new column, leaving the existing data untouched:

import lancedb

db = lancedb.connect("data")          # directory holding the table
table = db.open_table("textvqa")

# a cheap derived column, computed in-database, no rewrite of the existing columns
table.add_columns({"question_length": "length(question)"})

The expensive columns grow the schema the same way: vision_tower_hiddens is just another column on this table, but the heavy lifting happens in a separate backfill step. That backfill can be applied manually by constructing PyArrow records in batches, and looping through the entire dataset, but Geneva (the feature engineering package that ships with LanceDB Enterprise) makes this a lot simpler, and we’ll get to it in Benefit 3.

The highlight is that the Lance format makes the column cheap to add. Because the existing columns are never rewritten, raw data and all derived features live in one table. That table becomes a single source of truth for exploration, curation, training, and evaluation, with no custom metadata log to maintain and no second copy that drifts out of sync.

Benefit 2: Reading it back during training is cheap

Training use cases tend to impose punishing access patterns on the storage layer: shuffled batches of random rows, mixed in with sequential scans, every epoch. Lance handles these mixed workloads well because its fixed-size list columns store each row’s tensor at a known offset, so it jumps straight to the rows a batch wants and reads only those bytes.
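
To see why a known offset helps, here is a toy model of the idea in plain Python. It is not Lance's actual on-disk encoding: we simply pack equal-length fp16 rows back to back in a buffer and fetch any row by multiplying its index by the row width. The helper names `write_rows` and `read_row` are hypothetical.

```python
import io
import struct

def write_rows(buf, rows):
    """Write equal-length rows of fp16 values back to back; return the row width."""
    width = len(rows[0])
    for row in rows:
        assert len(row) == width, "fixed-size layout requires equal-length rows"
        buf.write(struct.pack(f"<{width}e", *row))   # 'e' = IEEE 754 half precision
    return width

def read_row(buf, index, width):
    """Seek straight to one row by arithmetic on its index; neighbours are never read."""
    row_bytes = width * 2                            # 2 bytes per fp16 value
    buf.seek(index * row_bytes)
    return list(struct.unpack(f"<{width}e", buf.read(row_bytes)))

buf = io.BytesIO()
rows = [[float(r), r + 0.5, r + 0.25, -float(r)] for r in range(6)]   # toy width of 4
width = write_rows(buf, rows)

for idx in (4, 1, 5):          # a "shuffled batch" of row indices
    print(idx, read_row(buf, idx, width))
```

Each lookup costs one seek and one read of exactly `width * 2` bytes, regardless of where the row sits in the buffer.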

Parquet, on the other hand, packs rows into large row groups, so a random batch forces it to decode whole row groups just to recover a handful of rows. To test the gap in practice, we converted two column groups to uncompressed Parquet and timed their sequential and shuffled reads against Lance:

Read throughput on this small multimodal table of 1,000 rows. The Parquet data is uncompressed, so this compares layout and access patterns, not a specific codec. Higher is better. The full code to reproduce this benchmark is available in the example repo.

Parquet is fastest in the one case training rarely sees: a sequential scan of the raw columns. But in the access patterns training actually uses, Lance pulls ahead. Shuffled raw batches stay fast (2,613 rows/s in Lance vs 352 in Parquet, roughly 7x), and on the materialized fp16 vision vectors Lance is ~16x faster even sequentially (1,452 vs 90 rows/s). Parquet’s fp16 shuffled read is slow enough that we skip it in this benchmark, while Lance’s holds up at 2,149 rows/s.

From the user’s perspective, this fast path is exposed through a convenient abstraction: LanceDB’s Permutation API, which lets AI engineers express shuffled, projected, random-access reads without dropping down to the format internals. We’ll put it to work when we wire up the training loop later in the post.
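
Conceptually, a shuffled, projected read boils down to two things: a fresh ordering of row indices each epoch, and a fetch that touches only the requested columns for those rows. The sketch below is a plain-Python stand-in for that idea over a toy dictionary of columns. It does not use or mirror the LanceDB API, and `epoch_batches` and `take` are hypothetical names.

```python
import random

def epoch_batches(num_rows, batch_size, seed):
    """Yield batches of row indices covering every row once, in shuffled order."""
    order = list(range(num_rows))
    random.Random(seed).shuffle(order)           # a different seed gives a different order
    for start in range(0, num_rows, batch_size):
        yield order[start:start + batch_size]

def take(table, indices, columns):
    """Return only the requested columns for the requested rows."""
    return {name: [table[name][i] for i in indices] for name in columns}

# A toy columnar "table": column name -> list of values.
table = {
    "image": [f"<bytes {i}>" for i in range(10)],      # never read below
    "input_ids": [[i, i + 1] for i in range(10)],
    "labels": [i % 2 for i in range(10)],
}

for epoch in range(2):
    for indices in epoch_batches(len(table["labels"]), batch_size=4, seed=epoch):
        batch = take(table, indices, columns=["input_ids", "labels"])
        print(epoch, indices, batch["labels"])
```

Seeding the shuffle per epoch keeps each epoch's order reproducible, and the `image` column is never touched because the projection leaves it out.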

Benefit 3: Iterating on the feature is fast

A multimodal dataset rarely stays still for long. Feature engineering work starts with something trivial like counting OCR tokens (a CPU function that runs in seconds), then grows into something heavy like running a frozen vision tower over the whole corpus on a GPU. The cheap end is easy to script, but the expensive end is where iteration slows down, because now you’re hand-rolling batching, GPU placement, checkpointing, and retries just to compute one column.

Benefit 1 showed how the Lance format makes backfilling into new columns cheap, but something still needs to compute the values. Geneva provides that engine. You define a feature as a Python UDF, and the same abstraction works whether it’s a one-line CPU function or a stateful class that lazy-loads a model onto the GPU:

import pyarrow as pa
from geneva.transformer import udf

@udf(data_type=pa.int32(), input_columns=["ocr_tokens"])
def ocr_token_count(ocr_tokens: list[str] | None) -> int:    # Tier 1: CPU, runs in seconds
    return len(ocr_tokens) if ocr_tokens else 0

@udf(data_type=pa.list_(pa.float16(), 400 * 2048), input_columns=["image"])
class VisionTowerEmbedder:                                   # Tier 3: loads Qwen's ViT on GPU
    def __call__(self, image: bytes) -> list[float]:
        ...   # decode -> frozen vision tower -> fp16[400, 2048]
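
The "lazy-loads a model" part is a general pattern worth spelling out. Below is a minimal sketch of it in plain Python, with no Geneva dependency: the callable holds no model at construction time and loads one on its first call. `LazyEmbedder` and `load_dummy_model` are hypothetical names, and the dummy "model" just scales bytes to floats so the example runs anywhere.

```python
class LazyEmbedder:
    """Callable that defers an expensive model load until the first call."""

    def __init__(self, load_model):
        self._load_model = load_model   # zero-argument factory for the model
        self._model = None              # nothing is loaded at construction time
        self.load_count = 0

    def __call__(self, image: bytes) -> list[float]:
        if self._model is None:         # first call: load once, then reuse
            self._model = self._load_model()
            self.load_count += 1
        return self._model(image)

def load_dummy_model():
    """Hypothetical stand-in for loading a vision tower onto a GPU."""
    return lambda image: [b / 255 for b in image]   # "embed" each byte as a float

embedder = LazyEmbedder(load_dummy_model)
outputs = [embedder(bytes([0, 51, 255])), embedder(bytes([102]))]
print(outputs, embedder.load_count)
```

Deferring the load keeps the object cheap to construct and pass around, so the expensive setup happens only where the function actually runs, and only once per instance.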

To materialize a new column, we register the UDF on the table and let Geneva run the backfill across the corpus, distributing the work, checkpointing progress, and scaling concurrency so we don’t have to:

import geneva

g = geneva.connect("data")
table = g.open_table("textvqa")

table.add_columns({"vision_tower_hiddens": VisionTowerEmbedder})
table.backfill("vision_tower_hiddens", concurrency=8)

Using these patterns, you can easily define transforms from the cheap Tier 1 text columns used to curate a training slice, through a Tier 2 perceptual hash for near-duplicate detection, all the way to the Tier 3 GPU embeddings the training loop reads.
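
To make the Tier 2 idea concrete, here is a minimal sketch of one common perceptual hash, the average hash, in plain Python. This post doesn't say which hash the pipeline uses, so treat the choice as an assumption. Real images would first be converted to grayscale and downscaled to a small grid; the 4x4 grids below stand in for that step.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

original = [[10, 200, 10, 200],
            [200, 10, 200, 10],
            [10, 200, 10, 200],
            [200, 10, 200, 10]]
brighter = [[v + 20 for v in row] for row in original]   # same picture, lighter exposure
inverted = [[210 - v for v in row] for row in original]  # a different picture

h0, h1, h2 = average_hash(original), average_hash(brighter), average_hash(inverted)
print(hamming(h0, h1), hamming(h0, h2))
```

A small Hamming distance flags a likely near-duplicate, and the threshold is a curation choice. Here the brightened copy hashes identically to the original while the inverted grid differs in every bit.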

Adding a new feature means writing one function and running its backfill, not standing up a new pipeline, and that’s what keeps experiment turnaround short.

How fine-tuning is done

Let’s briefly go over the fine-tuning task that’s enabled by the steps above. TextVQA is a harder variant of conventional visual question answering: the model has to answer questions about text that appears inside the image, for example:

  • The airline or brand name on a sugar packet
  • The time on a phone screen

The task expects the model to actually reason over the image and its textual contents rather than just recognize the objects in it. The base model we’ll use is Qwen2.5-VL-3B-Instruct, and our goal is to improve its performance by fine-tuning it with the QLoRA method.

A single TextVQA row contains both the raw example used for exploration and the derived columns the training loop consumes. Geneva backfills the expensive features once, then the Permutation API reads only the columns needed for each shuffled batch.

Keeping fine-tuning small with QLoRA

Fine-tuning a model this size the naive way means updating all of its billions of weights, which needs a matching amount of GPU memory. QLoRA gets around that on two fronts. LoRA (Low-Rank Adaptation) freezes the base model and trains only a small set of low-rank adapter matrices inserted into the language model’s attention projections, so we update a tiny fraction of the weights.

The “Q” (quantized LoRA) loads that frozen base in 4-bit precision, shrinking its memory footprint further. Together they bring a 3B vision-language model within reach of a single small GPU, around 5 GB of VRAM in this example.
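
Some rough arithmetic shows where the savings come from. The numbers below are assumptions for illustration, not values from this run: a round 3 billion base weights, rank-16 adapters, and a square 2048x2048 projection. Real projection shapes vary, and quantization metadata, activations, and optimizer state add to the total.

```python
# Rough memory and parameter arithmetic behind QLoRA.
# Assumptions (not from the post): 3e9 base weights, rank-16 adapters,
# and a square 2048x2048 attention projection.
BASE_PARAMS = 3_000_000_000
HIDDEN = 2048
RANK = 16

fp16_gb = BASE_PARAMS * 2 / 1e9        # 2 bytes per weight
int4_gb = BASE_PARAMS * 0.5 / 1e9      # 4 bits = half a byte per weight

full_matrix = HIDDEN * HIDDEN                  # weights in one frozen projection
lora_matrices = RANK * (HIDDEN + HIDDEN)       # A is [RANK, HIDDEN], B is [HIDDEN, RANK]
fraction = lora_matrices / full_matrix

print(f"base weights: {fp16_gb:.1f} GB in fp16 vs {int4_gb:.1f} GB in 4-bit")
print(f"one projection: {full_matrix:,} frozen vs {lora_matrices:,} trainable ({fraction:.2%})")
```

Under these assumptions the frozen weights shrink by 4x and each adapted projection trains under 2% as many parameters as the matrix it sits beside.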

From the table into the model

The vision tower stays frozen throughout fine-tuning, which is exactly why we could materialize its output ahead of time. So we take it out of the training model altogether and load only the 4-bit quantized language model with its LoRA adapters.

Each training step pulls a shuffled batch of the materialized columns through LanceDB’s Permutation API, the abstraction we mentioned in Benefit 2. It gives AI engineers a simple way to express shuffled, projected reads, while underneath it leans on everything the Lance format provides: fixed-size list columns fetched by offset and streamed to the GPU as Arrow batches, with no row groups to re-decode and no per-row Python overhead.

from lancedb.permutation import Permutation

perm = (
    Permutation.identity(table)
    .select_columns(["vision_tower_hiddens", "input_ids", "attention_mask", "labels"])
    .with_format("arrow")
)

This yields a stream of Arrow record batches, one per step. A small collate function turns each batch into the tensors the model expects, reshaping the flat vision_tower_hiddens back into [batch, 400, 2048] and converting the token columns into integer tensors:

import torch

def collate(batch):                      # batch: a pa.RecordBatch from the Permutation
    n = batch.num_rows
    vision = batch.column("vision_tower_hiddens").values.to_numpy(zero_copy_only=False)
    return {
        "vision_hiddens": torch.from_numpy(vision.reshape(n, 400, 2048)),   # fp16
        "input_ids":      torch.tensor(batch.column("input_ids").to_pylist()),
        "attention_mask": torch.tensor(batch.column("attention_mask").to_pylist()),
        "labels":         torch.tensor(batch.column("labels").to_pylist()),
    }

The last step is to put the vision embeddings where the model expects them. The prompt reserves one <|image_pad|> slot per vision token (400 per image), and we drop the materialized vision_tower_hiddens into exactly those positions before the language model runs:

inputs_embeds = model.get_input_embeddings()(input_ids)
mask = (input_ids == image_pad_id).unsqueeze(-1).expand_as(inputs_embeds)

# write the materialized embeddings into the image-pad slots
inputs_embeds = inputs_embeds.masked_scatter(mask, vision_hiddens.to(inputs_embeds.dtype))

With that, the model sees the image without ever running the encoder.
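
The splice is easy to check without a GPU. The sketch below reproduces the same semantics in plain Python on a four-token toy sequence: every image-pad position takes the next vision vector, in order, and text positions are left alone. `splice_vision` and the token id 99 are hypothetical, and the explicit count check is our addition.

```python
def splice_vision(input_ids, text_embeds, vision_embeds, image_pad_id):
    """Replace the embedding at each image-pad position with the next vision vector."""
    n_slots = sum(1 for tok in input_ids if tok == image_pad_id)
    if n_slots != len(vision_embeds):
        raise ValueError(f"{n_slots} pad slots but {len(vision_embeds)} vision vectors")
    vision_iter = iter(vision_embeds)
    return [
        next(vision_iter) if tok == image_pad_id else embed
        for tok, embed in zip(input_ids, text_embeds)
    ]

IMAGE_PAD = 99                                    # hypothetical token id
input_ids = [5, IMAGE_PAD, IMAGE_PAD, 7]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.7, 0.7]]
vision_embeds = [[1.0, 2.0], [3.0, 4.0]]          # one vector per pad slot, in order

spliced = splice_vision(input_ids, text_embeds, vision_embeds, IMAGE_PAD)
print(spliced)
```

Checking that the number of pad slots matches the number of vision vectors is a cheap guard worth keeping in a real loader, since a mismatch usually means the prompt template and the materialized column disagree about the image size.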

The code after that is just plain PyTorch: optimizer, gradient accumulation, checkpointing, and saving the QLoRA adapter.

loader = make_cached_loader(
    "/path/to/textvqa_colab_train.lance",
    batch_size=2,
    shuffle=True,
)

for batch in loader:
    batch = batch.to(device)
    loss = forward_cached(model, batch, image_pad_id)
    (loss / grad_accum).backward()
    ...
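
Dividing the loss by grad_accum before calling backward is what makes several small batches behave like one larger one. The sketch below checks that on a one-parameter toy model in plain Python. The data, the starting weight, and grad_accum = 2 are all assumptions for illustration, and `grad_mse` is a hypothetical helper.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error with respect to w for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5
grad_accum = 2
micro_batches = [data[0:2], data[2:4]]            # two micro-batches of 2 rows each

accumulated = 0.0
for micro in micro_batches:
    accumulated += grad_mse(w, micro) / grad_accum   # mirrors (loss / grad_accum).backward()

full_batch = grad_mse(w, data)
print(accumulated, full_batch)
```

With equal-sized micro-batches, the accumulated gradient matches the gradient of the full batch, so the optimizer step sees an effective batch of batch_size * grad_accum rows while memory only ever holds one micro-batch.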

The training loss falls as the adapter learns from the cached features, and peak VRAM stays at 5.3 GB because the base model is loaded in 4-bit and the vision tower is never loaded into the training process.

step  10/300  loss=2.6694  5.9 samples/s
step  20/300  loss=2.3133  6.1 samples/s
                 .
                 .
                 .
step 290/300  loss=0.0359  6.3 samples/s
step 300/300  loss=0.4750  6.3 samples/s
saved adapter to runs/colab_lora/lora | peak VRAM 5.3 GB

One training loop, end to end

What makes this loop “end to end” is that everything it touches lives in one Lance table: raw image blobs, CLIP embeddings, OCR metadata, and the derived vision_tower_hiddens and token columns. There’s no constellation of bespoke files, databases, and formats to stitch together, the kind that traditional training infrastructure usually accumulates over time.

Because the data sits in one place and in the right shape, wiring it into training is “boring” in the best way possible: the data loader and forward pass shown above are ordinary PyTorch code. We didn’t need to rewrite the training loop; we just pointed it at a Lance table. The data layer does the heavy lifting, so the model training code stays as it was.

In this end-to-end example, the fine-tuning run completed in ~15 minutes (on an H100 GPU) and the held-out curated validation split produced the following results:

Model | TextVQA accuracy
Base Qwen2.5-VL-3B-Instruct | 0.799
QLoRA fine-tuned version | 0.820

Although a gain of 2.1 percentage points is relatively modest, it serves as a proof point that the pipeline trains well end to end. Using this methodology, it should be possible to train on more data, or start from a better base model, to push the gains further.

Takeaway: Faster training lifecycles

In this post, we demonstrated how the Lance format’s performance and the platform’s abstractions come together to compress the model training and fine-tuning lifecycle. We pre-computed features with LanceDB’s feature-engineering library, persisted them to the same Lance table as the raw data, and leaned on the performance benefits of the Lance format to keep the end-to-end loop fast.

The gains from using LanceDB aren’t purely about model performance on the task. An AI researcher can test out their ideas far more rapidly by writing out their transformation logic and letting the system handle the scale and distribution of the compute. And every one of those ideas benefits from the performance optimizations in the underlying Lance format layer, which come together in two ways: fast random access and scans on the exact kind of data training sees, and, just as importantly, cheap data evolution that lets researchers add as many new columns as they need without any full table rewrites.

If you’re an AI researcher, going from idea → implementation → validation has never been this straightforward.

To reproduce the full pipeline end to end, including the curation, exploration, and evaluation steps, dig into the resources below:

Resource | What's in it
Docs walkthrough | The full VLM fine-tuning tutorial, explained
Full code | Runnable notebook, Geneva UDFs, dataloader, training, and evals
More LanceDB training examples | Several end-to-end training examples with LanceDB
