Ayush Chaurasia

SemanticDotArt: Rethinking Art Discovery with LanceDB

October 16, 2025
Applications

Try it out →

In an age of infinite scroll, we can browse more art than any real-world gallery could hold, yet finding a piece that feels right can still take minutes or hours. SemanticDotArt began with a hunch: meaning lives not in pixels or tags, but in the mood of a painting, the rhythm of a brushstroke, or the metaphors that tie them together. From that intuition grew a multimodal retrieval system we built, with LanceDB at its core. This post traces the journey of how words meet images, and how we taught search to feel a little more human.

The Vision

Art discovery often begins with a mood, rather than a checklist of traits. We imagine what we’re looking for through thoughts like: “find me something restless but hopeful,” “show me a painting that feels like a quiet storm.” SemanticDotArt is built for that kind of language. It lets you search not just with literal phrases, but with poetry, prose, or the emotions they stir in you. Sometimes, how you ask is part of what you’re looking for.

LanceDB table overview

We wanted:

  • A unified art corpus that continually grows across museums, marketplaces, and open archives.
  • Metadata that captures both literal content and emotional subtext.
  • A retrieval engine that can handle large amounts of text and images, and knows when a query is poetic, prosaic, artistic or literal.
  • An interface that feels exploratory, rather than transactional.

LanceDB is used as the multimodal foundation for this system, because it offers the following key features:

  • On-disk indexes, which make it possible to build a large-scale, multi-feature, multi-index retrieval system.
  • First-class hybrid and full-text search support, plus fast SQL-style filtering.
  • Built-in support for various multimodal embedding models, and hooks for creating custom rerankers.
  • Truly multimodal storage that doubles as an object store: every artwork’s vectors, text, and JPEG bytes live in the same source table, preventing chaos as we keep adding new features.

How this works

As with any other retrieval system, the workflow is broken down into two main parts: ingestion and querying.

Multi-representation

The guiding principle is simple: we create multiple representations of each artwork so that we have different ways of looking at it. For any given piece, we record multiple perspectives: poetic impressions, literal descriptions, mood tags, color palettes, and even stylistic fingerprints. Some become vector columns, some remain as text, and others live on as raw media. LanceDB lets us stitch all of that into a single row of a table so we can keep adding features as the dataset evolves without reindexing the world. The main idea is to offer maximum flexibility at query time, so that we can experiment with different search paths dynamically, as we’ll see in the next section.

A single painting, many views

van gogh painting

Take Van Gogh’s Path Through a Field with Willows. We keep several parallel interpretations so that whichever language a visitor uses, there is an index ready to meet it. Here are some examples:

  • Poetic caption
A path winds on beneath a vibrant sky, where sun-warmed grasses whisper secrets. Brushstrokes dance with restless energy, quiet fields hold deep intensity, and a lonely journey is bathed in golden light. Nature breathes both calm and wild, colors sing a song of solitude, and hope lingers where the track ascends.
  • Natural caption
A path meanders under a bright sky as sun-warmed grasses softly rustle and energetic brushstrokes of light and shadow play across quiet fields. The solitary journey glows with golden light, nature around it feels tranquil yet untamed, and the colors evoke solitude while hope follows the upward slope of the path.
  • Mood keywords
nature, solitude, dream, deep, track, intensity, path, field, willows

These ingredients seed separate full-text, vector, and keyword indexes. The corpus keeps expanding, but because the representations belong to the same row, we can add new features, such as palette embeddings, brushstroke fingerprints, or provenance signals, without having to refactor our storage solution.
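To make the "one row, many views" idea concrete, here is a minimal sketch of what a single artwork record could look like. It uses plain Python dataclasses rather than a real LanceDB schema, and every field name, the `SUFFIX_TO_KIND` mapping, and the `representation_columns` helper are our own illustrative assumptions, not the production table definition. The vectors are placeholders, not real embeddings.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class ArtworkRow:
    """One artwork, many views. All field names are hypothetical."""
    artwork_id: str
    title: str
    poetic_caption: str          # text view, can feed a full-text index
    natural_caption: str         # text view, can feed a full-text index
    mood_keywords: List[str]     # keyword view, used for filtering and reranking
    poetic_vector: List[float]   # embedding of the poetic caption
    natural_vector: List[float]  # embedding of the natural caption
    image_bytes: bytes           # raw JPEG stored next to the features


# Naming convention (an assumption of this sketch): the column suffix
# tells us which kind of index or search path the column belongs to.
SUFFIX_TO_KIND = {
    "_caption": "text",
    "_vector": "vector",
    "_keywords": "keyword",
    "_bytes": "media",
}


def representation_columns(row_type) -> Dict[str, List[str]]:
    """Group a row type's columns by the kind of representation they hold."""
    groups: Dict[str, List[str]] = {kind: [] for kind in SUFFIX_TO_KIND.values()}
    for f in fields(row_type):
        for suffix, kind in SUFFIX_TO_KIND.items():
            if f.name.endswith(suffix):
                groups[kind].append(f.name)
    return groups


row = ArtworkRow(
    artwork_id="example-001",  # placeholder identifier
    title="Path Through a Field with Willows",
    poetic_caption="A path winds on beneath a vibrant sky.",    # shortened
    natural_caption="A path meanders under a bright sky.",      # shortened
    mood_keywords=["nature", "solitude", "dream"],
    poetic_vector=[0.0, 0.0, 0.0],   # placeholder, not a real embedding
    natural_vector=[0.0, 0.0, 0.0],  # placeholder, not a real embedding
    image_bytes=b"",                 # placeholder for the JPEG bytes
)
print(representation_columns(ArtworkRow))
```

In this sketch, adding a new representation such as a palette embedding would mean adding one more field to the row, which leaves the existing columns untouched.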

Ingestion workflow diagram

Semantic Routing: Matching Feelings with Features Dynamically

Because each piece of artwork is over-represented in the data, retrieval turns into a choose-your-own-adventure task. A session might start with text, an image, or both. Semantic routing inspects that intent and chooses the search path that fits: poetic vectors when the request feels lyrical, natural-language embeddings for straightforward descriptions, and visual features when a user starts with an image. New features can be added as they become available. When we blend or rewrite the query, mood keywords are used with LanceDB’s SQL-style prefilters to narrow down the search space. Finally, a custom reranker weights the results to surface pieces that echo the emotional signature of the request.

This is our rendition of classic retrieval strategies like query understanding, query rewriting, and multi-index routing. The agent classifies how the visitor is describing the art, rephrases the prompt so it aligns with the chosen representation, and finally selects which LanceDB indices to use. Every new representation we add becomes another branch the router can learn to take. From there, we can switch tactics on the fly depending on the query type and the column that seems most relevant.
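The routing step can be pictured as a small lookup from an intent label to a search plan. The sketch below is only an illustration: the intent labels, the column names (`poetic_vector`, `natural_vector`, `image_vector`), the `Route` type, and the fallback rule are assumptions on our part, and the intent label itself is expected to come from an upstream classifier that is not shown here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    """A search plan: which vector column to query and how (hypothetical)."""
    vector_column: str
    query_type: str  # one of "vector", "fts", "hybrid"


# Hypothetical routing table; each new representation adds one entry.
ROUTES: Dict[str, Route] = {
    "poetic": Route(vector_column="poetic_vector", query_type="hybrid"),
    "natural": Route(vector_column="natural_vector", query_type="hybrid"),
    "image": Route(vector_column="image_vector", query_type="vector"),
}

DEFAULT_INTENT = "natural"  # assumption: literal descriptions are the safe fallback


def route_for(intent: str) -> Route:
    """Return the search plan for an intent label, falling back to the default."""
    return ROUTES.get(intent.strip().lower(), ROUTES[DEFAULT_INTENT])


def register_route(intent: str, route: Route) -> None:
    """Add a branch the router can take once a new representation exists."""
    ROUTES[intent.strip().lower()] = route


print(route_for("Poetic"))
print(route_for("something-unexpected"))
```

Keeping the routes in data rather than in branching code means a new representation only needs a `register_route` call, which mirrors the idea that every added representation becomes another branch for the router.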

With LanceDB, a complex hybrid search with prefiltering and reranking is simply:

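```python
# Assumes `table`, `query_embedding`, `query_text`, `prefilter_clause`,
# `keywords`, and the CustomKeywordRanker class are defined elsewhere.
results = (
    table.search(query_type="hybrid", vector_column_name="poetic_vector")
         .vector(query_embedding)                  # dense half of the hybrid query
         .text(query_text)                         # full-text half of the hybrid query
         .where(prefilter_clause, prefilter=True)  # SQL-style filter applied before the search
         .limit(10)
         .rerank(CustomKeywordRanker(keywords))    # custom reranker scores the candidates
)
```

Semantic routing paths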

Here’s how one of those pathways can unfold when someone uploads an image and adds a short poem:

  1. Classify the intent – Label the request as poetic, natural, or another style so downstream steps know which feature column to favor.
  2. Caption the image – If pixels are present, synthesize a caption in the same style as the intent so image and text signals travel together.
  3. Rewrite the prompt – Blend the visitor’s words and the generated caption into a single rewritten query that preserves both mood and literal anchors.
  4. Extract mood keywords – Pull out a keyword set that reflects the emotional signature sitting inside the rewritten query.
  5. Prefilter via SQL support – Apply a LanceDB filter using those keywords so the search space collapses to artworks that share at least one mood.
  6. Choose search technique – Switch between full-text, vector, or hybrid search depending on query length and style. We over-fetch by roughly 2× so the reranker has room to maneuver.
  7. Rerank – Score the candidates with a custom keyword ranker that weights overlap between query and artwork moods, then surface the pieces that best echo the request.
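Steps 5 and 6 above can be sketched with two small helpers. This is a hedged illustration, not our production code: it assumes the mood keywords are stored as a single lowercase, delimited string in a hypothetical `mood_keywords` column, so that a SQL `LIKE` clause can match any one of them. If the column were a list type, a different predicate would be needed. The function names and the default over-fetch factor of 2 are likewise illustrative.

```python
from typing import Iterable


def build_mood_prefilter(keywords: Iterable[str], column: str = "mood_keywords") -> str:
    """Build an OR-of-LIKE filter so a row passes if it shares at least one mood.

    Assumes `column` holds a lowercase, delimited string of keywords
    (hypothetical layout). Returns "" when there is nothing to filter on.
    """
    cleaned = sorted({k.strip().lower() for k in keywords if k.strip()})
    clauses = []
    for kw in cleaned:
        escaped = kw.replace("'", "''")  # escape single quotes for SQL literals
        clauses.append(f"{column} LIKE '%{escaped}%'")
    return " OR ".join(clauses)


def fetch_limit(top_k: int, overfetch_factor: int = 2) -> int:
    """Number of candidates to request so the reranker has room to reorder."""
    if top_k < 1 or overfetch_factor < 1:
        raise ValueError("top_k and overfetch_factor must be positive")
    return top_k * overfetch_factor


prefilter_clause = build_mood_prefilter(["Solitude", "path"])
print(prefilter_clause)
print(fetch_limit(10))
```

The resulting string is the kind of value that could be passed as the filter in a prefiltered search, with the over-fetched limit trimmed back to `top_k` after reranking.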

The reranker leans on a weighted blend of recall and precision over the keyword sets:

$$ \text{score} = w \cdot \left( 0.7 \cdot \frac{\text{matches}}{|B|} + 0.3 \cdot \frac{\text{matches}}{|A|} \right) $$

Here, matches is the overlap count |A ∩ B| between the artwork keyword set (A) and the query keyword set (B), so the recall term is represented by:

$$ \frac{|A \cap B|}{|B|} $$

and the precision term by:

$$ \frac{|A \cap B|}{|A|} $$

This keeps the responses feeling both relevant and surprising without drifting into uncanny matches. LanceDB supports custom rerankers natively, so we can plug in new ranking strategies as the dataset and features evolve.
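For readers who prefer code to formulas, here is a minimal sketch of the scoring rule above in plain Python. It is not the reranker class we plug into LanceDB. The function names are hypothetical, `w` is treated as a simple multiplier that defaults to 1.0 (an assumption, since its value depends on the setup), and empty keyword sets are scored as 0.0 to avoid dividing by zero.

```python
from typing import Iterable, List, Tuple


def keyword_score(
    artwork_keywords: Iterable[str],
    query_keywords: Iterable[str],
    w: float = 1.0,
    recall_weight: float = 0.7,
    precision_weight: float = 0.3,
) -> float:
    """score = w * (0.7 * |A ∩ B| / |B| + 0.3 * |A ∩ B| / |A|)."""
    a = set(artwork_keywords)  # A: artwork keyword set
    b = set(query_keywords)    # B: query keyword set
    if not a or not b:
        return 0.0             # assumption: nothing to compare means no score
    matches = len(a & b)
    recall = matches / len(b)     # share of query moods found in the artwork
    precision = matches / len(a)  # share of artwork moods the query asked for
    return w * (recall_weight * recall + precision_weight * precision)


def rank_by_keywords(
    candidates: List[Tuple[str, List[str]]], query_keywords: Iterable[str]
) -> List[Tuple[str, float]]:
    """Sort (artwork_id, keywords) pairs by descending keyword score."""
    query = list(query_keywords)
    scored = [(art_id, keyword_score(kws, query)) for art_id, kws in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


artwork = ["nature", "solitude", "path", "field"]
query = ["solitude", "path", "storm"]
# matches = 2, recall = 2/3, precision = 2/4, so the score is about 0.617
print(round(keyword_score(artwork, query), 3))
```

Weighting recall above precision favors artworks that cover most of the moods in the query, even when the artwork carries extra keywords the visitor never mentioned.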

Conclusion

AI-generated images are everywhere, but the thrill of discovery still belongs to human-made art. SemanticDotArt uses AI as a bridge: drop in an image a model just imagined, or a photo you took, and it will lead you to the paintings and sculptures shaped by people who felt that idea before you did. Whether you search with a poetic cue like “a quiet optimism painted in the sky” or with a literal description, the path ends in the same place: human creations that echo your feeling.

A quiet optimism

Try SemanticDotArt →

Tools used

  • LanceDB – Core multimodal store and retrieval engine: vectors, captions, mood keywords, and original JPEGs share one table, with hybrid search and custom rerankers stacking on top.
  • Google Gemini – Multimodal model powering poetic rewrites, intent classification, and on-demand captioning to keep text and image evidence aligned.
  • Modal Labs – Managed the backend services and high-throughput batch ingestion pipelines without building new infrastructure.

Credits

This project was created by Bryan Bischof, Ayush Chaurasia, and Chang She. Kelly Chong and Pavitar Saini designed and implemented the UI. Adam Conway and Mischa Lamoureux assisted with the website.
