Open source LanceDB already has a reputation for handling large-scale vector search well. On the surface, things look simple with a familiar table-and-query interface, but underneath runs a serious stack of indexing machinery: IVF-style partitioning, Product Quantization (PQ), RaBitQ quantization, full-precision reranking, and Lance’s columnar storage format for fast random access.
That stack is already enough for a wide range of production workloads. PQ shrinks the amount of vector data each query has to scan inside a partition. RaBitQ goes further for high-dimensional embeddings, turning residual vectors into compact binary codes and adding small corrective terms to preserve recall. On large datasets, these quantizers are the difference between scanning raw floating-point vectors and scanning a much smaller representation that still keeps the right candidates within reach.
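To get a feel for the size of that gap, here's a back-of-the-envelope comparison in Python. The code sizes are illustrative defaults, not fixed properties of LanceDB, and the real numbers depend on how the index is configured:

```python
dim = 1024                 # embedding dimensionality
raw_bytes = dim * 4        # float32 storage: 4,096 bytes per vector
pq_bytes = 64              # PQ with 64 sub-vectors and 8-bit codes: 64 bytes per vector
rabitq_bytes = dim // 8    # RaBitQ at ~1 bit per dimension: 128 bytes, plus small correctives

print(f"raw: {raw_bytes} B | PQ: ~{pq_bytes} B ({raw_bytes // pq_bytes}x smaller) | "
      f"RaBitQ: ~{rabitq_bytes} B ({raw_bytes // rabitq_bytes}x smaller)")
```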
But “large” keeps moving. A 100M-vector table and a 10B-vector table aren’t the same system problem. At 10B scale, parts of the query path that used to look negligible start showing up in profiles. Index build time, partition routing, per-query CPU work, and executor utilization all become first-order concerns.
Where Open Source Indexing Starts to Strain
Open source LanceDB follows the same high-level query plan that most IVF-based ANN systems use:
- Train centroids and assign vectors to partitions.
- Search the nearest partitions for a query vector.
- Run compressed-vector search inside those partitions.
- Rerank a candidate set using the original vectors.
This works because it turns one giant nearest-neighbor problem into a small routing problem plus several local searches. The catch is that as the dataset grows, both sides of that split grow with it.
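To make that split concrete, here is a toy sketch of the plan above: brute-force centroid routing plus per-partition scans. It is only an illustration of the shape of the query, not Lance's implementation, and it skips quantization and reranking entirely:

```python
import numpy as np

def ivf_search(query, centroids, partitions, nprobes=4, k=10):
    """Toy IVF-style search over pre-built partitions.

    partitions: {partition_id: (vectors ndarray, row_ids list)} built in step 1.
    """
    # Routing: find the nprobes centroids closest to the query.
    centroid_dists = np.linalg.norm(centroids - query, axis=1)
    probe_ids = np.argsort(centroid_dists)[:nprobes]

    # Local search: scan only the chosen partitions.
    candidates = []
    for pid in probe_ids:
        vectors, row_ids = partitions[pid]
        dists = np.linalg.norm(vectors - query, axis=1)
        candidates.extend(zip(dists, row_ids))

    # Merge: keep the global top-k across all probed partitions.
    return sorted(candidates)[:k]
```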
First, index creation takes longer. We have to train partition centroids, encode billions of vectors, write index files, and maintain enough metadata to keep the result queryable. Even when every individual step is efficient, the wall-clock build time gets painful once it all runs as one large job.
Second, partition sizing becomes a balancing act. Keep partitions too large, and each query does too much work after routing. Shrink them, and the number of partitions grows, which makes the first query step (finding the closest nprobes partitions) more expensive. At 10B scale, we tend to want both: enough partitions to keep local scans small, and fast routing so the query doesn’t spend too much time deciding where to search.
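A quick back-of-the-envelope calculation shows how real that tension is at 10B vectors. The numbers below, including nprobes = 20, are purely illustrative:

```python
total_vectors = 10_000_000_000   # 10B-vector table
nprobes = 20                     # partitions probed per query (illustrative)

for vectors_per_partition in (2_000_000, 200_000, 20_000):
    num_partitions = total_vectors // vectors_per_partition
    routing_comparisons = num_partitions                  # brute-force centroid scan
    local_vectors_scanned = nprobes * vectors_per_partition
    print(f"partitions: {num_partitions:>9,}   "
          f"centroid comparisons: {routing_comparisons:>9,}   "
          f"compressed vectors scanned: {local_vectors_scanned:>11,}")
```

Large partitions keep the centroid scan small but inflate the per-partition work; small partitions do the opposite. The distributed path described below is about attacking both columns at once.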
The practical bottleneck isn’t one algorithm in isolation. It’s the interaction between data scale, partition count, dimensionality, and how much parallel compute the system can actually use. This is where LanceDB Enterprise picks up: it takes the same logical query plan and distributes it across a cluster, so index construction and query execution can each scale on their own.
Splitting the Index into Segments
LanceDB Enterprise’s distributed indexing path starts by splitting the large table into index segments. Instead of treating a 10B-vector index as one monolithic artifact, we build multiple segment-level indexes, each covering a disjoint slice of the table.
That changes the operational shape of indexing. Each segment can be assigned to its own indexing worker, and many segments build in parallel across the cluster. The final table still exposes one logical index to the user, but the physical work is no longer bound to a single machine.
The important part is that this isn’t just “more threads.” It’s coarse-grained parallelism at the index-artifact level. The scheduler can launch independent segment builds, retry the ones that fail, and merge or commit segment metadata once each unit finishes. For large tables, that turns index build time from “one huge job” into “many bounded jobs,” which is far easier to scale and operate.
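As a rough sketch of that operational shape (not LanceDB Enterprise's actual scheduler; every name below is hypothetical), the pattern is independent, retryable segment builds followed by a single commit of the collected segment metadata:

```python
from concurrent.futures import ThreadPoolExecutor

def build_segment_index(segment_id: int) -> dict:
    """Hypothetical stand-in for one segment-level build: train centroids,
    encode the segment's vectors, and write its index files."""
    # ... real indexing work would happen here ...
    return {"segment": segment_id, "status": "ok"}

def build_with_retries(segment_id: int, max_retries: int = 2) -> dict:
    # Each segment build is a bounded, restartable unit of work.
    for attempt in range(max_retries + 1):
        try:
            return build_segment_index(segment_id)
        except Exception:
            if attempt == max_retries:
                raise

def build_table_index(num_segments: int, max_workers: int = 8) -> list[dict]:
    # Coarse-grained parallelism: independent segment builds run concurrently;
    # a real system would then commit the collected metadata as one logical index.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_with_retries, range(num_segments)))
```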
Segmented indexing also gives us a clean unit for distributed query execution. Each segment carries its own vector index and can sit close to a Plan Executor backed by a local SSD cache. The query layer fans out work to the executors that own the relevant segments and merges their partial answers into the final top-k.
Querying Across Plan Executors
In the distributed query path, the query node acts as a coordinator. It plans the query, routes segment-level work to the Plan Executors that own the relevant segments, and merges the partial results they return.
This matters because vector search is a mix of CPU-heavy and I/O-heavy stages. Partition routing and compressed distance computation are CPU-bound. Candidate reranking and row retrieval need efficient access to stored vectors and payload columns. Spreading segment indexes across Plan Executors puts more CPU cores, more memory bandwidth, and more cache capacity to work at once.
From the user’s side, the API stays the same: a table search returning top-k. Underneath, the work is fanned out across index segments, where each segment searches its local partitions and the query node stitches the partial results back into a single ranked answer.
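A stripped-down sketch of that fan-out and merge might look like the following. The function names are hypothetical, and a real coordinator also handles executor placement, caching, and failures:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_segment(segment_id: int, query, k: int) -> list[tuple[float, int]]:
    """Hypothetical per-segment search run on a Plan Executor: IVF routing,
    compressed scan, and rerank inside one segment's local index."""
    return []  # placeholder for (distance, row_id) pairs

def distributed_search(segment_ids, query, k: int = 10):
    # The query node fans segment-level work out to the executors that own them...
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda sid: search_segment(sid, query, k), segment_ids))
    # ...and stitches the partial top-k lists into one global ranked answer.
    return heapq.nsmallest(k, (hit for partial in partials for hit in partial))
```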
HNSW over Centroids
The first step of an IVF-style search is to find the nearest nprobes partitions for a query. In a small index, scanning every centroid is cheap enough to ignore. In a 10B-vector index, the centroid set itself grows large enough that this routing step becomes a real cost.
LanceDB addresses this by building an HNSW graph over the IVF centroids at the query node. Instead of comparing the query against every centroid, graph search reaches the nearest clusters in a handful of hops. Lance keeps a dedicated fast path for training HNSW over the centroids when the centroid matrix is large enough, so routing stays cheap even as the partition count climbs.
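The idea is straightforward to prototype with an off-the-shelf HNSW library. The sketch below uses hnswlib purely for illustration; Lance builds its own HNSW over the centroid matrix inside the index, and the sizes here are scaled down from a real 10B-scale index:

```python
import numpy as np
import hnswlib

dim, num_centroids, nprobes = 1024, 65_536, 20

# Centroids produced by IVF training (random here, for illustration only).
centroids = np.random.rand(num_centroids, dim).astype(np.float32)

# Build an HNSW graph over the centroids themselves.
router = hnswlib.Index(space="l2", dim=dim)
router.init_index(max_elements=num_centroids, ef_construction=200, M=16)
router.add_items(centroids, np.arange(num_centroids))
router.set_ef(max(64, 2 * nprobes))  # search-time beam width

# Routing a query takes a few graph hops instead of a scan over every centroid.
query = np.random.rand(dim).astype(np.float32)
partition_ids, _ = router.knn_query(query, k=nprobes)
```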
At 10B scale, this is a major win. In our large-scale experiments, HNSW centroid routing cut the cost of the “find nearest partitions” stage substantially. That speedup matters because partition routing happens before any useful local search can begin. Slow routing means every executor starts late.
This is also why we treat centroid search as its own optimization target. The compressed scan inside a partition matters, but optimizing only the second half of the query isn’t enough. At very large scale, the routing layer also has to be approximate, cache-friendly, and fast.
The implementation lives in Lance's vector index code, including the centroid speedup utilities and the IVF find_partitions path: vector utils and IVF storage.
Fast Rotation for RaBitQ
Routing is only half the query. Once HNSW has chosen the right nprobes partitions, each Plan Executor still has to scan them, comparing the query against the compressed vectors inside. At 10B scale, that scan is handled by RaBitQ, which adds its own per-query setup cost before any comparison can happen.
We touched on RaBitQ earlier; for the full mechanics, see our previous post on RaBitQ quantization. One detail that matters here is that RaBitQ applies a random orthogonal rotation to vectors during indexing, which spreads information evenly across dimensions so that very compact binary codes can still preserve distances reliably. Stored codes therefore live in that rotated space, and every query vector has to pass through the same rotation before it can be compared. The straightforward implementation does this by multiplying the query vector by a dense rotation matrix.
For a vector of dimension d, that dense matrix multiplication is O(d^2). At 768, 960, 1024, or higher dimensions, the cost piles up fast, and it has to be paid on every query.
LanceDB’s RaBitQ path adds a fast rotation mode for exactly this reason. Instead of materializing and multiplying by a dense orthogonal matrix, the fast path composes random sign flips, Fast Walsh-Hadamard Transform style mixing, and Kac-style pairwise mixing. The transform stays matrix-free, and the rotation cost drops to roughly O(d log d).
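For a sense of the gap: at d = 1024, a dense rotation is on the order of a million multiply-adds per query, while each Hadamard pass needs only about d·log2(d) ≈ 10,000 butterfly operations. The sketch below shows the matrix-free idea (random sign flips composed with a Fast Walsh-Hadamard Transform); it is an illustration of the technique, not Lance's rotation code, which also adds Kac-style pairwise mixing and handles dimensions that aren't powers of two:

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard Transform, O(d log d); d must be a power of two."""
    d = x.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)  # scale so each pass is orthonormal

def fast_rotation(v: np.ndarray, signs: list) -> np.ndarray:
    """Matrix-free pseudo-random rotation: rounds of random sign flips
    followed by a Hadamard mix, instead of a dense d x d matrix multiply."""
    out = v.astype(np.float64)
    for s in signs:            # each round: flip signs, then mix every dimension
        out = fwht(out * s)
    return out

d = 1024
rng = np.random.default_rng(42)
# The sign patterns are drawn once at index-build time and reused for every
# stored vector and every incoming query, so both live in the same rotated space.
signs = [rng.choice([-1.0, 1.0], size=d) for _ in range(3)]
rotated_query = fast_rotation(rng.standard_normal(d), signs)
```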
The implementation exposes two rotation modes, fast and matrix, with fast as the default. The fast path lives in Lance's RaBitQ rotation code: bq/rotation.rs.
The performance effect scales with dimensionality. On high-dimensional embeddings, fast rotation cuts the per-query RaBitQ preparation cost noticeably. Combined with binary dot products inside partitions, it makes RaBitQ a much better fit for the 10B-scale query path, where every micro-stage gets multiplied by high QPS.
The Combined Architecture
The 10B-scale path isn't a single trick. It's a stack of changes, each one stripping serial work out of a different stage of the search pipeline.
Together, these changes let LanceDB keep the user-facing API simple while the internal execution model scales out. Users still issue a vector search against a table. Underneath, LanceDB distributes indexing across the cluster, routes per-segment queries to Plan Executors, uses HNSW at the query node to skip linear centroid scans, applies fast RaBitQ rotation for high-dimensional vectors, and reranks the final candidates with full-precision vectors.
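From the application's point of view, that still reads like an ordinary LanceDB search. The snippet below follows the open source Python SDK; the connection URI and table name are made up, and Enterprise deployments may expose additional knobs:

```python
import lancedb

db = lancedb.connect("db://example-project")   # hypothetical connection URI
tbl = db.open_table("documents")               # hypothetical table name

query_vector = [0.12] * 1024                   # stands in for a real embedding
results = (
    tbl.search(query_vector)
       .nprobes(20)           # IVF partitions to probe
       .refine_factor(10)     # rerank 10x the candidates with full-precision vectors
       .limit(10)             # final top-k
       .to_list()
)
```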
The result is a vector search system that scales in both phases that matter: building the index and serving queries. Index construction runs as distributed work over bounded segments. Query execution fans out across Plan Executors. Routing, quantized search, and reranking each stay focused on the amount of data they actually need to touch.
This is the core point: LanceDB is built to keep on scaling as vector datasets grow, without changing how users query them. On large datasets, the architecture has already proven itself in production, while keeping the table-centric search experience users expect.
Further Reading
Learn more about how LanceDB Enterprise uses compute-storage separation and distributed compute in the architecture docs.