Pydantic

Pydantic is a data validation library in Python. LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting. Using lancedb.pydantic.LanceModel, users can seamlessly integrate Pydantic with the rest of the LanceDB APIs.

First, import the necessary LanceDB and Pydantic modules:

python
import lancedb
from lancedb.pydantic import Vector, LanceModel

Next, define your Pydantic model by inheriting from LanceModel and specifying your fields including a vector field:

python
    class PersonModel(LanceModel):
        name: str
        age: int
        vector: Vector(2)

Set the database connection URL:

python
    url = "./example"

Now you can create a table, add data, and perform vector search operations:

python
    db = lancedb.connect(url)
    table = db.create_table("person", schema=PersonModel)
    table.add(
        [
            PersonModel(name="bob", age=1, vector=[1.0, 2.0]),
            PersonModel(name="alice", age=2, vector=[3.0, 4.0]),
        ]
    )
    assert table.count_rows() == 2
    person = table.search([0.0, 0.0]).limit(1).to_pydantic(PersonModel)
    assert person[0].name == "bob"

Vector Field

LanceDB provides a lancedb.pydantic.Vector method to define a vector Field in a Pydantic Model.

python
>>> import pydantic
>>> from lancedb.pydantic import Vector
...
>>> class MyModel(pydantic.BaseModel):
...     id: int
...     url: str
...     embeddings: Vector(768)
>>> schema = pydantic_to_schema(MyModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("url", pa.utf8(), False),
...     pa.field("embeddings", pa.list_(pa.float32(), 768))
... ])

This example demonstrates how LanceDB automatically converts Pydantic field types to their corresponding Apache Arrow data types. The pydantic_to_schema() function takes a Pydantic model and generates an Arrow schema where:

  • int fields become pa.int64() (64-bit integers)
  • str fields become pa.utf8() (UTF-8 encoded strings)
  • Vector(768) becomes pa.list_(pa.float32(), 768) (fixed-size list of 768 float32 values)
  • The False parameter indicates that the fields are not nullable

Type Conversion

LanceDB automatically convert Pydantic fields to Apache Arrow DataType .

Current supported type conversions:

Pydantic Field Type PyArrow Data Type
int pyarrow.int64
float pyarrow.float64
bool pyarrow.bool
str pyarrow.utf8()
list pyarrow.List
BaseModel pyarrow.Struct
Vector(n) pyarrow.FixedSizeList(float32, n)

LanceDB supports to create Apache Arrow Schema from a pydantic.BaseModel via lancedb.pydantic.pydantic_to_schema method.

python
>>> from typing import List, Optional
>>> import pydantic
>>> from lancedb.pydantic import pydantic_to_schema, Vector
>>> class FooModel(pydantic.BaseModel):
...     id: int
...     s: str
...     vec: Vector(1536)  # fixed_size_list<item: float32>[1536]
...     li: List[int]
...
>>> schema = pydantic_to_schema(FooModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("s", pa.utf8(), False),
...     pa.field("vec", pa.list_(pa.float32(), 1536)),
...     pa.field("li", pa.list_(pa.int64()), False),
... ])

This example shows a more complex Pydantic model with various field types and demonstrates how LanceDB handles:

  • Basic types: int and str fields
  • Vector fields: Vector(1536) creates a fixed-size list of 1536 float32 values
  • List fields: List[int] becomes a variable-length list of int64 values
  • Schema generation: The pydantic_to_schema() function automatically converts all these types to their Arrow equivalents