Pydantic is a data validation library for Python. LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting. Using `lancedb.pydantic.LanceModel`, users can seamlessly integrate Pydantic with the rest of the LanceDB APIs.
First, import the necessary LanceDB and Pydantic modules:

```python
import lancedb
from lancedb.pydantic import Vector, LanceModel
```
Next, define your Pydantic model by inheriting from `LanceModel` and specifying your fields, including a vector field:

```python
class PersonModel(LanceModel):
    name: str
    age: int
    vector: Vector(2)
```
Set the database connection URL:

```python
url = "./example"
```
Now you can create a table, add data, and perform vector search operations:

```python
db = lancedb.connect(url)
table = db.create_table("person", schema=PersonModel)
table.add(
    [
        PersonModel(name="bob", age=1, vector=[1.0, 2.0]),
        PersonModel(name="alice", age=2, vector=[3.0, 4.0]),
    ]
)
assert table.count_rows() == 2

person = table.search([0.0, 0.0]).limit(1).to_pydantic(PersonModel)
assert person[0].name == "bob"
```
Vector Field

LanceDB provides a `lancedb.pydantic.Vector` function to define a vector field in a Pydantic model.
```python
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema, Vector
>>> class MyModel(pydantic.BaseModel):
...     id: int
...     url: str
...     embeddings: Vector(768)
>>> schema = pydantic_to_schema(MyModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("url", pa.utf8(), False),
...     pa.field("embeddings", pa.list_(pa.float32(), 768))
... ])
```
This example demonstrates how LanceDB automatically converts Pydantic field types to their corresponding Apache Arrow data types. The `pydantic_to_schema()` function takes a Pydantic model and generates an Arrow schema where:

- `int` fields become `pa.int64()` (64-bit integers)
- `str` fields become `pa.utf8()` (UTF-8 encoded strings)
- `Vector(768)` becomes `pa.list_(pa.float32(), 768)` (a fixed-size list of 768 float32 values)
- The `False` parameter indicates that the fields are not nullable
Type Conversion

LanceDB automatically converts Pydantic fields to Apache Arrow DataTypes.

Currently supported type conversions:

| Pydantic Field Type | PyArrow Data Type |
|---|---|
| `int` | `pyarrow.int64` |
| `float` | `pyarrow.float64` |
| `bool` | `pyarrow.bool_` |
| `str` | `pyarrow.utf8()` |
| `list` | `pyarrow.List` |
| `BaseModel` | `pyarrow.Struct` |
| `Vector(n)` | `pyarrow.FixedSizeList(float32, n)` |
LanceDB can create an Apache Arrow Schema from a `pydantic.BaseModel` via the `lancedb.pydantic.pydantic_to_schema` function.
```python
>>> from typing import List, Optional
>>> import pydantic
>>> import pyarrow as pa
>>> from lancedb.pydantic import pydantic_to_schema, Vector
>>> class FooModel(pydantic.BaseModel):
...     id: int
...     s: str
...     vec: Vector(1536)  # fixed_size_list<item: float32>[1536]
...     li: List[int]
...
>>> schema = pydantic_to_schema(FooModel)
>>> assert schema == pa.schema([
...     pa.field("id", pa.int64(), False),
...     pa.field("s", pa.utf8(), False),
...     pa.field("vec", pa.list_(pa.float32(), 1536)),
...     pa.field("li", pa.list_(pa.int64()), False),
... ])
```
This example shows a more complex Pydantic model with various field types and demonstrates how LanceDB handles:

- Basic types: `int` and `str` fields
- Vector fields: `Vector(1536)` creates a fixed-size list of 1536 float32 values
- List fields: `List[int]` becomes a variable-length list of int64 values
- Schema generation: the `pydantic_to_schema()` function automatically converts all these types to their Arrow equivalents