Getting Started with LanceDB

This is a minimal tutorial for Python users. In Basic Usage, we'll show you how to work with our TypeScript and Rust SDKs.

1. Install LanceDB

LanceDB requires Python 3.8+ and can be installed via pip. The pandas package is optional but recommended for data manipulation. By default, you can manage data using Python lists or dictionaries. LanceDB also integrates with popular data libraries such as pyarrow, pydantic, and polars.

    pip install lancedb pandas

2. Import Libraries

Import the libraries. LanceDB provides the core vector database functionality, while pandas helps with data handling.

    import lancedb
    import pandas as pd

3. Connect to LanceDB

LanceDB supports both managed and local deployments. The connection uri determines where your data is stored. We recommend using LanceDB Cloud or Enterprise for production workloads, as they provide managed infrastructure, security, and automatic backups.

Cloud

Sync API:

    db = lancedb.connect(
        uri="db://your-project-slug",
        api_key="your-api-key",
        region="us-east-1"
    )

Async API:

    db = await lancedb.connect_async(
        uri="db://your-project-slug",
        api_key="your-api-key",
        region="us-east-1"
    )

Enterprise

For LanceDB Enterprise, set the host override to your private cloud endpoint.

Sync API:

    import os

    # uri, api_key, and region hold the same values as in the Cloud example
    host_override = os.environ.get("LANCEDB_HOST_OVERRIDE")
    db = lancedb.connect(
        uri=uri,
        api_key=api_key,
        region=region,
        host_override=host_override
    )

Async API:

    import os

    # uri, api_key, and region hold the same values as in the Cloud example
    host_override = os.environ.get("LANCEDB_HOST_OVERRIDE")
    db = await lancedb.connect_async(
        uri=uri,
        api_key=api_key,
        region=region,
        host_override=host_override
    )

Open Source

Sync API:

    uri = "data/sample-lancedb"
    db = lancedb.connect(uri)

Async API:

    uri = "data/sample-lancedb"
    db = await lancedb.connect_async(uri)

4. Add Data

Create a pandas DataFrame with your data. Each row must contain a vector field (a list of floats) and can include additional metadata.

    data = pd.DataFrame([
        {"id": "1", "vector": [0.9, 0.4, 0.8], "text": "knight"},
        {"id": "2", "vector": [0.8, 0.5, 0.3], "text": "ranger"},
        {"id": "3", "vector": [0.5, 0.9, 0.6], "text": "cleric"},
        {"id": "4", "vector": [0.3, 0.8, 0.7], "text": "rogue"},
        {"id": "5", "vector": [0.2, 1.0, 0.5], "text": "thief"},
    ])

5. Create a Table

Create a table in the database. The table takes on the schema of your ingested data.

Sync API:

    table = db.create_table("adventurers", data)

Async API:

    # On an async connection, create_table is itself awaitable
    table = await db.create_table("adventurers", data)

6. Vector Search

Perform a vector similarity search. The query vector must have the same dimensionality as your data vectors. The search returns the most similar vectors based on Euclidean (L2) distance.

Our query is "warrior", represented as [0.8, 0.3, 0.8]. Let's find the most similar adventurers:

Sync API:

    query_vector = [0.8, 0.3, 0.8]
    results = table.search(query_vector).limit(3).to_pandas()
    print(results)

Async API:

    query_vector = [0.8, 0.3, 0.8]
    # Build the vector query with query().nearest_to(), then await the result
    results = await table.query().nearest_to(query_vector).limit(3).to_pandas()
    print(results)

7. Results

The results show the vectors most similar to your query, sorted by distance. Lower distance means higher similarity. The values shown are squared Euclidean distances: knight differs from the query by (0.1, 0.1, 0), giving 0.1² + 0.1² + 0² = 0.02.

| id | vector          | text   | distance |
|----|-----------------|--------|----------|
| 1  | [0.9, 0.4, 0.8] | knight | 0.02     |
| 2  | [0.8, 0.5, 0.3] | ranger | 0.29     |
| 3  | [0.5, 0.9, 0.6] | cleric | 0.49     |

What's Next?

Check out some Basic Usage tips. After that, we'll teach you how to build a small app.
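As a cross-check on the Results table, the sketch below recomputes the ranking in plain Python, without LanceDB. It assumes the reported distance is the squared Euclidean (L2) distance, which is what the numbers in the table are consistent with. The helper `squared_l2` and the variable names are illustrative and not part of the LanceDB API.

```python
# Recompute the top-3 nearest adventurers by brute force.
# Assumption: distance = squared Euclidean (L2) distance, matching the table values.

rows = [
    {"id": "1", "vector": [0.9, 0.4, 0.8], "text": "knight"},
    {"id": "2", "vector": [0.8, 0.5, 0.3], "text": "ranger"},
    {"id": "3", "vector": [0.5, 0.9, 0.6], "text": "cleric"},
    {"id": "4", "vector": [0.3, 0.8, 0.7], "text": "rogue"},
    {"id": "5", "vector": [0.2, 1.0, 0.5], "text": "thief"},
]
query_vector = [0.8, 0.3, 0.8]  # the "warrior" query from the tutorial


def squared_l2(a, b):
    """Sum of squared component-wise differences (hypothetical helper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Score every row against the query, then sort ascending: lower means more similar.
scored = [
    {"id": r["id"], "text": r["text"], "distance": squared_l2(r["vector"], query_vector)}
    for r in rows
]
top3 = sorted(scored, key=lambda r: r["distance"])[:3]

for r in top3:
    print(r["id"], r["text"], round(r["distance"], 2))
```

This brute-force scan compares the query against every row, which is fine for five rows and is only meant to make the arithmetic behind the table visible.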