# 🍳 Multimodal Recipe Agent with LanceDB and PydanticAI

In this tutorial, you'll build an AI agent that helps users discover recipes. The agent stores text and image embeddings in LanceDB and uses PydanticAI to decide when to call its search tools.

## What You'll Learn

- How to build AI agents with multimodal capabilities
- Using LanceDB for efficient vector storage and retrieval
- Creating custom tools for PydanticAI agents
- Building conversational interfaces with Streamlit
- Handling both text and image inputs in a single agent

## Prerequisites

This tutorial assumes you have:
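- Python 3.9 or newer installed (PydanticAI does not support Python 3.8)
- An OpenAI API key, since the agent in Section 4 calls `gpt-4o-mini`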
- Basic understanding of vector databases
- Familiarity with AI/ML concepts (helpful but not required)

Let's get started!


## 1. Setup and Installation

First, let's install the required dependencies:


In [None]:
# Install required packages
%pip install lancedb pydantic-ai sentence-transformers transformers pillow streamlit pandas numpy tqdm python-dotenv logfire
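
The agent built later in this tutorial calls an OpenAI model, and PydanticAI's OpenAI integration reads the key from the `OPENAI_API_KEY` environment variable. As a minimal sketch, a small check like the one below can fail early with a readable message instead of a stack trace deep inside the agent. The helper name `require_env` is hypothetical and not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper written for this tutorial.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or add it to a .env "
            "file before creating the agent."
        )
    return value


# Example usage (uncomment once your key is configured):
# api_key = require_env("OPENAI_API_KEY")
```

If you keep the key in a `.env` file, load it first (for example with `python-dotenv`, which is in the install list above) so the variable is visible to `os.environ`.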


## 2. Data Preparation

For this tutorial, we'll use a small recipe dataset with both text and images. Let's start by setting up our data directory and creating a sample dataset in code:


In [None]:
import os
import pandas as pd
from pathlib import Path

# Create data directory
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

print("📁 Data directory created")


In [None]:
# For this tutorial, we'll create a sample dataset
# In practice, you would load your actual recipe data
sample_recipes = [
    {
        "id": "recipe_1",
        "title": "Classic Spaghetti Carbonara",
        "ingredients": ["pasta", "eggs", "pancetta", "parmesan", "black pepper"],
        "instructions": "Cook pasta according to package directions. In a bowl, whisk eggs with parmesan. Cook pancetta until crispy. Toss hot pasta with pancetta, then with egg mixture. Serve immediately.",
        "image_name": "carbonara.jpg"
    },
    {
        "id": "recipe_2",
        "title": "Chocolate Chip Cookies",
        "ingredients": ["flour", "butter", "sugar", "eggs", "chocolate chips", "vanilla", "baking soda"],
        "instructions": "Preheat oven to 375°F. Mix dry ingredients. Cream butter and sugars. Add eggs and vanilla. Combine wet and dry ingredients. Fold in chocolate chips. Bake 9-11 minutes.",
        "image_name": "cookies.jpg"
    },
    {
        "id": "recipe_3",
        "title": "Grilled Salmon with Herbs",
        "ingredients": ["salmon fillets", "olive oil", "dill", "lemon", "garlic", "salt", "pepper"],
        "instructions": "Preheat grill. Season salmon with salt, pepper, and herbs. Brush with olive oil. Grill 4-5 minutes per side. Serve with lemon wedges.",
        "image_name": "salmon.jpg"
    }
]

# Create sample CSV
df = pd.DataFrame(sample_recipes)
df.to_csv("data/recipes.csv", index=False)

print(f"✅ Created sample dataset with {len(sample_recipes)} recipes")
print("\nSample recipes:")
for recipe in sample_recipes:
    print(f"- {recipe['title']}")
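
One thing to watch when you reload the CSV: `to_csv` writes each `ingredients` list as its string representation, so reading the file back gives you strings rather than lists. The sketch below uses only the standard library and a made-up two-row dataset to show one way around this, assuming you are free to store the column as JSON text and decode it on load.

```python
import csv
import io
import json

# Hypothetical miniature dataset, shaped like the sample recipes above.
rows = [
    {"id": "recipe_1", "ingredients": ["pasta", "eggs", "pancetta"]},
    {"id": "recipe_2", "ingredients": ["flour", "butter", "sugar"]},
]

# Write: encode the list column as JSON so it survives the trip through CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "ingredients"])
writer.writeheader()
for row in rows:
    writer.writerow({"id": row["id"], "ingredients": json.dumps(row["ingredients"])})

# Read: decode the JSON text back into a real Python list.
buffer.seek(0)
loaded = [
    {"id": record["id"], "ingredients": json.loads(record["ingredients"])}
    for record in csv.DictReader(buffer)
]

print(loaded[0]["ingredients"])
```

The rest of this tutorial builds the table from the in-memory `sample_recipes` list, so this only matters once you start loading your own recipe files.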


## 3. Setting Up LanceDB

Now let's set up LanceDB to store our recipe data with both text and image embeddings:


In [None]:
import lancedb
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import io
import base64

# Configuration
LANCEDB_PATH = "data/recipes.lance"
TEXT_MODEL = "all-MiniLM-L6-v2"
IMAGE_MODEL = "openai/clip-vit-base-patch32"

print("🔧 Setting up models and database...")

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load models
text_model = SentenceTransformer(TEXT_MODEL)
text_model.to(device)

image_model = CLIPModel.from_pretrained(IMAGE_MODEL)
image_processor = CLIPProcessor.from_pretrained(IMAGE_MODEL)
image_model.to(device)

print("✅ Models loaded successfully")


In [None]:
# Connect to LanceDB
db = lancedb.connect(LANCEDB_PATH)

# Create sample image data (in practice, you'd load actual images)
def create_sample_image(width=224, height=224, color=(100, 150, 200)):
    """Create a sample image for demonstration"""
    image = Image.new('RGB', (width, height), color)
    return image

# Process recipes and create embeddings
recipes_data = []

for i, recipe in enumerate(sample_recipes):
    # Create text embedding
    text_content = f"{recipe['title']} {' '.join(recipe['ingredients'])} {recipe['instructions']}"
    text_embedding = text_model.encode([text_content], convert_to_tensor=True, device=device)
    text_vector = text_embedding.cpu().numpy().flatten()
    
    # Create sample image and embedding
    sample_image = create_sample_image()
    
    # Convert image to bytes for storage
    img_buffer = io.BytesIO()
    sample_image.save(img_buffer, format='JPEG')
    image_binary = img_buffer.getvalue()
    
    # Create image embedding
    inputs = image_processor(images=sample_image, return_tensors="pt", padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        image_features = image_model.get_image_features(**inputs)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    
    image_vector = image_features.cpu().numpy().flatten()
    
    # Prepare recipe data
    recipe_data = {
        "id": recipe["id"],
        "title": recipe["title"],
        "ingredients": recipe["ingredients"],
        "instructions": recipe["instructions"],
        "image_name": recipe["image_name"],
        "text_embedding": text_vector,
        "image_embedding": image_vector,
        "image_binary": image_binary,
        "num_ingredients": len(recipe["ingredients"])
    }
    
    recipes_data.append(recipe_data)

print(f"✅ Processed {len(recipes_data)} recipes with embeddings")


In [None]:
# Create LanceDB table
if "recipes" in db.table_names():
    db.drop_table("recipes")

table = db.create_table("recipes", recipes_data)
print("✅ LanceDB table created successfully")
print(f"Table schema: {table.schema}")
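
Before searching the table, it helps to know how to read the numbers a vector search returns. LanceDB reports a `_distance` column with each hit, and, assuming the table keeps its default L2-style metric, a smaller value means a closer match. The plain-Python sketch below uses toy 2-D vectors (not real embeddings) to show that for unit-length vectors, ranking by L2 distance agrees with ranking by cosine similarity.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a query and two stored recipes.
query = l2_normalize([1.0, 0.0])
close = l2_normalize([0.9, 0.1])
far = l2_normalize([0.0, 1.0])

for name, candidate in [("close", close), ("far", far)]:
    print(
        name,
        round(l2_distance(query, candidate), 3),
        round(cosine_similarity(query, candidate), 3),
    )
```

Keep this in mind when reading the `score` field returned by the search tool in the next section: it carries `_distance`, so lower is better.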


## 4. Building the AI Agent

Now let's create our PydanticAI agent with custom tools for recipe search:


In [None]:
from pydantic_ai import Agent
from typing import List, Dict, Any, Optional
import base64

class RecipeSearchTools:
    """Tools for the PydanticAI agent"""
    
    def __init__(self, db_path: str, text_model, image_model, image_processor, device):
        self.db = lancedb.connect(db_path)
        self.table = self.db.open_table("recipes")
        self.text_model = text_model
        self.image_model = image_model
        self.image_processor = image_processor
        self.device = device
    
    def _safe_convert(self, value):
        """Safely convert numpy types to Python types for JSON serialization"""
        import numpy as np
        
        if isinstance(value, np.ndarray):
            if value.size == 1:
                return value.item()
            else:
                return value.tolist()
        elif hasattr(value, "item") and hasattr(value, "size") and value.size == 1:
            return value.item()
        elif hasattr(value, "tolist"):
            return value.tolist()
        elif isinstance(value, (list, tuple)):
            return [self._safe_convert(item) for item in value]
        else:
            return value
    
    def search_recipes_by_text(self, query: str, limit: int = 5) -> List[Dict[str, Any]]:
        """Search recipes by text query"""
        # Generate query embedding
        query_embedding = self.text_model.encode(
            [query], convert_to_tensor=True, device=self.device
        )
        query_vector = query_embedding.cpu().numpy().flatten()
        
        # Search in LanceDB
        results = (
            self.table.search(query_vector, vector_column_name="text_embedding")
            .limit(limit)
            .to_pandas()
        )
        
        # Convert to list of dicts
        recipes = []
        for _, row in results.iterrows():
            recipe = {
                "id": self._safe_convert(row["id"]),
                "title": self._safe_convert(row["title"]),
                "ingredients": self._safe_convert(row["ingredients"]),
                "instructions": self._safe_convert(row["instructions"]),
                "num_ingredients": self._safe_convert(row["num_ingredients"]),
                # "score" holds LanceDB's _distance, so lower means a closer match
                "score": self._safe_convert(row.get("_distance", 0)),
            }
            recipes.append(recipe)
        
        return recipes
    
    def get_available_ingredients(self) -> List[str]:
        """Get all unique ingredients in the dataset"""
        try:
            results = self.table.search().select(["ingredients"]).to_pandas()
            all_ingredients = set()
            
            for _, row in results.iterrows():
                if row["ingredients"]:
                    all_ingredients.update(row["ingredients"])
            
            return sorted(list(all_ingredients))
        except Exception as e:
            print(f"Error getting ingredients: {e}")
            return []
    
    def get_recipes_with_images(self, query: str, limit: int = 5) -> str:
        """Search recipes and return formatted response"""
        try:
            recipes = self.search_recipes_by_text(query, limit)
            
            if not recipes:
                return "No recipes found matching your query."
            
            response_parts = []
            response_parts.append(f"Here are {len(recipes)} recipes that match your query:\n")
            
            for recipe in recipes:
                response_parts.append(f"## {recipe['title']}")
                response_parts.append(f"**Ingredients:** {recipe['num_ingredients']} ingredients")
                
                # Add ingredients list (first five, then an ellipsis)
                if recipe.get("ingredients"):
                    ingredients_text = ", ".join(recipe["ingredients"][:5])
                    if len(recipe["ingredients"]) > 5:
                        ingredients_text += "..."
                    response_parts.append(f"*{ingredients_text}*")
                
                # Add instructions preview (first 200 characters)
                if recipe.get("instructions"):
                    instructions = recipe["instructions"]
                    if len(instructions) > 200:
                        instructions = instructions[:200] + "..."
                    response_parts.append(f"**Instructions:** {instructions}")
                
                response_parts.append("---\n")
            
            return "\n".join(response_parts)
        
        except Exception as e:
            return f"Error searching recipes: {str(e)}"

print("✅ Recipe search tools defined")


In [None]:
# Initialize tools
tools_instance = RecipeSearchTools(
    LANCEDB_PATH, text_model, image_model, image_processor, device
)

# Create PydanticAI agent
agent = Agent(
    "openai:gpt-4o-mini",  # provider-prefixed model name; needs OPENAI_API_KEY
    tools=[
        tools_instance.get_recipes_with_images,
        tools_instance.search_recipes_by_text,
        tools_instance.get_available_ingredients,
    ],
    system_prompt="""You are a helpful recipe assistant. You can search for recipes by text.

CRITICAL RULES - FOLLOW THESE EXACTLY:
1. ALWAYS use the provided tools to search for recipes - NEVER generate recipe responses manually
2. For ANY text-based recipe search request, use get_recipes_with_images tool
3. These tools automatically format recipes with proper markdown
4. DO NOT generate your own recipe responses - always use the tools

Be helpful and provide detailed recipe information with proper markdown formatting.""",
)

print("✅ PydanticAI agent created successfully")
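
Why do the type hints and docstrings on the tool methods matter? PydanticAI builds each tool's schema from the function's signature and docstring, and that schema is what the model sees when choosing a tool. The sketch below is not PydanticAI's implementation. It uses only the standard library's `inspect` module, with a hypothetical helper called `describe_tool`, to show what information a plain Python function exposes.

```python
import inspect
from typing import Any, Dict, List, get_type_hints


def search_recipes_by_text(query: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Search recipes by text query"""
    return []  # stand-in body; the real tool queries LanceDB


def describe_tool(func) -> Dict[str, Any]:
    """Collect the name, docstring, and parameters of a function.

    Hypothetical helper for illustration; not how PydanticAI is implemented.
    """
    signature = inspect.signature(func)
    hints = get_type_hints(func)
    parameters = {}
    for name, param in signature.parameters.items():
        parameters[name] = {
            "type": hints.get(name, Any),
            # A parameter with no default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": parameters,
    }


summary = describe_tool(search_recipes_by_text)
print(summary["name"], "-", summary["description"])
for name, info in summary["parameters"].items():
    print(f"  {name}: required={info['required']}")
```

In practice this means a vague docstring or a missing type hint makes a tool harder for the model to use correctly, so it is worth keeping both precise.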


## 5. Testing the Agent

Let's test our agent with some sample queries:


In [None]:
# Test the agent with a sample query
query = "Find me some pasta recipes"
print(f"\n🔍 Query: {query}")
print("-" * 50)

# In Colab, an asyncio event loop is already running.
# We can directly await the async method of the agent.
if 'agent' in locals(): # Check if agent was successfully created
    result = await agent.run(query)
    # Print the agent's final text output
    print(result.output)
else:
    print("Agent could not be initialized. Please ensure your OpenAI API key is set in Colab secrets.")
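
Top-level `await` works in the cell above because a notebook already has an event loop running. In a plain Python script there is no running loop, so the coroutine has to be driven explicitly. The sketch below shows the pattern with `asyncio.run`. To stay runnable without an API key it uses `fake_agent_run`, a hypothetical stand-in for `agent.run`.

```python
import asyncio


async def fake_agent_run(query: str) -> str:
    """Hypothetical stand-in for `agent.run(query)`; returns a canned reply."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"(reply to: {query})"


def run_blocking(coro):
    """Run a coroutine to completion from synchronous code.

    Use this in scripts only; inside a notebook, `await` the coroutine instead,
    because `asyncio.run` refuses to start while a loop is already running.
    """
    return asyncio.run(coro)


reply = run_blocking(fake_agent_run("Find me some pasta recipes"))
print(reply)
```

With the real agent, `agent.run_sync(query)` covers the same script use case in a single call.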


## 6. Summary and Next Steps

Congratulations! You've built a complete multimodal recipe agent with the following features:

### What You've Accomplished

1. **Multimodal Data Storage**: Used LanceDB to store both text and image embeddings
2. **AI Agent Development**: Created a PydanticAI agent with custom tools
3. **Semantic Search**: Implemented text-based recipe search using vector similarity
4. **Robustness Basics**: Added basic error handling and NumPy-to-Python type conversion for tool outputs

### Key Technologies Used

- **LanceDB**: Multimodal vector database for efficient storage and retrieval
- **PydanticAI**: Modern AI agent framework with type safety
- **Sentence Transformers**: Text embeddings for semantic search
- **CLIP**: Vision-language model for image understanding

### Next Steps

1. **Add Image Search**: Query the stored `image_embedding` column with a CLIP embedding of a user-supplied image
2. **Scale Up**: Use a larger recipe dataset
3. **Deploy**: Deploy your agent to a cloud platform
4. **Enhance UI**: Add more interactive features
5. **Add More Tools**: Extend the agent with additional capabilities
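
As a starting point for the image search step, here is a minimal sketch of what such a tool could look like. It assumes the table exposes the same `search(...).limit(...)` chain used by the text search tool, plus LanceDB's `to_list()` to fetch rows as dictionaries. The function name `search_recipes_by_image` and the `embed_image` callable are hypothetical; you would pass in a function that runs CLIP on the image and returns a normalized vector, mirroring the embedding code in Section 3.

```python
from typing import Any, Callable, Dict, List, Sequence


def search_recipes_by_image(
    table: Any,
    embed_image: Callable[[Any], Sequence[float]],
    image: Any,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Find the recipes whose stored image embedding is closest to `image`.

    `table` is expected to behave like the LanceDB table opened earlier, and
    `embed_image` is a caller-supplied function (hypothetical here) that
    turns an image into a vector with the same size as `image_embedding`.
    """
    query_vector = list(embed_image(image))
    return (
        table.search(query_vector, vector_column_name="image_embedding")
        .limit(limit)
        .to_list()
    )
```

Passing the embedding function in as an argument keeps the search logic separate from the model, which makes it easy to test with a fake table. Note that the sample images in this tutorial are identical solid-color placeholders, so results only become meaningful once you load real recipe photos.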

### Running Your Agent

To run your complete recipe agent, you can create a simple script:

```python
# Simple test script (assumes `agent` is defined as in Section 4)
result = agent.run_sync("Find me some dessert recipes")
print(result.output)
```

Your agent is now ready to help users discover recipes through natural language conversations!
