Building and Optimizing AI Applications with DSPy and Gemini Flash 2.5
In this hands-on tutorial, we’ll explore how to leverage DSPy (Declarative Self-improving Python) with Google’s Gemini Flash 2.5 model to build powerful, reliable AI applications. Instead of spending hours crafting and tweaking prompts, you’ll learn how to program language models declaratively and optimize their performance programmatically.
Table of Contents
- Introduction to DSPy and Gemini Flash 2.5
- Setting Up Your Environment
- DSPy Core Concepts
- Connecting DSPy to Gemini Flash 2.5
- Basic DSPy Modules
- Building a Question-Answering System
- Implementing RAG Applications
- Using DSPy Assertions
- Optimizing DSPy Programs
- Leveraging Gemini’s Thinking Capabilities
- Building Complex AI Applications
- Conclusion and Next Steps
Introduction to DSPy and Gemini Flash 2.5
DSPy is a framework developed by Stanford NLP that enables developers to build AI systems using a declarative programming approach rather than tedious prompt engineering. It shifts focus from crafting perfect prompts to programming with structured, declarative modules that can be automatically optimized.
Gemini Flash 2.5 is Google’s latest fast, cost-efficient thinking model with powerful reasoning capabilities. Released in April 2025, it’s Google’s first fully hybrid reasoning model, allowing developers to toggle thinking capabilities on or off and set thinking budgets to optimize the balance between quality, cost, and latency.
By combining these technologies, we can build AI applications that are more reliable, maintainable, and efficient than traditional approaches based on prompt engineering.
Setting Up Your Environment
Let’s start by setting up our development environment with the necessary dependencies:
# Install required packages
!pip install dspy-ai google-generativeai
Code Explanation: This command installs two Python libraries:
dspy-ai
: The DSPy framework for programming language modelsgoogle-generativeai
: Google’s official library for interacting with their Gemini models
The exclamation mark (!
) at the beginning lets you run shell commands directly in a Jupyter notebook or similar environment.
Now, import the necessary libraries:
import dspy
import google.generativeai as genai
import os
from dotenv import load_dotenv
import json
Code Explanation: This section imports the necessary libraries:
dspy
: The main DSPy framework we’ll use to program AI modelsgoogle.generativeai
: Abbreviated asgenai
to make it easier to referenceos
: A standard Python library for interacting with the operating systemdotenv
: A library that loads environment variables from a.env
filejson
: A standard Python library for working with JSON data (a common format for structured data)
Set up API keys using environment variables:
# Load environment variables
load_dotenv()
# Set up Gemini API key
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
genai.configure(api_key=GEMINI_API_KEY)
Code Explanation: This code:
- Loads environment variables from a
.env
file in your project directory. This is a secure way to store sensitive information like API keys. - Retrieves the Gemini API key from these environment variables
- Configures the Google Generative AI library with this API key so it can authenticate requests to the Gemini API
DSPy Core Concepts
Before diving into code, let’s understand the key concepts that make DSPy powerful:
1. Signatures
Signatures are declarative specifications that define the input and output behavior of a module. They’re like function signatures in programming languages but for natural language tasks.
# Example signature for a question-answering task
class QuestionAnswer(dspy.Signature):
"""Answer questions with short factoid answers."""
question = dspy.InputField()
answer = dspy.OutputField(desc="often between 1 and 5 words")
Code Explanation: This code defines a “signature” which is like a contract for what goes in and comes out of a DSPy module:
- We create a class that inherits from
dspy.Signature
- The docstring (
"""Answer questions with short factoid answers."""
) helps the model understand the task question = dspy.InputField()
defines an input called “question”answer = dspy.OutputField(...)
defines an output called “answer”- The
desc
parameter provides additional details about the expected output format (in this case, specifying that the answer should be short)
2. Modules
Modules are reusable components that implement specific strategies for invoking language models. They encapsulate prompting techniques like chain-of-thought, RAG, and more.
# Example module that uses a signature
class SimpleQA(dspy.Module):
def __init__(self):
super().__init__()
self.qa_predictor = dspy.Predict(QuestionAnswer)
def forward(self, question):
return self.qa_predictor(question=question)
Code Explanation: This code defines a custom DSPy module for question answering:
- We create a class that inherits from
dspy.Module
- In the
__init__
method, we initialize adspy.Predict
object that uses ourQuestionAnswer
signature - The
forward
method defines what happens when the module is called - it takes a question as input, passes it to the predictor, and returns the result - This structure is similar to how neural networks are defined in frameworks like PyTorch
3. Optimizers
Optimizers (previously called “teleprompters”) improve the prompts or weights in your program based on examples and metrics.
# Example optimizer usage
optimizer = dspy.MIPROv2(metric=answer_accuracy)
optimized_program = optimizer.compile(
program=my_program,
trainset=examples[:80],
valset=examples[80:100]
)
Code Explanation: This code shows how to use an optimizer to improve a DSPy program:
- We create an optimizer (
MIPROv2
) and specify a metric to optimize for - We call the
compile
method with our program and datasets - The optimizer automatically improves the program by finding better prompts or weights
- It uses the training set to find improvements and the validation set to evaluate them
- The result is an optimized version of our program that performs better on the specified metric
Connecting DSPy to Gemini Flash 2.5
To use Gemini Flash 2.5 with DSPy, we need to create a custom DSPy module that interfaces with the Gemini API. Here’s how:
class GeminiFlash(dspy.LM):
"""DSPy module for Gemini Flash 2.5"""
def __init__(self, model="gemini-2.5-flash-preview-04-17", thinking_budget=0, **kwargs):
super().__init__()
self.model = model
self.thinking_budget = thinking_budget
self.kwargs = kwargs
Code Explanation:
This code defines a new class GeminiFlash
that inherits from dspy.LM
(Language Model). Let’s break it down:
- The class is used to create a custom DSPy module for the Gemini Flash 2.5 model
__init__
is a special method in Python classes that runs when you create a new instancemodel="gemini-2.5-flash-preview-04-17"
sets a default value for the model parameterthinking_budget=0
specifies how much “thinking” the model should do (more on this later)**kwargs
collects any additional parameters you might want to passsuper().__init__()
calls the initialization method of the parent class
def basic_request(self, prompt, **kwargs):
"""Send a request to the Gemini API"""
config = genai.types.GenerateContentConfig(
temperature=kwargs.get("temperature", 0.0),
max_output_tokens=kwargs.get("max_tokens", 1024),
top_p=kwargs.get("top_p", 0.95),
top_k=kwargs.get("top_k", 0)
)
Code Explanation: This method handles sending requests to the Gemini API:
basic_request
is the method DSPy will call when it needs to send a prompt to the modelprompt
is the text input that will be sent to the model- The
config
object configures how the model generates text:temperature
: Controls randomness (0.0 = most deterministic, 1.0 = most random)max_output_tokens
: Maximum length of the responsetop_p
: Controls diversity by only considering the top probability tokenstop_k
: Similar to top_p, limits the selection to the top k tokens
- The
kwargs.get()
pattern is using any values passed in, or the default value if none is provided
# Add thinking budget if specified
if self.thinking_budget > 0:
config.thinking_config = genai.types.ThinkingConfig(
thinking_budget=self.thinking_budget
)
model = genai.GenerativeModel(self.model)
response = model.generate_content(prompt, config)
return response.text
Code Explanation:
This code continues the basic_request
method:
- The first block configures Gemini’s unique “thinking” capability:
- If
thinking_budget
is greater than 0, we add aThinkingConfig
- This tells Gemini to spend time reasoning internally before responding
- Higher budgets allow for more thorough reasoning but increase cost and latency
- If
- Next, we create a
GenerativeModel
instance with our chosen model name - We call
generate_content()
with our prompt and configuration - Finally, we return just the text part of the response
# Set up Gemini Flash 2.5 as the default LM in DSPy
gemini_flash = GeminiFlash(thinking_budget=0) # Start with thinking disabled
dspy.settings.configure(lm=gemini_flash)
Code Explanation: This code:
- Creates an instance of our custom
GeminiFlash
class with thinking disabled - Configures DSPy to use this as the default language model for all modules
- This means all DSPy modules will use Gemini Flash 2.5 unless explicitly told to use a different model
Note: As of recent reports, there have been issues with connecting Gemini models directly to DSPy, and users have tried various approaches. One alternative approach is to use the OpenAI-compatible endpoint:
# Alternative approach using OpenAI compatibility layer
lm = dspy.LM(
"openai/gemini-2.5-flash",
api_key=api_key,
api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
max_tokens=1000,
temperature=0.7,
)
dspy.configure(lm=lm)
Code Explanation: This is an alternative method that uses Google’s OpenAI-compatible endpoint:
- Instead of our custom class, we use the built-in
dspy.LM
class - We specify the model as “openai/gemini-2.5-flash” (a special format)
- We provide the special API base URL that supports OpenAI-compatible requests
- We set max_tokens (1000) and temperature (0.7) to control output length and randomness
- This approach takes advantage of Google’s compatibility layer to simplify integration
Basic DSPy Modules
DSPy provides several built-in modules for different prompting strategies. Let’s explore the most important ones:
Simple Predict Module
The most basic module is dspy.Predict
, which simply takes a signature and generates outputs:
# Define a simple signature
class Sentiment(dspy.Signature):
"""Determine the sentiment of a text."""
text = dspy.InputField()
sentiment = dspy.OutputField(desc="either 'positive', 'negative', or 'neutral'")
# Create a predictor
sentiment_predictor = dspy.Predict(Sentiment)
# Test it
result = sentiment_predictor(text="I absolutely loved the new movie!")
print(result.sentiment) # Output: positive
Code Explanation: This example creates a simple sentiment analyzer:
- First, we define a
Sentiment
signature with a text input and sentiment output - The sentiment output has a description specifying the expected values (positive, negative, neutral)
- We create a predictor using
dspy.Predict
and our signature - We call the predictor with a sample text (“I absolutely loved the new movie!”)
- We print the
sentiment
property of the result, which should be “positive”
Behind the scenes, DSPy:
- Creates a prompt explaining the task using the signature’s docstring and field descriptions
- Sends the prompt and user’s text to the Gemini model
- Parses the response to extract the sentiment value
- Returns a structured result with the sentiment field
Chain of Thought
dspy.ChainOfThought
elicits step-by-step reasoning before producing an answer, which improves accuracy for complex tasks:
# Define a math problem signature
class MathProblem(dspy.Signature):
"""Solve a math word problem."""
question = dspy.InputField()
answer = dspy.OutputField(desc="the numerical answer")
# Create a chain-of-thought solver
math_solver = dspy.ChainOfThought(MathProblem)
# Customize the reasoning prompt (optional)
custom_rationale = dspy.OutputField(
prefix="Reasoning: Let's solve this step by step to",
desc="${produce the answer}. We need to..."
)
custom_math_solver = dspy.ChainOfThought(MathProblem, rationale_type=custom_rationale)
# Test it
result = math_solver(question="If a train travels at 120 km/h for 2.5 hours, then slows to 90 km/h for 1.5 hours, what is the total distance traveled?")
print(result.rationale) # Shows the reasoning
print(result.answer) # Shows the final answer
Code Explanation: This code creates a math problem solver using Chain of Thought reasoning:
- We define a
MathProblem
signature with question input and answer output - We create a solver using
dspy.ChainOfThought
with our signature - We also show how to customize the reasoning prompt by creating a custom rationale field
prefix
sets the beginning of the reasoning promptdesc
provides a template for how to continue the reasoning${produce the answer}
is a template variable that will be filled in
- We test the solver with a math problem about a train
- We print both the reasoning process (
rationale
) and the final answer
The Chain of Thought approach is powerful because:
- It encourages the model to think step-by-step before answering
- It makes the reasoning process transparent and inspectable
- It often produces more accurate answers for complex problems
- It’s similar to how humans solve difficult problems
Program of Thought
dspy.ProgramOfThought
is similar to Chain of Thought but structures the reasoning as executable code:
# Define signature for a computational problem
class Computation(dspy.Signature):
"""Perform a complex calculation."""
problem = dspy.InputField()
solution = dspy.OutputField()
# Create a program-of-thought solver
pot_solver = dspy.ProgramOfThought(Computation)
# Test it
result = pot_solver(problem="What is the sum of squares of all integers from 1 to 10?")
print(result.program) # Shows the generated code
print(result.solution) # Shows the final solution
Code Explanation: This code creates a computational problem solver using Program of Thought:
- We define a
Computation
signature with problem input and solution output - We create a solver using
dspy.ProgramOfThought
with our signature - We test it with a mathematical problem (sum of squares)
- We print both the generated program and the final solution
Program of Thought is useful because:
- It structures reasoning as executable code instead of natural language
- The code can be run to verify correctness
- It’s particularly effective for mathematical or algorithmic problems
- It helps the model think more precisely through computational steps
ReAct Framework
dspy.ReAct
implements the Reasoning + Acting framework, which allows models to interact with tools:
# Define tools that ReAct can use
tools = [
{
"name": "search",
"description": "Search for information on the web",
"parameters": {"query": "string"}
},
{
"name": "calculator",
"description": "Perform mathematical calculations",
"parameters": {"expression": "string"}
}
]
# Define a signature for a task requiring tools
class ProblemSolving(dspy.Signature):
"""Solve a complex problem using tools if needed."""
question = dspy.InputField()
answer = dspy.OutputField()
# Create a ReAct solver with tools
react_solver = dspy.ReAct(ProblemSolving, tools=tools, max_iters=5)
# Test it
result = react_solver(question="What is the population of France divided by the population of Canada?")
print(result.answer)
Code Explanation: This code creates a problem solver that can use tools:
- We define a list of tools that the model can use:
- A search tool for finding information
- A calculator tool for performing calculations
- Each tool has a name, description, and parameters
- We define a
ProblemSolving
signature with question input and answer output - We create a solver using
dspy.ReAct
with our signature, tools, and a maximum number of iterations - We test it with a question that requires both searching for information and performing a calculation
- We print the final answer
The ReAct framework is powerful because:
- It allows models to use tools to solve problems they couldn’t solve on their own
- It alternates between reasoning and acting (using tools)
- It can handle multi-step tasks that require external information or computation
- It makes LLMs more capable by extending their abilities with specialized tools
TypedPredictor and TypedChainOfThought
For structured outputs, DSPy provides TypedPredictor
and TypedChainOfThought
:
from pydantic import BaseModel
# Define structured input and output using Pydantic
class Person(BaseModel):
name: str
age: int
email: str
class ExtractPersonInput(BaseModel):
text: str
# Create a typed predictor
person_extractor = dspy.TypedPredictor("input:ExtractPersonInput -> output:Person")
# Test it
result = person_extractor(input={"text": "John Doe is 30 years old. You can reach him at john.doe@example.com."})
print(result) # Output will be structured as a Person object
Code Explanation: This code creates an information extractor with typed outputs:
- We import
BaseModel
from Pydantic, a data validation library - We define two models:
Person
: A structured output with name, age, and email fieldsExtractPersonInput
: A simple input with a text field
- We create a predictor using
dspy.TypedPredictor
with a string signature- The signature format is “input:InputType -> output:OutputType”
- We test it with a text containing person information
- The result will be a structured Person object with name, age, and email fields
TypedPredictor is useful because:
- It ensures outputs are properly structured and validated
- It works well with Pydantic, a popular data validation library
- It helps create more robust systems by enforcing output types
- It’s particularly useful for information extraction tasks
Building a Question-Answering System
Now, let’s build a simple question-answering system that uses Chain of Thought reasoning:
# Define a signature for our QA system
class DetailedQA(dspy.Signature):
"""Answer questions with detailed explanations."""
question = dspy.InputField()
answer = dspy.OutputField(desc="comprehensive answer with explanation")
# Create a QA module with Chain of Thought
class EnhancedQA(dspy.Module):
def __init__(self):
super().__init__()
self.qa_predictor = dspy.ChainOfThought(DetailedQA)
def forward(self, question):
# Enhance question with instructions for better answers
enhanced_question = f"Provide a detailed, accurate answer to this question: {question}"
return self.qa_predictor(question=enhanced_question)
# Test our QA system
qa_system = EnhancedQA()
response = qa_system("What causes aurora borealis?")
print(response.rationale) # The reasoning process
print(response.answer) # The final answer
Code Explanation: This code builds a question-answering system with detailed explanations:
- We define a
DetailedQA
signature with question input and answer output - We create a custom
EnhancedQA
module:- In
__init__
, we create aChainOfThought
predictor using our signature - In
forward
, we enhance the question with additional instructions - We return the result from the predictor
- In
- We test it with a question about aurora borealis
- We print both the reasoning process and the final answer
This approach is effective because:
- It uses Chain of Thought to encourage step-by-step reasoning
- It enhances the question with clear instructions
- It produces detailed, well-explained answers
- It makes the reasoning process transparent for inspection
Implementing RAG Applications
Retrieval-Augmented Generation (RAG) is a powerful technique that combines retrieval of relevant information with language model generation. DSPy makes it easy to implement RAG systems.
Basic RAG System
# First, set up a retriever
from dspy.retrieve.qdrant import QdrantRetriever
# Initialize a vector database (this is a placeholder - replace with your actual DB)
retriever = QdrantRetriever(
collection_name="my_documents",
embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)
Code Explanation: This code sets up a document retriever for our RAG system:
- We import
QdrantRetriever
from DSPy, which interfaces with the Qdrant vector database - We initialize it with:
collection_name
: The name of the document collection in the databaseembedding_model
: A model that converts text to numerical vectors (embeddings)
A vector database is a specialized database that stores text as numerical vectors, allowing for semantic search (finding documents with similar meaning, not just matching keywords).
# Define a RAG signature
class RAG(dspy.Signature):
"""Generate answers based on retrieved documents."""
question = dspy.InputField()
context = dspy.InputField()
answer = dspy.OutputField()
# Create a basic RAG module
class BasicRAG(dspy.Module):
def __init__(self, num_passages=3):
super().__init__()
self.retrieve = dspy.Retrieve(retriever, k=num_passages)
self.generate = dspy.ChainOfThought(RAG)
def forward(self, question):
# Retrieve relevant passages
passages = self.retrieve(question).passages
# Concatenate passages into context
context = "\n\n".join(passages)
# Generate answer based on context
response = self.generate(question=question, context=context)
return response
Code Explanation: This code defines our RAG system:
- We define a
RAG
signature with:question
: The user’s querycontext
: The retrieved documentsanswer
: The generated response
- We create a
BasicRAG
module:num_passages
: How many documents to retrieve (default: 3)self.retrieve
: A retriever component that fetches documentsself.generate
: A generator that creates answers based on context
- In the
forward
method:- We retrieve relevant passages based on the question
- We join them into a single context string
- We generate an answer based on both the question and context
- We return the response
# Test our RAG system
rag_system = BasicRAG(num_passages=5)
response = rag_system("What are the main principles of quantum computing?")
print(response.answer)
Code Explanation: This code tests our RAG system:
- We create a
BasicRAG
instance with 5 passages per query - We call it with a question about quantum computing
- We print the generated answer
The RAG approach is powerful because:
- It combines the knowledge from a document database with the reasoning abilities of an LLM
- It can answer questions about specific documents or specialized knowledge
- It provides more accurate and up-to-date information than the LLM alone
- It can cite sources for its answers, increasing trustworthiness
Multi-Hop RAG
For complex questions requiring multiple retrieval steps:
# Define a multi-hop RAG module
class MultiHopRAG(dspy.Module):
def __init__(self, num_passages_per_hop=3, max_hops=2):
super().__init__()
self.generate_query = [dspy.ChainOfThought("context, question -> query") for _ in range(max_hops)]
self.retrieve = dspy.Retrieve(retriever, k=num_passages_per_hop)
self.generate_answer = dspy.ChainOfThought(RAG)
self.max_hops = max_hops
Code Explanation: This code defines a more advanced RAG system with multiple “hops” (retrieval steps):
- We create a
MultiHopRAG
module with:num_passages_per_hop
: How many documents to retrieve in each stepmax_hops
: The maximum number of retrieval steps
- We initialize:
self.generate_query
: An array of query generators, one for each hopself.retrieve
: A retriever component for fetching documentsself.generate_answer
: A final answer generatorself.max_hops
: Storage for the maximum number of hops
def forward(self, question):
context = []
prev_queries = [question]
# Perform multiple hops of retrieval
for hop in range(self.max_hops):
# Generate a new query based on current context and original question
query = self.generate_query[hop](context="\n\n".join(context), question=question).query
# Skip if query is similar to previous queries
if query in prev_queries:
continue
prev_queries.append(query)
# Retrieve passages
passages = self.retrieve(query).passages
# Add to context (avoiding duplicates)
context = self._deduplicate(context + passages)
# Generate final answer
response = self.generate_answer(question=question, context="\n\n".join(context))
return response
Code Explanation:
This continues the MultiHopRAG
module with the forward
method:
- We initialize empty lists for context and previous queries
- For each hop (up to
max_hops
):- We generate a new query based on the current context and original question
- We skip this query if it’s identical to a previous one (to avoid loops)
- We add it to the list of previous queries
- We retrieve passages using this query
- We add them to the context, removing duplicates
- After all hops, we generate a final answer using all collected context
- We return this response
def _deduplicate(self, passages):
seen = set()
unique_passages = []
for passage in passages:
if passage not in seen:
seen.add(passage)
unique_passages.append(passage)
return unique_passages
Code Explanation: This is a helper method to remove duplicate passages:
- We create a set to track passages we’ve already seen
- We initialize an empty list for unique passages
- For each passage:
- If we haven’t seen it before, we add it to both the set and the list
- We return the list of unique passages
Multi-hop RAG is powerful because:
- It can answer complex questions that require connecting multiple pieces of information
- It simulates a multi-step research process
- It can follow chains of reasoning across different documents
- It’s particularly effective for questions like “What is the connection between X and Y?”
RAG with Structured Output
For extracting structured information from retrieved documents:
from pydantic import BaseModel, Field
from typing import List, Optional
# Define structured output schema
class ResearchPaper(BaseModel):
title: str
authors: List[str]
year: int
abstract: str
keywords: Optional[List[str]] = None
class StructuredRAGInput(BaseModel):
question: str
context: str
Code Explanation: This code defines structured data models for our information extraction:
- We import necessary modules from Pydantic and Python’s typing system
- We define a
ResearchPaper
model with:title
: The paper’s title (a string)authors
: A list of author namesyear
: The publication year (an integer)abstract
: The paper’s abstractkeywords
: An optional list of keywords
- We define a
StructuredRAGInput
model with:question
: The user’s querycontext
: The retrieved documents
# Create a structured RAG system
class StructuredRAG(dspy.Module):
def __init__(self, num_passages=5):
super().__init__()
self.retrieve = dspy.Retrieve(retriever, k=num_passages)
self.extract = dspy.TypedPredictor("input:StructuredRAGInput -> output:ResearchPaper")
def forward(self, question):
# Retrieve passages
passages = self.retrieve(question).passages
context = "\n\n".join(passages)
# Extract structured information
input_data = {"question": question, "context": context}
result = self.extract(input=input_data)
return result
Code Explanation: This code defines a RAG system that outputs structured data:
- We create a
StructuredRAG
module with:num_passages
: How many documents to retrieve
- We initialize:
self.retrieve
: A retriever componentself.extract
: A typed predictor that converts text to structured data
- In the
forward
method:- We retrieve relevant passages
- We join them into a context string
- We create an input dictionary with question and context
- We extract structured information from this input
- We return the structured result
# Test our structured RAG system
structured_rag = StructuredRAG()
response = structured_rag("Find the research paper about transformer architecture")
print(f"Title: {response.title}")
print(f"Authors: {', '.join(response.authors)}")
print(f"Year: {response.year}")
print(f"Abstract: {response.abstract}")
if response.keywords:
print(f"Keywords: {', '.join(response.keywords)}")
Code Explanation: This code tests our structured RAG system:
- We create a
StructuredRAG
instance - We call it with a query about transformer architecture
- We print each field of the structured response:
- The paper title
- The author list (joined with commas)
- The publication year
- The abstract
- The keywords (if any)
Structured RAG is powerful because:
- It extracts specific information in a structured format
- It can be easily integrated with downstream systems
- It ensures outputs follow a predefined schema
- It’s ideal for tasks like information extraction, data population, and building structured databases
Using DSPy Assertions
DSPy Assertions allow you to add constraints to ensure outputs meet specific requirements:
# Define a tweet generator with length constraints
class TweetGenerator(dspy.Module):
def __init__(self):
super().__init__()
self.generate = dspy.ChainOfThought("topic -> tweet")
def forward(self, topic):
response = self.generate(topic=topic)
# Add assertion to check tweet length
dspy.Assert(
len(response.tweet) <= 280,
"Tweet must be 280 characters or less."
)
return response
Code Explanation: This code creates a tweet generator with a length constraint:
- We define a
TweetGenerator
module with a chain-of-thought generator - In the
forward
method:- We generate a tweet based on the topic
- We add an assertion that the tweet must be 280 characters or less
- If the assertion fails, DSPy will retry with the error message
- We return the final response (which satisfies the assertion)
Assertions are powerful because:
- They enforce constraints on the output
- They allow the model to self-correct when constraints are violated
- They don’t require manually encoding constraints in prompts
- They make outputs more reliable and predictable
# Test our constrained generator
tweet_gen = TweetGenerator()
response = tweet_gen("artificial intelligence")
print(response.tweet)
Code Explanation: This code tests our tweet generator:
- We create a
TweetGenerator
instance - We call it with the topic “artificial intelligence”
- We print the generated tweet (which will be 280 characters or less)
For more complex assertions:
# Define a function to validate JSON format
def is_valid_json(text):
try:
json.loads(text)
return True
except:
return False
# Create a module with multiple assertions
class JSONGenerator(dspy.Module):
def __init__(self):
super().__init__()
self.generate = dspy.Predict("schema, description -> json_output")
def forward(self, schema, description):
response = self.generate(schema=schema, description=description)
# Add assertion for valid JSON
dspy.Assert(
is_valid_json(response.json_output),
"Output must be valid JSON."
)
# Add suggestion for JSON formatting
dspy.Suggest(
response.json_output.startswith("{") and response.json_output.endswith("}"),
"JSON should be enclosed in curly braces."
)
return response
Code Explanation: This code creates a JSON generator with multiple constraints:
- We define a helper function
is_valid_json
that checks if a string is valid JSON - We create a
JSONGenerator
module with a prediction component - In the
forward
method:- We generate JSON based on a schema and description
- We add a strict assertion that the output must be valid JSON
- We add a softer suggestion that JSON should be enclosed in curly braces
- We return the response
The difference between assertions and suggestions:
- Assertions are strict constraints that must be satisfied
- Suggestions are softer guidance that the model should try to follow
- If an assertion fails, DSPy will retry with the error message
- If a suggestion fails, DSPy may not retry
# Test our JSON generator
json_gen = JSONGenerator()
schema = '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "number"}}}'
response = json_gen(schema=schema, description="Generate a person object")
print(response.json_output)
Code Explanation: This code tests our JSON generator:
- We create a
JSONGenerator
instance - We call it with:
- A JSON schema defining a person object with name and age
- A description of what to generate
- We print the generated JSON (which will be valid according to our assertions)
Optimizing DSPy Programs
One of DSPy’s most powerful features is its ability to optimize prompts through examples and metrics:
BootstrapFewShot
# Define a simple accuracy metric
def accuracy_metric(example, prediction):
return 1.0 if example.answer.lower() == prediction.answer.lower() else 0.0
# Create example data
examples = [
dspy.Example(question="What is the capital of France?", answer="Paris"),
dspy.Example(question="What is the capital of Japan?", answer="Tokyo"),
dspy.Example(question="What is the capital of Brazil?", answer="Brasília"),
# Add more examples...
]
Code Explanation: This code prepares for optimization:
- We define an
accuracy_metric
function that:- Takes an example (with the correct answer) and a prediction
- Returns 1.0 if the answers match (ignoring case), 0.0 otherwise
- We create a list of example questions and answers
- Each
dspy.Example
contains both a question and its correct answer - These will be used to teach the model the pattern
- Each
# Initialize the BootstrapFewShot optimizer
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=accuracy_metric)
# Compile the optimized QA system
optimized_qa = optimizer.compile(
program=SimpleQA(),
trainset=examples[:8],
valset=examples[8:10]
)
Code Explanation: This code performs the optimization:
- We import
BootstrapFewShot
from DSPy’s teleprompt module - We create an optimizer with our accuracy metric
- We compile an optimized version of our
SimpleQA
program:- We use the first 8 examples as a training set
- We use the last 2 examples as a validation set
- The optimizer finds good examples to include in prompts
- It returns an optimized version of our program
The BootstrapFewShot process:
- Runs the original program on training examples
- Collects successful examples (where the program produced correct answers)
- Uses these examples as few-shot demonstrations in prompts
- Evaluates the improved program on the validation set
- Returns the optimized program with better performance
# Test the optimized system
response = optimized_qa("What is the capital of Australia?")
print(response.answer)
Code Explanation: This code tests the optimized system:
- We call the optimized QA system with a new question
- We print the generated answer
- The answer should be more accurate now that the system has been optimized
MIPROv2
# Initialize the MIPROv2 optimizer
from dspy.teleprompt import MIPROv2
mipro_optimizer = MIPROv2(metric=accuracy_metric)
# Compile with MIPROv2 for more sophisticated optimization
mipro_qa = mipro_optimizer.compile(
program=SimpleQA(),
trainset=examples[:8],
valset=examples[8:10],
max_rounds=3
)
Code Explanation: This code uses a more advanced optimizer:
- We import
MIPROv2
from DSPy’s teleprompt module - We create an optimizer with our accuracy metric
- We compile an optimized version of our
SimpleQA
program:- We use the same training and validation sets as before
- We specify
max_rounds=3
to run three rounds of optimization - The optimizer finds better instructions and demonstrations
- It returns a highly optimized version of our program
MIPROv2 (Multi-step Instruction Prompting Optimization) is more sophisticated:
- It bootstraps examples like BootstrapFewShot
- It also generates and tests different instructions for the model
- It explores combinations of instructions and examples
- It iteratively improves the program over multiple rounds
- It often achieves better performance than simpler optimizers
# Test the optimized system
response = mipro_qa("What is the capital of New Zealand?")
print(response.answer)
Code Explanation: This code tests the MIPROv2-optimized system:
- We call the optimized QA system with a new question
- We print the generated answer
- The answer should be even more accurate with this advanced optimization
BootstrapFineTune
# For smaller models that can be fine-tuned
from dspy.teleprompt import BootstrapFineTune
# Create a larger training set
training_examples = [
# Add many examples here
]
# Initialize the BootstrapFineTune optimizer
finetune_optimizer = BootstrapFineTune(
metric=accuracy_metric,
max_bootstrapped_demos=100
)
Code Explanation: This code prepares for fine-tuning optimization:
- We import
BootstrapFineTune
from DSPy’s teleprompt module - We would create a larger training set with many examples
- We create an optimizer with:
- Our accuracy metric
- A maximum of 100 bootstrapped demonstrations
The BootstrapFineTune approach:
- Goes beyond prompt optimization to actual model weight updates
- Generates examples automatically using bootstrapping
- Uses these examples to fine-tune the model weights
- Is particularly useful for smaller, locally-hosted models
# Compile with fine-tuning
# Note: This requires a model that supports fine-tuning
finetuned_qa = finetune_optimizer.compile(
program=SimpleQA(),
trainset=training_examples[:80],
valset=training_examples[80:100]
)
Code Explanation: This code performs the fine-tuning optimization:
- We compile a fine-tuned version of our
SimpleQA
program:- We use the first 80 examples as a training set
- We use the last 20 examples as a validation set
- The optimizer bootstraps additional examples
- It uses these to fine-tune the model weights
- It returns an improved program with a fine-tuned model
Note: Fine-tuning requires a model that supports it. Large hosted models like Gemini may not allow fine-tuning, but smaller local models often do.
# Test the fine-tuned system
response = finetuned_qa("What is the capital of South Africa?")
print(response.answer)
Code Explanation: This code tests the fine-tuned system:
- We call the fine-tuned QA system with a new question
- We print the generated answer
- The answer should be accurate thanks to the fine-tuned weights
Leveraging Gemini’s Thinking Capabilities
Gemini Flash 2.5’s thinking capabilities can be leveraged to improve complex reasoning tasks. Let’s modify our approach to use this feature:
# Create a module that benefits from thinking capabilities
class ThinkingMathSolver(dspy.Module):
def __init__(self, thinking_budget=2048):
super().__init__()
self.thinking_budget = thinking_budget
self.cot = dspy.ChainOfThought(MathProblem)
Code Explanation: This code creates a math solver that uses Gemini’s thinking capabilities:
- We define a
ThinkingMathSolver
module with:- A thinking budget parameter (default: 2048)
- A Chain of Thought component for solving math problems
- The thinking budget controls how much internal reasoning Gemini will do
def forward(self, question):
# Create a Gemini instance with thinking enabled
thinking_gemini = GeminiFlash(thinking_budget=self.thinking_budget)
# Use the thinking model for this specific task
with dspy.context(lm=thinking_gemini):
return self.cot(question=question)
Code Explanation:
This continues the ThinkingMathSolver
with the forward
method:
- We create a new
GeminiFlash
instance with thinking enabled - We use
dspy.context()
to temporarily override the default language model - Within this context, we run the Chain of Thought solver
- This allows us to use thinking for just this specific task
This approach gives you fine control over when to use Gemini’s thinking capabilities, allowing you to use it only for complex problems where it’s most valuable.
# Test with a complex reasoning problem
thinking_module = ThinkingMathSolver(thinking_budget=2048)
response = thinking_module("A store has a 30% discount on all items. After applying the discount, a customer pays $56 for a shirt. What was the original price of the shirt?")
print(response.rationale) # Show the reasoning process
print(response.answer) # Show the final answer
Code Explanation: This code tests our thinking-enabled solver:
- We create a
ThinkingMathSolver
with a budget of 2048 tokens - We call it with a discount calculation problem
- We print both the reasoning process and the final answer
Gemini’s thinking capabilities:
- Allow the model to reason internally before responding
- Improve accuracy on complex reasoning tasks
- Can be controlled with a thinking budget
- Have a cost-quality tradeoff (higher budgets cost more)
# Compare performance with different thinking budgets
for budget in [0, 512, 1024, 2048]:
solver = ThinkingMathSolver(thinking_budget=budget)
response = solver("If 5 workers can build 5 tables in 5 days, how many days would it take 10 workers to build 10 tables?")
print(f"Budget: {budget}")
print(f"Answer: {response.answer}")
print("-" * 50)
Code Explanation: This code experiments with different thinking budgets:
- We loop through four different budget levels (0, 512, 1024, 2048)
- For each, we create a solver with that budget
- We test it on a worker/table problem
- We print the budget and answer
- This helps find the right balance between quality and cost
Building Complex AI Applications
DSPy’s modular approach allows you to build sophisticated AI applications by combining different modules.
Multi-Agent Systems
# Define specialized agents
class ResearchAgent(dspy.Module):
def __init__(self, retriever):
super().__init__()
self.retrieve = dspy.Retrieve(retriever, k=5)
self.synthesize = dspy.ChainOfThought("question, context -> findings")
def forward(self, question):
passages = self.retrieve(question).passages
context = "\n\n".join(passages)
return self.synthesize(question=question, context=context)
Code Explanation: This code defines a research agent:
- We create a
ResearchAgent
module that:- Takes a retriever in its constructor
- Retrieves 5 passages for each question
- Uses Chain of Thought to synthesize findings from the context
- In the
forward
method:- We retrieve relevant passages
- We join them into a context
- We synthesize findings from the question and context
- We return the results
class WritingAgent(dspy.Module):
def __init__(self):
super().__init__()
self.generate = dspy.ChainOfThought("topic, outline -> article")
def forward(self, topic, outline):
return self.generate(topic=topic, outline=outline)
class OutlineAgent(dspy.Module):
def __init__(self):
super().__init__()
self.generate = dspy.ChainOfThought("topic, research -> outline")
def forward(self, topic, research):
return self.generate(topic=topic, research=research)
Code Explanation: This code defines two more specialized agents:
- A
WritingAgent
that:- Takes a topic and outline
- Generates a full article
- An
OutlineAgent
that:- Takes a topic and research findings
- Generates an outline for an article
Each agent focuses on a specific task and can be composed with others.
# Create a multi-agent system
class ContentCreationSystem(dspy.Module):
def __init__(self, retriever):
super().__init__()
self.research_agent = ResearchAgent(retriever)
self.outline_agent = OutlineAgent()
self.writing_agent = WritingAgent()
def forward(self, topic):
# Step 1: Research the topic
research_results = self.research_agent(topic)
# Step 2: Create an outline
outline = self.outline_agent(topic=topic, research=research_results.findings)
# Step 3: Write the article
article = self.writing_agent(topic=topic, outline=outline.outline)
return article
Code Explanation: This code creates a multi-agent system for content creation:
- We define a
ContentCreationSystem
that:- Takes a retriever in its constructor
- Creates three specialized agents
- In the
forward
method, we implement a workflow:- Step 1: Research the topic using the research agent
- Step 2: Create an outline using the outline agent and research results
- Step 3: Write an article using the writing agent and outline
- We return the final article
This approach is powerful because:
- It breaks down a complex task into manageable steps
- Each agent specializes in a specific sub-task
- The agents work together in a pipeline
- It mimics how humans collaborate on complex projects
# Test the multi-agent system
content_system = ContentCreationSystem(retriever)
response = content_system("The impact of artificial intelligence on healthcare")
print(response.article)
Code Explanation: This code tests our multi-agent system:
- We create a
ContentCreationSystem
with our retriever - We call it with a topic about AI in healthcare
- We print the generated article
Information Extraction
from pydantic import BaseModel
from typing import List, Dict, Optional
# Define structured output schema
class Company(BaseModel):
name: str
industry: str
founded_year: Optional[int]
headquarters: Optional[str]
key_products: Optional[List[str]]
revenue: Optional[str]
Code Explanation: This code defines a structured data model for company information:
- We import necessary modules from Pydantic and Python’s typing system
- We define a
Company
model with:- Required fields: name, industry
- Optional fields: founded_year, headquarters, key_products, revenue
- Appropriate types for each field
# Define an information extraction system
class CompanyProfileExtractor(dspy.Module):
def __init__(self, retriever):
super().__init__()
self.retrieve = dspy.Retrieve(retriever, k=10)
self.extract = dspy.TypedChainOfThought("input:StructuredRAGInput -> output:Company")
def forward(self, company_name):
# Retrieve information about the company
passages = self.retrieve(f"information about {company_name}").passages
context = "\n\n".join(passages)
# Extract structured information
input_data = {"question": f"Extract information about {company_name}", "context": context}
profile = self.extract(input=input_data)
return profile
Code Explanation: This code creates a system for extracting company profiles:
- We define a
CompanyProfileExtractor
module that:- Takes a retriever
- Retrieves 10 passages about the company
- Uses TypedChainOfThought to extract structured information
- In the
forward
method:- We retrieve passages about the company
- We join them into a context
- We create an input dictionary with question and context
- We extract a structured company profile
- We return the profile
This approach is useful for:
- Automatically populating databases with structured information
- Creating company profiles from unstructured text
- Extracting specific fields with proper typing
- Ensuring data consistency across extractions
# Test the information extraction system
extractor = CompanyProfileExtractor(retriever)
company_profile = extractor("Tesla")
print(f"Company: {company_profile.name}")
print(f"Industry: {company_profile.industry}")
print(f"Founded: {company_profile.founded_year}")
print(f"Headquarters: {company_profile.headquarters}")
if company_profile.key_products:
print(f"Key Products: {', '.join(company_profile.key_products)}")
print(f"Revenue: {company_profile.revenue}")
Code Explanation: This code tests our information extraction system:
- We create a
CompanyProfileExtractor
with our retriever - We call it with “Tesla” as the company name
- We print each field of the extracted profile:
- Company name
- Industry
- Founded year
- Headquarters
- Key products (if available)
- Revenue
Decision Making Systems
# Define a decision making system
class InvestmentAdvisor(dspy.Module):
def __init__(self, retriever):
super().__init__()
self.retrieve = dspy.Retrieve(retriever, k=7)
self.analyze = dspy.ChainOfThought("stock, context -> analysis")
self.decide = dspy.ChainOfThought("analysis, risk_tolerance -> recommendation")
def forward(self, stock_symbol, risk_tolerance):
# Retrieve information about the stock
passages = self.retrieve(f"{stock_symbol} stock analysis financial data").passages
context = "\n\n".join(passages)
# Analyze the stock
analysis = self.analyze(stock=stock_symbol, context=context)
# Make a recommendation
recommendation = self.decide(
analysis=analysis.analysis,
risk_tolerance=risk_tolerance
)
return recommendation
Code Explanation: This code creates an investment advisor system:
- We define an
InvestmentAdvisor
module that:- Takes a retriever
- Retrieves 7 passages about the stock
- Has two Chain of Thought components:
- One for analysis
- One for decision making
- In the
forward
method:- We retrieve passages about the stock
- We join them into a context
- We analyze the stock based on the context
- We make a recommendation based on the analysis and risk tolerance
- We return the recommendation
This multi-step approach:
- Separates analysis from decision making
- Takes risk tolerance into account
- Uses retrieved information for up-to-date analysis
- Provides transparent reasoning for recommendations
# Test the decision making system
advisor = InvestmentAdvisor(retriever)
recommendation = advisor(stock_symbol="AAPL", risk_tolerance="moderate")
print(recommendation.rationale) # The reasoning process
print(recommendation.recommendation) # The final recommendation
Code Explanation: This code tests our investment advisor:
- We create an
InvestmentAdvisor
with our retriever - We call it with:
- “AAPL” as the stock symbol
- “moderate” as the risk tolerance
- We print both the reasoning process and the final recommendation
Conclusion and Next Steps
In this comprehensive tutorial, we’ve explored how to combine DSPy’s declarative programming approach with Gemini Flash 2.5’s thinking capabilities to build powerful AI applications. We’ve covered:
- Setting up DSPy with Gemini Flash 2.5
- Core DSPy concepts: Signatures, Modules, and Optimizers
- Various DSPy modules like Predict, ChainOfThought, and ReAct
- Building RAG systems with single and multi-hop reasoning
- Using DSPy Assertions to enforce constraints
- Optimizing DSPy programs with different optimizers
- Leveraging Gemini’s thinking feature for complex reasoning
- Building complex applications like multi-agent systems
This combination provides a powerful framework for building AI applications that are more reliable, maintainable, and efficient than traditional approaches based on prompt engineering.
Next Steps
- Explore more advanced DSPy modules and optimizers
- Experiment with different thinking budgets in Gemini Flash 2.5
- Integrate with other data sources and vector databases
- Build domain-specific applications for your use cases
- Contribute to the DSPy community to help improve the framework
By programming language models rather than prompting them, you can create more robust and predictable AI systems that improve over time.
Resources
Happy coding with DSPy and Gemini Flash 2.5!
Saptak Sen
If you enjoyed this post, you should check out my book: Starting with Spark.