Building Powerful AI Applications by Integrating OpenAI Responses API with Google Gemini
We’ve entered what industry leaders call the “agentic era,” in which AI systems not only understand and generate content but also take action on behalf of users. OpenAI’s new Responses API and Google’s Gemini models, now accessible through the OpenAI Library, represent significant milestones in this evolution. This post explores how to integrate these technologies to create versatile applications that leverage the unique advantages of each platform.
Understanding the Platforms
Before diving into the integration, let’s understand what each platform brings to the table:
OpenAI Responses API
OpenAI introduced the Responses API on March 11, 2025, as their new API primitive for leveraging built-in tools to build agents. It combines the simplicity of Chat Completions with the tool-use capabilities of the Assistants API, providing:
- Built-in tools: Web search, file search, and computer use (with code interpreter coming soon)
- Unified item-based design: For simpler polymorphism
- Intuitive streaming events: For real-time interactions
- SDK helpers: Like response.output_text to easily access the model’s text output
- Event-driven architecture: Clearly emits semantic events detailing precisely what changed
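As a minimal illustration of the new primitive, here is a hedged sketch of a Responses API call that uses the response.output_text helper. The model name is illustrative, and the snippet assumes OPENAI_API_KEY is set in your environment:
import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal Responses API call; output_text collects the model's text output
response = client.responses.create(
    model="gpt-4o",
    input="Summarize the agentic era in one sentence."
)
print(response.output_text)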
According to OpenAI, they’re working to achieve full feature parity between the Assistants API and the Responses API, including support for Assistant-like and Thread-like objects. Once complete, they plan to formally announce the deprecation of the Assistants API, with a target sunset date in mid-2026.
Google Gemini
Google’s Gemini models, with the latest iteration being Gemini 2.0 introduced in December 2024, offer:
- Multimodal capabilities: Process and generate text, images, audio, and video
- Native tool use: Built specifically for the agentic era
- Performance strengths: Includes variants optimized for different use cases (Gemini 2.0 Flash for rapid responses, Gemini 2.0 Pro for advanced reasoning)
- Deep integration with Google Workspace: Can summarize content, manage tasks, create events, and more when connected
- Compatibility with OpenAI libraries: Google has made Gemini accessible through the OpenAI Library and REST API
- Hardware optimization: Built on custom hardware such as Trillium, Google’s sixth-generation TPU
The Integration Approach
Our integration strategy leverages the architectural strengths of OpenAI’s Responses API while incorporating Gemini’s models for specific tasks. Here’s the high-level approach:
- Primary conversation management: Use OpenAI’s Responses API for managing interactions and overall conversation flow
- Task delegation: Identify specialized tasks that benefit from Gemini’s capabilities
- API orchestration: Create a middleware layer that routes requests to the appropriate service
- Response synthesis: Combine outputs from both systems into coherent responses
Setting Up the Environment
First, let’s set up our development environment with the necessary libraries:
# Create and activate a virtual environment
python -m venv aienv
source aienv/bin/activate # On Windows: aienv\Scripts\activate
# Install required packages
pip install openai google-generativeai python-dotenv flask
Create a .env file to store your API keys:
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
Basic Integration Architecture
Here’s our core integration architecture using the Responses API with Gemini:
import os
import json
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure OpenAI client
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Configure Gemini client using the OpenAI compatibility layer
gemini_client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
# Define a function to use Gemini for specific tasks
def use_gemini(query, task_type):
    """Use Google Gemini for specialized tasks"""
    # Select the appropriate Gemini model based on task type
    if task_type in ("image_analysis", "creative_generation"):
        model = "gemini-2.0-pro"
    elif task_type in ("mathematical_reasoning", "quick_response"):
        model = "gemini-2.0-flash"
    else:
        model = "gemini-1.5-flash"  # Default to a balanced model

    # Call Gemini through the OpenAI-compatible endpoint
    response = gemini_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": f"You are a specialized assistant for {task_type} tasks."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content
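A quick sanity check of this helper might look like the following; the prompt and task type are just illustrative, and the call requires a valid GOOGLE_API_KEY:
# Illustrative usage of use_gemini; requires a valid GOOGLE_API_KEY
print(use_gemini("What is 17 * 23? Show your reasoning.", "mathematical_reasoning"))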
Creating the Integration Layer
Now, let’s write a single entry point that produces a unified response, routing each query to the platform best suited for it:
def create_hybrid_response(user_query, response_mode="auto"):
    """
    Create a response using either OpenAI or Gemini based on the query content.

    Args:
        user_query: The user's input query
        response_mode: 'auto' (route based on query), 'openai', or 'gemini'
    """
    if response_mode == "auto":
        # Simple routing logic; could be enhanced with more sophisticated analysis
        if "image" in user_query.lower() or "picture" in user_query.lower():
            response_mode = "gemini"  # Gemini has strong image capabilities
        elif "math" in user_query.lower() or "calculate" in user_query.lower():
            response_mode = "gemini"  # Gemini is strong at mathematical reasoning
        else:
            response_mode = "openai"  # Default to OpenAI's Responses API

    if response_mode == "gemini":
        # Use Gemini for the response
        response = use_gemini(user_query, determine_task_type(user_query))
        return {"source": "gemini", "response": response}
    else:
        # Use OpenAI's Responses API with its built-in web search tool.
        # (Computer use requires the dedicated computer-use-preview model;
        # see the dedicated section later in this post.)
        response = openai_client.responses.create(
            model="gpt-4o",
            instructions="You are a helpful assistant.",
            input=user_query,
            tools=[{"type": "web_search_preview"}]  # Enable web search for up-to-date information
        )
        return {"source": "openai", "response": response.output_text}

def determine_task_type(query):
    """Determine the best task type for Gemini based on query content"""
    if "image" in query.lower() or "picture" in query.lower():
        return "image_analysis"
    elif "math" in query.lower() or "calculate" in query.lower():
        return "mathematical_reasoning"
    elif "code" in query.lower() or "program" in query.lower():
        return "code_generation"
    elif "fast" in query.lower() or "quick" in query.lower():
        return "quick_response"
    else:
        return "general"
Building the Full Integration
Let’s put everything together in a Flask application that can leverage both platforms:
from flask import Flask, Response, request, jsonify
import json

app = Flask(__name__)

# Store conversation history for context (in-memory; use a real store in production)
conversation_history = {}

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    user_input = data.get('message')
    session_id = data.get('session_id', 'default')
    response_mode = data.get('mode', 'auto')  # 'auto', 'openai', or 'gemini'

    # Initialize conversation history if needed
    if session_id not in conversation_history:
        conversation_history[session_id] = []

    # Add user message to history
    conversation_history[session_id].append({
        "role": "user",
        "content": user_input
    })

    # Handle file uploads if present
    files = []
    if 'files' in data:
        # Process file uploads - simplified for this example
        file_references = data.get('files', [])
        files = [{"type": file["type"], "content": file["content"]} for file in file_references]

    # Enrich the query with file information if applicable
    enriched_query = user_input
    if files:
        enriched_query += "\n[Note: The user has uploaded files that might be relevant to this query.]"

    # Get response
    if data.get('web_search'):
        # If web search is explicitly requested, use OpenAI's Responses API
        response_result = openai_client.responses.create(
            model="gpt-4o",
            input=conversation_history[session_id],
            tools=[{"type": "web_search_preview"}]
        )
        response_text = response_result.output_text
        source = "openai_web_search"
    else:
        # Otherwise use our hybrid approach
        result = create_hybrid_response(enriched_query, response_mode)
        response_text = result["response"]
        source = result["source"]

    # Add assistant response to history
    conversation_history[session_id].append({
        "role": "assistant",
        "content": response_text
    })

    # Limit history size to prevent token overflow
    if len(conversation_history[session_id]) > 20:
        # Keep only the 20 most recent messages
        conversation_history[session_id] = conversation_history[session_id][-20:]

    return jsonify({
        "response": response_text,
        "source": source,
        "session_id": session_id
    })

@app.route('/stream-chat', methods=['POST'])
def stream_chat():
    """Endpoint for streaming responses"""
    data = request.json
    user_input = data.get('message')
    session_id = data.get('session_id', 'default')

    # Initialize conversation history if needed
    if session_id not in conversation_history:
        conversation_history[session_id] = []

    # Add user message to history
    conversation_history[session_id].append({
        "role": "user",
        "content": user_input
    })

    # Use the streaming Responses API from OpenAI
    def generate():
        # stream=True yields semantic events rather than raw chunks
        stream = openai_client.responses.create(
            model="gpt-4o",
            input=conversation_history[session_id],
            stream=True
        )
        full_response = ""
        for event in stream:
            # Text deltas arrive as response.output_text.delta events
            if event.type == "response.output_text.delta":
                content = event.delta
                full_response += content
                yield f"data: {json.dumps({'content': content, 'done': False})}\n\n"

        # Add the complete response to history
        conversation_history[session_id].append({
            "role": "assistant",
            "content": full_response
        })
        yield f"data: {json.dumps({'content': '', 'done': True})}\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)
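To exercise the /chat endpoint locally, a quick test might look like the sketch below. It assumes the server is running on Flask’s default port and that the requests package is installed (pip install requests); the message is illustrative:
import requests

# Hypothetical local test of the /chat endpoint
reply = requests.post(
    "http://127.0.0.1:5000/chat",
    json={"message": "Calculate 15% of 2,480", "session_id": "demo", "mode": "auto"}
)
result = reply.json()
print(result["source"], result["response"])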
Using the OpenAI Compatibility Layer for Gemini
Google now offers direct compatibility with OpenAI libraries, which makes integration even simpler. This compatibility layer lets developers access Gemini models using the familiar OpenAI interface with just three small changes to their existing code: the API key, the base URL, and the model name:
from openai import OpenAI

# Initialize the client with Gemini configuration
client = OpenAI(
    api_key="gemini_api_key",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Send a request to Gemini using OpenAI's client
response = client.chat.completions.create(
    model="gemini-2.0-flash",  # Use Gemini models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)
print(response.choices[0].message)
For Node.js developers, the implementation is equally straightforward:
import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "gemini_api_key",
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
    model: "gemini-2.0-flash",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum computing in simple terms" },
    ],
});
console.log(response.choices[0].message);
The integration also supports streaming responses, which is crucial for creating responsive user interfaces:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta)
This compatibility layer makes it much easier to experiment with Gemini models without having to learn an entirely new API structure.
Practical Applications
Let’s explore some practical applications of this integration that leverage the built-in capabilities of both platforms:
1. Enhanced Web Search with OpenAI’s Web Search API
One of the built-in tools in OpenAI’s Responses API is web search, which allows developers to create applications that can search the web and present findings with proper citations. This is valuable for applications that need to provide up-to-date information.
def web_search_with_citations(query):
    """Perform a web search with citations using OpenAI's Responses API"""
    response = openai_client.responses.create(
        model="gpt-4o",
        instructions="You are a research assistant. Search the web for accurate information and include citations.",
        input=query,
        tools=[{"type": "web_search_preview"}]
    )
    # Extract the answer (citation annotations are attached to the output message)
    answer = response.output_text
    return answer
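For example (the query is illustrative and requires web search access on your account):
# Illustrative query; the model searches the web and cites its sources
print(web_search_with_citations("What were the major AI announcements this week?"))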
This capability is particularly valuable for research assistants, content creation tools, and educational applications that need to provide accurate and verifiable information from the web.
2. Computer Use API: Enabling AI to Navigate Interfaces
Another powerful tool introduced with the Responses API is the computer use API, which allows AI to navigate browsers, interact with web interfaces, and perform tasks:
def use_computer_interface(task_description):
    """Use the computer use tool to perform tasks on web interfaces"""
    response = openai_client.responses.create(
        model="computer-use-preview",  # Computer use requires this dedicated model
        instructions="You are an automation assistant that can use a computer to perform tasks.",
        input=task_description,
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "browser"
        }],
        truncation="auto"
    )
    # The output contains computer_call items describing the actions to execute
    return response.output
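In practice, computer use is a loop: your code executes each returned computer_call action (click, type, screenshot) in a browser you control and feeds the results back to the model. A single-shot call, just to inspect the proposed actions, might look like this sketch (the task string is made up):
# Illustrative single-shot call; a real agent loops over the returned actions
actions = use_computer_interface("Open example.com and find the contact email")
for item in actions:
    print(item.type)  # e.g. reasoning or computer_call items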
For developers, this means being able to create AI agents that can interact with existing web applications and interfaces without requiring significant modifications to those applications.
3. Image Processing with Gemini
Gemini 2.0 features impressive image processing capabilities that allow it to understand, analyze, and manipulate images based on natural language instructions:
def process_image(image_data, instruction):
    """Use Gemini for image understanding via the OpenAI compatibility layer"""
    # Create a multimodal message (OpenAI-style image_url content part)
    prompt = {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url", "image_url": {"url": image_data}}
        ]
    }

    # Use Gemini for image processing
    gemini_response = gemini_client.chat.completions.create(
        model="gemini-2.0-pro",
        messages=[prompt]
    )
    return gemini_response.choices[0].message.content
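Calling it might look like the sketch below. The URL is a placeholder; depending on the endpoint’s support for remote URLs, a base64 data URL ("data:image/jpeg;base64,...") may be the safer choice:
# Placeholder image URL; base64 data URLs also work as image_data
caption = process_image(
    "https://example.com/chart.png",
    "Describe the trend shown in this chart."
)
print(caption)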
4. Integration with Google Workspace
A particularly valuable aspect of Gemini is its deep integration with Google Workspace apps and services. When connected to Google Workspace, Gemini can:
- Summarize, get quick answers, and find information from apps like Gmail, Docs, and Drive
- Add and retrieve tasks from Google Tasks
- Create and retrieve notes and lists from Google Keep
- Create and manage events in Google Calendar
For developers building applications that integrate with Google Workspace, the ability to leverage Gemini’s understanding of these services could provide significant value.
Performance Comparison: When to Choose Which Platform
When deciding between OpenAI’s models and Google Gemini for your applications, or considering using both through the integration, several factors should be taken into account:
- Gemini 2.0 Flash is optimized for low latency, making it a strong fit for real-time interactions in consumer-facing apps. It also benefits from Google’s robust infrastructure, ensuring consistent performance across devices.
- OpenAI’s models excel in enterprise-level applications where depth of analysis and precision outweigh the need for ultra-fast responses. This makes them ideal for specialized applications where the quality of reasoning is paramount.
The choice of platform also depends on your existing technology stack and ecosystem:
- If you’re heavily invested in Google Cloud or use Google Workspace extensively, Gemini offers seamless integration with these services.
- If you’ve already built applications using OpenAI’s libraries, the ability to access Gemini models through the familiar OpenAI interface can save development time and resources.
Future Developments and Roadmap
The integration between OpenAI’s Responses API and Google Gemini represents just the beginning of what promises to be an exciting evolution in AI capabilities:
- OpenAI is working to achieve full feature parity between the Assistants API and the Responses API, including support for Assistant-like and Thread-like objects and the Code Interpreter tool. Once that parity is reached, OpenAI plans to formally announce the deprecation of the Assistants API, with a target sunset date in mid-2026.
- Google has indicated plans for additional compatibility between Gemini and the OpenAI libraries in the coming months. The compatibility layer initially supports the Chat Completions API and Embeddings API, and we can expect it to expand to other APIs and features.
- Both platforms continue to evolve their multimodal capabilities, with Gemini 2.0 already featuring advanced image processing and manipulation. As these capabilities mature, we can expect even more sophisticated integration of text, images, audio, and video.
Conclusion
The integration between OpenAI’s Responses API and Google Gemini marks a significant milestone in the evolution of AI platforms, providing developers with unprecedented flexibility and power in building intelligent applications for the agentic era. By allowing access to Gemini models through the familiar OpenAI interface, this integration reduces the learning curve and development time for those looking to leverage the strengths of both platforms.
As we’ve explored throughout this guide, both platforms bring unique strengths to the table. OpenAI’s Responses API provides a flexible foundation for building agentic applications with built-in tools like web search and computer use. Gemini offers impressive multimodal capabilities, particularly in image processing and manipulation, and deep integration with Google’s ecosystem of services.
For developers looking to stay at the cutting edge of AI technology, experimenting with both platforms through this integration will be essential for building the next generation of intelligent applications.
Resources
- OpenAI Responses API Documentation
- Google Gemini API Documentation
- Google’s OpenAI Compatibility Layer
- Computer Use API in OpenAI
- Gemini Multimodal Capabilities
Happy building!
Saptak Sen
If you enjoyed this post, you should check out my book: Starting with Spark.