Building Powerful AI Applications by Integrating OpenAI Responses API with Google Gemini
We’ve entered what industry leaders call the “agentic era,” in which AI systems not only understand and generate content but also take action on behalf of users. OpenAI’s new Responses API and Google’s Gemini models, now accessible through the OpenAI Library, represent significant milestones in this evolution. This post explores how to integrate these technologies to create versatile applications that leverage the unique advantages of each platform.
Understanding the Platforms
Before diving into the integration, let’s understand what each platform brings to the table:
OpenAI Responses API
OpenAI introduced the Responses API on March 11, 2025, as their new API primitive for leveraging built-in tools to build agents. It combines the simplicity of Chat Completions with the tool-use capabilities of the Assistants API, providing:
- Built-in tools: Web search, file search, and computer use (with code interpreter coming soon)
- Unified item-based design: For simpler polymorphism
- Intuitive streaming events: For real-time interactions
- SDK helpers: Like response.output_text to easily access the model’s text output
- Event-driven architecture: Clearly emits semantic events detailing precisely what changed
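As a minimal illustration of the new primitive, here is a hedged sketch of a Responses API call that uses the response.output_text helper. The model name is illustrative, and the snippet assumes OPENAI_API_KEY is set in your environment:
import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal Responses API call; output_text collects the model's text output
response = client.responses.create(
    model="gpt-4o",
    input="Summarize the agentic era in one sentence."
)
print(response.output_text)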
According to OpenAI, they’re working to achieve full feature parity between the Assistants API and the Responses API, including support for Assistant-like and Thread-like objects. Once complete, they plan to formally announce the deprecation of the Assistants API, with a target sunset date in mid-2026.
Google Gemini
Google’s Gemini models, with the latest iteration being Gemini 2.0 introduced in December 2024, offer:
- Multimodal capabilities: Process and generate text, images, audio, and video
- Native tool use: Built specifically for the agentic era
- Performance strengths: Includes variants optimized for different use cases (Gemini 2.0 Flash for rapid responses, Gemini 2.0 Pro for advanced reasoning)
- Deep integration with Google Workspace: Can summarize content, manage tasks, create events, and more when connected
- Compatibility with OpenAI libraries: Google has made Gemini accessible through the OpenAI Library and REST API
- Hardware optimization: Built on custom hardware such as Trillium, Google’s sixth-generation TPU
The Integration Approach
Our integration strategy leverages the architectural strengths of OpenAI’s Responses API while incorporating Gemini’s models for specific tasks. Here’s the high-level approach:
- Primary conversation management: Use OpenAI’s Responses API for managing interactions and overall conversation flow
- Task delegation: Identify specialized tasks that benefit from Gemini’s capabilities
- API orchestration: Create a middleware layer that routes requests to the appropriate service
- Response synthesis: Combine outputs from both systems into coherent responses
Setting Up the Environment
First, let’s set up our development environment with the necessary libraries:
# Create and activate a virtual environment
python -m venv aienv
source aienv/bin/activate # On Windows: aienv\Scripts\activate
# Install required packages
pip install openai google-generativeai python-dotenv flask
Create a .env file to store your API keys:
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
Basic Integration Architecture
Here’s our core integration architecture using the Responses API with Gemini:
import os
import json
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure OpenAI client
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Configure Gemini client using the OpenAI compatibility layer
gemini_client = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
# Define a function to use Gemini for specific tasks
def use_gemini(query, task_type):
    """Use Google Gemini for specialized tasks"""
    # Select the appropriate Gemini model based on task type
    if task_type in ("image_analysis", "creative_generation"):
        model = "gemini-2.0-pro"
    elif task_type in ("mathematical_reasoning", "quick_response"):
        model = "gemini-2.0-flash"
    else:
        model = "gemini-1.5-flash"  # Default to a balanced model

    # Call Gemini through the OpenAI-compatible endpoint
    response = gemini_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": f"You are a specialized assistant for {task_type} tasks."},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content
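A quick sanity check of this helper might look like the following; the prompt and task type are just illustrative, and the call requires a valid GOOGLE_API_KEY:
# Illustrative usage of use_gemini; requires a valid GOOGLE_API_KEY
print(use_gemini("What is 17 * 23? Show your reasoning.", "mathematical_reasoning"))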
Creating the Integration Layer
Now, let’s write a single entry point that produces a unified response, routing each query to the platform best suited for it:
def create_hybrid_response(user_query, response_mode="auto"):
    """
    Create a response using either OpenAI or Gemini based on the query content.

    Args:
        user_query: The user's input query
        response_mode: 'auto' (route based on query), 'openai', or 'gemini'
    """
    if response_mode == "auto":
        # Simple routing logic; could be enhanced with more sophisticated analysis
        if "image" in user_query.lower() or "picture" in user_query.lower():
            response_mode = "gemini"  # Gemini has strong image capabilities
        elif "math" in user_query.lower() or "calculate" in user_query.lower():
            response_mode = "gemini"  # Gemini is strong at mathematical reasoning
        else:
            response_mode = "openai"  # Default to OpenAI's Responses API

    if response_mode == "gemini":
        # Use Gemini for the response
        response = use_gemini(user_query, determine_task_type(user_query))
        return {"source": "gemini", "response": response}
    else:
        # Use OpenAI's Responses API with its built-in web search tool.
        # (Computer use requires the dedicated computer-use-preview model;
        # see the dedicated section later in this post.)
        response = openai_client.responses.create(
            model="gpt-4o",
            instructions="You are a helpful assistant.",
            input=user_query,
            tools=[{"type": "web_search_preview"}]  # Enable web search for up-to-date information
        )
        return {"source": "openai", "response": response.output_text}

def determine_task_type(query):
    """Determine the best task type for Gemini based on query content"""
    if "image" in query.lower() or "picture" in query.lower():
        return "image_analysis"
    elif "math" in query.lower() or "calculate" in query.lower():
        return "mathematical_reasoning"
    elif "code" in query.lower() or "program" in query.lower():
        return "code_generation"
    elif "fast" in query.lower() or "quick" in query.lower():
        return "quick_response"
    else:
        return "general"
Building the Full Integration
Let’s put everything together in a Flask application that can leverage both platforms:
from flask import Flask, Response, request, jsonify
import json

app = Flask(__name__)

# Store conversation history for context (in-memory; use a real store in production)
conversation_history = {}

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    user_input = data.get('message')
    session_id = data.get('session_id', 'default')
    response_mode = data.get('mode', 'auto')  # 'auto', 'openai', or 'gemini'

    # Initialize conversation history if needed
    if session_id not in conversation_history:
        conversation_history[session_id] = []

    # Add user message to history
    conversation_history[session_id].append({
        "role": "user",
        "content": user_input
    })

    # Handle file uploads if present
    files = []
    if 'files' in data:
        # Process file uploads - simplified for this example
        file_references = data.get('files', [])
        files = [{"type": file["type"], "content": file["content"]} for file in file_references]

    # Enrich the query with file information if applicable
    enriched_query = user_input
    if files:
        enriched_query += "\n[Note: The user has uploaded files that might be relevant to this query.]"

    # Get response
    if data.get('web_search'):
        # If web search is explicitly requested, use OpenAI's Responses API
        response_result = openai_client.responses.create(
            model="gpt-4o",
            input=conversation_history[session_id],
            tools=[{"type": "web_search_preview"}]
        )
        response_text = response_result.output_text
        source = "openai_web_search"
    else:
        # Otherwise use our hybrid approach
        result = create_hybrid_response(enriched_query, response_mode)
        response_text = result["response"]
        source = result["source"]

    # Add assistant response to history
    conversation_history[session_id].append({
        "role": "assistant",
        "content": response_text
    })

    # Limit history size to prevent token overflow
    if len(conversation_history[session_id]) > 20:
        # Keep only the 20 most recent messages
        conversation_history[session_id] = conversation_history[session_id][-20:]

    return jsonify({
        "response": response_text,
        "source": source,
        "session_id": session_id
    })

@app.route('/stream-chat', methods=['POST'])
def stream_chat():
    """Endpoint for streaming responses"""
    data = request.json
    user_input = data.get('message')
    session_id = data.get('session_id', 'default')

    # Initialize conversation history if needed
    if session_id not in conversation_history:
        conversation_history[session_id] = []

    # Add user message to history
    conversation_history[session_id].append({
        "role": "user",
        "content": user_input
    })

    # Use the streaming Responses API from OpenAI
    def generate():
        # stream=True yields semantic events rather than raw chunks
        stream = openai_client.responses.create(
            model="gpt-4o",
            input=conversation_history[session_id],
            stream=True
        )
        full_response = ""
        for event in stream:
            # Text deltas arrive as response.output_text.delta events
            if event.type == "response.output_text.delta":
                content = event.delta
                full_response += content
                yield f"data: {json.dumps({'content': content, 'done': False})}\n\n"

        # Add the complete response to history
        conversation_history[session_id].append({
            "role": "assistant",
            "content": full_response
        })
        yield f"data: {json.dumps({'content': '', 'done': True})}\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)
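To exercise the /chat endpoint locally, a quick test might look like the sketch below. It assumes the server is running on Flask’s default port and that the requests package is installed (pip install requests); the message is illustrative:
import requests

# Hypothetical local test of the /chat endpoint
reply = requests.post(
    "http://127.0.0.1:5000/chat",
    json={"message": "Calculate 15% of 2,480", "session_id": "demo", "mode": "auto"}
)
result = reply.json()
print(result["source"], result["response"])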
Using the OpenAI Compatibility Layer for Gemini
Google now offers direct compatibility with OpenAI libraries, which makes integration even simpler. This compatibility layer lets developers access Gemini models using the familiar OpenAI interface with just three small changes to their existing code: the API key, the base URL, and the model name:
from openai import OpenAI

# Initialize the client with Gemini configuration
client = OpenAI(
    api_key="gemini_api_key",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Send a request to Gemini using OpenAI's client
response = client.chat.completions.create(
    model="gemini-2.0-flash",  # Use Gemini models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)
print(response.choices[0].message)
For Node.js developers, the implementation is equally straightforward:
import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "gemini_api_key",
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
    model: "gemini-2.0-flash",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum computing in simple terms" },
    ],
});
console.log(response.choices[0].message);
The integration also supports streaming responses, which is crucial for creating responsive user interfaces:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta)
This compatibility layer makes it much easier to experiment with Gemini models without having to learn an entirely new API structure.
Practical Applications
Let’s explore some practical applications of this integration that leverage the built-in capabilities of both platforms:
1. Enhanced Web Search with OpenAI’s Web Search API
One of the built-in tools in OpenAI’s Responses API is web search, which allows developers to create applications that can search the web and present findings with proper citations. This is valuable for applications that need to provide up-to-date information.
def web_search_with_citations(query):
    """Perform a web search with citations using OpenAI's Responses API"""
    response = openai_client.responses.create(
        model="gpt-4o",
        instructions="You are a research assistant. Search the web for accurate information and include citations.",
        input=query,
        tools=[{"type": "web_search_preview"}]
    )
    # Extract the answer (citation annotations are attached to the output message)
    answer = response.output_text
    return answer
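For example (the query is illustrative and requires web search access on your account):
# Illustrative query; the model searches the web and cites its sources
print(web_search_with_citations("What were the major AI announcements this week?"))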
This capability is particularly valuable for research assistants, content creation tools, and educational applications that need to provide accurate and verifiable information from the web.
2. Computer Use API: Enabling AI to Navigate Interfaces
Another powerful tool introduced with the Responses API is the computer use API, which allows AI to navigate browsers, interact with web interfaces, and perform tasks:
def use_computer_interface(task_description):
    """Use the computer use tool to perform tasks on web interfaces"""
    response = openai_client.responses.create(
        model="computer-use-preview",  # Computer use requires this dedicated model
        instructions="You are an automation assistant that can use a computer to perform tasks.",
        input=task_description,
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1024,
            "display_height": 768,
            "environment": "browser"
        }],
        truncation="auto"
    )
    # The output contains computer_call items describing the actions to execute
    return response.output
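In practice, computer use is a loop: your code executes each returned computer_call action (click, type, screenshot) in a browser you control and feeds the results back to the model. A single-shot call, just to inspect the proposed actions, might look like this sketch (the task string is made up):
# Illustrative single-shot call; a real agent loops over the returned actions
actions = use_computer_interface("Open example.com and find the contact email")
for item in actions:
    print(item.type)  # e.g. reasoning or computer_call items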
For developers, this means being able to create AI agents that can interact with existing web applications and interfaces without requiring significant modifications to those applications.
3. Image Processing with Gemini
Gemini 2.0 features impressive image processing capabilities that allow it to understand, analyze, and manipulate images based on natural language instructions:
def process_image(image_data, instruction):
    """Use Gemini for image understanding via the OpenAI compatibility layer"""
    # Create a multimodal message (OpenAI-style image_url content part)
    prompt = {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url", "image_url": {"url": image_data}}
        ]
    }

    # Use Gemini for image processing
    gemini_response = gemini_client.chat.completions.create(
        model="gemini-2.0-pro",
        messages=[prompt]
    )
    return gemini_response.choices[0].message.content
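Calling it might look like the sketch below. The URL is a placeholder; depending on the endpoint’s support for remote URLs, a base64 data URL ("data:image/jpeg;base64,...") may be the safer choice:
# Placeholder image URL; base64 data URLs also work as image_data
caption = process_image(
    "https://example.com/chart.png",
    "Describe the trend shown in this chart."
)
print(caption)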
4. Integration with Google Workspace
A particularly valuable aspect of Gemini is its deep integration with Google Workspace apps and services. When connected to Google Workspace, Gemini can:
- Summarize, get quick answers, and find information from apps like Gmail, Docs, and Drive
- Add and retrieve tasks from Google Tasks
- Create and retrieve notes and lists from Google Keep
- Create and manage events in Google Calendar
For developers building applications that integrate with Google Workspace, the ability to leverage Gemini’s understanding of these services could provide significant value.
Performance Comparison: When to Choose Which Platform
When deciding between OpenAI’s models and Google Gemini for your applications, or considering using both through the integration, several factors should be taken into account:
- Gemini 2.0 Flash is optimized for low latency, making it a strong fit for real-time interactions in consumer-facing apps. It also benefits from Google’s robust infrastructure, ensuring consistent performance across devices.
- OpenAI’s models excel in enterprise-level applications where depth of analysis and precision outweigh the need for ultra-fast responses. This makes them ideal for specialized applications where the quality of reasoning is paramount.
The choice of platform also depends on your existing technology stack and ecosystem:
- If you’re heavily invested in Google Cloud or use Google Workspace extensively, Gemini offers seamless integration with these services.
- If you’ve already built applications using OpenAI’s libraries, the ability to access Gemini models through the familiar OpenAI interface can save development time and resources.
Future Developments and Roadmap
The integration between OpenAI’s Responses API and Google Gemini represents just the beginning of what promises to be an exciting evolution in AI capabilities:
- OpenAI is working to achieve full feature parity between the Assistants API and the Responses API, including support for Assistant-like and Thread-like objects and the Code Interpreter tool. Once that parity is reached, OpenAI plans to formally announce the deprecation of the Assistants API, with a target sunset date in mid-2026.
- Google has indicated plans for additional compatibility between Gemini and the OpenAI libraries in the coming months. The compatibility layer initially supports the Chat Completions API and Embeddings API, and we can expect it to expand to other APIs and features.
- Both platforms continue to evolve their multimodal capabilities, with Gemini 2.0 already featuring advanced image processing and manipulation. As these capabilities mature, we can expect even more sophisticated integration of text, images, audio, and video.
Conclusion
The integration between OpenAI’s Responses API and Google Gemini marks a significant milestone in the evolution of AI platforms, providing developers with unprecedented flexibility and power in building intelligent applications for the agentic era. By allowing access to Gemini models through the familiar OpenAI interface, this integration reduces the learning curve and development time for those looking to leverage the strengths of both platforms.
As we’ve explored throughout this guide, both platforms bring unique strengths to the table. OpenAI’s Responses API provides a flexible foundation for building agentic applications with built-in tools like web search and computer use. Gemini offers impressive multimodal capabilities, particularly in image processing and manipulation, and deep integration with Google’s ecosystem of services.
For developers looking to stay at the cutting edge of AI technology, experimenting with both platforms through this integration will be essential for building the next generation of intelligent applications.
Resources
- OpenAI Responses API Documentation
- Google Gemini API Documentation
- Google’s OpenAI Compatibility Layer
- Computer Use API in OpenAI
- Gemini Multimodal Capabilities
Happy building!
Saptak Sen
If you enjoyed this post, you should check out my book: Starting with Spark.