Google Gemini API & AI Studio – Developer Guide
📌 Last Updated: June 2026. This post covers the Gemini model family (2.5 Pro, 2.5 Flash, Flash-Lite, Nano), Google AI Studio vs Vertex AI Studio, Gemini API capabilities, pricing tiers, rate limits, safety settings, Gemini Code Assist, and comparison with OpenAI GPT and Anthropic Claude.
- Google Gemini is Google’s family of multimodal AI models that can process text, images, video, audio, and code.
- Gemini models are available through the Gemini Developer API (via Google AI Studio) and through Google Cloud’s Vertex AI (now Gemini Enterprise Agent Platform).
- Gemini 2.5 Pro and 2.5 Flash became generally available (GA) in June 2025, providing production-ready stability and scalability.
- The model family spans from Gemini 2.5 Pro (most capable) to Gemini Nano (on-device), covering cloud API, enterprise, and edge use cases.
- Gemini supports a 1M+ token context window, the largest among frontier models, enabling processing of entire codebases, long documents, and hours of video in a single prompt.
Gemini Model Family
- Gemini 2.5 Pro – Google’s most capable model for complex reasoning, coding, and multimodal tasks.
- 1M token context window (input), up to 66K output tokens
- Excels at coding, mathematical reasoning, scientific analysis, and multi-step problem solving
- Supports “thinking” mode with configurable thinking budgets for chain-of-thought reasoning
- Natively multimodal – processes text, images, audio, video, and PDFs
- Supports function calling, structured output (JSON mode), grounding with Google Search, and code execution
- GA since June 2025; model ID:
gemini-2.5-pro
- Gemini 2.5 Flash – Hybrid reasoning model optimized for speed and cost-efficiency.
- 1M token context window with thinking capabilities (first Flash model with thinking)
- Configurable thinking budgets – control reasoning depth vs latency tradeoff
- Excellent for production workloads needing fast responses at lower cost
- Supports all Pro capabilities: multimodal input, function calling, grounding, JSON mode
- ~4x cheaper than Pro for input tokens, ~4x cheaper for output tokens
- GA since June 2025; model ID:
gemini-2.5-flash
- Gemini 2.5 Flash-Lite – Most cost-efficient cloud model for high-volume tasks.
- 1M token context window
- Optimized for high-throughput, cost-sensitive workloads: classification, translation, simple data processing
- Pricing starts at $0.10/1M input tokens and $0.40/1M output tokens
- Supports grounding with Google Search and Google Maps
- Model ID:
gemini-2.5-flash-lite
- Gemini Nano – On-device model for Android and Chrome.
- Runs natively on device hardware (NPU/GPU) without cloud connectivity
- Available on Pixel 8 Pro, Pixel 9/10 series, Samsung Galaxy S24+ and later
- Supports summarization, smart reply, proofreading, rewriting, and image description
- Available in Chrome via the Prompt API (downloaded automatically with browser updates)
- Privacy-preserving – all processing stays on device
- Accessible via ML Kit GenAI APIs on Android and AICore
- Supports hybrid inference – dynamically switches between on-device Nano and cloud-hosted Gemini models
Google AI Studio vs Vertex AI Studio
- Google AI Studio (aistudio.google.com) – Free, web-based IDE for prototyping with Gemini.
- Quick experimentation with prompts, no Google Cloud account required
- Get an API key instantly for development
- Supports prompt testing, side-by-side model comparison, and code export
- Build mode for vibe-coding full-stack applications directly in the browser
- One-click deployment to Google Cloud Run (up to 2 apps free via Starter Tier)
- Free tier available with generous rate limits
- Content on free tier may be used to improve Google products
- Best for: rapid prototyping, learning, hackathons, individual developers
- Vertex AI Studio / Gemini Enterprise Agent Platform (Google Cloud Console) – Enterprise-grade AI platform.
- Requires Google Cloud project with billing enabled
- Full IAM, VPC, audit logging, and enterprise security controls
- Model fine-tuning (SFT), RAG Engine, model evaluation, and ML pipelines
- Access to Model Garden with 200+ models (not just Gemini)
- Provisioned throughput for guaranteed capacity
- Data residency and compliance (HIPAA, SOC 2, FedRAMP)
- Agent Builder for no-code conversational agent development
- Content is never used to improve Google products
- Best for: production enterprise deployments, regulated industries, multi-model workflows
💡 Certification Tip: If a question describes quick prototyping with no GCP account, the answer is Google AI Studio. If it mentions IAM, VPC, fine-tuning, RAG, or ML pipelines, the answer is Vertex AI / Gemini Enterprise Agent Platform.
Gemini API Capabilities
Multimodal Input & Output
- Text – Natural language understanding, generation, summarization, translation, and Q&A
- Images – Image understanding (describe, analyze, OCR) and native image generation (Gemini 2.5 Flash Image)
- Video – Process and understand video content up to hours in length; video generation via Veo models
- Audio – Audio understanding, speech-to-text, native audio output, and text-to-speech (TTS)
- Code – Code generation, debugging, explanation, refactoring across 20+ programming languages
- Documents/PDFs – Process entire PDF documents natively with layout understanding
Function Calling
- Connect Gemini to external tools, APIs, and databases
- Model determines when to call a function and provides structured parameters
- Supports parallel function calling (multiple functions in one turn)
- Automatic function calling mode available in SDKs
- Gemini 3+ models generate unique IDs for each function call for tracing
- Works with both Google AI Studio and Vertex AI endpoints
Grounding with Google Search
- Connects Gemini to real-time, publicly available web content
- Provides accurate, up-to-date answers with cited verifiable sources beyond model’s training cutoff
- Returns grounding metadata with source URLs and support chunks
- Supports dynamic retrieval – only charges when grounding actually contributes to response
- Works with all available languages
- Rate limits: Free tier gets 500 RPD (requests per day); Paid tier gets 1,500 RPD free then $35/1,000 grounded prompts
- Limit of 1M queries per day (contact support for higher)
- Respects robots.txt Google-Extended directives from web publishers
Structured Output (JSON Mode)
- Force Gemini to output valid JSON conforming to a provided schema
- Specify response schema using JSON Schema format
- Guarantees parseable output for programmatic consumption
- Supports enums, nested objects, arrays, and optional fields
- Set
response_mime_type: "application/json"in generation config
System Instructions
- Set persistent behavioral guidelines that apply across all turns in a conversation
- Define persona, tone, output format, safety constraints, and domain expertise
- System instructions are separate from user messages and persist throughout the session
- Supports multi-part system instructions for complex configurations
Context Caching
- Cache large input contexts (documents, code repos) and reuse across multiple requests
- Reduces latency and cost for repeated context (up to 90% cheaper for cached tokens)
- Minimum cache size: 32,768 tokens
- Storage price: $1.00–$4.50 per 1M tokens per hour depending on model
- Available on paid tier only
Additional Capabilities
- Code Execution – Model can write and run Python code in a sandboxed environment to solve problems
- URL Context – Fetch and process content from URLs as part of the prompt
- Computer Use – Build browser control agents that automate tasks (Preview)
- File Search – Upload documents and perform semantic search across them
- Live API – Real-time, low-latency bidirectional streaming for voice/video applications
- Batch API – Process large volumes asynchronously at 50% cost reduction
- Thinking/Reasoning – Configurable chain-of-thought with thinking budgets and thought signatures
Context Window – 1M+ Tokens
- Gemini 2.5 Pro and Flash support a 1,000,000 token context window
- This is equivalent to approximately:
- ~750,000 words (longer than the entire Lord of the Rings trilogy)
- ~1.5 hours of video
- ~11 hours of audio
- ~30,000 lines of code
- Enables processing entire codebases, lengthy legal documents, research papers, or video content in a single prompt
- Output token limits: up to 66K for Pro, 65K for Flash
- Pricing tiers differ based on prompt length (≤200K tokens vs >200K tokens for Pro)
- Context caching available to reduce costs for repeated large-context queries
Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)
- Gemini models are available through Google Cloud’s enterprise AI platform (formerly Vertex AI, now Gemini Enterprise Agent Platform as of Cloud Next 2026)
- Provides enterprise-grade features beyond the Developer API:
- Fine-tuning (SFT) – Supervised fine-tuning on custom datasets
- RAG Engine – Built-in Retrieval-Augmented Generation with managed vector stores
- Model Evaluation – Automated evaluation pipelines with custom metrics
- Provisioned Throughput – Guaranteed capacity for latency-sensitive applications
- VPC Service Controls – Network isolation and data exfiltration prevention
- CMEK – Customer-managed encryption keys for data at rest
- Agent Builder – No-code platform for building conversational agents with grounding
- Same Gemini models as the Developer API but with enterprise SLAs and compliance certifications
- Supports HIPAA, SOC 1/2/3, ISO 27001, FedRAMP, PCI DSS compliance
- Data is never used to train or improve Google models
- Pricing may differ from Developer API; check Gemini Enterprise Agent Platform pricing page
Gemini Code Assist (IDE Integration)
- AI-powered coding assistant integrated directly into IDEs (VS Code and JetBrains IDEs)
- Key Capabilities:
- Inline code completions while typing
- Code generation from natural language prompts and comments
- Code transformation and refactoring via chat
- Smart actions (explain code, generate tests, fix bugs)
- Full-project context awareness with file/folder specification
- Agent mode for multi-step autonomous coding tasks (since Oct 2025)
- Custom commands and rules configuration
- Source citations for generated code
- Editions:
- Free tier – Available for individual developers via Google AI (Individual, Pro, Ultra tiers since June 2026)
- Standard – For teams, includes features beyond the IDE
- Enterprise – Large-context analysis (up to 1M tokens) across indexed repositories, integration with Google Cloud services, code customization
- Enterprise edition integrates with Google Cloud services: Cloud Build, Cloud Run, Cloud Logging
- Supports large-context analysis across entire repositories
Pricing Tiers (Free vs Paid)
Free Tier
- Available through Google AI Studio – no billing account required
- Access to Gemini 2.5 Pro, 2.5 Flash, and Flash-Lite models
- Free input and output tokens within rate limits
- Grounding with Google Search: up to 500 RPD (Flash/Flash-Lite)
- Content may be used to improve Google products
- Lower rate limits (5–15 RPM depending on model)
- No access to context caching, Batch API, or some advanced features
Paid Tier
- Link a billing account and prepay minimum $10 to upgrade
- Higher rate limits (150–300+ RPM at Tier 1)
- Access to context caching, Batch API (50% cost reduction), Flex and Priority inference
- Content NOT used to improve Google products (enterprise-grade data privacy)
- Tiered system based on cumulative spend: Tier 1 → Tier 2 → Tier 3 (postpay option)
Key Pricing (per 1M tokens, Standard tier)
- Gemini 2.5 Pro: $1.25 input (≤200K) / $2.50 (>200K) | $10.00 output (≤200K) / $15.00 (>200K)
- Gemini 2.5 Flash: $0.30 input (text/image/video) | $2.50 output
- Gemini 2.5 Flash-Lite: $0.10 input (text/image/video) | $0.40 output
- Grounding with Google Search: 1,500 RPD free, then $35/1,000 grounded prompts (2.5 models)
- Batch API: 50% discount on standard pricing across all models
- Context Caching: ~10% of input price per cached token read + storage fee per hour
Enterprise Tier (Gemini Enterprise Agent Platform)
- Custom pricing based on usage volume
- Dedicated support channels, advanced security, compliance certifications
- Provisioned throughput and volume-based discounts
- Contact Google Cloud sales for pricing
Rate Limits
- Rate limits are determined at the billing account level and vary by tier and model
- Free Tier:
- Gemini 2.5 Pro: ~5 RPM (requests per minute)
- Gemini 2.5 Flash: ~15 RPM
- Gemini 2.5 Flash-Lite: ~15 RPM
- Up to 250,000 tokens per minute
- Up to 1,000 requests per day
- Paid Tier 1:
- 150–300 RPM depending on model
- Higher token per minute limits
- Higher daily request limits
- Paid Tier 2–3: Progressively higher limits based on cumulative spend and account age
- Rate limit dimensions: RPM (requests/minute), TPM (tokens/minute), RPD (requests/day)
- Grounding with Google Search: 500 RPD (free) / 1,500 RPD free then pay-per-use (paid)
- Exceeding limits returns HTTP 429 (Resource Exhausted) – implement exponential backoff
Safety Settings
- Gemini API includes configurable content safety filters across multiple harm categories
- Harm Categories:
- HARM_CATEGORY_HARASSMENT – Harassment and bullying content
- HARM_CATEGORY_HATE_SPEECH – Hate speech targeting protected groups
- HARM_CATEGORY_SEXUALLY_EXPLICIT – Sexual content
- HARM_CATEGORY_DANGEROUS_CONTENT – Dangerous or harmful activities
- HARM_CATEGORY_CIVIC_INTEGRITY – Election/civic misinformation
- Blocking Thresholds:
- BLOCK_NONE – No blocking (may still have some restrictions)
- BLOCK_ONLY_HIGH – Block only high-probability unsafe content
- BLOCK_MEDIUM_AND_ABOVE – Block medium and high (default)
- BLOCK_LOW_AND_ABOVE – Most restrictive setting
- Safety ratings are provided for each response with probability levels: HIGH, MEDIUM, LOW, NEGLIGIBLE
- Filters are configurable (default off for paid tier) – can be adjusted per request
- System instructions can add additional safety guardrails on top of content filters
- Image generation has additional responsible AI filters (no violent extremism, no CSAM, no non-consensual imagery)
Comparison: Gemini vs OpenAI GPT vs Anthropic Claude
- All three platforms (Gemini, GPT, Claude) offer near-parity on general reasoning and coding benchmarks as of 2026
- Context Window:
- Gemini 2.5 Pro: 1,000,000 tokens (largest)
- Claude Opus 4: 200,000 tokens
- GPT-4o / GPT-5: 128,000–256,000 tokens
- Multimodal:
- Gemini: Native multimodal (text, image, video, audio, code, PDF) – strongest video/audio understanding
- GPT-4o: Text, image, audio (limited video)
- Claude: Text, image (no native audio/video)
- Coding:
- Claude Opus dominates SWE-bench Verified (~80-88% scores) – best for complex agentic coding
- Gemini 2.5 Pro: Strong coding with unique large-context advantage for full-repo understanding
- GPT-5: Strong general coding with excellent structured output
- Pricing (per 1M tokens, approximate):
- Gemini 2.5 Pro: $1.25 / $10.00 (input/output)
- Claude Opus 4: $5.00 / $25.00 (input/output)
- GPT-4o: $2.50 / $10.00 (input/output)
- Gemini 2.5 Flash: $0.30 / $2.50 – significantly cheaper than competitors’ mid-tier models
- Unique Strengths:
- Gemini: Largest context window, best multimodal (especially video/audio), native Google Search grounding, most cost-efficient at scale
- Claude: Best coding assistant, strongest safety/alignment, excellent for agentic multi-step tasks
- GPT: Strongest general reasoning and math, best ecosystem/plugin support, excellent structured output
- Integration:
- Gemini: Google Cloud, Android, Chrome, Google Workspace
- GPT: Azure OpenAI, Microsoft ecosystem
- Claude: AWS Bedrock, direct API
When to Use Which Gemini Model Size
- Use Gemini 2.5 Pro when:
- Complex multi-step reasoning is required
- Processing very large codebases or documents (full-repo analysis)
- Highest quality output is more important than cost/latency
- Advanced coding tasks: architecture decisions, complex refactoring, multi-file changes
- Scientific research, mathematical proofs, legal analysis
- Use Gemini 2.5 Flash when:
- Production applications needing balance of quality and speed
- Real-time user-facing applications (chatbots, assistants)
- Tasks requiring reasoning but with latency constraints
- General-purpose coding assistance, summarization, Q&A
- Budget-conscious applications that still need thinking capabilities
- Use Gemini 2.5 Flash-Lite when:
- High-volume, cost-sensitive workloads at scale
- Simple classification, entity extraction, sentiment analysis
- Translation and localization tasks
- Data processing pipelines with thousands of requests
- Tasks where speed and cost matter more than reasoning depth
- Use Gemini Nano when:
- On-device processing with no internet connectivity
- Privacy-sensitive applications (data never leaves device)
- Low-latency responses on mobile devices
- Smart replies, text summarization, image descriptions on Android
- Hybrid inference (Nano for simple queries, cloud for complex ones)
Google Gemini API & AI Studio – Practice Questions
- A developer needs to build a prototype chatbot using Gemini models with no Google Cloud account setup. Which service should they use?
- A. Vertex AI Studio
- B. Google AI Studio
- C. Cloud Run
- D. Firebase ML
Answer: B – Google AI Studio provides free access to Gemini models for prototyping without requiring a GCP account.
- Which Gemini model offers the largest context window for processing entire codebases in a single prompt?
- A. Gemini Nano
- B. Gemini 2.5 Flash-Lite
- C. Gemini 2.5 Pro
- D. GPT-4o
Answer: C – Gemini 2.5 Pro supports a 1M token context window, the largest among frontier models.
- A company requires enterprise security controls, HIPAA compliance, and fine-tuning capabilities for their Gemini deployment. Which platform should they choose?
- A. Google AI Studio Free Tier
- B. Google AI Studio Paid Tier
- C. Gemini Enterprise Agent Platform (Vertex AI)
- D. Gemini Code Assist
Answer: C – Gemini Enterprise Agent Platform provides enterprise security, compliance, fine-tuning, and VPC controls.
- Which Gemini API feature allows the model to access real-time information beyond its training data cutoff?
- A. Function Calling
- B. Context Caching
- C. Grounding with Google Search
- D. System Instructions
Answer: C – Grounding with Google Search connects Gemini to real-time web content and provides cited sources.
- A mobile app developer needs AI summarization that works offline on Android devices with no data leaving the device. Which model is appropriate?
- A. Gemini 2.5 Flash via API
- B. Gemini 2.5 Pro via Vertex AI
- C. Gemini Nano via ML Kit GenAI APIs
- D. Gemini 2.5 Flash-Lite via Batch API
Answer: C – Gemini Nano runs on-device without cloud connectivity, providing privacy-preserving AI processing.
- Which configuration ensures Gemini API always returns valid JSON that conforms to a specific schema? (Select TWO)
- A. Set response_mime_type to “application/json”
- B. Include a JSON example in the prompt
- C. Provide a response_schema in the generation config
- D. Use grounding with Google Search
- E. Enable function calling
Answer: A, C – Setting response_mime_type to “application/json” and providing a response_schema guarantees structured JSON output.
- A startup wants to minimize API costs while processing 100,000 classification requests daily. Which combination offers the lowest cost?
- A. Gemini 2.5 Pro with Standard inference
- B. Gemini 2.5 Flash with Priority inference
- C. Gemini 2.5 Flash-Lite with Batch API
- D. Gemini 2.5 Flash with context caching
Answer: C – Flash-Lite ($0.10/$0.40) with Batch API (50% discount) gives the lowest cost for high-volume simple tasks.
- What is the primary difference between the Free and Paid tiers of the Gemini Developer API regarding data usage?
- A. Free tier has no rate limits; Paid tier has rate limits
- B. Free tier content may be used to improve Google products; Paid tier content is not used
- C. Free tier only supports text; Paid tier supports multimodal
- D. Free tier uses older models; Paid tier uses newer models
Answer: B – Free tier content may be used to improve Google products, while paid tier provides enterprise-grade data privacy.
- Which safety setting category in the Gemini API is used to filter content related to election misinformation?
- A. HARM_CATEGORY_HARASSMENT
- B. HARM_CATEGORY_HATE_SPEECH
- C. HARM_CATEGORY_CIVIC_INTEGRITY
- D. HARM_CATEGORY_DANGEROUS_CONTENT
Answer: C – HARM_CATEGORY_CIVIC_INTEGRITY covers election and civic misinformation content.
- A developer is hitting rate limits (HTTP 429) on the free tier of Gemini API. What are valid options to increase throughput? (Select TWO)
- A. Implement exponential backoff retry logic
- B. Upgrade to paid tier by linking a billing account
- C. Switch from Pro to Nano model
- D. Disable safety settings
- E. Use system instructions to request faster processing
Answer: A, B – Exponential backoff handles transient limits; upgrading to paid tier increases RPM from 5-15 to 150-300+.