Table of Contents hide

Google Gemini API & AI Studio – Developer Guide

Gemini API Capabilities

Context Window – 1M+ Tokens

Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)

Gemini Code Assist (IDE Integration)

Pricing Tiers (Free vs Paid)

Rate Limits

Safety Settings

Comparison: Gemini vs OpenAI GPT vs Anthropic Claude

When to Use Which Gemini Model Size

Google Gemini API & AI Studio – Practice Questions

Google Gemini API & AI Studio – Developer Guide

📌 Last Updated: June 2026. This post covers the Gemini model family (2.5 Pro, 2.5 Flash, Flash-Lite, Nano), Google AI Studio vs Vertex AI Studio, Gemini API capabilities, pricing tiers, rate limits, safety settings, Gemini Code Assist, and comparison with OpenAI GPT and Anthropic Claude.

Google Gemini is Google’s family of multimodal AI models that can process text, images, video, audio, and code.
Gemini models are available through the Gemini Developer API (via Google AI Studio) and through Google Cloud’s Vertex AI (now Gemini Enterprise Agent Platform).

Gemini 2.5 Pro and 2.5 Flash became generally available (GA) in June 2025, providing production-ready stability and scalability.
The model family spans from Gemini 2.5 Pro (most capable) to Gemini Nano (on-device), covering cloud API, enterprise, and edge use cases.
Gemini supports a 1M+ token context window, the largest among frontier models, enabling processing of entire codebases, long documents, and hours of video in a single prompt.

Gemini Model Family

Gemini 2.5 Pro – Google’s most capable model for complex reasoning, coding, and multimodal tasks.
- 1M token context window (input), up to 66K output tokens
- Excels at coding, mathematical reasoning, scientific analysis, and multi-step problem solving
- Supports “thinking” mode with configurable thinking budgets for chain-of-thought reasoning
- Natively multimodal – processes text, images, audio, video, and PDFs
- Supports function calling, structured output (JSON mode), grounding with Google Search, and code execution
- GA since June 2025; model ID: gemini-2.5-pro

Gemini 2.5 Flash – Hybrid reasoning model optimized for speed and cost-efficiency.
- 1M token context window with thinking capabilities (first Flash model with thinking)
- Configurable thinking budgets – control reasoning depth vs latency tradeoff
- Excellent for production workloads needing fast responses at lower cost
- Supports all Pro capabilities: multimodal input, function calling, grounding, JSON mode
- ~4x cheaper than Pro for input tokens, ~4x cheaper for output tokens
- GA since June 2025; model ID: gemini-2.5-flash

Gemini 2.5 Flash-Lite – Most cost-efficient cloud model for high-volume tasks.
- 1M token context window
- Optimized for high-throughput, cost-sensitive workloads: classification, translation, simple data processing
- Pricing starts at $0.10/1M input tokens and $0.40/1M output tokens
- Supports grounding with Google Search and Google Maps
- Model ID: gemini-2.5-flash-lite

Gemini Nano – On-device model for Android and Chrome.
- Runs natively on device hardware (NPU/GPU) without cloud connectivity
- Available on Pixel 8 Pro, Pixel 9/10 series, Samsung Galaxy S24+ and later
- Supports summarization, smart reply, proofreading, rewriting, and image description
- Available in Chrome via the Prompt API (downloaded automatically with browser updates)
- Privacy-preserving – all processing stays on device
- Accessible via ML Kit GenAI APIs on Android and AICore
- Supports hybrid inference – dynamically switches between on-device Nano and cloud-hosted Gemini models

Google AI Studio vs Vertex AI Studio

Google AI Studio (aistudio.google.com) – Free, web-based IDE for prototyping with Gemini.
- Quick experimentation with prompts, no Google Cloud account required
- Get an API key instantly for development
- Supports prompt testing, side-by-side model comparison, and code export
- Build mode for vibe-coding full-stack applications directly in the browser
- One-click deployment to Google Cloud Run (up to 2 apps free via Starter Tier)
- Free tier available with generous rate limits
- Content on free tier may be used to improve Google products
- Best for: rapid prototyping, learning, hackathons, individual developers

Vertex AI Studio / Gemini Enterprise Agent Platform (Google Cloud Console) – Enterprise-grade AI platform.
- Requires Google Cloud project with billing enabled
- Full IAM, VPC, audit logging, and enterprise security controls
- Model fine-tuning (SFT), RAG Engine, model evaluation, and ML pipelines
- Access to Model Garden with 200+ models (not just Gemini)
- Provisioned throughput for guaranteed capacity
- Data residency and compliance (HIPAA, SOC 2, FedRAMP)
- Agent Builder for no-code conversational agent development
- Content is never used to improve Google products
- Best for: production enterprise deployments, regulated industries, multi-model workflows

💡 Certification Tip: If a question describes quick prototyping with no GCP account, the answer is Google AI Studio. If it mentions IAM, VPC, fine-tuning, RAG, or ML pipelines, the answer is Vertex AI / Gemini Enterprise Agent Platform.

Gemini API Capabilities

Multimodal Input & Output

Text – Natural language understanding, generation, summarization, translation, and Q&A
Images – Image understanding (describe, analyze, OCR) and native image generation (Gemini 2.5 Flash Image)

Video – Process and understand video content up to hours in length; video generation via Veo models
Audio – Audio understanding, speech-to-text, native audio output, and text-to-speech (TTS)
Code – Code generation, debugging, explanation, refactoring across 20+ programming languages

Documents/PDFs – Process entire PDF documents natively with layout understanding

Function Calling

Connect Gemini to external tools, APIs, and databases
Model determines when to call a function and provides structured parameters
Supports parallel function calling (multiple functions in one turn)
Automatic function calling mode available in SDKs
Gemini 3+ models generate unique IDs for each function call for tracing

Works with both Google AI Studio and Vertex AI endpoints

Grounding with Google Search

Connects Gemini to real-time, publicly available web content
Provides accurate, up-to-date answers with cited verifiable sources beyond model’s training cutoff
Returns grounding metadata with source URLs and support chunks
Supports dynamic retrieval – only charges when grounding actually contributes to response

Works with all available languages
Rate limits: Free tier gets 500 RPD (requests per day); Paid tier gets 1,500 RPD free then $35/1,000 grounded prompts
Limit of 1M queries per day (contact support for higher)
Respects robots.txt Google-Extended directives from web publishers

Structured Output (JSON Mode)

Force Gemini to output valid JSON conforming to a provided schema

Specify response schema using JSON Schema format
Guarantees parseable output for programmatic consumption
Supports enums, nested objects, arrays, and optional fields
Set response_mime_type: "application/json" in generation config

System Instructions

Set persistent behavioral guidelines that apply across all turns in a conversation
Define persona, tone, output format, safety constraints, and domain expertise
System instructions are separate from user messages and persist throughout the session

Supports multi-part system instructions for complex configurations

Context Caching

Cache large input contexts (documents, code repos) and reuse across multiple requests
Reduces latency and cost for repeated context (up to 90% cheaper for cached tokens)
Minimum cache size: 32,768 tokens
Storage price: $1.00–$4.50 per 1M tokens per hour depending on model

Available on paid tier only

Additional Capabilities

Code Execution – Model can write and run Python code in a sandboxed environment to solve problems
URL Context – Fetch and process content from URLs as part of the prompt
Computer Use – Build browser control agents that automate tasks (Preview)

File Search – Upload documents and perform semantic search across them
Live API – Real-time, low-latency bidirectional streaming for voice/video applications
Batch API – Process large volumes asynchronously at 50% cost reduction

Thinking/Reasoning – Configurable chain-of-thought with thinking budgets and thought signatures

Context Window – 1M+ Tokens

Gemini 2.5 Pro and Flash support a 1,000,000 token context window
This is equivalent to approximately:
- ~750,000 words (longer than the entire Lord of the Rings trilogy)
- ~1.5 hours of video
- ~11 hours of audio
- ~30,000 lines of code
Enables processing entire codebases, lengthy legal documents, research papers, or video content in a single prompt
Output token limits: up to 66K for Pro, 65K for Flash

Pricing tiers differ based on prompt length (≤200K tokens vs >200K tokens for Pro)
Context caching available to reduce costs for repeated large-context queries

Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)

Gemini models are available through Google Cloud’s enterprise AI platform (formerly Vertex AI, now Gemini Enterprise Agent Platform as of Cloud Next 2026)

Provides enterprise-grade features beyond the Developer API:
- Fine-tuning (SFT) – Supervised fine-tuning on custom datasets
- RAG Engine – Built-in Retrieval-Augmented Generation with managed vector stores
- Model Evaluation – Automated evaluation pipelines with custom metrics
- Provisioned Throughput – Guaranteed capacity for latency-sensitive applications
- VPC Service Controls – Network isolation and data exfiltration prevention
- CMEK – Customer-managed encryption keys for data at rest
- Agent Builder – No-code platform for building conversational agents with grounding
Same Gemini models as the Developer API but with enterprise SLAs and compliance certifications
Supports HIPAA, SOC 1/2/3, ISO 27001, FedRAMP, PCI DSS compliance

Data is never used to train or improve Google models
Pricing may differ from Developer API; check Gemini Enterprise Agent Platform pricing page

Gemini Code Assist (IDE Integration)

AI-powered coding assistant integrated directly into IDEs (VS Code and JetBrains IDEs)

Key Capabilities:
- Inline code completions while typing
- Code generation from natural language prompts and comments
- Code transformation and refactoring via chat
- Smart actions (explain code, generate tests, fix bugs)
- Full-project context awareness with file/folder specification
- Agent mode for multi-step autonomous coding tasks (since Oct 2025)
- Custom commands and rules configuration
- Source citations for generated code
Editions:
- Free tier – Available for individual developers via Google AI (Individual, Pro, Ultra tiers since June 2026)
- Standard – For teams, includes features beyond the IDE
- Enterprise – Large-context analysis (up to 1M tokens) across indexed repositories, integration with Google Cloud services, code customization
Enterprise edition integrates with Google Cloud services: Cloud Build, Cloud Run, Cloud Logging

Supports large-context analysis across entire repositories

Pricing Tiers (Free vs Paid)

Free Tier

Available through Google AI Studio – no billing account required
Access to Gemini 2.5 Pro, 2.5 Flash, and Flash-Lite models
Free input and output tokens within rate limits
Grounding with Google Search: up to 500 RPD (Flash/Flash-Lite)
Content may be used to improve Google products
Lower rate limits (5–15 RPM depending on model)
No access to context caching, Batch API, or some advanced features

Paid Tier

Link a billing account and prepay minimum $10 to upgrade
Higher rate limits (150–300+ RPM at Tier 1)
Access to context caching, Batch API (50% cost reduction), Flex and Priority inference
Content NOT used to improve Google products (enterprise-grade data privacy)

Tiered system based on cumulative spend: Tier 1 → Tier 2 → Tier 3 (postpay option)

Key Pricing (per 1M tokens, Standard tier)

Gemini 2.5 Pro: $1.25 input (≤200K) / $2.50 (>200K) | $10.00 output (≤200K) / $15.00 (>200K)
Gemini 2.5 Flash: $0.30 input (text/image/video) | $2.50 output
Gemini 2.5 Flash-Lite: $0.10 input (text/image/video) | $0.40 output
Grounding with Google Search: 1,500 RPD free, then $35/1,000 grounded prompts (2.5 models)

Batch API: 50% discount on standard pricing across all models
Context Caching: ~10% of input price per cached token read + storage fee per hour

Enterprise Tier (Gemini Enterprise Agent Platform)

Custom pricing based on usage volume
Dedicated support channels, advanced security, compliance certifications
Provisioned throughput and volume-based discounts
Contact Google Cloud sales for pricing

Rate Limits

Rate limits are determined at the billing account level and vary by tier and model

Free Tier:
- Gemini 2.5 Pro: ~5 RPM (requests per minute)
- Gemini 2.5 Flash: ~15 RPM
- Gemini 2.5 Flash-Lite: ~15 RPM
- Up to 250,000 tokens per minute
- Up to 1,000 requests per day
Paid Tier 1:
- 150–300 RPM depending on model
- Higher token per minute limits
- Higher daily request limits
Paid Tier 2–3: Progressively higher limits based on cumulative spend and account age

Rate limit dimensions: RPM (requests/minute), TPM (tokens/minute), RPD (requests/day)
Grounding with Google Search: 500 RPD (free) / 1,500 RPD free then pay-per-use (paid)
Exceeding limits returns HTTP 429 (Resource Exhausted) – implement exponential backoff

Safety Settings

Gemini API includes configurable content safety filters across multiple harm categories

Harm Categories:
- HARM_CATEGORY_HARASSMENT – Harassment and bullying content
- HARM_CATEGORY_HATE_SPEECH – Hate speech targeting protected groups
- HARM_CATEGORY_SEXUALLY_EXPLICIT – Sexual content
- HARM_CATEGORY_DANGEROUS_CONTENT – Dangerous or harmful activities
- HARM_CATEGORY_CIVIC_INTEGRITY – Election/civic misinformation
Blocking Thresholds:
- BLOCK_NONE – No blocking (may still have some restrictions)
- BLOCK_ONLY_HIGH – Block only high-probability unsafe content
- BLOCK_MEDIUM_AND_ABOVE – Block medium and high (default)
- BLOCK_LOW_AND_ABOVE – Most restrictive setting
Safety ratings are provided for each response with probability levels: HIGH, MEDIUM, LOW, NEGLIGIBLE

Filters are configurable (default off for paid tier) – can be adjusted per request
System instructions can add additional safety guardrails on top of content filters
Image generation has additional responsible AI filters (no violent extremism, no CSAM, no non-consensual imagery)

Comparison: Gemini vs OpenAI GPT vs Anthropic Claude

All three platforms (Gemini, GPT, Claude) offer near-parity on general reasoning and coding benchmarks as of 2026
Context Window:
- Gemini 2.5 Pro: 1,000,000 tokens (largest)
- Claude Opus 4: 200,000 tokens
- GPT-4o / GPT-5: 128,000–256,000 tokens
Multimodal:
- Gemini: Native multimodal (text, image, video, audio, code, PDF) – strongest video/audio understanding
- GPT-4o: Text, image, audio (limited video)
- Claude: Text, image (no native audio/video)
Coding:
- Claude Opus dominates SWE-bench Verified (~80-88% scores) – best for complex agentic coding
- Gemini 2.5 Pro: Strong coding with unique large-context advantage for full-repo understanding
- GPT-5: Strong general coding with excellent structured output
Pricing (per 1M tokens, approximate):
- Gemini 2.5 Pro: $1.25 / $10.00 (input/output)
- Claude Opus 4: $5.00 / $25.00 (input/output)
- GPT-4o: $2.50 / $10.00 (input/output)
- Gemini 2.5 Flash: $0.30 / $2.50 – significantly cheaper than competitors’ mid-tier models

Unique Strengths:
- Gemini: Largest context window, best multimodal (especially video/audio), native Google Search grounding, most cost-efficient at scale
- Claude: Best coding assistant, strongest safety/alignment, excellent for agentic multi-step tasks
- GPT: Strongest general reasoning and math, best ecosystem/plugin support, excellent structured output
Integration:
- Gemini: Google Cloud, Android, Chrome, Google Workspace
- GPT: Azure OpenAI, Microsoft ecosystem
- Claude: AWS Bedrock, direct API

When to Use Which Gemini Model Size

Use Gemini 2.5 Pro when:
- Complex multi-step reasoning is required
- Processing very large codebases or documents (full-repo analysis)
- Highest quality output is more important than cost/latency
- Advanced coding tasks: architecture decisions, complex refactoring, multi-file changes
- Scientific research, mathematical proofs, legal analysis

Use Gemini 2.5 Flash when:
- Production applications needing balance of quality and speed
- Real-time user-facing applications (chatbots, assistants)
- Tasks requiring reasoning but with latency constraints
- General-purpose coding assistance, summarization, Q&A
- Budget-conscious applications that still need thinking capabilities
Use Gemini 2.5 Flash-Lite when:
- High-volume, cost-sensitive workloads at scale
- Simple classification, entity extraction, sentiment analysis
- Translation and localization tasks
- Data processing pipelines with thousands of requests
- Tasks where speed and cost matter more than reasoning depth

Use Gemini Nano when:
- On-device processing with no internet connectivity
- Privacy-sensitive applications (data never leaves device)
- Low-latency responses on mobile devices
- Smart replies, text summarization, image descriptions on Android
- Hybrid inference (Nano for simple queries, cloud for complex ones)

Google Gemini API & AI Studio – Practice Questions

A developer needs to build a prototype chatbot using Gemini models with no Google Cloud account setup. Which service should they use?
- A. Vertex AI Studio
- B. Google AI Studio
- C. Cloud Run
- D. Firebase ML
Show Answer

Answer: B – Google AI Studio provides free access to Gemini models for prototyping without requiring a GCP account.
Which Gemini model offers the largest context window for processing entire codebases in a single prompt?
- A. Gemini Nano
- B. Gemini 2.5 Flash-Lite
- C. Gemini 2.5 Pro
- D. GPT-4o
Show Answer

Answer: C – Gemini 2.5 Pro supports a 1M token context window, the largest among frontier models.

A company requires enterprise security controls, HIPAA compliance, and fine-tuning capabilities for their Gemini deployment. Which platform should they choose?
- A. Google AI Studio Free Tier
- B. Google AI Studio Paid Tier
- C. Gemini Enterprise Agent Platform (Vertex AI)
- D. Gemini Code Assist
Show Answer

Answer: C – Gemini Enterprise Agent Platform provides enterprise security, compliance, fine-tuning, and VPC controls.
Which Gemini API feature allows the model to access real-time information beyond its training data cutoff?
- A. Function Calling
- B. Context Caching
- C. Grounding with Google Search
- D. System Instructions
Show Answer

Answer: C – Grounding with Google Search connects Gemini to real-time web content and provides cited sources.
A mobile app developer needs AI summarization that works offline on Android devices with no data leaving the device. Which model is appropriate?
- A. Gemini 2.5 Flash via API
- B. Gemini 2.5 Pro via Vertex AI
- C. Gemini Nano via ML Kit GenAI APIs
- D. Gemini 2.5 Flash-Lite via Batch API
Show Answer

Answer: C – Gemini Nano runs on-device without cloud connectivity, providing privacy-preserving AI processing.

Which configuration ensures Gemini API always returns valid JSON that conforms to a specific schema? (Select TWO)
- A. Set response_mime_type to “application/json”
- B. Include a JSON example in the prompt
- C. Provide a response_schema in the generation config
- D. Use grounding with Google Search
- E. Enable function calling
Show Answer

Answer: A, C – Setting response_mime_type to “application/json” and providing a response_schema guarantees structured JSON output.
A startup wants to minimize API costs while processing 100,000 classification requests daily. Which combination offers the lowest cost?
- A. Gemini 2.5 Pro with Standard inference
- B. Gemini 2.5 Flash with Priority inference
- C. Gemini 2.5 Flash-Lite with Batch API
- D. Gemini 2.5 Flash with context caching
Show Answer

Answer: C – Flash-Lite ($0.10/$0.40) with Batch API (50% discount) gives the lowest cost for high-volume simple tasks.
What is the primary difference between the Free and Paid tiers of the Gemini Developer API regarding data usage?
- A. Free tier has no rate limits; Paid tier has rate limits
- B. Free tier content may be used to improve Google products; Paid tier content is not used
- C. Free tier only supports text; Paid tier supports multimodal
- D. Free tier uses older models; Paid tier uses newer models
Show Answer

Answer: B – Free tier content may be used to improve Google products, while paid tier provides enterprise-grade data privacy.
Which safety setting category in the Gemini API is used to filter content related to election misinformation?
- A. HARM_CATEGORY_HARASSMENT
- B. HARM_CATEGORY_HATE_SPEECH
- C. HARM_CATEGORY_CIVIC_INTEGRITY
- D. HARM_CATEGORY_DANGEROUS_CONTENT
Show Answer

Answer: C – HARM_CATEGORY_CIVIC_INTEGRITY covers election and civic misinformation content.

A developer is hitting rate limits (HTTP 429) on the free tier of Gemini API. What are valid options to increase throughput? (Select TWO)
- A. Implement exponential backoff retry logic
- B. Upgrade to paid tier by linking a billing account
- C. Switch from Pro to Nano model
- D. Disable safety settings
- E. Use system instructions to request faster processing
Show Answer

Answer: A, B – Exponential backoff handles transient limits; upgrading to paid tier increases RPM from 5-15 to 150-300+.

Jayendra's Cloud Certification Blog

Google Gemini API & AI Studio – Developer Guide

Google Gemini API & AI Studio – Developer Guide

Gemini Model Family

Google AI Studio vs Vertex AI Studio

Gemini API Capabilities

Multimodal Input & Output

Function Calling

Grounding with Google Search

Structured Output (JSON Mode)

System Instructions

Context Caching

Additional Capabilities

Context Window – 1M+ Tokens

Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)

Gemini Code Assist (IDE Integration)

Pricing Tiers (Free vs Paid)

Free Tier

Paid Tier

Key Pricing (per 1M tokens, Standard tier)

Enterprise Tier (Gemini Enterprise Agent Platform)

Rate Limits

Safety Settings

Comparison: Gemini vs OpenAI GPT vs Anthropic Claude

When to Use Which Gemini Model Size

Google Gemini API & AI Studio – Practice Questions

References