Google Gemini API & AI Studio – Developer Guide

Google Gemini API & AI Studio – Developer Guide

📌 Last Updated: June 2026. This post covers the Gemini model family (2.5 Pro, 2.5 Flash, Flash-Lite, Nano), Google AI Studio vs Vertex AI Studio, Gemini API capabilities, pricing tiers, rate limits, safety settings, Gemini Code Assist, and comparison with OpenAI GPT and Anthropic Claude.

  • Google Gemini is Google’s family of multimodal AI models that can process text, images, video, audio, and code.
  • Gemini models are available through the Gemini Developer API (via Google AI Studio) and through Google Cloud’s Vertex AI (now Gemini Enterprise Agent Platform).
  • Gemini 2.5 Pro and 2.5 Flash became generally available (GA) in June 2025, providing production-ready stability and scalability.
  • The model family spans from Gemini 2.5 Pro (most capable) to Gemini Nano (on-device), covering cloud API, enterprise, and edge use cases.
  • Gemini supports a 1M+ token context window, the largest among frontier models, enabling processing of entire codebases, long documents, and hours of video in a single prompt.

Gemini Model Family

  • Gemini 2.5 Pro – Google’s most capable model for complex reasoning, coding, and multimodal tasks.
    • 1M token context window (input), up to 66K output tokens
    • Excels at coding, mathematical reasoning, scientific analysis, and multi-step problem solving
    • Supports “thinking” mode with configurable thinking budgets for chain-of-thought reasoning
    • Natively multimodal – processes text, images, audio, video, and PDFs
    • Supports function calling, structured output (JSON mode), grounding with Google Search, and code execution
    • GA since June 2025; model ID: gemini-2.5-pro
  • Gemini 2.5 Flash – Hybrid reasoning model optimized for speed and cost-efficiency.
    • 1M token context window with thinking capabilities (first Flash model with thinking)
    • Configurable thinking budgets – control reasoning depth vs latency tradeoff
    • Excellent for production workloads needing fast responses at lower cost
    • Supports all Pro capabilities: multimodal input, function calling, grounding, JSON mode
    • ~4x cheaper than Pro for input tokens, ~4x cheaper for output tokens
    • GA since June 2025; model ID: gemini-2.5-flash
  • Gemini 2.5 Flash-Lite – Most cost-efficient cloud model for high-volume tasks.
    • 1M token context window
    • Optimized for high-throughput, cost-sensitive workloads: classification, translation, simple data processing
    • Pricing starts at $0.10/1M input tokens and $0.40/1M output tokens
    • Supports grounding with Google Search and Google Maps
    • Model ID: gemini-2.5-flash-lite
  • Gemini Nano – On-device model for Android and Chrome.
    • Runs natively on device hardware (NPU/GPU) without cloud connectivity
    • Available on Pixel 8 Pro, Pixel 9/10 series, Samsung Galaxy S24+ and later
    • Supports summarization, smart reply, proofreading, rewriting, and image description
    • Available in Chrome via the Prompt API (downloaded automatically with browser updates)
    • Privacy-preserving – all processing stays on device
    • Accessible via ML Kit GenAI APIs on Android and AICore
    • Supports hybrid inference – dynamically switches between on-device Nano and cloud-hosted Gemini models

Google AI Studio vs Vertex AI Studio

  • Google AI Studio (aistudio.google.com) – Free, web-based IDE for prototyping with Gemini.
    • Quick experimentation with prompts, no Google Cloud account required
    • Get an API key instantly for development
    • Supports prompt testing, side-by-side model comparison, and code export
    • Build mode for vibe-coding full-stack applications directly in the browser
    • One-click deployment to Google Cloud Run (up to 2 apps free via Starter Tier)
    • Free tier available with generous rate limits
    • Content on free tier may be used to improve Google products
    • Best for: rapid prototyping, learning, hackathons, individual developers
  • Vertex AI Studio / Gemini Enterprise Agent Platform (Google Cloud Console) – Enterprise-grade AI platform.
    • Requires Google Cloud project with billing enabled
    • Full IAM, VPC, audit logging, and enterprise security controls
    • Model fine-tuning (SFT), RAG Engine, model evaluation, and ML pipelines
    • Access to Model Garden with 200+ models (not just Gemini)
    • Provisioned throughput for guaranteed capacity
    • Data residency and compliance (HIPAA, SOC 2, FedRAMP)
    • Agent Builder for no-code conversational agent development
    • Content is never used to improve Google products
    • Best for: production enterprise deployments, regulated industries, multi-model workflows

💡 Certification Tip: If a question describes quick prototyping with no GCP account, the answer is Google AI Studio. If it mentions IAM, VPC, fine-tuning, RAG, or ML pipelines, the answer is Vertex AI / Gemini Enterprise Agent Platform.

Gemini API Capabilities

Multimodal Input & Output

  • Text – Natural language understanding, generation, summarization, translation, and Q&A
  • Images – Image understanding (describe, analyze, OCR) and native image generation (Gemini 2.5 Flash Image)
  • Video – Process and understand video content up to hours in length; video generation via Veo models
  • Audio – Audio understanding, speech-to-text, native audio output, and text-to-speech (TTS)
  • Code – Code generation, debugging, explanation, refactoring across 20+ programming languages
  • Documents/PDFs – Process entire PDF documents natively with layout understanding

Function Calling

  • Connect Gemini to external tools, APIs, and databases
  • Model determines when to call a function and provides structured parameters
  • Supports parallel function calling (multiple functions in one turn)
  • Automatic function calling mode available in SDKs
  • Gemini 3+ models generate unique IDs for each function call for tracing
  • Works with both Google AI Studio and Vertex AI endpoints

Grounding with Google Search

  • Connects Gemini to real-time, publicly available web content
  • Provides accurate, up-to-date answers with cited verifiable sources beyond model’s training cutoff
  • Returns grounding metadata with source URLs and support chunks
  • Supports dynamic retrieval – only charges when grounding actually contributes to response
  • Works with all available languages
  • Rate limits: Free tier gets 500 RPD (requests per day); Paid tier gets 1,500 RPD free then $35/1,000 grounded prompts
  • Limit of 1M queries per day (contact support for higher)
  • Respects robots.txt Google-Extended directives from web publishers

Structured Output (JSON Mode)

  • Force Gemini to output valid JSON conforming to a provided schema
  • Specify response schema using JSON Schema format
  • Guarantees parseable output for programmatic consumption
  • Supports enums, nested objects, arrays, and optional fields
  • Set response_mime_type: "application/json" in generation config

System Instructions

  • Set persistent behavioral guidelines that apply across all turns in a conversation
  • Define persona, tone, output format, safety constraints, and domain expertise
  • System instructions are separate from user messages and persist throughout the session
  • Supports multi-part system instructions for complex configurations

Context Caching

  • Cache large input contexts (documents, code repos) and reuse across multiple requests
  • Reduces latency and cost for repeated context (up to 90% cheaper for cached tokens)
  • Minimum cache size: 32,768 tokens
  • Storage price: $1.00–$4.50 per 1M tokens per hour depending on model
  • Available on paid tier only

Additional Capabilities

  • Code Execution – Model can write and run Python code in a sandboxed environment to solve problems
  • URL Context – Fetch and process content from URLs as part of the prompt
  • Computer Use – Build browser control agents that automate tasks (Preview)
  • File Search – Upload documents and perform semantic search across them
  • Live API – Real-time, low-latency bidirectional streaming for voice/video applications
  • Batch API – Process large volumes asynchronously at 50% cost reduction
  • Thinking/Reasoning – Configurable chain-of-thought with thinking budgets and thought signatures

Context Window – 1M+ Tokens

  • Gemini 2.5 Pro and Flash support a 1,000,000 token context window
  • This is equivalent to approximately:
    • ~750,000 words (longer than the entire Lord of the Rings trilogy)
    • ~1.5 hours of video
    • ~11 hours of audio
    • ~30,000 lines of code
  • Enables processing entire codebases, lengthy legal documents, research papers, or video content in a single prompt
  • Output token limits: up to 66K for Pro, 65K for Flash
  • Pricing tiers differ based on prompt length (≤200K tokens vs >200K tokens for Pro)
  • Context caching available to reduce costs for repeated large-context queries

Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)

  • Gemini models are available through Google Cloud’s enterprise AI platform (formerly Vertex AI, now Gemini Enterprise Agent Platform as of Cloud Next 2026)
  • Provides enterprise-grade features beyond the Developer API:
    • Fine-tuning (SFT) – Supervised fine-tuning on custom datasets
    • RAG Engine – Built-in Retrieval-Augmented Generation with managed vector stores
    • Model Evaluation – Automated evaluation pipelines with custom metrics
    • Provisioned Throughput – Guaranteed capacity for latency-sensitive applications
    • VPC Service Controls – Network isolation and data exfiltration prevention
    • CMEK – Customer-managed encryption keys for data at rest
    • Agent Builder – No-code platform for building conversational agents with grounding
  • Same Gemini models as the Developer API but with enterprise SLAs and compliance certifications
  • Supports HIPAA, SOC 1/2/3, ISO 27001, FedRAMP, PCI DSS compliance
  • Data is never used to train or improve Google models
  • Pricing may differ from Developer API; check Gemini Enterprise Agent Platform pricing page

Gemini Code Assist (IDE Integration)

  • AI-powered coding assistant integrated directly into IDEs (VS Code and JetBrains IDEs)
  • Key Capabilities:
    • Inline code completions while typing
    • Code generation from natural language prompts and comments
    • Code transformation and refactoring via chat
    • Smart actions (explain code, generate tests, fix bugs)
    • Full-project context awareness with file/folder specification
    • Agent mode for multi-step autonomous coding tasks (since Oct 2025)
    • Custom commands and rules configuration
    • Source citations for generated code
  • Editions:
    • Free tier – Available for individual developers via Google AI (Individual, Pro, Ultra tiers since June 2026)
    • Standard – For teams, includes features beyond the IDE
    • Enterprise – Large-context analysis (up to 1M tokens) across indexed repositories, integration with Google Cloud services, code customization
  • Enterprise edition integrates with Google Cloud services: Cloud Build, Cloud Run, Cloud Logging
  • Supports large-context analysis across entire repositories

Pricing Tiers (Free vs Paid)

Free Tier

  • Available through Google AI Studio – no billing account required
  • Access to Gemini 2.5 Pro, 2.5 Flash, and Flash-Lite models
  • Free input and output tokens within rate limits
  • Grounding with Google Search: up to 500 RPD (Flash/Flash-Lite)
  • Content may be used to improve Google products
  • Lower rate limits (5–15 RPM depending on model)
  • No access to context caching, Batch API, or some advanced features

Paid Tier

  • Link a billing account and prepay minimum $10 to upgrade
  • Higher rate limits (150–300+ RPM at Tier 1)
  • Access to context caching, Batch API (50% cost reduction), Flex and Priority inference
  • Content NOT used to improve Google products (enterprise-grade data privacy)
  • Tiered system based on cumulative spend: Tier 1 → Tier 2 → Tier 3 (postpay option)

Key Pricing (per 1M tokens, Standard tier)

  • Gemini 2.5 Pro: $1.25 input (≤200K) / $2.50 (>200K) | $10.00 output (≤200K) / $15.00 (>200K)
  • Gemini 2.5 Flash: $0.30 input (text/image/video) | $2.50 output
  • Gemini 2.5 Flash-Lite: $0.10 input (text/image/video) | $0.40 output
  • Grounding with Google Search: 1,500 RPD free, then $35/1,000 grounded prompts (2.5 models)
  • Batch API: 50% discount on standard pricing across all models
  • Context Caching: ~10% of input price per cached token read + storage fee per hour

Enterprise Tier (Gemini Enterprise Agent Platform)

  • Custom pricing based on usage volume
  • Dedicated support channels, advanced security, compliance certifications
  • Provisioned throughput and volume-based discounts
  • Contact Google Cloud sales for pricing

Rate Limits

  • Rate limits are determined at the billing account level and vary by tier and model
  • Free Tier:
    • Gemini 2.5 Pro: ~5 RPM (requests per minute)
    • Gemini 2.5 Flash: ~15 RPM
    • Gemini 2.5 Flash-Lite: ~15 RPM
    • Up to 250,000 tokens per minute
    • Up to 1,000 requests per day
  • Paid Tier 1:
    • 150–300 RPM depending on model
    • Higher token per minute limits
    • Higher daily request limits
  • Paid Tier 2–3: Progressively higher limits based on cumulative spend and account age
  • Rate limit dimensions: RPM (requests/minute), TPM (tokens/minute), RPD (requests/day)
  • Grounding with Google Search: 500 RPD (free) / 1,500 RPD free then pay-per-use (paid)
  • Exceeding limits returns HTTP 429 (Resource Exhausted) – implement exponential backoff

Safety Settings

  • Gemini API includes configurable content safety filters across multiple harm categories
  • Harm Categories:
    • HARM_CATEGORY_HARASSMENT – Harassment and bullying content
    • HARM_CATEGORY_HATE_SPEECH – Hate speech targeting protected groups
    • HARM_CATEGORY_SEXUALLY_EXPLICIT – Sexual content
    • HARM_CATEGORY_DANGEROUS_CONTENT – Dangerous or harmful activities
    • HARM_CATEGORY_CIVIC_INTEGRITY – Election/civic misinformation
  • Blocking Thresholds:
    • BLOCK_NONE – No blocking (may still have some restrictions)
    • BLOCK_ONLY_HIGH – Block only high-probability unsafe content
    • BLOCK_MEDIUM_AND_ABOVE – Block medium and high (default)
    • BLOCK_LOW_AND_ABOVE – Most restrictive setting
  • Safety ratings are provided for each response with probability levels: HIGH, MEDIUM, LOW, NEGLIGIBLE
  • Filters are configurable (default off for paid tier) – can be adjusted per request
  • System instructions can add additional safety guardrails on top of content filters
  • Image generation has additional responsible AI filters (no violent extremism, no CSAM, no non-consensual imagery)

Comparison: Gemini vs OpenAI GPT vs Anthropic Claude

  • All three platforms (Gemini, GPT, Claude) offer near-parity on general reasoning and coding benchmarks as of 2026
  • Context Window:
    • Gemini 2.5 Pro: 1,000,000 tokens (largest)
    • Claude Opus 4: 200,000 tokens
    • GPT-4o / GPT-5: 128,000–256,000 tokens
  • Multimodal:
    • Gemini: Native multimodal (text, image, video, audio, code, PDF) – strongest video/audio understanding
    • GPT-4o: Text, image, audio (limited video)
    • Claude: Text, image (no native audio/video)
  • Coding:
    • Claude Opus dominates SWE-bench Verified (~80-88% scores) – best for complex agentic coding
    • Gemini 2.5 Pro: Strong coding with unique large-context advantage for full-repo understanding
    • GPT-5: Strong general coding with excellent structured output
  • Pricing (per 1M tokens, approximate):
    • Gemini 2.5 Pro: $1.25 / $10.00 (input/output)
    • Claude Opus 4: $5.00 / $25.00 (input/output)
    • GPT-4o: $2.50 / $10.00 (input/output)
    • Gemini 2.5 Flash: $0.30 / $2.50 – significantly cheaper than competitors’ mid-tier models
  • Unique Strengths:
    • Gemini: Largest context window, best multimodal (especially video/audio), native Google Search grounding, most cost-efficient at scale
    • Claude: Best coding assistant, strongest safety/alignment, excellent for agentic multi-step tasks
    • GPT: Strongest general reasoning and math, best ecosystem/plugin support, excellent structured output
  • Integration:
    • Gemini: Google Cloud, Android, Chrome, Google Workspace
    • GPT: Azure OpenAI, Microsoft ecosystem
    • Claude: AWS Bedrock, direct API

When to Use Which Gemini Model Size

  • Use Gemini 2.5 Pro when:
    • Complex multi-step reasoning is required
    • Processing very large codebases or documents (full-repo analysis)
    • Highest quality output is more important than cost/latency
    • Advanced coding tasks: architecture decisions, complex refactoring, multi-file changes
    • Scientific research, mathematical proofs, legal analysis
  • Use Gemini 2.5 Flash when:
    • Production applications needing balance of quality and speed
    • Real-time user-facing applications (chatbots, assistants)
    • Tasks requiring reasoning but with latency constraints
    • General-purpose coding assistance, summarization, Q&A
    • Budget-conscious applications that still need thinking capabilities
  • Use Gemini 2.5 Flash-Lite when:
    • High-volume, cost-sensitive workloads at scale
    • Simple classification, entity extraction, sentiment analysis
    • Translation and localization tasks
    • Data processing pipelines with thousands of requests
    • Tasks where speed and cost matter more than reasoning depth
  • Use Gemini Nano when:
    • On-device processing with no internet connectivity
    • Privacy-sensitive applications (data never leaves device)
    • Low-latency responses on mobile devices
    • Smart replies, text summarization, image descriptions on Android
    • Hybrid inference (Nano for simple queries, cloud for complex ones)

Google Gemini API & AI Studio – Practice Questions

  1. A developer needs to build a prototype chatbot using Gemini models with no Google Cloud account setup. Which service should they use?
    • A. Vertex AI Studio
    • B. Google AI Studio
    • C. Cloud Run
    • D. Firebase ML

    Answer: B – Google AI Studio provides free access to Gemini models for prototyping without requiring a GCP account.

  2. Which Gemini model offers the largest context window for processing entire codebases in a single prompt?
    • A. Gemini Nano
    • B. Gemini 2.5 Flash-Lite
    • C. Gemini 2.5 Pro
    • D. GPT-4o

    Answer: C – Gemini 2.5 Pro supports a 1M token context window, the largest among frontier models.

  3. A company requires enterprise security controls, HIPAA compliance, and fine-tuning capabilities for their Gemini deployment. Which platform should they choose?
    • A. Google AI Studio Free Tier
    • B. Google AI Studio Paid Tier
    • C. Gemini Enterprise Agent Platform (Vertex AI)
    • D. Gemini Code Assist

    Answer: C – Gemini Enterprise Agent Platform provides enterprise security, compliance, fine-tuning, and VPC controls.

  4. Which Gemini API feature allows the model to access real-time information beyond its training data cutoff?
    • A. Function Calling
    • B. Context Caching
    • C. Grounding with Google Search
    • D. System Instructions

    Answer: C – Grounding with Google Search connects Gemini to real-time web content and provides cited sources.

  5. A mobile app developer needs AI summarization that works offline on Android devices with no data leaving the device. Which model is appropriate?
    • A. Gemini 2.5 Flash via API
    • B. Gemini 2.5 Pro via Vertex AI
    • C. Gemini Nano via ML Kit GenAI APIs
    • D. Gemini 2.5 Flash-Lite via Batch API

    Answer: C – Gemini Nano runs on-device without cloud connectivity, providing privacy-preserving AI processing.

  6. Which configuration ensures Gemini API always returns valid JSON that conforms to a specific schema? (Select TWO)
    • A. Set response_mime_type to “application/json”
    • B. Include a JSON example in the prompt
    • C. Provide a response_schema in the generation config
    • D. Use grounding with Google Search
    • E. Enable function calling

    Answer: A, C – Setting response_mime_type to “application/json” and providing a response_schema guarantees structured JSON output.

  7. A startup wants to minimize API costs while processing 100,000 classification requests daily. Which combination offers the lowest cost?
    • A. Gemini 2.5 Pro with Standard inference
    • B. Gemini 2.5 Flash with Priority inference
    • C. Gemini 2.5 Flash-Lite with Batch API
    • D. Gemini 2.5 Flash with context caching

    Answer: C – Flash-Lite ($0.10/$0.40) with Batch API (50% discount) gives the lowest cost for high-volume simple tasks.

  8. What is the primary difference between the Free and Paid tiers of the Gemini Developer API regarding data usage?
    • A. Free tier has no rate limits; Paid tier has rate limits
    • B. Free tier content may be used to improve Google products; Paid tier content is not used
    • C. Free tier only supports text; Paid tier supports multimodal
    • D. Free tier uses older models; Paid tier uses newer models

    Answer: B – Free tier content may be used to improve Google products, while paid tier provides enterprise-grade data privacy.

  9. Which safety setting category in the Gemini API is used to filter content related to election misinformation?
    • A. HARM_CATEGORY_HARASSMENT
    • B. HARM_CATEGORY_HATE_SPEECH
    • C. HARM_CATEGORY_CIVIC_INTEGRITY
    • D. HARM_CATEGORY_DANGEROUS_CONTENT

    Answer: C – HARM_CATEGORY_CIVIC_INTEGRITY covers election and civic misinformation content.

  10. A developer is hitting rate limits (HTTP 429) on the free tier of Gemini API. What are valid options to increase throughput? (Select TWO)
    • A. Implement exponential backoff retry logic
    • B. Upgrade to paid tier by linking a billing account
    • C. Switch from Pro to Nano model
    • D. Disable safety settings
    • E. Use system instructions to request faster processing

    Answer: A, B – Exponential backoff handles transient limits; upgrading to paid tier increases RPM from 5-15 to 150-300+.

References

Posted in AWS

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.