Google Vertex AI – ML Platform & Model Garden

Google Cloud Vertex AI – ML Platform Overview

  • Vertex AI is Google Cloud’s unified machine learning platform that brings together all the tools and services needed to build, deploy, and scale ML models and AI applications.
  • Vertex AI consolidates previously separate Google Cloud AI services (AI Platform, AutoML, Dialogflow) into a single, cohesive platform.
  • In April 2026, Google rebranded Vertex AI as the Gemini Enterprise Agent Platform at Google Cloud Next ’26, shifting focus toward an agent-first architecture while retaining all existing ML/AI capabilities.
  • Vertex AI supports the entire ML lifecycle: data preparation, model training, evaluation, deployment, monitoring, and management.
  • Integrates natively with BigQuery, Cloud Storage, Dataflow, and other Google Cloud services.
  • Supports both traditional ML (tabular, vision, NLP) and generative AI (foundation models, LLMs, multimodal).
  • Provides both no-code/low-code (AutoML, Studio) and code-first (custom training, SDK, notebooks) approaches.

Vertex AI Key Components

  • Model Garden – Access to 200+ foundation models (first-party, third-party, open)
  • Vertex AI Studio – Interactive prompt design, testing, and tuning interface
  • Model Training – AutoML and custom training with GPUs/TPUs
  • Model Deployment – Online and batch prediction endpoints
  • Vertex AI Pipelines – ML workflow orchestration (Kubeflow/TFX)
  • Feature Store – Centralized feature management integrated with BigQuery
  • Vector Search – High-performance similarity search using ScaNN
  • Agent Builder – Build, deploy, and govern AI agents
  • Grounding – Connect models to real-time data sources
  • Model Evaluation – Comprehensive model assessment tools
  • Workbench – Managed Jupyter notebook environments
  • MLOps – Model Registry, Experiments, Monitoring

Model Garden

  • Model Garden provides access to 200+ curated foundation models, organized into three categories:
    • First-party models – Google’s own models: Gemini (text, code, multimodal), Imagen (image generation), Veo (video generation), Chirp (speech), Gemma (open models), Lyria (music)
    • Third-party models – Partner models: Anthropic’s Claude family, Mistral AI models
    • Open models – Community models: Meta’s Llama 3.2, Gemma 3, and others deployable on Google Cloud infrastructure
  • First-party and select third-party models are available as managed APIs (serverless, no infrastructure management required).
  • Open models can be deployed to dedicated endpoints with custom hardware configurations.
  • Model Garden supports one-click deployment, fine-tuning notebooks, and optimization features for open models.
  • Key Gemini models available:
    • Gemini 2.5 Flash – Fast, cost-efficient model for everyday tasks
    • Gemini 2.5 Pro – Advanced reasoning and complex tasks
    • Gemini 3 Flash / Gemini 3 Pro – Latest generation (2026)
  • Supports multimodal inputs: text, images, video, audio, and code.
  • Enterprise features: data residency controls, VPC Service Controls, customer-managed encryption keys (CMEK), and audit logging.

Vertex AI Studio

  • Vertex AI Studio is a web-based interface for rapidly prototyping and testing generative AI models.
  • Key capabilities:
    • Prompt Design – Interactive prompt editor with system instructions, few-shot examples, and parameter tuning (temperature, top-k, top-p)
    • Prompt Gallery – Pre-built prompt templates for common use cases
    • Prompt Optimizer – Automated prompt refinement using Zero-Shot and Data-Driven modes (GA 2025)
    • Model Tuning – Fine-tune models with custom datasets without managing infrastructure
    • Model Distillation – Train smaller, faster student models from larger teacher models
    • Multimodal Testing – Test text, image, video, and audio inputs/outputs
  • Tuning methods supported:
    • Supervised Fine-Tuning (SFT) – Train on labeled input-output pairs
    • Reinforcement Learning from Human Feedback (RLHF) – Align model outputs with human preferences
    • Distillation – Transfer knowledge from large teacher to smaller student model
    • LoRA (Low-Rank Adaptation) – Parameter-efficient fine-tuning
  • Supports Gemini, Imagen, and Codey model families for tuning.
  • Provides code export to Python SDK for production integration.

Model Training

  • Vertex AI provides two primary training approaches:
    • AutoML – No-code training for tabular, image, video, and text data with automatic feature engineering, architecture search, and hyperparameter tuning
    • Custom Training – Full control using any ML framework (TensorFlow, PyTorch, XGBoost, scikit-learn, JAX) with custom containers or prebuilt containers
  • AutoML capabilities:
    • AutoML Tabular – Classification, regression, forecasting
    • AutoML Image – Classification, object detection, segmentation
    • AutoML Video – Classification, object tracking, action recognition
    • AutoML Text – Classification, entity extraction, sentiment analysis
  • Custom Training features:
    • Supports NVIDIA GPUs: A100, H100, A3 (H100 80GB), and A4X (GB200 NVL72 rack-scale, 2026)
    • Supports Cloud TPUs: TPU v2, v3, v5e, v5p for large-scale training
    • Distributed training across multiple nodes with automatic orchestration
    • Prebuilt containers for TensorFlow, PyTorch, XGBoost, scikit-learn
    • Custom containers for any framework or dependency
    • Hyperparameter tuning service (Vizier-based)
  • Vertex AI Training with Cluster Director (2025) – Fully managed, resilient Slurm environment for large-scale training workloads, simplifying multi-node GPU/TPU training.
  • Training is serverless – no infrastructure provisioning; billed per compute-hour in 30-second increments.
  • No charges if training fails (except user-initiated cancellations).

Model Deployment

  • Vertex AI supports multiple deployment options for serving predictions:
    • Online Prediction – Low-latency, real-time predictions via HTTPS endpoints
    • Batch Prediction – High-throughput predictions on large datasets (asynchronous)
    • Private Endpoints – Deploy within VPC for network isolation
  • Online Prediction features:
    • Dedicated endpoints with configurable machine types and accelerators (GPUs, TPUs)
    • Automatic scaling (min/max replicas) based on traffic
    • Traffic splitting for A/B testing and canary deployments
    • Model versioning – deploy multiple model versions to the same endpoint
    • Prebuilt containers for TensorFlow, PyTorch, XGBoost, scikit-learn serving
    • Custom serving containers for any framework
    • TPU VM deployment for high-throughput inference
  • Batch Prediction features:
    • Process large datasets from Cloud Storage or BigQuery
    • Serverless – no persistent infrastructure
    • Cost-effective for non-real-time workloads
    • Supports the same model formats as online prediction
  • Foundation models (Gemini, Claude, Llama) are served via managed APIs without requiring endpoint deployment.
  • Supports model explainability (feature attributions) on deployed endpoints.

Vertex AI Pipelines

  • Vertex AI Pipelines is a serverless orchestration service for automating, monitoring, and governing ML workflows.
  • Supports two pipeline frameworks:
    • Kubeflow Pipelines (KFP) – Python-based pipeline SDK for defining ML workflows as directed acyclic graphs (DAGs)
    • TensorFlow Extended (TFX) – End-to-end TensorFlow production ML pipelines
  • Key features:
    • Serverless execution – no cluster management required
    • Pipeline scheduling and triggering (event-based or cron)
    • Artifact and metadata tracking for lineage
    • Integration with Vertex AI services (training, endpoints, Feature Store, Model Registry)
    • Reusable pipeline components and templates
    • Pipeline versioning and caching for faster iteration
  • Common pipeline patterns:
    • Data ingestion → Feature engineering → Training → Evaluation → Deployment
    • Continuous training pipelines triggered by data drift or schedule
    • A/B testing pipelines for comparing model versions
  • Pipelines integrate with Cloud Logging, Cloud Monitoring, and IAM for governance.

Vertex AI Feature Store

  • Vertex AI Feature Store is a fully managed service for organizing, storing, and serving ML features at scale.
  • Built on BigQuery – feature data is managed within BigQuery tables or views, eliminating the need for a separate offline store.
  • Key capabilities:
    • Feature Management – Centralized registry of features with metadata, lineage, and versioning
    • Online Serving – Low-latency feature retrieval for real-time predictions (single-digit millisecond latency)
    • Offline Serving – Point-in-time correct feature retrieval for training data generation
    • Feature Sharing – Share features across teams and projects to reduce duplication
    • Embedding Support – Store and serve vector embeddings for GenAI/RAG applications
    • Vector Similarity Search – Perform approximate nearest neighbor (ANN) search on stored embeddings
  • Integrated with Dataplex Universal Catalog for feature metadata tracking and discovery.
  • Supports streaming and batch feature ingestion from BigQuery, Cloud Storage, and Dataflow.
  • Feature Store integrates with Vertex AI RAG Engine for retrieval-augmented generation workflows.

Vertex AI Vector Search

  • Vertex AI Vector Search (formerly Matching Engine) is a high-performance, fully managed vector similarity search service.
  • Built on Google’s ScaNN (Scalable Nearest Neighbors) algorithm for efficient approximate nearest neighbor search.
  • Key features:
    • Supports billions of vectors with low-latency retrieval
    • Real-time index updates (streaming inserts)
    • Filtering with boolean and numeric predicates
    • Multiple distance metrics: cosine, dot product, Euclidean
    • Hybrid search combining vector similarity with metadata filters
  • Vector Search 2.0 (2025-2026):
    • Storage-optimized tier – Cost-effective solution for large-scale RAG and semantic search applications
    • Auto-tuning – Eliminates manual index configuration, automatically optimizes for workload
    • Integrated data retrieval – Returns full item data (not just IDs), eliminating the need for a separate key-value store
  • Use cases: semantic search, recommendation systems, RAG, image/video retrieval, anomaly detection.
  • Integrates with Vertex AI Feature Store for embedding storage and serving.

Vertex AI Agent Builder

  • Vertex AI Agent Builder is Google Cloud’s comprehensive platform for building, scaling, and governing AI agents.
  • Evolved from and consolidates: Dialogflow CX (now Conversational Agents), Vertex AI Search, and Generative AI App Builder.
  • Key pillars:
    • Agent Development Kit (ADK) – Code-first Python framework for building multi-agent systems
    • Agent Studio – Low-code visual builder with 35+ pre-built agent templates
    • Agent Engine – Managed runtime for deploying and scaling agents in production
    • Persistent Memory – Long-term memory and session management for agents
    • Enterprise Governance – Access controls, audit logging, and compliance
  • Conversational Agents (formerly Dialogflow CX):
    • Build hybrid agents combining deterministic flows with generative AI
    • Playbooks – Define goals and tools, let the LLM determine execution paths
    • Multi-channel support: web, phone, messaging platforms
    • Phone Gateway for voice agents
  • Vertex AI Search:
    • Enterprise search over structured, unstructured, and website data
    • RAG-powered answers grounded in enterprise documents
    • Supports PDF, HTML, Cloud Storage, BigQuery, and third-party data sources
  • Supports Agent-to-Agent (A2A) protocol for multi-agent collaboration.
  • 200+ foundation models accessible within Agent Builder workflows.

Grounding

  • Grounding connects generative AI model outputs to verifiable sources of information, reducing hallucinations and improving accuracy.
  • Grounding options:
    • Grounding with Google Search – Connects models to real-time, publicly available web content with cited sources
    • Grounding with Vertex AI Search – Connects models to enterprise data (documents, databases, websites)
    • Grounding with Parallel Web Search – Multi-hop agents that perform deeper web searches for complex questions
  • Key features:
    • Dynamic Retrieval – Model automatically determines when grounding is needed based on the prompt
    • Source Citations – Responses include links to source documents
    • Grounding Scores – Confidence scores indicating how well the response is supported by sources
    • Supports up to 10 Vertex AI Search data sources per request
    • Can combine Google Search grounding with enterprise data grounding
    • Limit of 1 million queries per day for Grounding with Google Search
  • Starting with Gemini 2.0, Google Search is available as a tool – the model can autonomously decide when to search.
  • Gemini 3 Pro includes 5,000 free Google Search grounding queries per month.

Model Evaluation

  • Vertex AI provides comprehensive model evaluation capabilities for both traditional ML and generative AI models.
  • Traditional ML Evaluation:
    • Classification metrics: accuracy, precision, recall, F1, AUC-ROC, confusion matrix
    • Regression metrics: MAE, RMSE, R-squared
    • Feature importance and attribution analysis
    • Slice-based evaluation for fairness assessment
  • Generative AI Evaluation:
    • AutoSxS (Auto Side-by-Side) – Automated pairwise comparison between models
    • Pointwise evaluation metrics: fluency, coherence, safety, groundedness
    • Custom evaluation criteria using LLM-as-a-judge
    • RAG evaluation: context relevance, answer faithfulness, answer relevance
    • Batch evaluation across datasets
  • Evaluation results stored in Vertex AI Experiments for tracking and comparison.
  • Supports evaluation of fine-tuned models against base models to measure improvement.

Vertex AI Workbench

  • Vertex AI Workbench provides managed Jupyter notebook environments for ML development and experimentation.
  • Workbench Instances (managed notebooks):
    • JupyterLab-based environment with pre-installed ML frameworks
    • Configurable machine types with GPU/TPU accelerators
    • Automatic idle shutdown to reduce costs
    • Integration with Git for version control
    • Pre-authenticated access to Google Cloud services
    • Supports custom containers for specialized environments
  • Colab Enterprise – Collaborative notebook environment integrated with Vertex AI:
    • Serverless and managed runtimes
    • Enterprise security (VPC-SC, CMEK, IAM)
    • Shared notebooks with team collaboration features
    • Code completion with Gemini Code Assist
  • Notebooks can directly launch Vertex AI training jobs, access Feature Store, and deploy models.

MLOps

  • Vertex AI provides integrated MLOps tooling for production ML lifecycle management.
  • Model Registry:
    • Central repository for all trained models
    • Model versioning with version aliases and descriptions
    • Model lineage tracking (training data, pipeline, hyperparameters)
    • Model labels and metadata for organization
    • Import models from any source (Vertex AI, BigQuery ML, external)
    • Default version management for deployments
  • Vertex AI Experiments:
    • Track and compare training runs with metrics, parameters, and artifacts
    • Automatic logging integration with TensorFlow and PyTorch
    • Visualization of experiment results
    • Integration with Vertex AI Pipelines for automated experimentation
  • Model Monitoring:
    • Detect data drift (training-serving skew) automatically
    • Feature distribution monitoring
    • Prediction output monitoring for quality degradation
    • Configurable alerting thresholds
    • Integration with Cloud Monitoring and Cloud Logging
  • Vertex AI Metadata:
    • Track artifacts, executions, and their lineage across the ML lifecycle
    • Query metadata for auditing and compliance
    • Automatic capture from Vertex AI Pipelines

Pricing Overview

  • Vertex AI uses a pay-as-you-go pricing model with no upfront commitments.
  • Generative AI / Foundation Models:
    • Priced per million tokens (input and output separately)
    • Gemini 2.5 Flash: ~$0.15/1M input tokens, ~$0.60/1M output tokens (cost-efficient)
    • Gemini 2.5 Pro: ~$1.25/1M input tokens, ~$5.00/1M output tokens
    • Third-party models (Claude, Llama): pricing varies by model
    • Grounding with Google Search: $2.50 per 1,000 requests (Gemini 3 Pro includes 5,000 free/month)
  • Custom Training:
    • Charged per node-hour based on machine type and accelerators
    • 30-second billing increments (no minimum duration)
    • No charge for failed training jobs (except user cancellations)
    • GPU pricing varies: A100 40GB ~$3.67/hr, H100 80GB ~$11.54/hr per accelerator
    • TPU pricing: v5e ~$1.20/hr per chip
  • Predictions:
    • Online prediction: charged per node-hour for deployed endpoint compute
    • Batch prediction: charged per node-hour for processing time
    • Automatic scaling to zero available (no charge when idle)
  • Other Services:
    • AutoML Training: per node-hour (varies by data type)
    • Feature Store: per GB stored + per million online reads
    • Vector Search: per node-hour for deployed indexes + storage
    • Pipelines: per pipeline run based on compute consumed
    • Agent Builder: per query/session based on usage
  • Total monthly costs range from under $100 for prototyping to $100,000+ for enterprise production workloads.
  • Free tier: new customers receive $300 Google Cloud credits applicable to Vertex AI services.

Vertex AI vs AWS SageMaker vs Azure ML

Feature Google Cloud Vertex AI AWS SageMaker Azure ML
Platform Philosophy Unified, integrated with BigQuery and Google AI research Broadest feature set, deep AWS integration Enterprise governance, Microsoft ecosystem integration
Foundation Models 200+ models (Gemini, Claude, Llama) via Model Garden 100+ models via Amazon Bedrock (Titan, Claude, Llama, Mistral) OpenAI models (GPT-4, o1) + open models via Azure AI Foundry
AutoML AutoML for tabular, image, video, text SageMaker Autopilot (tabular focus) Automated ML for tabular, vision, NLP
Custom Training GPUs + TPUs, serverless, Cluster Director GPUs + Trainium/Inferentia, SageMaker Training GPUs, Azure ML Compute clusters
Notebooks Workbench + Colab Enterprise SageMaker Studio Notebooks Azure ML Notebooks
Pipelines Kubeflow/TFX-based, serverless SageMaker Pipelines (proprietary SDK) Azure ML Pipelines (designer + SDK)
Feature Store BigQuery-integrated, GenAI-ready SageMaker Feature Store Azure ML Feature Store (managed)
Vector Search Vertex AI Vector Search (ScaNN) OpenSearch Serverless / Bedrock Knowledge Bases Azure AI Search (vector + hybrid)
AI Agents Agent Builder (ADK, Agent Engine, Studio) Amazon Bedrock Agents Azure AI Agent Service
Grounding/RAG Google Search + Vertex AI Search grounding Bedrock Knowledge Bases + Guardrails Azure AI Search + On Your Data
MLOps Model Registry, Experiments, Monitoring, Metadata Model Registry, Experiments, Model Monitor, Clarify Model Registry, Managed Endpoints, Responsible AI
Pricing Model Pay-per-use, 30s increments, TPU cost advantage Pay-per-use, instance markups 15-40% over EC2 Pay-per-use, lower platform surcharges
Key Differentiator BigQuery integration, TPU access, Google AI research Broadest service catalog, largest ecosystem OpenAI partnership, Power BI/Office integration
  • Choose Vertex AI when: using BigQuery for data, need TPU training, prefer Google’s Gemini models, want tight integration with Google Cloud services.
  • Choose SageMaker when: already on AWS, need the broadest feature set, require deep integration with AWS services, want Trainium/Inferentia for cost-effective inference.
  • Choose Azure ML when: using Microsoft ecosystem (Office, Teams, Power BI), need OpenAI GPT-4/o1 models, have enterprise governance requirements via Azure AD.

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your data science team needs to quickly build a classification model on structured data without writing custom code. They want Google Cloud to handle feature engineering and architecture selection. Which Vertex AI service should they use?
    1. Vertex AI Custom Training with prebuilt containers
    2. Vertex AI AutoML Tabular
    3. Vertex AI Workbench with scikit-learn
    4. Vertex AI Pipelines with Kubeflow
  2. A company wants to deploy a Gemini model that provides answers grounded in their internal product documentation stored in Cloud Storage. Which Vertex AI feature should they use?
    1. Grounding with Google Search
    2. Grounding with Vertex AI Search
    3. Model distillation
    4. Vertex AI Feature Store
  3. Your organization needs to orchestrate an ML workflow that includes data preprocessing, model training, evaluation, and conditional deployment. The solution should be serverless and track artifacts automatically. What should you use?
    1. Cloud Composer (Apache Airflow)
    2. Cloud Run Jobs
    3. Vertex AI Pipelines with Kubeflow Pipelines SDK
    4. Cloud Scheduler with Cloud Functions
  4. A team needs to serve features with single-digit millisecond latency for real-time fraud detection. They already have feature data in BigQuery. Which service should they use?
    1. BigQuery real-time queries
    2. Memorystore for Redis
    3. Vertex AI Feature Store with optimized online serving
    4. Vertex AI Vector Search
  5. Your company wants to build a semantic search application that finds similar products from a catalog of 10 million items using embedding vectors. Which Vertex AI service is best suited?
    1. Vertex AI Feature Store
    2. BigQuery vector functions
    3. Vertex AI Search
    4. Vertex AI Vector Search
  6. An ML engineer wants to fine-tune a Gemini model to follow their company’s specific writing style using a small dataset of 500 examples. Which tuning approach provides the most parameter-efficient method?
    1. Full supervised fine-tuning
    2. RLHF
    3. LoRA (Low-Rank Adaptation)
    4. Model distillation
  7. A team has deployed a model on a Vertex AI endpoint and notices prediction quality degrading over time. Which Vertex AI capability should they enable to detect this automatically?
    1. Vertex AI Experiments
    2. Vertex AI Pipelines
    3. Vertex AI Metadata
    4. Vertex AI Model Monitoring (training-serving skew detection)
  8. Your team needs to deploy multiple versions of a model and gradually shift traffic from the old version to the new version. Which Vertex AI deployment feature enables this?
    1. Batch prediction
    2. Private endpoints
    3. Traffic splitting on online prediction endpoints
    4. Model Registry versioning
  9. A startup wants to use Claude (Anthropic) models on Google Cloud with enterprise security features like VPC Service Controls and data residency. Where should they access Claude?
    1. Directly from Anthropic’s API
    2. Vertex AI Model Garden (managed API)
    3. Deploy Claude on GKE
    4. Cloud Run with Claude container
  10. Your organization is training a large language model and needs to use TPU v5e pods for distributed training on Vertex AI. Which training method should they use?
    1. AutoML
    2. Custom Training with TPU VM configuration
    3. Vertex AI Studio fine-tuning
    4. BigQuery ML
  11. A company wants to build a customer service chatbot that combines deterministic conversation flows for order tracking with generative AI for open-ended questions. Which service should they use?
    1. Vertex AI Studio
    2. Vertex AI Search
    3. Conversational Agents (formerly Dialogflow CX) with Generative Playbooks
    4. Cloud Functions with Gemini API
  12. Which of the following is NOT a component of Vertex AI Agent Builder? (Select one)
    1. Agent Development Kit (ADK)
    2. Agent Engine
    3. Agent Studio
    4. Agent Trainer

References

Google Cloud AI Services Cheat Sheet

Google Cloud AI Services Cheat Sheet

  • Google Cloud provides a comprehensive suite of AI and Machine Learning services spanning the full ML lifecycle — from data preparation and model training to deployment, inference, and responsible AI governance.
  • In April 2026, Google rebranded Vertex AI as the Gemini Enterprise Agent Platform at Cloud Next ’26, consolidating all AI/ML services under an agent-first architecture.
  • Google Cloud AI services are broadly categorized into: AI Platform (Vertex AI / Gemini Enterprise Agent Platform), Foundation Models (Gemini), Pre-trained APIs, Conversational AI, AI Infrastructure (TPUs, GPUs), and Responsible AI tools.

Vertex AI / Gemini Enterprise Agent Platform

  • Vertex AI (now Gemini Enterprise Agent Platform since April 2026) is Google Cloud’s unified, fully managed ML platform for building, training, deploying, and scaling ML models and generative AI applications.
  • Provides a single environment combining AutoML and custom training with no-code, low-code, and code-first approaches.
  • Key components include:
    • Vertex AI Workbench — managed Jupyter notebook environment for data exploration and ML development.
    • Vertex AI Training — custom model training with distributed training support on GPUs and TPUs.
    • Vertex AI Predictions — online and batch prediction endpoints with autoscaling.
    • Vertex AI Pipelines — serverless ML workflow orchestration based on Kubeflow Pipelines or TFX.
    • Vertex AI Model Registry — central repository to manage, version, and deploy models.
    • Vertex AI Feature Store — managed feature storage for serving and sharing ML features at scale.
    • Vertex AI Model Garden — catalog of 200+ foundation models including Gemini, Claude, Llama, and open-source models.
    • Vertex AI Studio — UI for prompt engineering, model tuning, and testing generative AI models.
    • Vertex AI Experiments — track, compare, and analyze ML experiments.
    • Vertex AI Model Monitoring — detect data drift and model quality degradation in production.
  • Supports custom containers (Docker) for training and serving with any ML framework (TensorFlow, PyTorch, JAX, XGBoost, scikit-learn).
  • Provides pre-built containers for popular frameworks optimized for Google Cloud hardware.
  • Integrates with BigQuery, Cloud Storage, Dataflow, and other Google Cloud data services.
  • As of May 2026, Vertex AI has been fully migrated to Gemini Enterprise Agent Platform in the Google Cloud Console. All future updates are delivered through the Agent Platform.

Gemini (Foundation Model)

  • Gemini is Google’s family of multimodal foundation models from Google DeepMind, capable of understanding and generating text, images, audio, video, and code.
  • Gemini model family includes:
    • Gemini 3 Pro — most capable model for complex reasoning, coding, and multimodal tasks.
    • Gemini 3 Flash — optimized for speed and efficiency with near-Pro intelligence at lower cost.
    • Gemini 3.5 Flash — latest model with Pro-level coding proficiency and parallel agentic execution at Flash-tier pricing.
    • Gemini Nano — on-device model for mobile and edge deployments.
  • Supports multimodal inputs — can process text, images, audio, video, and code in a single prompt.
  • Offers a 1M+ token context window for processing large documents, codebases, and long videos.
  • Supports function calling, grounding with Google Search, and tool use for agentic applications.
  • Available through Vertex AI Studio, Vertex AI API, and Google AI Studio.
  • Supports fine-tuning and distillation to customize models for specific use cases.
  • Provides built-in safety filters with configurable thresholds for responsible deployment.
  • Gemini for Google Cloud (formerly Duet AI) provides AI-powered assistance across Google Cloud Console, Cloud Code, BigQuery, and other services.

Vertex AI Agent Builder

  • Vertex AI Agent Builder is Google Cloud’s comprehensive platform to build, scale, and govern reliable AI agents.
  • Key components include:
    • Agent Development Kit (ADK) — open-source, code-first framework for building multi-agent systems.
    • Agent Studio — low-code visual builder for designing agent workflows.
    • Agent Engine — managed runtime for deploying and scaling agents in production.
    • Agent Garden — collection of ready-to-use agent samples and tools.
  • Supports multi-agent orchestration where multiple agents collaborate on complex workflows.
  • Provider-agnostic — supports Gemini, Claude, Llama, and hundreds of third-party models from Model Garden.
  • Includes persistent memory, session management, and enterprise governance features.
  • Integrates with Google Workspace, third-party APIs, and enterprise data sources.
  • Supports the Agent-to-Agent (A2A) protocol for inter-agent communication across platforms.

Vertex AI Search

  • Vertex AI Search (part of AI Applications) brings together deep information retrieval, NLP, and LLM processing to understand user intent and return highly relevant results.
  • Goes beyond basic keyword matching using AI to deliver relevant results grounded in enterprise data.
  • Supports multiple data sources — websites, unstructured documents, structured data, and Cloud Storage.
  • Provides generative AI answers grounded in enterprise data with citations.
  • Includes Vertex AI Search for Commerce (formerly Recommendations AI) for e-commerce with:
    • AI-driven product rankings and catalog enhancements.
    • Conversational Commerce agent for guiding users from intent to purchase.
    • Personalized search results and recommendations optimized for revenue.
  • Supports RAG (Retrieval Augmented Generation) patterns for grounding LLM responses in enterprise data.
  • Provides out-of-the-box search widgets and APIs for quick integration.

Document AI

  • Document AI is a fully managed platform for document understanding that uses ML and generative AI to extract, classify, and enrich data from documents.
  • Supports structured, semi-structured, and unstructured documents (invoices, receipts, contracts, forms, IDs).
  • Key capabilities:
    • Document OCR — extract printed and handwritten text from documents and images.
    • Form Parser — extract key-value pairs, tables, and checkboxes from forms.
    • Specialized Processors — pre-trained models for invoices, receipts, bank statements, pay slips, W-2s, and procurement documents.
    • Custom Document Extractor — train custom models for domain-specific documents.
    • Document Splitter — classify and split multi-page documents.
    • Document AI Warehouse — search, store, and govern documents at scale with AI-powered classification.
  • Integrates with BigQuery, Cloud Storage, and Vertex AI Pipelines for end-to-end document processing workflows.
  • Supports human-in-the-loop review for critical document processing.
  • Processes documents asynchronously in batch or synchronously in real-time.

Vision AI

  • Vision AI provides pre-trained models for image analysis and computer vision tasks via the Cloud Vision API.
  • Key features:
    • Label Detection — identify objects, locations, activities, animal species, and products in images.
    • OCR (Text Detection) — extract printed and handwritten text from images.
    • Face Detection — detect faces along with associated attributes (joy, sorrow, anger, surprise).
    • Landmark Detection — identify popular natural and man-made landmarks.
    • Logo Detection — detect popular product and brand logos.
    • SafeSearch Detection — detect explicit content (adult, violence, medical, racy).
    • Image Properties — detect dominant colors and crop hints.
    • Object Localization — detect and locate multiple objects in an image with bounding polygons.
  • Supports batch image annotation for processing large volumes of images.
  • Provides a Product Search feature to find similar products in a product catalog.
  • Imagen on Vertex AI — Google’s text-to-image generation model for creating and editing images from text prompts.
  • Veo on Vertex AI — video generation model for creating videos from text and image prompts.
  • Vision AI pre-trained API provides basic capabilities; for custom image classification or object detection, use AutoML on Vertex AI.

Cloud Speech-to-Text

  • Speech-to-Text converts audio to text using Google’s deep learning neural network algorithms.
  • Supports 125+ languages and variants with automatic language detection.
  • Key features:
    • Real-time Streaming — transcribe audio from a microphone or streaming source in real-time.
    • Batch Recognition — transcribe pre-recorded audio files up to 480 minutes.
    • Multi-channel Recognition — transcribe separate channels (e.g., caller and agent in a call center).
    • Speaker Diarization — identify who said what in multi-speaker audio.
    • Automatic Punctuation — automatically add punctuation to transcripts.
    • Word-level Confidence — confidence scores for individual words.
    • Speech Adaptation — boost recognition of domain-specific terms and phrases.
    • Chirp — universal speech model with state-of-the-art accuracy across languages.
  • Provides V2 API with improved accuracy using latest foundation models.
  • Supports noise robustness for transcribing audio in noisy environments.

Cloud Text-to-Speech

  • Text-to-Speech converts text into natural-sounding speech using Google’s AI.
  • Offers 700+ voices across 50+ languages and variants, including Neural2, Studio, and WaveNet voices.
  • Key features:
    • WaveNet Voices — high-fidelity voices generated by DeepMind’s WaveNet model.
    • Neural2 Voices — next-generation voices combining Tensor2Tensor with WaveNet for improved quality.
    • Studio Voices — premium, human-like voices for professional applications.
    • Custom Voice — create a unique voice using your own recordings.
    • SSML Support — control pronunciation, speaking rate, pitch, and volume with Speech Synthesis Markup Language.
    • Multi-speaker — generate audio with multiple distinct speakers in a single request.
  • Supports audio output in MP3, OGG Opus, LINEAR16, and MULAW formats.
  • Integrates with Dialogflow for voice-enabled conversational agents.

Cloud Natural Language AI

  • Natural Language AI uses ML to extract insights from unstructured text.
  • Key capabilities:
    • Sentiment Analysis — understand the overall sentiment (positive/negative) of text at document and sentence level.
    • Entity Analysis — identify entities (people, organizations, locations, events, products) and their types.
    • Entity Sentiment Analysis — combine entity and sentiment analysis to understand sentiment about specific entities.
    • Syntax Analysis — extract tokens and sentences, identify parts of speech, and create dependency parse trees.
    • Content Classification — classify documents into 1,000+ predefined categories.
    • Text Moderation — classify text into safety categories (toxic, insult, profanity, etc.).
  • Supports multiple languages for all analysis features.
  • Provides the Healthcare Natural Language API for extracting medical entities from clinical text.
  • For custom text classification or entity extraction, use AutoML Natural Language on Vertex AI.

Cloud Translation AI

  • Cloud Translation provides real-time language translation using neural machine translation (NMT).
  • Two editions available:
    • Translation API Basic (v2) — simple, quick translations for 100+ languages.
    • Translation API Advanced (v3) — enterprise features including glossaries, custom models, and batch translation.
  • Key features:
    • AutoML Translation — train custom translation models with domain-specific terminology.
    • Adaptive Translation — real-time customization using few-shot examples without training a full model.
    • Glossaries — ensure consistent translation of domain-specific terms (brand names, product names).
    • Batch Translation — translate large volumes of documents asynchronously.
    • Language Detection — automatically detect the source language.
    • Document Translation — translate documents while preserving formatting (PDF, DOCX).
  • Supports 130+ languages for text translation.
  • Integrates with Cloud Storage for batch processing and BigQuery for analytics.

Video Intelligence AI

  • Video Intelligence API enables understanding of video content by analyzing stored and streaming video.
  • Key features:
    • Label Detection — recognize 20,000+ objects, places, and actions in video at shot, frame, or segment level.
    • Shot Change Detection — detect scene transitions in video.
    • Explicit Content Detection — identify inappropriate content in video.
    • Speech Transcription — transcribe speech within video content.
    • Text Detection (OCR) — detect and extract text appearing in video frames.
    • Object Tracking — track objects across video frames with bounding boxes.
    • Person Detection — detect people and track their poses in video.
    • Face Detection — detect faces in video (without identification).
    • Logo Detection — detect and track brand logos in video.
  • Supports both stored video (Cloud Storage, URIs) and streaming video analysis.
  • Provides rich metadata at video, shot, and frame levels for building searchable video archives.
  • Integrates with Cloud Storage, BigQuery, and Pub/Sub for automated video processing pipelines.

Contact Center AI (CCAI)

  • Contact Center AI Platform is a full-stack contact center solution for managing customer interactions across voice and digital channels.
  • Key components:
    • CCAI Platform — full CCaaS (Contact Center as a Service) with routing, queuing, and workforce management.
    • Dialogflow CX Virtual Agents — AI-powered virtual agents that handle customer interactions before routing to human agents.
    • Agent Assist — provides real-time suggestions, knowledge articles, and smart replies to human agents during conversations.
    • CCAI Insights — analyzes call transcripts to identify call drivers, sentiment, and conversation topics at scale.
    • Conversational Agents — new name for Dialogflow CX in the CCAI context (renamed 2025).
  • Supports IVA-only deployments to add Google’s generative AI virtual agents without replacing existing contact center infrastructure.
  • Integrates with third-party CRM and telephony systems (Genesys, Avaya, NICE, Cisco).
  • Provides sentiment analysis, entity extraction, and intent detection for every conversation.
  • Supports both voice and digital channels (chat, email, SMS, social media).

Dialogflow (CX and ES)

  • Dialogflow is a natural language understanding platform for building conversational interfaces (chatbots, voice bots, IVR systems).
  • Two editions available:
    • Dialogflow CX (Conversational Agents) — enterprise-grade edition for complex, multi-turn conversations.
      • Uses visual flow builder for designing conversation paths.
      • Supports state-based conversation management with pages, flows, and transition routes.
      • Provides built-in generative AI capabilities using Gemini for dynamic responses.
      • Supports data store agents for grounding responses in enterprise data.
      • Multi-language support with separate flows per language.
      • Advanced analytics and debugging tools.
    • Dialogflow ES (Essentials) — standard edition for simpler, single-turn or basic multi-turn conversations.
      • Intent-based conversation model with contexts for state management.
      • Suitable for small to medium chatbots and simple IVR systems.
      • Simpler setup but less control over complex conversation flows.
  • Dialogflow CX is recommended for new projects. ES is maintained but CX provides superior capabilities for enterprise use cases.
  • Integrates with telephony partners, Google Chat, Slack, Facebook Messenger, Twilio, and custom channels.
  • Supports webhook fulfillment for dynamic responses and backend integration.

Recommendations AI

  • Recommendations AI (now part of Vertex AI Search for Commerce) delivers personalized product recommendations at scale using Google’s ML expertise.
  • Key recommendation types:
    • Recommended for You — personalized suggestions based on user browsing and purchase history.
    • Others You May Like — similar product recommendations based on collective user behavior.
    • Frequently Bought Together — complementary product suggestions for cross-selling.
    • Similar Items — visually or categorically similar products.
    • Recently Viewed — personalized recall of previously viewed items.
  • Supports real-time user events for immediate personalization updates.
  • Provides A/B testing capabilities to measure recommendation quality impact on revenue.
  • Requires catalog data (products) and user events (views, add-to-cart, purchases) for model training.
  • Models improve automatically as more user interaction data is collected.
  • Recommendations AI has been consolidated into Vertex AI Search for Commerce / AI Commerce Search as of 2025.

Gemini for Google Cloud (formerly Duet AI)

  • Gemini for Google Cloud is an AI-powered collaborator embedded across Google Cloud services to boost developer and operator productivity.
  • Previously known as Duet AI for Google Cloud (rebranded to Gemini in February 2024).
  • Key capabilities across services:
    • Gemini Code Assist — AI-powered code completion, generation, and explanation in Cloud Shell Editor, VS Code, JetBrains IDEs, and Cloud Workstations.
    • Gemini in BigQuery — generate SQL queries, explain results, suggest optimizations using natural language.
    • Gemini in Cloud Console — natural language assistance for cloud operations, troubleshooting, and configuration.
    • Gemini in Looker — generate visualizations and formulas from natural language.
    • Gemini Cloud Assist — AI-driven recommendations for design, operations, and troubleshooting.
    • Gemini in Security — summarize security findings, explain threats, and suggest remediation in Security Command Center.
    • Gemini in Databases — generate schemas, optimize queries, and explain database operations (Cloud SQL, Spanner, AlloyDB).
  • Gemini Code Assist supports 20+ programming languages with full codebase context awareness.
  • Available in two tiers: Gemini Code Assist Standard and Gemini Code Assist Enterprise with codebase customization.

AutoML

  • AutoML enables training custom, high-quality ML models with minimal ML expertise using transfer learning and neural architecture search.
  • AutoML is now integrated into Vertex AI and supports:
    • AutoML Image Classification — classify images into custom categories.
    • AutoML Object Detection — detect and locate custom objects in images.
    • AutoML Text Classification — classify text documents into custom categories.
    • AutoML Entity Extraction — extract custom entities from text.
    • AutoML Sentiment Analysis — analyze sentiment with custom models.
    • AutoML Translation — train custom neural machine translation models.
    • AutoML Video Classification — classify video segments.
    • AutoML Video Object Tracking — track custom objects in video.
    • AutoML Tabular — train models on structured/tabular data for classification, regression, and forecasting.
  • Uses Google’s state-of-the-art transfer learning and neural architecture search technology.
  • Requires labeled training data — supports human labeling through Vertex AI Data Labeling service.
  • Provides model evaluation metrics (precision, recall, F1, confusion matrix) before deployment.
  • Trained models can be exported for edge deployment (TensorFlow Lite, TF.js, Core ML) or served via Vertex AI Endpoints.
  • Standalone AutoML products (automl.googleapis.com) have been migrated to Vertex AI. Use Vertex AI for all new AutoML workloads.

Cloud TPUs (Tensor Processing Units)

  • Cloud TPUs are Google’s custom-designed AI accelerators (ASICs) optimized for training and inference of large ML models using TensorFlow, PyTorch, and JAX.
  • TPU generations available on Google Cloud:
    • TPU v5e — cost-efficient accelerator optimized for training and serving transformer models, text-to-image, and CNNs. 256 chips per Pod.
    • TPU v6e (Trillium) — 6th generation with 4.7x peak compute improvement over v5e, doubled HBM capacity/bandwidth, and doubled ICI bandwidth. 256 chips per Pod.
    • TPU v5p — high-performance variant optimized for large-scale training workloads.
    • TPU7x (Ironwood) — 7th generation, Google’s most powerful TPU:
      • 4.6 petaFLOPS of peak FP8 compute per chip.
      • 192 GiB HBM3e memory per chip with 7.4 TB/s bandwidth.
      • 10x peak performance improvement over v5p.
      • 4x better performance per chip vs. v6e for training and inference.
      • 9,216-chip superpods delivering 42.5 exaFLOPS of FP8 compute.
      • 1.77 PB of directly accessible HBM capacity per superpod.
      • Each chip contains two TensorCores and four SparseCores.
  • TPUs are connected via high-speed Inter-Chip Interconnect (ICI) for efficient distributed training.
  • Support multislice training to scale beyond a single TPU Pod for training frontier models.
  • Available in Google Kubernetes Engine (GKE), Vertex AI, and Cloud TPU VMs.
  • Optimized for ML frameworks: JAX (best performance), TensorFlow, and PyTorch/XLA.
  • Support Queued Resources for managing TPU allocation in high-demand scenarios.
  • Only TPU v5e, v6e, and TPU7x are supported for Vertex AI model deployment. Earlier generations are deprecated for new workloads.

AI Infrastructure (GPUs and VMs)

  • Google Cloud provides GPU-accelerated VMs optimized for AI/ML workloads as part of AI Hypercomputer — a unified architecture integrating hardware, software, and flexible consumption models.
  • Key GPU VM families:
    • A3 Mega VMs — powered by 8x NVIDIA H100 80GB GPUs with 3.2 Tbps GPU-to-GPU networking. Optimized for large-scale training.
    • A3 Ultra VMs — powered by 8x NVIDIA H200 141GB GPUs (GA since late 2024). Superior memory bandwidth for large model training and inference.
    • A2 Ultra VMs — powered by NVIDIA A100 80GB GPUs.
    • G2 VMs — powered by NVIDIA L4 GPUs, optimized for inference and smaller training workloads.
  • Hypercompute Cluster — highly scalable clustering system for multi-node GPU workloads (GA 2024).
  • Key features:
    • Dynamic Workload Scheduler — efficiently schedule and manage GPU/TPU workloads.
    • Multislice/Multihost Training — scale training across multiple VMs/TPU slices.
    • NVIDIA NVLink and NVSwitch — high-bandwidth GPU-to-GPU interconnect within nodes.
    • GPUDirect-TCPXO — optimized networking stack for distributed GPU training.
  • Supports JetStream and vLLM for optimized LLM serving on both TPUs and GPUs.
  • Available with committed use discounts (CUDs) and on-demand pricing.
  • Integrates with GKE for container-orchestrated AI workloads and Vertex AI for managed training/serving.

Responsible AI

  • Google Cloud provides tools and frameworks for developing AI responsibly, aligned with Google’s AI Principles.
  • Key Responsible AI capabilities:
    • Vertex Explainable AI (XAI) — understand model predictions through:
      • Feature-based Explanations — feature attributions showing how each input feature contributed to a prediction (Shapley values, Integrated Gradients, XRAI).
      • Example-based Explanations — identify training examples most similar to the input being explained.
    • Model Cards — structured documentation describing model performance, intended use, limitations, and ethical considerations. Supports generating Model Cards automatically via Vertex AI Pipelines.
    • Fairness Indicators — evaluate model performance across different demographic groups to identify potential bias.
    • Data Cards — document dataset characteristics, collection methodology, and known biases.
    • Safety Filters — configurable content filtering for generative AI models across categories (hate speech, harassment, sexually explicit, dangerous content).
    • Guardrails — set boundaries on model behavior with system instructions and safety settings.
    • Model Evaluation — evaluate generative models on safety, quality, and groundedness metrics.
  • Safety attribute scoring available in all Vertex AI generative AI APIs with configurable confidence thresholds.
  • Vertex AI provides built-in content filtering that can be tuned per use case.
  • Supports Responsible AI practices throughout the ML lifecycle: data collection, training, evaluation, deployment, and monitoring.
  • Google publishes annual Responsible AI Progress Reports detailing governance, safety testing, and red-teaming practices.

Google Cloud vs AWS AI Services Comparison

Category Google Cloud Service AWS Equivalent
ML Platform Vertex AI / Gemini Enterprise Agent Platform Amazon SageMaker
Foundation Model Gemini (3 Pro, 3 Flash, 3.5 Flash, Nano) Amazon Nova, Claude (via Bedrock)
Model Hub / API Vertex AI Model Garden Amazon Bedrock
AI Agent Builder Vertex AI Agent Builder (ADK, Agent Engine) Amazon Bedrock Agents
Enterprise Search Vertex AI Search Amazon Kendra / Amazon Q Business
AI Code Assistant Gemini Code Assist Amazon Q Developer (formerly CodeWhisperer)
Document Processing Document AI Amazon Textract
Image Analysis Cloud Vision AI Amazon Rekognition (Images)
Image Generation Imagen on Vertex AI Amazon Titan Image Generator / Amazon Nova Canvas
Video Generation Veo on Vertex AI Amazon Nova Reel
Video Analysis Video Intelligence AI Amazon Rekognition Video
Speech-to-Text Cloud Speech-to-Text Amazon Transcribe
Text-to-Speech Cloud Text-to-Speech Amazon Polly
NLP / Text Analysis Cloud Natural Language AI Amazon Comprehend
Translation Cloud Translation AI Amazon Translate
Conversational AI Dialogflow CX (Conversational Agents) Amazon Lex
Contact Center AI CCAI Platform Amazon Connect
Recommendations Vertex AI Search for Commerce / Recommendations AI Amazon Personalize
AutoML Vertex AI AutoML Amazon SageMaker Autopilot
Custom AI Chips Cloud TPUs (v5e, v6e Trillium, TPU7x Ironwood) AWS Trainium / Inferentia
GPU VMs (Training) A3 Ultra (H200), A3 Mega (H100) P5 (H100), P5e (H200) instances
GPU VMs (Inference) G2 (L4 GPUs) G5 (A10G), Inf2 (Inferentia2)
AI Infrastructure Platform AI Hypercomputer AWS AI Infrastructure (UltraClusters)
Explainability Vertex Explainable AI SageMaker Clarify
Model Documentation Model Cards SageMaker Model Cards
Bias Detection Fairness Indicators SageMaker Clarify (Bias Detection)
Forecasting Vertex AI Forecasting (AutoML Tabular) Amazon Forecast
Data Labeling Vertex AI Data Labeling SageMaker Ground Truth
Feature Store Vertex AI Feature Store SageMaker Feature Store
ML Pipelines Vertex AI Pipelines SageMaker Pipelines
Notebook Environment Vertex AI Workbench SageMaker Studio

Google Gemini API & AI Studio – Developer Guide

Google Gemini API & AI Studio – Developer Guide

📌 Last Updated: June 2026. This post covers the Gemini model family (2.5 Pro, 2.5 Flash, Flash-Lite, Nano), Google AI Studio vs Vertex AI Studio, Gemini API capabilities, pricing tiers, rate limits, safety settings, Gemini Code Assist, and comparison with OpenAI GPT and Anthropic Claude.

  • Google Gemini is Google’s family of multimodal AI models that can process text, images, video, audio, and code.
  • Gemini models are available through the Gemini Developer API (via Google AI Studio) and through Google Cloud’s Vertex AI (now Gemini Enterprise Agent Platform).
  • Gemini 2.5 Pro and 2.5 Flash became generally available (GA) in June 2025, providing production-ready stability and scalability.
  • The model family spans from Gemini 2.5 Pro (most capable) to Gemini Nano (on-device), covering cloud API, enterprise, and edge use cases.
  • Gemini supports a 1M+ token context window, the largest among frontier models, enabling processing of entire codebases, long documents, and hours of video in a single prompt.

Gemini Model Family

  • Gemini 2.5 Pro – Google’s most capable model for complex reasoning, coding, and multimodal tasks.
    • 1M token context window (input), up to 66K output tokens
    • Excels at coding, mathematical reasoning, scientific analysis, and multi-step problem solving
    • Supports “thinking” mode with configurable thinking budgets for chain-of-thought reasoning
    • Natively multimodal – processes text, images, audio, video, and PDFs
    • Supports function calling, structured output (JSON mode), grounding with Google Search, and code execution
    • GA since June 2025; model ID: gemini-2.5-pro
  • Gemini 2.5 Flash – Hybrid reasoning model optimized for speed and cost-efficiency.
    • 1M token context window with thinking capabilities (first Flash model with thinking)
    • Configurable thinking budgets – control reasoning depth vs latency tradeoff
    • Excellent for production workloads needing fast responses at lower cost
    • Supports all Pro capabilities: multimodal input, function calling, grounding, JSON mode
    • ~4x cheaper than Pro for input tokens, ~4x cheaper for output tokens
    • GA since June 2025; model ID: gemini-2.5-flash
  • Gemini 2.5 Flash-Lite – Most cost-efficient cloud model for high-volume tasks.
    • 1M token context window
    • Optimized for high-throughput, cost-sensitive workloads: classification, translation, simple data processing
    • Pricing starts at $0.10/1M input tokens and $0.40/1M output tokens
    • Supports grounding with Google Search and Google Maps
    • Model ID: gemini-2.5-flash-lite
  • Gemini Nano – On-device model for Android and Chrome.
    • Runs natively on device hardware (NPU/GPU) without cloud connectivity
    • Available on Pixel 8 Pro, Pixel 9/10 series, Samsung Galaxy S24+ and later
    • Supports summarization, smart reply, proofreading, rewriting, and image description
    • Available in Chrome via the Prompt API (downloaded automatically with browser updates)
    • Privacy-preserving – all processing stays on device
    • Accessible via ML Kit GenAI APIs on Android and AICore
    • Supports hybrid inference – dynamically switches between on-device Nano and cloud-hosted Gemini models

Google AI Studio vs Vertex AI Studio

  • Google AI Studio (aistudio.google.com) – Free, web-based IDE for prototyping with Gemini.
    • Quick experimentation with prompts, no Google Cloud account required
    • Get an API key instantly for development
    • Supports prompt testing, side-by-side model comparison, and code export
    • Build mode for vibe-coding full-stack applications directly in the browser
    • One-click deployment to Google Cloud Run (up to 2 apps free via Starter Tier)
    • Free tier available with generous rate limits
    • Content on free tier may be used to improve Google products
    • Best for: rapid prototyping, learning, hackathons, individual developers
  • Vertex AI Studio / Gemini Enterprise Agent Platform (Google Cloud Console) – Enterprise-grade AI platform.
    • Requires Google Cloud project with billing enabled
    • Full IAM, VPC, audit logging, and enterprise security controls
    • Model fine-tuning (SFT), RAG Engine, model evaluation, and ML pipelines
    • Access to Model Garden with 200+ models (not just Gemini)
    • Provisioned throughput for guaranteed capacity
    • Data residency and compliance (HIPAA, SOC 2, FedRAMP)
    • Agent Builder for no-code conversational agent development
    • Content is never used to improve Google products
    • Best for: production enterprise deployments, regulated industries, multi-model workflows

💡 Certification Tip: If a question describes quick prototyping with no GCP account, the answer is Google AI Studio. If it mentions IAM, VPC, fine-tuning, RAG, or ML pipelines, the answer is Vertex AI / Gemini Enterprise Agent Platform.

Gemini API Capabilities

Multimodal Input & Output

  • Text – Natural language understanding, generation, summarization, translation, and Q&A
  • Images – Image understanding (describe, analyze, OCR) and native image generation (Gemini 2.5 Flash Image)
  • Video – Process and understand video content up to hours in length; video generation via Veo models
  • Audio – Audio understanding, speech-to-text, native audio output, and text-to-speech (TTS)
  • Code – Code generation, debugging, explanation, refactoring across 20+ programming languages
  • Documents/PDFs – Process entire PDF documents natively with layout understanding

Function Calling

  • Connect Gemini to external tools, APIs, and databases
  • Model determines when to call a function and provides structured parameters
  • Supports parallel function calling (multiple functions in one turn)
  • Automatic function calling mode available in SDKs
  • Gemini 3+ models generate unique IDs for each function call for tracing
  • Works with both Google AI Studio and Vertex AI endpoints

Grounding with Google Search

  • Connects Gemini to real-time, publicly available web content
  • Provides accurate, up-to-date answers with cited verifiable sources beyond model’s training cutoff
  • Returns grounding metadata with source URLs and support chunks
  • Supports dynamic retrieval – only charges when grounding actually contributes to response
  • Works with all available languages
  • Rate limits: Free tier gets 500 RPD (requests per day); Paid tier gets 1,500 RPD free then $35/1,000 grounded prompts
  • Limit of 1M queries per day (contact support for higher)
  • Respects robots.txt Google-Extended directives from web publishers

Structured Output (JSON Mode)

  • Force Gemini to output valid JSON conforming to a provided schema
  • Specify response schema using JSON Schema format
  • Guarantees parseable output for programmatic consumption
  • Supports enums, nested objects, arrays, and optional fields
  • Set response_mime_type: "application/json" in generation config

System Instructions

  • Set persistent behavioral guidelines that apply across all turns in a conversation
  • Define persona, tone, output format, safety constraints, and domain expertise
  • System instructions are separate from user messages and persist throughout the session
  • Supports multi-part system instructions for complex configurations

Context Caching

  • Cache large input contexts (documents, code repos) and reuse across multiple requests
  • Reduces latency and cost for repeated context (up to 90% cheaper for cached tokens)
  • Minimum cache size: 32,768 tokens
  • Storage price: $1.00–$4.50 per 1M tokens per hour depending on model
  • Available on paid tier only

Additional Capabilities

  • Code Execution – Model can write and run Python code in a sandboxed environment to solve problems
  • URL Context – Fetch and process content from URLs as part of the prompt
  • Computer Use – Build browser control agents that automate tasks (Preview)
  • File Search – Upload documents and perform semantic search across them
  • Live API – Real-time, low-latency bidirectional streaming for voice/video applications
  • Batch API – Process large volumes asynchronously at 50% cost reduction
  • Thinking/Reasoning – Configurable chain-of-thought with thinking budgets and thought signatures

Context Window – 1M+ Tokens

  • Gemini 2.5 Pro and Flash support a 1,000,000 token context window
  • This is equivalent to approximately:
    • ~750,000 words (longer than the entire Lord of the Rings trilogy)
    • ~1.5 hours of video
    • ~11 hours of audio
    • ~30,000 lines of code
  • Enables processing entire codebases, lengthy legal documents, research papers, or video content in a single prompt
  • Output token limits: up to 66K for Pro, 65K for Flash
  • Pricing tiers differ based on prompt length (≤200K tokens vs >200K tokens for Pro)
  • Context caching available to reduce costs for repeated large-context queries

Gemini in Google Cloud (Vertex AI / Gemini Enterprise Agent Platform)

  • Gemini models are available through Google Cloud’s enterprise AI platform (formerly Vertex AI, now Gemini Enterprise Agent Platform as of Cloud Next 2026)
  • Provides enterprise-grade features beyond the Developer API:
    • Fine-tuning (SFT) – Supervised fine-tuning on custom datasets
    • RAG Engine – Built-in Retrieval-Augmented Generation with managed vector stores
    • Model Evaluation – Automated evaluation pipelines with custom metrics
    • Provisioned Throughput – Guaranteed capacity for latency-sensitive applications
    • VPC Service Controls – Network isolation and data exfiltration prevention
    • CMEK – Customer-managed encryption keys for data at rest
    • Agent Builder – No-code platform for building conversational agents with grounding
  • Same Gemini models as the Developer API but with enterprise SLAs and compliance certifications
  • Supports HIPAA, SOC 1/2/3, ISO 27001, FedRAMP, PCI DSS compliance
  • Data is never used to train or improve Google models
  • Pricing may differ from Developer API; check Gemini Enterprise Agent Platform pricing page

Gemini Code Assist (IDE Integration)

  • AI-powered coding assistant integrated directly into IDEs (VS Code and JetBrains IDEs)
  • Key Capabilities:
    • Inline code completions while typing
    • Code generation from natural language prompts and comments
    • Code transformation and refactoring via chat
    • Smart actions (explain code, generate tests, fix bugs)
    • Full-project context awareness with file/folder specification
    • Agent mode for multi-step autonomous coding tasks (since Oct 2025)
    • Custom commands and rules configuration
    • Source citations for generated code
  • Editions:
    • Free tier – Available for individual developers via Google AI (Individual, Pro, Ultra tiers since June 2026)
    • Standard – For teams, includes features beyond the IDE
    • Enterprise – Large-context analysis (up to 1M tokens) across indexed repositories, integration with Google Cloud services, code customization
  • Enterprise edition integrates with Google Cloud services: Cloud Build, Cloud Run, Cloud Logging
  • Supports large-context analysis across entire repositories

Pricing Tiers (Free vs Paid)

Free Tier

  • Available through Google AI Studio – no billing account required
  • Access to Gemini 2.5 Pro, 2.5 Flash, and Flash-Lite models
  • Free input and output tokens within rate limits
  • Grounding with Google Search: up to 500 RPD (Flash/Flash-Lite)
  • Content may be used to improve Google products
  • Lower rate limits (5–15 RPM depending on model)
  • No access to context caching, Batch API, or some advanced features

Paid Tier

  • Link a billing account and prepay minimum $10 to upgrade
  • Higher rate limits (150–300+ RPM at Tier 1)
  • Access to context caching, Batch API (50% cost reduction), Flex and Priority inference
  • Content NOT used to improve Google products (enterprise-grade data privacy)
  • Tiered system based on cumulative spend: Tier 1 → Tier 2 → Tier 3 (postpay option)

Key Pricing (per 1M tokens, Standard tier)

  • Gemini 2.5 Pro: $1.25 input (≤200K) / $2.50 (>200K) | $10.00 output (≤200K) / $15.00 (>200K)
  • Gemini 2.5 Flash: $0.30 input (text/image/video) | $2.50 output
  • Gemini 2.5 Flash-Lite: $0.10 input (text/image/video) | $0.40 output
  • Grounding with Google Search: 1,500 RPD free, then $35/1,000 grounded prompts (2.5 models)
  • Batch API: 50% discount on standard pricing across all models
  • Context Caching: ~10% of input price per cached token read + storage fee per hour

Enterprise Tier (Gemini Enterprise Agent Platform)

  • Custom pricing based on usage volume
  • Dedicated support channels, advanced security, compliance certifications
  • Provisioned throughput and volume-based discounts
  • Contact Google Cloud sales for pricing

Rate Limits

  • Rate limits are determined at the billing account level and vary by tier and model
  • Free Tier:
    • Gemini 2.5 Pro: ~5 RPM (requests per minute)
    • Gemini 2.5 Flash: ~15 RPM
    • Gemini 2.5 Flash-Lite: ~15 RPM
    • Up to 250,000 tokens per minute
    • Up to 1,000 requests per day
  • Paid Tier 1:
    • 150–300 RPM depending on model
    • Higher token per minute limits
    • Higher daily request limits
  • Paid Tier 2–3: Progressively higher limits based on cumulative spend and account age
  • Rate limit dimensions: RPM (requests/minute), TPM (tokens/minute), RPD (requests/day)
  • Grounding with Google Search: 500 RPD (free) / 1,500 RPD free then pay-per-use (paid)
  • Exceeding limits returns HTTP 429 (Resource Exhausted) – implement exponential backoff

Safety Settings

  • Gemini API includes configurable content safety filters across multiple harm categories
  • Harm Categories:
    • HARM_CATEGORY_HARASSMENT – Harassment and bullying content
    • HARM_CATEGORY_HATE_SPEECH – Hate speech targeting protected groups
    • HARM_CATEGORY_SEXUALLY_EXPLICIT – Sexual content
    • HARM_CATEGORY_DANGEROUS_CONTENT – Dangerous or harmful activities
    • HARM_CATEGORY_CIVIC_INTEGRITY – Election/civic misinformation
  • Blocking Thresholds:
    • BLOCK_NONE – No blocking (may still have some restrictions)
    • BLOCK_ONLY_HIGH – Block only high-probability unsafe content
    • BLOCK_MEDIUM_AND_ABOVE – Block medium and high (default)
    • BLOCK_LOW_AND_ABOVE – Most restrictive setting
  • Safety ratings are provided for each response with probability levels: HIGH, MEDIUM, LOW, NEGLIGIBLE
  • Filters are configurable (default off for paid tier) – can be adjusted per request
  • System instructions can add additional safety guardrails on top of content filters
  • Image generation has additional responsible AI filters (no violent extremism, no CSAM, no non-consensual imagery)

Comparison: Gemini vs OpenAI GPT vs Anthropic Claude

  • All three platforms (Gemini, GPT, Claude) offer near-parity on general reasoning and coding benchmarks as of 2026
  • Context Window:
    • Gemini 2.5 Pro: 1,000,000 tokens (largest)
    • Claude Opus 4: 200,000 tokens
    • GPT-4o / GPT-5: 128,000–256,000 tokens
  • Multimodal:
    • Gemini: Native multimodal (text, image, video, audio, code, PDF) – strongest video/audio understanding
    • GPT-4o: Text, image, audio (limited video)
    • Claude: Text, image (no native audio/video)
  • Coding:
    • Claude Opus dominates SWE-bench Verified (~80-88% scores) – best for complex agentic coding
    • Gemini 2.5 Pro: Strong coding with unique large-context advantage for full-repo understanding
    • GPT-5: Strong general coding with excellent structured output
  • Pricing (per 1M tokens, approximate):
    • Gemini 2.5 Pro: $1.25 / $10.00 (input/output)
    • Claude Opus 4: $5.00 / $25.00 (input/output)
    • GPT-4o: $2.50 / $10.00 (input/output)
    • Gemini 2.5 Flash: $0.30 / $2.50 – significantly cheaper than competitors’ mid-tier models
  • Unique Strengths:
    • Gemini: Largest context window, best multimodal (especially video/audio), native Google Search grounding, most cost-efficient at scale
    • Claude: Best coding assistant, strongest safety/alignment, excellent for agentic multi-step tasks
    • GPT: Strongest general reasoning and math, best ecosystem/plugin support, excellent structured output
  • Integration:
    • Gemini: Google Cloud, Android, Chrome, Google Workspace
    • GPT: Azure OpenAI, Microsoft ecosystem
    • Claude: AWS Bedrock, direct API

When to Use Which Gemini Model Size

  • Use Gemini 2.5 Pro when:
    • Complex multi-step reasoning is required
    • Processing very large codebases or documents (full-repo analysis)
    • Highest quality output is more important than cost/latency
    • Advanced coding tasks: architecture decisions, complex refactoring, multi-file changes
    • Scientific research, mathematical proofs, legal analysis
  • Use Gemini 2.5 Flash when:
    • Production applications needing balance of quality and speed
    • Real-time user-facing applications (chatbots, assistants)
    • Tasks requiring reasoning but with latency constraints
    • General-purpose coding assistance, summarization, Q&A
    • Budget-conscious applications that still need thinking capabilities
  • Use Gemini 2.5 Flash-Lite when:
    • High-volume, cost-sensitive workloads at scale
    • Simple classification, entity extraction, sentiment analysis
    • Translation and localization tasks
    • Data processing pipelines with thousands of requests
    • Tasks where speed and cost matter more than reasoning depth
  • Use Gemini Nano when:
    • On-device processing with no internet connectivity
    • Privacy-sensitive applications (data never leaves device)
    • Low-latency responses on mobile devices
    • Smart replies, text summarization, image descriptions on Android
    • Hybrid inference (Nano for simple queries, cloud for complex ones)

Google Gemini API & AI Studio – Practice Questions

  1. A developer needs to build a prototype chatbot using Gemini models with no Google Cloud account setup. Which service should they use?
    • A. Vertex AI Studio
    • B. Google AI Studio
    • C. Cloud Run
    • D. Firebase ML

    Answer: B – Google AI Studio provides free access to Gemini models for prototyping without requiring a GCP account.

  2. Which Gemini model offers the largest context window for processing entire codebases in a single prompt?
    • A. Gemini Nano
    • B. Gemini 2.5 Flash-Lite
    • C. Gemini 2.5 Pro
    • D. GPT-4o

    Answer: C – Gemini 2.5 Pro supports a 1M token context window, the largest among frontier models.

  3. A company requires enterprise security controls, HIPAA compliance, and fine-tuning capabilities for their Gemini deployment. Which platform should they choose?
    • A. Google AI Studio Free Tier
    • B. Google AI Studio Paid Tier
    • C. Gemini Enterprise Agent Platform (Vertex AI)
    • D. Gemini Code Assist

    Answer: C – Gemini Enterprise Agent Platform provides enterprise security, compliance, fine-tuning, and VPC controls.

  4. Which Gemini API feature allows the model to access real-time information beyond its training data cutoff?
    • A. Function Calling
    • B. Context Caching
    • C. Grounding with Google Search
    • D. System Instructions

    Answer: C – Grounding with Google Search connects Gemini to real-time web content and provides cited sources.

  5. A mobile app developer needs AI summarization that works offline on Android devices with no data leaving the device. Which model is appropriate?
    • A. Gemini 2.5 Flash via API
    • B. Gemini 2.5 Pro via Vertex AI
    • C. Gemini Nano via ML Kit GenAI APIs
    • D. Gemini 2.5 Flash-Lite via Batch API

    Answer: C – Gemini Nano runs on-device without cloud connectivity, providing privacy-preserving AI processing.

  6. Which configuration ensures Gemini API always returns valid JSON that conforms to a specific schema? (Select TWO)
    • A. Set response_mime_type to “application/json”
    • B. Include a JSON example in the prompt
    • C. Provide a response_schema in the generation config
    • D. Use grounding with Google Search
    • E. Enable function calling

    Answer: A, C – Setting response_mime_type to “application/json” and providing a response_schema guarantees structured JSON output.

  7. A startup wants to minimize API costs while processing 100,000 classification requests daily. Which combination offers the lowest cost?
    • A. Gemini 2.5 Pro with Standard inference
    • B. Gemini 2.5 Flash with Priority inference
    • C. Gemini 2.5 Flash-Lite with Batch API
    • D. Gemini 2.5 Flash with context caching

    Answer: C – Flash-Lite ($0.10/$0.40) with Batch API (50% discount) gives the lowest cost for high-volume simple tasks.

  8. What is the primary difference between the Free and Paid tiers of the Gemini Developer API regarding data usage?
    • A. Free tier has no rate limits; Paid tier has rate limits
    • B. Free tier content may be used to improve Google products; Paid tier content is not used
    • C. Free tier only supports text; Paid tier supports multimodal
    • D. Free tier uses older models; Paid tier uses newer models

    Answer: B – Free tier content may be used to improve Google products, while paid tier provides enterprise-grade data privacy.

  9. Which safety setting category in the Gemini API is used to filter content related to election misinformation?
    • A. HARM_CATEGORY_HARASSMENT
    • B. HARM_CATEGORY_HATE_SPEECH
    • C. HARM_CATEGORY_CIVIC_INTEGRITY
    • D. HARM_CATEGORY_DANGEROUS_CONTENT

    Answer: C – HARM_CATEGORY_CIVIC_INTEGRITY covers election and civic misinformation content.

  10. A developer is hitting rate limits (HTTP 429) on the free tier of Gemini API. What are valid options to increase throughput? (Select TWO)
    • A. Implement exponential backoff retry logic
    • B. Upgrade to paid tier by linking a billing account
    • C. Switch from Pro to Nano model
    • D. Disable safety settings
    • E. Use system instructions to request faster processing

    Answer: A, B – Exponential backoff handles transient limits; upgrading to paid tier increases RPM from 5-15 to 150-300+.

References

AWS S3 vs EFS vs FSx – Storage Service Comparison

AWS S3 vs EFS vs FSx – Storage Service Comparison

📅 Published June 2026: Covers S3 Files (NFS mount for S3), S3 Access Points for FSx, FSx Intelligent-Tiering storage class, EFS Archive storage class, EFS performance enhancements (60 GiB/s, 2.5M IOPS), and updated exam guidance for SAA-C03, SAP-C02, and DEA-C01.

Overview

AWS offers multiple storage services designed for different workloads. Amazon S3 provides object storage accessed via API, Amazon EFS delivers managed NFS file storage, and the Amazon FSx family offers four purpose-built file systems (Lustre, Windows File Server, NetApp ONTAP, and OpenZFS). Choosing the right storage service depends on access patterns, performance requirements, protocol needs, and cost considerations.

Amazon S3 – Object Storage

  • Amazon S3 is a highly durable object storage service with a simple key-value design for storing any amount of data.
  • Accessed via REST API (PUT, GET, DELETE) — not a file system by default.
  • Provides 11 9’s (99.999999999%) durability by replicating data across at least 3 AZs.
  • Offers unlimited storage — stores over 500 trillion objects across hundreds of exabytes (as of 2026).
  • Supports multiple storage classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Express One Zone, Glacier Instant Retrieval, Glacier Flexible Retrieval, and Glacier Deep Archive.
  • S3 Express One Zone provides single-digit millisecond latency with 10x faster access than S3 Standard.
  • 🆕 S3 Files (April 2026) — enables mounting S3 buckets as NFS v4.2 file systems on EC2, Lambda, EKS, and ECS with ~1ms latency and POSIX semantics.
  • 🆕 S3 Access Points for FSx — allows accessing FSx for NetApp ONTAP and FSx for OpenZFS data via the S3 API without copying data.
  • Best for: data lakes, backups, static website hosting, archive, big data analytics, content distribution.

Amazon EFS – Elastic File System

  • Amazon EFS is a fully managed, elastic NFS file system for Linux-based workloads.
  • Accessed via NFS v4.0/v4.1 protocol — mount as a standard file system on EC2, ECS, EKS, Lambda, and on-premises servers.
  • Provides Regional (Multi-AZ) durability by replicating data across multiple AZs, or One Zone for lower-cost single-AZ storage.
  • Elastic scaling — grows and shrinks automatically with no provisioning required. Pay only for storage used.
  • Supports up to 60 GiB/s read throughput, 2.5 million read IOPS, and 500,000 write IOPS (2024 enhancements).
  • Three storage classes: EFS Standard ($0.30/GB-month, sub-ms latency), EFS Infrequent Access ($0.016/GB-month), and EFS Archive ($0.008/GB-month — up to 97% lower than Standard).
  • Supports Lifecycle Management to automatically tier data between Standard, IA, and Archive based on access patterns.
  • Supports up to 10,000 access points per file system for multi-tenant and containerized applications.
  • Best for: shared home directories, CMS, development environments, containerized applications, machine learning training data.

Amazon FSx Family

FSx for Lustre

  • Fully managed high-performance parallel file system based on open-source Lustre.
  • Delivers sub-millisecond latency, up to 1,000 GB/s throughput, and millions of IOPS.
  • Uses a custom POSIX-compliant protocol optimized for performance (Linux clients only).
  • Native S3 integration via Data Repository Associations (DRA) — automatically imports/exports data between FSx and S3.
  • Deployment types: Persistent (long-term storage with data replicated within AZ) and Scratch (temporary high-burst workloads, no replication).
  • Storage options: SSD, HDD, and Intelligent-Tiering (elastic, starting at $0.004/GB-month for Archive tier).
  • Best for: HPC, machine learning training, financial modeling, genomics, video rendering, EDA.

FSx for Windows File Server

  • Fully managed Windows-native file system built on Windows Server with full SMB protocol support.
  • Accessed via SMB 2.0/2.1/3.0/3.1.1 protocol — supports Windows, Linux, and macOS clients.
  • Integrates with Microsoft Active Directory for authentication and Windows ACLs for access control.
  • Supports Multi-AZ deployment for 99.99% availability with automatic failover.
  • Up to 20 GB/s throughput and hundreds of thousands of IOPS per file system.
  • Maximum file system size: 64 TiB.
  • Supports DFS Namespaces, DFS Replication, shadow copies, user quotas, and data deduplication.
  • Best for: Windows application workloads, home directories, .NET applications, SQL Server, SharePoint, IIS.

FSx for NetApp ONTAP

  • Fully managed NetApp ONTAP file system providing enterprise-grade multi-protocol storage.
  • Supports NFS, SMB, and iSCSI protocols simultaneously — the only FSx option with block storage (iSCSI).
  • Up to 72-80 GB/s throughput and millions of IOPS with virtually unlimited storage (tens of PBs).
  • Automatic storage tiering moves cold data to lower-cost capacity pool storage.
  • Supports Multi-AZ deployment (99.99% SLA), snapshots, cloning, SnapMirror replication, and FlexCache.
  • Integrates with Active Directory for SMB access and supports NFS ACLs.
  • 🆕 S3 Access Points (2025) — access file data stored in ONTAP volumes via the S3 API for AI/analytics workloads without data movement.
  • Best for: enterprise NAS migration, multi-protocol workloads, Oracle/SAP, DevOps (cloning), hybrid cloud.

FSx for OpenZFS

  • Fully managed OpenZFS file system optimized for low-latency workloads.
  • Accessed via NFS v3/v4.0/v4.1/v4.2 — supports Linux, Windows, and macOS clients.
  • Delivers sub-0.5ms latency, up to 21 GB/s throughput (cached) / 10 GB/s (disk), and up to 2 million IOPS.
  • Maximum file system size: 512 TiB.
  • Supports instant snapshots, data cloning (space-efficient copies), and point-in-time recovery.
  • Supports Multi-AZ deployment (99.99% SLA) and Single-AZ (99.5% SLA).
  • 🆕 S3 Access Points (June 2025) — access OpenZFS file data via S3 API.
  • Best for: development/test (fast cloning), Linux workloads migrating from on-premises ZFS, databases, CI/CD pipelines.

Access Patterns Comparison

Service Access Method Protocol Client OS
S3 REST API (HTTP/HTTPS), S3 Files NFS mount HTTPS, NFS v4.2 (S3 Files) Any (API); Linux (NFS mount)
EFS Mount as file system NFS v4.0, v4.1 Linux, macOS
FSx for Lustre Mount as file system Lustre (POSIX-compliant custom) Linux only
FSx for Windows Map network drive / mount SMB 2.0/2.1/3.0/3.1.1 Windows, Linux, macOS
FSx for NetApp ONTAP Mount / map drive / iSCSI block NFS v3/v4.x, SMB, iSCSI, S3 API Windows, Linux, macOS
FSx for OpenZFS Mount as file system NFS v3/v4.0/v4.1/v4.2, S3 API Windows, Linux, macOS

Performance Tiers Comparison

Service Latency Max Throughput Max IOPS Max Storage
S3 Single-digit ms (Standard); <10ms (Express One Zone) Virtually unlimited (per-prefix: 5,500 GET/s) 3,500 PUT/s per prefix Unlimited
EFS Sub-millisecond (Standard) 60 GiB/s (read) 2.5 million (read), 500K (write) Unlimited (elastic)
FSx for Lustre Sub-millisecond 1,000 GB/s Millions Multiple PBs
FSx for Windows <1 ms 12-20 GB/s Hundreds of thousands 64 TiB
FSx for NetApp ONTAP <1 ms 72-80 GB/s Millions Virtually unlimited (tens of PBs)
FSx for OpenZFS <0.5 ms 10-21 GB/s 1-2 million 512 TiB

Pricing Model Comparison

Service Pricing Model Storage Cost (US East) Additional Charges
S3 Standard Pay per GB stored + requests + data transfer $0.023/GB-month GET: $0.0004/1K req; PUT: $0.005/1K req; data out
S3 Glacier Deep Archive Pay per GB stored + requests + retrieval $0.00099/GB-month Retrieval fees; 12-48 hour restore time
EFS Standard Pay per GB stored (elastic); optional throughput provisioning $0.30/GB-month IA reads: $0.03/GB; Provisioned throughput extra
EFS Archive Pay per GB stored + access fees $0.008/GB-month Access charges for reads/writes
FSx for Lustre (SSD) Pay per GB provisioned $0.140/GB-month (scratch); $0.145/GB-month (persistent) Metadata IOPS; backups: $0.05/GB-month
FSx for Lustre (Intelligent-Tiering) Pay per GB consumed + throughput + requests $0.023 (Frequent); $0.0125 (IA); $0.004 (Archive)/GB-month Throughput: $0.52/MBps-month; read/write request fees
FSx for Windows Pay per GB provisioned + throughput SSD: $0.13/GB-month; HDD: $0.013/GB-month Throughput capacity; backups; data dedup savings
FSx for NetApp ONTAP Pay per GB provisioned (SSD) + capacity pool + throughput SSD: $0.125/GB-month; Capacity pool: $0.0125/GB-month Throughput capacity; IOPS; backup storage
FSx for OpenZFS Pay per GB provisioned + throughput + IOPS SSD: $0.09/GB-month Throughput; additional IOPS; backup storage

Key Pricing Insights:

  • S3 is the cheapest for raw storage ($0.023/GB) but charges per API request — ideal for infrequent access or bulk data.
  • EFS is the most expensive per GB ($0.30) but offers elastic scaling with no provisioning — use Lifecycle policies to move data to IA/Archive for savings.
  • FSx for Lustre Intelligent-Tiering starts at $0.004/GB-month for archive data — lowest-cost managed Lustre option.
  • FSx for NetApp ONTAP automatic tiering to capacity pool ($0.0125/GB) provides significant savings for mixed workloads.
  • FSx for Windows HDD at $0.013/GB-month is cost-effective for large Windows file shares with infrequent access.

Durability and Availability

Service Durability Availability SLA Deployment Options
S3 Standard 99.999999999% (11 9’s) 99.99% Multi-AZ (automatic); One Zone-IA for single AZ
EFS Regional 99.999999999% (11 9’s) 99.99% Regional (Multi-AZ) or One Zone
FSx for Lustre Persistent: data replicated within AZ; Scratch: no replication 99.5% (Single-AZ) Single-AZ only (Persistent or Scratch)
FSx for Windows Data replicated within AZ (Single-AZ) or across AZs (Multi-AZ) Multi-AZ: 99.99%; Single-AZ: 99.5% Single-AZ or Multi-AZ
FSx for NetApp ONTAP Data replicated within/across AZs; SnapMirror for cross-region Multi-AZ: 99.99%; Single-AZ: 99.9% Single-AZ or Multi-AZ
FSx for OpenZFS Data replicated within/across AZs based on deployment Multi-AZ: 99.99%; Single-AZ: 99.5% Single-AZ or Multi-AZ

Encryption

Service Encryption at Rest Encryption in Transit Key Management
S3 Yes — SSE-S3 (default), SSE-KMS, SSE-C, DSSE-KMS Yes — TLS/HTTPS (enforced by default on new buckets) AWS KMS, S3-managed keys, customer-provided keys
EFS Yes — AWS KMS (enabled at creation) Yes — TLS 1.2 via EFS mount helper AWS KMS (aws/elasticfilesystem or custom CMK)
FSx for Lustre Yes — AWS KMS Yes — encryption in transit supported AWS KMS (AWS managed or customer managed)
FSx for Windows Yes — AWS KMS Yes — SMB Kerberos encryption AWS KMS (AWS managed or customer managed)
FSx for NetApp ONTAP Yes — AWS KMS Yes — SMB encryption, NFS Kerberos (krb5p) AWS KMS (AWS managed or customer managed)
FSx for OpenZFS Yes — AWS KMS Yes — encryption in transit supported AWS KMS (AWS managed or customer managed)

Access Control

Service Access Control Mechanisms
S3 IAM policies, bucket policies, S3 Access Points, VPC endpoints, Block Public Access, Object Ownership, ACLs (legacy)
EFS IAM resource policies, VPC security groups, POSIX user/group permissions, Access Points with root directory and UID/GID enforcement
FSx for Lustre VPC security groups, POSIX permissions, IAM for API operations
FSx for Windows Active Directory authentication, Windows NTFS ACLs, IAM for API operations, file access auditing
FSx for NetApp ONTAP Active Directory (SMB), NFS export policies, NTFS/UNIX ACLs, IAM for API operations, anti-virus integration, file access auditing
FSx for OpenZFS VPC security groups, NFS export policies, POSIX permissions, IAM for API operations

Use Cases – When to Use Which

Use Case Recommended Service Why
Data lake / analytics S3 Unlimited scale, lowest cost, native integration with Athena/EMR/Glue/Redshift Spectrum
Shared Linux home directories EFS NFS mount, elastic scaling, POSIX permissions, multi-AZ access
HPC / ML training FSx for Lustre Highest throughput (1,000 GB/s), millions of IOPS, S3 integration for data staging
Windows workloads / .NET apps FSx for Windows Native SMB, Active Directory, NTFS, DFS, Windows-native features
Enterprise NAS migration (multi-protocol) FSx for NetApp ONTAP NFS + SMB + iSCSI simultaneously, auto-tiering, SnapMirror, FlexCache
Dev/test with instant cloning FSx for OpenZFS Instant snapshots, space-efficient clones, lowest latency (<0.5ms)
Backup and archive S3 Glacier Cheapest storage ($0.00099/GB), 11 9’s durability, compliance retention
Container shared storage (EKS/ECS) EFS or FSx for Lustre EFS for general shared; FSx for Lustre for high-throughput ML training
Oracle/SAP databases FSx for NetApp ONTAP iSCSI block storage, snapshots, cloning for test/dev, enterprise features
Static website hosting / CDN origin S3 Native static hosting, CloudFront integration, low cost
Financial modeling / EDA FSx for Lustre Ultra-low latency, parallel I/O, compute-intensive workloads
AI/ML analytics on existing file data FSx for NetApp ONTAP / OpenZFS (with S3 Access Points) Access file data via S3 API for Bedrock, SageMaker, Athena without data movement

EFS vs EBS Multi-Attach

Both EFS and EBS Multi-Attach provide shared storage across multiple EC2 instances, but they are fundamentally different:

Feature Amazon EFS EBS Multi-Attach
Storage Type File storage (NFS) Block storage (raw disk)
Protocol NFS v4.0/v4.1 Block-level (no file system protocol — application must manage)
Max Instances Thousands (concurrent) Up to 16 Nitro-based instances
AZ Scope Multi-AZ (Regional) or One Zone Single AZ only
Volume Types N/A (elastic) io1 and io2 Provisioned IOPS only
Storage Capacity Virtually unlimited (auto-scales to petabytes) Up to 64 TiB per volume (io2 Block Express)
File System Managed (NFS — no setup needed) Requires cluster-aware file system (GFS2, OCFS2) or application-level I/O fencing
Concurrent Read/Write Full POSIX — safe concurrent reads and writes Requires I/O fencing — application must coordinate writes (NVMe Reservations on io2)
Performance Up to 60 GiB/s read, 2.5M IOPS Up to 256K IOPS, sub-ms latency (io2 Block Express)
Typical Use Cases Shared file storage, CMS, home dirs, containers Clustered databases, Oracle RAC, failover clusters requiring lowest latency

Exam Tip: Choose EFS when you need a simple shared file system across multiple instances and AZs. Choose EBS Multi-Attach only for specialized clustered applications (Oracle RAC, clustered databases) that require block-level access with application-managed I/O coordination within a single AZ.

FSx for Lustre with S3 Integration

FSx for Lustre provides native, bi-directional integration with Amazon S3 through Data Repository Associations (DRA):

  • Automatic Import: When a client accesses a file, FSx for Lustre automatically imports the file metadata (and optionally data) from the linked S3 bucket. New/changed objects in S3 are automatically reflected in the file system.
  • Automatic Export: New or modified files in the FSx file system are automatically exported back to the linked S3 bucket, keeping S3 as the durable long-term store.
  • Lazy Loading: Only metadata is loaded initially; actual file data is fetched from S3 on first access (lazy load) or can be preloaded using hsm_restore.
  • Data Tiering: With Intelligent-Tiering storage class, infrequently accessed data is automatically tiered to S3 (IA and Archive tiers) while maintaining file system visibility.
  • Multiple DRAs: A single file system can be linked to multiple S3 buckets/prefixes, enabling consolidated access to distributed datasets.
  • Cross-Account Access: FSx for Lustre can be linked to S3 buckets in different AWS accounts for shared data access.

Architecture Pattern — HPC/ML with S3 Data Lake:

  • Store raw data durably in S3 (lowest cost, 11 9’s durability).
  • Create a FSx for Lustre file system linked to the S3 bucket via DRA.
  • Compute instances mount FSx for Lustre and process data at hundreds of GB/s throughput.
  • Results are automatically exported back to S3.
  • Delete the FSx file system after processing to save costs (scratch deployments).

Exam Tip: When a question mentions HPC or ML workloads that need to process data stored in S3 with high throughput and POSIX file system access, FSx for Lustre with DRA is the answer. FSx for Lustre acts as a high-speed processing layer on top of S3.

Detailed Comparison Table – All Storage Types

Feature S3 EFS FSx for Lustre FSx for Windows FSx for NetApp ONTAP FSx for OpenZFS
Storage Type Object File (NFS) File (Lustre) File (SMB) File + Block (NFS/SMB/iSCSI) File (NFS)
Protocol HTTPS/REST, NFS v4.2 (S3 Files) NFS v4.0/v4.1 Lustre (POSIX) SMB 2.x/3.x NFS, SMB, iSCSI, S3 API NFS v3/v4.x, S3 API
Client OS Any (API); Linux (NFS) Linux, macOS Linux only Windows, Linux, macOS Windows, Linux, macOS Windows, Linux, macOS
Max Throughput Unlimited (parallel) 60 GiB/s 1,000 GB/s 12-20 GB/s 72-80 GB/s 10-21 GB/s
Latency Single-digit ms Sub-ms Sub-ms <1 ms <1 ms <0.5 ms
Max Storage Unlimited Unlimited (elastic) Multiple PBs 64 TiB Tens of PBs 512 TiB
Scaling Automatic (unlimited) Automatic (elastic) Provisioned (IT: elastic) Provisioned; can increase Provisioned + auto-tier Provisioned; can increase
Multi-AZ Yes (default) Yes (Regional) No (Single-AZ only) Yes (option) Yes (option) Yes (option)
Durability 11 9’s 11 9’s (Regional) Replicated within AZ (Persistent) Replicated within/across AZ Replicated within/across AZ Replicated within/across AZ
Availability SLA 99.99% 99.99% 99.5% 99.99% (Multi-AZ) 99.99% (Multi-AZ) 99.99% (Multi-AZ)
S3 Integration Native No (use DataSync) Native DRA (auto import/export) No S3 Access Points (2025) S3 Access Points (2025)
Active Directory No No No Yes (required) Yes (for SMB) No
Snapshots/Cloning Versioning (object-level) No native snapshots Backups only Shadow copies, backups Instant snapshots + cloning Instant snapshots + cloning
Data Deduplication No No No Yes Yes No
Data Compression No (client-side) No Yes (LZ4) Yes Yes Yes (LZ4, ZSTD)
Pricing Model GB stored + requests GB stored (elastic) GB provisioned or consumed GB provisioned + throughput GB provisioned + capacity pool + throughput GB provisioned + throughput + IOPS
Lowest Storage Cost $0.00099/GB (Deep Archive) $0.008/GB (Archive) $0.004/GB (IT Archive) $0.013/GB (HDD) $0.0125/GB (capacity pool) $0.09/GB (SSD)
On-Premises Access Yes (Internet/Direct Connect) Yes (VPN/Direct Connect) Yes (VPN/Direct Connect) Yes (VPN/Direct Connect) Yes (VPN/DX + FlexCache/Global File Cache) Yes (VPN/Direct Connect)

AWS Certification Exam Relevance

  • SAA-C03 (Solutions Architect Associate): Frequently tests storage service selection based on access patterns, performance needs, cost optimization, and durability requirements. Know when to choose S3 vs EFS vs FSx.
  • SAP-C02 (Solutions Architect Professional): Tests complex architectures combining S3 data lakes with FSx for Lustre processing, hybrid storage with ONTAP, and multi-protocol requirements.
  • DEA-C01 (Data Engineer Associate): Tests FSx for Lustre with S3 integration for ETL pipelines, data lake architectures, and high-performance analytics.
  • Key Exam Themes:
    • S3 = object storage, unlimited, cheapest, API access, data lake.
    • EFS = shared NFS file system, elastic, Linux, multi-AZ, POSIX.
    • FSx for Lustre = HPC, ML, highest throughput, S3 integration, Linux only.
    • FSx for Windows = Windows workloads, SMB, Active Directory, NTFS ACLs.
    • FSx for NetApp ONTAP = multi-protocol (NFS+SMB+iSCSI), enterprise NAS, auto-tiering.
    • FSx for OpenZFS = lowest latency, instant clones, dev/test, Linux NFS migration.
    • EBS Multi-Attach ≠ EFS — block vs file, single-AZ vs multi-AZ, requires cluster-aware FS.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A genomics research company stores 500 TB of sequencing data in Amazon S3. Their HPC cluster needs to process this data with sub-millisecond latency and hundreds of GB/s throughput using a POSIX-compliant file system. The processed results must be stored back in S3 for long-term retention. Which solution meets these requirements with MINIMUM operational overhead?
    1. Copy data from S3 to Amazon EFS, process on EC2, copy results back to S3
    2. Create an FSx for Lustre file system with a Data Repository Association linked to the S3 bucket
    3. Mount S3 using S3 Files on the HPC cluster instances
    4. Use AWS DataSync to copy data from S3 to FSx for OpenZFS

    Answer: b. FSx for Lustre provides native S3 integration via Data Repository Associations, delivering sub-ms latency and up to 1,000 GB/s throughput. Data is automatically imported from and exported back to S3 without manual copy operations.

  2. A company is migrating an on-premises Windows file server to AWS. The application requires SMB protocol access, Active Directory authentication, Windows ACLs, DFS Namespaces, and shadow copies for end-user file restore. Which AWS storage service should they use?
    1. Amazon EFS with Windows clients
    2. Amazon S3 with AWS Transfer Family
    3. Amazon FSx for Windows File Server
    4. Amazon FSx for NetApp ONTAP with SMB

    Answer: c. FSx for Windows File Server is purpose-built for Windows workloads with native SMB support, Active Directory integration, NTFS ACLs, DFS Namespaces, DFS Replication, and shadow copies. While FSx for NetApp ONTAP also supports SMB, it doesn’t provide native DFS Namespaces or Windows shadow copies.

  3. A media company needs shared file storage accessible from both Linux-based video rendering instances (using NFS) and Windows-based editing workstations (using SMB) simultaneously. They also need automatic tiering of cold data to lower-cost storage. Which solution provides this capability with a SINGLE file system?
    1. Amazon EFS with SMB gateway
    2. Amazon FSx for Windows File Server with NFS
    3. Amazon S3 with Transfer Family for NFS and SMB
    4. Amazon FSx for NetApp ONTAP

    Answer: d. FSx for NetApp ONTAP is the only AWS file service that supports NFS, SMB, and iSCSI simultaneously on the same file system, with automatic storage tiering that moves cold data to a lower-cost capacity pool.

  4. A development team wants to create multiple test environments by cloning a 10 TB production database file system. They need the clones to be created instantly without consuming additional storage for unchanged data. The file system uses NFS protocol on Linux. Which AWS service provides this capability MOST cost-effectively?
    1. Amazon EFS with AWS Backup restore
    2. Amazon FSx for Lustre with multiple file systems
    3. Amazon FSx for OpenZFS with instant cloning
    4. Amazon EBS snapshots with volume restore

    Answer: c. FSx for OpenZFS supports instant, space-efficient cloning using copy-on-write. Clones are created instantly regardless of data size and consume no additional storage for unchanged data. This is ideal for development/test environments.

  5. A solutions architect needs to choose a storage solution for a containerized application running on Amazon EKS. The application requires shared persistent storage across multiple pods in different Availability Zones, POSIX file permissions, and automatic scaling without capacity planning. Which storage service is MOST appropriate?
    1. Amazon EBS with CSI driver
    2. Amazon S3 with CSI driver
    3. Amazon EFS with EFS CSI driver
    4. Amazon FSx for Lustre with CSI driver

    Answer: c. Amazon EFS provides multi-AZ shared file storage with POSIX permissions and elastic scaling (no capacity planning needed). The EFS CSI driver enables EKS pods to mount EFS as persistent volumes. EBS is single-AZ and cannot be shared across AZs; S3 is not a file system; FSx for Lustre requires capacity provisioning.

  6. A company has a data pipeline where raw data lands in S3, needs high-performance file processing, and the results are used by AI services through the S3 API. They currently copy data between S3 and their file system, creating data duplication. Which AWS feature eliminates this data movement while allowing both file and S3 API access to the SAME data? (Select TWO)
    1. S3 Object Lambda
    2. FSx for Lustre Data Repository Association
    3. AWS DataSync with S3 as destination
    4. S3 Access Points for FSx for NetApp ONTAP
    5. S3 Transfer Acceleration

    Answer: b, d. FSx for Lustre DRA provides automatic bi-directional sync between the file system and S3. S3 Access Points for FSx for NetApp ONTAP allow accessing file data stored in ONTAP volumes directly via the S3 API without copying data, enabling AI/analytics services to work with file data natively.

  7. An organization needs to store compliance data for 7 years with the lowest possible storage cost. The data is written once and rarely accessed (less than once per year). Retrieval time of 12 hours is acceptable. Which storage option provides the LOWEST cost?
    1. Amazon EFS Archive storage class
    2. Amazon S3 Glacier Flexible Retrieval
    3. Amazon S3 Glacier Deep Archive
    4. Amazon FSx for Lustre Intelligent-Tiering Archive tier

    Answer: c. S3 Glacier Deep Archive is the lowest-cost storage in AWS at $0.00099/GB-month (~$1/TB/month) with 12-48 hour retrieval time, specifically designed for data retained 7+ years with rare access. EFS Archive ($0.008/GB) is 8x more expensive.

Related Posts

References

AWS CloudFormation vs CDK vs Terraform – IaC Comparison

AWS CloudFormation vs CDK vs Terraform – IaC Comparison

Infrastructure as Code (IaC) enables teams to define, provision, and manage cloud resources through machine-readable configuration files rather than manual console operations. The three dominant IaC tools for AWS are AWS CloudFormation, AWS CDK (Cloud Development Kit), and HashiCorp Terraform. Each takes a fundamentally different approach to solving the same problem — this post provides a comprehensive comparison to help you choose the right tool.

AWS CloudFormation Overview

  • AWS CloudFormation is AWS’s native IaC service that provisions and manages AWS resources using declarative JSON or YAML templates.
  • CloudFormation templates define the desired state of infrastructure — AWS handles the creation, update, and deletion of resources in the correct order.
  • Resources are organized into Stacks, which are the unit of deployment, update, and deletion.
  • CloudFormation supports 1,100+ AWS resource types and is tightly integrated with all AWS services.
  • Key features include Change Sets (preview changes before applying), Drift Detection, StackSets (multi-account/multi-region deployments), and Nested Stacks.
  • IaC Generator (launched Feb 2024) can scan existing AWS resources and generate CloudFormation templates or CDK apps automatically, eliminating weeks of manual effort for brownfield adoption.
  • CloudFormation Language Extensions add transforms like AWS::LanguageExtensions for looping, conditionals, and JSON/YAML manipulation — reducing template verbosity.
  • CloudFormation is free to use — you only pay for the AWS resources provisioned.

AWS CDK Overview

  • AWS Cloud Development Kit (CDK) is an open-source framework that lets you define AWS infrastructure using familiar programming languages — TypeScript, JavaScript, Python, Java, C#/.NET, and Go.
  • CDK synthesizes your code into CloudFormation templates, then deploys them via CloudFormation — it is an abstraction layer on top of CloudFormation, not a replacement.
  • CDK uses a layered construct model:
    • L1 (Layer 1) Constructs — Direct 1:1 mapping to CloudFormation resources (prefixed with Cfn). Verbose but complete.
    • L2 (Layer 2) Constructs — Curated, high-level abstractions with sensible defaults, helper methods, and security best practices built in. Reduce boilerplate significantly.
    • L3 (Layer 3) Constructs / Patterns — Opinionated combinations of multiple resources solving specific use cases (e.g., ApplicationLoadBalancedFargateService).
  • CDK v2 consolidates all stable constructs into a single package (aws-cdk-lib), simplifying dependency management.
  • CDK Mixins (2026) — Composable abstractions that decouple capabilities from monolithic L2 constructs, allowing you to compose exactly the features you need.
  • CDK Refactor (Preview, Sep 2025) — Enables safe infrastructure refactoring (rename constructs, move resources between stacks) while preserving deployed resources.
  • CDK Toolkit Library — Programmatic access to CDK operations (synth, deploy, destroy) for advanced CI/CD integration.
  • CDK supports unit testing with the assertions module and compliance checking with cdk-nag.

Terraform Overview

  • Terraform by HashiCorp (now IBM) is a multi-cloud IaC tool that uses HashiCorp Configuration Language (HCL) — a declarative language designed specifically for infrastructure.
  • Terraform works with providers — plugins that interact with cloud platforms, SaaS tools, and other APIs. The AWS provider covers all AWS services; 4,000+ providers exist in the Terraform Registry.
  • Terraform uses a plan → apply workflow: terraform plan shows what will change, terraform apply executes the changes.
  • Terraform maintains its own state file that maps configuration to real-world resources, enabling dependency tracking and change detection.
  • Terraform is the only tool of the three that natively supports multi-cloud deployments (AWS, Azure, GCP, etc.) in a single configuration.
  • Terraform Stacks (GA late 2025) — Deploy multiple Terraform configurations as a single orchestrated unit with native dependency resolution and progressive rollouts.
  • Licensing: In August 2023, HashiCorp changed Terraform’s license from MPL 2.0 to Business Source License (BSL 1.1). IBM acquired HashiCorp in 2024 for $6.4B. This led to OpenTofu, an MPL 2.0-licensed fork maintained by the Linux Foundation (CNCF Sandbox project as of April 2025).
  • Terraform Test Framework (native since v1.6) enables unit and integration testing for modules using HCL-based test files.

CDK for Terraform (CDKTF) – Deprecated

  • CDKTF (CDK for Terraform) allowed writing Terraform configurations using programming languages (TypeScript, Python, Java, C#, Go) via the CDK construct model, combining CDK’s developer experience with Terraform’s multi-cloud providers.
  • ⚠️ CDKTF was deprecated on December 10, 2025. HashiCorp/IBM no longer supports or maintains it. The GitHub repository has been archived.
  • Migration options for CDKTF users:
    • Terraform HCL — Convert back to native HCL for continued Terraform provider access.
    • AWS CDK — If AWS-only, CDK provides a superior programming language experience with active development.
    • Pulumi — Multi-cloud IaC with programming language support and Terraform provider compatibility. Offers migration tooling from CDKTF.
    • OpenTofu — Open-source Terraform fork (no CDKTF equivalent, HCL only).
  • The deprecation of CDKTF eliminated the “best of both worlds” option and forces teams to choose between language flexibility (CDK/Pulumi) and multi-cloud (Terraform HCL).

State Management

  • CloudFormation — State is managed entirely by AWS within the CloudFormation service. Each Stack maintains its own state internally. You never manage state files directly — AWS handles consistency, locking, and rollback. State cannot be corrupted or lost by users.
  • AWS CDK — Since CDK deploys via CloudFormation, state management is identical to CloudFormation. CDK also maintains a cdk.context.json file for caching environment lookups. No separate state file to manage.
  • Terraform — Maintains an explicit state file (terraform.tfstate) that maps configuration to real infrastructure. State must be stored securely and shared across teams:
    • Local state — Default; stored on the operator’s filesystem. Not suitable for teams.
    • Remote backends — S3 + DynamoDB (locking), Terraform Cloud/Enterprise, GCS, Azure Blob, Consul, etc.
    • State file contains sensitive data (resource IDs, outputs) and must be encrypted at rest.
    • State locking prevents concurrent modifications that could corrupt state.
    • State can be split across workspaces or state files for blast-radius management.

Language Support

Tool Language(s) Notes
CloudFormation JSON, YAML Declarative markup; no loops/conditionals natively (Language Extensions add some). Verbose for complex infrastructure.
AWS CDK TypeScript, JavaScript, Python, Java, C#/.NET, Go Full programming language power — loops, conditionals, inheritance, composition, IDE support, type safety.
Terraform HCL (HashiCorp Configuration Language) Domain-specific declarative language. Supports for_each, count, dynamic blocks, functions. Easier to learn than general-purpose languages but less flexible.

Multi-Cloud Support

  • CloudFormation — AWS-only. No support for other cloud providers.
  • AWS CDK — Primarily AWS-only. CDK synthesizes to CloudFormation, which only manages AWS resources. Third-party constructs exist for limited non-AWS use cases (e.g., Custom Resources calling external APIs).
  • TerraformTrue multi-cloud support. A single Terraform configuration can manage resources across AWS, Azure, GCP, Kubernetes, Datadog, GitHub, Cloudflare, and 4,000+ providers simultaneously. This is Terraform’s primary differentiator.
    • Organizations with multi-cloud strategies benefit from a unified workflow, language, and state management approach.
    • The same HCL language, plan/apply workflow, and module system work identically across all providers.
  • OpenTofu — Same multi-cloud capabilities as Terraform (same provider ecosystem) under an open-source license.

Drift Detection

  • CloudFormation — Built-in Drift Detection feature. Detects when resource properties differ from the stack template. Can be run on individual resources or entire stacks. Shows IN_SYNC, DRIFTED, or NOT_CHECKED status. Limitation: not all resource properties are checked; only properties defined in the template are compared.
  • AWS CDK — Inherits CloudFormation’s drift detection capabilities since CDK deploys via CloudFormation stacks. Use aws cloudformation detect-stack-drift on synthesized stacks.
  • Terraform — Drift detection occurs automatically on every terraform plan. Terraform refreshes the real-world state and compares it to the configuration, showing exactly what has drifted and what changes would be made to remediate. More comprehensive than CloudFormation because:
    • Detects all property changes, not just template-defined ones.
    • Shows the remediation plan immediately.
    • Can be automated in CI/CD with scheduled plans.
    • Terraform Cloud/Enterprise supports automated drift detection with notifications.

Modules, Constructs, and Nested Stacks

  • CloudFormation — Nested Stacks
    • Use AWS::CloudFormation::Stack to embed child stacks within parent stacks.
    • Templates must be stored in S3 and referenced by URL.
    • Useful for reusing common patterns (VPC, security groups) across stacks.
    • Outputs from nested stacks can be referenced by parent stacks.
    • Limitation: Nested stacks create tight coupling; updates must be initiated from the root stack.
    • Stack Exports/Imports — Cross-stack references using Export and Fn::ImportValue for loose coupling.
  • AWS CDK — Constructs
    • The fundamental building block. Every CDK element is a construct, organized in a tree structure.
    • Constructs are composable — you create higher-level constructs from lower-level ones.
    • Share constructs via npm, PyPI, Maven, or NuGet packages.
    • Construct Hub (construct.dev) — Community registry with 1,500+ published constructs.
    • Constructs leverage full programming language capabilities — generics, inheritance, interfaces.
    • CDK Mixins (2026) add composable behavior without monolithic construct hierarchies.
  • Terraform — Modules
    • Modules are reusable packages of Terraform configuration (a directory with .tf files).
    • Terraform Registry — 15,000+ published modules for common patterns.
    • Modules support versioning, input variables, and output values.
    • Can be sourced from local paths, Git repos, S3, GCS, or the Registry.
    • Module composition is declarative — modules call other modules.
    • Terraform Stacks (GA 2025) orchestrate multiple root modules as a coordinated deployment unit.

Importing Existing Resources

  • CloudFormation
    • Resource Import — Import existing resources into a stack using --import-existing-resources flag or the console. Must define the resource in the template first, then import.
    • IaC Generator (2024) — Scans your AWS account, discovers existing resources, identifies relationships, and generates complete CloudFormation templates or CDK apps. Supports targeted resource type scanning. Dramatically simplifies brownfield adoption.
    • Supports importing entire applications with resource relationships preserved.
  • AWS CDK
    • CDK Migrate — Works with IaC Generator to create CDK apps from existing resources.
    • Import existing resources by adding them to the stack with the importValue mechanism or by using L1 constructs with CloudFormation import.
    • Can generate CDK code from existing CloudFormation templates using cdk migrate.
  • Terraform
    • terraform import — Imports existing resources into Terraform state. Requires manually writing the configuration first (or using import blocks since v1.5).
    • Import Blocks (v1.5+) — Declarative import in configuration files. Can generate configuration with terraform plan -generate-config-out.
    • Third-party tools like Terraformer can bulk-import resources and generate HCL.
    • Import is per-resource — no automatic relationship detection like CloudFormation’s IaC Generator.

Testing Capabilities

  • CloudFormation
    • cfn-lint — Linting tool that validates templates against resource specifications.
    • TaskCat — Automated testing tool that deploys templates across regions and validates.
    • cfn-guard — Policy-as-code tool for validating templates against organizational rules.
    • No native unit testing framework — relies on third-party tools or deploying to test accounts.
  • AWS CDK
    • Assertions module (aws-cdk-lib/assertions) — Fine-grained unit testing against synthesized CloudFormation templates. Test specific resource properties without deploying.
    • Snapshot testing — Compare entire synthesized templates against stored snapshots to detect unintended changes.
    • cdk-nag — Compliance and security rule packs (AWS Solutions, HIPAA, NIST 800-53) validated at synthesis time using CDK Aspects.
    • Uses standard testing frameworks (Jest, pytest, JUnit) — familiar to developers.
    • Integration testinginteg-runner and integ-tests constructs for deploying and validating real infrastructure.
  • Terraform
    • Terraform Test Framework (native since v1.6) — Write test files (.tftest.hcl) for unit and integration testing of modules. Can create real infrastructure for validation.
    • terraform validate — Syntax and configuration validation without accessing providers.
    • terraform plan — Preview changes; can be used as a validation step in CI.
    • Terratest (Go library) — Deploy real infrastructure, validate it, then destroy. Popular for integration testing.
    • OPA/Sentinel — Policy-as-code for governance (Sentinel is Terraform Enterprise only; OPA works with any plan output).
    • tflint — Pluggable linting framework for Terraform.

CI/CD Integration

  • CloudFormation
    • Native integration with AWS CodePipeline — deploy/update stacks as pipeline actions.
    • Supports StackSets for multi-account/multi-region deployments from a single template.
    • Change Sets can be created in a pipeline stage for review before execution.
    • Integrates with AWS Service Catalog for standardized deployments.
    • Supports Git-based workflows via CodeCommit, CodeBuild, or any CI tool that can call AWS APIs.
  • AWS CDK
    • CDK Pipelines — High-level construct that defines a self-mutating CI/CD pipeline (pipeline updates itself when you change the pipeline code).
    • CDK Toolkit Library enables programmatic deployment from any CI system.
    • Synthesized CloudFormation templates can be stored as artifacts and deployed via CodePipeline.
    • Standard build tools (npm, pip, Maven) handle compilation and synthesis in CI.
    • cdk diff in pull requests shows planned infrastructure changes.
  • Terraform
    • Terraform Cloud/Enterprise — Built-in CI/CD with VCS integration, plan-on-PR, cost estimation, Sentinel policies, and run triggers.
    • Integrates with any CI system (GitHub Actions, GitLab CI, Jenkins, CircleCI) via CLI.
    • terraform plan output can be posted as PR comments for review.
    • Atlantis — Popular open-source tool for Terraform PR automation.
    • Spacelift, env0, Scalr — Third-party platforms for Terraform CI/CD and governance.
    • Remote state and locking enable safe concurrent CI/CD executions.

When to Use Which – Decision Matrix

Scenario Recommended Tool Why
AWS-only, small team, simple infrastructure CloudFormation No additional tools needed; fully managed state; free; tight AWS integration.
AWS-only, development team, complex logic AWS CDK Programming language power; reusable constructs; unit testing; type safety; rapid iteration.
Multi-cloud or hybrid environment Terraform Only option for unified multi-cloud IaC; single language across providers.
Platform engineering team serving multiple app teams Terraform Modules, workspaces, and Stacks enable standardized patterns; HCL is accessible to ops teams.
Brownfield — importing many existing AWS resources CloudFormation/CDK IaC Generator automatically discovers resources and relationships; far easier than Terraform import.
Strict compliance/governance requirements Terraform Enterprise or CDK + cdk-nag Sentinel policies (Terraform) or cdk-nag rule packs (CDK) enforce compliance pre-deployment.
Open-source licensing requirement OpenTofu or AWS CDK Terraform is BSL-licensed; OpenTofu (MPL 2.0) is the open-source alternative. CDK is Apache 2.0.
Serverless-first (SAM integration needed) CloudFormation/CDK AWS SAM is a CloudFormation extension; CDK supports SAM-like patterns natively with L3 constructs.
Kubernetes + cloud infrastructure together Terraform Terraform has native Kubernetes, Helm, and cloud providers in a single workflow.

Detailed Comparison Table

Feature CloudFormation AWS CDK Terraform
Type Declarative (JSON/YAML) Imperative → Declarative (generates CFN) Declarative (HCL)
Cloud Support AWS only AWS only Multi-cloud (4,000+ providers)
State Management Managed by AWS (serverless) Managed by AWS (via CFN) Self-managed state file (remote backends)
Language JSON, YAML TypeScript, Python, Java, C#, Go, JS HCL
Learning Curve Low (templates) but verbose Medium (programming + AWS concepts) Low-Medium (HCL is purpose-built)
Drift Detection Built-in (partial properties) Via CloudFormation Automatic on every plan (all properties)
Rollback Automatic on failure Automatic (via CFN) No automatic rollback; manual remediation
Modularity Nested Stacks, Stack Exports Constructs (L1/L2/L3), Mixins Modules (Registry + custom)
Import Resources Resource Import + IaC Generator CDK Migrate + IaC Generator terraform import, import blocks, Terraformer
Testing cfn-lint, cfn-guard, TaskCat assertions module, cdk-nag, Jest/pytest terraform test, Terratest, tflint, OPA
CI/CD CodePipeline, StackSets, Change Sets CDK Pipelines, Toolkit Library Terraform Cloud, Atlantis, any CI
Cost Free (pay for resources only) Free (pay for resources only) CLI free; Cloud/Enterprise paid
License Proprietary (AWS service) Apache 2.0 BSL 1.1 (OpenTofu: MPL 2.0)
New AWS Service Support Day-1 (same team) Day-1 L1; L2 follows weeks later Days-to-weeks after launch
Resource Limit 500 resources per stack 500 per stack (same CFN limit) No hard limit (state file size practical limit)
Multi-Account StackSets CDK Pipelines cross-account Provider aliases, workspaces, Stacks

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • Open to further feedback, discussion and correction.
  1. A company has AWS infrastructure deployed across multiple accounts and regions using manual console operations. They want to bring all existing resources under IaC management with minimal effort. Which approach provides the fastest path to IaC adoption?
    1. Use Terraform import blocks to import each resource individually and write HCL configuration.
    2. Use AWS CloudFormation IaC Generator to scan the accounts, discover resources and relationships, and generate CloudFormation templates automatically.
    3. Write AWS CDK code manually for each resource after documenting the current state.
    4. Use AWS Config to export resource configurations and convert them to Terraform modules.
  2. A development team wants to define AWS infrastructure using TypeScript with full IDE support, unit testing, and the ability to use loops and conditionals for dynamic resource creation. They are AWS-only. Which tool best meets their requirements?
    1. AWS CloudFormation with Language Extensions.
    2. Terraform with HCL dynamic blocks.
    3. AWS CDK with TypeScript, using L2 constructs and the assertions module for testing.
    4. AWS SAM with YAML templates.
  3. An organization operates across AWS, Azure, and GCP. They need a single IaC tool that can manage resources across all three cloud providers with a unified workflow and state management. Which tool should they use?
    1. AWS CloudFormation with Custom Resources for Azure and GCP.
    2. AWS CDK with third-party constructs.
    3. Separate tools for each cloud provider.
    4. Terraform (or OpenTofu) with provider configurations for AWS, Azure, and GCP.
  4. A team using AWS CDK wants to reorganize their infrastructure code by moving resources between stacks and renaming constructs, without replacing the deployed resources. Which CDK feature enables this safe refactoring?
    1. CDK Pipelines with approval stages.
    2. CloudFormation Change Sets with manual review.
    3. CDK Refactor command, which computes mappings between current code and deployed state to preserve resources.
    4. Export/Import values between stacks with deletion protection.
  5. A security team needs to ensure all CDK-deployed infrastructure complies with HIPAA requirements before resources are provisioned. Which approach validates compliance at synthesis time without deploying infrastructure?
    1. Deploy to a test account and run AWS Config rules.
    2. Use cdk-nag with the HIPAA Security rule pack as a CDK Aspect to validate constructs during synthesis.
    3. Use CloudFormation drift detection after deployment.
    4. Write Terraform Sentinel policies.
  6. A company is evaluating IaC tools and needs automatic rollback capability when a deployment fails partway through creating resources. Which tools provide this natively? (Select TWO)
    1. AWS CloudFormation
    2. AWS CDK (via CloudFormation)
    3. Terraform
    4. OpenTofu
    5. Ansible
  7. An operations team wants continuous drift detection that automatically identifies when infrastructure configuration differs from the declared state. The detection should be comprehensive (all properties, not just template-defined ones) and run on a schedule. Which approach provides the most comprehensive drift detection?
    1. CloudFormation drift detection on all stacks.
    2. AWS Config rules for resource compliance.
    3. Terraform Cloud with scheduled plan-only runs that compare actual state against configuration for all managed properties.
    4. AWS CDK with snapshot testing in CI/CD.

Related Posts

References

AWS Lambda vs Step Functions vs EventBridge

AWS Lambda vs Step Functions vs EventBridge

📅 Published June 2026: Covers Lambda Durable Functions, Lambda Managed Instances, Step Functions 1,100+ new SDK integrations, EventBridge 1MB payload support, cross-account delivery, and enhanced observability features.

Overview

  • AWS Lambda is a serverless compute service that runs code in response to events without provisioning or managing servers. It automatically scales from zero to thousands of concurrent executions and charges only for actual compute time consumed.
  • AWS Step Functions is a serverless workflow orchestration service that coordinates multiple AWS services into visual workflows. It manages state, handles retries, error handling, and parallel execution using a declarative state machine model (Amazon States Language or visual Workflow Studio).
  • Amazon EventBridge is a serverless event bus service that makes it easy to connect applications using events. It routes events from AWS services, SaaS applications, and custom sources to targets based on content-based filtering rules, enabling loosely-coupled event-driven architectures.

Key Concepts

  • Lambda = Compute (execute code)
  • Step Functions = Orchestration (coordinate services)
  • EventBridge = Routing (connect event producers and consumers)

These three services are complementary and are frequently used together in modern serverless architectures. Lambda provides the compute, Step Functions coordinates the workflow, and EventBridge routes events between decoupled components.

Detailed Comparison Table

Feature AWS Lambda AWS Step Functions Amazon EventBridge
Primary Purpose Serverless compute — run code Workflow orchestration — coordinate services Event routing — connect producers and consumers
Execution Model Event-driven function invocation State machine with defined steps Event bus with rules-based routing
Max Duration 15 minutes (standard); 8 hours (MicroVMs); 1 year (Durable Functions) 1 year (Standard); 5 minutes (Express) N/A — routes events, no execution duration
State Management Stateless (Durable Functions add checkpointing) Built-in state tracking between steps Stateless — routes events only
Error Handling DLQ, retry policies, destinations Built-in Retry, Catch, fallback states DLQ, retry policy (up to 185 retries over 24 hours)
Payload Size 6 MB (sync); 1 MB (async, updated Jan 2026) 256 KB per state 1 MB (updated Jan 2026; billed per 64 KB chunk)
Concurrency 1,000+ concurrent executions (configurable) Standard: 2,000+ starts/sec; Express: 100,000 transitions/sec Thousands of events/sec per bus
Workflow Visibility CloudWatch Logs, X-Ray tracing Visual execution history, step-by-step I/O Enhanced logging (CloudWatch, S3, Firehose)
Direct Service Integrations 220+ AWS services as event sources 220+ AWS services via SDK integrations (1,100+ API actions) 200+ AWS services, 45+ SaaS partners as sources; 20+ target types
Coupling Tightly coupled to event source Orchestrates tightly coupled steps Loosely coupled — producers/consumers independent
Scheduling Via EventBridge rule/scheduler trigger Via EventBridge Scheduler or rule Built-in Scheduler (cron/rate/one-time)
Free Tier 1M requests + 400,000 GB-seconds/month 4,000 state transitions/month All AWS service events free; no free tier for custom events

When to Use Which — Decision Guide

Scenario Recommended Service Why
Simple event processing (transform, validate, enrich) Lambda Single function handles the entire task
Multi-step workflow with error handling and retries Step Functions Built-in retry, catch, and visual debugging
Fan-out events to multiple consumers EventBridge Content-based routing to multiple targets
Long-running process (human approval, wait for callback) Step Functions or Lambda Durable Functions Both support waiting up to 1 year; Step Functions for cross-service, Durable for code-first
Decouple microservices EventBridge Producers don’t need to know about consumers
Schedule tasks (cron jobs) EventBridge Scheduler → Lambda EventBridge handles scheduling, Lambda handles execution
AI/ML inference pipeline Step Functions + Lambda Orchestrate Bedrock, SageMaker calls with state management
SaaS integration (Stripe, Zendesk, Datadog) EventBridge + Lambda EventBridge receives partner events, Lambda processes
Parallel batch processing at scale Step Functions Distributed Map Process millions of items from S3 with up to 10,000 parallel child executions
Real-time data transformation Lambda (with Kinesis/SQS trigger) Sub-second processing of streaming data
Cross-account event routing EventBridge Direct cross-account delivery without intermediate bus
ETL pipeline with dependencies Step Functions + Lambda Sequential/parallel steps with data passing between stages

Decision Framework

  • Start with Lambda if you have a single task triggered by an event that completes within 15 minutes.
  • Add Step Functions when your logic requires multiple steps, conditional branching, parallel execution, error recovery, or human-in-the-loop approvals.
  • Add EventBridge when you need to decouple producers from consumers, route events to multiple targets based on content, or integrate with SaaS partners.
  • Use Lambda Durable Functions (instead of Step Functions) when you prefer code-first orchestration within a single Lambda function and don’t need cross-service visual orchestration.

Architecture Patterns Combining All Three

Pattern 1: Event-Driven Microservices

Multiple microservices communicate through EventBridge. Each service publishes domain events, and EventBridge routes them to interested consumers (Lambda functions or Step Functions workflows).

  • EventBridge — Central event bus for inter-service communication
  • Lambda — Individual microservice handlers (order processing, notification, inventory)
  • Step Functions — Complex business processes spanning multiple services (order fulfillment saga)

Pattern 2: Saga Pattern for Distributed Transactions

Step Functions orchestrates a saga where each step has a compensating action. EventBridge decouples the saga from downstream services.

  • Step Functions — Orchestrates the saga with Retry/Catch for each step
  • Lambda — Executes each transaction step and its compensation
  • EventBridge — Publishes saga completion/failure events to notify other bounded contexts

Pattern 3: Scheduled ETL with Event Notifications

  • EventBridge Scheduler — Triggers the ETL workflow on a cron schedule
  • Step Functions — Orchestrates extract → transform → load with error handling
  • Lambda — Executes each ETL stage
  • EventBridge — Publishes ETLComplete/ETLFailed events for downstream consumers

Pattern 4: AI/ML Pipeline with Human-in-the-Loop

  • EventBridge — Receives document upload events from S3
  • Step Functions — Orchestrates the pipeline: classify → extract → validate → approve
  • Lambda — Calls Bedrock/Textract for AI processing
  • Step Functions Wait — Pauses for human review via callback token
  • EventBridge — Publishes ProcessingComplete event to downstream systems

Pattern 5: Fan-Out/Fan-In Processing

  • EventBridge — Receives incoming event and triggers Step Functions
  • Step Functions Distributed Map — Fans out to process thousands of items in parallel
  • Lambda — Processes each individual item
  • Step Functions — Aggregates results after all items complete
  • EventBridge — Publishes aggregated results to interested consumers

Integration Points

Lambda ↔ Step Functions

  • Step Functions can invoke Lambda functions directly using optimized integrations (no additional charges for Lambda invocation from Step Functions)
  • Step Functions passes input/output data between Lambda steps automatically
  • Lambda can start Step Functions executions via the AWS SDK
  • Callback pattern — Step Functions sends a task token to Lambda; Lambda calls back when async work completes
  • Activity tasks — Step Functions waits for external workers (Lambda or other compute) to poll and complete work
  • Lambda Durable Functions (Dec 2025) — Provides Step Functions-like orchestration directly within Lambda code, blurring the boundary

Lambda ↔ EventBridge

  • EventBridge can invoke Lambda as a target for matched rules (asynchronous invocation)
  • Lambda can publish events to EventBridge via the SDK (PutEvents API)
  • EventBridge Pipes — Connects event sources (SQS, Kinesis, DynamoDB Streams) to Lambda with built-in filtering and enrichment
  • EventBridge Scheduler — Invokes Lambda on cron/rate/one-time schedules with 619+ SDK API actions (May 2026)
  • Lambda Destinations — On success/failure, Lambda can route results to EventBridge

Step Functions ↔ EventBridge

  • Step Functions can publish events to EventBridge directly from a workflow step (EventBridge PutEvents integration)
  • EventBridge rules can start Step Functions executions as a target
  • Step Functions emits execution status change events to EventBridge (RUNNING, SUCCEEDED, FAILED, TIMED_OUT, ABORTED)
  • EventBridge Scheduler can trigger Step Functions workflows on a schedule
  • Wait for Callback — Step Functions pauses and waits for an external event (potentially routed via EventBridge) to resume

All Three Together

  • EventBridge routes an incoming event → triggers Step Functions workflow → Step Functions orchestrates multiple Lambda functions → publishes completion event back to EventBridge
  • EventBridge Pipes can enrich events using Lambda before delivering to Step Functions
  • Step Functions 1,100+ SDK integrations (Mar 2026) include direct EventBridge and Lambda actions without writing code

Pricing Comparison

Dimension AWS Lambda AWS Step Functions Amazon EventBridge
Pricing Model Per request + per GB-second of compute Standard: per state transition; Express: per execution + duration + memory Per event published (per 64 KB chunk)
Request/Event Cost $0.20 per 1M requests Standard: $25.00 per 1M state transitions $1.00 per 1M custom events
Compute/Duration Cost $0.0000166667 per GB-second (x86); $0.0000133334 (ARM) Express: $0.000001 per 100ms per 64 MB memory N/A — no execution duration
Free Tier 1M requests + 400,000 GB-seconds/month (permanent) 4,000 state transitions/month (permanent) All AWS service events free; Scheduler: 14M invocations/month free
Additional Costs Provisioned Concurrency: idle + active charges; Ephemeral storage beyond 512 MB Express execution charges; payload processing for large data Pipes: per event + processing time; Schema Registry discovery: $0.10 per event

Cost Optimization Tips

  • Lambda — Use ARM/Graviton2 for 20% savings; right-size memory; use Provisioned Concurrency only for latency-sensitive paths
  • Step Functions — Use Express Workflows for high-volume, short-duration workflows (up to 90% cheaper than Standard); minimize state transitions by combining logic in single Lambda functions
  • EventBridge — Keep event payloads under 64 KB to avoid chunk-based billing; use input transformers to reduce payload size before delivery; AWS service events are always free
  • Combined — Use Step Functions SDK integrations directly (DynamoDB, S3, SQS) instead of Lambda wrappers to save Lambda costs

Recent Updates (2025-2026)

AWS Lambda

  • Lambda Durable Functions (Dec 2025) — Multi-step orchestration within Lambda code using steps and waits; auto-checkpoint, suspend up to 1 year, and recover from failures without Step Functions
  • Lambda Managed Instances (Dec 2025) — Dedicated compute with EC2 flexibility; multi-request processing per execution environment; supports tag propagation (Jun 2026)
  • Lambda MicroVMs (Jun 2026) — Container-based Firecracker snapshots with up to 8-hour runtime for long-running workloads
  • 1 MB Async Payload (Oct 2025) — Asynchronous invocation payload increased from 256 KB to 1 MB
  • ARC Region Switch ESM Block (May 2026) — Event source mapping execution block during regional failovers

AWS Step Functions

  • 28 New Service Integrations + 1,100 API Actions (Mar 2026) — Including Amazon Bedrock AgentCore and Amazon S3 Vectors
  • 100,000 State Machines per Account (Feb 2025) — 10x increase from previous 10,000 limit
  • 137 Additional APIs + Backup Search (Apr 2025) — Expanded SDK integrations
  • Distributed Map Enhancements (Sep 2025) — Additional data sources and observability metrics
  • API-based Local Testing (May 2026) — Validate workflows before deploying to AWS

Amazon EventBridge

  • 1 MB Event Payload (Jan 2026) — Maximum event size increased from 256 KB to 1 MB
  • Cross-Account Direct Delivery (Jan 2025) — Deliver events to targets in another account without intermediate default bus
  • Enhanced Visual Rule Builder (Nov 2025) — Intuitive console with event catalog for 200+ AWS services
  • Enhanced Logging (Jul 2025) — Logging to CloudWatch Logs, S3, and Kinesis Data Firehose for event lifecycle tracking
  • Scheduler 619 New SDK Actions (May 2026) — Including Lambda Managed Instances, 13 additional services
  • MWAA Serverless Integration (Jun 2026) — EventBridge notifications for Airflow workflow state transitions

AWS Certification Exam Relevance

SAA-C03 (Solutions Architect Associate)

  • Understand when to use Lambda vs Step Functions vs EventBridge for decoupling and orchestration
  • Know EventBridge for event-driven architecture design and cross-account event routing
  • Know Step Functions for workflow coordination with error handling
  • Understand the Saga pattern using Step Functions for distributed transactions
  • Know Lambda limits (15-min timeout, 6 MB sync payload, concurrency)

DVA-C02 (Developer Associate)

  • Deeper knowledge of Lambda event source mappings, destinations, and error handling
  • Step Functions Amazon States Language (ASL), Wait/Callback patterns, SDK integrations
  • EventBridge rule patterns, input transformers, Pipes, and Scheduler
  • Lambda Durable Functions vs Step Functions trade-offs
  • Testing and debugging workflows (Step Functions local testing, EventBridge logging)

SAP-C02 (Solutions Architect Professional)

  • Multi-account event routing with EventBridge (cross-account, cross-region)
  • Complex workflow patterns: Saga, fan-out/fan-in, human-in-the-loop with Step Functions
  • Cost optimization: Step Functions Standard vs Express, Lambda Provisioned Concurrency decisions
  • Hybrid architectures combining all three services for enterprise event-driven systems
  • Disaster recovery with Lambda ARC Region Switch and Step Functions execution history

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A company has an e-commerce application where placing an order requires inventory reservation, payment processing, and shipping initiation. If any step fails, previous steps must be compensated (rolled back). Which architecture BEST implements this?
    1. Lambda function that calls each service sequentially with try/catch
    2. EventBridge rules that route events between each service
    3. Step Functions workflow implementing the Saga pattern with Catch and compensating states
    4. SQS queues between each service for reliability

    Answer: C — Step Functions is ideal for the Saga pattern because it provides built-in Retry and Catch states, maintains execution history, and allows defining compensating actions for each step. EventBridge is for decoupling, not orchestrating sequential transactions with compensation.

  2. A company needs to route customer feedback events to different processing Lambda functions based on the sentiment (positive, negative, neutral) contained in the event payload. Which service should they use?
    1. AWS Step Functions with Choice state
    2. Amazon EventBridge with content-based filtering rules
    3. Amazon SQS with message filtering
    4. AWS Lambda with conditional logic

    Answer: B — EventBridge excels at content-based routing where events are routed to different targets based on event payload content. Each rule can match specific patterns in the event JSON and route to different Lambda targets. This keeps services decoupled.

  3. A developer is building a document processing pipeline that takes 45 minutes to complete. The pipeline extracts text, classifies the document, and stores results. Which approach is MOST cost-effective?
    1. Single Lambda function with maximum 15-minute timeout chained via SQS
    2. Step Functions Standard Workflow orchestrating Lambda functions for each step
    3. Step Functions Express Workflow with Lambda functions
    4. EventBridge Pipes connecting each processing stage

    Answer: B — Step Functions Standard Workflow supports executions up to 1 year and is appropriate for a 45-minute pipeline. Express Workflows are limited to 5 minutes. Chaining Lambda via SQS adds operational complexity. EventBridge Pipes is for point-to-point event streaming, not multi-step orchestration.

  4. A solutions architect needs to design a system where multiple microservices react to a single business event (e.g., “UserRegistered”) without the producing service knowing about the consumers. Which approach provides the MOST decoupled architecture?
    1. Lambda function invoking each consumer directly
    2. SNS topic with subscriptions for each consumer
    3. EventBridge custom event bus with content-based rules for each consumer
    4. Step Functions parallel state invoking each consumer

    Answer: C — EventBridge provides the most decoupled architecture because producers publish events without knowing about consumers. Each consumer creates their own rule with content-based filtering. Unlike SNS, EventBridge supports schema discovery, event replay, and more sophisticated filtering. Step Functions and direct Lambda invocation create tight coupling.

  5. An application processes 500,000 short-lived events per hour, each requiring 3 state transitions. The team wants to minimize costs. Which Step Functions workflow type should they use?
    1. Standard Workflow
    2. Express Workflow
    3. Standard Workflow with Activity Tasks
    4. Lambda Durable Functions instead

    Answer: B — Express Workflows are designed for high-volume, short-duration workloads and cost approximately $1.00 per million executions vs $25.00 per million state transitions for Standard. With 500K events × 3 transitions = 1.5M transitions/hour, Express saves significantly. Express supports up to 100,000 state transitions per second.

  1. A company operates in multiple AWS accounts and needs Service A in Account 1 to trigger a Lambda function in Account 2 when an order is placed. Which is the simplest approach with EventBridge?
    1. Send event to Account 2’s default bus, then create a rule targeting Lambda
    2. Use EventBridge cross-account direct delivery to invoke the Lambda in Account 2
    3. Use SNS cross-account subscription
    4. Use SQS cross-account queue

    Answer: B — EventBridge now supports direct cross-account delivery (Jan 2025), eliminating the need to send events to the target account’s default bus first. This simplifies the architecture by allowing a rule in Account 1 to directly target a Lambda function in Account 2.

  2. A developer wants to build a multi-step AI workflow within a single Lambda function that checkpoints progress and can suspend execution while waiting for a human review. The workflow only involves Lambda logic (no other AWS service orchestration). Which feature should they use?
    1. Step Functions Standard Workflow
    2. Step Functions Express Workflow
    3. Lambda Durable Functions
    4. EventBridge with Lambda Destinations

    Answer: C — Lambda Durable Functions (Dec 2025) provide code-first orchestration with steps and waits directly within Lambda. They’re ideal when the workflow logic stays within Lambda and doesn’t need visual cross-service orchestration. Durable functions automatically checkpoint progress and can suspend execution for up to 1 year.

  3. A team needs to process 10 million S3 objects in parallel, with each object requiring a Lambda function for transformation. Which Step Functions feature is BEST suited?
    1. Map state with MaxConcurrency
    2. Parallel state with multiple branches
    3. Distributed Map with up to 10,000 parallel child executions
    4. EventBridge with Lambda targets

    Answer: C — Step Functions Distributed Map is specifically designed for large-scale parallel processing of items from S3 datasets. It supports up to 10,000 parallel child executions and can process millions of objects. The standard Map state is limited to 40 concurrent iterations.

  4. A company wants to trigger different processing pipelines based on the type of file uploaded to S3 — images go to a Rekognition pipeline, documents go to Textract, and videos go to MediaConvert. Which combination provides content-based routing? (Select TWO)
    1. S3 Event Notifications directly to each Lambda
    2. S3 → EventBridge rule with event pattern matching on object key suffix → different Step Functions targets
    3. S3 → Lambda → conditional invocation of each pipeline
    4. S3 → EventBridge rule with event pattern matching → different Lambda targets
    5. S3 → SQS → Lambda routing

    Answer: B, D — EventBridge can receive S3 events and apply content-based filtering rules on the object key (suffix pattern matching) to route to different targets. Both Step Functions and Lambda can be valid targets depending on pipeline complexity. S3 Event Notifications lack content-based filtering on file type.

  5. Which statement BEST describes the relationship between Lambda, Step Functions, and EventBridge in a serverless architecture?
    1. They are competing services — choose one for your architecture
    2. Lambda handles compute, Step Functions handles orchestration, EventBridge handles event routing — they are complementary
    3. Step Functions replaces the need for both Lambda and EventBridge
    4. EventBridge replaces Step Functions for all workflow orchestration needs

    Answer: B — These three services are complementary, not competing. Lambda provides serverless compute (executing code), Step Functions provides workflow orchestration (coordinating multiple services), and EventBridge provides event routing (connecting producers and consumers). Most serverless architectures use all three together.

References

Related Posts

AWS Certified Generative AI Developer – Professional (AIP-C01) Exam Learning Path

AWS Certified Generative AI Developer – Professional (AIP-C01) Exam Learning Path

  • The AWS Certified Generative AI Developer – Professional (AIP-C01) is AWS’s newest professional-level certification, validating the ability to integrate foundation models (FMs) into applications and business workflows using AWS technologies.
  • This is a hands-on certification focused on building and deploying GenAI solutions — not just theory, but production-ready implementations.
  • Target audience: Developers with 2+ years of AWS experience and 1+ year of hands-on GenAI solution implementation.
🎓 Recommended Course
Stephane Maarek – AWS Certified Generative AI Developer Professional — Comprehensive course covering all 5 exam domains with hands-on labs.

AIP-C01 Exam Content

  • Validates ability to design and implement solutions using vector stores, RAG, knowledge bases, and GenAI architectures
  • Tests integration of foundation models into applications and business workflows
  • Covers prompt engineering and management techniques
  • Tests implementation of agentic AI solutions
  • Validates optimization for cost, performance, and business value
  • Covers security, governance, and Responsible AI practices
  • Tests troubleshooting, monitoring, and optimization of GenAI applications

Refer AWS Certified Generative AI Developer – Professional (AIP-C01) Exam Guide

AIP-C01 Exam Summary

  • AIP-C01 consists of 65 scored questions + 10 unscored in 170 minutes
  • Question types: multiple-choice and multiple-response
  • Scaled score between 100 and 1,000. Minimum passing score: 750
  • Professional-level exam costs $300 + tax
  • Certification valid for 3 years
  • No prerequisite certification required (but AIF-C01 recommended as foundation)

AIP-C01 Exam Domains

  • Domain 1: Foundation Model Integration, Data Management & Compliance — 31%
    • Select and configure FMs for specific use cases
    • Design data pipelines for GenAI (vector stores, embeddings, chunking strategies)
    • Implement RAG architectures with Amazon Bedrock Knowledge Bases
    • Ensure data compliance and governance
  • Domain 2: Implementation and Integration — 26%
    • Implement GenAI solutions using Amazon Bedrock, SageMaker, and open-source models
    • Design and implement agentic workflows (Bedrock Agents, AgentCore)
    • Apply prompt engineering techniques (few-shot, chain-of-thought, system prompts)
    • Integrate FMs into existing applications and workflows
  • Domain 3: AI Safety, Security & Governance — 20%
    • Implement Bedrock Guardrails for content filtering
    • Apply Responsible AI practices (bias detection, toxicity filtering)
    • Secure GenAI workloads (IAM, VPC, encryption, data isolation)
    • Implement model governance and versioning
  • Domain 4: Operational Efficiency & Optimization — 12%
    • Optimize inference costs (model selection, caching, batch inference)
    • Implement performance optimization (provisioned throughput, model distillation)
    • Design for scalability and reliability
  • Domain 5: Testing, Validation & Troubleshooting — 11%
    • Evaluate FM outputs (BLEU, ROUGE, human evaluation)
    • Implement testing strategies for GenAI applications
    • Monitor and troubleshoot GenAI workloads
    • Implement feedback loops and continuous improvement

Key AWS Services for AIP-C01

  • Amazon Bedrock — Primary service for FM access, Knowledge Bases, Agents, Guardrails, Model Evaluation
  • Amazon Bedrock Agents — Agentic AI workflows with tool use and multi-step reasoning
  • Amazon Bedrock Knowledge Bases — Managed RAG with vector stores (OpenSearch Serverless, Aurora, Pinecone)
  • Amazon Bedrock Guardrails — Content filtering, PII redaction, topic denial
  • Amazon SageMaker — Custom model training, fine-tuning, hosting
  • Amazon Q Developer — AI coding assistant
  • Amazon Q Business — Enterprise AI assistant with data connectors
  • AWS Lambda — Serverless inference triggers, agent action groups
  • Amazon OpenSearch Serverless — Vector search for RAG
  • Amazon DynamoDB — Session state, conversation history
  • Amazon S3 — Data sources for knowledge bases
  • AWS Step Functions — Orchestrate multi-model workflows
  • Amazon CloudWatch — Monitoring GenAI workloads, model invocation metrics

Key Concepts for AIP-C01

  • RAG (Retrieval Augmented Generation) — Grounding FM responses with external data
  • Vector Databases & Embeddings — Semantic search, chunking strategies, embedding models
  • Prompt Engineering — System prompts, few-shot learning, chain-of-thought, temperature/top-p
  • Agentic AI — Tool use, function calling, multi-step reasoning, ReAct pattern
  • Fine-tuning vs RAG — When to customize the model vs augment with external data
  • Model Evaluation — Automated metrics (BLEU, ROUGE, BERTScore), human evaluation, LLM-as-judge
  • Responsible AI — Bias detection, hallucination mitigation, content safety, model cards
  • Inference Optimization — Provisioned throughput, caching, batch inference, model distillation
  • Guardrails — Content filters, denied topics, word filters, PII redaction, contextual grounding

AIP-C01 Preparation Strategy

  • Hands-on with Bedrock is highly recommended — While the exam is multiple-choice, questions are scenario-based and require practical understanding of Bedrock configurations
  • Build at least 2-3 RAG applications using Bedrock Knowledge Bases
  • Create Bedrock Agents with action groups and Lambda functions
  • Implement Guardrails with different filter configurations
  • Understand the differences between FM families (Claude, Titan, Llama, Mistral)
  • Practice prompt engineering with different techniques
  • Understand cost optimization: on-demand vs provisioned vs batch inference
  • Focus on Domain 1 (31%) and Domain 2 (26%) — they cover 57% of the exam

Recommended Resources

Related Posts

References

AWS EKS vs ECS – Decision Guide

AWS EKS vs ECS – Decision Guide

  • AWS offers two container orchestration services: ECS (AWS-native) and EKS (managed Kubernetes).
  • Both support Fargate (serverless) and EC2 launch types for running containers.
  • The choice depends on team expertise, portability needs, ecosystem requirements, and operational preferences.

EKS vs ECS Comparison

Feature Amazon ECS Amazon EKS
Orchestrator AWS-proprietary Kubernetes (CNCF standard)
Learning Curve Lower — simpler concepts Steeper — Kubernetes complexity
Portability AWS-only Multi-cloud, on-premises (EKS Anywhere), hybrid
Control Plane Cost Free $0.10/hour (~$73/month) per cluster; or EKS Auto Mode
Compute Options Fargate, EC2, External (ECS Anywhere) Fargate, EC2 (managed/self-managed), Karpenter, EKS Anywhere
Auto Scaling Service Auto Scaling + Capacity Providers HPA, VPA, Karpenter, Cluster Autoscaler
Networking awsvpc mode (ENI per task), Service Connect VPC CNI (pod IPs from VPC), service mesh (Istio, App Mesh)
Service Mesh ECS Service Connect (built-in) Istio, Linkerd, App Mesh, or VPC Lattice
Load Balancing ALB/NLB direct integration AWS Load Balancer Controller (ALB/NLB via ingress)
CI/CD CodeDeploy (blue/green), CodePipeline Flux, ArgoCD, Helm, CodePipeline, GitHub Actions
Observability Container Insights, X-Ray, FireLens Container Insights, Prometheus, Grafana, ADOT
Secrets Secrets Manager / Parameter Store integration Secrets Store CSI Driver, External Secrets
Windows Containers Supported (EC2 only) Supported (EC2 only)
GPU Workloads Supported Supported (better ecosystem for ML)
Ecosystem AWS-native tools Massive CNCF ecosystem (Helm, operators, CRDs)

When to Choose ECS

  • AWS-only deployment — no multi-cloud or on-premises Kubernetes needed.
  • Simplicity — smaller teams who want containers without Kubernetes complexity.
  • Cost-sensitive — no control plane fee ($73/month savings per cluster).
  • Tight AWS integration — native IAM task roles, Service Connect, CodeDeploy blue/green.
  • Getting started with containers — lower barrier to entry.
  • Fargate-first — ECS + Fargate is the simplest serverless container path.
  • Best for: Microservices on AWS, web apps, APIs, batch processing, startups.

When to Choose EKS

  • Kubernetes expertise exists — team already knows Kubernetes or uses it elsewhere.
  • Multi-cloud / hybrid — need portability to GKE, AKS, or on-premises (EKS Anywhere).
  • Rich ecosystem needed — Helm charts, operators, Istio, Argo, Prometheus, custom CRDs.
  • Complex scheduling — advanced pod placement, affinities, taints/tolerations, DaemonSets.
  • ML/AI workloads — better tooling for GPU scheduling, Kubeflow, Ray, distributed training.
  • Stateful workloads — StatefulSets, persistent volumes, operators for databases.
  • Regulatory requirements — some compliance frameworks mandate Kubernetes for container orchestration.
  • Best for: Platform teams, ML pipelines, multi-cloud strategies, complex microservices, ISVs.

EKS Auto Mode

  • Launched Dec 2024 — fully managed compute, networking, and storage for EKS.
  • AWS manages node provisioning, scaling, OS patching, and security updates.
  • Eliminates the need for managed node groups or self-managed nodes.
  • Combines the Kubernetes API with ECS-level operational simplicity.
  • Best for teams who want Kubernetes API compatibility without node management overhead.

Fargate: ECS vs EKS

  • Fargate works with both ECS and EKS — serverless compute for containers either way.
  • ECS on Fargate: Simpler configuration, native task definitions.
  • EKS on Fargate: Kubernetes pod spec, but with Fargate limitations (no DaemonSets, no privileged containers, no persistent volumes with EBS).
  • ECS Fargate supports more features (ephemeral storage up to 200 GiB, EFS, exec).

Decision Flowchart

  • Need Kubernetes API compatibility? → EKS
  • Need multi-cloud portability? → EKS
  • Team has no Kubernetes experience? → ECS
  • Want zero control plane cost? → ECS
  • Need Helm charts / operators / CRDs? → EKS
  • Just need to run containers simply? → ECS + Fargate
  • Want Kubernetes without node management? → EKS Auto Mode

AWS Certification Exam Practice Questions

  1. A startup with a small team wants to deploy containerized microservices on AWS with minimal operational overhead and no Kubernetes experience. They want serverless compute. Which option is most appropriate?
    1. EKS with managed node groups
    2. ECS with Fargate
    3. EKS with Fargate
    4. EKS Auto Mode
  2. A company runs Kubernetes on-premises and in GCP. They want to extend to AWS while maintaining the same Kubernetes manifests, Helm charts, and CI/CD pipelines. Which service should they use?
    1. ECS with EC2
    2. ECS with Fargate
    3. EKS
    4. App Runner
  3. An organization wants to run containers on Kubernetes but doesn’t want to manage nodes, patching, or scaling of the underlying compute. Which option provides this?
    1. ECS with Fargate
    2. EKS with self-managed nodes
    3. EKS Auto Mode
    4. ECS with EC2 and Capacity Providers
  4. A machine learning team needs to schedule GPU workloads with custom Kubernetes operators, use Kubeflow for training pipelines, and deploy models with Karpenter for cost-optimized scaling. Which is appropriate?
    1. ECS with GPU instances
    2. SageMaker
    3. EKS with EC2 (GPU) and Karpenter
    4. ECS with Fargate
  5. A company wants built-in service-to-service communication with automatic retries, circuit breaking, and observability without installing a service mesh like Istio. Which feature provides this on ECS?
    1. App Mesh
    2. VPC Lattice
    3. ECS Service Connect
    4. Cloud Map

Related Posts

References

Amazon ECS Developer Guide

Amazon EKS User Guide

EKS Auto Mode

AWS RDS vs DynamoDB – When to Use Which

AWS RDS vs DynamoDB

  • AWS offers both relational (RDS/Aurora) and NoSQL (DynamoDB) managed database services.
  • Choosing between them depends on data model, access patterns, scale requirements, and consistency needs.
  • Many architectures use both — RDS for transactional data and DynamoDB for high-scale, low-latency access patterns.

RDS vs DynamoDB Comparison

Feature Amazon RDS / Aurora Amazon DynamoDB
Type Relational (SQL) NoSQL (key-value / document)
Data Model Fixed schema, tables with relationships Flexible schema, single-table design
Query Language SQL (complex joins, aggregations) PartiQL / API (key-based access, limited filtering)
Scaling Vertical (instance size) + Read Replicas; Aurora Limitless for horizontal Horizontal (automatic, unlimited); on-demand or provisioned
Performance Depends on instance size; millisecond queries Single-digit millisecond at any scale; DAX for microsecond
Max Storage RDS: 64 TiB; Aurora: 128 TiB Unlimited
Consistency ACID transactions, strong consistency Eventually consistent (default); strong consistent reads available
Transactions Full SQL transactions (multi-table) TransactWriteItems/TransactGetItems (up to 100 items)
Joins Native SQL JOINs across tables No joins — denormalize or use application-side logic
Indexes B-tree, hash, GIN, full-text GSI (up to 20), LSI (up to 5)
HA / DR Multi-AZ (sync replication); Aurora Global Database Multi-AZ by default; Global Tables (multi-Region active-active)
Serverless Aurora Serverless v2 (scales to zero possible) On-demand mode (true pay-per-request, scales to zero)
Backup Automated snapshots, PITR (5-min granularity) Continuous backups, PITR (1-second granularity)
Pricing Model Instance hours + storage + I/O Read/Write capacity units or per-request
Maintenance Maintenance windows, patching required Zero maintenance, fully managed
Analytics Native SQL analytics, zero-ETL to Redshift Zero-ETL to Redshift/OpenSearch; export to S3

When to Choose RDS / Aurora

  • Complex queries — need SQL JOINs, aggregations, subqueries, window functions.
  • ACID transactions — financial systems, order processing, inventory management.
  • Existing relational schema — migrating from on-premises MySQL/PostgreSQL/Oracle/SQL Server.
  • Reporting workloads — need ad-hoc queries across multiple tables.
  • Normalized data model — relationships between entities are primary concern.
  • Moderate scale — up to hundreds of thousands of requests/second (Aurora).
  • Examples: E-commerce orders, banking, ERP, CMS with complex relationships.

When to Choose DynamoDB

  • Known access patterns — you can design a key schema around how data is queried.
  • Massive scale — millions of requests/second with consistent single-digit millisecond latency.
  • Simple key-value or document lookups — get/put by primary key.
  • Serverless / event-driven architectures — pairs naturally with Lambda.
  • Global applications — multi-Region active-active with Global Tables.
  • Variable/spiky traffic — on-demand mode handles any traffic without pre-provisioning.
  • Zero operational overhead — no patching, no maintenance windows.
  • Examples: Gaming leaderboards, session stores, IoT data, shopping carts, user profiles.

Using Both Together

  • Common pattern: RDS for source of truth (transactions) + DynamoDB for read-heavy access (caching, APIs).
  • Use DynamoDB Streams + Lambda to sync DynamoDB changes to RDS (or vice versa).
  • Use zero-ETL from Aurora/DynamoDB to Redshift for unified analytics.
  • Example: E-commerce — RDS for order processing, DynamoDB for product catalog and session data.

AWS Certification Exam Practice Questions

  1. A social media application needs to store user profiles and serve them with single-digit millisecond latency at millions of requests per second globally. Which database is most appropriate?
    1. Aurora PostgreSQL with read replicas
    2. DynamoDB with Global Tables
    3. RDS MySQL Multi-AZ
    4. ElastiCache Redis
  2. A financial application requires complex multi-table transactions where a transfer must debit one account and credit another atomically, with full SQL reporting capabilities. Which is most appropriate?
    1. DynamoDB with TransactWriteItems
    2. Aurora PostgreSQL with Multi-AZ
    3. DynamoDB with DAX
    4. Aurora Serverless with DynamoDB Streams
  3. A startup wants to minimize operational overhead with zero maintenance, pay only for actual usage, and handle unpredictable traffic spikes from 0 to 100K requests/second. Which option is best?
    1. Aurora Serverless v2
    2. DynamoDB On-Demand mode
    3. RDS with Auto Scaling read replicas
    4. Aurora with Provisioned capacity
  4. An analytics team needs to run complex ad-hoc SQL queries with JOINs across customer, order, and product tables without knowing access patterns in advance. Which is appropriate?
    1. DynamoDB with GSIs
    2. DynamoDB with PartiQL
    3. Aurora PostgreSQL
    4. DynamoDB export to S3 + Athena
  5. A gaming company stores player sessions in DynamoDB and order history in Aurora. They want unified analytics across both datasets without ETL pipelines. Which feature enables this?
    1. DynamoDB Streams to Aurora via Lambda
    2. Zero-ETL from both Aurora and DynamoDB to Redshift
    3. DynamoDB export to S3 + Aurora export to S3
    4. Federated queries in Athena

Related Posts

References

Amazon DynamoDB Developer Guide

Amazon Aurora User Guide

Amazon Q Business & Q Developer

🎓 Build AI Skills with Google
Learn practical AI skills and earn a Google Certificate. No experience required – learn at your own pace.
Start the Google AI Essentials Learning Path →

Amazon Q Business Overview

  • Amazon Q Business is a fully managed generative AI assistant for enterprises that can answer questions, provide summaries, generate content, and take actions based on enterprise data.
  • Connects to 40+ enterprise data sources (S3, SharePoint, Confluence, Slack, Salesforce, Google Drive, Gmail, Jira, ServiceNow, databases, and more).
  • Provides secure, accurate answers grounded in company data with citations and source attribution.
  • Respects existing access controls and permissions — users only see answers from data they’re authorized to access (ACL-aware).
  • Previously known as “Amazon Q for Business” — rebranded as Amazon Q Business (GA April 2024).

Key Features

  • Enterprise Search — natural language search across all connected data sources with ranked results.
  • Conversational AI — multi-turn conversations with context retention.
  • Content Generation — draft emails, reports, summaries, blog posts based on enterprise data.
  • Document Summarization — summarize long documents, meeting transcripts, and reports.
  • Actions & Plugins — perform tasks like creating Jira tickets, sending emails, updating Salesforce records.
  • Custom Plugins — build custom actions using OpenAPI schemas.
  • Amazon Q Apps — no-code app builder for creating lightweight gen-AI applications from natural language descriptions.
  • Data Insights — ask questions about structured data (databases, spreadsheets) with auto-generated visualizations.

Data Sources & Connectors

  • AWS Sources: S3, RDS, Aurora, Redshift, DynamoDB, WorkDocs
  • Collaboration: Slack, Microsoft Teams, Confluence, SharePoint Online, Google Drive
  • Productivity: Gmail, Outlook, OneDrive, Box, Dropbox
  • CRM/ITSM: Salesforce, ServiceNow, Zendesk
  • Development: Jira, GitHub, GitLab
  • Databases: PostgreSQL, MySQL, SQL Server, Oracle
  • Custom: Web Crawler, Custom connectors via API
  • Supports incremental sync — only indexes changed content.
  • 40+ pre-built connectors available.

Security & Access Control

  • ACL-aware retrieval — respects source system permissions (if a user can’t access a SharePoint doc, Q won’t use it in answers).
  • IAM Identity Center integration — SSO with corporate identity providers (Okta, Azure AD, Ping).
  • Encryption — data encrypted at rest (KMS, customer-managed keys supported) and in transit (TLS 1.2+).
  • VPC support — keep data connector traffic within VPC.
  • Admin controls — block specific topics, configure response behavior, set global/topic-level guardrails.
  • Audit logging — CloudTrail integration for all API calls.
  • Data retention — configurable conversation history retention.

Amazon Q Apps

  • No-code app builder — create gen-AI powered apps by describing them in natural language.
  • Apps can include: text generation, file upload, data queries, and custom actions.
  • App library — share apps across the organization.
  • Built on top of Q Business data and permissions.
  • Example apps: Meeting summarizer, FAQ generator, onboarding assistant, report builder.

Amazon Q Developer

  • Separate product from Q Business — focused on software development assistance.
  • Code generation — inline code suggestions in IDEs (VS Code, JetBrains, Visual Studio).
  • Code transformation — automated Java version upgrades (Java 8/11 → 17), .NET upgrades.
  • Chat — answer questions about AWS services, your codebase, and best practices.
  • /dev agent — implement features from natural language descriptions across multiple files.
  • /review — automated code review with security scanning.
  • AWS Console integration — troubleshoot errors, explain resources, generate CLI commands.
  • Customization — connect to private repositories for organization-specific suggestions.
  • Operational investigations — diagnose and resolve operational issues in AWS.

Amazon Q Business vs Amazon Bedrock

Feature Amazon Q Business Amazon Bedrock
Purpose Turnkey enterprise AI assistant Build custom gen-AI applications
Target User Business users, IT admins Developers, ML engineers
Customization Configure connectors, guardrails, plugins Full control: model selection, fine-tuning, agents, RAG
Data Integration 40+ pre-built connectors, automatic indexing Knowledge Bases (S3, web, custom), manual setup
Access Control ACL-aware (respects source permissions) IAM-based, manual implementation
Coding Required No (configuration only) Yes (API integration)
Use When Enterprise search, Q&A on company docs Custom AI apps, chatbots, content pipelines

Pricing

  • Q Business Lite — $3/user/month (search, Q&A, summaries).
  • Q Business Pro — $20/user/month (Lite features + plugins, actions, Q Apps, admin controls).
  • Q Developer Free Tier — limited code suggestions per month.
  • Q Developer Pro — $19/user/month (unlimited suggestions, /dev agent, code transformation).
  • Index pricing — based on document count and storage.

AWS Certification Exam Practice Questions

  1. A company wants employees to ask questions about internal policies, HR documents, and project wikis stored across SharePoint and Confluence. The solution must respect existing document permissions. Which service is most appropriate?
    1. Amazon Bedrock with Knowledge Bases
    2. Amazon Q Business
    3. Amazon Kendra
    4. Amazon Lex with Lambda
  2. A development team needs AI-assisted code reviews and the ability to automatically upgrade their Java 8 applications to Java 17. Which service provides this?
    1. Amazon Q Business
    2. Amazon CodeWhisperer
    3. Amazon Q Developer
    4. Amazon Bedrock Agents
  3. An organization using Amazon Q Business wants non-technical employees to create simple AI-powered apps (like a meeting summarizer) without writing code. Which feature enables this?
    1. Custom plugins
    2. Bedrock Flows
    3. Amazon Q Apps
    4. Bedrock Studio
  4. A security team needs to ensure that when employees use the company’s AI assistant, a junior analyst cannot receive answers from executive-level strategy documents they don’t have access to in SharePoint. How does Q Business handle this?
    1. Guardrails block the content category
    2. Admin manually configures document-level permissions
    3. ACL-aware retrieval automatically respects source system permissions
    4. IAM policies restrict Q Business API calls
  5. A company is choosing between Amazon Q Business and Amazon Bedrock Knowledge Bases for their internal document Q&A system. They have 40,000 employees, documents in 8 different systems, and no ML engineering team. Which should they choose?
    1. Bedrock Knowledge Bases (more flexible)
    2. Amazon Q Business (turnkey, 40+ connectors, no code needed)
    3. Amazon Kendra (purpose-built search)
    4. Custom RAG solution on EC2

Related Posts

References

Amazon Q Business User Guide

Amazon Q Developer User Guide

Amazon Q Business Product Page

Amazon Q Developer Product Page