App Engine vs Cloud Run vs GKE – Google Cloud Compute Services Compared
Overview
Google Cloud offers multiple compute platforms for running applications, each with different levels of abstraction, control, and operational overhead. App Engine, Cloud Run, and Google Kubernetes Engine (GKE) represent three distinct approaches to application hosting — from fully managed PaaS to container-native serverless to full Kubernetes orchestration.
This guide provides a comprehensive comparison to help you choose the right compute platform based on your application requirements, team expertise, and operational preferences.
Service Overview
App Engine
- App Engine is Google Cloud’s original Platform-as-a-Service (PaaS), launched in 2008.
- Designed for web applications and mobile backends with minimal infrastructure management.
- Offers two environments:
- Standard Environment — sandboxed runtime with specific language versions, scales to zero, free daily quotas.
- Flexible Environment — runs on Compute Engine VMs, supports custom runtimes via Docker, minimum 1 instance always running.
- Supports Python, Java, Node.js, Go, PHP, Ruby (Standard); any language via custom Docker images (Flexible).
- Provides built-in capabilities: legacy bundled services such as Memcache and Task Queues, Cron scheduling (cron.yaml), and integration with Identity-Aware Proxy.
- One application per project with multiple services and versions for traffic splitting.
- Google recommends Cloud Run for new serverless workloads; App Engine remains fully supported.
Cloud Run
- Cloud Run is a fully managed serverless platform for running stateless containers.
- Accepts any container image that listens for HTTP requests or processes events (via Eventarc).
- Scales automatically from zero to thousands of instances based on incoming traffic.
- Supports any programming language, library, or binary that can be containerized.
- 2025-2026 Key Features:
- GPU support (GA) — NVIDIA L4 GPUs with 24 GB vRAM, scale-to-zero for GPU instances.
- Multi-container sidecars — deploy helper containers alongside your main application container.
- Cloud Run functions — Cloud Functions (2nd gen) is now Cloud Run functions, unified on Cloud Run infrastructure.
- Always-on CPU allocation — option for background processing outside of request handling.
- Cloud Run Jobs — run containers to completion without serving requests.
- Direct VPC egress — connect to VPC resources without a Serverless VPC Access connector.
- Volume mounts — Cloud Storage FUSE and NFS support.
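- Billing: with request-based billing you pay only for resources consumed while handling requests; with instance-based (always-on CPU) billing you pay for each instance's full lifetime at a lower rate.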
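The basic contract Cloud Run expects from a service container is that it accepts HTTP requests on the port named by the PORT environment variable (8080 by default). The sketch below shows that contract using only the Python standard library. The handler class, its response body, and the serve() entrypoint are illustrative names, and a production service would normally run behind a proper WSGI/ASGI server such as gunicorn.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_port(env=None) -> int:
    """Port to listen on: Cloud Run injects PORT; fall back to 8080 locally."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


class HelloHandler(BaseHTTPRequestHandler):
    """Illustrative handler: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    # Listen on all interfaces: requests reach the container on 0.0.0.0:$PORT,
    # so binding only to 127.0.0.1 would make the service unreachable.
    server = HTTPServer(("0.0.0.0", resolve_port()), HelloHandler)
    server.serve_forever()
```

A container whose entrypoint calls serve() runs unchanged on Cloud Run, on GKE behind a Kubernetes Service, or locally.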
Google Kubernetes Engine (GKE)
- GKE is a managed Kubernetes service for deploying, managing, and scaling containerized applications.
- Provides full Kubernetes API access with Google-managed control plane.
- Offers two modes of operation:
- Autopilot — Google manages nodes, scaling, security; billed per pod resource request.
- Standard — you manage node pools, scaling policies; billed per node VM.
- 2025-2026 Key Features:
- GKE Enterprise tier — multi-cluster management, service mesh, advanced security.
- KEDA support — event-driven autoscaling including scale-to-zero for workloads.
- Flex CUD pricing (Jan 2026) — spend-based committed use discounts: 28% on 1-year, 46% on 3-year.
- Multi-instance GPUs — partition a single GPU into up to 7 slices for multiple containers.
- SCTP support — direct SCTP communication for Pod-to-Pod and Pod-to-Service traffic.
- AI/ML optimizations — powering AI workloads for top 50 Google Cloud customers.
- Supports stateful workloads, custom networking, service mesh, and complex multi-service architectures.
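The Flex CUD percentages above translate into simple arithmetic. This sketch assumes the commitment covers the whole eligible monthly spend (real spend-based commitments are sized as an hourly amount, and usage above it is billed on demand). The function name is hypothetical, and the rates are the ones quoted in this guide.

```python
# Discount rates quoted in this guide for spend-based Flex CUDs; confirm the
# current figures on the Google Cloud pricing pages before planning with them.
FLEX_CUD_DISCOUNTS = {"on-demand": 0.0, "1-year": 0.28, "3-year": 0.46}


def committed_monthly_cost(on_demand_monthly: float, term: str = "on-demand") -> float:
    """Monthly cost after a Flex CUD, assuming the commitment covers all eligible spend."""
    if term not in FLEX_CUD_DISCOUNTS:
        raise ValueError(f"unknown term {term!r}; expected one of {sorted(FLEX_CUD_DISCOUNTS)}")
    return round(on_demand_monthly * (1 - FLEX_CUD_DISCOUNTS[term]), 2)


for term in FLEX_CUD_DISCOUNTS:
    print(f"{term:>9}: ${committed_monthly_cost(1000.0, term):,.2f}")
```

Under these assumptions, a $1,000 on-demand monthly bill drops to $720 with a 1-year commitment and to $540 with a 3-year commitment.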
Detailed Comparison Table
| Feature | App Engine Standard | App Engine Flexible | Cloud Run | GKE |
|---|---|---|---|---|
| Type | PaaS (Serverless) | PaaS (VM-based) | CaaS (Serverless) | CaaS (Managed Kubernetes) |
| Abstraction Level | Highest — code only | High — code + custom runtimes | High — container images | Low — full Kubernetes control |
| Container Support | No — predefined runtimes only | Yes — custom Docker images | Yes — any OCI container | Yes — any OCI container + pods |
| Scale to Zero | Yes | No — minimum 1 instance | Yes | With KEDA (not native) |
| Autoscaling | Automatic (request-based) | Automatic (CPU-based) | Automatic (request concurrency/CPU) | HPA, VPA, Cluster Autoscaler, KEDA |
| Cold Starts | Yes — seconds (sandboxed runtime init) | Yes — minutes (VM provisioning) | Yes — sub-second to seconds | Minimal — pods always running |
| Max Request Timeout | 10 minutes (automatic scaling), 24h (basic/manual scaling) | 60 minutes | 60 minutes (services), 24h (jobs) | No limit |
| GPU Support | No | No | Yes — NVIDIA L4 (24 GB vRAM) | Yes — NVIDIA T4, L4, A100, H100 |
| Stateful Workloads | No | Limited | No (stateless by design) | Yes — StatefulSets, PersistentVolumes |
| Networking | App Engine firewall rules; Serverless VPC Access connector to reach VPC resources | VPC-native, custom network | VPC connector, Direct VPC egress, Internal Load Balancer | Full VPC-native, Network Policies, Service Mesh, Ingress/Gateway API |
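| Multi-Region | No (one region per app, fixed at creation; use separate projects) | No (one region per app, fixed at creation; use separate projects) | Yes — deploy to multiple regions | Yes — multi-cluster with GKE Enterprise |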
| CI/CD Integration | gcloud app deploy | gcloud app deploy | gcloud run deploy, Cloud Build, source deploy | kubectl, Cloud Deploy, Helm, ArgoCD |
| Traffic Splitting | Yes — version-based | Yes — version-based | Yes — revision-based | Yes — Istio/Gateway API |
| WebSockets | No | Yes | Yes | Yes |
| Operational Overhead | Minimal | Low | Minimal | Medium (Autopilot) to High (Standard) |
Architecture Comparison
App Engine Architecture
- Application-centric model: one App Engine application per Google Cloud project.
- Applications consist of services (microservices), each with multiple versions.
- Standard: runs in Google’s sandboxed infrastructure with language-specific runtimes.
- Flexible: runs on Compute Engine VMs in Docker containers managed by Google.
- Built-in services (Memcache, Task Queues) tightly integrated but create vendor lock-in.
- Automatic load balancing across instances within a service.
Cloud Run Architecture
- Container-centric model: each service runs one or more container instances.
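- Implements the Knative Serving API (Knative is an open-source, Kubernetes-based serverless platform), so service definitions are largely portable to Knative on Kubernetes.
- Fully stateless: each request can be served by any instance.
- Supports multi-container instances (sidecars) for logging agents, proxies, or helper services.
- Instances are ephemeral: the writable filesystem is in-memory (it counts against the instance's memory) and lasts only for the instance's lifetime, regardless of CPU allocation, so durable state belongs in external services or mounted Cloud Storage/NFS volumes.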
- Supports both HTTP services (request-driven) and Cloud Run Jobs (task-driven, run to completion).
- Integrated with Eventarc for event-driven architectures (Pub/Sub, Cloud Storage, Firestore triggers).
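For Cloud Run Jobs, each task in an execution receives the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables, which is how a job splits work across parallel tasks. Because instances are stateless and ephemeral, each task must be able to work out its share independently. A minimal sketch follows; task_shard is a hypothetical helper, and round-robin assignment is just one possible way to split the work.

```python
import os


def task_shard(items, env=None):
    """Return the slice of `items` this Cloud Run Jobs task should process.

    Cloud Run Jobs sets CLOUD_RUN_TASK_INDEX (0-based) and CLOUD_RUN_TASK_COUNT
    for every task; the defaults let the same code run locally as one task.
    """
    env = os.environ if env is None else env
    index = int(env.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(env.get("CLOUD_RUN_TASK_COUNT", "1"))
    if not 0 <= index < count:
        raise ValueError(f"task index {index} out of range for {count} tasks")
    # Round-robin: task i handles items i, i + count, i + 2 * count, ...
    return items[index::count]


# Task 1 of 3 takes every third item starting at position 1.
print(task_shard(list(range(10)), {"CLOUD_RUN_TASK_INDEX": "1", "CLOUD_RUN_TASK_COUNT": "3"}))
```

Because the split depends only on the index and the count, a task that is retried recomputes exactly the same share without coordinating with the other tasks.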
GKE Architecture
- Cluster-centric model: control plane (Google-managed) + worker nodes (your VMs or Autopilot-managed).
- Full Kubernetes primitives: Pods, Deployments, Services, Ingress, ConfigMaps, Secrets, StatefulSets.
- Supports complex multi-service architectures with service discovery and internal networking.
- Autopilot mode: Google provisions and manages nodes per-pod; no node management overhead.
- Standard mode: you manage node pools, machine types, and scaling policies.
- Supports service mesh (Istio/Anthos Service Mesh) for observability, traffic management, and security.
- Native support for stateful workloads via PersistentVolumes and StatefulSets.
Scaling Comparison
Scale-to-Zero
| Service | Scale-to-Zero | Details |
|---|---|---|
| App Engine Standard | ✅ Yes (default) | Scales to 0 instances when no traffic. Configurable via min_instances in app.yaml. |
| App Engine Flexible | ❌ No | Minimum 1 instance always running (default minimum is 2 for redundancy). |
| Cloud Run | ✅ Yes (default) | Scales to 0 automatically. Set min-instances > 0 to keep warm instances. |
| GKE | ⚠️ With KEDA | Not native. Use KEDA (event-driven autoscaler) to scale pods to zero. Cluster Autoscaler can scale node pools to zero. |
Autoscaling Behavior
- App Engine Standard: scales based on request rate. Automatic scaling uses target CPU utilization, max concurrent requests, and target throughput. Supports basic and manual scaling types.
- App Engine Flexible: scales based on CPU utilization (target_utilization defaults to 0.5, i.e. 50%). Scaling is slower because new VMs must be provisioned.
- Cloud Run: scales based on request concurrency per instance (maximum concurrency defaults to 80) and CPU utilization. Maximum instances default to 100 and can be raised to 1,000 or more, quota permitting. Scaling is fast (seconds).
- GKE: multi-layered autoscaling:
- HPA (Horizontal Pod Autoscaler) — scales pods based on CPU, memory, or custom metrics.
- VPA (Vertical Pod Autoscaler) — adjusts pod resource requests/limits.
- Cluster Autoscaler — adds/removes nodes based on pending pod requests.
- KEDA — event-driven autoscaling from external sources (Pub/Sub, database queues, HTTP).
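Two of these scaling rules reduce to short formulas. The HPA computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) and skips changes when the ratio is within a tolerance (0.1 by default). For Cloud Run, Little's law (in-flight requests = arrival rate × latency) divided by per-instance concurrency gives a rough floor on instance count. Both functions are illustrative sketches. The Cloud Run estimate in particular is an approximation under stated assumptions, not Google's autoscaling algorithm, which also considers CPU utilization.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core Kubernetes HPA rule for a single metric.

    desired = ceil(current_replicas * current_metric / target_metric), where
    current_metric is the average across pods. Ratios within `tolerance` of
    1.0 (0.1 by default in Kubernetes) leave the replica count unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the HorizontalPodAutoscaler's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


def cloud_run_instance_floor(requests_per_second: float, avg_latency_s: float,
                             max_concurrency: int = 80) -> int:
    """Rough lower bound on Cloud Run instances for steady traffic (an approximation).

    Little's law: in-flight requests = arrival rate * average latency.
    Each instance serves at most `max_concurrency` requests at once; the real
    autoscaler also reacts to CPU utilization, so it may run more instances.
    """
    in_flight = requests_per_second * avg_latency_s
    return math.ceil(in_flight / max_concurrency)


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
print(cloud_run_instance_floor(400, 0.5))  # 3
```

At zero traffic the Cloud Run floor is zero instances, which matches its default scale-to-zero behavior. The HPA, by contrast, never drops below minReplicas.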
Cold Start Comparison
- App Engine Standard: cold starts range from 1-10 seconds depending on runtime and application size. Mitigate with min_idle_instances (warm instances at extra cost).
- App Engine Flexible: cold starts can take 1-3 minutes (VM provisioning). Keep minimum instances running to avoid them.
- Cloud Run: cold starts typically sub-second to 5 seconds. Depends on container image size and startup time. Mitigate with minimum instances, startup CPU boost, or smaller container images.
- GKE: minimal cold starts for always-running pods. New pod scheduling: seconds (if node capacity exists) to minutes (if new node provisioning needed). Autopilot may have slightly higher pod scheduling latency.
Pricing Comparison
| Service | Pricing Model | Free Tier | Key Cost Factors |
|---|---|---|---|
| App Engine Standard | Instance hours (per instance class) | Yes — 28 instance-hours/day (F1 class) | Instance class (F1-F4_1G), instance hours, outgoing network |
| App Engine Flexible | vCPU, memory, persistent disk per hour | No | vCPU-hours, GB-hours, persistent disk, always-on instances |
| Cloud Run | Per-request (vCPU-second, GiB-second, requests) | Yes — 2M requests, 180K vCPU-seconds, 360K GiB-seconds/month | vCPU-seconds (~$0.000024/s), GiB-seconds (~$0.0000025/s), requests ($0.40/million) |
| GKE Autopilot | Per-pod resource requests (vCPU, memory, ephemeral storage) | $74.40/month cluster credit | Pod vCPU/memory requests, cluster management fee ($0.10/hr), Flex CUDs available |
| GKE Standard | Per-node VM (Compute Engine pricing) | $74.40/month cluster credit (one free zonal cluster) | Node VM costs, cluster management fee ($0.10/hr), you pay for full nodes regardless of utilization |
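The per-unit prices in the table can be turned into a rough monthly estimate. This sketch assumes the Tier 1 request-based prices listed above, Google's published monthly Cloud Run free tier (2 million requests, 180,000 vCPU-seconds, 360,000 GiB-seconds), and that each request is billed on its own. Concurrent requests on one instance share billable time, so real bills are typically lower. The sketch also checks the GKE fee arithmetic: $0.10/hr over a 744-hour (31-day) month is exactly the $74.40 credit. The function names are hypothetical.

```python
# Assumed Tier 1 request-based prices and monthly free tier (USD); prices vary
# by region and change over time, so check the Cloud Run pricing page.
VCPU_SECOND = 0.000024
GIB_SECOND = 0.0000025
PER_MILLION_REQUESTS = 0.40
FREE_VCPU_SECONDS = 180_000
FREE_GIB_SECONDS = 360_000
FREE_REQUESTS = 2_000_000


def cloud_run_monthly_cost(requests: int, avg_billable_seconds: float,
                           vcpu: float = 1.0, memory_gib: float = 0.5) -> float:
    """Upper-bound estimate: every request billed as if it ran alone on an instance."""
    vcpu_seconds = requests * avg_billable_seconds * vcpu
    gib_seconds = requests * avg_billable_seconds * memory_gib
    cost = (
        max(0.0, vcpu_seconds - FREE_VCPU_SECONDS) * VCPU_SECOND
        + max(0.0, gib_seconds - FREE_GIB_SECONDS) * GIB_SECOND
        + max(0, requests - FREE_REQUESTS) / 1_000_000 * PER_MILLION_REQUESTS
    )
    return round(cost, 2)


def gke_management_fee(cluster_hours: float = 744, hourly_fee: float = 0.10,
                       free_credit: float = 74.40) -> float:
    """Management fee after the monthly credit; cluster_hours summed across clusters."""
    return round(max(0.0, cluster_hours * hourly_fee - free_credit), 2)


print(cloud_run_monthly_cost(10_000_000, 0.2))  # 48.48
print(gke_management_fee(cluster_hours=744))  # 0.0
print(gke_management_fee(cluster_hours=2 * 744))  # 74.4
```

In the Cloud Run example, 10 million 200 ms requests at 1 vCPU and 0.5 GiB come to about $43.68 for CPU, $1.60 for memory, and $3.20 in request fees once the free tier is subtracted.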
Cost Optimization Tips
- App Engine Standard: use scale-to-zero for low-traffic apps; leverage free tier quotas.
- Cloud Run: use request-based billing (“CPU only allocated during request processing”) for bursty traffic; switch to instance-based billing (“CPU always allocated”), which has roughly 25% lower per-vCPU-second rates and no per-request fee, for steady traffic.
- GKE Autopilot: right-size pod resource requests; use Flex CUDs for predictable workloads (28% savings on 1-year, 46% on 3-year).
- GKE Standard: use Spot VMs for fault-tolerant workloads (60-91% discount); right-size node pools; enable Cluster Autoscaler.
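Whether instance-based billing pays off depends on how busy each instance is. The sketch below assumes steady traffic keeps one instance (1 vCPU, 0.5 GiB) alive for a 30-day month, ignores free tiers, and uses Tier 1 list prices believed current at the time of writing: $0.000024/vCPU-s, $0.0000025/GiB-s, and $0.40 per million requests for request-based billing; $0.000018/vCPU-s and $0.000002/GiB-s with no request fee for instance-based billing. Treat these prices as assumptions and check them against the pricing page. The function names are hypothetical.

```python
# Assumed Tier 1 list prices in USD; verify against the current Cloud Run pricing page.
REQUEST_BASED = {"vcpu_s": 0.000024, "gib_s": 0.0000025, "per_million_req": 0.40}
INSTANCE_BASED = {"vcpu_s": 0.000018, "gib_s": 0.000002, "per_million_req": 0.0}

SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_cost(prices, billable_seconds, requests, vcpu=1.0, memory_gib=0.5):
    """Cost of one instance's billable time plus any per-request fees."""
    per_second = vcpu * prices["vcpu_s"] + memory_gib * prices["gib_s"]
    return billable_seconds * per_second + requests / 1_000_000 * prices["per_million_req"]


def cheaper_billing_mode(busy_fraction, requests, vcpu=1.0, memory_gib=0.5):
    """Compare modes for one instance that stays up all month.

    busy_fraction: share of the month with at least one request in flight.
    Request-based billing charges only that busy time (plus request fees);
    instance-based billing charges the whole lifetime at the lower rate.
    """
    request_cost = monthly_cost(REQUEST_BASED, busy_fraction * SECONDS_PER_MONTH,
                                requests, vcpu, memory_gib)
    instance_cost = monthly_cost(INSTANCE_BASED, SECONDS_PER_MONTH,
                                 requests, vcpu, memory_gib)
    mode = "request-based" if request_cost <= instance_cost else "instance-based"
    return mode, round(request_cost, 2), round(instance_cost, 2)


for busy in (0.2, 0.9):
    print(busy, cheaper_billing_mode(busy, requests=10_000_000))
```

With these assumptions, the per-second rates put the break-even at roughly 75% busy time before request fees. Counting the per-request fee lowers that threshold further.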
Use Cases — When to Choose Each
Choose App Engine When:
- Building simple web applications or mobile backends with minimal operational overhead.
- You want built-in services (Memcache, Task Queues, Cron) without additional setup.
- Your team prefers deploying code directly without containerization.
- You need a free tier for development/low-traffic applications (Standard environment).
- You have an existing App Engine application and the migration cost to Cloud Run is not justified.
- You need version-based traffic splitting for A/B testing.
Choose Cloud Run When:
- You want serverless simplicity with full container flexibility (any language, any library).
- Your workload is stateless and request-driven (APIs, web apps, webhooks, event processing).
- You need scale-to-zero to minimize costs during idle periods.
- You want to avoid vendor lock-in — containers are portable to any platform.
- You need GPU acceleration for AI/ML inference with cost-efficient scale-to-zero.
- You’re building event-driven microservices with Pub/Sub, Cloud Storage, or Firestore triggers.
- You want fast deployments without managing infrastructure or Kubernetes clusters.
- You’re migrating from Cloud Functions or App Engine to a more flexible platform.
Choose GKE When:
- You need full Kubernetes capabilities: StatefulSets, DaemonSets, custom operators, CRDs.
- Your application requires complex networking: Network Policies, service mesh, custom ingress.
- You’re running stateful workloads (databases, message queues, caches) alongside stateless services.
- You need fine-grained control over resource allocation, scheduling, and placement.
- Your team has Kubernetes expertise and needs multi-cluster management.
- You’re running large-scale ML training workloads requiring multi-GPU nodes (A100, H100).
- You need long-running processes, batch jobs, or background workers without request timeouts.
- You require hybrid/multi-cloud deployment with consistent Kubernetes APIs (GKE Enterprise).
Decision Flowchart
Quick Decision Guide
- Do you need Kubernetes-specific features? (StatefulSets, DaemonSets, service mesh, custom operators) → GKE
- Is your workload stateless and request-driven? → Consider Cloud Run
- Do you need scale-to-zero with container flexibility? → Cloud Run
- Do you prefer deploying code without containers? → App Engine Standard
- Do you need GPUs with serverless scaling? → Cloud Run (single L4) or GKE (multi-GPU)
- Do you need complex multi-service architectures with inter-service communication? → GKE
- Do you want minimal operational overhead for a new project? → Cloud Run (recommended for most new workloads)
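The quick decision guide can be encoded as an ordered set of checks in which the first matching rule wins. This is a hypothetical helper that simplifies the guide (the Workload fields are illustrative names, not an API). Its main use is making the precedence explicit: Kubernetes-only requirements and multi-GPU needs point to GKE before any serverless option is considered.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Illustrative requirement flags mirroring the questions in the guide.
    needs_kubernetes_features: bool = False  # StatefulSets, DaemonSets, operators, service mesh
    stateful: bool = False                   # databases, queues, caches run in-cluster
    complex_inter_service_networking: bool = False
    gpus_needed: int = 0
    containerized: bool = True


def recommend_platform(w: Workload) -> str:
    """Return the first platform whose rule in the quick decision guide matches."""
    if w.needs_kubernetes_features or w.stateful or w.complex_inter_service_networking:
        return "GKE"
    if w.gpus_needed > 1:
        return "GKE"  # Cloud Run attaches at most one GPU per instance
    if w.gpus_needed == 1:
        return "Cloud Run"  # serverless GPU with scale-to-zero
    if not w.containerized:
        return "App Engine Standard"  # deploy code without building containers
    return "Cloud Run"  # default recommendation for new stateless workloads


print(recommend_platform(Workload()))  # Cloud Run
print(recommend_platform(Workload(stateful=True)))  # GKE
print(recommend_platform(Workload(containerized=False)))  # App Engine Standard
```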
Migration Paths
- App Engine → Cloud Run: Google provides an official App Engine to Cloud Run migration guide. Cloud Run supports similar traffic splitting, custom domains, and IAM integration. Key difference: your application must run as a container, although gcloud run deploy --source can build the image with Google Cloud's buildpacks, so you do not always need to write a Dockerfile.
- Cloud Run → GKE: Cloud Run implements the Knative Serving API, so the same container images can run on GKE with Knative Serving, or as standard Kubernetes Deployments for more control.
- GKE → Cloud Run: For stateless HTTP services, extract individual microservices from GKE and deploy as Cloud Run services. Reduces operational overhead for simple services.
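During a migration between these platforms, code that checks App Engine-specific environment variables (for example, to detect production) needs updating, because Cloud Run sets a different set of variables. The best-effort sketch below tells the platforms apart by variables each one sets: GAE_SERVICE on App Engine, K_SERVICE on Cloud Run services (and Knative Serving), CLOUD_RUN_JOB on Cloud Run jobs, and KUBERNETES_SERVICE_HOST in Kubernetes pods. The function and its return labels are hypothetical, and explicit configuration is preferable to environment sniffing where possible.

```python
import os


def detect_platform(env=None) -> str:
    """Best-effort guess at the hosting platform from well-known environment variables."""
    env = os.environ if env is None else env
    if "GAE_SERVICE" in env:
        return "app-engine"
    if "K_SERVICE" in env:
        # Also set by Knative Serving on GKE, which is checked before plain Kubernetes.
        return "cloud-run"
    if "CLOUD_RUN_JOB" in env:
        return "cloud-run-job"
    if "KUBERNETES_SERVICE_HOST" in env:
        return "kubernetes"
    return "local"


print(detect_platform({"K_SERVICE": "api"}))  # cloud-run
```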
Google Cloud Certification Exam Tips
- Selecting the right compute platform based on requirements (stateless vs stateful, scaling needs, team expertise).
- Understanding scale-to-zero capabilities and cost implications.
- Knowing the difference between App Engine Standard and Flexible environments.
- GKE Autopilot vs Standard mode selection criteria.
- Cloud Run’s container-based serverless model vs App Engine’s PaaS model.
Practice Questions
A startup is building a REST API that receives sporadic traffic — zero requests for hours, then sudden bursts of thousands of requests. The team wants to minimize costs during idle periods while handling bursts without manual intervention. They use Python with custom native libraries. Which compute option is MOST cost-effective?
- A. App Engine Standard environment
- B. App Engine Flexible environment
- C. Cloud Run
- D. GKE Autopilot
Answer: C — Cloud Run scales to zero (no cost when idle), handles bursts automatically, supports any container (custom native libraries), and charges only per-request. App Engine Standard also scales to zero but may not support custom native libraries. GKE Autopilot still incurs cluster management fees even when idle.
A company runs a large e-commerce platform with 50+ microservices, some stateful (Redis, PostgreSQL), requiring service mesh for inter-service mTLS, network policies for isolation, and custom Kubernetes operators for database failover. Which platform should they use?
- A. Cloud Run with multiple services
- B. App Engine Flexible with multiple services
- C. GKE Standard with Anthos Service Mesh
- D. Cloud Run with VPC connector
Answer: C — GKE Standard provides full Kubernetes capabilities including StatefulSets for databases, Network Policies for isolation, service mesh for mTLS, and custom operators. Cloud Run doesn’t support stateful workloads or custom operators.
A data science team needs to deploy an ML inference model that receives 100-500 requests per day. The model requires GPU acceleration and the team wants to avoid paying for idle GPU resources. Which is the BEST option?
- A. GKE Standard with GPU node pool
- B. Cloud Run with GPU support
- C. App Engine Flexible with custom runtime
- D. GKE Autopilot with GPU pods
Answer: B — Cloud Run with GPU support (NVIDIA L4, 24 GB vRAM) provides scale-to-zero for GPU instances, eliminating idle GPU costs. For 100-500 requests/day, this is significantly cheaper than maintaining a GKE GPU node pool that runs 24/7.
A developer is building a simple web application using a standard Python/Flask stack with no custom native dependencies. The application has low, consistent traffic and the developer wants the simplest deployment with minimal configuration. Which platform requires the LEAST operational effort?
- A. GKE Autopilot
- B. Cloud Run
- C. App Engine Standard environment
- D. App Engine Flexible environment
Answer: C — App Engine Standard requires the least effort: deploy with gcloud app deploy using an app.yaml file. No Dockerfile, no container registry, no Kubernetes manifests needed. Includes a free tier for low-traffic apps.
A company wants to migrate their existing containerized microservices from on-premises Kubernetes to Google Cloud. They want to maintain Kubernetes compatibility for potential multi-cloud deployment while reducing operational overhead. They don't need to manage node pools or OS patching. Which option is BEST?
- A. Cloud Run
- B. GKE Standard
- C. GKE Autopilot
- D. App Engine Flexible
Answer: C — GKE Autopilot provides full Kubernetes API compatibility (multi-cloud portability) while Google manages nodes, OS patching, and scaling. It reduces operational overhead compared to Standard mode while maintaining Kubernetes features the team already uses.
Frequently Asked Questions
What is the difference between App Engine and Cloud Run?
App Engine is a PaaS with opinionated runtimes and built-in versioning/traffic splitting. Cloud Run is a serverless container platform that runs any container image with per-request pricing and scale-to-zero. Cloud Run offers more flexibility while App Engine provides simpler deployment for supported languages.
Can GKE scale to zero?
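GKE does not natively scale workloads to zero pods: the Horizontal Pod Autoscaler keeps at least one replica by default. In Standard mode, Cluster Autoscaler can scale individual node pools down to zero nodes, but the cluster management fee still applies. In Autopilot mode you pay for running pods' resource requests (plus the management fee), so a cluster with no workload pods incurs no per-pod charges. To scale the workloads themselves to zero, use KEDA or Knative Serving on GKE.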
Should I migrate from App Engine to Cloud Run?
Google recommends Cloud Run for new applications. Migration makes sense if you need custom containers, faster cold starts, more pricing flexibility, or GPU support. App Engine remains fully supported and is simpler for basic web apps in supported runtimes.