Google Kubernetes Engine – GKE

Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and scaling containerized applications using Google infrastructure.

GKE is available in two editions:
- GKE Standard edition – core GKE functionality including cluster management, autoscaling, release channels, fleet management, Config Management, and Policy Controller at no additional cost.
- GKE Enterprise edition – adds advanced security, compliance insights, Binary Authorization, service mesh, multi-cluster management, and richer networking features for enterprise-scale operations.

Standard vs Autopilot Cluster

Autopilot (Recommended – Default mode since 2023)
- Provides a fully provisioned and managed cluster configuration.
- Cluster configuration choices are made for you.
- Autopilot clusters are pre-configured with an optimized cluster configuration that is ready for production workloads.
- GKE manages the entire underlying infrastructure of the clusters, including the control plane, nodes, and all system components.
- Applies security best practices by default including hardened node configuration, automatic security patching, and default seccomp profiles.
- Billing is based on Pod resource requests (CPU, memory, ephemeral storage) rather than node-level resources.
- Uses a container-optimized compute platform (introduced 2025) that delivers up to 85% faster provisioning speed and improved autoscaling performance.
- Supports ComputeClasses (Balanced, Scale-Out, custom) to let workloads specify hardware requirements like GPUs, high-memory, or accelerator-optimized nodes.
- In 2024, 30% of all active GKE clusters were created in Autopilot mode.
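
To make the billing point above concrete, the following sketch totals the Pod resource requests that Autopilot billing is based on. The `PodRequest` structure, Pod names, and request values are hypothetical, and no prices are modeled; it only illustrates that the billable quantity follows the sum of requests rather than node size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodRequest:
    """Resource requests of one Pod (hypothetical structure for illustration)."""
    name: str
    cpu_millicores: int          # e.g. 500 means 0.5 vCPU
    memory_mib: int
    ephemeral_storage_mib: int


def billable_totals(pods):
    """Sum the requested CPU, memory, and ephemeral storage across Pods."""
    return {
        "cpu_millicores": sum(p.cpu_millicores for p in pods),
        "memory_mib": sum(p.memory_mib for p in pods),
        "ephemeral_storage_mib": sum(p.ephemeral_storage_mib for p in pods),
    }


# Made-up example workloads.
pods = [
    PodRequest("web", 500, 1024, 1024),
    PodRequest("worker", 1000, 2048, 1024),
]
print(billable_totals(pods))
```

In this model, moving the same Pods onto larger or smaller nodes leaves the totals unchanged, which is the practical difference from node-based billing.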

Standard
- Provides advanced configuration flexibility over the cluster’s underlying infrastructure.
- Cluster configurations needed for the production workloads are determined by you.
- You manage node pools, machine types, scaling policies, and node-level security.
- Supports Autopilot mode workloads in Standard clusters – allows deploying ComputeClasses and letting GKE auto-create/manage node pools for specific workloads while retaining Standard cluster control for others.

Zonal vs Regional Cluster

Zonal clusters
- Zonal clusters have a single control plane in a single zone.
- Depending on the availability requirements, nodes for the zonal cluster can be distributed in a single zone or in multiple zones.
- Single-zone clusters
  - Control Plane -> Single Zone & Workers -> Single Zone
  - A single-zone cluster has a single control plane running in one zone.
  - Control plane manages workloads on nodes running in the same zone.
- Multi-zonal clusters
  - Control Plane -> Single Zone & Workers -> Multi-Zone
  - A multi-zonal cluster has a single replica of the control plane running in a single zone and has nodes running in multiple zones.
  - During an upgrade of the cluster or an outage of the zone where the control plane runs, workloads still run. However, the cluster, its nodes, and its workloads cannot be configured until the control plane is available.
  - Multi-zonal clusters balance availability and cost for consistent workloads.

Regional clusters
- Control Plane -> Multi Zone & Workers -> Multi-Zone
- A regional cluster has multiple replicas of the control plane, running in multiple zones within a given region.
- Nodes also run in each zone where a replica of the control plane runs.
- Because a regional cluster replicates the control plane and nodes, it consumes more Compute Engine resources than a similar single-zone or multi-zonal cluster.
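
The three layouts described above differ only in where the control plane and the nodes run. As a rough sketch, the helper below (a hypothetical function, not a GKE API) classifies a cluster from the two zone lists.

```python
def classify_cluster(control_plane_zones, node_zones):
    """Classify a cluster by where its control plane and nodes run."""
    cp = set(control_plane_zones)
    nodes = set(node_zones)
    if not cp or not nodes:
        raise ValueError("both zone lists must be non-empty")
    if len(cp) > 1:
        # Control plane replicated across zones.
        return "regional"
    if len(nodes) > 1:
        # Single control plane, nodes spread over several zones.
        return "multi-zonal"
    return "single-zone"


print(classify_cluster(["us-central1-a"], ["us-central1-a", "us-central1-b"]))
```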

Route-Based Cluster vs VPC-Native Cluster

VPC-native clusters (using Alias IPs) are the default and recommended networking mode.
VPC-native mode is always on for Autopilot clusters and cannot be turned off.
Route-based clusters require explicitly disabling the VPC-native option and are not recommended for new deployments.

VPC-native clusters offer better scalability (not subject to route quotas), native integration with VPC features, and support for Private Google Access.

Refer to the blog post on Google Kubernetes Engine Networking.

Private Cluster

Private clusters isolate nodes from inbound and outbound connectivity to the public internet by giving nodes internal IP addresses only.
External clients can still reach services exposed through a load balancer by calling the load balancer’s external IP address.

Cloud NAT or a self-managed NAT gateway can provide outbound internet access for certain private nodes.
By default, Private Google Access is enabled, which provides private nodes and their workloads with limited outbound access to Google Cloud APIs and services over Google’s private network.
The defined VPC network contains the cluster nodes, and a separate Google Cloud VPC network contains the cluster’s control plane.

The control plane’s VPC network is located in a project controlled by Google. The control plane’s VPC network is connected to the cluster’s VPC network with VPC Network Peering. Traffic between nodes and the control plane is routed entirely using internal IP addresses.
The control plane for a private cluster has a private endpoint in addition to a public endpoint.
Access to the control plane’s public endpoint can be set to one of three levels:
- Public endpoint access disabled
  - Most secure option as it prevents all internet access to the control plane.
  - The cluster can be accessed through a bastion host (jump server), or from an on-premises network connected to Google Cloud over Cloud Interconnect or Cloud VPN.
  - Authorized networks must be configured for the private endpoint, and they must be internal IP addresses.
- Public endpoint access enabled, authorized networks enabled
  - Provides restricted access to the control plane from defined source IP addresses.
- Public endpoint access enabled, authorized networks disabled
  - Default and least restrictive option.
  - Publicly accessible from any source IP address as long as you authenticate.
Nodes always contact the control plane using the private endpoint.
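
The three access levels above can be read as a simple network-level decision. The sketch below models only that decision for the public endpoint; the function name and the example CIDR ranges (documentation address blocks) are illustrative assumptions, and authentication still applies in every case where the network check passes.

```python
import ipaddress


def public_endpoint_allows(source_ip, public_endpoint_enabled,
                           authorized_networks_enabled, authorized_cidrs=()):
    """Return True if source_ip may reach the public endpoint (network check only)."""
    if not public_endpoint_enabled:
        # Public endpoint access disabled: no internet access to the control plane.
        return False
    if not authorized_networks_enabled:
        # Least restrictive option: any source IP, as long as the caller authenticates.
        return True
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in authorized_cidrs)


# Hypothetical authorized range taken from a documentation address block.
print(public_endpoint_allows("203.0.113.5", True, True, ["203.0.113.0/24"]))
```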

Shared VPC Clusters

Shared VPC supports both zonal and regional clusters.
Shared VPC clusters must be VPC-native, with Alias IPs enabled. Legacy networks are not supported.

Node Pools

A node pool is a group of nodes within a cluster that all have the same configuration and are identical to one another.
Node pools use a NodeConfig specification.
Each node in the pool has a cloud.google.com/gke-nodepool Kubernetes node label, which has the node pool’s name as its value.

The number and type of nodes specified during cluster creation become the default node pool. Additional custom node pools of different sizes and types can be added to the cluster, e.g. with local SSDs, GPUs, Spot VMs, or different machine types.
Node pools can be created, upgraded, and deleted individually without affecting the whole cluster. However, a single node in a node pool cannot be configured; any configuration changes affect all nodes in the node pool.
You can resize node pools in a cluster by adding or removing nodes using gcloud container clusters resize CLUSTER_NAME --node-pool POOL_NAME --num-nodes NUM_NODES
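
If the resize is scripted, the command quoted above can be assembled as an argument list. The sketch below only builds and prints the command; the cluster and pool names are placeholders, and a real invocation may also need location or project flags depending on your gcloud configuration.

```python
import shlex


def resize_command(cluster_name, pool_name, num_nodes):
    """Build the argument list for the resize command quoted in the text."""
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return [
        "gcloud", "container", "clusters", "resize", cluster_name,
        "--node-pool", pool_name,
        "--num-nodes", str(num_nodes),
    ]


# Placeholder names for illustration only.
args = resize_command("demo-cluster", "high-mem-pool", 5)
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would execute it without going through a shell, which avoids quoting problems with unusual names.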

Existing node pools can be manually upgraded or automatically upgraded.
For a multi-zonal or regional cluster, all of the node pools are replicated to those zones automatically. Any new node pool is automatically created in those zones, and deleting a node pool removes it from all of them.
GKE drains all the nodes in the node pool when a node pool is deleted.

Spot VMs (the replacement for Preemptible VMs) can be used in node pools for fault-tolerant workloads, with cost savings of 60-91%.
Node pool auto-creation (formerly Node Auto-Provisioning/NAP) allows GKE to automatically create and delete node pools based on workload requirements and ComputeClass specifications.

Cluster Autoscaler

GKE’s cluster autoscaler automatically resizes the number of nodes in a given node pool, based on the demands of the workloads.

Once you specify the minimum and maximum size of the node pool, cluster autoscaler works automatically and requires no manual intervention.
Cluster autoscaler increases or decreases the size of the node pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on that node pool’s nodes.
- If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool.
- If nodes are under-utilized, and all Pods could be scheduled even with fewer nodes in the node pool, cluster autoscaler removes nodes, down to the minimum size of the node pool. If the node cannot be drained gracefully after a timeout period (currently 10 minutes – not configurable), the node is forcibly terminated.
Before enabling cluster autoscaler, design the workloads to tolerate potential disruption or ensure that critical Pods are not interrupted.
Workloads might experience transient disruption with autoscaling, especially those running with a single replica.
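
The request-driven sizing described above can be approximated with a toy model. This is a deliberate simplification: it divides summed CPU requests by a per-node capacity and clamps the result to the pool limits, whereas the real cluster autoscaler evaluates whether individual Pods are schedulable and considers more than CPU. All names and numbers are illustrative.

```python
import math


def target_node_count(pod_cpu_requests_m, node_allocatable_cpu_m, min_nodes, max_nodes):
    """Estimate node count from summed CPU requests, clamped to the pool limits."""
    if node_allocatable_cpu_m <= 0:
        raise ValueError("node_allocatable_cpu_m must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Sizing follows requests, not observed utilization.
    needed = math.ceil(sum(pod_cpu_requests_m) / node_allocatable_cpu_m)
    return max(min_nodes, min(max_nodes, needed))


# Ten Pods requesting 500m CPU each on nodes with 2000m allocatable CPU.
print(target_node_count([500] * 10, 2000, min_nodes=1, max_nodes=5))
```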

With Autopilot clusters, you don’t need to configure cluster autoscaler because node pools are automatically provisioned and scaled to meet workload requirements.

ComputeClasses

A ComputeClass is a Kubernetes custom resource that defines a list of node configurations (machine types, feature settings, hardware requirements) for GKE to follow when provisioning nodes.
Built-in ComputeClasses (Autopilot):
- General-Purpose (default) – standard compute for most workloads.
- Balanced – optimized balance of compute, memory, and networking.
- Scale-Out – cost-efficient for horizontally scalable workloads.
- Accelerator – for GPU/TPU workloads (AI/ML).

Custom ComputeClasses let you define prioritized lists of node configurations for autoscaling, including machine families, Spot VM fallback, specific zones, and hardware constraints.
ComputeClasses work in both Autopilot and Standard clusters (with Autopilot mode enabled for the workload).
Pods select a ComputeClass using the cloud.google.com/compute-class node selector or nodeAffinity.
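
As a minimal sketch of the node selector mentioned above, the snippet below builds a Pod manifest as a Python dictionary and prints it as JSON. The Pod name and image are placeholders, and a real workload would normally also declare resource requests.

```python
import json

COMPUTE_CLASS_LABEL = "cloud.google.com/compute-class"


def pod_manifest(name, image, compute_class):
    """Return a minimal Pod manifest that selects a ComputeClass via nodeSelector."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": {COMPUTE_CLASS_LABEL: compute_class},
            "containers": [{"name": name, "image": image}],
        },
    }


# Placeholder name and image for illustration only.
manifest = pod_manifest("demo", "nginx", "Balanced")
print(json.dumps(manifest, indent=2))
```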

Release Channels & Extended Support

GKE release channels provide automatic version management:
- Rapid – latest Kubernetes release; access new GKE features as soon as they go GA.
- Regular – 1-2 months after Rapid; balance of feature access and stability.
- Stable – 2-3 months after Regular; priority on stability.
- Extended – for clusters needing longer support on a specific minor version.
Extended Support (since GKE 1.27): clusters can remain on a specific minor version for up to 24 months – 14 months of standard support plus ~10 months of extended support with continued security patches.
Clusters enrolled in release channels receive automatic upgrades within their channel’s schedule.
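
The 24-month figure above is simply 14 + 10. The sketch below works that arithmetic out as calendar months for a made-up availability date; the date is hypothetical, the extended period is approximate in the text, and actual end-of-support dates come from the GKE release schedule.

```python
from datetime import date


def add_months(start, months):
    """Return the first day of the month that is `months` after start's month."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def support_window(minor_release, standard_months=14, extended_months=10):
    """Approximate end-of-support months for a minor version."""
    standard_end = add_months(minor_release, standard_months)
    extended_end = add_months(minor_release, standard_months + extended_months)
    return standard_end, extended_end


# Hypothetical availability date, not a real GKE release date.
standard_end, extended_end = support_window(date(2024, 1, 15))
print(standard_end, extended_end)
```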

Auto-upgrading Nodes

Node auto-upgrades help keep the nodes in the GKE cluster up-to-date with the cluster control plane version when the control plane is updated on your behalf.
Node auto-upgrade is enabled by default when a new cluster or node pool is created with Google Cloud Console or the gcloud command.
Node auto-upgrades provide several benefits:
- Lower management overhead – no need to manually track and update the nodes when the control plane is upgraded on your behalf.
- Better security – GKE automatically ensures that security updates are applied and kept up to date.
- Ease of use – provides a simple way to keep the nodes up to date with the latest Kubernetes features.

Node pools with auto-upgrades enabled are scheduled for upgrades when they meet the selection criteria. Rollouts are phased across multiple weeks to ensure cluster and fleet stability.
During the upgrade, nodes are drained and re-created to match the current control plane version. Modifications on the boot disk of a node VM do not persist across node re-creations. To preserve modifications across node re-creation, use a DaemonSet.
Enabling auto-upgrades does not cause the nodes to upgrade immediately.

Workload Identity Federation

Workload Identity Federation for GKE (previously known as Workload Identity) is the recommended way for workloads running on GKE to authenticate to Google Cloud APIs.
Eliminates the need for service account keys, which are a security risk because they are long-lived credentials.
Allows Kubernetes service accounts to act as IAM principals, directly referencing them in IAM policies without an intermediate Google service account.

Provides per-Pod identity using the principle of least privilege, unlike node-level service accounts that are shared by all workloads on a node.
Enabled by default on Autopilot clusters.
Supports fleet-level Workload Identity Federation for multi-cluster environments.
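
To illustrate how a Kubernetes service account can be referenced directly in an IAM policy, the sketch below assembles a principal identifier string. The format shown is an assumption based on the GKE documentation and should be verified against the current docs; the project number, project ID, namespace, and service account name are placeholders.

```python
def ksa_principal(project_number, project_id, namespace, ksa_name):
    """Build the IAM principal identifier for a Kubernetes service account.

    Assumed format; check it against the current GKE documentation.
    """
    return (
        f"principal://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{project_id}.svc.id.goog"
        f"/subject/ns/{namespace}/sa/{ksa_name}"
    )


# Placeholder values for illustration only.
print(ksa_principal("123456789012", "my-project", "default", "storage-reader"))
```

The resulting string would be used as the member of an IAM role binding, which is what lets the Pod authenticate without a service account key.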

Fleet Management

A Fleet is a logical grouping of GKE clusters that enables multi-cluster management and governance.
Fleets allow you to manage features like Config Management, Policy Controller, and service mesh across multiple clusters simultaneously.

Fleet-level features include:
- Teams – define team scopes across clusters for multi-tenancy.
- Config Sync – apply consistent configuration across fleet members.
- Policy Controller – enforce governance policies fleet-wide.
- Service Mesh – unified service mesh across clusters (Cloud Service Mesh).
- Multi-cluster Services (MCS) – discover and route to services across clusters.
- Multi-cluster Gateway – global load balancing across clusters using Gateway API.
Fleet management features are available in GKE Standard edition at no additional cost (since 2024).

GKE Security

https://jayendrapatil.com/google-kubernetes-engine-gke-security/

GCP Certification Exam Practice Questions

Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).

GCP services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.

GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.

Open to further feedback, discussion and correction.

Your existing application running in Google Kubernetes Engine (GKE) consists of multiple pods running on four GKE n1-standard-2 nodes. You need to deploy additional pods requiring n2-highmem-16 nodes without any downtime. What should you do?
1. Use gcloud container clusters upgrade. Deploy the new services.
2. Create a new Node Pool and specify machine type n2-highmem-16. Deploy the new pods.
3. Create a new cluster with n2-highmem-16 nodes. Redeploy the pods and delete the old cluster.
4. Create a new cluster with both n1-standard-2 and n2-highmem-16 nodes. Redeploy the pods and delete the old cluster.

A company is running a production GKE cluster and wants to minimize operational overhead while ensuring nodes are always patched and running the latest supported Kubernetes version. What should they configure?
1. Manually upgrade nodes each quarter using gcloud container clusters upgrade.
2. Use GKE Autopilot mode with release channels enabled.
3. Disable auto-upgrade and use a custom CI/CD pipeline for upgrades.
4. Use preemptible VMs so nodes are recycled frequently.

Your organization runs multiple GKE clusters across different regions. You need a way to apply consistent security policies and deploy services accessible across all clusters. Which features should you use?
1. Create separate IAM policies for each cluster and use external load balancers.
2. Register clusters in a Fleet and use Policy Controller with Multi-cluster Services.
3. Deploy identical configurations manually to each cluster.
4. Use a single regional cluster spanning all regions.

A team wants their GKE workloads to securely access Google Cloud Storage and BigQuery APIs without managing service account keys. What is the recommended approach?
1. Mount service account JSON keys as Kubernetes secrets.
2. Use the node’s default service account for all Pods.
3. Enable Workload Identity Federation and bind Kubernetes service accounts to IAM principals.
4. Store service account keys in Secret Manager and inject at runtime.

You are deploying an AI/ML training workload on GKE that requires GPU nodes. You want GKE to automatically provision the right node type without manual node pool creation. What should you use?
1. Manually create a GPU node pool and set taints/tolerations.
2. Use cluster autoscaler with a pre-created GPU node pool.
3. Use GKE Autopilot with the Accelerator ComputeClass.
4. Deploy the workload on CPU nodes and use software-based GPU emulation.

You want your GKE cluster to remain on Kubernetes version 1.29 for 24 months to minimize disruption to production workloads. What should you configure?
1. Disable auto-upgrades and manually manage the cluster version.
2. Use the Rapid release channel for the latest patches.
3. Enroll the cluster in the Extended release channel for extended support.
4. Create a new cluster every 14 months on the desired version.