GCP Compute Engine

  • A Compute Engine instance is a virtual machine (VM) hosted on Google’s infrastructure.
  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
  • Docker containers can also be deployed, which are automatically launched on instances running the Container-Optimized OS public image.
  • Each instance belongs to a GCP project, and a project can have one or more instances. When you delete an instance, it is removed from the project.
  • To create an instance, the zone, operating system, and machine type (number of virtual CPUs and the amount of memory) need to be specified.
  • By default, each Compute Engine instance has a small boot persistent disk that contains the OS. Additional storage options can be attached to the instance.
  • Each network interface of a Compute Engine instance is associated with a subnet of a unique VPC network.
  • Regardless of the region where the VM instance is created, the default time for the VM instance is Coordinated Universal Time (UTC).
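
The three choices needed to create an instance (zone, operating system image, machine type) map onto the request body that the Compute Engine API accepts when an instance is inserted. The sketch below only builds such a body as a plain dictionary and sends nothing. The function name build_instance_body and the example values (demo-vm, e2-medium, the Debian image family, the default network) are illustrative assumptions, not part of the notes above.

```python
def build_instance_body(name, zone, machine_type, source_image,
                        network="global/networks/default"):
    """Return a dict shaped like an instances.insert request body."""
    return {
        "name": name,
        # Machine types are zonal, so the zone is part of the resource path.
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "disks": [
            {
                "boot": True,        # the boot persistent disk that holds the OS
                "autoDelete": True,  # remove the disk together with the instance
                "initializeParams": {"sourceImage": source_image},
            }
        ],
        # Each network interface is tied to one VPC network.
        "networkInterfaces": [{"network": network}],
    }


body = build_instance_body(
    name="demo-vm",  # hypothetical instance name
    zone="us-central1-a",
    machine_type="e2-medium",
    source_image="projects/debian-cloud/global/images/family/debian-12",
)
print(body["machineType"])
```

Keeping the body as data makes it easy to review the zone, image, and machine type before handing it to whichever client or tool actually creates the VM.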

Compute Engine Instance Lifecycle

Instance life cycle.

  • PROVISIONING. Resources are being allocated for the instance. The instance is not running yet.
  • STAGING. Resources have been acquired and the instance is being prepared for first boot.
  • RUNNING. The instance is booting up or running. You should be able to SSH into the instance soon, but not immediately, after it enters this state.
  • STOPPING. The instance is being stopped because a user requested it or because there was a failure. This is a temporary status, after which the instance moves to TERMINATED.
  • REPAIRING. The instance is being repaired because it encountered an internal error or the underlying machine is unavailable due to maintenance. During this time, the instance is unusable. If the repair is successful, the instance returns to one of the above states.
  • TERMINATED. A user shut down the instance, or the instance encountered a failure. You can choose to restart the instance or delete it.
  • SUSPENDING. The instance is being suspended due to a user action.
  • SUSPENDED. The instance is suspended and can be resumed or deleted.
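
The states above can be read as a small state machine. The following is a minimal sketch of that reading, not an official transition table: the entries marked "assumption" (what follows a restart, a resume, or a repair, and when a repair starts) are not spelled out in the list and are guesses made for illustration.

```python
from enum import Enum


class State(Enum):
    PROVISIONING = "PROVISIONING"
    STAGING = "STAGING"
    RUNNING = "RUNNING"
    STOPPING = "STOPPING"
    REPAIRING = "REPAIRING"
    TERMINATED = "TERMINATED"
    SUSPENDING = "SUSPENDING"
    SUSPENDED = "SUSPENDED"


# Allowed next states for each state.
TRANSITIONS = {
    State.PROVISIONING: {State.STAGING},
    State.STAGING: {State.RUNNING},
    # Entering REPAIRING from RUNNING is an assumption.
    State.RUNNING: {State.STOPPING, State.SUSPENDING, State.REPAIRING},
    State.STOPPING: {State.TERMINATED},
    State.SUSPENDING: {State.SUSPENDED},
    # Assumption: a restart begins with provisioning again.
    State.TERMINATED: {State.PROVISIONING},
    # Assumption: a resume begins with provisioning again.
    State.SUSPENDED: {State.PROVISIONING},
    # Assumption: "one of the above states" means the three states listed before it.
    State.REPAIRING: {State.PROVISIONING, State.STAGING, State.RUNNING},
}


def can_transition(current, target):
    """Return True if target may directly follow current in this sketch."""
    return target in TRANSITIONS[current]


def is_valid_path(states):
    """Check every consecutive pair of states in a sequence."""
    return all(can_transition(a, b) for a, b in zip(states, states[1:]))


boot = [State.PROVISIONING, State.STAGING, State.RUNNING]
stop = [State.RUNNING, State.STOPPING, State.TERMINATED]
print(is_valid_path(boot), is_valid_path(stop))
```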

GCP Compute Engine Instance Stopping vs Suspending vs Resetting

Compute Engine Machine Types

  • A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits.
  • Machine types are grouped and curated by families for different workloads

GCP Compute Engine Machine Types

Compute Engine Guest Environment

  • A guest environment is automatically installed on the VM instance when Google-provided public images are used
  • The guest environment is a set of scripts, daemons, and binaries that read the content of the metadata server to make a VM run properly on Compute Engine
  • A metadata server is a communication channel for transferring information from a client to the guest operating system.
  • The guest environment can be manually installed on custom images
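
As an illustration of how software inside a VM reads the metadata server, here is a minimal sketch using only the Python standard library. It relies on the metadata endpoint metadata.google.internal and the Metadata-Flavor: Google request header. The helper names build_metadata_request and read_metadata are made up for this example, and read_metadata only works when run from inside a Compute Engine VM.

```python
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def build_metadata_request(path):
    """Build (but do not send) a request for one metadata entry."""
    request = urllib.request.Request(METADATA_ROOT + path.lstrip("/"))
    # The metadata server expects this header on every request.
    request.add_header("Metadata-Flavor", "Google")
    return request


def read_metadata(path, timeout=2.0):
    """Fetch a metadata value as text; call this only from inside a VM."""
    with urllib.request.urlopen(build_metadata_request(path), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request is safe anywhere; sending it requires a VM.
request = build_metadata_request("instance/hostname")
print(request.full_url)
```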

Compute Engine Instance Availability Policies

  • Compute Engine performs regular maintenance of its infrastructure, which entails hardware and software updates
  • Google might need to move a VM away from a host that is undergoing maintenance, and Compute Engine automatically manages the scheduling behavior of these instances.
  • An instance’s availability policy determines how it behaves when there is a maintenance event
    • Live migrate – move the VM instances to another host machine
    • Stop the instances
  • An instance’s availability policy can be changed by configuring the following two settings:
    • The maintenance behavior, onHostMaintenance, which determines whether the instance is live migrated (MIGRATE, the default) or stopped (TERMINATE)
    • The restart behavior, automaticRestart, which determines whether the instance automatically restarts (the default) if it crashes or gets stopped
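
The two settings can be pictured as a small value object. The sketch below is a simplified model for study purposes: the class AvailabilityPolicy and the function on_maintenance_event are hypothetical names, and the outcome strings are a plain-language reading of the bullets above rather than anything returned by the API.

```python
from dataclasses import dataclass

VALID_MAINTENANCE_BEHAVIORS = ("MIGRATE", "TERMINATE")


@dataclass(frozen=True)
class AvailabilityPolicy:
    on_host_maintenance: str = "MIGRATE"  # default per the notes above
    automatic_restart: bool = True        # default per the notes above

    def to_scheduling(self):
        """Return the two settings using their API field names."""
        if self.on_host_maintenance not in VALID_MAINTENANCE_BEHAVIORS:
            raise ValueError(f"unknown behavior: {self.on_host_maintenance}")
        return {
            "onHostMaintenance": self.on_host_maintenance,
            "automaticRestart": self.automatic_restart,
        }


def on_maintenance_event(policy):
    """Describe, in simplified terms, what a maintenance event does to a VM."""
    if policy.on_host_maintenance == "MIGRATE":
        return "live migrated to another host"
    if policy.automatic_restart:
        return "stopped, then restarted automatically"
    return "stopped until restarted manually"


print(AvailabilityPolicy().to_scheduling())
print(on_maintenance_event(AvailabilityPolicy("TERMINATE", False)))
```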

Compute Engine Live Migration

  • Live migration helps keep the VM instances running even when a host system event, such as a software or hardware update, occurs
  • Compute Engine live migrates the running instances to another host in the same zone instead of requiring the VMs to be rebooted
  • Live migration allows Google to perform maintenance to keep infrastructure protected and reliable without interrupting any of the VMs.
  • Live migration keeps the instances running during:
    • Regular infrastructure maintenance and upgrades.
    • Network and power grid maintenance in the data centers.
    • Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.
    • Host OS and BIOS upgrades.
    • Security-related updates, with the need to respond quickly.
    • System configuration changes, including changing the size of the host root partition, for storage of the host image and packages.
  • When a VM is scheduled to be live migrated, GCP notifies the guest that a migration is imminent.
  • Live migration does not change any attributes or properties of the VM including internal and external IP addresses, instance metadata, block storage data and volumes, OS and application state, network settings, network connections, and so on.
  • Compute Engine can also live migrate instances with local SSDs attached, moving the VMs along with their local SSD to a new machine in advance of any planned maintenance.
  • Instances with GPUs attached cannot be live migrated and must be set to stop and optionally restart. Compute Engine offers a 60-minute notice before a VM instance with a GPU attached is stopped
  • Preemptible instances cannot be configured for live migration
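
The last two restrictions can be captured as a simple eligibility check. This is a sketch of the rule as stated in these notes (GPUs and preemptible instances rule out live migration); the function names are hypothetical and the real service may apply further conditions.

```python
def can_live_migrate(has_gpu=False, preemptible=False):
    """Per the notes: GPU-attached and preemptible VMs cannot live migrate."""
    return not (has_gpu or preemptible)


def validate_maintenance_behavior(requested, has_gpu=False, preemptible=False):
    """Return the requested behavior, or raise if it cannot apply to this VM."""
    if requested not in ("MIGRATE", "TERMINATE"):
        raise ValueError(f"unknown behavior: {requested}")
    if requested == "MIGRATE" and not can_live_migrate(has_gpu, preemptible):
        raise ValueError("this VM must use TERMINATE for host maintenance")
    return requested


print(validate_maintenance_behavior("MIGRATE"))
print(validate_maintenance_behavior("TERMINATE", has_gpu=True))
```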

Preemptible VM instances

  • A preemptible VM is an instance that can be created and run at a much lower price than normal instances.
  • However, Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks.
  • Preemptible instances are excess Compute Engine capacity, so their availability varies with usage.
  • Preemptible instances can reduce costs significantly if the apps are fault-tolerant and can withstand possible instance preemptions
  • Preemptible instance limitations
    • Compute Engine might stop preemptible instances at any time due to system events.
    • Compute Engine always stops preemptible instances after they run for 24 hours.
    • Preemptible instances are finite Compute Engine resources, so they might not always be available.
    • Preemptible instances can’t live migrate to a regular VM instance, or be set to automatically restart when there is a maintenance event.
    • Preemptible instances are not covered by any Service Level Agreement
    • GCP Free Tier credits for Compute Engine don’t apply to preemptible instances
  • Preemption process
    • Compute Engine sends a preemption notice to the instance in the form of an ACPI G2 Soft Off signal.
    • A shutdown script can be used to handle the preemption notice and complete cleanup actions before the instance stops
    • If the instance does not stop after 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal to the operating system.
    • Compute Engine transitions the instance to a TERMINATED state.
  • Managed instance groups support preemptible instances.
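
Because only about 30 seconds separate the two signals, cleanup work in a shutdown script benefits from a time budget. The sketch below shows one way to do that: it runs cleanup tasks in priority order and stops starting new ones once the budget is nearly spent. The function run_cleanup, the task names, and the 5-second safety margin are assumptions for illustration, and a task that is already running is not interrupted.

```python
import time

PREEMPTION_GRACE_SECONDS = 30  # window between the two ACPI signals, per the notes


def run_cleanup(tasks, budget_seconds=PREEMPTION_GRACE_SECONDS,
                safety_margin=5.0, clock=time.monotonic):
    """Run (name, callable) pairs in order until the time budget is used up.

    Returns the names of the tasks that were run to completion.
    """
    deadline = clock() + budget_seconds - safety_margin
    completed = []
    for name, task in tasks:
        if clock() >= deadline:
            break  # out of time: skip the remaining, lower-priority tasks
        task()
        completed.append(name)
    return completed


log = []
done = run_cleanup([
    ("flush-buffers", lambda: log.append("flushed")),        # hypothetical task
    ("upload-checkpoint", lambda: log.append("uploaded")),   # hypothetical task
])
print(done)
```

Ordering the list from most to least important means that whatever does not fit in the window is the work that matters least.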

Shielded VM

  • Shielded VM offers verifiable integrity of the Compute Engine VM instances, to confirm the instances haven’t been compromised by boot- or kernel-level malware or rootkits.
  • Shielded VM’s verifiable integrity is achieved through the use of Secure Boot, virtual trusted platform module (vTPM)-enabled Measured Boot, and integrity monitoring.

Managing access to the instances

  • Linux instances:
    • Compute Engine uses key-based SSH authentication to establish connections to Linux virtual machine (VM) instances.
    • By default, local users with passwords aren’t configured on Linux VMs.
    • By default, Compute Engine uses custom project and/or instance metadata to configure SSH keys and to manage SSH access. If OS Login is used, metadata SSH keys are disabled.
    • Managing instance access using OS Login
      • allows associating SSH keys with a Google Account or Google Workspace account and managing admin or non-admin access to instances through IAM roles.
      • when connecting to instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to the Google Account or Google Workspace account.
    • Managing SSH keys in project or instance metadata
      • allows granting admin access to instances with metadata access that do not use OS Login.
      • when connecting to instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to project metadata.
      • Project-wide public SSH keys
        • give users general access to a Linux instance.
        • give users access to all of the Linux instances in a project that allow project-wide public SSH keys
      • Instance metadata
        • If an instance blocks project-wide public SSH keys, a user can’t use the project-wide public SSH key to connect to the instance unless the same public SSH key is also added to instance metadata
  • Windows Server instances:
    • Access is managed by creating a password for the Windows Server instance
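
For Linux instances, the interaction between OS Login, project-wide keys, and instance keys can be summarized in a few lines. The sketch below models only the rules listed above; effective_ssh_keys is a hypothetical helper, and the two boolean flags stand in for what a real project would express through metadata entries such as enable-oslogin and block-project-ssh-keys.

```python
def effective_ssh_keys(project_keys, instance_keys,
                       os_login_enabled=False, block_project_keys=False):
    """Return the set of metadata public keys that can reach an instance."""
    if os_login_enabled:
        # With OS Login, metadata SSH keys are disabled; access goes through IAM.
        return set()
    keys = set(instance_keys)
    if not block_project_keys:
        # Project-wide keys apply only to instances that allow them.
        keys |= set(project_keys)
    return keys


project = {"alice-key", "bob-key"}  # hypothetical project-wide keys
instance = {"bob-key"}              # hypothetical instance-level keys

print(sorted(effective_ssh_keys(project, instance)))
print(sorted(effective_ssh_keys(project, instance, block_project_keys=True)))
```

In the second call only bob-key remains, which mirrors the rule that a blocked project-wide key still works if the same key is also present in instance metadata.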

Compute Engine Images

  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
    • Public images are provided and maintained by Google, open source communities, and third-party vendors. All Google Cloud projects have access to these images and can use them to create instances.
    • Custom images are available only to your Cloud project. Custom images can be created from boot disks and other images.

Instance Templates

  • An instance template is a resource used to create virtual machine (VM) instances and managed instance groups (MIGs) with identical configuration.
  • Instance templates define the machine type, boot disk image or container image, labels, and other instance properties
  • Instance templates are a convenient way to save a VM instance’s configuration to create VMs or groups of VMs later
  • An instance template is a global resource that is not bound to a zone or a region. However, if zonal resources are specified in an instance template, the template is restricted to the zone where those resources reside.
  • Labels defined within an instance template are applied to all instances that are created from that instance template. The labels do not apply to the instance template itself.
  • An existing instance template cannot be updated or changed after it is created
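
Since a template cannot be changed after creation, a configuration change means creating a new template. The sketch below mirrors that behavior with a frozen dataclass. InstanceTemplate and create_instance are simplified stand-ins invented for this example, not API types, and the names and machine types are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class InstanceTemplate:
    name: str
    machine_type: str
    source_image: str
    labels: dict = field(default_factory=dict)


def create_instance(template, instance_name):
    """Stamp out an instance description from a template."""
    return {
        "name": instance_name,
        "machineType": template.machine_type,
        "sourceImage": template.source_image,
        # Labels from the template land on the instance, not on the template.
        "labels": dict(template.labels),
    }


v1 = InstanceTemplate(
    name="web-v1",  # hypothetical template name
    machine_type="e2-medium",
    source_image="projects/debian-cloud/global/images/family/debian-12",
    labels={"tier": "web"},
)

try:
    v1.machine_type = "e2-standard-4"  # in-place edits are rejected
except FrozenInstanceError:
    # Create a new template that copies v1 and overrides what changed.
    v2 = replace(v1, name="web-v2", machine_type="e2-standard-4")

print(create_instance(v2, "web-1"))
```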

Instance Groups

Refer blog post @ Compute Engine Instance Groups

Machine Image

  • A machine image is a Compute Engine resource that stores all the configuration, metadata, permissions, and data from one or more disks required to create a virtual machine (VM) instance.

Sole Tenant Nodes

  • Sole-tenancy provides dedicated hosting for only your project’s VMs and an added layer of hardware isolation
  • Sole-tenant nodes ensure that the VMs do not share host hardware with VMs from other projects
  • Each sole-tenant node maintains a one-to-one mapping to the physical server that is backing the node
  • A project has exclusive access to a sole-tenant node, which is a physical Compute Engine server, and can use it to keep its VMs physically separated from VMs in other projects or to group its VMs together on the same host hardware
  • Sole-tenant nodes can help meet dedicated hardware requirements for bring your own license (BYOL) scenarios that require per-core or per-processor licenses

Projects on a multi-tenant host versus a sole-tenant node.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company hosts multiple applications on Compute Engine instances. They want the instances to be resilient to any Host maintenance activities performed on the instance. How would you configure the instances?
    1. Set automaticRestart availability policy to true
    2. Set automaticRestart availability policy to false
    3. Set onHostMaintenance availability policy to migrate instances
    4. Set onHostMaintenance availability policy to terminate instances

GCP Compute Engine Instance Groups

  • An instance group is a collection of virtual machine (VM) instances that can be managed as a single entity.
  • Compute Engine offers two kinds of VM instance groups
    • Managed instance groups (MIGs)
      • allow apps to run on multiple identical VMs.
      • workloads can be made scalable and highly available by taking advantage of automated MIG services, including autoscaling, autohealing, regional (multiple zone) deployment, and automatic updating
    • Unmanaged instance groups
      • allow load balancing across a fleet of VMs that you manage yourself and that may not be identical

Managed instance groups (MIGs)

  • A MIG creates each of its managed instances based on the instance template and specified optional stateful configuration
  • Managed instance groups (MIGs) are ideal for scenarios such as:
    • Stateless serving workloads, such as a website frontend
    • Stateless batch, high-performance, or high-throughput compute workloads, such as image processing from a queue
    • Stateful applications, such as databases, legacy applications, and long-running batch computations with checkpointing
Use a managed instance group to build highly available deployments for stateless serving, stateful applications, or batch workloads.

High Availability & Autohealing

  • Managed instance groups maintain high availability of the applications by proactively maintaining the number of instances and keeping the instances available, that is, in the RUNNING state.
  • Application-based autohealing improves application availability by relying on a health checking signal that detects application-specific issues such as freezing, crashing, or overloading.
  • If a health check determines that an application has failed on a VM, the MIG automatically recreates that VM instance.
  • A MIG automatically recreates an instance that is not RUNNING. However, relying only on VM state may not be sufficient. You may want to recreate instances when an application freezes, crashes, or runs out of memory.
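
The difference between relying on VM state alone and adding application-based autohealing can be shown as a small decision function. This is a simplified sketch of the rules described above: needs_recreation is a hypothetical name, app_healthy=None stands for "no autohealing health check configured", and every non-RUNNING status is treated alike, as the notes do.

```python
def needs_recreation(vm_status, app_healthy=None):
    """Decide whether a MIG would recreate a VM in this simplified model."""
    if vm_status != "RUNNING":
        return True   # state-based repair: the VM itself is not running
    if app_healthy is None:
        return False  # no health check, so a frozen app goes unnoticed
    return not app_healthy  # application-based autohealing


print(needs_recreation("TERMINATED"))
print(needs_recreation("RUNNING"))
print(needs_recreation("RUNNING", app_healthy=False))
```

The middle call is the gap that health checks close: the VM is RUNNING, so without a health signal nothing triggers a repair even if the application has stopped responding.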

Health Checking

  • Managed instance group health checks proactively signal to delete and recreate instances that become UNHEALTHY.
  • Health checks used to monitor MIGs are similar to the health checks used for load balancing, with some differences in behavior.
  • Load balancing health checks help direct traffic away from non-responsive instances and toward healthy instances; these health checks do not cause Compute Engine to recreate instances.

Regional or zonal groups

  • Zonal MIG
    • deploys instances to a single zone.
  • Regional MIG
    • deploys instances to multiple zones across the same region
    • provides higher availability by spreading application load across multiple zones
    • protects the workload against zonal failure
    • offers more capacity, with a maximum of 2,000 instances per regional group.
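
To make "spreading across multiple zones" concrete, the sketch below divides a target size as evenly as possible over a list of zones and enforces the 2,000-instance limit quoted above. The function spread_across_zones is hypothetical, and giving any leftover instances to the first zones in the list is an arbitrary choice made for this example, not documented behavior.

```python
MAX_REGIONAL_GROUP_SIZE = 2000  # limit quoted in the notes above


def spread_across_zones(target_size, zones):
    """Split target_size over zones so that counts differ by at most one."""
    if not zones:
        raise ValueError("at least one zone is required")
    if not 0 <= target_size <= MAX_REGIONAL_GROUP_SIZE:
        raise ValueError("target size is outside the supported range")
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if index < extra else 0)
            for index, zone in enumerate(zones)}


zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
print(spread_across_zones(7, zones))
```

With an even spread over three zones, losing one zone removes roughly a third of the capacity rather than all of it, which is the protection against zonal failure mentioned above.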

Load balancing

  • MIGs work with load balancing services to distribute traffic across all of the instances in the group.
  • Google Cloud load balancing can use instance groups to serve traffic by adding instance groups to a target pool or to a backend service.

Scalability & Autoscaling

  • MIGs provide scalability and support autoscaling, which dynamically adds or removes instances in response to increases or decreases in load.
  • The autoscaling policy determines how the group scales; it can be based on CPU utilization, Cloud Monitoring metrics, load balancing capacity, or, for zonal MIGs, a queue-based workload like Pub/Sub
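
A rough intuition for CPU-based autoscaling is proportional sizing: pick the group size that would bring average utilization back to the target. The sketch below implements only that simplified rule with integer percentages; the real autoscaler also considers factors such as stabilization periods, so treat recommended_size and its default limits as illustrative assumptions.

```python
def recommended_size(current_size, current_cpu_percent, target_cpu_percent,
                     min_size=1, max_size=10):
    """Return the group size that would bring average CPU near the target."""
    if target_cpu_percent <= 0:
        raise ValueError("target utilization must be positive")
    total_load = current_size * current_cpu_percent
    # Ceiling division in integers: round up so the target is not exceeded.
    needed = -(-total_load // target_cpu_percent)
    # Clamp the result to the configured bounds.
    return max(min_size, min(max_size, needed))


print(recommended_size(4, 90, 60))  # load above target: scale out
print(recommended_size(4, 30, 60))  # load below target: scale in
```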

Automatic Updating

  • The MIG automatic updater supports a flexible range of rollout scenarios, such as rolling updates and canary updates, to deploy new versions of software to the instances in a MIG.
  • Speed and scope of deployment can be controlled as well as the level of disruption to the service.

Stateful Workloads Support

  • MIGs can be used for building highly available deployments and automating operation of applications with stateful data or configuration, such as databases, DNS servers, legacy monolith applications, or long-running batch computations with checkpointing.
  • Uptime and resiliency of such applications can be improved with autohealing, controlled updates, and multi-zone deployments, while preserving each instance’s unique state, including customizable instance name, persistent disks, and metadata.
  • Stateful MIGs preserve each instance’s unique state (instance name, attached persistent disks, and metadata) on machine restart, recreation, auto-healing, and update events.

Preemptible Instances Groups

  • MIGs support preemptible VM instances, which can help reduce cost.
  • Preemptible instances last up to 24 hours and are preempted gracefully; the application has 30 seconds to exit correctly.
  • Preemptible instances can be deleted any time, but autohealing will bring the instances back when preemptible capacity becomes available again

Containers

  • MIGs support deploying containers on Container-Optimized OS, which includes Docker, if the instance template specifies a container image.

Network and subnet

  • The instance template used with a MIG defines the VPC network and subnet that member instances use.
  • For auto mode VPC networks, the subnet can be omitted; this instructs GCP to select the automatically-created subnet in the region specified in the template.
  • If the VPC network is omitted, GCP attempts to use the VPC network named default.
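
The defaulting rules above can be written down as a small resolver. This sketch models only what the notes state; resolve_network is a hypothetical helper, and the error raised for a missing subnet on a non-auto-mode network is an inference from the wording "for auto mode VPC networks, the subnet can be omitted".

```python
def resolve_network(network=None, subnet=None, auto_mode=True):
    """Apply the template defaulting rules for network and subnet."""
    resolved_network = network or "default"  # omitted network falls back to "default"
    if subnet is not None:
        return {"network": resolved_network, "subnet": subnet,
                "subnet_selected_by": "template"}
    if not auto_mode:
        # Inference: only auto mode networks allow the subnet to be left out.
        raise ValueError("a subnet must be specified for this network")
    # GCP picks the automatically-created subnet in the template's region.
    return {"network": resolved_network, "subnet": None,
            "subnet_selected_by": "gcp"}


print(resolve_network())
print(resolve_network(network="prod-vpc", subnet="prod-subnet-a", auto_mode=False))
```

The names prod-vpc and prod-subnet-a are placeholders for whatever network and subnet a template actually references.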

Unmanaged instance groups

  • Unmanaged instance groups can contain heterogeneous instances that can be arbitrarily added and removed from the group.
  • Unmanaged instance groups do not offer autoscaling, autohealing, rolling update support, multi-zone support, or the use of instance templates and are not a good fit for deploying highly available and scalable workloads.
  • Use unmanaged instance groups if load balancing needs to be added to groups of heterogeneous instances, or if you need to manage the instances yourself.