Google Cloud Compute Engine Snapshots

Compute Engine Snapshots

  • Snapshots provide periodic backup of the persistent disks
  • Snapshots incrementally back up data from the persistent disks.
  • Snapshots are global resources, so any snapshot is accessible by any resource within the same project.
  • Snapshots can be shared across projects.
  • Storage for persistent disk snapshots is charged only for the total size of the snapshot.
  • A snapshot captures the current state of the disk when it is created and can be restored as a new disk.
  • Compute Engine stores multiple copies of each snapshot across multiple locations with automatic checksums to ensure the integrity of the data.
  • Snapshots can be created from disks even while they are attached to running virtual machine (VM) instances.
  • The lifecycle of a snapshot created from a disk attached to a running VM instance is independent of the lifecycle of the VM instance.
  • Snapshots can be stored in either one Cloud Storage multi-regional location, such as asia, or one Cloud Storage regional location, such as asia-south1.
  • A multi-regional storage location provides higher availability and might reduce network costs when creating or restoring a snapshot
  • A snapshot can be used to create a new disk in any region and zone, regardless of the storage location of the snapshot.

Snapshot Creation

  • Snapshots are incremental and automatically compressed, so that they can be regularly created on a persistent disk faster and at a lower cost than regularly creating a full image of the disk.
  • Incremental snapshots work as follows:
    • The first successful snapshot of a persistent disk is a full snapshot that contains all the data on the persistent disk.
    • The second snapshot only contains any new data or modified data since the first snapshot. Data that hasn’t changed since snapshot 1 isn’t included. Instead, snapshot 2 contains references to snapshot 1 for any unchanged data.
    • Snapshot 3 contains any new or changed data since snapshot 2 but won’t contain any unchanged data from snapshot 1 or 2. Instead, snapshot 3 contains references to blocks in snapshot 1 and snapshot 2 for any unchanged data.
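
The incremental chain described above can be sketched in a few lines of Python. This is only an illustrative model, not how Compute Engine stores snapshots internally: the Snapshot class, take_snapshot and restore are hypothetical names, a disk is modelled as a dict of block index to bytes, and blocks are assumed never to be removed from the disk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Snapshot:
    """Hypothetical model: holds only the blocks that changed since the parent."""
    name: str
    blocks: Dict[int, bytes]             # block index -> data stored in this snapshot
    parent: Optional["Snapshot"] = None  # reference to the previous snapshot


def restore(snapshot: Snapshot) -> Dict[int, bytes]:
    """Rebuild the full disk by following references back to older snapshots."""
    chain = []
    node: Optional[Snapshot] = snapshot
    while node is not None:
        chain.append(node)
        node = node.parent
    disk: Dict[int, bytes] = {}
    for snap in reversed(chain):  # oldest first, newer blocks overwrite older ones
        disk.update(snap.blocks)
    return disk


def take_snapshot(name: str, disk: Dict[int, bytes],
                  previous: Optional[Snapshot]) -> Snapshot:
    """First snapshot is full; later ones keep only new or modified blocks."""
    if previous is None:
        return Snapshot(name, dict(disk))
    old = restore(previous)
    changed = {i: data for i, data in disk.items() if old.get(i) != data}
    return Snapshot(name, changed, parent=previous)


disk = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
snap1 = take_snapshot("snapshot-1", disk, None)   # full copy: 3 blocks
disk[1] = b"data-v2"
snap2 = take_snapshot("snapshot-2", disk, snap1)  # only block 1
disk[3] = b"new-file"
snap3 = take_snapshot("snapshot-3", disk, snap2)  # only block 3
```

Restoring snap3 walks back through snap2 and snap1 to pick up the unchanged blocks, which is why later snapshots stay small.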

Snapshot Deletion

  • Compute Engine uses incremental snapshots so that each snapshot contains only the data that has changed since the previous snapshot.
  • For unchanged data, snapshots reference the data in previous snapshots.
  • When a snapshot is deleted, Compute Engine immediately marks the snapshot as DELETED in the system.
    • If the snapshot has no dependent snapshots, it is deleted outright.
    • However, if the snapshot does have dependent snapshots:
      • Any data that is required for restoring other snapshots is moved into the next snapshot, increasing its size.
      • Any data that is not required for restoring other snapshots is deleted. This lowers the total size of all your snapshots.
      • The next snapshot no longer references the snapshot marked for deletion, and instead references the snapshot before it.
  • Keep in mind that deleting a snapshot does not necessarily delete all the data on the snapshot, because subsequent snapshots might require information stored in a previous snapshot.
  • To definitively delete data from the snapshots, delete all snapshots.
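
The deletion rules above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the actual Compute Engine implementation: the chain is a plain list of dicts (oldest snapshot first), and delete_snapshot and restore are hypothetical helper names.

```python
from typing import Dict, List

# Hypothetical model: an ordered chain of snapshots, oldest first.
# Each entry maps block index -> data stored in that snapshot only.
Chain = List[Dict[int, bytes]]


def delete_snapshot(chain: Chain, index: int) -> Chain:
    """Return a new chain with the snapshot at `index` removed."""
    result = [dict(blocks) for blocks in chain]
    removed = result.pop(index)
    if index < len(result):  # a dependent (next) snapshot exists
        successor = result[index]
        for block, data in removed.items():
            # Blocks the successor already overwrote are not needed and are
            # dropped; the others move into the successor, which grows.
            successor.setdefault(block, data)
    return result


def restore(chain: Chain, index: int) -> Dict[int, bytes]:
    """Rebuild the disk as of snapshot `index` by replaying the chain."""
    disk: Dict[int, bytes] = {}
    for blocks in chain[: index + 1]:
        disk.update(blocks)
    return disk


chain = [
    {0: b"a", 1: b"b"},    # snapshot 1 (full)
    {1: b"b2", 2: b"c"},   # snapshot 2
    {2: b"c2"},            # snapshot 3
]
after = delete_snapshot(chain, 1)  # delete snapshot 2
```

In this example block 1 moves from the deleted snapshot into snapshot 3, while block 2 is discarded because snapshot 3 already holds a newer copy, so the total stored data shrinks from five blocks to four and snapshot 3 still restores to the same disk.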

Snapshot Best Practices

  • If a snapshot of the persistent disk is created while the application is running, the snapshot might not capture pending writes that are in transit from memory to disk. So, prepare the disk for consistency:
    • Pause applications or processes that write data, and flush disk buffers
    • Unmount the disk completely
    • For Windows, use VSS snapshots
    • For Linux, use ext4 to reduce the risk that data is cached without actually being written to the persistent disk.
  • Take only one snapshot at a time
  • Schedule snapshots during off-peak hours
  • Avoid frequent snapshots. Take a snapshot of a disk no more than once per hour. A snapshot of a given disk can be created at most once every 10 minutes.
  • Use snapshot schedules as a best practice to back up your Compute Engine workloads
  • Use multiple persistent disks for large data volume. Larger amounts of data create larger snapshots, which cost more and take longer to create.
  • Run fstrim before snapshot (Linux) to clean up space, as this command removes blocks that the file system no longer needs, so that the system can create the snapshot more quickly and with a smaller size
  • If a snapshot is used repeatedly to create disks, create an image from the snapshot and use the image instead of the snapshot itself

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have a workload running on Compute Engine that is critical to your business. You want to ensure that the data on the boot disk of this workload is backed up regularly. You need to be able to restore a backup as quickly as possible in case of disaster. You also want older backups to be cleaned automatically to save on cost. You want to follow Google-recommended practices. What should you do?
    1. Create a Cloud Function to create an instance template.
    2. Create a snapshot schedule for the disk using the desired interval.
    3. Create a cron job to create a new disk from the disk using gcloud.
    4. Create a Cloud Task to create an image and export it to Cloud Storage.

Google Cloud Compute Engine Storage Options

Persistent Disk

  • Persistent disks are durable network storage devices that the instances can access like physical disks in a desktop or a server.
  • Persistent disks are used as boot disks
  • Data on each persistent disk is distributed across several physical disks.
  • Compute Engine manages the physical disks and the data distribution to ensure redundancy and optimal performance.
  • Persistent disks are located independently from the VM instances and can be detached or moved to keep the data even after the instance is deleted
  • Persistent disk performance scales automatically with size, so they can be resized or additional ones added to meet the performance and storage space requirements.

Persistent Disk Types

  • Standard persistent disks (pd-standard) are backed by standard hard disk drives (HDD).
  • Balanced persistent disks (pd-balanced) are backed by solid-state drives (SSD). They are an alternative to SSD persistent disks that balance performance and cost.
  • SSD persistent disks (pd-ssd) are backed by solid-state drives (SSD).

Zonal Persistent Disks

  • Zonal persistent disks provide durable storage and replication of data within a single zone in a region.
  • Persistent disks have built-in redundancy to protect the data against equipment failure and to ensure data availability through datacenter maintenance events.
  • For additional space on the persistent disks, resize the disks and resize the single file system rather than repartitioning and formatting.
  • Compute Engine automatically encrypts the data before it travels outside of the instance to persistent disk storage space.
  • Zonal persistent disks remain encrypted with either system-defined keys or customer-supplied keys.

Regional Persistent Disks

  • Regional persistent disks provide durable storage and replication of data between two zones in the same region.
  • Regional persistent disks are also designed to work with regional managed instance groups.
  • Zonal outage can be handled by force attaching the disk to the standby instance, even if the disk can’t be detached from the original VM
  • Regional persistent disks are designed for
    • workloads that require a lower RPO and RTO compared to using persistent disk snapshots.
    • workloads where write performance is less critical than data redundancy across multiple zones.
  • Regional persistent disks cannot be used with memory-optimized machines and compute-optimized machines.

Local SSD

  • Local SSDs are physically attached to the server that hosts the VM instance.
  • Local SSDs have higher throughput and lower latency than standard persistent disks or SSD persistent disks.
  • Data stored on a local SSD persists only until the instance is stopped or deleted.
  • Local SSD disks cannot be used as boot disks
  • Local SSD disks can be attached only during instance creation, and not once the instance is created
  • Local SSD performance gains require certain trade-offs in availability, durability, and flexibility. Because of these trade-offs, Local SSD storage isn’t automatically replicated and all data on the local SSD might be lost if the instance terminates for any reason.
  • Each local SSD is 375 GB in size, and a maximum of 24 local SSD partitions can be attached for a total of 9 TB per instance.
  • Compute Engine automatically encrypts the data when it is written to local SSD storage space. Customer-supplied encryption keys are not supported with local SSDs.

Cloud Storage Buckets

  • Cloud Storage buckets are the most flexible, scalable, and durable storage option for the VM instances.
  • Cloud Storage is ideal if you don’t require the lower latency of Persistent Disks and Local SSDs, and can store the data in a Cloud Storage bucket.
  • Performance of Cloud Storage depends on the selected storage class
  • Standard storage class used in the same location as the instance gives performance that is comparable to persistent disks but with higher latency and less consistent throughput characteristics.
  • Cloud Storage buckets have built-in redundancy to protect the data against equipment failure and to ensure data availability through datacenter maintenance events
  • Cloud Storage buckets aren’t restricted to the zone where the instance is located. Multiregional Cloud Storage buckets store the data redundantly across at least two regions within a larger multiregional location.
  • A Cloud Storage bucket can be mounted on the instance as a file system
  • Cloud Storage allows data to be read from and written to a bucket from multiple instances simultaneously.
  • However, Cloud Storage buckets are object stores that don’t have the same write constraints as a POSIX file system and can’t be used as boot disks. Multiple instances working on the same file can lead to overwritten data.
  • Cloud Storage supports both encryption at rest and in transit.

Filestore

  • Filestore provides high-performance, fully managed network-attached storage (NAS) for file storage

Storage Options Comparison

Google Cloud Compute Engine Storage Options

Storage Options Performance Comparison

Google Cloud Compute Engine Storage Performance

Google Cloud Compute Services Cheat Sheet

Google Cloud Compute Services

Google Cloud - Compute Services Options

Compute Engine

  • is a virtual machine (VM) hosted on Google’s infrastructure.
  • can run the public images for Google provided Linux and Windows Server as well as custom images created or imported from existing systems
  • availability policy determines how it behaves when there is a maintenance event
    • VM instance’s maintenance behavior onHostMaintenance, which determines whether the instance is live migrated MIGRATE (default) or stopped TERMINATE
    • Instance’s restart behavior automaticRestart  which determines whether the instance automatically restarts (default) if it crashes or gets stopped
  • Live migration helps keep the VM instances running even when a host system event, such as a software or hardware update, occurs
  • Preemptible VM is an instance that can be created and run at a much lower price than normal instances, however can be stopped at any time
  • Shielded VM offers verifiable integrity of the Compute Engine VM instances, to confirm the instances haven’t been compromised by boot- or kernel-level malware or rootkits.
  • Instance template is a resource used to create VM instances and managed instance groups (MIGs) with identical configuration
  • Instance group is a collection of virtual machine (VM) instances that can be managed as a single entity.
    • Managed instance groups (MIGs)
      • allows app creation with multiple identical VMs.
      • workloads can be made scalable and highly available by taking advantage of automated MIG services, including: autoscaling, autohealing, regional (multiple zone) deployment, and automatic updating
      • supports rolling update feature
      • works with load balancing services to distribute traffic across all of the instances in the group.
    • Unmanaged instance groups
      • allows load balancing across a fleet of VMs that you manage yourself and that may not be identical
  • Instance templates are global, while instance groups are zonal or regional.
  • Machine image stores all the configuration, data, metadata and permissions from one or more disks required to create a VM instance
  • Sole-tenancy provides dedicated hosting only for the project’s VM and provides added layer of hardware isolation
  • deletionProtection prevents accidental VM deletion, especially for VMs that run critical workloads and need to be protected
  • provides sustained use discounts, committed use discounts, a free tier, etc. in pricing

App Engine

  • App Engine helps build highly scalable applications on a fully managed serverless platform
  • Each Cloud project can contain only a single App Engine application
  • App Engine is regional, which means the infrastructure that runs the apps is located in a specific region, and Google manages it so that it is available redundantly across all of the zones within that region
  • App Engine application location or region cannot be changed once created
  • App engine allows traffic management to an application version by migrating or splitting traffic.
    • Traffic Splitting (Canary) – distributes a percentage of traffic to versions of the application.
    • Traffic Migration – smoothly switches request routing
  • App Engine supports Standard and Flexible environments
    • Standard environment
      • Application instances that run in a sandbox, using the runtime environment of a supported language only.
      • Sandbox restricts what the application can do
        • only allows the app to use a limited set of binary libraries
        • app cannot write to disk
        • limits the CPU and memory options available to the application
      • Sandbox does not support
        • SSH debugging
        • Background processes
        • Background threads (limited capability)
        • Using Cloud VPN
    • Flexible environment
      • Application instances run within Docker containers on Compute Engine virtual machines (VM).
      • Because the Flexible environment runs Docker containers, it can support custom runtimes or source code written in other programming languages.
      • Allows selection of any Compute Engine machine type for instances so that the application has access to more memory and CPU.
  • min_idle_instances indicates the number of additional instances to be kept running and ready to serve traffic for this version.

GKE

Node Pool

GKE commands

  • Use --num-nodes to scale (resize) a cluster; --size is deprecated

Google Cloud Compute Engine – GCE

Google Cloud Compute Engine

  • Compute Engine instance is a virtual machine (VM) hosted on Google’s infrastructure.
  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
  • Docker containers can also be deployed, which are automatically launched on instances running the Container-Optimized OS public image.
  • Each instance belongs to a GCP project, and a project can have one or more instances. When you delete an instance, it is removed from the project.
  • For instance creation, the zone, operating system, and machine type (number of virtual CPUs and the amount of memory) need to be specified.
  • By default, each Compute Engine instance has a small boot persistent disk that contains the OS. Additional storage options can be attached.
  • Each network interface of a Compute Engine instance is associated with a subnet of a unique VPC network.
  • Regardless of the region where the VM instance is created, the default time for the VM instance is Coordinated Universal Time (UTC).

Compute Engine Instance Lifecycle

Instance life cycle.

  • PROVISIONING: Resources are being allocated for the instance. The instance is not running yet.
  • STAGING: Resources have been acquired and the instance is being prepared for the first boot.
  • RUNNING: The instance is booting up or running. You should be able to ssh into the instance soon, but not immediately after it enters this state.
  • REPAIRING: The instance is being repaired because the instance encountered an internal error or the underlying machine is unavailable due to maintenance. During this time, the instance is unusable. If repair is successful, the instance returns to one of the above states.
  • STOPPING: The instance is being stopped because a user has made a request to stop the instance or there was a failure. This is a temporary status and the instance will move to TERMINATED.
  • TERMINATED: A user shut down the instance, or the instance encountered a failure. You can choose to restart the instance or delete it.
  • SUSPENDING: The instance is being suspended due to a user action.
  • SUSPENDED: The instance is suspended and can be resumed or deleted.
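
The states above can be read as a small state machine. The sketch below is a simplified model in Python: the state names come from the list, but the transition table is an assumption made for illustration (for example, that a restart or resume goes back through PROVISIONING), and deletion is left out. The names ALLOWED, transition and walk are hypothetical.

```python
from enum import Enum
from typing import List


class State(Enum):
    PROVISIONING = "PROVISIONING"
    STAGING = "STAGING"
    RUNNING = "RUNNING"
    REPAIRING = "REPAIRING"
    STOPPING = "STOPPING"
    TERMINATED = "TERMINATED"
    SUSPENDING = "SUSPENDING"
    SUSPENDED = "SUSPENDED"


# Assumed, simplified transition table; deletion is not modelled.
ALLOWED = {
    State.PROVISIONING: {State.STAGING},
    State.STAGING: {State.RUNNING},
    State.RUNNING: {State.STOPPING, State.SUSPENDING, State.REPAIRING},
    State.REPAIRING: {State.PROVISIONING, State.STAGING, State.RUNNING},
    State.STOPPING: {State.TERMINATED},
    State.TERMINATED: {State.PROVISIONING},  # restart (assumed path)
    State.SUSPENDING: {State.SUSPENDED},
    State.SUSPENDED: {State.PROVISIONING},   # resume (assumed path)
}


def transition(current: State, target: State) -> State:
    """Move to `target` if the table allows it, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def walk(path: List[State]) -> State:
    """Validate a whole sequence of states and return the final one."""
    state = path[0]
    for target in path[1:]:
        state = transition(state, target)
    return state


final = walk([State.PROVISIONING, State.STAGING, State.RUNNING,
              State.STOPPING, State.TERMINATED])
```

A direct jump such as RUNNING to TERMINATED is rejected by this model, which mirrors the note that STOPPING is a temporary status on the way to TERMINATED.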

GCP Compute Engine Instance Stopping vs Suspending vs Resetting

Compute Engine Machine Types

  • A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits.
  • Machine types are grouped and curated by families for different workloads

GCP Compute Engine Machine Types

Compute Engine Storage

Refer blog post @ Compute Engine Storage Options

Compute Engine Guest Environment

  • A Guest environment is automatically installed on the VM instance when using Google-provided public images
  • Guest environment is a set of scripts, daemons, and binaries that read the content of the metadata server to make a VM run properly on CE
  • A metadata server is a communication channel for transferring information from a client to the guest operating system.
  • Guest environment can be manually installed on custom images

Compute Engine Instance Availability Policies

  • Compute Engine does regular maintenance of its infrastructure which entails hardware and software updates
  • Google might need to move the VM away from the host undergoing maintenance, and Compute Engine automatically manages the scheduling behavior of these instances.
  • Compute Engine instance’s availability policy determines how it behaves when there is a maintenance event
    • Live migrate – move the VM instances to another host machine
    • Stop the instances
  • Instance’s availability policy can be changed by configuring the following two settings:
    • VM instance’s maintenance behavior onHostMaintenance, which determines whether the instance is live migrated MIGRATE (default) or stopped TERMINATE
    • Instance’s restart behavior  automaticRestart  which determines whether the instance automatically restarts (default) if it crashes or gets stopped
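
How the two settings combine can be sketched as a small decision function. This is an illustrative Python model only: AvailabilityPolicy and on_maintenance_event are hypothetical names, and the returned strings are descriptive labels rather than API values. Only MIGRATE, TERMINATE and the two setting names come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityPolicy:
    """Hypothetical container for the two settings described above."""
    on_host_maintenance: str = "MIGRATE"  # default; the alternative is "TERMINATE"
    automatic_restart: bool = True        # default


def on_maintenance_event(policy: AvailabilityPolicy) -> str:
    """Describe what happens to the VM when its host undergoes maintenance."""
    if policy.on_host_maintenance == "MIGRATE":
        return "live migrated"
    if policy.on_host_maintenance == "TERMINATE":
        # The VM is stopped; automaticRestart decides whether it comes back.
        if policy.automatic_restart:
            return "stopped, then restarted"
        return "stopped"
    raise ValueError(f"unknown onHostMaintenance: {policy.on_host_maintenance}")


default_outcome = on_maintenance_event(AvailabilityPolicy())
batch_outcome = on_maintenance_event(
    AvailabilityPolicy(on_host_maintenance="TERMINATE", automatic_restart=False)
)
```

With the defaults the VM keeps running through maintenance; the second policy models an instance that should simply stay stopped after a maintenance event.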

Compute Engine Live Migration

  • Live migration helps keep the VM instances running even when a host system event, such as a software or hardware update, occurs
  • Compute Engine live migrates the running instances to another host in the same zone instead of requiring the VMs to be rebooted
  • Live migration allows Google to perform maintenance to keep infrastructure protected and reliable without interrupting any of the VMs.
  • Live migration keeps the instances running during:
    • Regular infrastructure maintenance and upgrades.
    • Network and power grid maintenance in the data centers.
    • Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.
    • Host OS and BIOS upgrades.
    • Security-related updates, with the need to respond quickly.
    • System configuration changes, including changing the size of the host root partition, for storage of the host image and packages.
  • When a VM is scheduled to be live migrated, GCP provides a notification to the guest that migration is imminent
  • Live migration does not change any attributes or properties of the VM including internal and external IP addresses, instance metadata, block storage data and volumes, OS and application state, network settings, network connections, and so on.
  • Compute Engine can also live migrate instances with local SSDs attached, moving the VMs along with their local SSD to a new machine in advance of any planned maintenance.
  • Instances with GPUs attached cannot be live migrated and must be set to stop and optionally restart. Compute Engine offers a 60-minute notice before a VM instance with a GPU attached is stopped
  • Preemptible instance cannot be configured for live migration

Preemptible VM instances

  • A preemptible VM is an instance that can be created and run at a much lower price than normal instances.
  • Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks.
  • Preemptible VMs are excess Compute Engine capacity, so their availability varies with usage.
  • Preemptible VMs are ideal for reducing costs significantly if the apps are fault-tolerant and can withstand possible instance preemptions or interruptions
  • Preemptible instance limitations
    • Compute Engine might stop preemptible instances at any time due to system events.
    • Compute Engine always stops preemptible instances after they run for 24 hours.
    • Preemptible instances are finite GCE resources, so they might not always be available.
    • Preemptible instances can’t live migrate to a regular VM instance, or be set to automatically restart when there is a maintenance event.
    • Preemptible instances are not covered by any Service Level Agreement
    • GCP Free Tier credits for Compute Engine don’t apply to preemptible instances
  • Preemption process
    • Compute Engine sends a preemption notice to the instance in the form of an ACPI G2 Soft Off signal.
    • Shutdown script can be used to handle the preemption notice and complete cleanup actions before the instance stops
    • If the instance does not stop after 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal to the operating system.
    • Compute Engine transitions the instance to a TERMINATED state.
  • Managed Instance group supports Preemptible instances.
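
Because only about 30 seconds pass between the preemption notice and the hard power-off, cleanup work in a shutdown script has to fit a time budget. The sketch below shows one way to prioritise tasks in Python. It is an assumption-laden illustration: run_cleanup, the task tuples and the time estimates are hypothetical, and a real shutdown script would perform actual work instead of the no-op actions used here.

```python
import time
from typing import Callable, List, Tuple

PREEMPTION_GRACE_SECONDS = 30.0  # window mentioned in the preemption process

# (task name, estimated duration in seconds, action to run)
Task = Tuple[str, float, Callable[[], None]]


def run_cleanup(tasks: List[Task],
                budget: float = PREEMPTION_GRACE_SECONDS,
                clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Run tasks in priority order, skipping any that would overrun the budget."""
    deadline = clock() + budget
    completed: List[str] = []
    for name, estimated_seconds, action in tasks:
        if clock() + estimated_seconds > deadline:
            continue  # not enough time left for this task; try the next one
        action()
        completed.append(name)
    return completed


tasks: List[Task] = [
    ("flush-buffers", 2.0, lambda: None),
    ("upload-large-archive", 120.0, lambda: None),  # will not fit in 30 seconds
    ("deregister-worker", 1.0, lambda: None),
]
completed = run_cleanup(tasks)
```

Ordering the list by importance means the most critical cleanup runs first, and anything that cannot finish in time is skipped rather than cut off halfway.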

Shielded VM

  • Shielded VM offers verifiable integrity of the Compute Engine VM instances, to confirm the instances haven’t been compromised by boot- or kernel-level malware or rootkits.
  • Shielded VM’s verifiable integrity is achieved through the use of Secure Boot, virtual trusted platform module (vTPM)-enabled Measured Boot, and integrity monitoring.

Managing access to the instances

  • Linux instances:
    • Compute Engine uses key-based SSH authentication to establish connections to Linux virtual machine (VM) instances.
    • By default, local users with passwords aren’t configured on Linux VMs.
    • By default, Compute Engine uses custom project and/or instance metadata to configure SSH keys and to manage SSH access. If OS Login is used, metadata SSH keys are disabled.
    • Managing instance access using OS Login
      • allows associating SSH keys with the Google Account or Google Workspace account and managing admin or non-admin access to the instance through IAM roles.
      • when connecting to the instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to the Google Account or Google Workspace account.
    • Managing the SSH keys in the project or instance metadata
      • allows granting admin access to instances with metadata access that does not use OS Login.
      • when connecting to the instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to project metadata.
      • Project-wide public SSH keys
        • give users general access to a Linux instance.
        • give users access to all of the Linux instances in a project that allows project-wide public SSH keys
      • Instance metadata
        • If an instance blocks project-wide public SSH keys, a user can’t use the project-wide public SSH key to connect to the instance unless the same public SSH key is also added to instance metadata
  • On Windows Server instances:
    • Create a password for a Windows Server instance

Compute Engine Images

  • Compute Engine images provide operating system images to create boot disks and application images with preinstalled, configured software
  • The main purpose of images is to create new instances or configure instance templates
  • Images can be regional or multi-regional and can be shared and accessed across projects and organizations
  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
    • Public images
      • provided and maintained by Google, open-source communities, and third-party vendors.
      • All Google Cloud projects have access to these images and can use them to create instances.
    • Custom images
      • are available only to the Cloud project.
      • Custom images can be created from boot disks and other images.
  • Image families
    • help with image versioning
    • help manage images in the project by grouping related images together, so that it is possible to roll forward and roll back between specific image versions
    • always point to the newest non-deprecated image version
  • Linux images can be exported as a tar.gz file to Cloud Storage
  • Google Cloud supports images with Container-Optimized OS, an OS image for the CE instances optimized for running Docker containers
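
The image family behaviour described above, where the family resolves to the newest non-deprecated image, can be sketched as follows. This is an illustrative Python model: the Image class, resolve_family and the image names are hypothetical and do not reflect the actual API objects.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Image:
    """Hypothetical, minimal view of an image in a family."""
    name: str
    family: str
    created: datetime
    deprecated: bool = False


def resolve_family(images: List[Image], family: str) -> Optional[Image]:
    """Return the newest non-deprecated image in the family, if any."""
    candidates = [i for i in images if i.family == family and not i.deprecated]
    return max(candidates, key=lambda i: i.created, default=None)


images = [
    Image("web-v1", "web", datetime(2023, 1, 10)),
    Image("web-v2", "web", datetime(2023, 3, 5)),
    Image("db-v1", "db", datetime(2023, 2, 1)),
]
current = resolve_family(images, "web")       # web-v2 (roll forward)

images[1].deprecated = True                   # deprecate web-v2
rolled_back = resolve_family(images, "web")   # web-v1 (roll back)
```

Deprecating the newest image is what makes the family point back to the previous version, which is how a rollback works in this model.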

Instance Templates

  • Instance template is a resource used to create VM instances and managed instance groups (MIGs) with identical configuration
  • Instance templates define the machine type, boot disk image or container image, labels, and other instance properties
  • Instance templates are a convenient way to save a VM instance’s configuration to create VMs or groups of VMs later
  • Instance template is a global resource that is not bound to a zone or a region. However, if zonal resources such as disks are specified in an instance template, the template is restricted to the zone where those resources reside.
  • Labels defined within an instance template are applied to all instances that are created from that instance template. The labels do not apply to the instance template itself.
  • An existing instance template cannot be updated or changed after it is created

Instance Groups

Refer blog post @ Compute Engine Instance Groups

Snapshots

Refer blog post @ Compute Engine Snapshots

Startup & Shutdown Scripts

  • Startup scripts
    • can be added and executed on the VM instances to perform automated tasks every time the instance boots up.
    • can perform actions such as installing software, turning on services, performing updates, and any other tasks defined in the script.
  • Shutdown scripts
    • execute commands right before a VM instance is stopped or restarted.
    • can be useful allowing instances time to clean up or perform tasks, such as exporting logs, or syncing with other systems.
    • are executed only on a best-effort basis
    • have a limited amount of time to finish running before the instance stops, i.e. 90 seconds for on-demand instances and 30 seconds for preemptible instances
  • Startup & Shutdown scripts are executed as the root user
  • Startup & Shutdown scripts can be provided to the VM instance using
    • local file, supported by gcloud only
    • inline using startup-script or shutdown-script option
    • Cloud Storage URL and startup-script-url or shutdown-script-url as the metadata key, provided the instance has access to the script
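
The metadata keys listed above end up as key/value items in the instance metadata. The Python sketch below assembles such a structure. It assumes the metadata body is a list of key/value items, and script_metadata and the bucket name example-bucket are hypothetical. It only builds a dictionary and does not call any Google Cloud API.

```python
from typing import Dict, List, Optional


def script_metadata(startup_script: Optional[str] = None,
                    startup_script_url: Optional[str] = None,
                    shutdown_script: Optional[str] = None,
                    shutdown_script_url: Optional[str] = None
                    ) -> Dict[str, List[Dict[str, str]]]:
    """Build a metadata body containing only the script keys that were given."""
    candidates = {
        "startup-script": startup_script,            # inline script contents
        "startup-script-url": startup_script_url,    # script stored in Cloud Storage
        "shutdown-script": shutdown_script,
        "shutdown-script-url": shutdown_script_url,
    }
    items = [{"key": key, "value": value}
             for key, value in candidates.items() if value is not None]
    return {"items": items}


body = script_metadata(
    startup_script="#!/bin/bash\napt-get update\n",
    shutdown_script_url="gs://example-bucket/cleanup.sh",  # hypothetical bucket
)
```

In this example the startup script is passed inline and the shutdown script is referenced by URL, so the instance would need read access to that bucket.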

Machine Image

  • A machine image is a Compute Engine resource that stores all the configuration, metadata, permissions, and data from one or more disks required to create a virtual machine (VM) instance.

Sole Tenant Nodes

  • Sole-tenancy provides dedicated hosting only for the project’s VM and provides an added layer of hardware isolation
  • Sole-tenant nodes ensure that the VMs do not share host hardware with VMs from other projects
  • Each sole-tenant node maintains a one-to-one mapping to the physical server that is backing the node
  • Project has exclusive access to a sole-tenant node, which is a physical CE server and can be used to keep the VMs physically separated from VMs in other projects or to group the VMs together on the same host hardware
  • Sole-tenant nodes can help meet dedicated hardware requirements for bring your own license (BYOL) scenarios that require per-core or per-processor licenses

Projects on a multi-tenant host versus a sole-tenant node.

Preventing Accidental VM Deletion

  • Accidental VM deletion can be prevented by setting the deletionProtection property on an instance resource, especially for VMs that run critical workloads and need to be protected
  • A deletion request fails if a user attempts to delete a VM instance for which the deletionProtection flag is set
  • Only a user who has been granted the compute.instances.create permission can reset the flag to allow the resource to be deleted.
  • Deletion prevention does not prevent the following actions:
    • Terminating an instance from within the VM (such as running the shutdown command)
    • Stopping an instance
    • Resetting an instance
    • Suspending an instance
    • Instances being removed due to fraud and abuse after being detected by Google
    • Instances being removed due to project termination
  • Deletion protection can be applied to both regular and preemptible VMs.
  • Deletion protection cannot be applied to VMs that are part of a managed instance group but can be applied to instances that are part of unmanaged instance groups.
  • Deletion prevention cannot be specified in instance templates.
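
The effect of the flag can be modelled in a few lines of Python. This is a toy model for illustration only: Instance, DeletionProtectedError, delete and stop are hypothetical names, and the DELETED status string is an assumption used to mark a removed instance.

```python
from dataclasses import dataclass


class DeletionProtectedError(Exception):
    """Raised when a delete request targets a protected instance."""


@dataclass
class Instance:
    name: str
    deletion_protection: bool = False
    status: str = "RUNNING"


def delete(instance: Instance) -> None:
    """Fail the request if deletionProtection is set, otherwise delete."""
    if instance.deletion_protection:
        raise DeletionProtectedError(f"{instance.name} has deletionProtection set")
    instance.status = "DELETED"  # assumed marker for a removed instance


def stop(instance: Instance) -> None:
    """Stopping is not blocked by deletion protection."""
    instance.status = "TERMINATED"


vm = Instance("critical-db", deletion_protection=True)
stop(vm)                          # allowed even though the VM is protected
try:
    delete(vm)                    # rejected while the flag is set
    outcome = "deleted"
except DeletionProtectedError:
    outcome = "delete rejected"

vm.deletion_protection = False    # the flag has to be reset first
delete(vm)
```

The protected VM can still be stopped, but the delete call only succeeds after the flag has been cleared.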

GCP Certification Exam Practice Questions

  1. Your company hosts multiple applications on Compute Engine instances. They want the instances to be resilient to any Host maintenance activities performed on the instance. How would you configure the instances?
    1. Set automaticRestart availability policy to true
    2. Set automaticRestart availability policy to false
    3. Set onHostMaintenance availability policy to migrate instances
    4. Set onHostMaintenance availability policy to terminate instances


Google Cloud Compute Engine Instance Groups

Google Cloud – Compute Engine Instance Groups

  • An instance group is a collection of virtual machine (VM) instances that can be managed as a single entity.
  • Compute Engine offers two kinds of VM instance groups:
    • Managed instance groups (MIGs)
      • allow applications to be run on multiple identical VMs.
      • make workloads scalable and highly available through automated MIG services, including autoscaling, autohealing, regional (multiple zone) deployment, and automatic updating.
    • Unmanaged instance groups
      • allow load balancing across a fleet of self-managed, nonidentical VMs.

Managed instance groups (MIGs)

  • A MIG creates each of its managed instances based on the instance template and specified optional stateful configuration
  • Managed instance groups (MIGs) are ideal for scenarios such as:
    • Stateless serving workloads, such as a website frontend
    • Stateless batch, high-performance, or high-throughput compute workloads, such as image processing from a queue
    • Stateful applications, such as databases, legacy applications, and long-running batch computations with checkpointing
Use a managed instance group to build highly available deployments for stateless serving, stateful applications, or batch workloads.

Health Checking

  • Managed instance group health checks proactively signal to delete and recreate instances that become UNHEALTHY.
  • Load balancing health checks help direct traffic away from non-responsive instances and toward healthy instances; these health checks do not cause Compute Engine to recreate instances.
  • Health checks used to monitor MIGs are similar to the health checks used for load balancing, with some differences in behavior.

High Availability & Autohealing

  • Managed instance groups maintain high availability of applications by proactively maintaining the number of instances and keeping them available, that is, in the RUNNING state.
  • Application-based autohealing improves application availability by relying on a health checking signal that detects application-specific issues such as freezing, crashing, or overloading.
  • If a health check determines that an application has failed on a VM, the MIG automatically recreates that VM instance.
  • A MIG automatically recreates an instance that is not RUNNING. However, relying only on VM state may not be sufficient; autohealing should also check whether the application has frozen, crashed, or run out of memory.
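
The two signals described above (VM state and an application-level health check) can be combined as in the following sketch. It is a simplified model, not the MIG implementation; the function name and its parameters are hypothetical.

```python
from typing import Optional


def needs_recreation(vm_status: str, app_healthy: Optional[bool] = None) -> bool:
    """Return True if this toy MIG would recreate the instance.

    app_healthy is None when no autohealing health check is configured.
    """
    if vm_status != "RUNNING":
        return True  # VM-state check: a VM that is not RUNNING is recreated
    if app_healthy is None:
        return False  # only VM state is known, so a frozen app goes unnoticed
    return not app_healthy  # application-based autohealing
```

A VM that is RUNNING but whose application has frozen is caught only when a health check is configured, which is the gap that application-based autohealing closes.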

Regional or Zonal groups

  • Zonal MIG
    • deploys instances to a single zone.
  • Regional MIG
    • deploys instances to multiple zones across the same region
    • provides higher availability by spreading application load across multiple zones
    • protects the workload against zonal failure
    • offers more capacity, with a maximum of 2,000 instances per regional group.
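
As a rough illustration of how a regional group spreads its instances, the sketch below divides a target size as evenly as possible across zones. The even split is an assumption made for illustration, and the function name and zone names are hypothetical.

```python
from typing import Dict, List


def spread_evenly(target_size: int, zones: List[str]) -> Dict[str, int]:
    """Split target_size across zones so that counts differ by at most one."""
    base, extra = divmod(target_size, len(zones))
    # The first `extra` zones each receive one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


layout = spread_evenly(10, ["us-central1-a", "us-central1-b", "us-central1-c"])
```

With ten instances over three zones, losing any single zone still leaves at least six instances serving traffic.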

Load Balancing

  • MIGs work with load balancing services to distribute traffic across all of the instances in the group.
  • Google Cloud load balancing can use instance groups to serve traffic by adding instance groups to a target pool or to a backend service.

Scalability & Autoscaling

  • MIGs provide scalability and support autoscaling, which dynamically adds or removes instances in response to increases or decreases in load.
  • An autoscaling policy determines how the group scales. Policies can be based on CPU utilization, Cloud Monitoring metrics, load balancing capacity, or, for zonal MIGs, a queue-based workload such as Pub/Sub.
  • Autoscaler continuously collects usage information based on the selected utilization metric, compares actual utilization to the desired target utilization, and uses this information to determine whether the group needs to remove instances (scale in) or add instances (scale out).
  • Cool down period
    • is also known as the application initialization period. It specifies how long an application takes to initialize on a new VM, so that the autoscaler can account for usage data that does not yet reflect normal load.
  • Stabilization period
    • For scaling in, the autoscaler calculates the group’s recommended target size based on the peak load over the last 10 minutes; this window is called the stabilization period.
    • Using the stabilization period, the autoscaler ensures that the recommended size for the managed instance group is always sufficient to serve the peak load observed during the previous 10 minutes.
  • Predictive autoscaling
    • helps to optimize a MIG for availability
    • the autoscaler forecasts future load based on historical data and scales out a MIG in advance of predicted load, so that new instances are ready to serve when the load arrives.
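
The interplay between target utilization and the stabilization period can be sketched as follows. This is a deliberately simplified model and not Google's actual autoscaling algorithm; the proportional sizing formula, the ToyAutoscaler class, and all names are assumptions made for illustration.

```python
import math
from collections import deque
from typing import Deque, Tuple

STABILIZATION_SECONDS = 600  # the 10-minute window described above


def recommended_size(current_size: int, utilization: float, target: float) -> int:
    """Instances needed to bring average utilization back to the target."""
    return max(1, math.ceil(current_size * utilization / target))


class ToyAutoscaler:
    def __init__(self) -> None:
        # (timestamp in seconds, recommendation) pairs inside the window
        self._recent: Deque[Tuple[float, int]] = deque()

    def decide(self, now: float, current_size: int,
               utilization: float, target: float) -> int:
        self._recent.append((now, recommended_size(current_size, utilization, target)))
        # Forget recommendations older than the stabilization window.
        while self._recent[0][0] < now - STABILIZATION_SECONDS:
            self._recent.popleft()
        # Scale out at once, but scale in only to the peak seen in the window.
        return max(size for _, size in self._recent)
```

In this model a load spike raises the group size immediately, while a drop in load shrinks the group only after the spike has aged out of the 10-minute window.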

Automatic Updating

  • MIG automatic updater supports a flexible range of rollout scenarios to deploy new versions of the software to instances in the MIG such as rolling updates and canary updates.
  • Speed and scope of deployment can be controlled as well as the level of disruption to the service.

Stateful Workloads Support

  • MIGs can be used for building highly available deployments and automating operation of applications with stateful data or configuration, such as databases, DNS servers, legacy monolith applications, or long-running batch computations with checkpointing.
  • Uptime and resiliency of such applications can be improved with autohealing, controlled updates, and multi-zone deployments, while preserving each instance’s unique state, including customizable instance name, persistent disks, and metadata.
  • Stateful MIGs preserve each instance’s unique state (instance name, attached persistent disks, and metadata) on machine restart, recreation, auto-healing, and update events.

Preemptible Instances Groups

  • MIG supports preemptible VM instances, which can help reduce cost.
  • Preemptible instances last up to 24 hours. They are preempted gracefully, and the application has 30 seconds to exit cleanly.
  • Preemptible instances can be preempted at any time, but autohealing brings the instances back when preemptible capacity becomes available again.
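
Because an application has only a short window to exit, batch work on preemptible VMs is usually written to record progress as it goes. The sketch below shows one way to do that, under the assumption that the shutdown notice reaches the process as SIGTERM (this depends on how the guest OS and service manager are configured). All function and variable names are hypothetical.

```python
import signal
import threading

stop_requested = threading.Event()


def install_sigterm_handler() -> None:
    """Call once from the main thread; sets the flag when SIGTERM arrives."""
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_requested.set())


def run_batch(items, checkpoint: dict) -> None:
    """Process items in order, recording progress so a new VM can resume."""
    start = checkpoint.get("next_index", 0)
    for index in range(start, len(items)):
        if stop_requested.is_set():
            break  # stop promptly; only a short window remains
        checkpoint.setdefault("results", []).append(items[index] * 2)
        checkpoint["next_index"] = index + 1
```

In practice the checkpoint would be written to durable storage, such as a persistent disk or a bucket, so that the instance recreated by the MIG can pick up from next_index.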

Containers

  • MIGs support the deployment of containers to Container-Optimized OS, which includes Docker, if the instance template specifies a container image.

Network and Subnet

  • The instance template used with a MIG defines the VPC network and subnet that member instances use.
  • For auto mode VPC networks, the subnet can be omitted; this instructs GCP to select the automatically created subnet in the region specified in the template.
  • If the VPC network is omitted, GCP attempts to use the VPC network named default.
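
The defaulting rules above can be written out as a small function. This is an illustrative sketch, not the Compute Engine API: the function name, the auto_mode parameter, and the AUTO_SUBNET placeholder are hypothetical, and the error for custom mode networks is inferred from the rule that only auto mode networks allow the subnet to be omitted.

```python
from typing import Dict, Optional

# Placeholder meaning "let GCP pick the automatically created subnet".
AUTO_SUBNET = "<auto-created subnet in the template's region>"


def resolve_network_config(network: Optional[str] = None,
                           subnet: Optional[str] = None,
                           auto_mode: bool = True) -> Dict[str, str]:
    """Apply the template defaulting rules to a network and subnet pair."""
    resolved_network = network or "default"  # an omitted network falls back to default
    if subnet is None:
        if not auto_mode:
            raise ValueError("a subnet must be specified for a custom mode VPC network")
        subnet = AUTO_SUBNET
    return {"network": resolved_network, "subnet": subnet}
```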

Unmanaged instance groups

  • Unmanaged instance groups can contain heterogeneous instances that can be arbitrarily added and removed from the group.
  • Unmanaged instance groups do not offer autoscaling, autohealing, rolling update support, multi-zone support, or the use of instance templates, so they are not a good fit for deploying highly available and scalable workloads.
  • Use unmanaged instance groups if load balancing needs to be applied to groups of heterogeneous instances, or if the instances need to be self-managed.

GCP Certification Exam Practice Questions

  1. Your company’s test suite is a custom C++ application that runs tests throughout each day on Linux virtual machines. The full test suite takes several hours to complete, running on a limited number of on-premises servers reserved for testing. Your company wants to move the testing infrastructure to the cloud, to reduce the amount of time it takes to fully test a change to the system, while changing the tests as little as possible. Which cloud infrastructure should you recommend?
    1. Google Compute Engine unmanaged instance groups and Network Load Balancer.
    2. Google Compute Engine managed instance groups with auto-scaling.
    3. Google Cloud Dataproc to run Apache Hadoop jobs to process each test.
    4. Google App Engine with Google Stackdriver for logging.
  2. Your company has a set of Compute Engine instances that will host production applications. These applications will run 24×7 throughout the year. You need to implement a cost-effective, scalable, and highly available solution that keeps working even if a zone fails. How would you design the solution?
    1. Use Managed instance groups with preemptible instances across multiple zones
    2. Use Managed instance groups across multiple zones
    3. Use managed instance groups with instances in a single zone
    4. Use Unmanaged instance groups across multiple zones