Google Cloud Compute Engine Snapshots

Compute Engine Snapshots

  • Snapshots provide periodic backup of the persistent disks
  • Snapshots incrementally back up data from the persistent disks.
  • Snapshots are global resources, so any snapshot is accessible by any resource within the same project.
  • Snapshots can be shared across projects.
  • Storage costs for persistent disk snapshots charge only for the total size of the snapshot.
  • Snapshots once created with the current state of the disk, can be restored as a new disk.
  • Compute Engine stores multiple copies of each snapshot across multiple locations with automatic checksums to ensure the integrity of the data.
  • Snapshots can be created from disks even while they are attached to running virtual machine (VM) instances.
  • Lifecycle of a snapshot created from a disk attached to a running VM instances is independent of the lifecycle of the VM instance.
  • Snapshots can be stored in either one Cloud Storage multi-regional location, such as asia, or one Cloud Storage regional location, such as asia-south1.
  • A multi-regional storage location provides higher availability and might reduce network costs when creating or restoring a snapshot
  • A snapshot can be used to create a new disk in any region and zone, regardless of the storage location of the snapshot.

Snapshot Creation

  • Snapshots are incremental and automatically compressed, so that they can be regularly created on a persistent disk faster and at a lower cost than regularly creating a full image of the disk.
  • Incremental snapshots work as follows:
    • The first successful snapshot of a persistent disk is a full snapshot that contains all the data on the persistent disk.
    • The second snapshot only contains any new data or modified data since the first snapshot. Data that hasn’t changed since snapshot 1 isn’t included. Instead, snapshot 2 contains references to snapshot 1 for any unchanged data.
    • Snapshot 3 contains any new or changed data since snapshot 2 but won’t contain any unchanged data from snapshot 1 or 2. Instead, snapshot 3 contains references to blocks in snapshot 1 and snapshot 2 for any unchanged data.

Snapshot Deletion

  • Compute Engine uses incremental snapshots so that each snapshot contains only the data that has changed since the previous snapshot.
  • For unchanged data, snapshots reference the data in previous snapshots.
  • When a snapshot is deleted, Compute Engine immediately marks the snapshot as DELETED in the system.
    • If the snapshot has no dependent snapshots, it is deleted outright.
    • However, if the snapshot does have dependent snapshots:
      • Any data that is required for restoring other snapshots is moved into the next snapshot, increasing its size.
      • Any data that is not required for restoring other snapshots is deleted. This lowers the total size of all your snapshots.
      • The next snapshot no longer references the snapshot marked for deletion, and instead references the snapshot before it.
  • Deleting a snapshot does not necessarily delete all the data on the snapshot because subsequent snapshots might require information stored in a previous snapshot, keep in mind that
  • To definitively delete data from the snapshots, you should delete all snapshots.

Snapshot Best Practices

  • If a snapshot is created of the persistent disk while the application is running, the snapshot might not capture pending writes that are in transit from memory to disk. So, prepare disk for consistency
    • Pause application/processes that write data, flush disk buffers
    • Unmount disk completely
    • For windows, use VSS snapshots
    • Use ext4 for linux to reduce the risk that data is cached without actually being written to the persistent disk.
  • Take only one snapshot at a time
  • Schedule snapshot off-peak hours
  • Avoid frequent snapshots, take a snapshot of the disk once per hour. Avoid taking snapshots more often than that. Disk snapshots can be created at most once every 10 minutes.
  • Use snapshot schedules as a best practice to back up your Compute Engine workloads
  • Use multiple persistent disks for large data volume. Larger amounts of data create larger snapshots, which cost more and take longer to create.
  • Run fstrim before snapshot (Linux) to clean up space, as this command removes blocks that the file system no longer needs, so that the system can create the snapshot more quickly and with a smaller size
  • Use image from an infrequently used snapshot, instead of using the snapshot itself

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have a workload running on Compute Engine that is critical to your business. You want to ensure that the data on the boot disk of this workload is backed up regularly. You need to be able to restore a backup as quickly as possible in case of disaster. You also want older backups to be cleaned automatically to save on cost. You want to follow Google-recommended practices. What should you do?
    1. Create a Cloud Function to create an instance template.
    2. Create a snapshot schedule for the disk using the desired interval.
    3. Create a cron job to create a new disk from the disk using gcloud.
    4. Create a Cloud Task to create an image and export it to Cloud Storage.