Google Cloud Shared VPC

  • Shared VPC allows an organization to connect resources from multiple projects to a common VPC network to communicate with each other securely and efficiently using internal IPs from that network.
  • requires designating a project as a host project and attaching one or more other service projects to it.
  • allows organization administrators to delegate administrative responsibilities, such as creating and managing instances, to Service Project Admins while maintaining centralized control over network resources like subnets, routes, and firewalls.
  • allows you to
    • implement a security best practice of least privilege for network administration, auditing, and access control.
    • apply and enforce consistent access control policies at the network level for multiple service projects in the organization while delegating administrative responsibilities
    • use service projects to separate budgeting or internal cost centers.

Shared VPC Concepts

GCP Shared VPC - Multiple host projects

  • Shared VPC connects projects within the same organization. Participating host and service projects cannot belong to different organizations
  • Linked projects can be in the same or different folders, but if they are in different folders the admin must have Shared VPC Admin rights to both folders
  • Each project in Shared VPC is either a host project or a service project
    • host project contains one or more Shared VPC networks. A Shared VPC Admin must first enable a project as a host project. After that, a Shared VPC Admin can attach one or more service projects to it.
    • service project is any project that has been attached to a host project by a Shared VPC Admin. This attachment allows it to participate in Shared VPC.
  • A project cannot be both a host and a service project simultaneously. Thus, a service project cannot be a host project to further service projects.
  • Multiple host projects can be created; however, each service project can only be attached to a single host project.
  • A project that does not participate in Shared VPC is called a standalone project.
  • VPC networks in the host project are called Shared VPC networks. Service project resources can use subnets in the Shared VPC network.
  • Shared VPC networks can be either auto or custom mode, but legacy networks are not supported.
  • Host and service projects are connected by attachments at the project level.
  • Subnets of the Shared VPC networks in the host project are accessible by Service Project Admins
  • Organization policies and IAM permissions work together to provide different levels of access control.
  • Organization policies enable setting controls at the organization, folder, or project level.
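
A minimal gcloud sketch of the setup described in the list above; the host/service project IDs, subnet name, region, and user email are placeholders, not values from these notes.

    # Enable a project as a Shared VPC host project (requires the Shared VPC Admin role)
    gcloud compute shared-vpc enable host-project-id

    # Attach a service project to the host project
    gcloud compute shared-vpc associated-projects add service-project-id \
        --host-project host-project-id

    # Delegate a specific subnet to a Service Project Admin (subnet-level Network User role)
    gcloud compute networks subnets add-iam-policy-binding shared-subnet \
        --project host-project-id --region us-central1 \
        --member "user:service-admin@example.com" \
        --role "roles/compute.networkUser"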

IAM Roles

  • Organization Admin – Organization Admins nominate Shared VPC Admins by granting them appropriate project creation and deletion roles, and the Shared VPC Admin role for the organization. These admins can define organization-level policies, but specific folder and project actions require additional folder and project roles.
  • Shared VPC Admin – Shared VPC Admins have the Compute Shared VPC Admin and Project IAM Admin roles for the organization or one or more folders. They perform various tasks necessary to set up Shared VPC, such as enabling host projects, attaching service projects to host projects, and delegating access to some or all of the subnets in Shared VPC networks to Service Project Admins. A Shared VPC Admin for a given host project is typically its project owner as well. A Shared VPC Admin can link projects in two different folders only if the admin has the role for both folders.
  • Service Project Admin – A Shared VPC Admin defines a Service Project Admin by granting an IAM member the Network User role to either the whole host project or select subnets of its Shared VPC networks. Service Project Admins also maintain ownership and control over resources defined in the service projects, so they should have the Instance Admin role in the corresponding service projects. They may have additional IAM roles to the service projects, such as project owner.

Cloud Interconnect with Shared VPC

  • Shared VPC helps share VLAN attachments in a host project with VM instances in other (service) projects.
  • Shared VPC is preferable if you need to create many projects and would like to prevent individual project owners from managing their connectivity back to the on-premises network.
  • Host project contains a common Shared VPC network that VMs in service projects can use. Because VMs in service projects use this network, Service Project Admins don’t need to create other VLAN attachments or Cloud Routers in the service projects.
  • VLAN attachments and Cloud Routers for an Interconnect connection must be created only in the Shared VPC host project.
  • The combination of a VLAN attachment and its associated Cloud Router is unique to a given Shared VPC network.
  • Service Project Admins can create VMs in subnets that exist in a host project’s Shared VPC network based on the permissions that they have to the host project.
  • VMs that use the Shared VPC network can use the custom dynamic routes for VLAN attachments available to that network.
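
As a hedged illustration of the points above, the Cloud Router and VLAN attachment would be created in the host project only; the names, region, ASN, and Interconnect below are placeholders and assume a Dedicated Interconnect already exists.

    # In the Shared VPC host project only: Cloud Router for the Interconnect connection
    gcloud compute routers create onprem-router \
        --project host-project-id --network shared-vpc-net \
        --region us-central1 --asn 65001

    # VLAN attachment on the existing Dedicated Interconnect, using that Cloud Router
    gcloud compute interconnects attachments dedicated create onprem-attachment \
        --project host-project-id --router onprem-router \
        --region us-central1 --interconnect my-interconnect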

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your company is building a large-scale web application. Each team is responsible for its own service component of the application
    and wants to manage its own individual projects. You want each service to communicate with the others over the RFC1918 address
    space. What should you do?

    1. Deploy each service into a single project within the same VPC.
    2. Configure Shared VPC, and add each project as a service of the Shared VPC project.
    3. Configure each service to communicate with the others over HTTPS protocol.
    4. Configure a global load balancer for each project, and communicate between each service using the global load balancer IP
      addresses.
  2. Where should you create the Cloud Router instance in a Shared VPC to allow connection from service projects across a new Dedicated Interconnect to your data center?
    1. VPC network in all projects
    2. VPC network in the IT Project
    3. VPC network in the Host Project
    4. VPC network in the Sales, Marketing, and IT Projects

Reference

GCP Virtual Private Cloud – Shared VPC

Google Cloud VPC Peering

  • Google Cloud VPC Network Peering allows private connectivity (internal IP addresses) across two VPC networks regardless of whether they belong to the same project or the same organization.
  • VPC Network Peering connects VPC networks so that workloads in different VPC networks can communicate internally.
  • Traffic stays within Google’s network and doesn’t traverse the public internet.
  • VPC Network Peering provides the following advantages over using external IP addresses or VPNs to connect networks:
    • Network Latency – connectivity uses only internal addresses and provides lower latency than connectivity that uses external addresses.
    • Network Security – service owners do not need to have their services exposed to the public Internet and deal with its associated risks.
    • Network Cost – GCP charges egress bandwidth (outbound traffic) for networks communicating over external IPs, even if the traffic is within the same zone. Peered networks communicate using internal IPs and save on those egress costs.
  • VPC Network Peering is useful in these environments:
    • SaaS (Software-as-a-Service) ecosystems in Google Cloud, which can be made available privately across different VPC networks within and across organizations.
    • Organizations that have several network administrative domains that need to communicate using internal IP addresses.

VPC Peering Properties

  • VPC Network Peering works with Compute Engine, GKE, and App Engine flexible environment.
  • Peered VPC networks remain administratively separate. Routes, firewalls, VPNs, and other traffic management tools are administered and applied separately in each of the VPC networks.
  • Each side of a peering association is set up independently. Peering will be active only when the configuration from both sides matches. Either side can choose to delete the peering association at any time.
  • VPC peers always exchange subnet routes that don’t use privately used public IP addresses. Networks must explicitly export privately used public IP subnet routes for other networks to use them and must explicitly import privately used public IP subnet routes to receive them from other networks
  • Subnet and static routes are global. Dynamic routes can be regional or global, depending on the VPC network’s dynamic routing mode.
  • VPC network can peer with multiple VPC networks (limit of 25 currently)
  • IAM permissions for creating and deleting VPC Network Peering are included as part of the Compute Network Admin role.
  • Peering traffic (traffic flowing between peered networks) has the same latency, throughput, and availability as private traffic in the same network.
  • Billing policy for peering traffic is the same as the billing policy for private traffic in the same network.
  • Peering is allowed with Shared VPC
  • An organization policy administrator can use an organization policy to constrain which VPC networks can peer with VPC networks in the organization. Peering connections to particular VPC networks or to VPC networks in a particular folder or organization can be denied. The constraint applies to new peering configurations and doesn’t affect existing connections. An existing peering connection can continue to work even if a new policy denies new connections.
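
A sketch of how the two sides of a peering might be created with gcloud; the project and network names are placeholders. The peering becomes active only after both sides are configured.

    # In project-a: request peering from network-a to network-b
    gcloud compute networks peerings create peer-ab \
        --project project-a --network network-a \
        --peer-project project-b --peer-network network-b

    # In project-b: the matching configuration; either side can delete the peering later
    gcloud compute networks peerings create peer-ba \
        --project project-b --network network-b \
        --peer-project project-a --peer-network network-a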

VPC Peering Restrictions

  • A subnet CIDR range in one peered VPC network cannot overlap with a static route in another peered network. This rule covers both subnet routes and static routes

GCP VPC Peering - Overlapping Subnet IP ranges between two peers

  • A dynamic route can overlap with a subnet route in a peer network. For dynamic routes, the destination ranges that overlap with a subnet route from the peer network are silently dropped. Google Cloud uses the subnet route
  • Only VPC networks are supported for VPC Network Peering. Peering is NOT supported for legacy networks.
  • Subnet route exchange can’t be disabled, and the subnet routes to be exchanged cannot be selected. After peering is established, all resources within subnet IP ranges are accessible across directly peered networks.
  • VPC Network Peering doesn’t provide granular route controls to filter which subnet CIDR ranges are reachable across peered networks; this filtering needs to be done using firewall rules.
  • Transitive peering is NOT supported.
  • Tags or service accounts from one peered network CANNOT be used in the other peered network.
  • Compute Engine internal DNS names created in a network are NOT accessible to peered networks. Use the IP address instead.
  • By default, VPC Network Peering with GKE is supported when used with IP aliases. If you don’t use IP aliases, custom routes can be exported so that GKE containers are reachable from peered networks.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your company is working with a partner to provide a solution for a customer. Both your company and the partner organization are
    using GCP. There are applications in the partner’s network that need access to some resources in your company’s VPC. There is
    no CIDR overlap between the VPCs. Which two solutions can you implement to achieve the desired results without compromising
    security?

    1. VPC peering
    2. Shared VPC
    3. Dedicated Interconnect
    4. Cloud NAT
  2. Your organization is deploying a single project for 3 separate departments. Two of these departments require network
    connectivity between each other, but the third department should remain in isolation. Your design should create separate network
    administrative domains between these departments. You want to minimize operational overhead. How should you design the
    topology?

    1. Create a Shared VPC Host Project and the respective Service Projects for each of the 3 separate departments.
    2. Create 3 separate VPCs, and use Cloud VPN to establish connectivity between the two appropriate VPCs.
    3. Create 3 separate VPCs, and use VPC peering to establish connectivity between the two appropriate VPCs.
    4. Create a single project, and deploy specific firewall rules. Use network tags to isolate access between the departments.

Reference

Google Cloud – VPC Peering

Google Cloud Compute Engine – GCE

  • Compute Engine instance is a virtual machine (VM) hosted on Google’s infrastructure.
  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
  • Docker containers can also be deployed, which are automatically launched on instances running the Container-Optimized OS public image.
  • Each instance belongs to a GCP project, and a project can have one or more instances. When you delete an instance, it is removed from the project.
  • For instance creation, the zone, operating system, and machine type (number of virtual CPUs and the amount of memory) need to be specified.
  • By default, each Compute Engine instance has a small boot persistent disk that contains the OS. Additional storage options can be attached.
  • Each network interface of a Compute Engine instance is associated with a subnet of a unique VPC network.
  • Regardless of the region where the VM instance is created, the default time zone for the VM instance is Coordinated Universal Time (UTC).
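
For example, a minimal instance creation specifying the zone, machine type, and a Google-provided public image; all names and values below are illustrative.

    gcloud compute instances create demo-vm \
        --zone us-central1-a \
        --machine-type e2-medium \
        --image-family debian-12 --image-project debian-cloud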

Compute Engine Instance Lifecycle

Instance life cycle.

  • PROVISIONING – Resources are being allocated for the instance. The instance is not running yet.
  • STAGING – Resources have been acquired and the instance is being prepared for the first boot.
  • RUNNING – The instance is booting up or running. You should be able to ssh into the instance soon, but not immediately after it enters this state.
  • REPAIRING – The instance is being repaired because the instance encountered an internal error or the underlying machine is unavailable due to maintenance. During this time, the instance is unusable. If repair is successful, the instance returns to one of the above states.
  • STOPPING – The instance is being stopped because a user has made a request to stop the instance or there was a failure. This is a temporary status and the instance will move to TERMINATED.
  • TERMINATED – A user shut down the instance, or the instance encountered a failure. You can choose to restart the instance or delete it.
  • SUSPENDING – The instance is being suspended due to a user action.
  • SUSPENDED – The instance is suspended and can be resumed or deleted.

GCP Compute Engine Instance Stopping vs Suspending vs Resetting

Compute Engine Machine Types

  • A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits.
  • Machine types are grouped and curated by families for different workloads

GCP Compute Engine Machine Types

Compute Engine Storage

Refer blog post @ Compute Engine Storage Options

Compute Engine Guest Environment

  • A Guest environment is automatically installed on the VM instance when using Google-provided public images
  • Guest environment is a set of scripts, daemons, and binaries that read the content of the metadata server to make a VM run properly on CE
  • A metadata server is a communication channel for transferring information from a client to the guest operating system.
  • Guest environment can be manually installed on custom images

Compute Engine Instance Availability Policies

  • Compute Engine does regular maintenance of its infrastructure which entails hardware and software updates
  • Google might need to move a VM away from the host undergoing maintenance, and Compute Engine automatically manages the scheduling behavior of these instances.
  • Compute Engine instance’s availability policy determines how it behaves when there is a maintenance event
    • Live migrate – move the VM instances to another host machine
    • Stop the instances
  • Instance’s availability policy can be changed by configuring the following two settings:
    • VM instance’s maintenance behavior onHostMaintenance, which determines whether the instance is live migrated MIGRATE (default) or stopped TERMINATE
    • Instance’s restart behavior  automaticRestart  which determines whether the instance automatically restarts (default) if it crashes or gets stopped
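
A sketch of setting both availability-policy settings on an existing VM; the instance name and zone are placeholders.

    # Live migrate during host maintenance and automatically restart after a crash
    gcloud compute instances set-scheduling demo-vm --zone us-central1-a \
        --maintenance-policy MIGRATE --restart-on-failure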

Compute Engine Live Migration

  • Live migration helps keep the VM instances running even when a host system event, such as a software or hardware update, occurs
  • Compute Engine live migrates the running instances to another host in the same zone instead of requiring the VMs to be rebooted
  • Live migration allows Google to perform maintenance to keep infrastructure protected and reliable without interrupting any of the VMs.
  • GCP provides a notification to the guest that migration is imminent when a VM is scheduled to be live migrated. Live migration keeps the instances running during:
    • Regular infrastructure maintenance and upgrades.
    • Network and power grid maintenance in the data centers.
    • Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.
    • Host OS and BIOS upgrades.
    • Security-related updates, with the need to respond quickly.
    • System configuration changes, including changing the size of the host root partition, for storage of the host image and packages.
  • Live migration does not change any attributes or properties of the VM including internal and external IP addresses, instance metadata, block storage data and volumes, OS and application state, network settings, network connections, and so on.
  • Compute Engine can also live migrate instances with local SSDs attached, moving the VMs along with their local SSD to a new machine in advance of any planned maintenance.
  • Instances with GPUs attached cannot be live migrated and must be set to stop and optionally restart. Compute Engine offers a 60-minute notice before a VM instance with a GPU attached is stopped
  • Preemptible instances cannot be configured for live migration.

Preemptible VM instances

  • A preemptible VM is an instance that can be created and run at a much lower price than normal instances.
  • Compute Engine might stop (preempt) these instances if it requires access to those resources for other tasks.
  • Preemptible VMs are excess Compute Engine capacity, so their availability varies with usage.
  • Preemptible VM are ideal to reduce costs significantly, if the apps are fault-tolerant and can withstand possible instance preemptions or interruptions
  • Preemptible instance limitations
    • Compute Engine might stop preemptible instances at any time due to system events.
    • Compute Engine always stops preemptible instances after they run for 24 hours.
    • Preemptible instances are finite GCE resources, so they might not always be available.
    • Preemptible instances can’t live migrate to a regular VM instance, or be set to automatically restart when there is a maintenance event.
    • Preemptible instances are not covered by any Service Level Agreement.
    • GCP Free Tier credits for Compute Engine don’t apply to preemptible instances.
  • Preemption process
    • Compute Engine sends a preemption notice to the instance in the form of an ACPI G2 Soft Off signal.
    • Shutdown script can be used to handle the preemption notice and complete cleanup actions before the instance stops
    • If the instance does not stop after 30 seconds, Compute Engine sends an ACPI G3 Mechanical Off signal to the operating system.
    • Compute Engine transitions the instance to a TERMINATED state.
  • Managed Instance group supports Preemptible instances.
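
A hedged example of creating a preemptible VM with a shutdown script to handle the preemption notice; the instance name, zone, machine type, and cleanup.sh script are assumptions.

    gcloud compute instances create batch-worker \
        --zone us-central1-a --machine-type e2-standard-4 \
        --preemptible \
        --metadata-from-file shutdown-script=cleanup.sh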

Shielded VM

  • Shielded VM offers verifiable integrity of the Compute Engine VM instances, to confirm the instances haven’t been compromised by boot- or kernel-level malware or rootkits.
  • Shielded VM’s verifiable integrity is achieved through the use of Secure Boot, virtual trusted platform module (vTPM)-enabled Measured Boot, and integrity monitoring.

Managing access to the instances

  • Linux instances:
    • Compute Engine uses key-based SSH authentication to establish connections to Linux virtual machine (VM) instances.
    • By default, local users with passwords aren’t configured on Linux VMs.
    • By default, Compute Engine uses custom project and/or instance metadata to configure SSH keys and to manage SSH access. If OS Login is used, metadata SSH keys are disabled.
    • Managing Instance Access Using OS Login
      • allows associating SSH keys with the Google Account or Google Workspace account and managing admin or non-admin access to the instance through IAM roles (see the gcloud sketch after this list).
      • when connecting to the instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to the Google Account or Google Workspace account.
    • Manage the SSH keys in the project or instance metadata
      • allows granting admin access to instances with metadata access that does not use OS Login.
      • when connecting to the instances using the gcloud command-line tool or SSH from the console, Compute Engine can automatically generate SSH keys and apply them to project metadata.
      • Project-wide public SSH keys
        • give users general access to a Linux instance.
        • give users access to all of the Linux instances in a project that allows project-wide public SSH keys
      • Instance metadata
        • If an instance blocks project-wide public SSH keys, a user can’t use the project-wide public SSH key to connect to the instance unless the same public SSH key is also added to instance metadata
  • On Windows Server instances:
    • Create a password for a Windows Server instance
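
The gcloud sketch referenced in the OS Login bullet above; the key file path is a placeholder.

    # Enable OS Login for all VMs in the project (disables metadata-based SSH keys)
    gcloud compute project-info add-metadata \
        --metadata enable-oslogin=TRUE

    # Associate an SSH public key with the Google Account used for OS Login
    gcloud compute os-login ssh-keys add --key-file ~/.ssh/id_rsa.pub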

Compute Engine Images

  • Compute Engine Images help provide operating system images to create boot disks and application images with preinstalled, configured software.
  • Main purpose is to create new instances or configure instance templates
  • Images can be regional  or multi-regional and can be shared and accessed across projects and organizations
  • Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images created or imported from existing systems.
    • Public images
      • provided and maintained by Google, open-source communities, and third-party vendors.
      • All Google Cloud projects have access to these images and can use them to create instances.
    • Custom images
      • are available only to the Cloud project.
      • Custom images can be created from boot disks and other images.
  • Image families
    • help with image versioning
    • help manage images in the project by grouping related images together, so that they can roll forward and roll back between specific image versions
    • always point to the latest non-deprecated version of an image in the family
  • Linux images can be exported as a tar.gz file to Cloud Storage
  • Google Cloud supports images with Container-Optimized OS, an OS image for the CE instances optimized for running Docker containers

Instance Templates

  • Instance template is a resource used to create VM instances and managed instance groups (MIGs) with identical configuration
  • Instance templates define the machine type, boot disk image or container image, labels, and other instance properties
  • Instance templates are a convenient way to save a VM instance’s configuration to create VMs or groups of VMs later
  • Instance template is a global resource that is not bound to a zone or a region. However, if a zonal resource, such as a specific disk, is specified in an instance template, the template is restricted to the zone where that resource resides.
  • Labels defined within an instance template are applied to all instances that are created from that instance template. The labels do not apply to the instance template itself.
  • An existing instance template cannot be updated or changed after it is created.
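
A minimal instance template sketch; the machine type, image, tag, and startup script are illustrative assumptions.

    gcloud compute instance-templates create web-template \
        --machine-type e2-small \
        --image-family debian-12 --image-project debian-cloud \
        --tags http-server \
        --metadata-from-file startup-script=startup.sh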

Instance Groups

Refer blog post @ Compute Engine Instance Groups

Snapshots

Refer blog post @ Compute Engine Snapshots

Startup & Shutdown Scripts

  • Startup scripts
    • can be added and executed on the VM instances to perform automated tasks every time the instance boots up.
    • can perform actions such as installing software, turning on services, performing updates, and any other tasks defined in the script.
  • Shutdown scripts
    • execute commands right before a VM instance is stopped or restarted.
    • can be useful for allowing instances time to clean up or perform tasks, such as exporting logs or syncing with other systems.
    • are executed only on a best-effort basis
    • have a limited amount of time to finish running before the instance stops i.e. 90 secs for on-demand and 30 secs for Preemptible instances
  • Startup & Shutdown scripts are executed as the root user.
  • Startup & Shutdown scripts can be provided to the VM instance using
    • local file, supported by gcloud only
    • inline using startup-script or shutdown-script option
    • Cloud Storage URL and startup-script-url or shutdown-script-url as the metadata key, provided the instance has access to the script
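
Hedged examples of the three ways listed above to provide a startup script; the bucket, file names, and instance names are placeholders.

    # Local file (supported by gcloud only)
    gcloud compute instances create vm1 --zone us-central1-a \
        --metadata-from-file startup-script=startup.sh

    # Inline script passed directly as metadata
    gcloud compute instances create vm2 --zone us-central1-a \
        --metadata startup-script='#! /bin/bash
    apt-get update'

    # Script hosted in Cloud Storage (the instance's service account needs read access)
    gcloud compute instances create vm3 --zone us-central1-a \
        --metadata startup-script-url=gs://example-bucket/startup.sh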

Machine Image

  • A machine image is a Compute Engine resource that stores all the configuration, metadata, permissions, and data from one or more disks required to create a virtual machine (VM) instance.

Sole Tenant Nodes

  • Sole-tenancy provides dedicated hosting only for the project’s VM and provides an added layer of hardware isolation
  • Sole-tenant nodes ensure that the VMs do not share host hardware with VMs from other projects
  • Each sole-tenant node maintains a one-to-one mapping to the physical server that is backing the node
  • Project has exclusive access to a sole-tenant node, which is a physical CE server and can be used to keep the VMs physically separated from VMs in other projects or to group the VMs together on the same host hardware
  • Sole-tenant nodes can help meet dedicated hardware requirements for bring your own license (BYOL) scenarios that require per-core or per-processor licenses

Projects on a multi-tenant host versus a sole-tenant node.

Preventing Accidental VM Deletion

  • Accidental VM deletion can be prevented by setting the deletionProtection property on an instance resource, especially for VMs running critical workloads that need to be protected.
  • Deletion request fails if a user attempts to delete a VM instance for which the deletionProtection flag is set
  • Only a user granted the compute.instances.create permission can reset the flag to allow the resource to be deleted.
  • Deletion prevention does not prevent the following actions:
    • Terminating an instance from within the VM (such as running the shutdown command)
    • Stopping an instance
    • Resetting an instance
    • Suspending an instance
    • Instances being removed due to fraud and abuse after being detected by Google
    • Instances being removed due to project termination
  • Deletion protection can be applied to both regular and preemptible VMs.
  • Deletion protection cannot be applied to VMs that are part of a managed instance group but can be applied to instances that are part of unmanaged instance groups.
  • Deletion prevention cannot be specified in instance templates.
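
A short sketch of setting and clearing the flag; the instance name and zone are placeholders.

    # Create a VM with deletion protection enabled
    gcloud compute instances create critical-vm --zone us-central1-a \
        --deletion-protection

    # The flag must be cleared before the VM can be deleted
    gcloud compute instances update critical-vm --zone us-central1-a \
        --no-deletion-protection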

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your company hosts multiple applications on Compute Engine instances. They want the instances to be resilient to any Host maintenance activities performed on the instance. How would you configure the instances?
    1. Set automaticRestart availability policy to true
    2. Set automaticRestart availability policy to false
    3. Set onHostMaintenance availability policy to migrate instances
    4. Set onHostMaintenance availability policy to terminate instances

References

Google Cloud Engine – Compute Engine documentation

Google Cloud Compute Engine Instance Groups

  • An instance group is a collection of virtual machine (VM) instances that can be managed as a single entity.
  • Compute Engine offers two kinds of VM instance groups
    • Managed instance groups (MIGs)
      • allow creating an app with multiple identical VMs.
      • workloads can be made scalable and highly available by taking advantage of automated MIG services, including: autoscaling, autohealing, regional (multiple zones) deployment, and automatic updating
    • Unmanaged instance groups
      • allow load balancing across a fleet of self-managed, nonidentical VMs

Managed instance groups (MIGs)

  • A MIG creates each of its managed instances based on the instance template and specified optional stateful configuration
  • Managed instance group (MIG) is ideal for scenarios
    • Stateless serving workloads, such as a website frontend
    • Stateless batch, high-performance, or high-throughput compute workloads, such as image processing from a queue
    • Stateful applications, such as databases, legacy applications, and long-running batch computations with check pointing
Use a managed instance group to build highly available deployments for stateless serving, stateful applications, or batch workloads.

Health Checking

  • Managed instance group health checks proactively signal to delete and recreate instances that become UNHEALTHY.
  • Load balancing health checks help direct traffic away from non-responsive instances and toward healthy instances; these health checks do not cause Compute Engine to recreate instances.
  • Health checks used to monitor MIGs are similar to the health checks used for load balancing, with some differences in behavior.

High Availability & Autohealing

  • Managed instance groups maintain high availability of the applications by proactively maintaining the number of instances and keeping the instances available, which means in RUNNING state.
  • Application-based autohealing improves application availability by relying on a health checking signal that detects application-specific issues such as freezing, crashing, or overloading.
  • If a health check determines that an application has failed on a VM, the MIG automatically recreates that VM instance.
  • A MIG automatically recreates an instance that is not RUNNING. However, relying only on VM state may not be sufficient; health checking should also detect when the application freezes, crashes, or runs out of memory.
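
A hedged autohealing sketch, assuming a regional MIG named web-mig already exists (see the creation example in the next section); the health check path, port, and initial delay are illustrative.

    # Health check used for autohealing (separate from any load balancing health check)
    gcloud compute health-checks create http web-hc --port 80 --request-path /healthz

    # Attach it to the MIG; the initial delay gives the application time to start before checks count
    gcloud compute instance-groups managed update web-mig \
        --region us-central1 \
        --health-check web-hc --initial-delay 300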

Regional or Zonal groups

  • Zonal MIG,
    • deploys instances to a single zone.
  • Regional MIG
    • deploys instances to multiple zones across the same region
    • provides higher availability by spreading application load across multiple zones,
    • protects the workload against zonal failure
    • offer more capacity, with a maximum of 2,000 instances per regional group.
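
For example, a regional MIG that spreads instances across zones in one region; it assumes the hypothetical web-template instance template from the earlier sketch.

    gcloud compute instance-groups managed create web-mig \
        --region us-central1 --size 3 --template web-template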

Load Balancing

  • MIGs work with load balancing services to distribute traffic across all of the instances in the group.
  • Google Cloud load balancing can use instance groups to serve traffic by adding instance groups to a target pool or to a backend service.

Scalability & Autoscaling

  • MIGs provides scalability and supports autoscaling that dynamically adds or removes instances in response to increases or decreases in load.
  • Autoscaling policy determines how the group would scale which includes scaling based on CPU utilization, Cloud Monitoring metrics, load balancing capacity,  or, for zonal MIGs, by using a queue-based workload like Pub/Sub
  • Autoscaler continuously collects usage information based on the selected utilization metric, compares actual utilization to the desired target utilization, and uses this information to determine whether the group needs to remove instances (scale in) or add instances (scale out).
  • Cool down period
    • is known as the application initialization period, i.e. the time it takes for an application on a new instance to initialize, during which its usage data may not reflect normal circumstances
  • Stabilization period
    • For scaling in, the autoscaler calculates the group’s recommended target size based on peak load over the last 10 minutes which is called the Stabilization period
    • Using the stabilization period, the autoscaler ensures that the recommended size for the managed instance group is always sufficient to serve the peak load observed during the previous 10 minutes.
  • Predictive autoscaling
    • helps to optimize your MIG for availability,
    • the autoscaler forecasts future load based on historical data and scales out a MIG in advance of predicted load, so that new instances are ready to serve when the load arrives.
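
A sketch of CPU-based autoscaling for the same hypothetical MIG; the replica bounds, target utilization, and cool-down value are illustrative.

    gcloud compute instance-groups managed set-autoscaling web-mig \
        --region us-central1 \
        --min-num-replicas 2 --max-num-replicas 10 \
        --target-cpu-utilization 0.6 \
        --cool-down-period 90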

Automatic Updating

  • MIG automatic updater supports a flexible range of rollout scenarios to deploy new versions of the software to instances in the MIG such as rolling updates and canary updates.
  • Speed and scope of deployment can be controlled as well as the level of disruption to the service.

Stateful Workloads Support

  • MIGs can be used for building highly available deployments and automating operation of applications with stateful data or configuration, such as databases, DNS servers, legacy monolith applications, or long-running batch computations with checkpointing.
  • Uptime and resiliency of such applications can be improved with autohealing, controlled updates, and multi-zone deployments, while preserving each instance’s unique state, including customizable instance name, persistent disks, and metadata.
  • Stateful MIGs preserve each instance’s unique state (instance name, attached persistent disks, and metadata) on machine restart, recreation, auto-healing, and update events.

Preemptible Instances Groups

  • MIG supports preemptible VM instances, which can help reduce cost.
  • Preemptible instances last up to 24 hours and are preempted gracefully and the application has 30 seconds to exit correctly.
  • Preemptible instances can be preempted at any time, but autohealing will bring the instances back when preemptible capacity becomes available again.

Containers

  • MIG supports the deployment of containers to instances running Container-Optimized OS (which includes Docker), if the instance template used specifies a container image.

Network and Subnet

  • Instance template, used with MIG, defines the VPC network and subnet that member instances use.
    • For auto mode VPC networks, the subnet can be omitted; this instructs GCP to select the automatically-created subnet in the region specified in the template.
  • If VPC network is omitted, GCP attempts to use the VPC network named default.

Unmanaged instance groups

  • Unmanaged instance groups can contain heterogeneous instances that can be arbitrarily added and removed from the group.
  • Unmanaged instance groups do not offer autoscaling, autohealing, rolling update support, multi-zone support, or the use of instance templates and are not a good fit for deploying highly available and scalable workloads.
  • Use unmanaged instance groups if load balancing needs to be added to a group of heterogeneous instances, or if the instances need to be self-managed.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your company’s test suite is a custom C++ application that runs tests throughout each day on Linux virtual machines. The full
    test suite takes several hours to complete, running on a limited number of on-premises servers reserved for testing. Your company
    wants to move the testing infrastructure to the cloud, to reduce the amount of time it takes to fully test a change to the system,
    while changing the tests as little as possible. Which cloud infrastructure should you recommend?

    1. Google Compute Engine unmanaged instance groups and Network Load Balancer.
    2. Google Compute Engine managed instance groups with auto-scaling.
    3. Google Cloud Dataproc to run Apache Hadoop jobs to process each test.
    4. Google App Engine with Google Stackdriver for logging.
  2. Your company has a set of compute engine instances that would be hosting production-based applications. These applications
    would be running 24×7 throughout the year. You need to implement the cost-effective, scalable and high availability solution even
    if a zone fails. How would you design the solution?

    1. Use Managed instance groups with preemptible instances across multiple zones
    2. Use Managed instance groups across multiple zones
    3. Use managed instance groups with instances in a single zone
    4. Use Unmanaged instance groups across multiple zones

Google Cloud Resource Manager

Google Cloud Platform – Resource Manager helps manage resource containers, such as organizations, folders, and projects, that allow you to group and hierarchically organize other GCP resources.

Resource Hierarchy

Google Cloud Resource Hierarchy

Organizations

  • Organization resource is the root node in the Google Cloud resource hierarchy and is the hierarchical supernode and ancestor of project resources and folders.
  • Organization is at the top of the hierarchy and does not have a parent.
  • Organization provides central visibility and control over every resource that belongs to an organization
  • With an Organization resource, projects belong to the organization instead of the employee who created the project, which means that the projects are no longer deleted when an employee leaves the company; instead, they will follow the organization’s lifecycle on Google Cloud.
  • Organization administrators have central control of all resources and can view and manage all of the company’s projects
  • IAM access control policies applied to the Organization resource apply throughout the hierarchy on all resources in the organization.
  • Roles granted at the organization level are inherited by all projects and folders under the Organization resource
  • Organization is not applicable for personal (e.g. Gmail) accounts
  • Google Workspace or Cloud Identity account represents a company and is a prerequisite to having access to the Organization resource. It provides identity management, recovery mechanism, ownership, and lifecycle management
  •  Google Workspace super admin is the individual responsible for domain ownership verification and the contact in cases of recovery.

Folders

  • Folders are an additional optional grouping mechanism on top of projects and provide isolation boundaries between projects
  • Organization resource is a prerequisite to use folders.
  • Folders can be used to model different legal entities, departments, teams, and environments within a company
  • Folders allow delegation of administration rights as well as control or limit access to resources within the folder

Projects

  • Project resource is the base-level organizing entity
  • Organizations and folders may contain multiple projects
  • Projects are a core organizational component of GCP
  • A project is required to use Google Cloud and forms the basis for creating, enabling, and using all Google Cloud services, managing APIs, enabling billing, adding and removing collaborators, and managing permissions.
  • Each project has a name and a unique project ID across Google Cloud
  • Project ID cannot be reused even if the project is deleted
  • Each project is associated with a billing account.
  • Multiple projects can have their usage billed to the same billing account

IAM Policy Inheritance

  • Identity and Access Management helps control who (users) has what access (roles) to which resources by setting IAM policies on the resources.
  • Resources inherit the policies of the parent node i.e. policy set at the Organization level is inherited by all its child folders and projects, and if a policy set at the project level, it is inherited by all its child resources.
  • A more permissive parent policy always overrules a more restrictive child policy, i.e. there is no way to explicitly remove a permission for a lower-level resource that is granted at a higher level in the resource hierarchy.
  • The effective policy for a resource is the union of the policy set on the resource and the policy inherited from its ancestors.
  • Permission inheritance is transitive i.e. resources inherit policies from the project, which inherit policies from the organization.
  • IAM policy hierarchy follows the same path as the Google Cloud resource hierarchy i.e. if the resource hierarchy is changed for e.g. moving a project from one folder to the other, the policy hierarchy changes as well.

Organization Policy Service

  • Organization Policy Service gives centralized and programmatic control over the organization’s cloud resources
  • Organization Policy Service benefits
    • Centralize control to configure restrictions on how the organization’s resources can be used.
    • Define and establish guardrails for the development teams to stay within compliance boundaries.
    • Help project owners and their teams move quickly without the worry of breaking compliance.
  • An organization policy is set on a resource hierarchy node, and all descendants of that node inherit the organization policy by default, i.e. an organization policy set at the root organization node passes the defined restriction down through all descendant folders, projects, and service resources.

Restricting Identities by Domain

  • Resource Manager provides a domain restriction constraint that can be used in organization policies to limit resource sharing based on domain.
  • This constraint allows restricting the set of identities allowed to be used in Identity and Access Management policies
  • Organization policies can use this constraint to limit resource sharing to a specified set of one or more Google Workspace domains, and exceptions can be granted on a per-folder or per-project basis.
  • Domain restriction constraint is not retroactive. Once a domain restriction is set, this limitation will apply to IAM policy changes made from that point forward, and not to any previous changes.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Google Cloud Platform resources are managed hierarchically using organization, folders, and projects. When Cloud Identity and Access Management (IAM) policies exist at these different levels, what is the
    effective policy at a particular node of the hierarchy?

    1. The effective policy is determined only by the policy set at the node
    2. The effective policy is the union of the policy set at the node and policies inherited from its ancestors
    3. The effective policy is the policy set at the node and restricted by the policies of its ancestors
    4. The effective policy is the intersection of the policy set at the node and policies inherited from its ancestors
  2. An Organization has setup an IAM policy at the organization level, the folder level, the project level, and on the resource level. They want to understand what policy takes effect on the entity. What would be the
    correct option?

    1. Effective policy for a resource is the Intersection of the policy set on the resource and the policy inherited from its ancestors
    2. Effective policy for a resource is the policy inherited from its ancestors overriding the policy defined on the resource
    3. Effective policy for a resource is the union of the policy set on the resource and the policy inherited from its ancestors
    4. Effective policy for a resource is the policy defined overriding the policy inherited from its ancestors
  3. Several employees at your company have been creating projects with Cloud Platform and paying for it with their personal credit
    cards, which the company reimburses. The company wants to centralize all these projects under a single, new billing account.
    What should you do?

    1. Contact [email protected] with your bank account details and request a corporate billing account for your company.
    2. Create a ticket with Google Support and wait for their call to share your credit card details over the phone.
    3. In the Google Platform Console, go to the Resource Manager and move all projects to the root Organization.
    4. In the Google Cloud Platform Console, create a new billing account and set up a payment method.

Reference

Google Cloud Platform – Resource Manager

Google Cloud Storage – GCS

  • Google Cloud Storage is a service for storing unstructured data i.e. objects/blobs in Google Cloud.
  • Google Cloud Storage provides a RESTful service for storing and accessing the data on Google’s infrastructure.
  • GCS combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.

Google Cloud Storage Components

Buckets

  • Buckets are the logical containers for objects
  • All buckets are associated with a project and projects can be grouped under an organization.
  • Bucket name considerations
    • reside in a single Cloud Storage namespace.
    • must be unique.
    • are publicly visible.
    • can only be assigned during creation and cannot be changed.
    • can be used in a DNS record as part of a CNAME or A redirect.
  • Bucket name requirements
    • must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Spaces are not allowed. Names containing dots require verification.
    • must start and end with a number or letter.
    • must contain 3-63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
    • cannot be represented as an IP address for e.g., 192.168.5.4
    • cannot begin with the goog prefix.
    • cannot contain google or close misspellings, such as g00gle.
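
For example, creating a bucket and uploading an object with gsutil; the project ID and the (globally unique) bucket name are placeholders.

    gsutil mb -p my-project -c standard -l US gs://example-unique-bucket/
    gsutil cp report.csv gs://example-unique-bucket/reports/report.csv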

Objects

  • An object is a piece of data consisting of a file of any format.
  • Objects are stored in containers called buckets.
  • Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime.
  • Objects can be overwritten and overwrites are Atomic
  • Object names reside in a flat namespace within a bucket, which means
    • Different buckets can have objects with the same name.
    • Objects do not reside within subdirectories in a bucket.
  • Existing objects cannot be directly renamed and need to be copied

Object Metadata

  • Objects stored in Cloud Storage have metadata associated with them
  • Metadata exists as key:value pairs and identifies properties of the object
  • Mutability of metadata varies: some metadata can be set at the time the object is created (e.g. Content-Type, Cache-Control), while other metadata can be edited at any time

Composite Objects

  • Composite objects help to make appends to an existing object, as well as for recreating objects uploaded as multiple components in parallel.
  • Compose operation works with objects that
    • have the same storage class.
    • are stored in the same Cloud Storage bucket.
    • do NOT use customer-managed encryption keys.
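
A hedged compose example: two previously uploaded parts in the same bucket and storage class are stitched into one object, then removed; all object names are placeholders.

    gsutil compose gs://example-unique-bucket/part-1 gs://example-unique-bucket/part-2 \
        gs://example-unique-bucket/combined-object
    gsutil rm gs://example-unique-bucket/part-1 gs://example-unique-bucket/part-2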

Cloud Storage Locations

  • GCS buckets need to be created in a location for storing the object data.
  • GCS support different location types
    • regional
      • A region is a specific geographic place, such as London.
      • helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
    • dual-region
      • is a specific pair of regions, such as Finland and the Netherlands.
      • provides higher availability that comes with being geo-redundant.
    • multi-region
      • is a large geographic area, such as the United States, that contains two or more geographic places.
      • allows serving content to data consumers that are outside of the Google network and distributed across large geographic areas, or
      • provides higher availability that comes with being geo-redundant.
  • Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.

Cloud Storage Classes

Refer blog Google Cloud Storage – Storage Classes

Cloud Storage Security

Refer blog Google Cloud Storage – Security

GCS Upload and Download

  • GCS supports upload and storage of any MIME type of data up to 5 TB
  • Uploaded object consists of the data along with any associated metadata
  • GCS supports multiple upload types
    • Simple upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and if there are no object metadata to send as part of the request.
    • Multipart upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and there is a need to include object metadata as part of the request.
    • Resumable upload – ideal for large files with a need for more reliable transfer. Supports streaming transfers, which is a type of resumable upload that allows uploading an object of unknown size.

Resumable Upload

  • Resumable uploads are the recommended method for uploading large files because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
  • Resumable upload allows resumption of data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data
  • Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading.
  • Resumable upload mechanism supports transfers where the file size is not known in advance or for streaming transfer.
  • Resumable upload must be completed within a week of being initiated.

Streaming Transfers

  • Streaming transfers allow streaming data to and from the Cloud Storage account without requiring that the data first be saved to a file.
  • Streaming uploads are useful when uploading data whose final size is not known at the start of the upload, such as when generating the upload data from a process, or when compressing an object on the fly.
  • Streaming downloads are useful to download data from Cloud Storage into a process.

Parallel Composite Uploads

  • Parallel composite uploads divide a file into up to 32 chunks, which are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted
  • Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in the bucket is a composite object, which only has a crc32c hash and not an MD5 hash
  • As a result, crcmod needs to be used to perform integrity checks when downloading the object with gsutil or other Python applications.
  • Parallel composite uploads should only be performed if the following apply:
    • the bucket does not use default customer-managed encryption keys, because the compose operation does not support source objects encrypted in this way.
    • the uploaded objects do not need to have an MD5 hash.
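
A sketch of opting in to parallel composite uploads via a gsutil option; the 150M threshold and the file and bucket names are illustrative.

    gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
        cp large-backup.tar gs://example-unique-bucket/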

Object Versioning

  • Object Versioning retains a noncurrent object version when the live object version gets replaced, overwritten, or deleted
  • Object Versioning is disabled by default.
  • Object Versioning prevents accidental overwrites and deletion
  • Object Versioning causes deleted or overwritten objects to be archived instead of being deleted
  • Object Versioning increases storage costs as it maintains the current and noncurrent versions of the object, which can be partially mitigated by lifecycle management
  • Noncurrent versions retain the name of the object but are uniquely identified by their generation number.
  • Noncurrent versions only appear in requests that explicitly call for object versions to be included.
  • Objects versions can be permanently deleted by including the generation number or configuring Object Lifecycle Management to delete older object versions
  • If Object Versioning is disabled, noncurrent versions are no longer created for new overwrites or deletions, but existing noncurrent versions are not deleted
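
For example, enabling Object Versioning on a bucket and listing all versions (with generation numbers) of an object; the names are placeholders.

    gsutil versioning set on gs://example-unique-bucket
    gsutil ls -a gs://example-unique-bucket/reports/report.csv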

Object Lifecycle Management

  • Object Lifecycle Management sets Time To Live (TTL) on an object and helps configure transition or expiration of the objects based on specified rules for e.g.  SetStorageClass to downgrade the storage class, delete to expire noncurrent or archived objects
  • Lifecycle management configuration can be applied to a bucket, which contains a set of rules applied to current and future objects in the bucket
  • Lifecycle management rules precedence
    • Delete action takes precedence over any SetStorageClass action.
    • With multiple SetStorageClass actions, the action that switches the object to the storage class with the lowest at-rest storage pricing takes precedence.
  • Cloud Storage doesn’t validate the correctness of the storage class transition
  • Lifecycle actions can be tracked using Cloud Storage usage logs or using Pub/Sub Notifications for Cloud Storage
  • Lifecycle management is defined using rules, conditions, and actions; an action is applied if (see the sample configuration after this list)
    • any of the rules is met, when there are multiple rules (OR operation)
    • all the conditions in a rule are met (AND operation)
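
The sample configuration referenced above is a sketch only; the 30-day age and the three-newer-versions condition are illustrative values, not recommendations.

    # lifecycle.json
    {
      "rule": [
        {
          "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
          "condition": {"age": 30}
        },
        {
          "action": {"type": "Delete"},
          "condition": {"numNewerVersions": 3}
        }
      ]
    }

    # Apply the configuration to the bucket
    gsutil lifecycle set lifecycle.json gs://example-unique-bucket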

GCS Object Lifecycle Management

Object Lifecycle Behavior

  • Cloud Storage performs the action asynchronously, so there can be a lag between when the conditions are satisfied and the action is taken
  • Updates to lifecycle configuration may take up to 24 hours to take effect.
  • Delete action will not take effect on an object while the object either has an object hold placed on it or an unfulfilled retention policy.
  • SetStorageClass action is not affected by the existence of object holds or retention policies.
  • SetStorageClass does not rewrite an object and hence you are not charged for retrieval and deletion operations.

GCS Requester Pays

  • Project owner of the resource is billed normally for the access which includes operation charges, network charges, and data retrieval charges
  • However, if the requester provides a billing project with their request, the requester’s project is billed instead.
  • Requester Pays requires the requester to include a billing project in their requests, thus billing the requester’s project
  • Enabling Requester Pays is useful, e.g. if you have a lot of data to share but you don’t want to be charged for others’ access to that data (see the example below).
  • Requester Pays does not cover the storage charges and early deletion charges
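
A sketch of enabling Requester Pays and accessing the bucket as a requester; the bucket and billing project names are illustrative:

    gsutil requesterpays set on gs://my-bucket                 # enable Requester Pays on the bucket
    gsutil -u my-billing-project cp gs://my-bucket/data.csv .  # requester supplies a billing project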

CORS

  • Cloud Storage allows setting the CORS configuration at the bucket level only, as shown in the sketch below.
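
A minimal sketch of a bucket-level CORS configuration applied with gsutil; the origin and values are illustrative:

    # cors.json
    [
      {
        "origin": ["https://example.com"],
        "method": ["GET", "HEAD"],
        "responseHeader": ["Content-Type"],
        "maxAgeSeconds": 3600
      }
    ]

    gsutil cors set cors.json gs://my-bucket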

Cloud Storage Tracking Updates

  • Pub/Sub notifications
    • sends information about changes to objects in the buckets to Pub/Sub, where the information is added to a specified Pub/Sub topic in the form of messages.
    • Each notification contains information describing both the event that triggered it and the object that changed (see the gsutil example after this list).
  • Audit Logs
    • Google Cloud services write audit logs to help you answer the questions, “Who did what, where, and when?”
    • Cloud projects contain only the audit logs for resources that are directly within the project.
    • Cloud Audit Logs generates the following audit logs for operations in Cloud Storage:
      • Admin Activity logs: Entries for operations that modify the configuration or metadata of a project, bucket, or object.
      • Data Access logs: Entries for operations that modify objects or read a project, bucket, or object.
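
A sketch of wiring bucket change notifications to Pub/Sub with gsutil; the topic, event type, and bucket name are illustrative:

    gsutil notification create -t storage-updates -f json -e OBJECT_FINALIZE gs://my-bucket
    gsutil notification list gs://my-bucket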

Data Consistency

  • Cloud Storage operations are primarily strongly consistent, with a few exceptions that are eventually consistent
  • Cloud Storage provides strong global consistency for the following operations, including both data and metadata:
    • Read-after-write
    • Read-after-metadata-update
    • Read-after-delete
    • Bucket listing
    • Object listing
  • Cloud Storage provides eventual consistency for the following operations
    • Granting access to or revoking access from resources.

gsutil

  •  gsutil tool is the standard tool for small- to medium-sized transfers (less than 1 TB) over a typical enterprise-scale network, from a private data center to Google Cloud.
  • gsutil provides all the basic features needed to manage the Cloud Storage instances, including copying the data to and from the local file system and Cloud Storage.
  • gsutil can also move, rename and remove objects and perform real-time incremental syncs, like rsync, to a Cloud Storage bucket.
  • gsutil is especially useful in the following scenarios:
    • as-needed transfers or during command-line sessions by your users.
    • transferring only a few files or very large files, or both.
    • consuming the output of a program (streaming output to Cloud Storage)
    • watching a directory with a moderate number of files and syncing any updates with very low latency (see the rsync example after this list).
  • gsutil provides the following features
    • Parallel multi-threaded transfers with gsutil -m, increasing transfer speeds.
    • Composite transfers for a single large file to break them into smaller chunks to increase transfer speed. Chunks are transferred and validated in parallel, sending all data to Google. Once the chunks arrive at Google, they are combined (referred to as compositing) to form a single object
  • gsutil perfdiag can help gather stats to provide diagnostic output to the Cloud Storage team
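
A sketch of a parallel, incremental sync to a bucket; the local path and bucket name are illustrative:

    gsutil -m rsync -r -d ./local-dir gs://my-bucket/backup   # -d also deletes remote objects missing locally
    gsutil -m cp -r ./local-dir gs://my-bucket/backup         # one-shot parallel recursive copy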

Best Practices

  • Use IAM over ACL whenever possible as IAM provides an audit trail
  • Cloud Storage auto-scaling performs well if requests ramp up gradually rather than having a sudden spike.
    • If the request rate is less than 1000 write requests per second or 5000 read requests per second, then no ramp-up is needed.
    • If the request rate is expected to go over these thresholds, start with a request rate below or near the thresholds and then double the request rate no faster than every 20 minutes.
  • Avoid sequential naming bottlenecks: Cloud Storage distributes uploads across shards based on the file name/path, so sequentially named objects land on the same shard, overloading it and degrading performance
  • Use Truncated exponential backoff as a standard error handling strategy
  • For multiple smaller files, use gsutil with -m option that performs a batched, parallel, multi-threaded/multi-processing to upload which can significantly increase the performance of an upload
  • For large objects downloads, use gsutil with HTTP Range GET requests to perform “sliced” downloads in parallel
  • To upload large files efficiently, use parallel composite upload with object composition to perform uploads in parallel for large, local files. It splits a large file into component pieces, uploads them in parallel, and then recomposes them once they’re in the cloud (and deletes the temporary component objects once composition is complete). See the tuning sketch below.
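
A hedged sketch of gsutil tuning for large transfers; the thresholds are illustrative and the parallel/sliced options are [GSUtil] boto configuration settings passed here as -o overrides:

    # Parallel composite uploads for large local files
    gsutil -o GSUtil:parallel_composite_upload_threshold=150M -m cp ./large/* gs://my-bucket/

    # Sliced (parallel range-GET) downloads for large objects
    gsutil -o GSUtil:sliced_object_download_threshold=150M \
           -o GSUtil:sliced_object_download_max_components=8 \
           cp gs://my-bucket/bigfile .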

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have a collection of media files over 50GB each that you need to migrate to Google Cloud Storage. The files are in your on-premises data center. What migration method can you use to help speed up the transfer process?
    1. Use multi-threaded uploads using the -m option.
    2. Use parallel uploads to break the file into smaller chunks then transfer it simultaneously.
    3. Use the Cloud Transfer Service to transfer.
    4. Start a recursive upload.
  2. Your company has decided to store data files in Cloud Storage. The data would be hosted in a regional bucket to start with. You need to configure Cloud Storage lifecycle rule to move the data for archival after 30 days and delete the data after a year. Which two actions should you take?
    1. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Coldline”, and Action: “Delete”.
    2. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Coldline”, and Action: “Delete”.
    3. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Nearline”, and Action: “Delete”.
    4. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Nearline”, and Action: “Delete”.

References

Google Cloud Platform – Cloud Storage

Google Cloud Storage – Storage Classes

Google Cloud Storage – Storage Classes

  • Storage class affects the object’s availability and pricing model
  • Storage class of an existing object can be changed either by rewriting the object or by using Object Lifecycle Management (see the gsutil commands after this list).
  • Bucket’s default storage class is set to Standard Storage, if not specified
  • A default storage class for the bucket can be specified so when a bucket is created, all the objects added to the bucket will inherit this storage class unless explicitly set otherwise.
  • Changing the default storage class of a bucket does not affect any of the objects that already exist in the bucket.
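
A sketch of setting a bucket’s default storage class and changing an existing object’s class with gsutil; the bucket, object, and classes are illustrative:

    gsutil defstorageclass set NEARLINE gs://my-bucket      # default class for newly written objects
    gsutil defstorageclass get gs://my-bucket
    gsutil rewrite -s COLDLINE gs://my-bucket/archive.tar   # rewrite an existing object into a new class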

Storage Classes Options

  • All storage classes provide the following
    • Unlimited storage with no minimum object size.
    • Worldwide accessibility and worldwide storage locations.
    • Low latency (time to the first byte typically tens of milliseconds).
    • High durability (99.999999999% annual durability).
    • Geo-redundancy, if the data is stored in a multi-region or dual-region.
    • A uniform experience with Cloud Storage features, security, tools, and APIs

Standard Storage

  • Standard Storage is best for data that is frequently accessed (hot data) and/or stored for only brief periods of time.
  • for regional locations
    • is appropriate for storing data in the same location as the resources that use it, such as GKE clusters or GCE instances, which helps maximize performance and can reduce network charges.
    • Availability SLA – 99.99%
  • for dual-region,
    • provides optimized performance when accessing Google Cloud products that are located in one of the associated regions,
    • provides improved availability that comes from storing data in geographically separate locations.
    • Availability SLA > 99.99%
  • for multi-region
    • ideal for storing data that is accessed around the world, such as serving website content, streaming videos, executing interactive workloads, or serving data supporting mobile and gaming applications.
    • Availability SLA > 99.99%

Nearline Storage

  • Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data (warm data)
  • Nearline Storage is a better choice than Standard Storage in scenarios where slightly lower availability, a 30-day minimum storage duration, and data access costs are acceptable trade-offs for lowered at-rest storage cost
  • Nearline Storage is ideal for data you plan to read or modify on average once per month or less. for e.g., if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline Storage is a great choice.
  • Nearline Storage is also appropriate for data backup, long-tail multimedia content, and data archiving.

Coldline Storage

  • Coldline Storage provides a very-low-cost, highly durable storage service for storing infrequently accessed data (cold data)
  • Coldline Storage is a better choice than Standard Storage or Nearline Storage in scenarios where slightly lower availability, a 90-day minimum storage duration, and higher costs for data access are acceptable trade-offs for lowered at-rest storage costs.
  • Coldline Storage is ideal for data you plan to read or modify at most once a quarter.

Archive Storage

  • Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. (coldest data)
  • Archive Storage has no availability SLA, though the typical availability is comparable to Nearline Storage and Coldline Storage.
  • Data is available within milliseconds, not hours or days.
  • Archive Storage has higher costs for data access and operations, as well as a 365-day minimum storage duration.
  • Archive Storage is the best choice for data that you plan to access less than once a year. for e.g. cold data storage for archival and disaster recovery

Google Cloud Storage - Storage Classes

Legacy Storage Classes

  • Google Cloud Storage provided additional storage classes which have been phased out
    • Multi-Regional Storage
      • Equivalent to Standard Storage, except Multi-Regional Storage can only be used for objects stored in multi-regions or dual-regions.
    • Regional Storage
      • Equivalent to Standard Storage, except Regional Storage can only be used for objects stored in regions.
    • Durable Reduced Availability (DRA) Storage:
      • Similar to Standard Storage except:
        • DRA has higher pricing for operations.
        • DRA has lower performance, particularly in terms of availability (DRA has a 99% availability SLA).

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You’ve created a bucket to store some data archives for compliance. The data isn’t likely to need to be viewed. However, you need to store it for at least 7 years. What is the best default storage class?
    1. Multi-regional
    2. Coldline
    3. Regional
    4. Nearline

References

HashiCorp Certified Terraform Associate Learning Path

If you are working in a multi-cloud environment and focusing on automation, you have surely been using Terraform or considered it at some point. I have been using Terraform for over two years now to provision infrastructure on AWS, GCP and AliCloud, right through development to production; it has been a wonderful DevOps journey, and it was good to validate those Terraform skills through the Terraform Associate certification.

The Terraform Associate certification is aimed at Cloud Engineers specializing in operations, IT, or development who know the basic concepts and skills associated with open source HashiCorp Terraform.

HashiCorp Certified Terraform Associate Exam Summary

  • HashiCorp Certified Terraform Associate exam focuses on Terraform as an Infrastructure as Code tool
  • HashiCorp Certified Terraform Associate exam has 57 questions with a time limit of 60 minutes
  • The exam has multiple-answer, multiple-choice, fill-in-the-blank and True/False questions
  • Questions and answer options are pretty short; if you have experience with Terraform they are pretty easy and the time is more than sufficient.

HashiCorp Certified Terraform Associate Exam Topic Summary

Refer Terraform Cheat Sheet for details

Understand Infrastructure as Code (IaC) concepts

  • Explain what IaC is
    • Infrastructure is described using a high-level configuration syntax
    • IaC allows Infrastructure to be versioned and treated as you would any other code.
    • Infrastructure can be shared and re-used.
  • Describe advantages of IaC patterns
    • makes Infrastructure more reliable
    • makes Infrastructure more manageable
    • makes Infrastructure more automated and less error prone

Understand Terraform’s purpose (vs other IaC)

  • Explain multi-cloud and provider-agnostic benefits
    • using multi-cloud setup increases fault tolerance and reduces dependency on a single Cloud
    • Terraform provides a cloud-agnostic framework and allows a single configuration to be used to manage multiple providers, and to even handle cross-cloud dependencies.
    • Terraform simplifies management and orchestration, helping operators build large-scale multi-cloud infrastructures.
  • Explain the benefits of state
    • State is a necessary requirement for Terraform to function.
    • Terraform requires some sort of database to map Terraform config to the real world.
    • Terraform uses its own state structure for mapping configuration to resources in the real world
    • Terraform state helps
      • track metadata such as resource dependencies.
      • provides performance as it stores a cache of the attribute values for all resources in the state
      • aids syncing when using in team with multiple users

Understand Terraform basics

  • Handle Terraform and provider installation and versioning
    • Providers provide an abstraction above the upstream API and are responsible for understanding API interactions and exposing resources.
    • Terraform configurations must declare which providers they require, so that Terraform can install and use them
    • Provider requirements are declared in a required_providers block (see the sketch after this list).
  • Describe plugin based architecture
    • Terraform relies on plugins called “providers” to interact with remote systems.
  • Demonstrate using multiple providers
    • supports multiple provider instances using alias, e.g. multiple aws providers with different regions
  • Describe how Terraform finds and fetches providers
    • Terraform finds and installs providers when initializing a working directory. It can automatically download providers from a Terraform registry, or load them from a local mirror or cache.
    • Each Terraform module must declare which providers it requires, so that Terraform can install and use them.
  • Explain when to use and not use provisioners and when to use local-exec or remote-exec
    • Terraform provides local-exec and remote-exec to execute tasks not provided by Terraform
      • local-exec executes code on the machine running Terraform
      • remote-exec executes on the provisioned resource and supports ssh and winrm
    • Provisioners should only be used as a last resort.
    • are defined within the resource block.
    • support two types – creation-time and destroy-time provisioners
      • if a creation-time provisioner fails, the resource is marked as tainted by default (it will be destroyed and re-created on the next apply)
      • the behavior can be overridden by setting on_failure to continue, which ignores the error and continues
      • if a destroy-time provisioner fails, the resource is not removed
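
A minimal sketch, assuming the hashicorp/google provider; the version constraint, regions, and bucket name are illustrative:

    terraform {
      required_providers {
        google = {
          source  = "hashicorp/google"
          version = "~> 4.0"          # version constraint is illustrative
        }
      }
    }

    provider "google" {
      region = "us-central1"
    }

    provider "google" {
      alias  = "europe"               # second instance of the same provider
      region = "europe-west1"
    }

    # A resource can select the aliased provider instance explicitly
    resource "google_storage_bucket" "eu_logs" {
      provider = google.europe
      name     = "my-eu-logs-bucket"  # bucket name is illustrative
      location = "EU"
    }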

Use the Terraform CLI (outside of core workflow)

  • Given a scenario: choose when to use terraform fmt to format code
    • terraform fmt formats code into a canonical style; it aligns spacing and the = signs
  • Given a scenario: choose when to use terraform taint to taint Terraform resources
    • terraform taint marks a Terraform-managed resource as tainted, forcing it to be destroyed and recreated on the next apply.
    • will not modify infrastructure, but does modify the state file in order to mark a resource as tainted.
    • Infrastructure and state are changed in next apply.
    • can be used to taint a resource within a module
  • Given a scenario: choose when to use terraform import to import existing infrastructure into your Terraform state
    • terraform import helps import already-existing external resources, not managed by Terraform, into Terraform state and allow it to manage those resources
    • Terraform is not able to auto-generate configurations for those imported modules, for now, and requires you to first write the resource definition in Terraform and then import this resource
  • Given a scenario: choose when to use terraform workspace to create workspaces
    • Terraform workspace helps manage multiple distinct sets of infrastructure resources or environments with the same code.
    • state files for each workspace are stored in the directory terraform.tfstate.d
    • terraform workspace new dev creates a new workspace with name dev and switches to it as well
    • does not provide strong separation as it uses the same backend
  • Given a scenario: choose when to use terraform state to view Terraform state
    • state helps keep track of the infrastructure Terraform manages
    • stored locally in the terraform.tfstate
    • recommended not to edit the state manually
    • Use terraform state command
      • mv – to move/rename modules
      • rm – to safely remove resource from the state. (destroy/retain like)
      • pull – to observe current remote state
      • list & show – to write/debug modules
  • Given a scenario: choose when to enable verbose logging and what the outcome/value is
    • debugging can be controlled using TF_LOG, which can be set to the levels TRACE, DEBUG, INFO, WARN or ERROR, with TRACE being the most verbose.
    • the log path can be controlled with TF_LOG_PATH; TF_LOG must also be set for it to take effect (see the command sketch after this list).
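
A short sketch of the CLI commands covered above; the resource addresses, workspace and file names are illustrative:

    terraform fmt -recursive                               # format all configuration files
    terraform taint google_compute_instance.web            # force re-creation on the next apply
    terraform import google_storage_bucket.logs my-logs    # bring an existing bucket under management
    terraform workspace new dev                            # create the dev workspace and switch to it
    terraform state list                                   # inspect resources tracked in the state
    TF_LOG=DEBUG TF_LOG_PATH=./terraform.log terraform plan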

Interact with Terraform modules

  • Contrast module source options
    • Terraform Module Registry allows you to browse, filter and search for modules
  • Interact with module inputs and outputs
    • Input variables serve as parameters for a Terraform module, allowing aspects of the module to be customized without altering the module’s own source code, and allowing modules to be shared between different configurations.
    • Resources defined in a module are encapsulated, so the calling module cannot access their attributes directly.
    • A child module can declare output values to selectively export certain values, which the calling module accesses as module.<MODULE_NAME>.<OUTPUT_NAME> (see the module sketch after this list)
  • Describe variable scope within modules/child modules
    • Modules are called from within other modules using module blocks
    • All modules require a source argument, which is a meta-argument defined by Terraform
    • To call a module means to include the contents of that module into the configuration with specific values for its input variables.
  • Discover modules from the public Terraform Module Registry
    • Terraform Module Registry allows you to browse, filter and search for modules
  • Defining module version
    • must be on GitHub and must be a public repo, if using public registry.
    • must be named terraform-<PROVIDER>-<NAME>, where <NAME> reflects the type of infrastructure the module manages and <PROVIDER> is the main provider where it creates that infrastructure. for e.g. terraform-google-vault or terraform-aws-ec2-instance.
    • must maintain x.y.z tags for releases to identify module versions; tags can optionally be prefixed with a v, for example v1.0.4 and 0.9.2. Tags that don’t look like version numbers are ignored.
    • must maintain a Standard module structure, which allows the registry to inspect the module and generate documentation, track resource usage, parse submodules and examples, and more.
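
A minimal sketch of calling a public registry module with a pinned version and reading one of its outputs; the module source, version, inputs, and output names are illustrative:

    variable "project_id" {
      type = string
    }

    module "network" {
      source  = "terraform-google-modules/network/google"   # registry source, illustrative
      version = "~> 7.0"

      project_id   = var.project_id
      network_name = "shared-vpc"
    }

    output "network_self_link" {
      value = module.network.network_self_link   # child module output read by the caller
    }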

Navigate Terraform workflow

  • Describe Terraform workflow ( Write -> Plan -> Create )
    • Core Terraform workflow has three steps:
      • Write – Author infrastructure as code.
      • Plan – Preview changes before applying.
      • Apply – Provision reproducible infrastructure.
  • Initialize a Terraform working directory terraform init
    • initializes a working directory containing Terraform configuration files.
    • performs backend initialization, modules and plugins installation.
    • plugins are downloaded in the sub-directory of the present working directory at the path of .terraform/plugins
    • does not delete the existing configuration or state
  • Validate a Terraform configuration terraform validate
    • validates the configuration files in a directory, referring only to the configuration and not accessing any remote services such as remote state, provider APIs, etc.
    • verifies whether a configuration is syntactically valid and internally consistent, regardless of any provided variables or existing state.
    • useful for general verification of reusable modules, including the correctness of attribute names and value types.
  • Generate and review an execution plan for Terraform terraform plan
    • terraform plan creates an execution plan as it traverses each vertex and requests each provider using parallelism (see the command sketch after this list)
    • calculates the difference between the last-known state and the current state and presents this difference as the output of the terraform plan operation to user in their terminal
    • does not modify the infrastructure or state.
    • allows a user to see which actions Terraform will perform prior to making any changes to reach the desired state
    • performs refresh for each resource and might hit rate limiting issues as it calls provider APIs
    • the refresh of all resources can be disabled or limited using
      • -refresh=false, or
      • -target=xxxx, or
      • breaking resources into different directories.
  • Execute changes to infrastructure with Terraform terraform apply
    • will always ask for confirmation before executing unless passed the -auto-approve flag.
    • if a resource successfully creates but fails during provisioning, Terraform will error and mark the resource as “tainted”. Terraform does not roll back the changes
  • Destroy Terraform managed infrastructure terraform destroy
    • will always ask for confirmation before executing unless passed the -auto-approve flag.
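
A sketch of the core workflow commands chained together; the plan file name is illustrative:

    terraform init                  # backend, module, and provider plugin initialization
    terraform validate              # syntax and internal consistency check
    terraform plan -out=tfplan      # preview and save the execution plan
    terraform apply tfplan          # apply the saved plan (no extra confirmation prompt)
    terraform destroy               # tear the infrastructure down (asks for confirmation)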

Implement and maintain state

  • Describe default local backend
    • A “backend” in Terraform determines how state is loaded and how an operation such as apply is executed. This abstraction enables non-local file state storage, remote execution, etc.
    • determines how state is loaded and how an operation such as apply is executed
    • is responsible for storing state and providing an API for optional state locking
    • needs to be initialized
    • helps
      • collaboration and working as a team, with the state maintained remotely and state locking
      • can provide enhanced security for sensitive data
      • support remote operations
    • local (default) backend stores state in a local JSON file on disk
  • Outline state locking
    • happens for all operations that could write state, if supported by backend for e.g. S3 with DynamoDB, Consul etc.
    • prevents others from acquiring the lock & potentially corrupting the state
    • use force-unlock command to manually unlock the state if unlocking failed
    • backends which support state locking are
      • azurerm
      • Hashicorp consul
      • Tencent Cloud Object Storage (COS)
      • etcdv3
      • Google Cloud Storage GCS
      • HTTP endpoints
      • Kubernetes Secret with locking done using a Lease resource
      • AliCloud Object Storage OSS with locking via TableStore
      • PostgreSQL
      • AWS S3 with locking via DynamoDB
      • Terraform Enterprise
    • Backends which do not support state locking are
      • artifactory
      • etcd
  • Handle backend authentication methods
    • every remote backend supports a different authentication mechanism, which can be configured as part of the backend configuration
  • Describe remote state storage mechanisms and supported standard backends
    • remote backend stores state remotely like S3, OSS, GCS, Consul and support features like remote operation, state locking, encryption, versioning etc.
    • GitHub is not a supported backend type.
  • Describe effect of Terraform refresh on state
    • terraform refresh is used to reconcile the state Terraform knows about (via its state file) with the real-world infrastructure.
    • can be used to detect any drift from the last-known state, and to update the state file.
    • does not modify infrastructure but does modify the state file.
  • Describe backend block in configuration and best practices for partial configurations
    • Backend configuration doesn’t support interpolations.
    • supports partial configuration with remaining configuration arguments provided as part of the initialization process
    • when switching backends (or configuring one for the first time), Terraform provides an option to migrate the existing state (see the backend sketch after this list)
  • Understand secret management in state files
    • terraform state command is used for advanced state management
    • Terraform has no mechanism to redact or protect secrets that are returned via data sources, so secrets read via this provider will be persisted into the Terraform state, into any plan files, and in some cases in the console output produced while planning and applying.
    • secrets can be protected by using Vault and/or remote backends with encryption and proper access control
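
A minimal sketch of a remote backend with partial configuration, assuming a GCS backend; the bucket name and prefix are illustrative and the remaining argument is supplied at init time:

    terraform {
      backend "gcs" {
        prefix = "env/prod"    # bucket intentionally omitted – partial configuration
      }
    }

    # Supplied during initialization:
    #   terraform init -backend-config="bucket=my-tf-state-bucket"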

Read, generate, and modify configuration

  • Demonstrate use of variables and outputs
    • Variables
      • serve as parameters for a Terraform module and
      • act like function arguments
      • count is a reserved word and cannot be used as a variable name
    • Output
      • are like function return values.
      • can be marked sensitive which prevents showing its value in the list of outputs. However, they are stored in the state as plain text.
  • Describe secure secret injection best practice
  • Understand the use of collection and structural types
    • supports primitive data types of
      • string, number and bool
      • Terraform automatically converts number and bool values to string values when needed
    • supports complex data types of
      • list – sequence of values identified by consecutive whole numbers starting with zero.
      • map – collection of values where each is identified by a string label
      • set – collection of unique values that do not have any secondary identifiers or ordering.
    • supports structural data types of
      • object – a collection of named attributes with their own type
      • tuple – a sequence of elements identified by consecutive whole numbers starting with zero, where each element has its own type.
  • Create and differentiate resource and data configuration
    • Resources describe one or more infrastructure objects, such as virtual networks, instances, or higher-level components such as DNS records.
    • Data sources allow data to be fetched or computed for use elsewhere in Terraform configuration. Use of data sources allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration.
  • Use resource addressing and resource parameters to connect resources together
  • Use Terraform built-in functions to write configuration
    • lookup retrieves the value of a single element from a map, given its key. If the given key does not exist, the given default value is returned instead: lookup(map, key, default) (see the configuration sketch after this list)
    • zipmap constructs a map from a list of keys and a corresponding list of values. A map is denoted by { } whereas a list is donated by [ ] for e.g. zipmap(["a", "b"], [1, 2]) results into {"a" = 1, "b" = 2}
  • Configure resource using a dynamic block
    • dynamic acts much like a for expression, but produces nested blocks instead of a complex typed value. It iterates over a given complex value, and generates a nested block for each element of that complex value.
    • Overuse of dynamic block is not recommended as it makes the code hard to understand and debug
  • Describe built-in dependency management (order of execution based)
    • Terraform analyses any expressions within a resource block to find references to other objects and treats those references as implicit ordering requirements when creating, updating, or destroying resources.
    • An explicit dependency can be defined using the depends_on meta-argument for dependencies between resources that are not visible to Terraform
  • support comments using #, // and /* */
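
A small sketch, with illustrative names and values, tying together variables, outputs, a dynamic block, and built-in functions:

    variable "machine_types" {
      type    = map(string)
      default = { dev = "e2-small", prod = "e2-standard-4" }
    }

    variable "firewall_ports" {
      type    = list(string)
      default = ["80", "443"]
    }

    resource "google_compute_firewall" "web" {
      name          = "allow-web"
      network       = "default"
      source_ranges = ["0.0.0.0/0"]   # illustrative

      # dynamic block generates one allow block per port
      dynamic "allow" {
        for_each = var.firewall_ports
        content {
          protocol = "tcp"
          ports    = [allow.value]
        }
      }
    }

    output "dev_machine_type" {
      # lookup with a default if the key is missing
      value = lookup(var.machine_types, "dev", "e2-micro")
    }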

Understand Terraform Cloud and Enterprise capabilities

  • Describe the benefits of Sentinel, registry, and workspaces
    • Terraform Cloud provides private module registry for storing modules private to be used within the organization
  • Differentiate OSS and TFE workspaces
  • Summarize features of Terraform Cloud
    • Terraform Enterprise currently supports running under the following operating systems for a Clustered deployment:
      • Ubuntu 16.04.3 – 16.04.5 / 18.04
      • Red Hat Enterprise Linux 7.4 through 7.7
      • CentOS 7.4 – 7.7
      • Amazon Linux
      • Oracle Linux
      • Clusters currently don’t support other Linux variants.
    • Terraform Enterprise install that is provisioned on a network that does not have Internet access is generally known as an air-gapped install.

HashiCorp Certified Terraform Associate Exam Resources

Terraform Cheat Sheet

  • An open source, declarative provisioning tool based on the Infrastructure as Code (IaC) paradigm
  • designed on immutable infrastructure principles
  • Written in Golang and uses its own syntax – HCL (HashiCorp Configuration Language), but also supports JSON
  • Helps to evolve the infrastructure, safely and predictably
  • Applies graph theory to IaC and provides automation, versioning and reusability
  • Terraform is a multipurpose composition tool:
    • Composes multiple tiers (SaaS/PaaS/IaaS)
    • A plugin-based architecture model
  • Terraform itself is provider-agnostic: it supports all major cloud providers and gives a common language to orchestrate infrastructure resources, although individual resource configurations remain provider-specific
  • Terraform is not a configuration management tool; other tools like Chef and Ansible exist for that purpose.

Terraform Architecture

Terraform Architecture

Terraform Providers (Plugins)

  • provide an abstraction above the upstream API and are responsible for understanding API interactions and exposing resources.
  • Invoke only upstream APIs for the basic CRUD operations
  • Providers are unaware of anything related to configuration loading, graph theory, etc.
  • supports multiple provider instances using alias, e.g. multiple aws providers with different regions
  • can be integrated with any API using providers framework
  • Most providers configure a specific infrastructure platform (either cloud or self-hosted).
  • can also offer local utilities for tasks like generating random numbers for unique resource names.

Terraform Provisioners

  • run code locally or remotely on resource creation
    • local-exec executes code on the machine running Terraform
    • remote-exec
      • runs on the provisioned resource
      • supports ssh and winrm
      • takes an inline list of commands (or a script to run)
  • should be used as a last resort
  • are defined within the resource block.
  • support two types – creation-time and destroy-time provisioners (see the sketch after this list)
    • if a creation-time provisioner fails, the resource is marked as tainted by default (it will be destroyed and re-created on the next apply)
    • the behavior can be overridden by setting on_failure to continue, which ignores the error and continues
    • if a destroy-time provisioner fails, the resource is not removed
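
A minimal sketch of creation-time and destroy-time provisioners on an illustrative resource; the instance configuration and echo commands are placeholders:

    resource "google_compute_instance" "web" {
      name         = "web-1"
      machine_type = "e2-small"
      zone         = "us-central1-a"

      boot_disk {
        initialize_params { image = "debian-cloud/debian-12" }
      }
      network_interface { network = "default" }

      # creation-time provisioner; a failure taints the resource unless on_failure = continue
      provisioner "local-exec" {
        command    = "echo ${self.name} created >> provision.log"
        on_failure = continue
      }

      # destroy-time provisioner
      provisioner "local-exec" {
        when    = destroy
        command = "echo ${self.name} destroyed >> provision.log"
      }
    }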

Terraform Workspaces

  • helps manage multiple distinct sets of infrastructure resources or environments with the same code.
  • just need to create needed workspace and use them, instead of creating a directory for each environment to manage
  • state files for each workspace are stored in the directory terraform.tfstate.d
  • terraform workspace new dev creates a new workspace and switches to it as well
  • terraform workspace select dev helps select workspace
  • terraform workspace list lists the workspaces and shows the current active one with *
  • does not provide strong separation as it uses the same backend

Terraform Workflow

Terraform Workflow

init

  • initializes a working directory containing Terraform configuration files.
  • performs
    • backend initialization, storage for the Terraform state file.
    • modules installation, downloaded from terraform registry to local path
    • provider(s) plugins installation, the plugins are downloaded in the sub-directory of the present working directory at the path of .terraform/plugins
  • supports -upgrade to update all previously installed plugins to the newest version that complies with the configuration’s version constraints
  • is safe to run multiple times, to bring the working directory up to date with changes in the configuration
  • does not delete the existing configuration or state

validate

  • validates syntactically for format and correctness.
  • is used to validate/check the syntax of the Terraform files.
  • verifies whether a configuration is syntactically valid and internally consistent, regardless of any provided variables or existing state.
  • A syntax check is done on all the terraform files in the directory, and will display an error if any of the files doesn’t validate.

plan

  • creates an execution plan
  • traverses each vertex and requests each provider using parallelism
  • calculates the difference between the last-known state and the current state and presents this difference as the output of the terraform plan operation to the user in their terminal
  • does not modify the infrastructure or state.
  • allows a user to see which actions Terraform will perform prior to making any changes to reach the desired state
  • will scan all *.tf  files in the directory and create the plan
  • will perform refresh for each resource and might hit rate limiting issues as it calls provider APIs
  • the refresh of all resources can be disabled or limited using
    • -refresh=false, or
    • -target=xxxx, or
    • breaking resources into different directories.
  • supports -out to save the plan

apply

  • apply changes to reach the desired state.
  • scans the current directory for the configuration and applies the changes appropriately.
  • can be provided with an explicit plan, saved with -out from terraform plan
  • If no explicit plan file is given on the command line, terraform apply will create a new plan automatically and prompt for approval to apply it
  • will modify the infrastructure and the state.
  • if a resource successfully creates but fails during provisioning,
    • Terraform will error and mark the resource as “tainted”.
    • A resource that is tainted has been physically created, but can’t be considered safe to use since provisioning failed.
    • Terraform also does not automatically roll back and destroy the resource during the apply when the failure happens, because that would go against the execution plan: the execution plan would’ve said a resource will be created, but does not say it will ever be deleted.
  • does not import any resource.
  • supports -auto-approve to apply the changes without asking for a confirmation
  • supports -target to apply a specific module

refresh

  • used to reconcile the state Terraform knows about (via its state file) with the real-world infrastructure
  • does not modify infrastructure, but does modify the state file

destroy

  • destroy the infrastructure and all resources
  • modifies both state and infrastructure
  • terraform destroy -target can be used to destroy targeted resources
  • terraform plan -destroy allows creation of destroy plan

import

  • helps import already-existing external resources, not managed by Terraform, into Terraform state and allow it to manage those resources
  • Terraform is not able to auto-generate configurations for those imported modules, for now, and requires you to first write the resource definition in Terraform and then import this resource

taint

  • marks a Terraform-managed resource as tainted, forcing it to be destroyed and recreated on the next apply.
  • will not modify infrastructure, but does modify the state file in order to mark a resource as tainted. Infrastructure and state are changed in next apply.
  • can be used to taint a resource within a module

fmt

  • formats the code into a canonical, standard style

console

  • command provides an interactive console for evaluating expressions.

Terraform Modules

  • enables code reuse
  • supports versioning to maintain compatibility
  • stores code remotely
  • enables easier testing
  • enables encapsulation with all the separate resources under one configuration block
  • modules can be nested inside other modules, allowing you to quickly spin up whole separate environments.
  • can be referred using source attribute
  • supports Local and Remote modules
    • Local modules are stored alongside the Terraform configuration (in a separate directory, outside of each environment but in the same repository) with source path ./ or ../
    • Remote modules are stored externally in a separate repository, and supports versioning
  • supports the following module sources
    • Local paths
    • Terraform Registry
    • GitHub
    • Bitbucket
    • Generic Git, Mercurial repositories
    • HTTP URLs
    • S3 buckets
    • GCS buckets
  • Module requirements
    • must be on GitHub and must be a public repo, if using public registry.
    • must be named terraform-<PROVIDER>-<NAME>, where <NAME> reflects the type of infrastructure the module manages and <PROVIDER> is the main provider where it creates that infrastructure. for e.g. terraform-google-vault or terraform-aws-ec2-instance.
    • must maintain x.y.z tags for releases to identify module versions. Release tag names must be a semantic version, which can optionally be prefixed with a v for example, v1.0.4 and 0.9.2. Tags that don’t look like version numbers are ignored.
    • must maintain a Standard module structure, which allows the registry to inspect the module and generate documentation, track resource usage, parse submodules and examples, and more.

Terraform Read and write configuration

terraform_sample

  • Resources
    • resource is the most important element in the Terraform language and describes one or more infrastructure objects, such as compute instances etc
    • resource type and local name together serve as an identifier for a given resource and must be unique within a module for e.g.  aws_instance.local_name
  • Data Sources
    • data allow data to be fetched or computed for use elsewhere in Terraform configuration
    • allows a Terraform configuration to make use of information defined outside of Terraform, or defined by another separate Terraform configuration
  • Variables
    • variable serve as parameters for a Terraform module and act like function arguments
    • allows aspects of the module to be customized without altering the module’s own source code, and allowing modules to be shared between different configurations
    • can be defined in multiple ways
      • on the command line, e.g. -var="image_id=ami-abc123"
      • in variable definition files .tfvars or .tfvars.json. By default, Terraform automatically loads
        • files named exactly terraform.tfvars or terraform.tfvars.json
        • any files with names ending in .auto.tfvars or .auto.tfvars.json
        • other files can be passed explicitly with -var-file
      • as environment variables using the format TF_VAR_name
    • Terraform loads variables in the following order, with later sources taking precedence over earlier ones (see the sketch after this list):
      • Environment variables
      • terraform.tfvars file, if present.
      • terraform.tfvars.json file, if present.
      • Any *.auto.tfvars or *.auto.tfvars.json files, processed in lexical order of their filenames.
      • Any -var and -var-file options on the command line, in the order they are provided.
  • Local Values
    • locals assigns a name to an expression, allowing it to be used multiple times within a module without repeating it.
    • are like a function’s temporary local variables.
    • helps to avoid repeating the same values or expressions multiple times in a configuration.
  • Output
    • are like function return values.
    • output can be marked as containing sensitive material using the optional sensitive argument, which prevents Terraform from showing its value in the list of outputs. However, they are still stored in the state as plain text.
    • In a parent module, outputs of child modules are available in expressions as module.<MODULE NAME>.<OUTPUT NAME>.
  • Named Values
    • is an expression that references the associated value for e.g. aws_instance.local_name, data.aws_ami.centos, var.instance_type etc.
    • support Local named values for e.g count.index
  • Dependencies
    • identifies implicit dependencies as Terraform automatically infers when one resource depends on another by studying the resource attributes used in interpolation expressions for e.g aws_eip on resource aws_instance
    • explicit dependencies can be defined using depends_on where dependencies between resources that are not visible to Terraform
  • Data Types
    • supports primitive data types of
      • string, number and bool
      • Terraform language will automatically convert number and bool values to string values when needed
    • supports complex data types of
      • list – a sequence of values identified by consecutive whole numbers starting with zero.
      • map – a collection of values where each is identified by a string label.
      • set –  a collection of unique values that do not have any secondary identifiers or ordering.
    • supports structural data types of
      • object – a collection of named attributes that each have their own type
      • tuple – a sequence of elements identified by consecutive whole numbers starting with zero, where each element has its own type.
  • Built-in Functions
    • includes a number of built-in functions that can be called from within expressions to transform and combine values for e.g. min, max, file, concat, element, index, lookup etc.
    • does not support user-defined functions
  • Dynamic Blocks
    • acts much like a for expression, but produces nested blocks instead of a complex typed value. It iterates over a given complex value, and generates a nested block for each element of that complex value.
  • Terraform Comments
    • supports three different syntaxes for comments:
      • #
      • //
      • /* and */
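
A sketch of the different ways a variable value can be supplied; the variable name, file names, and values are illustrative:

    # variables.tf
    variable "image_id" {
      type = string
    }

    # terraform.tfvars (loaded automatically)
    image_id = "ami-abc123"

    # Other ways to supply the value (command-line options take the highest precedence):
    #   export TF_VAR_image_id=ami-abc123
    #   terraform apply -var-file=prod.tfvars
    #   terraform apply -var="image_id=ami-abc123"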

Terraform Backends

  • determines how state is loaded and how an operation such as apply is executed
  • are responsible for storing state and providing an API for optional state locking
  • needs to be initialized
  • when switching backends (or configuring one for the first time), Terraform provides an option to migrate the existing state
  • helps
    • collaboration and working as a team, with the state maintained remotely and state locking
    • can provide enhanced security for sensitive data
    • support remote operations
  • supports local vs remote backends
    • local (default) backend stores state in a local JSON file on disk
    • remote backend stores state remotely like S3, OSS, GCS, Consul and support features like remote operation, state locking, encryption, versioning etc.
  • supports partial configuration with remaining configuration arguments provided as part of the initialization process
  • Backend configuration doesn’t support interpolations.
  • GitHub is not a supported backend type in Terraform.

Terraform State Management

  • state helps keep track of the infrastructure Terraform manages
  • stored locally in the terraform.tfstate
  • recommended not to edit the state manually
  • Use terraform state command
    • mv – to move/rename modules
    • rm – to safely remove resource from the state. (destroy/retain like)
    • pull – to observe current remote state
    • list & show – to write/debug modules

State Locking

  • happens for all operations that could write state, if supported by backend
  • prevents others from acquiring the lock & potentially corrupting the state
  • backends which support state locking are
    • azurerm
    • Hashicorp consul
    • Tencent Cloud Object Storage (COS)
    • etcdv3
    • Google Cloud Storage GCS
    • HTTP endpoints
    • Kubernetes Secret with locking done using a Lease resource
    • AliCloud Object Storage OSS with locking via TableStore
    • PostgreSQL
    • AWS S3 with locking via DynamoDB
    • Terraform Enterprise
  • Backends which do not support state locking are
    • artifactory
    • etcd
  • can be disabled for most commands with the -lock=false flag
  • use force-unlock command to manually unlock the state if unlocking failed

State Security

  • can contain sensitive data, depending on the resources in use for e.g passwords and keys
  • using local state, data is stored in plain-text JSON files
  • using remote state, state is held in memory when used by Terraform. It may be encrypted at rest, if supported by backend for e.g. S3, OSS

Terraform Logging

  • debugging can be controlled using TF_LOG, which can be set to the levels TRACE, DEBUG, INFO, WARN or ERROR, with TRACE being the most verbose.
  • the log path can be controlled with TF_LOG_PATH; TF_LOG must also be set for it to take effect.

Terraform Cloud and Terraform Enterprise

  • Terraform Cloud provides Cloud Infrastructure Automation as a Service. It is offered as a multi-tenant SaaS platform and is designed to suit the needs of smaller teams and organizations. Its smaller plans default to one run at a time, which prevents users from executing multiple runs concurrently.
  • Terraform Enterprise is a private install for organizations who prefer to self-manage. It is designed to suit the needs of organizations with specific requirements for security, compliance and custom operations.
  • Terraform Cloud provides the following features
    • Remote Terraform Execution – supports Remote Operations for Remote Terraform execution which helps provide consistency and visibility for critical provisioning operations.
    • Workspaces – organizes infrastructure with workspaces instead of directories. Each workspace contains everything necessary to manage a given collection of infrastructure, and Terraform uses that content whenever it executes in the context of that workspace.
    • Remote State Management – acts as a remote backend for the Terraform state. State storage is tied to workspaces, which helps keep state associated with the configuration that created it.
    • Version Control Integration – is designed to work directly with the version control system (VCS) provider.
    • Private Module Registry – provides a private and central library of versioned & validated modules to be used within the organization
    • Team based Permission System – can define groups of users that match the organization’s real-world teams and assign them only the permissions they need
    • Sentinel Policies – embeds the Sentinel policy-as-code framework, which lets you define and enforce granular policies for how the organization provisions infrastructure. Helps eliminate provisioned resources that don’t follow security, compliance, or operational policies.
    • Cost Estimation – can display an estimate of its total cost, as well as any change in cost caused by the proposed updates
    • Security – encrypts state at rest and protects it with TLS in transit.
  • Terraform Enterprise features
    • includes all the Terraform Cloud features with
    • Audit – supports detailed audit logging and tracks the identity of the user requesting state and maintains a history of state changes.
    • SSO/SAML – SAML for SSO provides the ability to govern user access to your applications.
  • Terraform Enterprise currently supports running under the following operating systems for a Clustered deployment:
    • Ubuntu 16.04.3 – 16.04.5 / 18.04
    • Red Hat Enterprise Linux 7.4 through 7.7
    • CentOS 7.4 – 7.7
    • Amazon Linux
    • Oracle Linux
    • Clusters currently don’t support other Linux variants.
  • Terraform Cloud currently supports the following VCS providers
    • GitHub.com
    • GitHub.com (OAuth)
    • GitHub Enterprise
    • GitLab.com
    • GitLab EE and CE
    • Bitbucket Cloud
    • Bitbucket Server
    • Azure DevOps Server
    • Azure DevOps Services
  • A Terraform Enterprise install that is provisioned on a network that does not have Internet access is generally known as an air-gapped install. These types of installs require you to pull updates, providers, etc. from external sources vs. being able to download them directly.