GCP Identity and Access Management – IAM

  • Google Cloud Identity and Access Management (IAM) lets administrators authorize who can take what action on which resources.
  • IAM provides a unified view into security policy across the entire organization, with built-in auditing to ease compliance processes.

IAM Components

IAM manages access control by defining who (identity) has what access (role) for which resource.

IAM architecture

Member

    • A member can be a Google Account (for end users), a service account (for apps and virtual machines), a Google group, or a Google Workspace or Cloud Identity domain that can access a resource.
    • Identity of a member is an email address associated with a user, service account, or Google group; or a domain name associated with Google Workspace or Cloud Identity domains.

Role

    • Permission to access a resource isn’t granted directly to the end user.
    • A role is a collection of permissions, and roles are granted to authenticated members.
    • Permissions are represented in the form service.resource.verb e.g. compute.instances.list
    • Permissions determine what operations are allowed on a resource.
    • A role granted to a member grants all the permissions that the role contains.
    • GCP supports different role types
      • Basic roles
        • Roles historically available in the Google Cloud Console. Also referred to as primitive roles.
        • Roles are Owner, Editor, and Viewer.
        • Provide a broad level of permissions and are not recommended.
      • Predefined roles
        • Roles that give finer-grained access control than the basic roles.
        • Roles are created and maintained by Google and automatically updated as necessary, such as when Google Cloud adds new features or services.
      • Custom roles
        • Roles created to tailor permissions to the needs of the organization, when predefined roles don’t meet the needs.
        • Custom roles are not maintained by Google; when new permissions, features, or services are added to Google Cloud, the custom roles will not be updated automatically.

IAM Policy

    • IAM policy binds one or more members to a role.
    • An IAM policy defines and enforces what roles are granted to which members, and this policy is attached to a resource.
    • IAM policy attached to a resource defines who (member) has what type of access (role) on the resource.
    • IAM policy can be set at any level in the resource hierarchy: the organization level, folder level, project level, or resource level.
    • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
    • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Basically: Permissions -> Roles -> (IAM Policy) -> Members
  • When an authenticated member attempts to access a resource, IAM checks the resource’s policy to determine whether the action is permitted.
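
As an illustration, a minimal sketch of adding a role binding to a project’s IAM policy with the gcloud CLI; the project my-project, the user jane@example.com, and the chosen role are hypothetical placeholders:

    # Bind a member to a role on a project; the binding becomes part of the project's IAM policy
    gcloud projects add-iam-policy-binding my-project \
        --member="user:jane@example.com" \
        --role="roles/compute.instanceAdmin.v1"

    # Inspect the resulting policy (members bound to roles)
    gcloud projects get-iam-policy my-project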

Service Accounts

  • A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
  • Applications use service accounts to make authorized API calls, authorized as either the service account itself, or as Google Workspace or Cloud Identity users through domain-wide delegation.
  • A service account is identified by its email address, which is unique to the account.
  • Service accounts do not have passwords, and cannot log in via browsers or cookies.
  • Service accounts are associated with private/public RSA key-pairs that are used for authentication to Google.
  • Other users or service accounts cannot impersonate a service account unless they are granted the required permissions (e.g. via the Service Account User or Service Account Token Creator roles).
  • Service accounts are not members of the Google Workspace domain, unlike user accounts. If you share Google Workspace assets, like docs or events, with all members in your Google Workspace domain, they are not shared with service accounts.
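
A minimal sketch of creating a service account and granting it a role, using hypothetical names (my-app-sa, my-project):

    # Create the service account; its email becomes my-app-sa@my-project.iam.gserviceaccount.com
    gcloud iam service-accounts create my-app-sa \
        --display-name="My App Service Account"

    # Grant the service account read access to Cloud Storage objects in the project
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:my-app-sa@my-project.iam.gserviceaccount.com" \
        --role="roles/storage.objectViewer"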

Workload Identity Federation

  • Using identity federation, on-premises or multi-cloud workloads can be granted access to GCP resources, without using a service account key.
  • Identity federation can be used with AWS, or with any identity provider that supports OpenID Connect (OIDC), such as Microsoft Azure
  • With identity federation, IAM can be used to grant external identities IAM roles, including the ability to impersonate service accounts using short-lived access tokens, which eliminates the maintenance and security burden associated with service account keys.

IAM Recommender

  • IAM recommender helps enforce the principle of least privilege by ensuring that members have only the permissions that they actually need.
  • IAM uses Recommender to compare project-level role grants with the permissions that each member used during the past 90 days.
  • Depending on the member’s usage and the permissions provided, IAM recommender recommends a less permissive role or recommends revoking the role.
  • IAM recommender never suggests a change that increases a member’s level of access.
  • IAM recommender also uses machine learning to identify permissions in a member’s current role that the member is likely to need in the future, even if the member did not use those permissions in the past 90 days.
  • IAM recommender does not apply recommendations automatically, but the recommendations should be reviewed and then applied or dismissed.
  • IAM recommender does not evaluate
    • Role grants made at the folder or organization level
    • Role grants made below the project level; that is, role grants on service-specific resources within a project
    • Conditional role grants
    • Role grants for Google-managed service accounts
    • Access controls that are separate from IAM

IAM Audit Logging

  • Google Cloud services write audit logs to help answer the questions, “Who did what, where, and when?”
  • Cloud projects contain only the audit logs for resources that are directly within the project. Other entities, such as folders, organizations, and Cloud Billing accounts, contain the audit logs for the entity itself.
  • IAM writes Admin Activity audit logs, which record operations that modify the configuration or metadata of a resource. Admin Activity audit logs cannot be disabled.
  • IAM writes Data Access audit logs, if explicitly enabled. Data Access audit logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read user-provided resource data.
  • IAM doesn’t write System Event audit logs.
  • IAM doesn’t write Policy Denied audit logs.
  • Cloud Audit Logs provides the following audit logs for each Cloud project, folder, and organization:
    • Admin Activity audit logs
    • Data Access audit logs
    • System Event audit logs
    • Policy Denied audit logs
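
For reference, a minimal sketch of enabling Data Access audit logs through a project’s IAM policy audit configuration; the policy.yaml fragment below is illustrative and would be merged into the output of gcloud projects get-iam-policy before being applied:

    # policy.yaml fragment – enable Data Access logging for all services
    auditConfigs:
    - service: allServices
      auditLogConfigs:
      - logType: ADMIN_READ
      - logType: DATA_READ
      - logType: DATA_WRITE

    # Apply the updated policy (hypothetical project my-project)
    gcloud projects set-iam-policy my-project policy.yaml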

GCP Resource Manager

Google Cloud Platform – Resource Manager helps manage resource containers such as organizations, folders, and projects that allow you to group and hierarchically organize other GCP resources.

Resource Hierarchy

Organizations

  • Organization resource is the root node in the Google Cloud resource hierarchy and is the hierarchical super node and ancestor of project resources and folders.
  • Organization is the top of the hierarchy and does not have a parent.
  • Organization provides central visibility and control over every resource that belongs to an organization.
  • With an Organization resource, projects belong to the organization instead of the employee who created the project, which means that the projects are no longer deleted when an employee leaves the company; instead they will follow the organization’s lifecycle on Google Cloud.
  • Organization administrators have central control of all resources. They can view and manage all of the company’s projects
  • IAM access control policies applied on the Organization resource apply throughout the hierarchy on all resources in the organization.
  • Roles granted at the organization level are inherited by all projects and folders under the Organization resource
  • Google Workspace or Cloud Identity account represents a company and is a prerequisite to have access to the Organization resource. It provides identity management, recovery mechanisms, ownership, and lifecycle management.
  • The Google Workspace super admin is the individual responsible for domain ownership verification and is the contact in cases of recovery.

Folders

  • Folders are an additional grouping mechanism on top of projects and provide isolation boundaries between projects.
  • Organization resource is a prerequisite to use folders.
  • Folders can be used to model different legal entities, departments, and teams within a company.
  • Folders allow delegation of administration rights as well as control or limiting of access to resources within the folder.

Projects

  • Project resource is the base-level organizing entity.
  • Organizations and folders may contain multiple projects.
  • A project is required to use Google Cloud, and forms the basis for creating, enabling, and using all Google Cloud services, managing APIs, enabling billing, adding and removing collaborators, and managing permissions.
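
A minimal sketch of creating the hierarchy with the gcloud CLI; the organization ID, folder ID, and project ID below are hypothetical placeholders:

    # Create a folder under the organization
    gcloud resource-manager folders create \
        --display-name="Engineering" --organization=123456789012

    # Create a project under that folder
    gcloud projects create my-app-dev --folder=345678901234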

IAM policy inheritance

  • IAM lets you control who (users) has what access (roles) to which resources by setting IAM policies on the resources.
  • Resources inherit the policies of the parent node, i.e. a policy set at the Organization level is inherited by all its child folders and projects, and a policy set at the project level is inherited by all its child resources.
  • There is no way to explicitly remove a permission for a lower-level resource that is granted at a higher level in the resource hierarchy.
  • The effective policy for a resource is the union of the policy set on the resource and the policy inherited from its ancestors.
  • Permission inheritance is transitive i.e. resources inherit policies from the project, which inherit policies from the organization.
  • IAM policy hierarchy follows the same path as the Google Cloud resource hierarchy, i.e. if the resource hierarchy is changed, e.g. moving a project from one folder to another, the policy hierarchy changes as well.

Organization Policy Service

  • Organization Policy Service gives a centralized and programmatic control over the organization’s cloud resources
  • Organization Policy Service benefits
    • Centralize control to configure restrictions on how the organization’s resources can be used.
    • Define and establish guardrails for the development teams to stay within compliance boundaries.
    • Help project owners and their teams move quickly without worry of breaking compliance.
  • When an organization policy is set on a resource hierarchy node, all descendants of that node inherit the organization policy by default, i.e. an organization policy set at the root organization node passes the defined restriction down through all descendant folders, projects, and service resources.
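
For illustration, a minimal sketch of enforcing a boolean constraint with the gcloud CLI; the constraint shown and the project my-project are placeholders:

    # Enforce an organization policy constraint on a project
    gcloud resource-manager org-policies enable-enforce \
        compute.disableSerialPortAccess --project=my-project

    # Review the effective policy for the constraint, including inherited settings
    gcloud resource-manager org-policies describe \
        compute.disableSerialPortAccess --project=my-project --effective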

Reference

Google Cloud Platform – Resource Manager

GCP Google Cloud Storage – GCS

  • Google Cloud Storage is a service for storing objects in Google Cloud.
  • Google Cloud Storage provides a RESTful service for storing and accessing your data on Google’s infrastructure.
  • GCS combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.

Google Cloud Storage Components

Buckets

  • All buckets are associated with a project, and projects can be grouped under an organization.
  • Bucket name requirements
    • must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Spaces are not allowed. Names containing dots require verification.
    • must start and end with a number or letter.
    • must contain 3-63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
    • cannot be represented as an IP address, e.g. 192.168.5.4
    • cannot begin with the “goog” prefix.
    • cannot contain “google” or close misspellings, such as “g00gle”.
  • Bucket name considerations
    • reside in a single Cloud Storage namespace.
    • must be unique.
    • are publicly visible.
    • can only be assigned during creation and cannot be changed.
    • can be used in a DNS record as part of a CNAME or A redirect.
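
A minimal sketch of creating a bucket with gsutil; the bucket name and location are hypothetical:

    # Create a bucket in a specific location with the Standard storage class
    gsutil mb -l us-central1 -c standard gs://my-example-bucket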

Objects

  • An object is a piece of data consisting of a file of any format.
  • Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime.
  • Objects are stored in containers called buckets.
  • Object names reside in a flat namespace within a bucket, which means
    • Different buckets can have objects with the same name.
    • Objects do not reside within subdirectories in a bucket.
  • Existing objects cannot be directly renamed; they need to be copied to the new name and the original deleted.

Object Metadata

  • Objects stored in Cloud Storage have metadata associated with them
  • Metadata exists as key:value pairs and identifies properties of the object
  • Mutability of metadata varies: some metadata, e.g. Content-Type and Cache-Control, can be edited at any time, while other metadata, such as the object’s generation, is fixed when the object is created.

Composite Objects

  • Composite objects help in appending to an existing object, as well as in recreating objects uploaded as multiple components in parallel.
  • Compose operation works with objects that
    • have the same storage class.
    • are stored in the same Cloud Storage bucket.
    • do NOT use customer-managed encryption keys.
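
A minimal sketch of the compose operation with gsutil, using hypothetical object names:

    # Combine component objects into a single composite object in the same bucket
    gsutil compose gs://my-bucket/part-1 gs://my-bucket/part-2 gs://my-bucket/combined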

GCS Locations

  • GCS buckets need to be created in a location for storing the object data.
  • GCS supports different location types
    • regional
      • A region is a specific geographic place, such as London.
      • helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
    • dual-region
      • is a specific pair of regions, such as Finland and the Netherlands.
      • provides higher availability that comes with being geo-redundant.
    • multi-region
      • is a large geographic area, such as the United States, that contains two or more geographic places.
      • allows you to serve content to data consumers that are outside of the Google network and distributed across large geographic areas, or
      • provides higher availability that comes with being geo-redundant.
  • Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.

GCS Storage Classes

Refer to the blog Google Cloud Storage – Storage Classes

GCS Requester Pays

  • Project owner of the resource is billed normally for the access, which includes operation charges, network charges, and data retrieval charges.
  • However, if the requester provides a billing project with their request, the requester’s project is billed instead.
  • With Requester Pays enabled on the bucket, the requester is required to include a billing project in their requests, thus billing the requester’s project.
  • Enabling Requester Pays is useful, e.g., if you have a lot of data you want to make available to users, but you don’t want to be charged for their access to that data.
  • Requester Pays does not cover the storage charges and early deletion charges, which remain with the bucket owner.

GCS Upload and Download

  • GCS supports upload and storage of any MIME type of data up to 5 TB in size.
  • Uploaded object consists of the data along with any associated metadata
  • GCS supports multiple upload types
    • Simple upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and if there is no object metadata to send as part of the request.
    • Multipart upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and there is a need to include object metadata as part of the request.
    • Resumable upload – ideal for large files with a need for more reliable transfer. Supports streaming transfers, which is a type of resumable upload that allows uploading an object of unknown size.

Resumable upload

  • Resumable uploads are the recommended method for uploading large files, because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
  • Resumable upload allows resumption of data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data
  • Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading.
  • Resumable upload mechanism supports transfers where the file size is not known in advance or for streaming transfer.
  • Resumable upload must be completed within a week of being initiated.

Streaming transfers

  • Cloud Storage supports streaming transfers, which allows streaming data to and from the Cloud Storage account without requiring that the data first be saved to a file.
  • Streaming uploads are useful when uploading data whose final size is not known at the start of the upload, such as when generating the upload data from a process, or when compressing an object on-the-fly.
  • Streaming downloads are useful to download data from Cloud Storage into a process.

Parallel composite uploads

  • Parallel composite uploads divide a file into up to 32 chunks, which are uploaded in parallel to temporary objects; the final object is recreated by composing the temporary objects, which are then deleted.
  • Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in the bucket is a composite object, which only has a crc32c hash and not an MD5 hash.
  • As a result, crcmod needs to be used to perform integrity checks when downloading the object with gsutil or other Python applications.
  • Parallel composite uploads should only be performed if the following apply:
    • the bucket does not use default customer-managed encryption keys, because the compose operation does not support source objects encrypted in this way.
    • the uploaded objects do not need to have an MD5 hash.
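
A minimal sketch of triggering parallel composite uploads with gsutil; the threshold value and file/bucket names are hypothetical:

    # Upload files larger than 150 MB as parallel composite uploads
    gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
        cp large-file.bin gs://my-bucket/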

Object Versioning

  • Object Versioning retains a noncurrent object version when the live object version gets replaced or deleted.
  • Object Versioning increases storage costs as it maintains the current and noncurrent versions of the object, which can be partially mitigated by using Object Lifecycle Management to delete older noncurrent versions.
  • Noncurrent versions retain the name of the object, but are uniquely identified by their generation number.
  • Noncurrent versions only appear in requests that explicitly call for object versions to be included.
  • Object versions can be permanently deleted by including the generation number in the request or by configuring Object Lifecycle Management to delete older object versions.
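
A minimal sketch of working with versioning in gsutil, using a hypothetical bucket:

    # Enable Object Versioning on the bucket
    gsutil versioning set on gs://my-bucket

    # List all versions (generations) of an object, including noncurrent ones
    gsutil ls -a gs://my-bucket/object.txt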

Retention policies

  • Retention policy on a bucket ensures that all current and future objects in the bucket cannot be deleted or replaced until they reach the defined age.
  • Retention policy can be applied when creating a bucket or to an existing bucket
  • Retention policy retroactively applies to existing objects in the bucket as well as new objects added to the bucket.

Retention policy locks

  • Retention policy locks will lock a retention policy on a bucket, which prevents the policy from ever being removed or the retention period from ever being reduced (although it can be increased)
  • Once a retention policy is locked, the bucket cannot be deleted until every object in the bucket has met the retention period.
  • Locking a retention policy is irreversible
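
A minimal sketch of setting and locking a retention policy with gsutil; the retention period and bucket are hypothetical:

    # Require every object to be retained for at least 30 days
    gsutil retention set 30d gs://my-bucket

    # Permanently lock the retention policy (irreversible)
    gsutil retention lock gs://my-bucket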

Bucket Lock

  • Bucket Lock feature provides immutable storage on Cloud Storage
  • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained
  • Bucket Lock feature also locks the data retention policy, permanently preventing the policy from being reduced or removed.
  • Bucket Lock can help with regulatory and compliance requirements

Object Holds

  • Object holds, when set on individual objects, prevent the object from being deleted or replaced, but allow metadata to be edited.
  • Cloud Storage offers the following types of holds:
    • Event-based holds.
    • Temporary holds.
  • When an object is stored in a bucket without a retention policy, both hold types behave exactly the same.
  • When an object is stored in a bucket with a retention policy, the hold types have different effects on the object when the hold is released:
    • An event-based hold resets the object’s time in the bucket for the purposes of the retention period.
    • A temporary hold does not affect the object’s time in the bucket for the purposes of the retention period.
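
A minimal sketch of placing and releasing holds with gsutil, using hypothetical object names:

    # Place a temporary hold on an object
    gsutil retention temp set gs://my-bucket/object.txt

    # Release the hold; an event-based hold would use "event" instead of "temp"
    gsutil retention temp release gs://my-bucket/object.txt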

Object Lifecycle Management

  • Object Lifecycle Management helps configure transition or expiration of objects based on specified rules, e.g. SetStorageClass to downgrade the storage class, or Delete to expire noncurrent objects.
  • Lifecycle management configuration can be applied to a bucket, which contains a set of rules applied to current and future objects in the bucket.
  • Lifecycle management rules precedence
    • Delete action takes precedence over any SetStorageClass action.
    • Among multiple SetStorageClass actions, the one that switches the object to the storage class with the lowest at-rest storage pricing takes precedence.
  • Cloud Storage does not validate the correctness of the storage class transition.
  • Lifecycle actions can be tracked using Cloud Storage usage logs or using Pub/Sub Notifications for Cloud Storage.
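
A minimal sketch of a lifecycle configuration; the rule values are illustrative:

    # lifecycle.json – downgrade objects to Nearline after 30 days,
    # and delete noncurrent versions once 3 newer versions exist
    {
      "lifecycle": {
        "rule": [
          {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30}
          },
          {
            "action": {"type": "Delete"},
            "condition": {"isLive": false, "numNewerVersions": 3}
          }
        ]
      }
    }

    # Apply the configuration to a bucket
    gsutil lifecycle set lifecycle.json gs://my-bucket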

GCS Object Lifecycle Management

  • Object Lifecycle Behavior
    • Cloud Storage performs the action asynchronously, so there can be a lag between when the conditions are satisfied and the action is taken
    • Updates to lifecycle configuration may take up to 24 hours to take effect
    • Delete action will not take effect on an object while the object either has an object hold placed on it or an unfulfilled retention policy
    • SetStorageClass action is not affected by the existence of object holds or retention policies.

GCS Access Control

  • Cloud Storage offers two systems for granting users permission to access the buckets and objects: IAM and Access Control Lists (ACLs)
  • When IAM and ACLs are used on the same resource, Cloud Storage grants the broader permission set on the resource.
  • Cloud Storage access control can be performed using
    • Uniform (recommended)
      • Uniform bucket-level access allows using IAM alone to manage permissions. IAM applies permissions to all the objects contained inside the bucket or groups of objects with common name prefixes.
      • IAM also allows using features that are not available when working with ACLs, such as IAM Conditions and Cloud Audit Logs.
      • Enabling uniform bucket-level access disables ACLs, but the change can be reversed within 90 days.
    • Fine-grained
      • Fine-grained option enables using IAM and Access Control Lists (ACLs) together to manage permissions.
      • ACLs are a legacy access control system for Cloud Storage designed for interoperability with Amazon S3.
      • Access and apply permissions can be specified at both the bucket level and per individual object.
  • Objects in the bucket can be made public using ACLs AllUsers:R or IAM allUsers:objectViewer permissions
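
A minimal sketch of both access-control modes with gsutil, using a hypothetical bucket:

    # Enable uniform bucket-level access (IAM only)
    gsutil uniformbucketlevelaccess set on gs://my-bucket

    # Grant public read access to all objects via IAM
    gsutil iam ch allUsers:objectViewer gs://my-bucket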

Signed URLs

  • Signed URLs provide time-limited read or write access to an object through a generated URL.
  • Anyone having access to the URL can access the object for the duration of time specified, regardless of whether or not they have a Google account.
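
A minimal sketch of generating a signed URL with gsutil; the key file and object names are hypothetical, and the command requires a service account private key:

    # Create a URL that grants read access to the object for 10 minutes
    gsutil signurl -d 10m service-account-key.json gs://my-bucket/object.txt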

Signed Policy Documents

  • Signed policy documents help specify what can be uploaded to a bucket.
  • Policy documents allow greater control over size, content type, and other upload characteristics than signed URLs, and can be used by website owners to allow visitors to upload files to Cloud Storage.

CORS

  • Cloud Storage allows setting CORS configuration at the bucket level only
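
A minimal sketch of a CORS configuration; the origin and settings are hypothetical:

    # cors.json – allow GET requests from a single origin
    [
      {
        "origin": ["https://example.com"],
        "method": ["GET"],
        "responseHeader": ["Content-Type"],
        "maxAgeSeconds": 3600
      }
    ]

    # Apply the configuration to the bucket
    gsutil cors set cors.json gs://my-bucket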

Data Encryption

  • Cloud Storage always encrypts the data on the server side, before it is written to disk, at no additional charge.
  • Cloud Storage supports the following encryption options
    • Server-side encryption: encryption that occurs after Cloud Storage receives the data, but before the data is written to disk and stored.
      • Google-managed encryption keys
        • Cloud Storage always encrypts the data on the server side, before it is written to disk
        • Cloud Storage manages server-side encryption keys using the same hardened key management systems, including strict key access controls and auditing.
        • Cloud Storage encrypts user data at rest using AES-256.
        • Data is automatically decrypted when read by an authorized user
      • Customer-supplied encryption keys
        • customers create and manage their own encryption keys.
        • Customer provides the key for each GCS operation, and the key is purged from Google’s servers after the operation is complete.
        • Cloud Storage does not permanently store the key on Google’s servers or otherwise manage the key.
        • Cloud Storage stores only a cryptographic hash of the key so that future requests can be validated against the hash.
        • The key cannot be recovered from this hash, and the hash cannot be used to decrypt the data.
      • Customer-managed encryption keys
        • customers manage their own encryption keys generated by Cloud Key Management Service (KMS), which Cloud Storage uses to encrypt and decrypt data on the customer’s behalf.
    • Client-side encryption: encryption that occurs before data is sent to Cloud Storage, encrypted at client side. This data also undergoes server-side encryption.
  • Cloud Storage supports Transport Layer Security, commonly known as TLS or HTTPS for data encryption in transit
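
A minimal sketch of setting a default Cloud KMS key on a bucket with gsutil; the key path and bucket are hypothetical:

    # Use a customer-managed KMS key as the bucket's default encryption key
    gsutil kms encryption \
        -k projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key \
        gs://my-bucket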

Cloud Storage Tracking Updates

    • Pub/Sub notifications
      • sends information about changes to objects in the buckets to Pub/Sub, where the information is added to a specified Pub/Sub topic in the form of messages.
      • Each notification contains information describing both the event that triggered it and the object that changed.
    • Audit Logs
      • Google Cloud services write audit logs to help you answer the questions, “Who did what, where, and when?”
      • Cloud projects contain only the audit logs for resources that are directly within the project.
      • Cloud Audit Logs generates the following audit logs for operations in Cloud Storage:
        • Admin Activity logs: Entries for operations that modify the configuration or metadata of a project, bucket, or object.
        • Data Access logs: Entries for operations that modify objects or read a project, bucket, or object.
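
Tying back to the Pub/Sub notifications above, a minimal sketch with gsutil; the topic and bucket names are hypothetical:

    # Send a JSON message to the topic whenever an object changes in the bucket
    gsutil notification create -t my-topic -f json gs://my-bucket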

Data Consistency

  • It is important to know which Cloud Storage operations are strongly consistent and which are eventually consistent.
  • Cloud Storage provides strong global consistency for the following operations, including both data and metadata:
    • Read-after-write
    • Read-after-metadata-update
    • Read-after-delete
    • Bucket listing
    • Object listing
  • Cloud Storage provides eventual consistency for the following operations
    • Granting access to or revoking access from resources.

References

Google Cloud Platform – Cloud Storage

GCP Google Cloud Storage – Storage Classes

  • Google Cloud Storage – Storage class affects the object’s availability and pricing model.
  • Storage class of an existing object can be changed either by rewriting the object or by using Object Lifecycle Management.
  • Bucket’s default storage class is set to Standard Storage, if not specified.
  • A default storage class can be specified for a bucket, so that all objects added to the bucket inherit this storage class unless explicitly set otherwise.
  • Changing the default storage class of a bucket does not affect any of the objects that already exist in the bucket.
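
A minimal sketch of both ways to set storage classes with gsutil, using hypothetical names:

    # Change the default storage class for new objects in a bucket
    gsutil defstorageclass set nearline gs://my-bucket

    # Change the storage class of an existing object by rewriting it
    gsutil rewrite -s coldline gs://my-bucket/object.txt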

Available storage classes

  • All storage classes provide the following
    • Unlimited storage with no minimum object size.
    • Worldwide accessibility and worldwide storage locations.
    • Low latency (time to first byte typically tens of milliseconds).
    • High durability (99.999999999% annual durability).
    • Geo-redundancy if the data is stored in a multi-region or dual-region.
    • A uniform experience with Cloud Storage features, security, tools, and APIs.

Standard Storage

  • Standard Storage is best for data that is frequently accessed (hot data) and/or stored for only brief periods of time.
  • for regional locations
    • is appropriate for storing data in the same location as the resources that use it, such as GKE clusters or Compute Engine instances; co-locating helps maximize performance and can reduce network charges.
    • Availability SLA – 99.9%
  • for dual-region,
    • provides optimized performance when accessing Google Cloud products that are located in one of the associated regions.
    • provides improved availability that comes from storing data in geographically separate locations.
  • for multi-region
    • ideal for storing data that is accessed around the world, such as serving website content, streaming videos, executing interactive workloads, or serving data supporting mobile and gaming applications.

Nearline Storage

  • Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data (warm data)
  • Nearline Storage is a better choice than Standard Storage in scenarios where slightly lower availability, a 30-day minimum storage duration, and costs for data access are acceptable trade-offs for lowered at-rest storage costs
  • Nearline Storage is ideal for data you plan to read or modify on average once per month or less, e.g. if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline Storage is a great choice.
  • Nearline Storage is also appropriate for data backup, long-tail multimedia content, and data archiving.

Coldline Storage

  • Coldline Storage provides a very-low-cost, highly durable storage service for storing infrequently accessed data (cold data)
  • Coldline Storage is a better choice than Standard Storage or Nearline Storage in scenarios where slightly lower availability, a 90-day minimum storage duration, and higher costs for data access are acceptable trade-offs for lowered at-rest storage costs.
  • Coldline Storage is ideal for data you plan to read or modify at most once a quarter.

Archive Storage

  • Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. (coldest data)
  • Data is available within milliseconds, not hours or days.
  • Archive Storage has no availability SLA, though the typical availability is comparable to Nearline Storage and Coldline Storage.
  • Archive Storage has higher costs for data access and operations, as well as a 365-day minimum storage duration.
  • Archive Storage is the best choice for data that you plan to access less than once a year, e.g. cold data storage for archival and disaster recovery.

Google Cloud Storage - Storage Classes

Legacy Storage Classes

  • Google Cloud Storage provided additional storage classes which have been phased out
    • Multi-Regional Storage
      • Equivalent to Standard Storage, except Multi-Regional Storage can only be used for objects stored in multi-regions or dual-regions.
    • Regional Storage
      • Equivalent to Standard Storage, except Regional Storage can only be used for objects stored in regions.
    • Durable Reduced Availability (DRA) Storage:
      • Similar to Standard Storage except:
        • DRA has higher pricing for operations.
        • DRA has lower performance, particularly in terms of availability (DRA has a 99% availability SLA).

Google Cloud – Associate Cloud Engineer Certification learning path

Google Cloud Certified - Associate Cloud Engineer

Google Cloud – Associate Cloud Engineer certification exam is basically for one who works day-in day-out with the Google Cloud services. It targets a Cloud Engineer who deploys applications, monitors operations, and manages enterprise solutions. The exam makes sure it covers the gamut of services and concepts. The exam is not that tough, and the available time of 2 hours is quite sufficient, if you are well prepared.

Quick summary of the exam

  • Wide range of Google Cloud services and what they actually do. It focuses heavily on IAM, Compute, and Storage. There is a little bit of Network but hardly any data services.
  • Hands-on is a must. Covers Cloud SDK commands and Console operations that you would use for day-to-day work. If you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless for some of the questions and commands.
  • Tests are updated for the latest enhancements. There are no references to Google Container Engine; everything is Google Kubernetes Engine, and the exam covers Cloud Functions and Cloud Spanner.
  • Once again, be sure that NO Online Course or Practice tests are going to cover it all. I did LinuxAcademy, which covered maybe 60-70%, but hands-on or practical knowledge is a MUST.

The list of topics is quite long, but the ones that you need to be sure to cover are

  • General Services
    • Billing
      • understand how billing works. Monthly vs Threshold and which has priority
      • how to change a billing account for a project and what roles you need. Hint – Project Owner and Billing Administrator for the billing account
    • Cloud SDK
      • understand gcloud commands esp. when dealing with
        • configurations i.e. gcloud config
          • activate profiles or set project and accounts
        • IAM i.e. gcloud iam
          • check roles
        • deployment manager i.e. gcloud deployment-manager
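
A minimal sketch of the gcloud config commands referenced above, with hypothetical names:

    # Create and activate a named configuration
    gcloud config configurations create dev
    gcloud config configurations activate dev

    # Set the active project and account
    gcloud config set project my-dev-project
    gcloud config set account engineer@example.com
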
  • Network Services
    • Virtual Private Cloud
      • Create a Custom Virtual Private Cloud (VPC), subnets and host applications within them. Hint – a VPC is global and spans regions, while subnets are regional.
      • Understand how Firewall rules works and how they are configured. Hint – Focus on Network Tags.
      • Understand the concept internal and external IPs and difference between static and ephemeral IPs
    • Load Balancer
  • Identity Services
    • Cloud IAM 
      • provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
      • Understand how IAM works and how rules apply esp. the hierarchy from Organization -> Folder -> Project -> Resources
      • Understand the difference between Primitive, Pre-defined and Custom roles and their use cases
      • Need to know and understand the roles for the following services at least
        • Cloud Storage – Admin vs Creator vs Viewer
        • Compute Engine – Admin vs Instance Admin
        • Spanner – Viewer vs Database User
        • BigQuery – User vs JobUser
      • Know how to copy roles to different projects or organization. Hint – gcloud iam roles copy (see the example below)
      • Know how to use service accounts with applications
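
A minimal sketch of the roles copy hint above, with hypothetical role and project names:

    # Copy a predefined role into a project as a custom role for tailoring
    gcloud iam roles copy --source="roles/storage.admin" \
        --destination=customStorageAdmin --dest-project=my-project
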
  • Compute Services
    • Make sure you know all the compute services – Google Compute Engine, Google App Engine, and Google Kubernetes Engine; they are heavily covered in the exam.
    • Google Compute Engine
      • Google Compute Engine is the best IaaS option for compute and provides fine grained control
      • Make sure you know how to create a GCE, connect to it using Cloud shell or ssh keys
      • Make sure you know the difference between backups and images and how to create the same
      • Understand how you can recreate instance in different zones and regions
      • Know difference between managed vs unmanaged instance groups and auto-healing feature
      • Understand Preemptible VMs and their use cases.
      • know how to upgrade an instance without downtime. HINT – live migration.
      • In case of any issues or errors, how to debug the same
    • Google App Engine
      • Google App Engine is mainly the best option for PaaS with platforms supported and features provided.
      • Deploy an application with App Engine and understand how versioning and rolling deployments can be done
      • Understand how to keep auto scaling and traffic splitting and migration.
      • Know App Engine is a regional resource and understand the steps to migrate or deploy application to different region and project.
    • Google Kubernetes Engine
      • Google Container Engine is now officially Google Kubernetes Engine and the questions refer to the same
      • Google Kubernetes Engine, powered by the open source container scheduler Kubernetes, enables you to run containers on Google Cloud Platform.
      • Kubernetes Engine takes care of provisioning and maintaining the underlying virtual machine cluster, scaling your application, and operational logistics such as logging, monitoring, and cluster health management.
      • Be sure to Create a Kubernetes Cluster and configure it to host an application
      • Understand how to make the cluster auto repairable and upgradable. Hint – Node auto-upgrades and auto-repairing feature
      • Very important to understand where to use gcloud commands (to create a cluster) and kubectl commands (manage the cluster components)
      • Very important to understand how to increase cluster size and enable autoscaling for the cluster
      • know how to manage secrets like database passwords
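
A minimal sketch of the cluster sizing, autoscaling, and secrets hints above, with hypothetical names:

    # Create a cluster with autoscaling, auto-upgrade, and auto-repair enabled
    gcloud container clusters create my-cluster --zone=us-central1-a \
        --num-nodes=3 --enable-autoscaling --min-nodes=1 --max-nodes=5 \
        --enable-autoupgrade --enable-autorepair

    # Manually increase the cluster size
    gcloud container clusters resize my-cluster --zone=us-central1-a --num-nodes=5

    # Store a database password as a Kubernetes secret
    kubectl create secret generic db-pass --from-literal=password=changeme
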
  • Storage Services
    • Understand each storage service options and their use cases.
    • Cloud Storage
      • cost-effective object storage for unstructured data.
      • very important to know the different classes and their use cases esp. Regional and Multi-Regional (frequent access), Nearline (monthly access) and Coldline (yearly access)
      • Understand life cycle management. HINT – Changes are in accordance with the object creation date
      • Understand Signed URL to give temporary access and the users do not need to be GCP users
      • Understand permissions – IAM vs ACLs (fine grained control)
    • Relational Databases
      • Know Cloud SQL and Cloud Spanner
      • Cloud SQL
        • is a fully-managed service that provides MySQL and PostgreSQL only.
        • limited to 10TB and is a regional service.
        • know the difference between Failover and Read replicas
        • know how to perform Point-In-Time recovery. Hint – requires binary logging and backups
      • Cloud Spanner
        • is a fully managed, mission-critical relational database service.
        • provides a scalable online transaction processing (OLTP) database with high availability and strong consistency at global scale.
        • globally distributed and can scale and handle more than 10TB.
        • not a direct replacement and would need migration
      • There are no direct options for Microsoft SQL Server or Oracle yet.
    • Data Warehousing
      • BigQuery
        • provides scalable, fully managed enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
        • Remember it is most suitable for historical analysis.
        • know how to perform a preview or dry run. Hint – price is determined by bytes read not bytes returned.
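
A minimal sketch of the dry run hint above; the query and table are hypothetical:

    # Estimate bytes read (and hence cost) without running the query
    bq query --use_legacy_sql=false --dry_run \
        'SELECT name FROM `my-project.my_dataset.my_table`'
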
  • Data Services
    • Although there were only a couple of references to big data services in the exam, it is important to know (DO NOT DEEP DIVE) the Big Data stack (esp. IoT gateway, Pub/Sub, Bigtable vs BigQuery) to understand which service fits the different layers of ingest, store, process, analytics, use
      • Cloud Storage as the medium to store data as data lake
      • Cloud Pub/Sub as the messaging service to capture real time data esp. IoT
      • Cloud Pub/Sub is designed to provide reliable, many-to-many, asynchronous messaging between applications esp. real time IoT data capture
      • Cloud Dataflow to process, transform, transfer data and the key service to integrate store and analytics.
      • Cloud BigQuery for storage and analytics. Remember BigQuery provides the same cost-effective option for storage as Cloud Storage
      • Cloud Dataprep to clean and prepare data. Hint – It can be used for anomaly detection.
      • Cloud Dataproc to handle existing Hadoop/Spark jobs. Hint – Use it to replace existing hadoop infra.
      • Cloud Datalab is an interactive tool for exploration, transformation, analysis and visualization of your data on Google Cloud Platform
  • Monitoring
    • Google Stackdriver
      • provides everything from monitoring, alert, error reporting, metrics, diagnostics, debugging, trace.
      • remember audits are mainly checking Stackdriver
  • DevOps services
    • Deployment Manager 
    • Cloud Launcher (Marketplace)
      • provides a way to launch common software packages e.g. Jenkins or WordPress and stacks on Google Compute Engine with just a few clicks like a prepackaged solution.
      • It can help minimize deployment time and can be used without any knowledge about the product

Google Cloud – Professional Data Engineer Certification learning path

After completing my Google Cloud – Professional Cloud Architect certification exam, I was looking into the Google Cloud – Professional Data Engineer exam and luckily Google Cloud was doing a pilot for their latest updated Professional Data Engineer certification exam. I applied for the free pilot and had a chance to appear for the exam. The pilot exam was 4 hours – 95 questions (as compared to 2 hrs – 50 questions). The results would be out in March 2019, but I can assure the overall exam is quite exhaustive. Once again, the exam covers not only the gamut of services and concepts but also the focus on logical thinking and practical experience.

Quick summary of the exam

  • Wide range of Google Cloud data services and what they actually do. It includes Storage and LOTS of Data services.
  • Nothing much on Compute and Network is covered
  • Questions sometimes test your logical thinking rather than any concept regarding Google Cloud.
  • Hands-on, if you have not worked on GCP before make sure you do lots of labs, else you would be absolutely clueless for some of the questions and commands.
  • Tests are updated for the latest enhancements.
  • Pilot exam does not cover the case studies. But given my Professional Cloud Architect exam experience, make sure you cover the case studies beforehand.
  • Be sure that NO Online Course or Practice tests are going to cover it all. I did Coursera and LinuxAcademy, which is really vast, but hands-on or practical knowledge is a MUST.

The list of topics is quite long, but the ones that you need to be sure to cover are

  • Identity Services
    • Cloud IAM 
      • provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
      • Understand how IAM works and how rules apply esp. the hierarchy from Organization -> Folder -> Project -> Resources
      • Understand IAM Best practices
      • Make sure you know the BigQuery Access roles
  • Storage Services
    • Understand each storage service options and their use cases.
    • Cloud Storage
      • cost-effective object storage for unstructured data.
      • very important to know the different classes and their use cases esp. Regional and Multi-Regional (frequent access), Nearline (monthly access) and Coldline (yearly access)
      • Understand Signed URL to give temporary access and the users do not need to be GCP users
      • Understand permissions – IAM vs ACLs (fine grained control)
    • Relational Databases
      • Know Cloud SQL and Cloud Spanner
      • Cloud SQL
        • is a fully-managed service that provides MySQL and PostgreSQL only.
        • Limited to 10TB and is a regional service.
      • Cloud Spanner
        • is a fully managed, mission-critical relational database service.
        • provides a scalable online transaction processing (OLTP) database with high availability and strong consistency at global scale.
        • globally distributed and can scale and handle more than 10TB.
        • not a direct replacement and would need migration
      • There are no direct options for Microsoft SQL Server or Oracle yet.
    • NoSQL
      • Know Cloud Datastore and BigTable
      • Datastore
        • provides a document database for web and mobile applications. Datastore is not for analytics.
        • Understand Datastore indexes and how to update indexes for Datastore
      • Bigtable
        • provides a column database suitable for both low-latency single-point lookups and precalculated analytics
        • understand Bigtable is not for long term storage as it is quite expensive
        • know the differences with HBase
        • Know how to measure performance and scale
    • Data Warehousing
      • BigQuery
        • provides scalable, fully managed enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
        • Remember it is most suitable for historical analysis.
        • know how to access control tables, columns within tables and query results (hint – Authorized View)
        • Be sure to cover the Best Practices including key strategy, cost optimization, partitioning and clustering
  • Data Services
    • Obviously there is lots of Data and Just Data
    • Know the Big Data stack and understand which service fits the different layers of ingest, store, process, analytics, use
    • Cloud Storage
      • as the medium to store data as data lake
      • understand what class is the best suited and which one provides geo-redundancy.
    • Cloud Pub/Sub
      • as the messaging service to capture real time data esp. IoT
      • is designed to provide reliable, many-to-many, asynchronous messaging between applications esp. real time IoT data capture
      • how it compares to Kafka
    • Cloud Dataflow
      • to process, transform, transfer data and the key service to integrate store and analytics.
      • know how to improve a Dataflow performance
      • Google expects you to know the Apache Beam features as well
    • Cloud BigQuery
      • for storage and analytics. Remember BigQuery provides the same cost-effective option for storage as Cloud Storage
      • understand how BigQuery Streaming works
      • know BigQuery limitations esp. with updates and inserts
    • Cloud Dataprep
      • to clean and prepare data. It can be used for anomaly detection.
      • does not need any programming language knowledge and can be done through graphical interface
      • be sure to know or try hands-on on a dataset
    • Cloud Dataproc
      • to handle existing Hadoop/Spark jobs
      • you need to know how to improve the performance of the Hadoop cluster as well :). Know how to configure the Hadoop cluster to use all the cores (hint – spark executor cores) and handle out of memory errors (hint – executor memory)
      • how to install other components (hint – initialization actions)
    • Cloud Datalab
      • is an interactive tool for exploration, transformation, analysis and visualization of your data on Google Cloud Platform
      • based on Jupyter
    • Cloud Composer
      • fully managed workflow orchestration service based on Apache Airflow
      • pipelines are configured as directed acyclic graphs (DAGs)
      • workflows can live on-premises, in multiple clouds, or fully within GCP.
      • provides ability to author, schedule, and monitor your workflows in a unified manner
  • Machine Learning
    • Google expects the Data Engineer to surely know some of the Data scientists stuff
    • Understand the different algorithms
      • Supervised Learning (labelled data)
        • Classification (for e.g. Spam or Not)
        • Regression (for e.g. Stock or House prices)
      • Unsupervised Learning (Unlabelled data)
        • Clustering (for e.g. categories)
      • Reinforcement Learning
    • Know Cloud ML with Tensorflow
    • Know all the Cloud AI products which include
      • Cloud Vision
      • Cloud Natural Language
      • Cloud Speech-to-Text
      • Cloud Video Intelligence
    • Cloud AutoML products, which can help you get started without much machine learning experience
  • Monitoring
    • Google Stackdriver provides everything from monitoring, alert, error reporting, metrics, diagnostics, debugging, trace.
      • remember audits are mainly checking Stackdriver
  • Security Services
    • Data Loss Prevention API to handle sensitive data esp. redaction of PII data.
    • understand Encryption techniques
  • Other Services
    • Storage Transfer Service allows import of large amounts of online data into Google Cloud Storage, quickly and cost-effectively. Online data is the key here as it supports AWS S3, HTTP/HTTPS and other GCS buckets. If the data is on-premises, you need to use the gsutil command.
    • Transfer Appliance to transfer large amounts of data quickly and cost-effectively into Google Cloud Platform. Check the data size; it will always be compared with the Storage Transfer Service or gsutil commands.
    • BigQuery Data Transfer Service to integrate with third-party services and load data into BigQuery

Google Cloud – Professional Cloud Architect Certification learning path

Google Cloud – Professional Cloud Architect certification exam is one of the toughest exams I have appeared for. It can surely be compared with the AWS Solution Architect/DevOps Professional exams. The gamut of services and concepts it tests your knowledge on is really vast.

Quick summary of the exam

  • Wide range of Google Cloud services and what they actually do. It includes Compute, Storage, Network and even Data services
  • Questions sometimes test your logical thinking rather than any concept regarding Google Cloud.
  • Hands-on, if you have not worked on GCP before make sure you do lots of labs, else you would be absolutely clueless for some of the questions and commands.
  • Tests are updated for the latest enhancements. There are no references to Google Container Engine; everything is Google Kubernetes Engine, and the exam covers Cloud Functions and Cloud Spanner.
  • Make sure you cover the case studies beforehand. I got around 15 questions (almost 5 per case study) and they can really be a savior for you in the exams.
  • Be sure that NO Online Course or Practice tests are going to cover it all. I did LinuxAcademy, which is really vast, but hands-on or practical knowledge is a MUST.

The list of topics is quite long, but the ones that you need to be sure to cover are

  • Identity Services
    • Cloud IAM 
      • provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
      • Understand how IAM works and how rules apply esp. the hierarchy from Organization -> Folder -> Project -> Resources
      • How do you use on-premises authentication provider? Google Cloud Directory Sync (GCDS)
  • Compute Services
    • Make sure you know all the compute services – Google Compute Engine, Google App Engine, and Google Kubernetes Engine. You need to be sure to know the pros and cons and the use cases where you should use them.
    • Google Compute Engine
      • Google Compute Engine is the best IaaS option for compute and provides fine grained control
      • Make sure you know how to create a GCE, connect to it using Cloud shell or ssh keys
      • Make sure you know the difference between backups and images and how to create the same
      • Understand how you can recreate instance in different zones and regions
      • Understand the pricing and discounts model. Hint – Sustained (automatic, up to 30%) vs Committed (1 to 3 yrs) discounts.
      • Understand Preemptible VMs and their use cases.
      • Managed instance groups are covered heavily in the exam, as they provide the key auto-scaling capability. Hint – you need to create an Instance template and associate it with the Instance group (see the sketch after this list).
      • Understand how migration or traffic splitting with Managed instance groups works. Hint – rolling updates & deployments
      • In case of any issues or errors, how to debug the same
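
A minimal sketch of the instance template / managed instance group hints above, with hypothetical names:

    # Create an instance template and a managed instance group from it
    gcloud compute instance-templates create web-template --machine-type=e2-medium
    gcloud compute instance-groups managed create web-mig \
        --template=web-template --size=3 --zone=us-central1-a

    # Roll out a new template version with a rolling update
    gcloud compute instance-groups managed rolling-action start-update web-mig \
        --version=template=web-template-v2 --zone=us-central1-a
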
    • Google App Engine
      • Google App Engine is mainly the best option for PaaS with platforms supported and features provided.
      • Understand the key differences between Standard and Flexible App Engine. Hint – VPC networking does not work in the Standard environment, so VPN connections do not work and you would need to use the Flexible environment.
      • Deploy an application with App Engine and understand how versioning and rolling deployments can be done (see the sketch after this list).
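
A minimal sketch of versioned deployments and traffic splitting on App Engine, with hypothetical version names:

    # Deploy a new version without routing traffic to it
    gcloud app deploy --version=v2 --no-promote

    # Gradually split traffic between the old and new versions
    gcloud app services set-traffic default --splits=v1=0.9,v2=0.1
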
    • Google Kubernetes Engine
      • Google Container Engine is now officially Google Kubernetes Engine and the questions refer to the same
      • Google Kubernetes Engine, powered by the open source container scheduler Kubernetes, enables you to run containers on Google Cloud Platform.
      • Kubernetes Engine takes care of provisioning and maintaining the underlying virtual machine cluster, scaling your application, and operational logistics such as logging, monitoring, and cluster health management.
      • Be sure to Create a Kubernetes Cluster and configure it to host an application
      • Very important to understand where to use gcloud commands (to create a cluster) and kubectl commands (manage the cluster components)
    • Cloud Functions
      • is a lightweight, event-based, asynchronous compute solution that allows you to create small, single-purpose functions that respond to cloud events without the need to manage a server or a runtime environment.
      • Remember that Cloud Functions is serverless and scales from zero to scale and back to zero as the demand changes.
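
A minimal sketch of deploying an HTTP-triggered function, with hypothetical names:

    # Deploy a function that scales to zero when idle
    gcloud functions deploy my-handler --runtime=python310 \
        --trigger-http --allow-unauthenticated
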
  • Network Services
    • Virtual Private Cloud
      • Create a Custom Virtual Private Cloud (VPC), subnets and host applications within them. Hint – a VPC is global and spans regions, while subnets are regional.
      • Understand how Firewall rules works and how they are configured. Hint – Focus on Network Tags.
      • Understand the concept of shared VPC which allows for access to resources using internal IPs
      • Understand VPC Peering and Private Google Access use cases
    • On-premises connectivity
      • Cloud VPN and Interconnect are 2 components which help you connect to on-premises data center.
      • Understand limitations of Cloud VPN esp. 1.5Gbps limit. How it can be improved with multiple tunnels.
      • Understand what are the requirements to setup Cloud VPN. Hint – Cloud Router is required for BGP.
      • Know Interconnect as the reliable high speed, low latency and dedicated bandwidth options.
    • Cloud Load Balancer (GCLB)
      • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.
      • Understand Google Load Balancing options and their use cases esp. which is global and internal and what protocols they support.
  • Storage Services
    • Understand each storage service options and their use cases.
    • Persistent disks
      • attached to Compute Engine instances, provide fast access however are limited in scalability, availability and scope.
      • Remember performance depends on the size of the disk
    • Cloud Storage
      • cost-effective object storage for unstructured data.
      • very important to know the different classes and their use cases esp. Regional and Multi-Regional (frequent access), Nearline (monthly access) and Coldline (yearly access)
      • Understand how encryption works
      • Understand Signed URL to give temporary access and the users do not need to be GCP users
      • Understand permissions – IAM vs ACLs (fine grained control)
    • Relational Databases
      • Know Cloud SQL and Cloud Spanner
      • Cloud SQL
        • is a fully-managed service that provides MySQL and PostgreSQL only.
        • Limited to 10TB and is a regional service.
      • Cloud Spanner
        • is a fully managed, mission-critical relational database service.
        • provides a scalable online transaction processing (OLTP) database with high availability and strong consistency at global scale.
        • globally distributed and can scale and handle more than 10TB.
        • not a direct replacement and would need migration
      • There are no direct options for Microsoft SQL Server or Oracle yet.
    • NoSQL
      • Know Cloud Datastore and BigTable
      • Datastore
        • provides a document database for web and mobile applications. Datastore is not for analytics.
        • Understand Datastore indexes and how to update indexes for Datastore
        • Can be configured Multi-regional and regional
      • Bigtable
        • provides a wide-column database suitable for both low-latency single-point lookups and precalculated analytics
        • understand that Bigtable is not for long-term storage, as it is quite expensive
    • Data Warehousing
      • BigQuery
        • provides scalable, fully managed enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
        • Remember it is most suitable for historical analysis; see the sketch below.
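      A sketch of an ad-hoc query from the bq CLI, run against a Google public dataset:

        # standard SQL ad-hoc query; BigQuery scans only the referenced columns
        bq query --use_legacy_sql=false \
            'SELECT name, SUM(number) AS total
             FROM `bigquery-public-data.usa_names.usa_1910_2013`
             GROUP BY name ORDER BY total DESC LIMIT 10'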
    • Memorystore and Firebase did not feature in any of the questions
  • Data Services
    • Although there is a separate certification for Data Engineer, the Cloud Architect exam does cover data services. Data services are also part of the use cases, so be sure to know about them.
    • Know the Big Data stack and understand which service fits the different layers of ingest, store, process, analytics, use
    • Key services which mainly need to be covered are –
      • Cloud Storage as the medium to store data as data lake
      • Cloud Pub/Sub as the messaging service, designed to provide reliable, many-to-many, asynchronous messaging between applications, esp. for capturing real-time IoT data; see the sketch after this list
      • Cloud Dataflow to process, transform, and transfer data; it is the key service for integrating storage and analytics.
      • BigQuery for storage and analytics. Remember that BigQuery provides a storage option as cost-effective as Cloud Storage
      • Cloud Dataprep to clean and prepare data. Hint – it can be used for anomaly detection.
      • Cloud Dataproc to handle existing Hadoop/Spark jobs. Hint – use it to lift and shift existing Hadoop infrastructure.
      • Cloud Datalab is an interactive tool for exploration, transformation, analysis and visualization of your data on Google Cloud Platform
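    A minimal Pub/Sub round trip as a sketch; the topic, subscription, and message are placeholders:

      gcloud pubsub topics create sensor-events
      gcloud pubsub subscriptions create sensor-sub --topic=sensor-events
      # publisher and consumer are decoupled and communicate asynchronously
      gcloud pubsub topics publish sensor-events --message='{"deviceId":"d-42","temp":21.5}'
      gcloud pubsub subscriptions pull sensor-sub --auto-ack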
  • Monitoring
    • Google Stackdriver
      • provides everything from monitoring, alerting, error reporting, and metrics to diagnostics, debugging, and tracing.
      • remember that audit questions mainly involve checking Stackdriver (audit logs); see the sketch below
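    For example, log entries can be queried from the CLI; the filter below is illustrative:

      # pull the 10 most recent error-level entries from Compute Engine instances
      gcloud logging read 'resource.type="gce_instance" AND severity>=ERROR' --limit=10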
  • DevOps services
    • Deployment Manager is GCP's Infrastructure as Code service; see the sketch after this list
    • Cloud Source Repositories provides source code repository with Git version control to support collaborative development
    • Container Registry is a private Docker image storage system on Google Cloud Platform. Images are immutable.
    • Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure.
    • Cloud Launcher provides a way to launch common software packages and stacks, e.g. Jenkins or WordPress, on Google Compute Engine with just a few clicks, like a prepackaged solution; it can help minimize deployment time
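    A minimal Deployment Manager sketch, assuming the Compute Engine API is enabled in the project; all resource names are placeholders:

      # declare the desired infrastructure as a YAML config
      cat > vm.yaml <<'EOF'
      resources:
      - name: demo-vm
        type: compute.v1.instance
        properties:
          zone: us-central1-a
          machineType: zones/us-central1-a/machineTypes/f1-micro
          disks:
          - boot: true
            autoDelete: true
            initializeParams:
              sourceImage: projects/debian-cloud/global/images/family/debian-9
          networkInterfaces:
          - network: global/networks/default
      EOF
      # Deployment Manager creates (and can update/delete) the declared resources
      gcloud deployment-manager deployments create demo-deployment --config vm.yaml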
  • Security Services
    • Cloud Security Scanner is a web application security scanner that enables developers to easily check for a subset of common web application vulnerabilities in websites built on App Engine and Compute Engine.
    • Data Loss Prevention (DLP) API to handle sensitive data, esp. redaction of PII.
    • Focus on PCI-DSS and how to handle it. Remember, GCP services are PCI-DSS compliant; however, you need to make sure your applications and hosting are in line with PCI-DSS.
    • The same shared-responsibility concept as PCI-DSS applies to GDPR as well
  • Other Services
    • Storage Transfer Service allows import of large amounts of online data into Google Cloud Storage, quickly and cost-effectively. Online data is the key here, as it supports AWS S3, HTTP/HTTPS, and other GCS buckets. If the data is on-premises, you need to use the gsutil command; see the sketch below
    • Transfer Appliance to transfer large amounts of data quickly and cost-effectively into Google Cloud Platform. Check the data size; Transfer Appliance is always compared with Storage Transfer Service or gsutil commands.
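    A sketch of the on-premises path with gsutil; the local path and bucket name are placeholders:

      # parallel (-m) recursive sync from a local directory to a bucket
      gsutil -m rsync -r /data/onprem gs://demo-bucket/onprem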
    • Spinnaker is an open source, multi-cloud, continuous delivery platform that does appear in answer options, so be sure to know about it.
    • Jenkins for Continuous Integration and Continuous Delivery.
  • Case Studies

Resources

The Google Cloud Certified – Professional Cloud Architect exam assesses your ability to:

Section 1: Designing and planning a cloud solution architecture

  • 1.1 Designing a solution infrastructure that meets business requirements. Considerations include:
    • business use cases and product strategy
    • cost optimization
    • supporting the application design
    • integration
    • movement of data
    • tradeoffs
    • build, buy or modify
    • success measurements (e.g., Key Performance Indicators (KPI), Return on Investment (ROI), metrics)
    • compliance and observability
  • 1.2 Designing a solution infrastructure that meets technical requirements. Considerations include:
    • high availability and failover design
    • elasticity of cloud resources
    • scalability to meet growth requirements
  • 1.3 Designing network, storage, and compute resources. Considerations include:
    • integration with on-premises/multi-cloud environments
    • cloud-native networking (VPC, peering, firewalls, container networking)
    • identification of data processing pipeline
    • matching data characteristics to storage systems
    • data flow diagrams
    • storage system structure (e.g., Object, File, RDBMS, NoSQL, NewSQL)
    • mapping compute needs to platform products
  • 1.4 Creating a migration plan (i.e., documents and architectural diagrams). Considerations include:
    • integrating solution with existing systems
    • migrating systems and data to support the solution
    • licensing mapping
    • network and management planning
    • testing and proof-of-concept
  • 1.5 Envisioning future solution improvements. Considerations include:
    • cloud and technology improvements
    • business needs evolution
    • evangelism and advocacy

Section 2: Managing and provisioning solution Infrastructure

  • 2.1 Configuring network topologies. Considerations include:
    • extending to on-premises (hybrid networking) using VPN or Interconnect
    • extending to a multi-cloud environment which may include GCP to GCP communication
    • security
    • data protection
  • 2.2 Configuring individual storage systems. Considerations include:
    • data storage allocation
    • data processing/compute provisioning
    • security and access management
    • network configuration for data transfer and latency
    • data retention and data lifecycle management
    • data growth management
  • 2.3 Configuring compute systems. Considerations include:
    • compute system provisioning
    • compute volatility configuration (preemptible vs. standard)
    • network configuration for compute nodes
    • infrastructure provisioning technology configuration (e.g. Chef/Puppet/Ansible/Terraform)
    • container orchestration (e.g. Kubernetes)

Section 3: Designing for security and compliance

  • 3.1 Designing for security. Considerations include:
    • Identity and Access Management (IAM)
    • Resource hierarchy (organizations, folders, projects)
    • data security (key management, encryption)
    • penetration testing
    • Separation of Duties (SoD)
    • security controls
    • Managing customer-supplied encryption keys with Cloud KMS
  • 3.2 Designing for legal compliance. Considerations include:
    • legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), etc.)
    • audits (including logs)
    • certification (e.g., Information Technology Infrastructure Library (ITIL) framework)

Section 4: Analyzing and optimizing technical and business processes

  • 4.1 Analyzing and defining technical processes. Considerations include:
    • Software Development Lifecycle Plan (SDLC)
    • continuous integration / continuous deployment
    • troubleshooting / post mortem analysis culture
    • testing and validation
    • IT enterprise process (e.g. ITIL)
    • business continuity and disaster recovery
  • 4.2 Analyzing and defining business processes. Considerations include:
    • stakeholder management (e.g. Influencing and facilitation)
    • change management
    • team assessment / skills readiness
    • decision making process
    • customer success management
    • cost optimization / resource optimization (Capex / Opex)
  • 4.3 Developing procedures to test resilience of solution in production (e.g., DiRT and Simian Army)

Section 5: Managing implementation

  • 5.1 Advising development/operation team(s) to ensure successful deployment of the solution. Considerations include:
    • application development
    • API best practices
    • testing frameworks (load/unit/integration)
    • data and system migration tooling
  • 5.2 Interacting with Google Cloud using GCP SDK (gcloud, gsutil and bq). Considerations include:
    • local installation
    • Google Cloud Shell

Section 6: Ensuring solution and operations reliability

  • 6.1 Monitoring/Logging/Alerting solution
  • 6.2 Deployment and release management
  • 6.3 Supporting operational troubleshooting
  • 6.4 Evaluating quality control measures

Case Studies

  • Mountkirk Games
  • Dress4Win
  • TerramEarth