Google Cloud Data Transfer Services

Google Cloud Data Transfer services provide a range of network options and transfer tools to help move data from on-premises environments to the Google Cloud network.

Network Services

Cloud VPN

  • Provides network connectivity between an on-premises network and Google Cloud, or from Google Cloud to another cloud provider.
  • Cloud VPN still routes the traffic through the Internet.
  • Cloud VPN is quick to set up (compared to Cloud Interconnect).
  • Each Cloud VPN tunnel can support up to 3 Gbps total for ingress and egress, but the available bandwidth depends on the connection.
  • Choose Cloud VPN when traffic to Google Cloud needs to be encrypted, when a lower-throughput solution is acceptable, or when experimenting with migrating workloads to Google Cloud.

Cloud Interconnect

  • Cloud Interconnect offers a direct connection to Google Cloud through Google or one of the Cloud Interconnect service providers.
  • Cloud Interconnect keeps data off the public internet and can provide more consistent throughput for large data transfers.
  • For an enterprise-grade connection to Google Cloud with higher throughput requirements, choose Dedicated Interconnect (10 Gbps to 100 Gbps) or Partner Interconnect (50 Mbps to 50 Gbps).
  • Cloud Interconnect provides access to all Google Cloud products and services from your on-premises network except Google Workspace.
  • Cloud Interconnect also allows access to supported APIs and services by using Private Google Access from on-premises hosts.

Direct Peering

  • Direct Peering provides access to the Google network with fewer network hops than a public internet connection.
  • With Direct Peering, traffic is exchanged directly between the customer network and Google’s edge Points of Presence (PoPs), so the data does not traverse the public internet.

Google Cloud Networking Services Decision Tree

Google Cloud Hybrid Connectivity

Transfer Services

gsutil

  • gsutil is the standard tool for small- to medium-sized transfers (less than 1 TB) over a typical enterprise-scale network, from a private data center to Google Cloud.
  • gsutil provides all the basic features needed to manage Cloud Storage buckets, including copying data to and from the local file system and Cloud Storage.
  • gsutil can also move, rename and remove objects and perform rsync-like incremental syncs to a Cloud Storage bucket.
  • gsutil is especially useful in the following scenarios:
    • performing as-needed transfers, or transfers initiated by users during command-line sessions.
    • transferring only a few files, very large files, or both.
    • consuming the output of a program (streaming output to Cloud Storage).
    • watching a directory with a moderate number of files and syncing any updates with very low latency.
  • gsutil provides the following features (see the command sketch after this list):
    • Parallel multi-threaded transfers with gsutil -m, increasing transfer speeds.
    • Parallel composite uploads, which break a single large file into smaller chunks to increase transfer speed. Chunks are transferred and validated in parallel. Once all chunks arrive at Google, they are combined (composited) into a single object.
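
A minimal command sketch of these gsutil features; the bucket name and local paths are placeholders:

  # parallel, multi-threaded copy of a local directory to Cloud Storage
  gsutil -m cp -r ./data gs://my-bucket/data
  # rsync-style incremental sync of a local directory to a bucket
  gsutil -m rsync -r ./data gs://my-bucket/data
  # enable parallel composite uploads for files larger than 150 MB
  gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" cp large-file.iso gs://my-bucket/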

Storage Transfer Service

  • Storage Transfer Service is a fully managed, highly scalable service to automate transfers from other public clouds into Cloud Storage (a command sketch follows this list).
  • Storage Transfer Service for cloud-to-cloud transfers
    • supports transfers into Cloud Storage from Amazon S3 and HTTP/HTTPS sources.
    • supports daily copies of any modified objects.
    • doesn’t currently support data transfers to Amazon S3.
  • Storage Transfer Service also supports on-premises data transfers from network file system (NFS) storage to Cloud Storage.
  • Storage Transfer Service for on-premises data
    • is designed for large-scale transfers (up to petabytes of data, billions of files).
    • supports full copies or incremental copies.
    • can be set up by installing on-premises software (known as agents) onto computers in the data center.
    • has a simple, managed graphical user interface; even non-technically savvy users (after setup) can use it to move data.
    • provides robust error reporting and a record of all files and objects that are moved.
    • supports executing recurring transfers on a schedule.
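
A hedged sketch of creating a transfer job with the gcloud CLI, assuming the gcloud transfer commands are available in your SDK version; bucket names and the credentials file are placeholders:

  # one-off transfer from an S3 bucket into Cloud Storage
  gcloud transfer jobs create s3://source-bucket gs://destination-bucket \
    --source-creds-file=aws-creds.json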

Transfer Appliance

  • Transfer Appliance is an excellent option for performing large-scale transfers, especially when a fast network connection is unavailable, when it is too costly to acquire more bandwidth, or when it is a one-time transfer.
  • Expected turnaround time for a network appliance to be shipped, loaded with the data, shipped back, and rehydrated on Google Cloud is 50 days.
  • Consider Transfer Appliance if the online transfer timeframe is calculated to be substantially longer than this (see the worked example after this list).
  • Transfer Appliance requires the ability to receive and ship back the Google-owned hardware.
  • Transfer Appliance is available only in certain countries.
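
As a rough worked example (using the numbers from the practice question below): transferring 1 PB over a 100 Mbps link takes about 8 × 10^15 bits ÷ 10^8 bits/s ≈ 8 × 10^7 seconds, i.e. roughly 925 days, far beyond the ~50-day Transfer Appliance turnaround, so the appliance is the better choice.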

BigQuery Data Transfer Service

  • BigQuery Data Transfer Service automates data movement into BigQuery on a scheduled, managed basis (a command sketch follows the list of supported sources below).
  • After a data transfer is configured, the BigQuery Data Transfer Service automatically loads data into BigQuery on a regular basis.
  • BigQuery Data Transfer Service can also initiate data backfills to recover from any outages or gaps.
  • BigQuery Data Transfer Service can only load data into BigQuery and cannot be used to transfer data out of BigQuery.
  • BigQuery Data Transfer Service supports loading data from the following data sources:
    • Google Software as a Service (SaaS) apps
    • Campaign Manager
    • Cloud Storage
    • Google Ad Manager
    • Google Ads
    • Google Merchant Center (beta)
    • Google Play
    • Search Ads 360 (beta)
    • YouTube Channel reports
    • YouTube Content Owner reports
    • External cloud storage providers
      • Amazon S3
    • Data warehouses
      • Teradata
      • Amazon Redshift
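
A hedged sketch of creating a scheduled transfer with the bq CLI; the dataset, display name, bucket, and params JSON are placeholders and vary by data source:

  bq mk --transfer_config \
    --data_source=google_cloud_storage \
    --target_dataset=analytics_ds \
    --display_name="Daily GCS load" \
    --params='{"data_path_template":"gs://my-bucket/*.csv","destination_table_name_template":"events","file_format":"CSV"}'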

Transfer Data vs Speed Comparison

Data Migration Speeds

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day, and both the answers and questions might become outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A company wants to connect cloud applications to an Oracle database in its data center. Requirements are a maximum of 9 Gbps of data and a Service Level Agreement (SLA) of 99%. Which option best suits the requirements?
    1. Implement a high-throughput Cloud VPN connection
    2. Cloud Router with VPN
    3. Dedicated Interconnect
    4. Partner Interconnect
  2. An organization wishes to automate data movement from Software as a Service (SaaS) applications such as Google Ads and Google Ad Manager on a scheduled, managed basis. This data is further needed for analytics and for generating reports. How can the process be automated?
    1. Use Storage Transfer Service to move the data to Cloud Storage
    2. Use Storage Transfer Service to move the data to BigQuery
    3. Use BigQuery Data Transfer Service to move the data to BigQuery
    4. Use Transfer Appliance to move the data to Cloud Storage
  3. Your company’s migration team needs to transfer 1PB of data to Google Cloud. The network speed between the on-premises data center and Google Cloud is 100Mbps.
    The migration activity has a timeframe of 6 months. What is the efficient way to transfer the data?

    1. Use BigQuery Data Transfer Service to transfer the data to Cloud Storage
    2. Expose the data as a public URL and Storage Transfer Service to transfer it
    3. Use Transfer appliance to transfer the data to Cloud Storage
    4. Use gsutil command to transfer the data to Cloud Storage
  4. Your company uses Google Analytics for tracking. You need to export the session and hit data from a Google Analytics 360 reporting view on a scheduled basis into BigQuery for analysis. How can the data be exported?
    1. Configure a scheduler in Google Analytics to convert the Google Analytics data to JSON format, then import directly into BigQuery using bq command line.
    2. Use gsutil to export the Google Analytics data to Cloud Storage, then import into BigQuery and schedule it using Cron.
    3. Import data to BigQuery directly from Google Analytics using Cron
    4. Use BigQuery Data Transfer Service to import the data from Google Analytics


Google Cloud VPN

  • Cloud VPN securely connects your peer network to the Virtual Private Cloud (VPC) network through an IPsec VPN connection.
  • Traffic traveling between the two networks is encrypted by one VPN gateway and then decrypted by the other VPN gateway.
  • Cloud VPN protects the data as it travels over the internet.
  • Two instances of Cloud VPN can also be connected to each other.

Cloud VPN Specifications

  • only supports site-to-site IPsec VPN connectivity
  • does not support client-to-gateway scenarios i.e. Cloud VPN doesn’t support use cases where client computers need to “dial in” to a VPN by using client VPN software.
  • only supports IPsec. Other VPN technologies (such as SSL VPN) are not supported.
  • can be used with Private Google Access for on-premises hosts
  • Each Cloud VPN gateway must be connected to another Cloud VPN gateway or a peer VPN gateway.
  • Peer VPN gateway must have a static external (internet routable) IPv4 address, needed to configure Cloud VPN.
  • requires that the peer VPN gateway be configured to support prefragmentation. Packets must be fragmented before being encapsulated.
  • Each Cloud VPN tunnel can support up to 3 Gbps total for ingress and egress
  • Cloud VPN only supports a pre-shared key for authentication. Cloud VPN supports IKEv1 and IKEv2 by using an IKE pre-shared key (shared secret) and IKE ciphers.

Cloud VPN Components

Google Cloud VPN Components

  • Cloud VPN gateway
    • A virtual VPN gateway running in Google Cloud and managed by Google, using the configuration specified in your project, and used only by you.
    • Each Cloud VPN gateway is a regional resource that uses one or more regional external IP addresses.
    • A Cloud VPN gateway can connect to a peer VPN gateway.
  • Peer VPN gateway
    • A gateway that is connected to a Cloud VPN gateway.
    • A peer VPN gateway can be one of the following:
      • Another Cloud VPN gateway
      • A VPN gateway hosted by another cloud provider such as AWS or Microsoft Azure
      • An on-premises VPN device or VPN service
  • External VPN gateway
    • A gateway resource configured for HA VPN that provides information to Google Cloud about the peer VPN gateway or gateways.
  • Remote peer IP address
    • For an HA VPN gateway interface that connects to an external VPN gateway, the remote peer IP address is the IP address of the interface on the external VPN gateway that is used for the tunnel.
  • VPN tunnel
    • A VPN tunnel connects two VPN gateways and serves as a virtual medium through which encrypted traffic is passed.
  • Internet Key Exchange (IKE)
    • IKE is the protocol used for authentication and to negotiate a session key for encrypting traffic.

Classic VPN

  • Classic VPN gateways have a single interface, a single external IP address, and support tunnels that use dynamic (BGP) or static routing (policy-based or route-based).
  • Classic VPN provides an SLA of 99.9% service availability.
  • Around October 2021, certain Classic VPN functionality was slated for deprecation, with HA VPN recommended instead.

Cloud VPN HA

  • Cloud VPN HA (HA VPN) provides a highly available and secure connection between the on-premises network and the VPC network through an IPsec VPN connection in a single region (a gcloud sketch follows this list).
  • HA VPN provides an SLA of 99.99% service availability, when configured with two interfaces and two external IP addresses.
  • Cloud VPN supports the creation of multiple HA VPN gateways and each of the HA VPN gateway interfaces supports multiple tunnels
  • Peer VPN gateway device must support dynamic (BGP) routing.
  • To achieve high availability when both VPN gateways are located in VPC networks, two HA VPN gateways must be used, and both of them must be located in the same region.
  • Even though both gateways must be located in the same region, if the VPC network uses global dynamic routing mode, the routes to the subnets that the gateways share with each other can be located in any region
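
A minimal gcloud sketch of the HA VPN setup described above; the network, region, ASN, peer IP, and shared secret are placeholders, and a second tunnel on interface 1 is required for the 99.99% SLA:

  # HA VPN gateway and Cloud Router in one region
  gcloud compute vpn-gateways create ha-vpn-gw-1 --network=my-vpc --region=us-central1
  gcloud compute routers create my-router --network=my-vpc --region=us-central1 --asn=65001
  # describe the peer (external) VPN gateway to Google Cloud
  gcloud compute external-vpn-gateways create peer-gw --interfaces 0=203.0.113.10
  # tunnel on interface 0, authenticated with an IKE pre-shared key
  gcloud compute vpn-tunnels create tunnel-0 --region=us-central1 \
    --vpn-gateway=ha-vpn-gw-1 --interface=0 \
    --peer-external-gateway=peer-gw --peer-external-gateway-interface=0 \
    --router=my-router --ike-version=2 --shared-secret=SECRET
  # BGP interfaces and peers are then added to the Cloud Router (omitted here)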

Google Cloud VPN HA

Active/Active vs Active/Passive Routing Options

  • If a Cloud VPN tunnel goes down, it restarts automatically.
  • If an entire virtual VPN device fails, Cloud VPN automatically instantiates a new one with the same configuration.
  • The new gateway and tunnel connect automatically.
  • Active/Active
    • Effective aggregate throughput is the combined throughput of both tunnels.
    • Peer gateway advertises the peer network’s routes with identical MED values for each tunnel.
    • Egress traffic sent to the peer network uses equal-cost multipath (ECMP) routing.
    • If one tunnel becomes unavailable, Cloud Router withdraws the learned custom dynamic routes whose next hops are the unavailable tunnel, which can take ~40 seconds
  • Active/Passive
    • Effective aggregate throughput is the throughput of the single active tunnel, since only one tunnel carries traffic at a time.
    • Peer gateway advertises the peer network’s routes with different MED values for each tunnel.
    • Egress traffic sent to the peer network uses the route with the highest priority, as long as the associated tunnel is available.
    • Peer gateway can only use the tunnel with the highest priority to send traffic to Google Cloud.
    • If one tunnel becomes unavailable, Cloud Router withdraws the learned custom dynamic routes whose next hops are the unavailable tunnel, which can take ~40 seconds
  • Google Cloud recommends
    • an Active/Passive configuration when using a single HA VPN gateway, so that the bandwidth observed during normal tunnel operation matches the bandwidth observed during failover.
    • an Active/Active configuration when using multiple HA VPN gateways; the bandwidth observed during normal operation is twice the guaranteed (failover) bandwidth capacity.

Classic VPN vs HA VPN

Google Cloud Classic VPN vs HA VPN

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day, and both the answers and questions might become outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your company’s infrastructure is on-premises, but all machines are running at maximum capacity. You want to burst to Google
    Cloud. The workloads on Google Cloud must be able to directly communicate to the workloads on-premises using a private IP
    range. What should you do?

    1. In Google Cloud, configure the VPC as a host for Shared VPC.
    2. In Google Cloud, configure the VPC for VPC Network Peering.
    3. Create bastion hosts both in your on-premises environment and on Google Cloud. Configure both as proxy servers using
      their public IP addresses.
    4. Set up Cloud VPN between the infrastructure on-premises and Google Cloud.


Google Cloud Services Cheat Sheet

Google Certification Exam Cheat Sheet

Google Certification Exams cover a lot of topics and a wide range of services, with minute details of features, patterns, anti-patterns, and their integration with other services. This blog post is a quick summary of all the services and key points for a quick glance before you appear for the exam.

Google Services

GCP Marketplace (Cloud Launcher)

  • GCP Marketplace offers ready-to-go development stacks, solutions, and services to accelerate development, so you spend less time installing and more time developing.
    • Deploy production-grade solutions in a few clicks
    • Single bill for all your GCP and 3rd party services
    • Manage solutions using Deployment Manager
    • Notifications when a security update is available
    • Direct access to partner support


Google Cloud Networking Services Cheat Sheet

Virtual Private Cloud

  • Virtual Private Cloud (VPC) provides networking functionality for the cloud-based resources and services that is global, scalable, and flexible.
  • VPC networks are global resources, including the associated routes and firewall rules, and are not associated with any particular region or zone.
  • Subnets are regional resources and each subnet defines a range of IP addresses
  • Network firewall rules
    • control traffic to and from instances.
    • Rules are implemented on the VMs themselves, so traffic can only be controlled and logged as it leaves or arrives at a VM.
    • Firewall rules are defined to allow or deny traffic and are evaluated in order of a defined priority (see the example after this list).
    • The highest-priority (lowest integer) rule applicable to a target for a given type of traffic takes precedence.
  • Resources within a VPC network can communicate with one another by using internal IPv4 addresses, subject to applicable network firewall rules.
  • Private access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
  • Shared VPC keeps a VPC network in a common host project and shares it with service projects. Authorized IAM members from other projects in the same organization can create resources that use subnets of the Shared VPC network.
  • VPC Network Peering allows VPC networks to be connected with other VPC networks in different projects or organizations.
  • VPC networks can be securely connected in hybrid environments by using Cloud VPN or Cloud Interconnect.
  • Primary and secondary IP ranges cannot overlap with the on-premises CIDR ranges.
  • VPC networks only support IPv4 unicast traffic. They do not support broadcast, multicast, or IPv6 traffic within the network; VMs in the VPC network can only send to IPv4 destinations and only receive traffic from IPv4 sources.
  • VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
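
A minimal example of the firewall priority behaviour described above; the network name and source range are placeholders:

  # allow SSH from a trusted range at high priority (lower number = higher priority)
  gcloud compute firewall-rules create allow-ssh-trusted --network my-vpc \
    --direction INGRESS --action ALLOW --rules tcp:22 \
    --source-ranges 203.0.113.0/24 --priority 100
  # deny SSH from everywhere else at a lower priority
  gcloud compute firewall-rules create deny-ssh-all --network my-vpc \
    --direction INGRESS --action DENY --rules tcp:22 \
    --source-ranges 0.0.0.0/0 --priority 1000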

Cloud Load Balancing

  • Cloud Load Balancing is a fully distributed, software-defined managed load balancing service
  • distributes user traffic across multiple instances of an application, reducing the risk of performance issues by spreading the load
  • provides health checking mechanisms that determine if backends, such as instance groups and zonal network endpoint groups (NEGs), are healthy and properly respond to traffic.
  • supports IPv6 clients with HTTP(S) Load Balancing, SSL Proxy Load Balancing, and TCP Proxy Load Balancing.
  • supports multiple Cloud Load Balancing types
    • Internal HTTP(S) Load Balancing
      • is a proxy-based, regional Layer 7 load balancer that enables running and scaling services behind an internal IP address.
      • supports a regional backend service, which distributes HTTP and HTTPS requests to healthy backends (either instance groups containing CE VMs or NEGs containing GKE containers).
      • supports path based routing
      • preserves the Host header of the original client request and also appends two IP addresses (client and load balancer) to the X-Forwarded-For header
      • supports a regional health check that periodically monitors the readiness of the backends.
      • has native support for the WebSocket protocol when using HTTP or HTTPS as the protocol to the backend
    • External HTTP(S) Load Balancing
      • is a global, proxy-based Layer 7 load balancer that enables running and scaling the services worldwide behind a single external IP address
      • distributes HTTP and HTTPS traffic to backends hosted on Compute Engine and GKE
      • offers global (cross-regional) and regional load balancing
      • supports content-based load balancing using URL maps
      • preserves the Host header of the original client request and also appends two IP addresses (Client and LB) to the X-Forwarded-For header
      • supports connection draining on backend services
      • has native support for the WebSocket protocol when using HTTP or HTTPS as the protocol to the backend
      • does not support client certificate-based authentication, also known as mutual TLS authentication.
    • Internal TCP/UDP Load Balancing
      • is a managed, internal, pass-through, regional Layer 4 load balancer that enables running and scaling services behind an internal IP address
      • distributes traffic among VM instances in the same region in a Virtual Private Cloud (VPC) network by using an internal IP address.
      • provides high-performance, pass-through Layer 4 load balancer for TCP or UDP traffic.
      • routes original connections directly from clients to the healthy backends, without any interruption.
      • does not terminate SSL traffic and SSL traffic can be terminated by the backends instead of by the load balancer
      • provides access through VPC Network Peering, Cloud VPN or Cloud Interconnect
      • supports health check that periodically monitors the readiness of the backends.
    • External TCP/UDP Network Load Balancing
      • is a managed, external, pass-through, regional Layer 4 load balancer that distributes TCP or UDP traffic originating from the internet among VM instances in the same region
      • Load-balanced packets are received by backend VMs with their source IP unchanged.
      • Load-balanced connections are terminated by the backend VMs. Responses from the backend VMs go directly to the clients, not back through the load balancer.
      • scope of a network load balancer is regional, not global. A network load balancer cannot span multiple regions. Within a single region, the load balancer services all zones.
      • supports connection tracking table and a configurable consistent hashing algorithm to determine how traffic is distributed to backend VMs.
      • does not support Network endpoint groups (NEGs) as backends
    • External SSL Proxy Load Balancing
      • is a reverse proxy load balancer that distributes SSL traffic coming from the internet to VM instances in the VPC network.
      • with SSL traffic, user SSL (TLS) connections are terminated at the load balancing layer, and then proxied to the closest available backend instances by using either SSL (recommended) or TCP.
      • supports global load balancing with the Premium Tier
      • supports regional load balancing with the Standard Tier
      • is intended for non-HTTP(S) traffic. For HTTP(S) traffic, GCP recommends using HTTP(S) Load Balancing.
      • supports proxy protocol header to preserve the original source IP addresses of incoming connections to the load balancer
      • does not support client certificate-based authentication, also known as mutual TLS authentication.
    • External TCP Proxy Load Balancing
      • is a reverse proxy load balancer that distributes TCP traffic coming from the internet to VM instances in the VPC network
      • terminates traffic coming over a TCP connection at the load balancing layer, and then forwards to the closest available backend using TCP or SSL
      • use a single IP address for all users worldwide and automatically routes traffic to the backends that are closest to the user
      • supports global load balancing with the Premium Tier
      • supports regional load balancing with the Standard Tier
      • supports proxy protocol header to preserve the original source IP addresses of incoming connections to the load balancer

Cloud CDN

  • caches website and application content closer to the user
  • uses Google’s global edge network to serve content closer to users, which accelerates the websites and applications.
  • works with external HTTP(S) Load Balancing to deliver content to the users
  • Cloud CDN content can be sourced from various types of backends
    • Instance groups
    • Zonal network endpoint groups (NEGs)
    • Serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
    • Internet NEGs, for endpoints that are outside of Google Cloud (also known as custom origins)
    • Buckets in Cloud Storage
  • Cloud CDN with Google Cloud Armor enforces security policies only for requests for dynamic content, cache misses, or other requests that are destined for the origin server. Cache hits are served even if the downstream Google Cloud Armor security policy would prevent that request from reaching the origin server.
  • recommends
    • using versioning instead of cache invalidation
    • using custom cache keys to improve the cache hit ratio
    • caching static content

Cloud VPN

  • securely connects the peer network to the VPC network or two VPCs in GCP through an IPsec VPN connection.
  • encrypts the data as it travels over the internet.
  • only supports site-to-site IPsec VPN connectivity and not client-to-gateway scenarios
  • allows users to access private RFC1918 addresses on resources in the VPC from on-prem computers also using private RFC1918 addresses.
  • can be used with Private Google Access for on-premises hosts
  • Cloud VPN HA
    • provides a highly available and secure connection between the on-premises network and the VPC network through an IPsec VPN connection in a single region
    • provides an SLA of 99.99% service availability, when configured with two interfaces and two external IP addresses.
  • supports up to 3Gbps per tunnel with a maximum of 8 tunnels
  • supports static as well as dynamic routing using Cloud Router
  • supports IKEv1 or IKEv2 using a shared secret

Cloud Interconnect

  • Cloud Interconnect provides two options for extending the on-premises network to the VPC networks in Google Cloud.
  • Dedicated Interconnect (Dedicated connection)
    • provides a direct physical connection between the on-premises network and Google’s network
    • requires your network to physically meet Google’s network in a colocation facility with your own routing equipment
    • supports only dynamic routing
    • supports bandwidth from 10 Gbps up to 200 Gbps.
  • Partner Interconnect (Use a service provider)
    • provides connectivity between the on-premises and VPC networks through a supported service provider.
    • supports bandwidth from 50 Mbps up to 10 Gbps.
    • provides Layer 2 and Layer 3 connectivity
      • For Layer 2 connections, you must configure and establish a BGP session between the Cloud Routers and on-premises routers for each created VLAN attachment
      • For Layer 3 connections, the service provider establishes a BGP session between the Cloud Routers and their edge routers for each VLAN attachment.
  • A single Interconnect connection does not offer redundancy or high availability, so it is recommended to
    • use 2 connections in the same metropolitan area (city) as the existing one, but in a different edge availability domain (metro availability zone), for 99.9% availability.
    • use 4 connections, with 2 connections in each of two different metropolitan areas (cities) and each connection in a different edge availability domain (metro availability zone), for 99.99% availability.
    • deploy the required Cloud Routers, one in each Google Cloud region used.
  • Cloud Interconnect does not encrypt the connection between your network and Google’s network. For additional security, use application-level encryption or your own VPN.
  • Currently, Cloud VPN can’t be used with Dedicated Interconnect.

Cloud Router

  • is a fully distributed, managed service that provides dynamic routing and scales with the network traffic.
  • works with both legacy networks and VPC networks.
  • isn’t supported for Direct Peering or Carrier Peering connections.
  • helps dynamically exchange routes between the Google Cloud networks and the on-premises network.
  • peers with the on-premises VPN gateway or router to provide dynamic routing and exchanges topology information through BGP.
  • Google Cloud recommends creating two Cloud Routers in each region for a Cloud Interconnect for 99.99% availability.
  • supports the following dynamic routing modes
    • Regional routing mode – provides visibility to resources only in the defined region.
    • Global routing mode – provides visibility to resources in all regions.

Cloud DNS

  • is a high-performance, resilient, reliable, low-latency, global DNS service that publishes the domain names to the global DNS in a cost-effective way.
  • With Shared VPC, Cloud DNS managed private zone, Cloud DNS peering zone, or Cloud DNS forwarding zone must be created in the host project
  • provides Private Zones, which support DNS services for a GCP project; VPCs in the same project can use the same name servers (a command sketch follows this list)
  • supports DNS Forwarding for Private Zones, which overrides normal DNS resolution for the specified zones. Queries for the specified zones are forwarded to the listed forwarding targets.
  • supports DNS Peering, which allows sending requests for records that come from one zone’s namespace to another VPC network within GCP
  • supports DNS Outbound Policy, which forwards all DNS requests for a VPC network to the specified server targets. It disables internal DNS for the selected networks.
  • Cloud DNS VPC Name Resolution Order
    • DNS Outbound Server Policy
    • DNS Forwarding Zone
    • DNS Peering
    • Compute Engine internal DNS
    • Public Zones
  • supports DNSSEC, a feature of DNS, that authenticates responses to domain name lookups and protects the domains from spoofing and cache poisoning attacks
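
A minimal sketch of creating a private zone visible to one VPC network; the zone name, DNS name, and network are placeholders:

  gcloud dns managed-zones create corp-internal \
    --dns-name="corp.internal." --description="private zone for internal names" \
    --visibility=private --networks=my-vpc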

Google Cloud Compute Services Cheat Sheet

Google Cloud Compute Services

Google Cloud - Compute Services Options

Compute Engine

  • is a virtual machine (VM) hosted on Google’s infrastructure.
  • can run the public images for Google provided Linux and Windows Server as well as custom images created or imported from existing systems
  • availability policy determines how a VM instance behaves when there is a maintenance event
    • VM instance’s maintenance behavior onHostMaintenance, which determines whether the instance is live migrated MIGRATE (default) or stopped TERMINATE
    • Instance’s restart behavior automaticRestart  which determines whether the instance automatically restarts (default) if it crashes or gets stopped
  • Live migration helps keep the VM instances running even when a host system event, such as a software or hardware update, occurs
  • Preemptible VM is an instance that can be created and run at a much lower price than normal instances, however can be stopped at any time
  • Shielded VM offers verifiable integrity of the Compute Engine VM instances, to confirm the instances haven’t been compromised by boot- or kernel-level malware or rootkits.
  • Instance template is a resource used to create VM instances and managed instance groups (MIGs) with identical configuration
  • Instance group is a collection of virtual machine (VM) instances that can be managed as a single entity.
    • Managed instance groups (MIGs)
      • allows app creation with multiple identical VMs.
      • workloads can be made scalable and highly available by taking advantage of automated MIG services, including autoscaling, autohealing, regional (multiple-zone) deployment, and automatic updating (a command sketch follows this list)
      • supports rolling update feature
      • works with load balancing services to distribute traffic across all of the instances in the group.
    • Unmanaged instance groups
      • allows load balancing across a fleet of self-managed VMs, which may not be identical
  • Instance templates are global resources, while managed instance groups are zonal or regional.
  • Machine image stores all the configuration, data, metadata and permissions from one or more disks required to create a VM instance
  • Sole-tenancy provides dedicated hosts for the project’s VMs only and adds a layer of hardware isolation
  • deletionProtection prevents accidental VM deletion esp. for VMs running critical workloads and need to be protected
  • provides sustained use discounts, committed use discounts, free tier, etc. in pricing
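
A minimal sketch of the instance template + managed instance group flow described above; the names, machine type, image, and zone are placeholders:

  gcloud compute instance-templates create web-template \
    --machine-type=e2-small --image-family=debian-12 --image-project=debian-cloud
  gcloud compute instance-groups managed create web-mig \
    --template=web-template --size=3 --zone=us-central1-a
  # enable autoscaling on the MIG
  gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a --min-num-replicas=3 --max-num-replicas=10 \
    --target-cpu-utilization=0.6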

App Engine

  • App Engine helps build highly scalable applications on a fully managed serverless platform
  • Each Cloud project can contain only a single App Engine application
  • App Engine is regional, which means the infrastructure that runs the apps is located in a specific region, and Google manages it so that it is available redundantly across all of the zones within that region
  • App Engine application location or region cannot be changed once created
  • App Engine allows traffic management to application versions by migrating or splitting traffic (a command sketch follows this list).
    • Traffic Splitting (Canary) – distributes a percentage of traffic to versions of the application.
    • Traffic Migration – smoothly switches request routing
  • Support Standard and Flexible environments
    • Standard environment
      • Application instances that run in a sandbox, using the runtime environment of a supported language only.
      • Sandbox restricts what the application can do
        • only allows the app to use a limited set of binary libraries
        • app cannot write to disk
        • limits the CPU and memory options available to the application
      • Sandbox does not support
        • SSH debugging
        • Background processes
        • Background threads (limited capability)
        • Using Cloud VPN
    • Flexible environment
      • Application instances run within Docker containers on Compute Engine virtual machines (VM).
      • As the Flexible environment supports Docker, it can support custom runtimes or source code written in other programming languages.
      • Allows selection of any Compute Engine machine type for instances so that the application has access to more memory and CPU.
  • min_idle_instances indicates the number of additional instances to be kept running and ready to serve traffic for this version.
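
A minimal sketch of splitting and then migrating traffic between two deployed versions of the default service; the service and version names are placeholders:

  gcloud app services set-traffic default --splits v1=0.9,v2=0.1 --split-by ip
  # migrate all traffic to v2 once validated
  gcloud app services set-traffic default --splits v2=1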

GKE

Node Pool

GKE Commands

  • To resize a cluster, use gcloud container clusters resize with --num-nodes; the older --size flag is deprecated.

Google Cloud Storage Services Cheat Sheet

Google Cloud Storage Options

  • Relational (SQL) – Cloud SQL & Cloud Spanner
  • Non-Relational (NoSQL) – Datastore & Bigtable
  • Structured & Semi-structured – Cloud SQL, Cloud Spanner, Datastore & Bigtable
  • Unstructured – Cloud Storage
  • Block Storage – Persistent disk
  • Transactional (OLTP) – Cloud SQL & Cloud Spanner
  • Analytical (OLAP) – Bigtable & BigQuery
  • Fully Managed (Serverless) – Cloud Spanner, Datastore, BigQuery
  • Requires Provisioning – Cloud SQL, Bigtable
  • Global – Cloud Spanner
  • Regional – Cloud SQL, Bigtable, Datastore

Google Cloud - Storage Options Decision Tree

Google Cloud Storage – GCS

  • provides service for storing unstructured data i.e. objects
  • consists of bucket and objects where an object is an immutable piece of data consisting of a file of any format stored in containers called buckets.
  • support different location types
    • regional
      • A region is a specific geographic place, such as London.
      • helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
    • dual-region
      • is a specific pair of regions, such as Finland and the Netherlands.
      • provides higher availability that comes with being geo-redundant.
    • multi-region
      • is a large geographic area, such as the United States, that contains two or more geographic places.
      • allows serving content to data consumers that are outside of the Google network and distributed across large geographic areas
      • provides  higher availability that comes with being geo-redundant.
    • Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.
  • Storage class affects the object’s availability and pricing model
    • Standard Storage is best for data that is frequently accessed (hot data) and/or stored for only brief periods of time.
    • Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data (warm data)
    • Coldline Storage provides a very-low-cost, highly durable storage service for storing infrequently accessed data (cold data)
    • Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. (coldest data)
  • Object Versioning prevents accidental overwrites and deletion. It retains a noncurrent object version when the live object version gets replaced, overwritten or deleted
  • Object Lifecycle Management sets a Time To Live (TTL) on objects and helps configure transition or expiration of objects based on specified rules, e.g. SetStorageClass to change the storage class, or Delete to expire noncurrent or archived objects (a configuration sketch follows this list)
  • Resumable uploads are the recommended method for uploading large files, because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
  • Parallel composite uploads divide a file into up to 32 chunks, which are uploaded in parallel to temporary objects; the final object is composed from the temporary objects, and the temporary objects are then deleted
  • Requester Pays on the bucket that requires requester to include a billing project in their requests, thus billing the requester’s project.
  • supports upload and storage of any MIME type of data up to 5 TB in size.
  • Retention policy on a bucket ensures that all current and future objects in the bucket cannot be deleted or replaced until they reach the defined age
  • Retention policy locks will lock a retention policy on a bucket and prevents the policy from ever being removed or the retention period from ever being reduced (although it can be increased). Locking a retention policy is irreversible
  • Bucket Lock feature provides immutable storage on Cloud Storage
  • Object holds, when set on individual objects, prevents the object from being deleted or replaced, however allows metadata to be edited.
  • Signed URLs provide time-limited read or write access to an object through a generated URL.
  • Signed policy documents helps specify what can be uploaded to a bucket.
  • Cloud Storage supports encryption at rest and in transit as well
  • Cloud Storage supports both
    • Server-side encryption with support for Google managed, Customer managed and Customer supplied encryption keys
    • Client-side encryption: encryption that occurs before data is sent to Cloud Storage, encrypted at client side.
  • Cloud Storage operations are
    • strongly consistent for read after writes or deletes and listing
    • eventually consistent for granting access to or revoking access
  • Cloud Storage allows setting CORS configuration at the bucket level only
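
A hedged sketch of a lifecycle configuration and a time-limited signed URL; the bucket name, ages, object, and service-account key file are placeholders:

  # lifecycle.json – move objects to Nearline after 30 days, delete after 365 days
  # {
  #   "rule": [
  #     {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
  #     {"action": {"type": "Delete"}, "condition": {"age": 365}}
  #   ]
  # }
  gsutil lifecycle set lifecycle.json gs://my-bucket
  # signed URL valid for 20 minutes, signed with a service-account JSON key
  gsutil signurl -d 20m sa-key.json gs://my-bucket/report.csv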

Cloud SQL

  • provides relational MySQL, PostgreSQL and MSSQL databases as a service
  • managed, however, needs to select and provision machines
  • supports automatic replication, managed backups, vertical scaling for reads and writes, and horizontal scaling for reads (using read replicas)
  • provides a High Availability (HA) configuration with data redundancy and failover capability and minimal downtime when a zone or instance becomes unavailable due to a zonal outage or instance corruption
  • HA standby instance does not increase scalability and cannot be used for read queries.
  • Read replicas help scale horizontally the use of data in a database without degrading performance
  • is regional – although it now supports cross region read replicas
  • supports data encryption at rest and in transit
  • supports Point-In-Time recovery with binary logging and backups

Cloud Spanner

Datastore

  • Ancestor Paths + Best Practices

BigQuery

  • supports custom query quotas at the user level or project level
  • supports on-demand and flat-rate pricing models
  • supports dry runs, which help estimate the cost of a query based on the number of bytes read, using the --dry_run flag in the bq command-line tool or the dryRun parameter when submitting a query job through the API (see the sketch after this list)
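
A minimal dry-run sketch; the public dataset is used purely for illustration:

  bq query --dry_run --use_legacy_sql=false \
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10'
  # prints the bytes the query would process, without running it or incurring cost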

Google Cloud Datastore OR Filestore

MemoryStore

Google Persistent Disk

Google Local SSD

Google Cloud Identity Services Cheat Sheet

Identity & Access Management – IAM

  • administrators authorize who can take what action on which resources
  • IAM Member can be a Google Account (for end users), a service account (for apps and virtual machines), a Google group, or a Google Workspace or Cloud Identity domain that can access a resource.
  • IAM Role is a collection of permissions granted to authenticated members.
  • supports 3 kinds of roles
    • Primitive roles – broad level of access
    • Predefined roles – finer-grained granular access control
    • Custom roles – tailored permissions when predefined roles don’t meet the needs.
  • Best practice is to use predefined roles over primitive roles
  • IAM Policy binds one or more members to a role (a command sketch follows this list).
  • IAM policy can be set at any level in the resource hierarchy:  organization level,  folder level, the project level, or the resource level.
  • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
  • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
  • Access Scopes are the legacy method of specifying permissions for the instance for default service accounts
  • Best practice is to set the full cloud-platform access scope on the instance, then securely limit the service account’s access using IAM roles.
  • Delegate responsibility with groups (instead of individual users) and service accounts (for server-to-server interactions)
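
A minimal sketch of binding a role to a member at the project level; the project ID, group, and role are placeholders:

  gcloud projects add-iam-policy-binding my-project \
    --member="group:data-team@example.com" \
    --role="roles/storage.objectViewer"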

Cloud Identity

  • Cloud Identity is an Identity as a Service (IDaaS) solution that helps centrally manage the users and groups.
  • can be configured to federate identities between Google and other identity providers, such as Active Directory and Azure Active Directory
  • Cloud Identity and Google Workspace support Security Assertion Markup Language (SAML) 2.0 for single sign-on  with authentication performed by an external identity provider (IdP)
  • With SAML,  Cloud Identity or Google Workspace acts as a service provider that trusts the SAML IdP to verify a user’s identity on its behalf.
  • Google Cloud Directory Sync (GCDS) implements the synchronization process between an external IdP (such as Active Directory) and Cloud Identity or Google Workspace

Cloud Billing

  • Google Cloud Billing defines billing accounts linked to Google Cloud Projects to determine who pays for a given set of Google Cloud resources.
  • To move the project to a different billing account, you must be a billing administrator and the project owner.
  • To link a project to a billing account, you must be a Billing Account Administrator or Billing Account User on the billing account OR Project Billing Manager on the project
  • Cloud Billing budgets can be created to monitor all of the Google Cloud charges in one place and configure alerts
  • supports BigQuery export, which exports detailed Google Cloud billing data (such as usage, cost estimates, and pricing data) automatically throughout the day to a specified BigQuery dataset
  • Google Cloud billing data is not added retroactively to BigQuery, so the data before export is enabled will not be visible.

Google Cloud Monitoring – Stackdriver

Google Cloud Monitoring

  • Cloud Monitoring collects measurements of key aspects of the service and of the Google Cloud resources used.
  • Cloud Monitoring provides tools to visualize and monitor this data.
  • Cloud Monitoring helps gain visibility into the performance, availability, and health of the applications and infrastructure.
  • Cloud Monitoring collects metrics, events, and metadata from Google Cloud, AWS, hosted uptime probes, and application instrumentation.
  • Using the BindPlane service, data can be collected from over 150 common application components, on-premises systems, and hybrid cloud systems.

Cloud Monitoring Workspaces

  • Cloud Monitoring uses Workspaces to organize monitoring information
  • Workspace is a tool for monitoring resources across Google Cloud projects
  • A Workspace accesses metric data from its monitored projects, but the metric data remains in those projects.
  • Every Workspace has a host project. If you delete the host project, you also delete the Workspace.
  • A Workspace always monitors its Google Cloud host project
  • Host project is the project used to create the Workspace. The name of the Workspace is set to the name of the host project. This isn’t configurable.
  • Host project for Workspace stores all of the configuration content for dashboards, alerting policies, uptime checks, notification channels, and group definitions that you configure.
  • Workspace can monitor multiple projects but a Google Cloud project can be monitored by exactly 1 Workspace.
  • Projects can be moved from one workspace to another workspace
  • Two different workspaces can be merged into a single workspace

Cloud Monitoring Metrics

  • Metrics are a collection of measurements that help you understand how the applications and system services are performing.
  • Measurements might include the latency of requests to a service, the amount of disk space available on a machine, the number of tables in the SQL database, the number of widgets sold, and so forth.
  • Metric Value type includes
    • For measurements consisting of a single value at a time
      • BOOL, a boolean
      • INT64, a 64-bit integer
      • DOUBLE, a double-precision float
      • STRING, a string
    • For distribution measurements, the value isn’t a single value but a group of values.
      • The value type for distribution measurements is DISTRIBUTION.
      • Values in distribution include the mean, count, max, and other statistics, computed for a group of values.
      • Latency metrics typically capture data as distributions
  • Metric Kind includes
    • Gauge metric – Value is measured at a specific instant in time for e.g, CPU utilization, current temperature.
    • Delta metric – Value is measured as the change since it was last recorded for e.g., metrics measuring request counts are delta metrics; each value records how many requests were received since the last data point was recorded.
    • Cumulative metric – Value constantly increases over time for e.g., a metric for “sent bytes” might be cumulative; each value records the total number of bytes sent by a service at that time.

Cloud Monitoring Agent

  • Google Cloud’s operations suite provides the following agents for collecting metrics on Linux and Windows VM instances.
  • Ops Agent
    • The primary and preferred agent for collecting telemetry from the Compute Engine instances.
    • This agent combines logging and metrics into a single agent, providing YAML-based configurations for collecting the logs and metrics, and features high-throughput logging.
    • Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics.
  • Legacy Monitoring Agent
    • The agent gathers system and application metrics from virtual machine instances and sends them to Cloud Monitoring.
    • By default, the legacy monitoring agent collects disk, CPU, network, and process metrics.
    • The agent can be configured to monitor third-party applications to get the full list of agent metrics.
    • The agent is a collectd-based daemon that gathers system and application metrics from VM instances and sends them to Monitoring.

Cloud Monitoring – Uptime Checks

  • An uptime check is a request sent to a publicly accessible IP address on a resource to see whether it responds.
  • Uptime checks can determine the availability of the following:
    • URLs
    • Kubernetes LoadBalancer Services
    • VM instances
    • App Engine services
    • AWS load balancers
  • The availability of a resource can be monitored by creating an alerting policy that creates an incident when the uptime check fails.
  • The alerting policy can be configured to notify by email or through a different channel, and that notification can include details about the resource that failed to respond.
  • The results of uptime checks can also be observed in the Monitoring uptime-check dashboards.
  • For non-publicly available resources, the resource’s firewall must be configured to permit incoming traffic from the uptime-check servers.
  • Uptime checks are unable to reach resources that don’t have an external IP address.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day, and both the answers and questions might become outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You need to monitor resources that are distributed over different projects in Google Cloud Platform. You want to consolidate reporting under the same Stackdriver Monitoring dashboard. What should you do?
    1. Use Shared VPC to connect all projects, and link Stackdriver to one of the projects.
    2. For each project, create a Stackdriver account. In each project, create a service account for that project and grant it the role
      of Stackdriver Account Editor in all other projects.
    3. Configure a single Stackdriver account, and link all projects to the same account.
    4. Configure a single Stackdriver account for one of the projects. In Stackdriver, create a Group and add the other project
      names as criteria for that Group.
  2. You are asked to set up application performance monitoring on Google Cloud projects A, B, and C as a single pane of glass. You want to monitor CPU, memory, and disk. What should you do?
    1. Enable API and then share charts from projects A, B, and C.
    2. Enable API and then give the metrics.reader role to projects A, B, and C.
    3. Enable API and then use default dashboards to view all projects in sequence.
    4. Enable API, create a workspace under project A, and then add projects B and C.


Google Cloud GCloud Cheat Sheet

Google Cloud Config

PURPOSE COMMAND
List projects gcloud config list, gcloud config list project
Show project info gcloud compute project-info describe
Switch project gcloud config set project <project-id>
Set the active account gcloud config set account <ACCOUNT>
Set default region gcloud config set compute/region us-west1
Set default zone gcloud config set compute/zone us-west1-b
List configurations gcloud config configurations list
Activate configuration gcloud config configurations activate

Google Cloud IAM

PURPOSE COMMAND
get project roles gcloud projects get-iam-policy
copy roles across org and projects gcloud iam roles copy

Google Cloud Auth

PURPOSE COMMAND
Display a list of credentialed accounts gcloud auth list
Authenticate client using service account gcloud auth activate-service-account --key-file <key-file>
Auth to GCP Container Registry gcloud auth configure-docker
Print token for active account gcloud auth print-access-token, gcloud auth print-refresh-token
Revoke previously generated credentials gcloud auth revoke, gcloud auth application-default revoke

Google Cloud Storage

PURPOSE COMMAND
List all buckets and files gsutil ls, gsutil ls -lh gs://<bucket-name>
Create bucket gsutil mb gs://<bucket-name>
Download file gsutil cp gs://<bucket-name>/<dir-path>/app.txt <local-path>
Upload file gsutil cp <filename> gs://<bucket-name>/<directory>/
Delete file gsutil rm gs://<bucket-name>/<filepath>
Move file gsutil mv <src-filepath> gs://<bucket-name>/<directory>/<dest-filepath>
Copy folder gsutil cp -r ./conf gs://<bucket-name>/
Show disk usage gsutil du -h gs://<bucket-name/<directory>
Make all files readable gsutil -m acl set -R -a public-read gs://<bucket-name>/
Create signed url with duration gsutil signurl -d 1m <key-file> gs://<bucket-name>/<object>

Google Kubernetes Engine

PURPOSE COMMAND
create cluster gcloud container clusters create cluster-name --num-nodes 1
List all container clusters gcloud container clusters list
Set kubectl context gcloud container clusters get-credentials <cluster-name>
Set default cluster gcloud config set container/cluster cluster-name
resize existing cluster gcloud container clusters resize <cluster-name> --num-nodes <num-nodes>

Google Cloud Compute Engine

PURPOSE COMMAND
List all instances gcloud compute instances list , gcloud compute instance-templates list
Show instance info gcloud compute instances describe "<instance-name>" --project "<project-name>" --zone "us-west2-a"
Stop an instance gcloud compute instances stop instance-name
Start an instance gcloud compute instances start instance-name
Create an instance gcloud compute instances create vm1 --image image-1 --tags test --zone "<zone>" --machine-type f1-micro
Create preemptible instance gcloud compute instances create "preempt" --preemptible
SSH to instance gcloud compute ssh --project "<project-name>" --zone "<zone-name>" "<instance-name>"
Images list gcloud compute images list

Virtual Private Cloud (VPC)

PURPOSE COMMAND
List all networks gcloud compute networks list
Detail of one network gcloud compute networks describe <network-name> --format json
Create network gcloud compute networks create <network-name>
Create subnet gcloud compute networks subnets create subnet-1 --network network-1 --range 10.0.0.0/24 --region <region>
List all firewall rules gcloud compute firewall-rules list
List all forwarding rules gcloud compute forwarding-rules list
Describe one firewall rule gcloud compute firewall-rules describe <rule-name>
Create firewall rule gcloud compute firewall-rules create my-rule --network default --allow tcp:22
Update firewall rule gcloud compute firewall-rules update <rule-name> --allow tcp:80

Components

PURPOSE COMMAND
List down the components gcloud components list
Update the components gcloud components update
Install the components gcloud components install <component-name>

Deployment Manager

PURPOSE COMMAND
Create deployments gcloud deployment-manager deployments create <deployment-name> --config <config.yaml>
Update deployments gcloud deployment-manager deployments update <deployment-name> --config <config.yaml>

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day, and both the answers and questions might become outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You have a development project with appropriate IAM roles defined. You are creating a production project and want to have the same IAM roles on the new project, using the fewest possible steps. What should you do?
    1. Use gcloud iam roles copy and specify the production project as the destination project.
    2. Use gcloud iam roles copy and specify your organization as the destination organization.
    3. In the Google Cloud Platform Console, use the ‘create role from role’ functionality.
    4. In the Google Cloud Platform Console, use the ‘create role’ functionality and select all applicable permissions.
  2. Your team is working on GKE cluster named dev. You have downloaded and installed the gcloud command line interface (CLI) and SDK. You want to avoid having to specify this GKE config with each CLI command when managing this cluster. What should you do?
    1. Set the dev cluster as the default cluster using the gcloud container update dev
    2. Set the dev cluster as the default cluster using the gcloud config set container/cluster dev
    3. Set the dev cluster as the default cluster by adding the config to gke.default in ~/gcloud folder
    4. Set the dev cluster as the default cluster by adding the config to defaults.json in ~/gcloud folder
  3. You have a Kubernetes cluster with 1 node-pool. The cluster receives a lot of traffic and needs to grow. You decide to add a node. What should you do?
    1. Use “gcloud container clusters resize” with the desired number of nodes.
    2. Use “kubectl container clusters resize” with the desired number of nodes.
    3. Edit the managed instance group of the cluster and increase the number of VMs by 1.
    4. Edit the managed instance group of the cluster and enable autoscaling.
  4. You’re trying to provide temporary access to some files in a Cloud Storage bucket with 20 minutes availability. What is the best way to generate a signed URL?
    1. Create a service account and JSON key. Use the gsutil signurl -t 20m command and pass in the JSON key and bucket.
    2. Create a service account and JSON key. Use the gsutil signurl -d 20m command and pass in the JSON key and bucket.
    3. Create a service account and JSON key. Use the gsutil signurl -p 20m command and pass in the JSON key and bucket.
    4. Create a service account and JSON key. Use the gsutil signurl -m 20m command and pass in the JSON key and bucket.