Google Cloud Storage Security


Google Cloud Storage Security includes controlling access using

  • Uniform bucket-level or fine-grained (ACL-based) access control policies
  • Data encryption at rest and in transit
  • Retention policies and Retention Policy Locks
  • Signed URLs

GCS Access Control

  • Cloud Storage offers two systems for granting users permission to access the buckets and objects: IAM and Access Control Lists (ACLs)
  • IAM and ACLs can be used on the same resource; in that case, Cloud Storage grants the broader permission set on the resource
  • Cloud Storage access control can be performed using
    • Uniform (recommended)
      • Uniform bucket-level access allows using IAM alone to manage permissions.
      • IAM applies permissions to all the objects contained inside the bucket or groups of objects with common name prefixes.
      • IAM also allows using features that are not available when working with ACLs, such as IAM Conditions and Cloud Audit Logs.
      • Enabling uniform bucket-level access disables ACLs; the change can be reversed within 90 days of enabling it
    • Fine-grained
      • Fine-grained option enables using IAM and Access Control Lists (ACLs) together to manage permissions.
      • ACLs are a legacy access control system for Cloud Storage designed for interoperability with S3.
      • Access permissions can be specified and applied at both the bucket level and per individual object.
  • Objects in a bucket can be made public using the ACL entry AllUsers:R or by granting the IAM role objectViewer to allUsers
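A minimal sketch using the Python google-cloud-storage client (bucket name hypothetical): it enables uniform bucket-level access and then makes all objects in the bucket public through an IAM binding.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("my-bucket")  # hypothetical bucket name

    # Enable uniform bucket-level access so permissions are managed by IAM alone (ACLs are disabled).
    bucket.iam_configuration.uniform_bucket_level_access_enabled = True
    bucket.patch()

    # Make every object in the bucket publicly readable via IAM.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({"role": "roles/storage.objectViewer", "members": {"allUsers"}})
    bucket.set_iam_policy(policy)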

Data Encryption

  • Cloud Storage always encrypts the data on the server-side, before it is written to disk, at no additional charge.
  • Cloud Storage supports the following encryption options
    • Server-side encryption: encryption that occurs after Cloud Storage receives the data, but before the data is written to disk and stored.
      • Google-managed encryption keys
        • Cloud Storage manages the server-side encryption keys on the customer’s behalf using the same hardened key management systems that Google uses for its own encrypted data, including strict key access controls and auditing.
        • Cloud Storage encrypts user data at rest using AES-256.
        • Data is automatically decrypted when read by an authorized user
      • Customer-supplied encryption keys (CSEK)
        • customers create and manage their own encryption keys and supply them with each request.
        • keys can be provided client-side, e.g. via the encryption_key=[YOUR_ENCRYPTION_KEY] setting in the .boto configuration file used by gsutil
        • Cloud Storage does not permanently store the key on Google’s servers or otherwise manage the key.
        • the key must be provided for each GCS operation, and the key is purged from Google’s servers after the operation is complete
        • Cloud Storage stores only a cryptographic hash of the key so that future requests can be validated against the hash; the key cannot be recovered from this hash, and the hash cannot be used to decrypt the data.
      • Customer-managed encryption keys (CMEK)
        • customers manage their own encryption keys, which are generated and stored in Cloud Key Management Service (KMS); Cloud Storage uses the key on the customer’s behalf to encrypt and decrypt objects
    • Client-side encryption: encryption that occurs before data is sent to Cloud Storage, encrypted at the client-side. This data also undergoes server-side encryption.
  • Cloud Storage supports Transport Layer Security, commonly known as TLS or HTTPS for data encryption in transit
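A minimal sketch with the Python client (bucket, object, and KMS key names hypothetical; assumes a recent google-cloud-storage version): it writes one object with a customer-supplied key and another with a customer-managed Cloud KMS key.

    import os
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")  # hypothetical bucket name

    # Customer-supplied encryption key (CSEK): a raw 32-byte AES-256 key sent with each request.
    csek = os.urandom(32)
    blob = bucket.blob("csek-object", encryption_key=csek)
    blob.upload_from_string("sensitive payload")
    # The same key must be supplied again to read the object back.
    print(bucket.blob("csek-object", encryption_key=csek).download_as_bytes())

    # Customer-managed encryption key (CMEK): a Cloud KMS key used by Cloud Storage on your behalf.
    kms_key = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"  # hypothetical
    bucket.blob("cmek-object", kms_key_name=kms_key).upload_from_string("payload encrypted with a KMS key")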

Signed URLs

  • Signed URLs provide time-limited read or write access to an object through a generated URL.
  • Anyone having access to the URL can access the object for the duration of time specified, regardless of whether or not they have a Google account.
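A minimal sketch with the Python client (names hypothetical; assumes credentials that can sign, e.g. a service account key): it generates a V4 signed URL that expires after four hours.

    import datetime
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("sensitive-report.pdf")  # hypothetical names

    url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(hours=4),  # access stops working after 4 hours
        method="GET",
    )
    print(url)  # anyone holding this URL can download the object until it expires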

Signed Policy Documents

  • Signed policy documents help specify what can be uploaded to a bucket.
  • Policy documents allow greater control over size, content type, and other upload characteristics than signed URLs, and can be used by website owners to allow visitors to upload files to Cloud Storage.
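A minimal sketch with the Python client (bucket and object names hypothetical; assumes a client version that provides generate_signed_post_policy_v4): it builds a signed policy document that limits uploads to 5 MB and expires in 10 minutes.

    import datetime
    from google.cloud import storage

    client = storage.Client()

    policy = client.generate_signed_post_policy_v4(
        "my-uploads-bucket",            # hypothetical bucket name
        "user-uploads/photo.jpg",       # object name the visitor is allowed to create
        expiration=datetime.timedelta(minutes=10),
        conditions=[
            ["content-length-range", 0, 5 * 1024 * 1024],  # cap uploads at 5 MB
        ],
    )
    print(policy["url"])     # POST target for the HTML upload form
    print(policy["fields"])  # hidden form fields to include in the form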

Retention Policies

  • Retention policy on a bucket ensures that all current and future objects in the bucket cannot be deleted or replaced until they reach the defined age
  • Retention policy can be applied when creating a bucket or to an existing bucket
  • Retention policy retroactively applies to existing objects in the bucket as well as new objects added to the bucket.

Retention Policy Locks

  • Retention policy locks will lock a retention policy on a bucket, which prevents the policy from ever being removed or the retention period from ever being reduced (although it can be increased)
  • Once a retention policy is locked, the bucket cannot be deleted until every object in the bucket has met the retention period.
  • Locking a retention policy is irreversible
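A minimal sketch with the Python client (bucket name hypothetical) covering both setting a retention policy and locking it; the lock is irreversible.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("compliance-archive")  # hypothetical bucket name

    # Objects cannot be deleted or replaced until they are 7 years old.
    bucket.retention_period = 7 * 365 * 24 * 60 * 60  # seconds
    bucket.patch()

    # Locking is permanent: the policy can never be removed and the period can only be increased.
    bucket.lock_retention_policy()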

Bucket Lock

  • Bucket Lock feature provides immutable storage i.e. Write Once Read Many (WORM) on Cloud Storage
  • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained
  • Bucket Lock feature also locks the data retention policy, permanently preventing the policy from being reduced or removed.
  • Bucket Lock can help with regulatory, legal, and compliance requirements

Object Holds

  • Object holds, when set on individual objects, prevent the object from being deleted or replaced, but still allow its metadata to be edited.
  • Cloud Storage offers the following types of holds:
    • Event-based holds.
    • Temporary holds.
  • When an object is stored in a bucket without a retention policy, both hold types behave exactly the same.
  • When an object is stored in a bucket with a retention policy, the hold types have different effects on the object when the hold is released:
    • An event-based hold resets the object’s time in the bucket for the purposes of the retention period.
    • A temporary hold does not affect the object’s time in the bucket for the purposes of the retention period.
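A minimal sketch with the Python client (names hypothetical): it places and later releases an event-based hold; a temporary hold works the same way via blob.temporary_hold but does not reset the retention clock.

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("invoices/2020.pdf")  # hypothetical names

    # While the hold is set, the object cannot be deleted or replaced.
    blob.event_based_hold = True
    blob.patch()

    # Release the hold; if the bucket has a retention policy, the object's retention period restarts now.
    blob.event_based_hold = False
    blob.patch()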

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have an object in a Cloud Storage bucket that you want to share with an external company. The object contains sensitive data. You want access to the content to be removed after four hours. The external company does not have a Google account to which you can grant specific user-based access privileges. You want to use the most secure method that requires the fewest steps. What should you do?
    1. Create a signed URL with a four-hour expiration and share the URL with the company.
    2. Set object access to “public” and use object lifecycle management to remove the object after four hours.
    3. Configure the storage bucket as a static website and furnish the object’s URL to the company. Delete the object from the storage bucket after four hours.
    4. Create a new Cloud Storage bucket specifically for the external company to access. Copy the object to that bucket. Delete the bucket after four hours have passed

Google Cloud Storage Services Cheat Sheet

Google Cloud Storage Options

  • Relational (SQL) – Cloud SQL & Cloud Spanner
  • Non-Relational (NoSQL) – Datastore & Bigtable
  • Structured & Semi-structured – Cloud SQL, Cloud Spanner, Datastore & Bigtable
  • Unstructured – Cloud Storage
  • Block Storage – Persistent disk
  • Transactional (OLTP) – Cloud SQL & Cloud Spanner
  • Analytical (OLAP) – Bigtable & BigQuery
  • Fully Managed (Serverless) – Cloud Spanner, Datastore, BigQuery
  • Requires Provisioning – Cloud SQL, Bigtable
  • Global – Cloud Spanner
  • Regional – Cloud SQL, Bigtable, Datastore

Google Cloud - Storage Options Decision Tree

Google Cloud Storage – GCS

  • provides service for storing unstructured data i.e. objects
  • consists of bucket and objects where an object is an immutable piece of data consisting of a file of any format stored in containers called buckets.
  • support different location types
    • regional
      • A region is a specific geographic place, such as London.
      • helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
    • dual-region
      • is a specific pair of regions, such as Finland and the Netherlands.
      • provides higher availability that comes with being geo-redundant.
    • multi-region
      • is a large geographic area, such as the United States, that contains two or more geographic places.
      • allows serving content to data consumers that are outside of the Google network and distributed across large geographic areas
      • provides  higher availability that comes with being geo-redundant.
    • Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.
  • Storage class affects the object’s availability and pricing model
    • Standard Storage is best for data that is frequently accessed (hot data) and/or stored for only brief periods of time.
    • Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data (warm data)
    • Coldline Storage provides a very-low-cost, highly durable storage service for storing infrequently accessed data (cold data)
    • Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. (coldest data)
  • Object Versioning prevents accidental overwrites and deletion. It retains a noncurrent object version when the live object version gets replaced, overwritten or deleted
  • Object Lifecycle Management sets Time To Live (TTL) on an object and helps configure transition or expiration of the objects based on specified rules, e.g. SetStorageClass to change the storage class or Delete to expire noncurrent or archived objects
  • Resumable uploads are the recommended method for uploading large files, because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
  • Parallel composite uploads divides a file into up to 32 chunks, which are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted
  • Requester Pays, when enabled on a bucket, requires the requester to include a billing project in their requests, thus billing the requester’s project.
  • supports upload and storage of any MIME type of data up to 5 TB in size.
  • Retention policy on a bucket ensures that all current and future objects in the bucket cannot be deleted or replaced until they reach the defined age
  • Retention policy locks lock a retention policy on a bucket, preventing the policy from ever being removed or the retention period from ever being reduced (although it can be increased). Locking a retention policy is irreversible
  • Bucket Lock feature provides immutable storage on Cloud Storage
  • Object holds, when set on individual objects, prevent the object from being deleted or replaced, but still allow metadata to be edited.
  • Signed URLs provide time-limited read or write access to an object through a generated URL.
  • Signed policy documents help specify what can be uploaded to a bucket.
  • Cloud Storage supports encryption at rest and in transit as well
  • Cloud Storage supports both
    • Server-side encryption with support for Google managed, Customer managed and Customer supplied encryption keys
    • Client-side encryption: encryption that occurs before data is sent to Cloud Storage, encrypted at client side.
  • Cloud Storage operations are
    • strongly consistent for read-after-write, read-after-delete, and listing operations
    • eventually consistent for granting or revoking access
  • Cloud Storage allows setting CORS configuration at the bucket level only

Cloud SQL

  • provides relational MySQL, PostgreSQL and MSSQL databases as a service
  • managed, however, needs to select and provision machines
  • supports automatic replication, managed backups, vertical scaling for reads and writes, and horizontal scaling for reads (using read replicas)
  • High Availability configuration provides data redundancy and failover capability with minimal downtime when a zone or instance becomes unavailable due to a zonal outage or an instance corruption
  • HA standby instance does not increase scalability and cannot be used for read queries.
  • Read replicas help scale horizontally the use of data in a database without degrading performance
  • is regional – although it now supports cross region read replicas
  • supports data encryption at rest and in transit
  • supports Point-In-Time recovery with binary logging and backups

Cloud Spanner

Datastore

  • Ancestor Paths + Best Practices

BigQuery

  • supports user-level or project-level custom query quotas to control usage
  • pricing can be switched from on-demand to flat-rate
  • supports dry-run, which helps in pricing queries based on the amount of bytes read, i.e. the --dry_run flag in the bq command-line tool or the dryRun parameter when submitting a query job using the API
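A minimal sketch using the Python google-cloud-bigquery client (query and dataset are illustrative): a dry run returns the number of bytes a query would process without actually running it.

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

    job = client.query(
        "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10",
        job_config=job_config,
    )
    # No bytes are billed for a dry run; total_bytes_processed shows the estimated cost basis.
    print(f"This query would process {job.total_bytes_processed} bytes.")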

Google Cloud Datastore OR Filestore

MemoryStore

Google Persistent Disk

Google Local SSD

Google Cloud Storage Options

GCP Storage Options

GCP provides various storage options and the selection can be based on

  • Structured vs Unstructured
  • Relational (SQL) vs Non-Relational (NoSQL)
  • Transactional (OLTP) vs Analytical (OLAP)
  • Fully Managed vs Requires Provisioning
  • Global vs Regional
  • Horizontal vs Vertical scaling

Cloud Firestore

  • Cloud Firestore is a fully managed, highly scalable, serverless, non-relational NoSQL document database
  • fully managed with no-ops and no planned downtime and no need to provision database instances (vs Bigtable)
  • uses a distributed architecture to automatically manage scaling.
  • queries scale with the size of the result set, not the size of the data set
  • supports ACID atomic transactions, i.e. all or nothing (vs Bigtable)
  • provides high availability of reads and writes; runs in Google data centers, which use redundancy to minimize impact from points of failure
  • provides massive scalability with high performance; uses a distributed architecture to automatically manage scaling
  • scales from zero to terabytes with flexible storage and querying of data
  • provides SQL-like query language
  • supports strong consistency
  • supports data encryption at rest and in transit
  • provides terabytes of capacity with a maximum unit size of 1 MB per entity (vs Bigtable)
  • Consider using Cloud Firestore if you need to store semi-structured objects, or if you require support for transactions and SQL-like queries.

Cloud Bigtable

  • Bigtable provides a scalable, fully managed, non-relational NoSQL wide-column analytical big data database service suitable for both low-latency single-point lookups and precalculated analytics.
  • supports large quantities (>1 TB) of semi-structured or structured data (vs Datastore)
  • supports high throughput or rapidly changing data (vs BigQuery)
  • managed, but needs provisioning of nodes and can be expensive (vs Datastore and BigQuery)
  • does not support transactions or strong relational semantics (vs Datastore)
  • does not support SQL queries (vs BigQuery and Datastore)
  • Not Transactional and does not support ACID
  • provides eventual consistency
  • ideal for time-series or natural semantic ordering data
  • can run asynchronous batch or real-time processing on the data
  • can run machine learning algorithms on the data
  • provides petabytes of capacity with a maximum unit size of 10 MB per cell and 100 MB per row.
  • Usage Patterns
    • Low-latency read/write access
    • High-throughput data processing
    • Time series support
  • Anti Patterns
    • Not an ideal storage option for future analysis – Use BigQuery instead
    • Not an ideal storage option for transactional data – Use relational database or Datastore
  • Common Use cases
    • IoT, finance, adtech
    • Personalization, recommendations
    • Monitoring
    • Geospatial datasets
    • Graphs
  • Consider using Cloud Bigtable if you need a high-performance datastore to perform analytics on a large number of structured objects

Cloud Storage

  • Cloud Storage provides durable and highly available object storage.
  • fully managed, simple administration, cost-effective, and scalable service that does not require capacity management
  • supports unstructured data storage like binary or raw objects
  • provides high performance, internet-scale
  • supports data encryption at rest and in transit
  • Consider using Cloud Storage, if you need to store immutable blobs larger than 10 MB, such as large images or movies. This storage service provides petabytes of capacity with a maximum unit size of 5 TB per object.
  • Usage Patterns
    • Images, pictures, and videos
    • Objects and blobs
    • Unstructured data
    • Long term storage for archival or compliance
  • Anti Patterns
  • Common Use cases
    • Storing and streaming multimedia
    • Storage for custom data analytics pipelines
    • Archive, backup, and disaster recovery

Cloud SQL

  • provides fully managed, relational SQL databases
  • offers MySQL, PostgreSQL, MSSQL databases as a service
  • manages OS & software installation, patches and updates, backups, replication configuration, and failover; however, machines still need to be selected and provisioned (vs Cloud Spanner)
  • single region only – although it now supports cross-region read replicas (vs Cloud Spanner)
  • Scaling
    • provides vertical scalability (Max. storage of 10TB)
    • storage can be increased without incurring any downtime
    • provides an option to increase the storage automatically
    • storage CANNOT be decreased
    • supports Horizontal scaling for read-only using read replicas (vs Cloud Spanner)
    • performance is linked to the disk size
  • Security
    • data is encrypted when stored in database tables, temporary files, and backups.
    • external connections can be encrypted by using SSL, or by using the Cloud SQL Proxy.
  • High Availability
    • fault-tolerance across zones can be achieved by configuring the instance for high availability by adding a failover replica
    • failover is automatic
    • can be created from primary instance only
    • replication from the primary instance to failover replica is semi-synchronous.
    • failover replica must be in the same region as the primary instance, but in a different zone
    • only one failover replica is allowed per primary instance
    • supports managed backups and backups are created on primary instance only
    • supports automatic replication
  • Backups
    • Automated backups can be configured and are stored for 7 days
    • Manual backups (snapshots) can be created and are not deleted automatically
  • Point-in-time recovery
    • requires binary logging enabled.
    • every update to the database is written to an independent log, which involves a small reduction in write performance.
    • performance of the read operations is unaffected by binary logging, regardless of the size of the binary log files.
  • Usage Patterns
    • direct lift and shift for MySQL, PostgreSQL, MSSQL database only
    • relational database service with strong consistency
    • OLTP workloads
  • Anti Patterns
    • need data storage more than 10TB, use Cloud Spanner
    • need global availability with low latency, use Cloud Spanner
    • not a direct replacement for Oracle; use an installation on GCE instead
  • Common Use cases
    • Websites, blogs, and content management systems (CMS)
    • Business intelligence (BI) applications
    • ERP, CRM, and eCommerce applications
    • Geospatial applications
  • Consider using Cloud SQL for full relational SQL support for OLTP and lift and shift of MySQL, PostgreSQL databases

Cloud Spanner

  • Cloud Spanner provides fully managed, relational SQL databases with joins and secondary indexes
  • provides cross-region, global, horizontal scalability, and availability
  • supports strong consistency, including strongly consistent secondary indexes
  • provides high availability through synchronous and built-in data replication.
  • provides strong global consistency
  • supports database sizes exceeding ~2 TB (vs Cloud SQL)
  • does not provide direct lift and shift for relational databases (vs Cloud SQL)
  • expensive as compared to Cloud SQL
  • Consider using Cloud Spanner for full relational SQL support, with horizontal scalability spanning petabytes, for OLTP

BigQuery

  • provides fully managed, no-ops,  OLAP, enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
  • provides high capacity, data warehousing analytics solution
  • ideal for big data exploration and processing
  • not ideal for operational or transactional databases
  • provides SQL interface
  • is a scalable, fully managed service
  • Usage Patterns
    • OLAP workloads up to petabyte-scale
    • Big data exploration and processing
    • Reporting via business intelligence (BI) tools
  • Anti Patterns
    • Not an ideal storage option for transactional data or OLTP – Use Cloud SQL or Cloud Spanner instead
    • Low-latency read/write access – Use Bigtable instead
  • Common Use cases
    • Analytical reporting on large data
    • Data science and advanced analyses
    • Big data processing using SQL

Memorystore

  • provides scalable, secure, and highly available in-memory service for Redis and Memcached.
  • fully managed as provisioning, replication, failover, and patching are all automated, which drastically reduces the time spent doing DevOps.
  • provides 100% compatibility with open source Redis and Memcached
  • is protected from the internet using VPC networks and private IP and comes with IAM integration
  • Usage Patterns
    • Lift and shift migration of applications
    • Low latency data caching and retrieval
  • Anti Patterns
    • Relational or NoSQL database
    • Analytics solution
  • Common Use cases
    • User session management

GCP Storage Options Decision Tree


GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your application is hosted across multiple regions and consists of both relational database data and static images. Your database has over 10 TB of data. You want to use a single storage repository for each data type across all regions. Which two products would you choose for this task? (Choose two)
    1. Cloud Bigtable
    2. Cloud Spanner
    3. Cloud SQL
    4. Cloud Storage
  2. You are building an application that stores relational data from users. Users across the globe will use this application. Your CTO is concerned about the scaling requirements because the size of the user base is unknown. You need to implement a database solution that can scale with your user growth with minimum configuration changes. Which storage solution should you use?
    1. Cloud SQL
    2. Cloud Spanner
    3. Cloud Firestore
    4. Cloud Datastore
  3. Your company processes high volumes of IoT data that are time-stamped. The total data volume can be several petabytes. The data needs to be written and changed at a high speed. You want to use the most performant storage option for your data. Which product should you use?
    1. Cloud Datastore
    2. Cloud Storage
    3. Cloud Bigtable
    4. BigQuery
  4. Your App Engine application needs to store stateful data in a proper storage service. Your data is non-relational database data. You do not expect the database size to grow beyond 10 GB and you need to have the ability to scale down to zero to avoid unnecessary costs. Which storage service should you use?
    1. Cloud Bigtable
    2. Cloud Dataproc
    3. Cloud SQL
    4. Cloud Datastore
  5. A financial organization wishes to develop a global application to store transactions happening from different part of the world. The storage system must provide low latency transaction support and horizontal scaling. Which GCP service is appropriate for this use case?
    1. Bigtable
    2. Datastore
    3. Cloud Storage
    4. Cloud Spanner
  6. You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?
    1. Cloud SQL
    2. Cloud Bigtable
    3. Cloud Spanner
    4. Cloud Datastore

Google Cloud Storage – GCS


  • Google Cloud Storage is a service for storing unstructured data i.e. objects/blobs in Google Cloud.
  • Google Cloud Storage provides a RESTful service for storing and accessing the data on Google’s infrastructure.
  • GCS combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.

Google Cloud Storage Components

Buckets

  • Buckets are the logical containers for objects
  • All buckets are associated with a project and projects can be grouped under an organization.
  • Bucket name considerations
    • reside in a single Cloud Storage namespace.
    • must be unique.
    • are publicly visible.
    • can only be assigned during creation and cannot be changed.
    • can be used in a DNS record as part of a CNAME or A redirect.
  • Bucket name requirements
    • must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Spaces are not allowed. Names containing dots require verification.
    • must start and end with a number or letter.
    • must contain 3-63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
    • cannot be represented as an IP address, e.g. 192.168.5.4
    • cannot begin with the goog prefix.
    • cannot contain google or close misspellings, such as g00gle.
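A minimal sketch with the Python client (project and bucket names hypothetical): creating a bucket with an explicit location and default storage class.

    from google.cloud import storage

    client = storage.Client(project="my-project")  # hypothetical project

    bucket = storage.Bucket(client, name="my-unique-bucket-name")  # must be globally unique
    bucket.storage_class = "STANDARD"  # default storage class for new objects
    client.create_bucket(bucket, location="EU")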

Objects

  • An object is a piece of data consisting of a file of any format.
  • Objects are stored in containers called buckets.
  • Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime.
  • Objects can be overwritten and overwrites are Atomic
  • Object names reside in a flat namespace within a bucket, which means
    • Different buckets can have objects with the same name.
    • Objects do not reside within subdirectories in a bucket.
  • Existing objects cannot be directly renamed and need to be copied

Object Metadata

  • Objects stored in Cloud Storage have metadata associated with them
  • Metadata exists as key:value pairs and identifies properties of the object
  • Mutability of metadata varies: some metadata is set at the time the object is created (e.g. Content-Type, Cache-Control), while other metadata can be edited at any time

Composite Objects

  • Composite objects help to make appends to an existing object, as well as for recreating objects uploaded as multiple components in parallel.
  • Compose operation works with source objects that
    • have the same storage class
    • are stored in the same Cloud Storage bucket
    • do NOT use customer-managed encryption keys
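A minimal sketch with the Python client (names hypothetical): composing two source objects into one destination object, entirely server-side.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")  # hypothetical bucket name

    # Source objects must be in the same bucket and share a storage class.
    sources = [bucket.blob("uploads/part-1"), bucket.blob("uploads/part-2")]
    destination = bucket.blob("uploads/combined")
    destination.compose(sources)  # no object data is downloaded or re-uploaded by the client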

Cloud Storage Locations

  • GCS buckets need to be created in a location for storing the object data.
  • GCS support different location types
    • regional
      • A region is a specific geographic place, such as London.
      • helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
    • dual-region
      • is a specific pair of regions, such as Finland and the Netherlands.
      • provides higher availability that comes with being geo-redundant.
    • multi-region
      • is a large geographic area, such as the United States, that contains two or more geographic places.
      • allows serving content to data consumers that are outside of the Google network and distributed across large geographic areas, or
      • provides higher availability that comes with being geo-redundant.
  • Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.

Cloud Storage Classes

Refer blog Google Cloud Storage – Storage Classes

Cloud Storage Security

Refer blog Google Cloud Storage – Security

GCS Upload and Download

  • GCS supports upload and storage of any MIME type of data up to 5 TB
  • Uploaded object consists of the data along with any associated metadata
  • GCS supports multiple upload types
    • Simple upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and when there is no object metadata to send as part of the request.
    • Multipart upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and there is a need to include object metadata as part of the request.
    • Resumable upload – ideal for large files with a need for more reliable transfer. Supports streaming transfers, which is a type of resumable upload that allows uploading an object of unknown size.

Resumable Upload

  • Resumable uploads are the recommended method for uploading large files because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
  • Resumable upload allows resumption of data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data
  • Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading.
  • Resumable upload mechanism supports transfers where the file size is not known in advance or for streaming transfer.
  • Resumable upload must be completed within a week of being initiated.
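A minimal sketch with the Python client (names hypothetical): setting a chunk size makes the client upload via the resumable protocol, so a failed chunk can be retried without restarting the whole transfer.

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("backups/large-file.bin")  # hypothetical names

    # Chunk size must be a multiple of 256 KiB; it forces chunked, resumable uploads.
    blob.chunk_size = 8 * 1024 * 1024
    blob.upload_from_filename("large-file.bin")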

Streaming Transfers

  • Streaming transfers allow streaming data to and from the Cloud Storage account without requiring that the data first be saved to a file.
  • Streaming uploads are useful when uploading data whose final size is not known at the start of the upload, such as when generating the upload data from a process, or when compressing an object on the fly.
  • Streaming downloads are useful to download data from Cloud Storage into a process.
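A minimal sketch with the Python client (names hypothetical; assumes a recent client version that provides the file-like Blob.open API): streaming generated data straight into an object without saving it to a local file first.

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("exports/report.csv")  # hypothetical names

    # Write the object as a stream; the total size does not need to be known up front.
    with blob.open("w") as f:
        for i in range(1000):
            f.write(f"{i},row-{i}\n")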

Parallel Composite Uploads

  • Parallel composite uploads divide a file into up to 32 chunks, which are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted
  • Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in the bucket is a composite object, which only has a crc32c hash and not an MD5 hash
  • As a result, crcmod needs to be used to perform integrity checks when downloading the object with gsutil or other Python applications.
  • Parallel composite uploads should only be performed if the following apply:
    • the destination bucket does not use default customer-managed encryption keys, because the compose operation does not support source objects encrypted in this way
    • the uploaded objects do not need to have an MD5 hash

Object Versioning

  • Object Versioning retains a noncurrent object version when the live object version gets replaced, overwritten, or deleted
  • Object Versioning is disabled by default.
  • Object Versioning prevents accidental overwrites and deletion
  • Object Versioning causes deleted or overwritten objects to be archived instead of being deleted
  • Object Versioning increases storage costs as it maintains the current and noncurrent versions of the object, which can be partially mitigated by lifecycle management
  • Noncurrent versions retain the name of the object but are uniquely identified by their generation number.
  • Noncurrent versions only appear in requests that explicitly call for object versions to be included.
  • Object versions can be permanently deleted by including the generation number in the request or by configuring Object Lifecycle Management to delete older object versions
  • If Object Versioning is disabled, noncurrent versions are no longer created for new overwrites or deletes, but existing noncurrent versions are retained
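A minimal sketch with the Python client (bucket name hypothetical): enabling versioning and listing every version of the objects, including noncurrent generations.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("my-bucket")  # hypothetical bucket name

    # Overwritten or deleted objects will now be kept as noncurrent versions.
    bucket.versioning_enabled = True
    bucket.patch()

    # versions=True includes noncurrent versions, identified by their generation number.
    for blob in client.list_blobs("my-bucket", versions=True):
        print(blob.name, blob.generation)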

Object Lifecycle Management

  • Object Lifecycle Management sets Time To Live (TTL) on an object and helps configure transition or expiration of the objects based on specified rules, e.g. SetStorageClass to downgrade the storage class or Delete to expire noncurrent or archived objects
  • Lifecycle management configuration can be applied to a bucket, which contains a set of rules applied to current and future objects in the bucket
  • Lifecycle management rules precedence
    • Delete action takes precedence over any SetStorageClass action.
    • When multiple SetStorageClass actions apply, the action that switches the object to the storage class with the lowest at-rest storage pricing takes precedence.
  • Cloud Storage doesn’t validate the correctness of the storage class transition
  • Lifecycle actions can be tracked using Cloud Storage usage logs or using Pub/Sub Notifications for Cloud Storage
  • Lifecycle management is defined using rules, each with conditions and an action; an action is applied when
    • any one of the rules is satisfied (rules combine with OR), and
    • all the conditions within that rule are met (conditions combine with AND)
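A minimal sketch with the Python client (bucket name hypothetical): a lifecycle configuration that moves objects to Nearline after 30 days and deletes them after a year.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("my-bucket")  # hypothetical bucket name

    # Each rule is a set of conditions (ANDed) plus one action; rules combine with OR.
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_delete_rule(age=365)
    bucket.patch()

    for rule in bucket.lifecycle_rules:
        print(rule)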

GCS Object Lifecycle Management

Object Lifecycle Behavior

  • Cloud Storage performs the action asynchronously, so there can be a lag between when the conditions are satisfied and the action is taken
  • Updates to lifecycle configuration may take up to 24 hours to take effect.
  • Delete action will not take effect on an object while the object either has an object hold placed on it or an unfulfilled retention policy.
  • SetStorageClass action is not affected by the existence of object holds or retention policies.
  • SetStorageClass does not rewrite an object and hence you are not charged for retrieval and deletion operations.

GCS Requester Pays

  • Project owner of the resource is billed normally for the access which includes operation charges, network charges, and data retrieval charges
  • However, if the requester provides a billing project with their request, the requester’s project is billed instead.
  • Requester Pays requires the requester to include a billing project in their requests, thus billing the requester’s project
  • Enabling Requester Pays is useful, e.g. if you have a lot of data to share but you don’t want to be charged for others’ access to that data.
  • Requester Pays does not cover the storage charges and early deletion charges
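A minimal sketch with the Python client (project and bucket names hypothetical): enabling Requester Pays on a bucket, and how a requester then names a billing project when reading from it.

    from google.cloud import storage

    # Bucket owner: turn on Requester Pays.
    owner = storage.Client()
    bucket = owner.get_bucket("shared-datasets")  # hypothetical bucket name
    bucket.requester_pays = True
    bucket.patch()

    # Requester: supply a billing project via user_project, which gets charged for the access.
    requester = storage.Client(project="requester-project")
    blob = requester.bucket("shared-datasets", user_project="requester-project").blob("data.csv")
    blob.download_to_filename("data.csv")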

CORS

  • Cloud Storage allows setting CORS configuration at the bucket level only
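A minimal sketch with the Python client (bucket name hypothetical): since CORS is configured per bucket, the configuration is set on the bucket resource and applies to all its objects.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("my-bucket")  # hypothetical bucket name

    bucket.cors = [
        {
            "origin": ["https://example.com"],
            "method": ["GET", "HEAD"],
            "responseHeader": ["Content-Type"],
            "maxAgeSeconds": 3600,
        }
    ]
    bucket.patch()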

Cloud Storage Tracking Updates

  • Pub/Sub notifications
    • sends information about changes to objects in the buckets to Pub/Sub, where the information is added to a specified Pub/Sub topic in the form of messages.
    • Each notification contains information describing both the event that triggered it and the object that changed.
  • Audit Logs
    • Google Cloud services write audit logs to help you answer the questions, “Who did what, where, and when?”
    • Cloud projects contain only the audit logs for resources that are directly within the project.
    • Cloud Audit Logs generates the following audit logs for operations in Cloud Storage:
      • Admin Activity logs: Entries for operations that modify the configuration or metadata of a project, bucket, or object.
      • Data Access logs: Entries for operations that modify objects or read a project, bucket, or object.

Data Consistency

  • Cloud Storage operations are primarily strongly consistent, with a few exceptions that are eventually consistent
  • Cloud Storage provides strong global consistency for the following operations, including both data and metadata:
    • Read-after-write
    • Read-after-metadata-update
    • Read-after-delete
    • Bucket listing
    • Object listing
  • Cloud Storage provides eventual consistency for the following operations
    • Granting access to or revoking access from resources.

gsutil

  •  gsutil tool is the standard tool for small- to medium-sized transfers (less than 1 TB) over a typical enterprise-scale network, from a private data center to Google Cloud.
  • gsutil provides all the basic features needed to manage the Cloud Storage instances, including copying the data to and from the local file system and Cloud Storage.
  • gsutil can also move, rename and remove objects and perform real-time incremental syncs, like rsync, to a Cloud Storage bucket.
  • gsutil is especially useful in the following scenarios:
    • performing as-needed transfers or transfers initiated by users during command-line sessions
    • transferring only a few files or very large files, or both
    • consuming the output of a program (streaming output to Cloud Storage)
    • watching a directory with a moderate number of files and syncing any updates with very low latencies
  • gsutil provides following features
    • Parallel multi-threaded transfers with  gsutil -m, increasing transfer speeds.
    • Composite transfers for a single large file to break them into smaller chunks to increase transfer speed. Chunks are transferred and validated in parallel, sending all data to Google. Once the chunks arrive at Google, they are combined (referred to as compositing) to form a single object
  • gsutil perfdiag can help gather stats to provide diagnostic output to the Cloud Storage team

Best Practices

  • Use IAM over ACL whenever possible as IAM provides an audit trail
  • Cloud Storage auto-scaling performs well if requests ramp up gradually rather than spiking suddenly.
    • If the request rate is less than 1000 write requests per second or 5000 read requests per second, then no ramp-up is needed.
    • If the request rate is expected to go over these thresholds, start with a request rate below or near the thresholds and then double the request rate no faster than every 20 minutes.
  • Avoid a sequential naming bottleneck: Cloud Storage distributes uploads to different shards based on the file name/path, so a sequential naming pattern can overload a single shard and degrade performance
  • Use truncated exponential backoff as the standard error-handling strategy for retryable errors
  • For multiple smaller files, use gsutil with -m option that performs a batched, parallel, multi-threaded/multi-processing to upload which can significantly increase the performance of an upload
  • For large objects downloads, use gsutil with HTTP Range GET requests to perform “sliced” downloads in parallel
  • To upload large files efficiently, use parallel composite upload with object composition to perform uploads in parallel for large, local files.  It splits a large file into component pieces, uploads them in parallel, and then recomposes them once they’re in the cloud (and deletes the temporary components it created locally).

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have a collection of media files over 50GB each that you need to migrate to Google Cloud Storage. The files are in your on-premises data center. What migration method can you use to help speed up the transfer process?
    1. Use multi-threaded uploads using the -m option.
    2. Use parallel uploads to break the file into smaller chunks then transfer it simultaneously.
    3. Use the Cloud Transfer Service to transfer.
    4. Start a recursive upload.
  2. Your company has decided to store data files in Cloud Storage. The data would be hosted in a regional bucket to start with. You need to configure Cloud Storage lifecycle rule to move the data for archival after 30 days and delete the data after a year. Which two actions should you take?
    1. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Coldline”, and Action: “Delete”.
    2. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Coldline”, and Action: “Delete”.
    3. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Nearline”, and Action: “Delete”.
    4. Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Nearline”, and Action: “Delete”.


Google Cloud Storage – Storage Classes


  • Google Cloud Storage – Storage class affects the object’s availability and pricing model
  • Storage class of an existing object can be changed either by rewriting the object or by using Object Lifecycle Management.
  • Bucket’s default storage class is set to Standard Storage, if not specified
  • A default storage class for the bucket can be specified so when a bucket is created, all the objects added to the bucket will inherit this storage class unless explicitly set otherwise.
  • Changing the default storage class of a bucket does not affect any of the objects that already exist in the bucket.
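A minimal sketch with the Python client (names hypothetical): changing the bucket’s default storage class (affects only newly written objects) and rewriting an existing object into a colder class.

    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket("my-bucket")  # hypothetical bucket name

    # New objects written without an explicit class will now default to Nearline.
    bucket.storage_class = "NEARLINE"
    bucket.patch()

    # Changing an existing object's class rewrites the object in place.
    blob = bucket.blob("logs/2020-01.tar.gz")  # hypothetical object name
    blob.update_storage_class("COLDLINE")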

Storage Classes Options

  • All storage classes provide the following
    • Unlimited storage with no minimum object size.
    • Worldwide accessibility and worldwide storage locations.
    • Low latency (time to the first byte typically tens of milliseconds).
    • High durability (99.999999999% annual durability).
    • Geo-redundancy, if the data is stored in a multi-region or dual-region.
    • A uniform experience with Cloud Storage features, security, tools, and APIs

Standard Storage

  • Standard Storage is best for data that is frequently accessed (hot data) and/or stored for only brief periods of time.
  • for regional locations
    • is appropriate for storing data in the same location as the resources that use it, such as GKE clusters or GCE instances, which helps maximize performance and can reduce network charges.
    • Availability SLA – 99.99%
  • for dual-region,
    • provides optimized performance when accessing Google Cloud products that are located in one of the associated regions,
    • provides improved availability that comes from storing data in geographically separate locations.
    • Availability SLA > 99.99%
  • for multi-region
    • ideal for storing data that is accessed around the world, such as serving website content, streaming videos, executing interactive workloads, or serving data supporting mobile and gaming applications.
    • Availability SLA > 99.99%

Nearline Storage

  • Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data (warm data)
  • Nearline Storage is a better choice than Standard Storage in scenarios where slightly lower availability, a 30-day minimum storage duration, and data access costs are acceptable trade-offs for lowered at-rest storage cost
  • Nearline Storage is ideal for data you plan to read or modify on average once per month or less. for e.g., if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline Storage is a great choice.
  • Nearline Storage is also appropriate for data backup, long-tail multimedia content, and data archiving.

Coldline Storage

  • Coldline Storage provides a very-low-cost, highly durable storage service for storing infrequently accessed data (cold data)
  • Coldline Storage is a better choice than Standard Storage or Nearline Storage in scenarios where slightly lower availability, a 90-day minimum storage duration, and higher costs for data access are acceptable trade-offs for lowered at-rest storage costs.
  • Coldline Storage is ideal for data you plan to read or modify at most once a quarter.

Archive Storage

  • Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. (coldest data)
  • Archive Storage has no availability SLA, though the typical availability is comparable to Nearline Storage and Coldline Storage.
  • Data is available within milliseconds, not hours or days.
  • Archive Storage has higher costs for data access and operations, as well as a 365-day minimum storage duration.
  • Archive Storage is the best choice for data that you plan to access less than once a year. for e.g. cold data storage for archival and disaster recovery

Google Cloud Storage - Storage Classes

Legacy Storage Classes

  • Google Cloud Storage provided additional storage classes which have been phased out
    • Multi-Regional Storage
      • Equivalent to Standard Storage, except Multi-Regional Storage can only be used for objects stored in multi-regions or dual-regions.
    • Regional Storage
      • Equivalent to Standard Storage, except Regional Storage can only be used for objects stored in regions.
    • Durable Reduced Availability (DRA) Storage:
      • Similar to Standard Storage except:
        • DRA has higher pricing for operations.
        • DRA has lower performance, particularly in terms of availability (DRA has a 99% availability SLA).

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You’ve created a bucket to store some data archives for compliance. The data isn’t likely to need to be viewed. However, you need to store it for at least 7 years. What is the best default storage class?
    1. Multi-regional
    2. Coldline
    3. Regional
    4. Nearline
