Table of Contents
hide
Google Cloud Storage – GCS
- Google Cloud Storage is a service for storing unstructured data i.e. objects/blobs in Google Cloud.
- Google Cloud Storage provides a RESTful service for storing and accessing the data on Google’s infrastructure.
- GCS combines the performance and scalability of Google’s cloud with advanced security and sharing capabilities.
- GCS supports a maximum single-object size of up to 5 TiB.
- There is no limit on the number of objects that can be stored in a bucket.
Google Cloud Storage Components
Buckets
- Buckets are the logical containers for objects
- All buckets are associated with a project and projects can be grouped under an organization.
- Bucket name considerations
- reside in a single Cloud Storage namespace.
- must be unique.
- are publicly visible.
- can only be assigned during creation and cannot be changed.
- can be used in a DNS record as part of a
CNAMEorAredirect.
- Bucket name requirements
- must contain only lowercase letters, numbers, dashes (
-), underscores (_), and dots (.). Spaces are not allowed. Names containing dots require verification. - must start and end with a number or letter.
- must contain 3-63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
- cannot be represented as an IP address for e.g., 192.168.5.4
- cannot begin with the
googprefix. - cannot contain
googleor close misspellings, such asg00gle.
- must contain only lowercase letters, numbers, dashes (
Objects
- An object is a piece of data consisting of a file of any format.
- Objects are stored in containers called buckets.
- Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime.
- Objects can be overwritten and overwrites are Atomic
- Object names reside in a flat namespace within a bucket (unless Hierarchical Namespace is enabled), which means
- Different buckets can have objects with the same name.
- Objects do not reside within subdirectories in a bucket.
- Existing objects cannot be directly renamed and need to be copied (unless the bucket has Hierarchical Namespace enabled, which supports atomic renames)
Object Metadata
- Objects stored in Cloud Storage have metadata associated with them
- Metadata exists as key:value pairs and identifies properties of the object
- Mutability of metadata varies as some metadata is set at the time the object is created for e.g. Content-Type, Cache-Control while for others they can be edited at any time
Composite Objects
- Composite objects help to make appends to an existing object, as well as for recreating objects uploaded as multiple components in parallel.
- Compose operation works with objects
- having the same storage class.
- be stored in the same Cloud Storage bucket.
- NOT use customer-managed encryption keys.
Cloud Storage Locations
- GCS buckets need to be created in a location for storing the object data.
- GCS support different location types
- regional
- A region is a specific geographic place, such as London.
- helps optimize latency and network bandwidth for data consumers, such as analytics pipelines, that are grouped in the same region.
- Regional buckets store data redundantly in at least two availability zones in the region.
- Object writes are only confirmed after data is redundantly stored across at least two availability zones.
- dual-region
- is a specific pair of regions, such as Finland and the Netherlands.
- provides higher availability that comes with being geo-redundant.
- Dual-region pairings can be predefined or configurable (choose any two regions within a continent).
- Supports Turbo Replication for a 15-minute Recovery Point Objective (RPO) across the two regions.
- multi-region
- is a large geographic area, such as the United States, that contains two or more geographic places.
- allows serving content to data consumers that are outside of the Google network and distributed across large geographic areas, or
- provides higher availability that comes with being geo-redundant.
- regional
- Objects stored in a multi-region or dual-region are geo-redundant i.e. data is stored redundantly in at least two separate geographic places separated by at least 100 miles.
Bucket Relocation (New – 2024)
- Bucket Relocation allows moving an existing bucket from one location to another without changing the bucket’s name or requiring manual data transfer.
- All object metadata remains identical throughout the relocation — no path changes needed.
- Applications experience minimal downtime while the underlying storage is moved.
- Bucket relocation requires Storage Intelligence to be configured.
- Limitations:
- Cannot relocate buckets containing objects with holds.
- Cannot relocate buckets with managed folders.
- CMEK/CSEK not supported for relocations with write downtime.
- Supports dry run mode to validate relocation before executing.
Cloud Storage Classes
Refer blog Google Cloud Storage – Storage Classes
Autoclass
- Autoclass simplifies and automates cost savings by automatically transitioning objects between storage classes based on access patterns.
- Autoclass removes retrieval charges, early deletion charges, and class transition charges for transitions it performs.
- The terminal storage class is configurable:
- Default terminal class is Nearline — objects transition down to Nearline and remain there until accessed.
- Can be set to Archive for maximum savings — objects continue transitioning through Coldline to Archive.
- Autoclass can be enabled on existing buckets (not just at creation time).
- Autoclass is ideal when access patterns are unknown or unpredictable.
Cloud Storage Security
Refer blog Google Cloud Storage – Security
GCS Upload and Download
- GCS supports upload and storage of any MIME type of data up to 5 TiB
- Uploaded object consists of the data along with any associated metadata
- GCS supports multiple upload types
- Simple upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and if there are no object metadata to send as part of the request.
- Multipart upload – ideal for small files that can be uploaded again in their entirety if the connection fails, and there is a need to include object metadata as part of the request.
- Resumable upload – ideal for large files with a need for more reliable transfer. Supports streaming transfers, which is a type of resumable upload that allows uploading an object of unknown size.
Resumable Upload
- Resumable uploads are the recommended method for uploading large files because they don’t need to be restarted from the beginning if there is a network failure while the upload is underway.
- Resumable upload allows resumption of data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data
- Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading.
- Resumable upload mechanism supports transfers where the file size is not known in advance or for streaming transfer.
- Resumable upload must be completed within a week of being initiated.
Streaming Transfers
- Streaming transfers allow streaming data to and from the Cloud Storage account without requiring that the data first be saved to a file.
- Streaming uploads are useful when uploading data whose final size is not known at the start of the upload, such as when generating the upload data from a process, or when compressing an object on the fly.
- Streaming downloads are useful to download data from Cloud Storage into a process.
Parallel Composite Uploads
- Parallel composite uploads divide a file into up to 32 chunks, which are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted
- Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in the bucket is a composite object, which only has a
crc32chash and not anMD5hash - Parallel composite uploads do not support buckets with default customer-managed encryption keys, because the compose operation does not support source objects encrypted in this way.
- Parallel composite uploads do not need the uploaded objects to have an MD5 hash.
Soft Delete (New – 2024)
- Soft Delete protects against accidental and malicious data deletion by retaining deleted objects for a configurable retention period.
- Soft delete is enabled by default on all new buckets with a 7-day retention duration.
- Retention duration is configurable between 7 and 90 days, or can be disabled entirely.
- Soft-deleted objects can be listed and restored within the retention window.
- Soft delete is compatible with all other Cloud Storage features including versioning, lifecycle management, and encryption.
- Soft-deleted objects incur storage charges at the same rate as the object’s storage class.
- Differs from Object Versioning:
- Soft delete applies to all deletions automatically (no need to enable versioning).
- Object Versioning retains overwritten versions; Soft Delete retains truly deleted objects.
- Both can be used together for maximum protection.
Object Versioning
- Object Versioning retains a noncurrent object version when the live object version gets replaced, overwritten, or deleted
- Object Versioning is disabled by default.
- Object Versioning prevents accidental overwrites and deletion
- Object Versioning causes deleted or overwritten objects to be archived instead of being deleted
- Object Versioning increases storage costs as it maintains the current and noncurrent versions of the object, which can be partially mitigated by lifecycle management
- Noncurrent versions retain the name of the object but are uniquely identified by their generation number.
- Noncurrent versions only appear in requests that explicitly call for object versions to be included.
- Objects versions can be permanently deleted by including the generation number or configuring Object Lifecycle Management to delete older object versions
- Object versioning, if disabled, does not create versions for new ones but old versions are not deleted
Object Lifecycle Management
- Object Lifecycle Management sets Time To Live (TTL) on an object and helps configure transition or expiration of the objects based on specified rules for e.g.
SetStorageClassto downgrade the storage class,deleteto expire noncurrent or archived objects - Lifecycle management configuration can be applied to a bucket, which contains a set of rules applied to current and future objects in the bucket
- Lifecycle management rules precedence
Deleteaction takes precedence over anySetStorageClassaction.SetStorageClassaction switches the object to the storage class with the lowest at-rest storage pricing takes precedence.
- Cloud Storage doesn’t validate the correctness of the storage class transition
- Lifecycle actions can be tracked using Cloud Storage usage logs or using Pub/Sub Notifications for Cloud Storage
- Lifecycle management is done using rules, conditions, and actions and is applied if
- With multiple rules, any of the rules can be met (OR operation)
- All the conditions in a rule (AND operation) should be met
- Available lifecycle actions:
Delete,SetStorageClass, andAbortIncompleteMultipartUpload

Object Lifecycle Behavior
- Cloud Storage performs the action asynchronously, so there can be a lag between when the conditions are satisfied and the action is taken
- Updates to lifecycle configuration may take up to 24 hours to take effect.
Deleteaction will not take effect on an object while the object either has an object hold placed on it or an unfulfilled retention policy.SetStorageClassaction is not affected by the existence of object holds or retention policies.SetStorageClassdoes not rewrite an object and hence you are not charged for retrieval and deletion operations.
GCS Requester Pays
- Project owner of the resource is billed normally for the access which includes operation charges, network charges, and data retrieval charges
- However, if the requester provides a billing project with their request, the requester’s project is billed instead.
- Requester Pays requires the requester to include a billing project in their requests, thus billing the requester’s project
- Enabling Requester Pays is useful, e.g. if you have a lot of data to share, but you don’t want to be charged for their access to that data.
- Requester Pays does not cover the storage charges and early deletion charges
CORS
- Cloud Storage allows setting CORS configuration at the bucket level only
Hierarchical Namespace – HNS (New – 2024)
- Hierarchical Namespace (HNS) provides true folder support in Cloud Storage buckets (GA November 2024).
- With HNS enabled, buckets support:
- Atomic rename operations on folders and objects (critical for data lake workloads like Apache Spark, Hive)
- True directory semantics — folders are first-class resources, not just prefix simulations
- Managed folders for fine-grained IAM access control at the folder level
- HNS must be enabled at bucket creation time and cannot be changed later.
- HNS buckets support soft delete and standard lifecycle management.
- Managed Folders allow applying IAM permissions to groups of objects sharing a common prefix, enabling folder-level access control without HNS as well.
Cloud Storage Tracking Updates
- Pub/Sub notifications
- sends information about changes to objects in the buckets to Pub/Sub, where the information is added to a specified Pub/Sub topic in the form of messages.
- Each notification contains information describing both the event that triggered it and the object that changed.
- Pub/Sub notifications are the recommended approach for tracking object changes.
Object Change Notification(Deprecated January 30, 2026)- Object change notification was a legacy mechanism using HTTP webhooks to notify applications about object changes.
- ⚠️ Deprecated as of January 30, 2026. Use Pub/Sub notifications instead.
- Audit Logs
- Google Cloud services write audit logs to help you answer the questions, “Who did what, where, and when?”
- Cloud projects contain only the audit logs for resources that are directly within the project.
- Cloud Audit Logs generates the following audit logs for operations in Cloud Storage:
- Admin Activity logs: Entries for operations that modify the configuration or metadata of a project, bucket, or object.
- Data Access logs: Entries for operations that modify objects or read a project, bucket, or object.
Data Consistency
- Cloud Storage operations are primarily strongly consistent with few exceptions being eventually consistent
- Cloud Storage provides strong global consistency for the following operations, including both data and metadata:
- Read-after-write
- Read-after-metadata-update
- Read-after-delete
- Bucket listing
- Object listing
- Cloud Storage provides eventual consistency for following operations
- Granting access to or revoking access from resources.
Cloud Storage Rapid (New – 2025)
- Cloud Storage Rapid is a high-performance storage offering designed for AI/ML, HPC, and data analytics workloads.
- Cloud Storage Rapid consists of two components:
- Rapid Bucket (formerly Rapid Storage)
- A zonal object storage bucket providing sub-millisecond latency for random reads and writes.
- Delivers up to 15 TB/s aggregate throughput and 20 million QPS.
- Co-locates data in the same physical zone as AI accelerators (TPUs/GPUs).
- Ideal for AI training data, checkpointing, and high-throughput analytics.
- Rapid Cache (formerly Anywhere Cache)
- An SSD-backed zonal read cache for existing Cloud Storage buckets.
- Creates caches in the same zone as compute workloads for low-latency access.
- Provides up to 2.5 TB/s throughput per cache.
- Fully managed, always returns consistent data.
- Supports ingest-on-write configuration for immediate cache population.
- Includes a recommender to identify optimal bucket-zone pairs for caching.
- Rapid Bucket (formerly Rapid Storage)
Storage Intelligence (New – 2025)
- Storage Intelligence is a unified management platform for data exploration, cost optimization, security enforcement, and governance.
- Key capabilities include:
- Zero-configuration dashboards — aggregated views of storage usage and activity
- Storage Insights Datasets — daily metadata and activity insights (within 4 hours) exported to BigQuery for analysis
- Storage Batch Operations — serverless batch operations on billions of objects (up to 1 billion objects in 3 hours)
- Bucket Relocation — move buckets between locations (requires Storage Intelligence)
- Gemini Cloud Assist integration — natural language queries for cost savings, security, and data discovery
- Storage Intelligence offers a 30-day introductory trial.
- Batch operations support: delete objects, set storage class, update metadata, and more at scale.
gcloud storage CLI
gcloud storageis the recommended CLI for Cloud Storage, replacing the legacygsutiltool.gcloud storageprovides significantly better performance:- 79% faster on downloads and 33% faster on uploads for multiple files.
- 94% faster on single large file downloads and 57% faster on uploads.
gcloud storagesupports newer features not available in gsutil (soft delete, managed folders, HNS, etc.).gcloud storagerequires less manual optimization for fastest transfer rates.- Common commands:
gcloud storage cp— copy files to/from Cloud Storagegcloud storage ls— list buckets and objectsgcloud storage rm— remove objectsgcloud storage rsync— synchronize filesgcloud storage buckets create— create bucketsgcloud storage buckets relocate— relocate buckets
gsutil (Legacy)
- ⚠️ gsutil is no longer the recommended CLI for Cloud Storage. Use
gcloud storagecommands instead. gsutildoes not support newer Cloud Storage features such as soft delete, managed folders, and hierarchical namespace.gsutilwas the standard tool for small- to medium-sized transfers (less than 1 TB).gsutilprovides basic features for managing Cloud Storage including copying data, moving, renaming, removing objects, and performing incremental syncs.- Users should migrate to
gcloud storagefor all new workflows.
Best Practices
- Use IAM over ACL whenever possible as IAM provides an audit trail
- Use Managed Folders for fine-grained folder-level access control
- Cloud Storage auto-scaling performs well if requests ramp up gradually rather than having a sudden spike.
- If the request rate is less than 1000 write requests per second or 5000 read requests per second, then no ramp-up is needed.
- If the request rate is expected to go over these thresholds, start with a request rate below or near the thresholds and then double the request rate no faster than every 20 minutes.
- Avoid sequential naming bottleneck as Cloud Storage uploads data to different shards based on the file name/path as using the same pattern would overload a shard leading to performance degradation
- Use Truncated exponential backoff as a standard error handling strategy
- Use
gcloud storagewith--parallel-threadsfor batch uploads of multiple smaller files - For large objects downloads, use sliced downloads (automatic in
gcloud storage) - To upload large files efficiently, use parallel composite upload with object composition
- Enable Soft Delete for protection against accidental deletions
- Use Autoclass when access patterns are unpredictable to optimize costs automatically
- Use Cloud Storage Rapid for AI/ML workloads requiring sub-millisecond latency
- Use Storage Intelligence for visibility into storage usage and cost optimization at scale
GCP Certification Exam Practice Questions
- Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
- GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
- GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
- Open to further feedback, discussion and correction.
- You have a collection of media files over 50GB each that you need to migrate to Google Cloud Storage. The files are in your on-premises data center. What migration method can you use to help speed up the transfer process?
- Use multi-threaded uploads using the -m option.
- Use parallel uploads to break the file into smaller chunks then transfer it simultaneously.
- Use the Cloud Transfer Service to transfer.
- Start a recursive upload.
- Your company has decided to store data files in Cloud Storage. The data would be hosted in a regional bucket to start with. You need to configure Cloud Storage lifecycle rule to move the data for archival after 30 days and delete the data after a year. Which two actions should you take?
- Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Coldline”, and Action: “Delete”.
- Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Coldline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Coldline”, and Action: “Delete”.
- Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “365”, Storage Class: “Nearline”, and Action: “Delete”.
- Create a Cloud Storage lifecycle rule with Age: “30”, Storage Class: “Standard”, and Action: “Set to Nearline”, and create a second GCS life-cycle rule with Age: “275”, Storage Class: “Nearline”, and Action: “Delete”.
- Your organization has a Cloud Storage bucket with millions of objects that need their storage class changed to Coldline. What is the most efficient approach?
- Write a custom script to iterate through all objects and change the class.
- Use gsutil rewrite command with the -m flag.
- Use Storage Batch Operations to change the storage class of objects at scale.
- Configure lifecycle management to transition objects.
- A data engineering team needs to run Apache Spark jobs on data stored in Cloud Storage. They require atomic rename operations on directories for job output. What should you recommend?
- Use Object Versioning to handle concurrent writes.
- Store data in a regional bucket with standard namespace.
- Create a bucket with Hierarchical Namespace (HNS) enabled for atomic directory renames.
- Use Cloud Storage FUSE with retry logic.
- Your AI training pipeline requires sub-millisecond latency reads from Cloud Storage. The TPUs are in us-central1-a. What Cloud Storage offering should you use?
- Standard regional bucket in us-central1.
- Dual-region bucket with Turbo Replication.
- Cloud Storage Rapid Bucket co-located in the same zone as the TPUs.
- Multi-regional bucket with CDN caching.
- An application stores objects in Cloud Storage that are critical for compliance. The team needs protection against both accidental deletions and accidental overwrites. What combination should you enable?
- Object Versioning only.
- Soft Delete only.
- Both Object Versioning (for overwrite protection) and Soft Delete (for deletion protection).
- Retention policies with bucket lock.
- Your company needs to relocate a Cloud Storage bucket from us-east1 to europe-west1 without changing the bucket name or application configurations. What feature should you use?
- Create a new bucket and use Storage Transfer Service.
- Use Bucket Relocation to move the bucket to the new location.
- Enable dual-region on the existing bucket.
- Use gsutil rsync to copy data to a new bucket.
- You have workloads reading data from a multi-region bucket, but your GPUs are in a specific zone. You need low-latency access without changing the bucket type. What should you configure?
- Move data to a regional bucket in the same region.
- Configure Cloud Storage Rapid Cache in the same zone as the GPUs.
- Enable Turbo Replication on the bucket.
- Create a duplicate bucket in the target region.