AWS S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. S3 automatically partitions the buckets as needed to support higher request rates.
  • S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix in a bucket.
  • There are no limits to the number of prefixes in a bucket, so throughput can be scaled horizontally by parallelizing reads or writes across different prefixes.
  • Random prefix key naming is NO LONGER required for performance optimization.
    • Since July 2018, S3 automatically handles internal partitioning to support high request rates.
    • Logical or sequential naming patterns can be used without any performance implications.
    • S3 dynamically optimizes performance in response to sustained high request rates.
  • If a workload experiences sudden bursts above the per-prefix limit, S3 will return HTTP 503 (Slow Down) responses temporarily while it repartitions. Gradually ramping up request rates (prefix-level warm-up) helps avoid throttling for new prefixes.
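
The per-prefix figures and the 503 behaviour above can be turned into a little arithmetic and a retry loop. The following is a minimal sketch, not an official client: the helper names (prefixes_needed, backoff_delays, call_with_retry) and the backoff parameters are illustrative assumptions, and the request callable stands in for whatever actually sends the S3 request.

```python
import math
import random
import time

PUT_RPS_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE figure quoted in the list above
GET_RPS_PER_PREFIX = 5500  # GET/HEAD figure quoted in the list above


def prefixes_needed(target_rps, per_prefix_rps):
    """Smallest number of prefixes whose combined baseline rate covers target_rps."""
    return max(1, math.ceil(target_rps / per_prefix_rps))


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Exponential backoff with full jitter: delay n is drawn from [0, min(cap, base * 2**n)]."""
    return [rng() * min(cap, base * (2 ** n)) for n in range(attempts)]


def call_with_retry(request, attempts=5, sleep=time.sleep):
    """Call request() until it stops returning HTTP 503 or the attempts run out.

    request is a hypothetical callable that returns an HTTP status code.
    """
    status = None
    for delay in backoff_delays(attempts):
        status = request()
        if status != 503:
            return status
        sleep(delay)  # back off before the next try so S3 has time to repartition
    return status
```

For example, a workload that needs 20,000 GET requests per second would spread reads over prefixes_needed(20000, GET_RPS_PER_PREFIX) prefixes, which is 4. In practice the AWS SDKs already retry 503 responses, so a loop like this mainly illustrates the idea.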

S3 Express One Zone (High-Performance Storage)

  • S3 Express One Zone is a high-performance storage class (launched Nov 2023) purpose-built for latency-sensitive applications.
    • Delivers consistent single-digit millisecond first-byte read and write latency — up to 10x faster than S3 Standard.
    • Reduces request costs by up to 50% compared to S3 Standard.
    • Scales to process millions of requests per minute.
    • Uses directory buckets stored in a single Availability Zone.
    • Ideal for ML model training, interactive analytics, media content creation, and high-frequency trading.
  • AWS announced up to 85% price reductions for S3 Express One Zone in April 2025.

Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
  • Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path.
  • Use the S3 Transfer Acceleration Speed Comparison tool to determine if it would benefit your use case.

GET-intensive Workloads

  • CloudFront can be used for performance optimization and can help by
    • distributing content with low latency and high data transfer rate.
    • caching the content and thereby reducing the number of direct requests to S3
    • providing multiple endpoints (Edge locations) for data availability
  • CloudFront RTMP distributions were deprecated on December 31, 2020. Use CloudFront Web distributions with HTTP-based streaming (HLS, DASH) for media delivery.
  • For fast data transport over long distances between a client and an S3 bucket, use S3 Transfer Acceleration, which routes uploads and downloads through CloudFront's globally distributed edge locations.

PUTs/GETs for Large Objects

  • Large PUT and GET requests can be parallelized to improve upload and download throughput and to make recovery from a failed transfer cheaper
  • For PUTs, Multipart upload can help improve the uploads by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload needs to be re-uploaded
    • ability to pause and resume uploads
    • begin an upload before the Object size is known
    • Recommended for objects larger than 100 MB; required for objects larger than 5 GB
  • For GETs, the Range HTTP header (byte-range fetches) can help improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retried
    • higher aggregate throughput by downloading parts in parallel
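
As a rough illustration of byte-range fetches, the sketch below splits an object of known size into Range header values that could be requested in parallel. The function name byte_ranges and the chosen part size are assumptions made for the example; the actual GET requests are left out.

```python
def byte_ranges(object_size, part_size):
    """Return HTTP Range header values that cover an object in part_size chunks."""
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    for start in range(0, object_size, part_size):
        # Both ends of an HTTP byte range are inclusive, hence the "- 1".
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


# A 10-byte object fetched in 4-byte parts needs three requests.
print(byte_ranges(10, 4))
```

Each range can be downloaded by a separate worker, and a failed range can be retried on its own without refetching the rest of the object.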

List Operations

  • S3 returns LIST results only in lexicographic (UTF-8 byte) order of the key names, so the results cannot be sorted or filtered by other attributes such as size or modification time
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain a secondary index outside S3, e.g. in DynamoDB or RDS, to store, index, and query object metadata instead of running LIST operations against S3
  • Use S3 Inventory reports (daily or weekly) as an alternative to LIST API calls for large buckets — more efficient and cost-effective for auditing or analytics workloads.
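
A small example makes the lexicographic ordering concrete. The key names below are made up for illustration; the sort mimics the byte-wise ordering a LIST call would use and shows why numeric suffixes usually need zero padding.

```python
# Hypothetical key names; a listing returns keys in UTF-8 byte order,
# not in numeric or creation order.
keys = ["logs/2", "logs/10", "logs/1", "Logs/3", "logs/02"]

listed = sorted(keys, key=lambda k: k.encode("utf-8"))
print(listed)
# Uppercase sorts before lowercase, and "10" sorts before "2".


def padded_key(prefix, number, width=6):
    """Zero-pad a numeric suffix so lexicographic order matches numeric order."""
    return f"{prefix}{number:0{width}d}"
```

With padding, padded_key("logs/", 2) sorts before padded_key("logs/", 10), which is often the simplest way to get a useful LIST order without an external index.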

Security

  • Use Versioning
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or rollback to previous versions
  • Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) Delete
  • Versioning does not prevent bucket deletion, so versioned data must still be backed up; if the bucket is deleted accidentally or maliciously, the data is lost
  • Use S3 Object Lock for WORM (Write Once Read Many) protection
    • Prevents objects from being deleted or overwritten for a fixed period or indefinitely
    • Supports Governance mode (can be overridden with special permissions) and Compliance mode (cannot be overridden by anyone, including root)
    • Requires versioning to be enabled
    • Helps meet regulatory requirements (SEC, FINRA, CFTC)
  • Use Same-Region Replication or Cross-Region Replication to back up data to a different bucket or Region
  • When using a VPC with S3, use VPC endpoints for S3, which
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between the VPC and S3, so the traffic never leaves the Amazon network
    • support both Gateway endpoints (free, for S3 and DynamoDB) and Interface endpoints (PrivateLink, for cross-region or on-premises access)

S3 Security Defaults (Since 2023)

  • Default Encryption: Since January 5, 2023, all new objects are automatically encrypted with SSE-S3 (AES-256) at no additional cost. You can override with SSE-KMS or SSE-C.
  • Block Public Access: Since April 2023, S3 Block Public Access is enabled by default and ACLs are disabled for all new buckets.
  • SSE-C Disabled by Default: New general purpose buckets automatically disable server-side encryption with customer-provided keys (SSE-C) as a security best practice.
  • Use S3 Access Grants for scalable, fine-grained access control — maps S3 permissions to corporate identities via IAM Identity Center.

Refer to the blog post: S3 Security Best Practices

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects:
    • S3 Standard — frequently accessed data
    • S3 Intelligent-Tiering — data with unknown or changing access patterns (automatically moves objects between Frequent, Infrequent, Archive Instant, Archive, and Deep Archive access tiers)
    • S3 Standard-IA — infrequent access, rapid retrieval needed
    • S3 One Zone-IA — infrequent access, non-critical data
    • S3 Glacier Instant Retrieval — archive data needing millisecond access
    • S3 Glacier Flexible Retrieval — archive with minutes to hours retrieval
    • S3 Glacier Deep Archive — lowest cost, 12-48 hour retrieval
    • S3 Express One Zone — highest performance, single-digit ms latency
  • Configure appropriate Lifecycle Management rules to automatically transition objects to lower-cost storage classes and expire them when no longer needed.
  • Use S3 Intelligent-Tiering as the default storage class for data with unpredictable access patterns — no retrieval charges, automatic optimization.
  • Use S3 Storage Lens to get organization-wide visibility into storage usage and activity trends, identify cost optimization opportunities, and apply data protection best practices.
  • Use S3 Storage Class Analysis to identify the optimal lifecycle policy for transitioning data to the right storage class.
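
To make the lifecycle idea concrete, here is a sketch of a lifecycle configuration written as a Python dictionary in the shape the S3 API accepts (for example via boto3's put_bucket_lifecycle_configuration). The rule ID, the logs/ prefix, and the day counts are illustrative assumptions, not recommendations; the helper only sanity-checks the ordering of the days.

```python
import json

# Hypothetical rule: tier log objects down over time, then expire them.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # assumed rule name
            "Filter": {"Prefix": "logs/"},      # assumed key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def transitions_are_ordered(rule):
    """Check that transition days strictly increase and expiration comes last."""
    days = [t["Days"] for t in rule.get("Transitions", [])]
    expiry = rule.get("Expiration", {}).get("Days")
    increasing = all(a < b for a, b in zip(days, days[1:]))
    return increasing and (expiry is None or not days or expiry > days[-1])


print(json.dumps(lifecycle_configuration, indent=2))
```

S3 Storage Class Analysis output is one reasonable source for choosing the actual day thresholds.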

Data Integrity

  • Use Conditional Writes (launched August 2024) to prevent overwriting existing objects
    • Supports If-None-Match header to check for object existence before creating
    • Supports If-Match header to check ETag before updating
    • Eliminates the need for external locking mechanisms (e.g., DynamoDB) for multi-writer applications
    • Can be enforced at the bucket level using bucket policies (November 2024)
  • Use S3 Object Lock for immutable data protection (compliance, ransomware protection)
  • Enable S3 Versioning to preserve every version of every object
  • Use additional checksums (CRC32, CRC32C, SHA-1, SHA-256) for end-to-end data integrity validation during uploads
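
The semantics of the two conditional headers can be sketched with a tiny in-memory model. This is not S3 client code: InMemoryBucket and PreconditionFailed are hypothetical names, the ETag is simply an MD5 of the body, and the model folds every failed condition into one error, whereas real S3 may report a missing key differently.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 response returned when a condition fails."""


class InMemoryBucket:
    """Toy model of conditional writes on a single bucket."""

    def __init__(self):
        self._objects = {}  # key -> (body, etag)

    def put(self, key, body, if_none_match=None, if_match=None):
        current = self._objects.get(key)
        # If-None-Match: * means "only create, never overwrite".
        if if_none_match == "*" and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        # If-Match: <etag> means "only overwrite the version I last read".
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(f"{key} changed since it was read")
        etag = hashlib.md5(body).hexdigest()
        self._objects[key] = (body, etag)
        return etag


bucket = InMemoryBucket()
first_etag = bucket.put("report.csv", b"v1", if_none_match="*")   # create succeeds
second_etag = bucket.put("report.csv", b"v2", if_match=first_etag)  # guarded update succeeds
```

A second writer that still holds first_etag would now get PreconditionFailed, which is the behaviour that removes the need for an external lock.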

Tracking and Monitoring

  • Use S3 Event Notifications with Amazon EventBridge for advanced event-driven architectures
    • Supports filtering by object size, key name patterns, metadata, and event time
    • Can route events to more than 20 AWS service targets
    • More flexible than legacy S3 Event Notifications (which only support SNS, SQS, and Lambda)
  • Use CloudTrail for API-level logging — captures all S3 API calls for auditing and compliance
  • Use S3 Server Access Logging for detailed access records (object-level access patterns)
  • Use CloudWatch to monitor S3 buckets, tracking metrics such as object counts, bytes stored, request counts, and latency
  • Use S3 Storage Lens for organization-wide visibility across all accounts and buckets with actionable recommendations
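
As an illustration of EventBridge filtering, the sketch below builds an event pattern that matches "Object Created" events for large objects under a given key prefix. The bucket name, prefix, and size threshold are assumptions chosen for the example; the pattern would be attached to an EventBridge rule after EventBridge notifications are enabled on the bucket.

```python
import json

# Hypothetical filter: objects over 1 MiB created under logs/ in one bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["example-bucket"]},          # assumed bucket name
        "object": {
            "key": [{"prefix": "logs/"}],                # key name pattern
            "size": [{"numeric": [">", 1048576]}],       # object size in bytes
        },
    },
}

# EventBridge rules take the pattern as a JSON string.
pattern_json = json.dumps(event_pattern)
print(pattern_json)
```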

S3 Monitoring and Auditing Best Practices

Refer to the blog post: S3 Monitoring and Auditing Best Practices

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale. (Since July 2018, S3 automatically partitions for high request rates. 150 PUT/s is well within the 3,500 PUT/s per prefix limit. Random prefixes are no longer needed.)
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don’t have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of images and videos in Amazon S3. The company is lean on funding and wants to minimize operational costs; however, they have an aggressive marketing plan and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases in traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers the company has today, because this is critical in understanding what their customer base will be in two years. (The number of customers does not matter)
    2. You must find out total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needs for each S3 bucket. (S3 provides unlimited storage; the key namespace design depends on the number of objects, not the total size)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should utilize an Amazon Simple Workflow Service activity worker that updates the user's data counter in Amazon DynamoDB. The activity worker will use Amazon Simple Email Service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an Amazon Relational Database Service relational database with a stored objects table that has a row for each stored object along with the size of each object. The upload server will query the aggregate consumption of the user in question (by first determining the files stored by the user, and then querying the stored objects table for the respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Good approach to use RDS, but with so many objects it might not be a good option)
    3. The company should write both the content length and the username of the file's owner as S3 metadata for the object. They should then create a file watcher to iterate over each object, aggregate the size for each user, and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (List operations on S3 not feasible)
    4. The company should create two separate Amazon Simple Storage Service buckets, one for data storage for free tier users and another for data storage for premium tier users. An Amazon Simple Workflow Service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary. (List operations on S3 are not feasible, and SNS does not address the email requirement)
  7. Your company hosts a social media website for storing and sharing documents. The web application allows users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your PHP front end backed by Elastic Load Balancing and an Auto Scaling fleet of Amazon Elastic Compute Cloud (EC2) instances that scale upon the average of bytes received (NetworkIn). After a file has been uploaded, it is copied to Amazon Simple Storage Service (S3). Amazon EC2 instances use an AWS Identity and Access Management (IAM) role that allows Amazon S3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the Auto Scaling group's Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Autoscaling launch Configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10gbps. (no info of current size and might increase cost)
    2. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Security Token Service (GetFederationToken). Securely pass the credentials and S3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to Amazon S3 using the given credentials and S3 prefix. (will not provide the ability to handle pause and restarts)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address for each EC2 instance (using the Auto Scaling launch configuration settings). Use an Amazon Route 53 round robin record set and HTTP health check to DNS load balance the app requests. This approach will significantly reduce the cost by bypassing Elastic Load Balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Security Token Service (GetFederationToken). Securely pass the credentials and S3 endpoint/prefix to your app. Implement client-side logic that uses the S3 multipart upload API to directly upload the file to Amazon S3 using the given credentials and S3 prefix. (multipart allows one to start uploading directly to S3 before the actual size is known or complete data is downloaded)
  8. If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH will give some randomness to start with instead of instanceId where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID

    📝 Note: Since July 2018, S3 no longer requires random prefixes for performance. S3 automatically partitions based on request patterns. However, this exam question may still appear as it tests understanding of the historical key naming optimization concept.

  9. A company wants to ensure that objects uploaded to their S3 bucket are never accidentally overwritten by concurrent writes from multiple application instances. Which S3 feature should they use? [Added 2024]
    1. S3 Versioning
    2. S3 Object Lock in Governance mode
    3. S3 Conditional Writes with If-None-Match header (Conditional writes (Aug 2024) allow checking object existence before creating, preventing accidental overwrites without external locking)
    4. S3 Bucket Policy with deny overwrite
  10. A company stores millions of objects in S3 with unpredictable access patterns. Some objects are accessed frequently for a few weeks, then rarely accessed again. Which storage class provides the most cost-effective solution without operational overhead? [Added 2024]
    1. S3 Standard with lifecycle policy to S3 Standard-IA
    2. S3 Intelligent-Tiering (Automatically moves objects between frequent, infrequent, and archive access tiers based on access patterns with no retrieval charges and no operational overhead)
    3. S3 One Zone-IA
    4. S3 Standard with manual storage class changes
  11. An application requires single-digit millisecond latency for read and write operations on objects stored in S3. The application processes millions of transactions per minute. Which S3 storage option provides the best performance? [Added 2024]
    1. S3 Standard with CloudFront caching
    2. S3 Standard with Transfer Acceleration
    3. S3 Express One Zone (Delivers consistent single-digit millisecond latency, up to 10x faster than S3 Standard, and supports millions of requests per minute. Uses directory buckets in a single AZ.)
    4. S3 Standard with provisioned capacity

References

AWS Simple Storage Service – S3

  • Amazon Simple Storage Service – S3 is a simple key-value object store designed for the Internet
  • provides unlimited storage space and works on the pay-as-you-use model. Service rates get cheaper as the usage volume increases
  • offers an extremely durable, highly available, and infinitely scalable data storage infrastructure at very low costs.
  • is Object-level storage (not Block level storage like EBS volumes) and cannot be used to host OS or dynamic websites (however, S3 can host static websites).
  • S3 resources e.g. buckets and objects are private by default.
  • As of March 2026, S3 stores more than 500 trillion objects and serves more than 200 million requests per second globally, across hundreds of exabytes of data.
  • S3 provides strong read-after-write consistency for all operations (PUT, GET, LIST, DELETE, HEAD) automatically, at no additional cost, in all AWS Regions.
  • Starting January 5, 2023, all new objects are automatically encrypted with SSE-S3 (server-side encryption with Amazon S3 managed keys) by default at no additional cost.
  • Starting April 2023, all new S3 buckets have S3 Block Public Access enabled and ACLs disabled by default.
  • Starting April 2026, SSE-C (server-side encryption with customer-provided keys) is disabled by default on all new S3 general purpose buckets.

S3 Bucket Types

  • Amazon S3 offers multiple bucket types designed for different use cases:
    • General Purpose Buckets – Standard buckets for most workloads, storing objects across multiple Availability Zones for high durability
    • Directory Buckets – Used with S3 Express One Zone storage class, stored in a single Availability Zone for lowest latency access
    • Table Buckets – Store Apache Iceberg tables for analytics workloads with built-in table maintenance and optimization
    • Vector Buckets – Purpose-built for storing and querying vector embeddings for AI/ML applications

S3 Buckets & Objects

S3 Buckets

  • A bucket is a container for objects stored in S3
  • Buckets help organize the S3 namespace.
  • A bucket is owned by the AWS account that creates it and helps identify the account responsible for storage and data transfer charges.
  • A bucket name is globally unique regardless of the AWS Region in which the bucket is created; the namespace is shared by all AWS accounts
  • Even though S3 is a global service, buckets are created within a region specified during the creation of the bucket.
  • Every object is contained in a bucket
  • There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether a single bucket or multiple buckets are used to store all the objects
  • The S3 data model is a flat structure i.e. there are no hierarchies or folders within the buckets. However, logical hierarchy can be inferred using the key name prefix e.g. Folder1/Object1
  • Restrictions
    • 10,000 general purpose buckets (default quota) per AWS account, with the ability to request up to 1 million buckets. (Updated Nov 2024: increased from the previous limit of 100)
    • Bucket names should be globally unique and DNS compliant
    • Bucket ownership is not transferable
    • Buckets cannot be nested and cannot have a bucket within another bucket
    • Bucket name and region cannot be changed, once created
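  • A bucket must be empty before it can be deleted; tools that appear to delete a non-empty bucket (e.g. aws s3 rb --force) remove the objects first
  • A single list request returns up to 1,000 objects; larger listings are retrieved through pagination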
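
The flat namespace, the inferred folder hierarchy, and the 1,000-key page size can be illustrated with a small in-memory model. The functions below are hypothetical stand-ins, not S3 API calls; they only mimic how a prefix and delimiter roll keys up into common prefixes and how a large listing is split into pages.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Return (contents, common_prefixes) the way a delimited listing groups keys."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is rolled up into one entry.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


def pages(keys, page_size=1000):
    """Yield the keys in sorted order, at most page_size per page."""
    ordered = sorted(keys)
    for start in range(0, len(ordered), page_size):
        yield ordered[start:start + page_size]


keys = ["Folder1/Object1", "Folder1/Sub/Object2", "Folder2/Object3", "root.txt"]
print(list_with_delimiter(keys))                     # top level: one object, two "folders"
print(list_with_delimiter(keys, prefix="Folder1/"))  # inside Folder1/
```

The "folders" exist only as shared key prefixes; nothing is stored for Folder1/ itself in this model.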

Objects

  • Objects are the fundamental entities stored in a bucket
  • An object is uniquely identified within a bucket by a key name and version ID (if S3 versioning is enabled on the bucket)
  • Objects consist of object data, metadata, and other attributes
    • Key is the object name and a unique identifier for an object
    • Value is actual content stored
    • Metadata is the data about the data and is a set of name-value pairs that describe the object e.g. content-type, size, last modified. Custom metadata can also be specified at the time the object is stored.
    • Version ID is the version id for the object and in combination with the key helps to uniquely identify an object within a bucket
    • Subresources help provide additional information for an object
    • Access Control Information helps control access to the objects
  • S3 objects allow two kinds of metadata
    • System metadata
      • Metadata such as the Last-Modified date is controlled by the system. Only S3 can modify the value.
      • System metadata that the user can control, e.g., the storage class, and encryption configured for the object.
    • User-defined metadata
      • User-defined metadata is assigned when the object is uploaded; changing it afterwards requires copying the object.
      • User-defined metadata is stored with the object and is returned when an object is downloaded
      • S3 does not process user-defined metadata.
      • User-defined metadata keys must begin with the prefix “x-amz-meta-”; otherwise S3 will not set the key-value pair as you define it
  • Object metadata cannot be modified in place after the object is uploaded; it can only be changed by performing a copy operation and setting the new metadata
  • Objects belonging to a bucket that reside in a specific AWS region never leave that region, unless explicitly copied using Cross Region Replication
  • Each object can be up to 5 TB in size
  • An object can be retrieved as a whole or partially
  • With Versioning enabled, current as well as previous versions of an object can be retrieved
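
The metadata prefix rule can be shown with two small helpers. Both function names are assumptions made for this sketch; they only show how user-defined names map to and from request or response headers, with names lower-cased as S3 stores them.

```python
METADATA_PREFIX = "x-amz-meta-"


def metadata_headers(user_metadata):
    """Turn user-defined metadata into the headers sent with an upload."""
    return {METADATA_PREFIX + name.lower(): str(value)
            for name, value in user_metadata.items()}


def user_metadata_from(headers):
    """Pick the user-defined metadata back out of response headers."""
    return {name.lower()[len(METADATA_PREFIX):]: value
            for name, value in headers.items()
            if name.lower().startswith(METADATA_PREFIX)}


print(metadata_headers({"Owner": "alice", "retention-days": 30}))
```

System metadata such as Content-Type travels in its own headers and is ignored by user_metadata_from.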

S3 Bucket & Object Operations

  • Listing
    • S3 allows the listing of all the keys within a bucket
    • A single listing request would return a max of 1000 object keys with pagination support using an indicator in the response to indicate if the response was truncated
    • Keys within a bucket can be listed using Prefix and Delimiter.
    • Prefix limits the results to only those keys that begin with the specified prefix (a form of filtering), and the delimiter causes the list to roll up all keys that share a common prefix into a single summary list result.
  • Retrieval
    • An object can be retrieved as a whole
    • An object can be retrieved in parts or partially (a specific range of bytes) by using the Range HTTP header.
    • Range HTTP header is helpful
      • if only part of an object is needed, e.g. when multiple files were uploaded as a single archive
      • for fault-tolerant downloads where the network connectivity is poor
    • Objects can also be downloaded by sharing Pre-Signed URLs
    • Metadata of the object is returned in the response headers
  • Object Uploads
    • Single Operation – Objects up to 5 GB in size can be uploaded in a single PUT operation
    • Multipart upload – must be used for objects larger than 5 GB and supports a maximum object size of 5 TB. It is recommended for objects larger than 100 MB.
    • Pre-Signed URLs can also be used and shared for uploading objects
    • A successful upload is confirmed by a successful response to the request. Additionally, for single-part uploads that are not encrypted with SSE-KMS or SSE-C, the returned ETag can be compared to the MD5 digest calculated for the uploaded object
  • Conditional Writes
    • S3 supports conditional writes using HTTP conditional headers to prevent unintended overwrites (Launched August 2024)
    • If-None-Match – prevents overwrites of existing objects by checking that no object with the same key exists; useful for write-once patterns
    • If-Match – ensures an object has not been modified since last read by comparing ETags; useful for read-modify-write patterns (Added November 2024)
    • Conditional writes can be enforced via bucket policies using s3:if-none-match and s3:if-match condition keys
    • Supported on PutObject, CompleteMultipartUpload, and CopyObject operations
    • Helps coordinate simultaneous writes from multiple writers without external locking mechanisms
  • Copying Objects
    • Objects up to 5 GB can be copied in a single operation; larger objects, up to 5 TB, are copied using the multipart upload API
    • When an object is copied
      • user-controlled system metadata, e.g. storage class, and user-defined metadata are also copied.
      • system-controlled metadata, e.g. the creation date, is reset
    • Copying objects can be needed to
      • Create multiple object copies
      • Copy objects across locations or regions
      • Rename objects
      • Change object metadata, e.g. storage class, encryption, etc.
      • Updating any metadata for an object requires all the metadata fields to be specified again
  • Deleting Objects
    • S3 allows deletion of a single object or multiple objects (max 1000) in a single call
    • For Non Versioned buckets,
      • the object key needs to be provided and the object is permanently deleted
    • For Versioned buckets,
      • if only an object key is provided, S3 inserts a delete marker and the previously current version becomes a noncurrent version
      • if an object key with a version ID is provided, that version is permanently deleted
      • if the version ID is that of a delete marker, the delete marker is removed and the most recent noncurrent version becomes the current version again
    • Deletion can be MFA enabled for adding extra security
  • Restoring Objects from Glacier
    • Objects must be restored before accessing an archived object stored in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive
    • S3 Glacier Instant Retrieval provides millisecond access without requiring a restore operation
    • Retrieval options for Glacier Flexible Retrieval include:
      • Expedited – 1-5 minutes
      • Standard – 3-5 hours
      • Bulk – 5-12 hours
    • Restoration request also needs to specify the number of days for which the object copy needs to be maintained.
    • During this period, storage cost applies for both the archive and the copy.
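
The delete behaviour for versioned buckets described above is easier to follow with a small model. VersionedKey below is a hypothetical in-memory stand-in for one key in a versioning-enabled bucket, with made-up version IDs; it is a sketch of the semantics, not of the S3 API.

```python
import itertools


class VersionedKey:
    """Toy model of one object key in a versioning-enabled bucket."""

    def __init__(self):
        self._ids = itertools.count(1)
        # Oldest first; body is None for a delete marker.
        self.versions = []

    def put(self, body):
        version_id = f"v{next(self._ids)}"
        self.versions.append((version_id, body))
        return version_id

    def delete(self, version_id=None):
        if version_id is None:
            # Key only: add a delete marker, keep every earlier version.
            marker_id = f"v{next(self._ids)}"
            self.versions.append((marker_id, None))
            return marker_id
        # Key plus version ID: permanently remove that version (or marker).
        self.versions = [v for v in self.versions if v[0] != version_id]
        return version_id

    def get(self):
        if not self.versions or self.versions[-1][1] is None:
            raise KeyError("no current version (missing key or delete marker)")
        return self.versions[-1][1]


key = VersionedKey()
v1 = key.put(b"first")
v2 = key.put(b"second")
marker = key.delete()      # object now appears deleted
key.delete(marker)         # removing the marker restores b"second"
```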

Pre-Signed URLs

  • All buckets and objects are by default private.
  • Pre-signed URLs allow a user to download or upload a specific object without requiring AWS security credentials or permissions.
  • A pre-signed URL allows anyone with the URL to access the object identified in it, provided the creator of the URL has permission to access that object.
  • Creating a pre-signed URL requires the creator to provide security credentials, a bucket name, an object key, an HTTP method (GET to download an object, PUT to upload one), and an expiration date and time
  • Pre-signed URLs are valid only until the expiration date and time.
  • Pre-signed URLs can have a maximum expiration of 7 days when generated using SigV4.
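
To show how the inputs listed above end up in the URL, here is a standard-library sketch of SigV4 query-string signing for a GET. It assumes long-term credentials (no session token), a virtual-hosted regional endpoint, and a hypothetical function name presign_get; in practice an SDK call such as boto3's generate_presigned_url does this work.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

MAX_EXPIRES = 7 * 24 * 3600  # SigV4 pre-signed URLs are capped at 7 days


def _hmac(key, message):
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket, key, region, access_key, secret_key, expires, now=None):
    """Build a SigV4 pre-signed GET URL (sketch; no session token support)."""
    if not 1 <= expires <= MAX_EXPIRES:
        raise ValueError("expires must be between 1 second and 7 days")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = amz_date[:8]
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date}/{region}/s3/aws4_request"
    path = "/" + quote(key, safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
                     for k, v in sorted(params.items()))
    # Canonical request: method, path, query, headers, signed headers, payload hash.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"
```

The expiry is part of the signed query string, so it cannot be extended by editing the URL; a new URL has to be generated instead.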

Multipart Upload

  • Multipart upload allows the user to upload a single large object as a set of parts. Each part is a contiguous portion of the object’s data.
  • Multipart uploads support 1 to 10000 parts and each part can be from 5MB to 5GB with the last part size allowed to be less than 5MB
  • Multipart uploads allow a max upload size of 5TB
  • Object parts can be uploaded independently and in any order. If transmission of any part fails, it can be retransmitted without affecting other parts.
  • After all parts of the object are uploaded and completion is requested, S3 assembles the parts and creates the object.
  • Using multipart upload provides the following advantages:
    • Improved throughput – parallel upload of parts to improve throughput
    • Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error.
    • Pause and resume object uploads – Object parts can be uploaded over time. Once a multipart upload is initiated there is no expiry; you must explicitly complete or abort the multipart upload.
    • Begin an upload before the final object size is known – an object can be uploaded as it is being created
  • Three Step process
    • Multipart Upload Initiation
      • Initiation of a Multipart upload request to S3 returns a unique ID for each multipart upload.
      • This ID needs to be provided for each part upload, completion or abort request and listing of parts call.
      • All the Object metadata required needs to be provided during the Initiation call
    • Parts Upload
      • Parts upload of objects can be performed using the unique upload ID
      • A part number (between 1 – 10000) needs to be specified with each request which identifies each part and its position in the object
      • If a part with the same part number is uploaded, the previous part would be overwritten
      • After the part upload is successful, S3 returns an ETag header in the response which must be recorded along with the part number to be provided during the multipart completion request
    • Multipart Upload Completion or Abort
      • On Multipart Upload Completion request, S3 creates an object by concatenating the parts in ascending order based on the part number and associates the metadata with the object
      • Multipart Upload Completion request should include the unique upload ID with all the parts and the ETag information
      • The response includes an ETag that uniquely identifies the combined object data
      • On a Multipart Upload Abort request, the upload is aborted and all parts are removed. Any new part upload would fail. However, any in-progress part upload is completed, and hence an abort request must be sent after all part uploads have completed.
      • S3 must receive a completion or abort request for every multipart upload; otherwise it does not delete the uploaded parts, and their storage continues to be charged.
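
The three-step process and the part limits above can be sketched as follows. This is an illustrative outline, not a production uploader: `client` is assumed to be a boto3 S3 client, the helper names `plan_parts` and `multipart_upload` are hypothetical, the sizes are treated as binary units (MiB/GiB), and parts are uploaded sequentially for brevity, although they could be sent in parallel.

```python
MIN_PART = 5 * 1024**2   # 5 MiB minimum for every part except the last
MAX_PART = 5 * 1024**3   # 5 GiB maximum part size
MAX_PARTS = 10_000


def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    count = -(-total_size // part_size)  # ceiling division
    if not 1 <= count <= MAX_PARTS:
        raise ValueError("upload must have between 1 and 10,000 parts")
    return [
        (n + 1, n * part_size, min(part_size, total_size - n * part_size))
        for n in range(count)
    ]


def multipart_upload(client, bucket, key, data, part_size=MIN_PART):
    # Step 1: initiate. The returned UploadId ties all later calls together.
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        parts = []
        # Step 2: upload each part and record its ETag with its part number.
        for number, offset, length in plan_parts(len(data), part_size):
            response = client.upload_part(
                Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=number, Body=data[offset:offset + length],
            )
            parts.append({"PartNumber": number, "ETag": response["ETag"]})
        # Step 3: complete. S3 concatenates parts in ascending part-number order.
        return client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already-uploaded parts stop accruing storage charges.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

For example, a 100 MiB object with 8 MiB parts yields 13 parts, the last of which is 4 MiB and is allowed to fall below the 5 MiB minimum.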

S3 Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and a bucket.
  • Transfer Acceleration takes advantage of CloudFront‘s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path.
  • Transfer Acceleration incurs additional charges, whereas uploading data to S3 over the public Internet is free.
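
Clients opt in to Transfer Acceleration by sending requests to the bucket's accelerate endpoint instead of the regular one. The sketch below only builds that endpoint. The function name `accelerate_endpoint` is hypothetical, and it assumes acceleration has already been enabled on the bucket.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint for a bucket."""
    # Accelerated buckets must have DNS-compliant names without periods.
    if "." in bucket:
        raise ValueError("Transfer Acceleration requires a bucket name without periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"
```

With boto3, the same effect is usually achieved through client configuration (`Config(s3={"use_accelerate_endpoint": True})`) rather than by building URLs by hand.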

S3 Batch Operations

  • S3 Batch Operations help perform large-scale batch operations on S3 objects and can perform a single operation on lists of specified S3 objects.
  • A single job can perform a specified operation on billions of objects containing exabytes of data.
  • S3 tracks progress, sends notifications, and stores a detailed completion report of all actions, providing a fully managed, auditable, and serverless experience.
  • Batch Operations can be used with S3 Inventory to get the object list and use S3 Select to filter the objects.
  • Batch Operations can be used for copying objects, modifying object metadata, applying ACLs, encrypting objects, transforming objects, invoking a custom Lambda function, etc.
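
When the object list is not taken from S3 Inventory, a job can read a CSV manifest with one `bucket,key` row per object. The sketch below writes such a manifest. The function name `build_manifest` is hypothetical, and the encoding is an assumption of this example: keys are URL-encoded but `/` is left as-is.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket, keys):
    """Return CSV manifest text with one `bucket,key` row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Assumption: URL-encode the key, keeping "/" readable.
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()
```

The resulting text would be uploaded to S3 and referenced when the job is created. An optional third column can carry a version ID for versioned buckets.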

S3 Express One Zone

  • S3 Express One Zone is a high-performance, single-Availability Zone storage class designed for latency-sensitive applications (Launched November 2023)
  • Delivers data access speeds up to 10x faster and request costs up to 50-80% lower than S3 Standard
  • First S3 storage class where you can select a specific Availability Zone to co-locate storage with compute resources
  • Uses directory buckets instead of general purpose buckets, with a hierarchical namespace using forward slash (/) as delimiter
  • Designed for 99.95% availability within a single Availability Zone (vs. 99.99% for S3 Standard across multiple AZs)
  • Supports up to 200,000 reads and 100,000 writes per second per directory bucket
  • Ideal use cases:
    • Machine learning training and inference
    • Interactive analytics
    • Media content creation
    • High-performance computing (HPC)
    • Financial modeling
  • Uses session-based authentication (CreateSession API) for optimized request handling
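
Because a directory bucket lives in one Availability Zone, its name embeds the zone ID in the form `base-name--zone-id--x-s3`. The sketch below builds and parses such names so storage can be matched to the AZ of the compute. The helper names are hypothetical, and the zone ID pattern (for example `usw2-az1`) is an assumption that may not cover every zone type.

```python
import re

_DIRECTORY_BUCKET = re.compile(
    r"(?P<base>[a-z0-9][a-z0-9-]*?)--(?P<zone>[a-z0-9]+-az[0-9]+)--x-s3"
)


def directory_bucket_name(base, zone_id):
    """Compose a directory bucket name for the given Availability Zone ID."""
    return f"{base}--{zone_id}--x-s3"


def zone_of(bucket_name):
    """Return the zone ID embedded in a directory bucket name, or None."""
    match = _DIRECTORY_BUCKET.fullmatch(bucket_name)
    return match.group("zone") if match else None
```

A scheduler could compare `zone_of(bucket)` with the zone ID of an instance to keep training jobs next to their data.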

S3 Tables (Apache Iceberg)

  • S3 Tables provide the first cloud object store with built-in Apache Iceberg support (Launched December 2024)
  • Optimized for analytics workloads with up to 3x faster query throughput and up to 10x higher transactions per second compared to self-managed tables
  • Stores tabular data in table buckets with tables as subresources
  • Provides automatic table maintenance including compaction, snapshot management, and unreferenced file removal
  • Supports Intelligent-Tiering access tiers for automatic cost optimization (Added 2025)
  • Integrates with analytics engines like Apache Spark, Trino, and Amazon Athena
  • Use cases: data lakes, business analytics, real-time analytics, and ML feature stores

S3 Vectors

  • S3 Vectors is the first cloud object storage with native support for storing and querying vector data (GA December 2025)
  • Reduces the total cost of storing and querying vectors by up to 90% compared to specialized vector database solutions
  • Uses a new bucket type — vector bucket — optimized for durable, low-cost vector storage
  • Supports up to 2 billion vectors per index and 10,000 vector indexes per vector bucket
  • Delivers sub-second latency for infrequent queries and ~100ms for frequent queries
  • Supports up to 50 metadata keys alongside each vector for fine-grained filtering
  • Ideal use cases:
    • AI agent persistent memory
    • Retrieval Augmented Generation (RAG)
    • Semantic search
    • Recommendation systems

S3 Files

  • S3 Files makes S3 buckets accessible as high-performance file systems on AWS compute resources (Launched April 2026)
  • First and only cloud object store that provides fully-featured, high-performance file system access via NFS v4.2
  • Provides full file system semantics with sub-millisecond latency on small files
  • Changes to data on the file system are automatically reflected in the S3 bucket
  • Can be attached to multiple compute resources enabling data sharing across clusters without duplication
  • Supported on EC2, Lambda, EKS, and ECS
  • Eliminates the tradeoff between object storage benefits and interactive file capabilities
  • Use cases: AI/ML training, legacy application migration, shared data access across compute

S3 Metadata

  • S3 Metadata automatically captures metadata for objects in general purpose buckets and stores it in read-only, fully managed Apache Iceberg tables (Preview Dec 2024, enhanced 2025)
  • Provides two types of metadata tables:
    • Journal table – records changes as objects are added or modified
    • Live inventory table – provides a complete current snapshot of all objects and their metadata
  • Accelerates data discovery for analytics, AI/ML model training, and content retrieval
  • Supports metadata for existing objects via backfill (Added July 2025)
  • Queryable using standard SQL via Amazon Athena, Spark, and other analytics engines

S3 Access Grants

  • S3 Access Grants provide a simplified model for defining access permissions to S3 data by prefix, bucket, or object (Launched November 2023)
  • Maps corporate identities from directories (Microsoft Entra ID, Okta) directly to S3 datasets without requiring IAM principal mapping
  • Integrates with AWS IAM Identity Center for trusted identity propagation
  • Logs end-user identity and application used to access S3 data in AWS CloudTrail
  • Integrates with AWS Glue, Amazon Redshift, and Lake Formation for analytics workloads
  • Provides fine-grained access control at the prefix or object level

Mountpoint for Amazon S3

  • Mountpoint for Amazon S3 is an open-source file client that mounts an S3 bucket as a local file system on Linux instances (GA August 2023)
  • Translates local file system API calls to S3 REST API calls automatically
  • Optimized for high-throughput read-heavy workloads (sequential and random reads, sequential writes)
  • Available as a CSI driver for Kubernetes/EKS containerized workloads
  • Backed by AWS support for customers with Business and Enterprise Support plans
  • Use cases: data lakes, machine learning training, HPC, media processing
  • Note: For full file system semantics including NFS access, see S3 Files (launched April 2026)

Virtual Hosted Style vs Path-Style Request

S3 allows the buckets and objects to be referred to in Path-style or Virtual hosted-style URLs

Path-style

  • Bucket name is not part of the domain (unless a region-specific endpoint is used)
  • The endpoint used must match the region in which the bucket resides. For example, if a bucket called mybucket resides in the EU (Ireland) region and contains an object named puppy.jpg, the correct path-style URI is http://s3-eu-west-1.amazonaws.com/mybucket/puppy.jpg.
  • A “PermanentRedirect” error is received with an HTTP response code 301, and a message indicating what the correct URI is for the resource if a bucket is accessed outside the US East (N. Virginia) region with path-style syntax that uses either of the following:
    • http://s3.amazonaws.com
    • An endpoint for a region different from the one where the bucket resides, for example http://s3-eu-west-1.amazonaws.com for a bucket that was created in the US West (N. California) region
  • Path-style URLs were planned for deprecation after September 30, 2020, but AWS has indefinitely delayed this plan. Virtual hosted-style is still recommended for all new implementations.

Virtual hosted-style

  • S3 supports virtual hosted-style and path-style access in all regions.
  • In a virtual hosted-style URL, the bucket name is part of the domain name in the URL, for example http://bucketname.s3.amazonaws.com/objectname
  • S3 virtual hosting can be used to address a bucket in a REST API call by using the HTTP Host header
  • Benefits
    • attractiveness of customized URLs,
    • provides an ability to publish to the “root directory” of the bucket’s virtual server. This ability can be important because many existing applications search for files in this standard location.
  • S3 updates DNS to reroute the request to the correct location when a bucket is created in any region, which might take time.
  • If the US East (N. Virginia) endpoint s3.amazonaws.com is used instead of the region-specific endpoint (for example, s3-eu-west-1.amazonaws.com), S3 routes virtual hosted-style requests to the US East (N. Virginia) region by default and then redirects them to the correct region with an HTTP 307 redirect.
  • When using virtual hosted-style buckets with SSL, the SSL wildcard certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic.
  • If you make a request to the http://bucket.s3.amazonaws.com endpoint, the DNS has sufficient information to route the request directly to the region where your bucket resides.
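
The difference between the two styles comes down to where the bucket name appears in the URL. The sketch below builds both forms. The function names are hypothetical, and it uses the `s3.<region>` endpoint form rather than the older `s3-<region>` form shown in the examples above.

```python
from urllib.parse import quote


def path_style_url(bucket, key, region):
    """Bucket name appears in the path, after the regional endpoint."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket, key, region):
    """Bucket name appears as a subdomain of the regional endpoint."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def wildcard_cert_matches(bucket):
    """A wildcard certificate covers one DNS label, so periods break the match."""
    return "." not in bucket
```

For the bucket mybucket in eu-west-1, the two functions return URLs that differ only in the position of `mybucket`, while `wildcard_cert_matches("my.bucket")` returns `False`.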

S3 Pricing

  • S3 costs vary by Region
  • S3 pricing has dropped approximately 85% since launch, with current rates as low as ~$0.021/GB/month for S3 Standard in US regions
  • Charges are incurred for
    • Storage – cost is per GB/month
    • Requests – per-request cost varies by request type (e.g., GET, PUT)
    • Data Transfer
      • data transfer-in is free
      • data transfer out is charged per GB (except within the same region or to Amazon CloudFront)
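
A rough monthly estimate simply adds the three charge categories. In the sketch below, the storage rate is the approximate figure quoted above. The request and transfer rates are placeholder assumptions for illustration only and should be replaced with the current rates for the region in question. The function name `estimate_monthly_cost` is hypothetical.

```python
def estimate_monthly_cost(storage_gb, put_requests, get_requests, transfer_out_gb,
                          storage_rate=0.021,      # USD per GB-month (approximate)
                          put_per_1000=0.005,      # assumed USD per 1,000 PUTs
                          get_per_1000=0.0004,     # assumed USD per 1,000 GETs
                          transfer_out_rate=0.09): # assumed USD per GB out
    """Return a per-category cost breakdown in USD for one month."""
    costs = {
        "storage": storage_gb * storage_rate,
        "requests": (put_requests / 1000) * put_per_1000
                    + (get_requests / 1000) * get_per_1000,
        "transfer_out": transfer_out_gb * transfer_out_rate,
        "transfer_in": 0.0,  # data transfer in is free
    }
    costs["total"] = sum(costs.values())
    return costs
```

With these assumed rates, 500 GB stored, 100,000 PUTs, 1,000,000 GETs and 50 GB transferred out come to about 10.50 + 0.90 + 4.50 = 15.90 USD.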

S3 Select (Maintenance Mode)

  • S3 Select is closed to new customers as of July 25, 2024. Existing customers can continue to use the service.
  • S3 Select enabled applications to retrieve only a subset of data from an object using simple SQL expressions
  • Recommended alternatives: S3 Object Lambda, Amazon Athena, or S3 Metadata with Apache Iceberg for querying object data

Additional Topics

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon S3 stand for?
    1. Simple Storage Solution.
    2. Storage Storage Storage (triple redundancy Storage).
    3. Storage Server Solution.
    4. Simple Storage Service
  2. What are characteristics of Amazon S3? Choose 2 answers
    1. Objects are directly accessible via a URL
    2. S3 should be used to host a relational database
    3. S3 allows you to store objects of virtually unlimited size
    4. S3 allows you to store virtually unlimited amounts of data
    5. S3 offers Provisioned IOPS
  3. You are building an automated transcription service in which Amazon EC2 worker instances process an uploaded audio file and generate a text file. You must store both of these files in the same durable storage until the text file is retrieved. You do not know what the storage capacity requirements are. Which storage option is both cost-efficient and scalable?
    1. Multiple Amazon EBS volume with snapshots
    2. A single Amazon Glacier vault
    3. A single Amazon S3 bucket
    4. Multiple instance stores
  4. A user wants to upload a complete folder to AWS S3 using the S3 Management console. How can the user perform this activity?
    1. Just drag and drop the folder using the flash tool provided by S3
    2. Use the Enable Enhanced Folder option from the S3 console while uploading objects
    3. The user cannot upload the whole folder in one go with the S3 management console
    4. Use the Enable Enhanced Uploader option from the S3 console while uploading objects (NOTE – The S3 console now natively supports folder upload via drag and drop without any special option)
  5. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  6. A company is deploying a two-tier, highly available web application to AWS. Which service provides durable storage for static content while utilizing lower Overall CPU resources for the web tier?
    1. Amazon EBS volume
    2. Amazon S3
    3. Amazon EC2 instance store
    4. Amazon RDS instance
  7. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  8. When you put objects in Amazon S3, what is the indication that an object was successfully stored?
    1. Each S3 account has a special bucket named _s3_logs. Success codes are written to this bucket with a timestamp and checksum.
    2. A success code is inserted into the S3 object metadata.
    3. A HTTP 200 result code and MD5 checksum, taken together, indicate that the operation was successful.
    4. Amazon S3 is engineered for 99.999999999% durability. Therefore there is no need to confirm that data was inserted.
  9. You have private video content in S3 that you want to serve to subscribed users on the Internet. User IDs, credentials, and subscriptions are stored in an Amazon RDS database. Which configuration will allow you to securely serve private content to your users?
    1. Generate pre-signed URLs for each user as they request access to protected S3 content
    2. Create an IAM user for each subscribed user and assign the GetObject permission to each IAM user
    3. Create an S3 bucket policy that limits access to your private content to only your subscribed users’ credentials
    4. Create a CloudFront Origin Identity user for your subscribed users and assign the GetObject permission to this user
  10. You run an ad-supported photo sharing website using S3 to serve photos to visitors of your site. At some point you find out that other sites have been linking to the photos on your site, causing loss to your business. What is an effective method to mitigate this?
    1. Remove public read access and use signed URLs with expiry dates.
    2. Use CloudFront distributions for static content.
    3. Block the IPs of the offending websites in Security Groups.
    4. Store photos on an EBS volume of the web server.
  11. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale. (S3 automatically scales to handle at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix, with no prefix randomization needed)
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  12. What is the maximum number of S3 buckets available per AWS Account?
    1. 100 Per region
    2. There is no Limit
    3. 100 Per Account (Previously correct, but updated Nov 2024)
    4. 500 Per Account
    5. 100 Per IAM User
    6. 10,000 Per Account (default), up to 1 million per account by request (Updated Nov 2024)
  13. Your customer needs to create an application to allow contractors to upload videos to Amazon Simple Storage Service (S3) so they can be transcoded into a different format. She creates AWS Identity and Access Management (IAM) users for her application developers, and in just one week, they have the application hosted on a fleet of Amazon Elastic Compute Cloud (EC2) instances. The attached IAM role is assigned to the instances. As expected, a contractor who authenticates to the application is given a pre-signed URL that points to the location for video upload. However, contractors are reporting that they cannot upload their videos. Which of the following are valid reasons for this behavior? Choose 2 answers { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:*", "Resource": "*" } ] }
    1. The IAM role does not explicitly grant permission to upload the object. (The role has all permissions for all activities on S3)
    2. The contractors’ accounts have not been granted “write” access to the S3 bucket. (with pre-signed URLs the contractors’ accounts do not need access; only the creator of the pre-signed URLs does)
    3. The application is not using valid security credentials to generate the pre-signed URL.
    4. The developers do not have access to upload objects to the S3 bucket. (developers are not uploading the objects; the uploads use pre-signed URLs)
    5. The S3 bucket still has the associated default permissions. (does not matter as long as the user has permission to upload)
    6. The pre-signed URL has expired.
  14. A company wants to prevent concurrent writers from accidentally overwriting each other’s data in Amazon S3. Which S3 feature should they use?
    1. S3 Object Lock
    2. S3 Versioning with MFA Delete
    3. S3 Conditional Writes with If-None-Match or If-Match headers
    4. S3 Block Public Access
  15. A machine learning team needs the lowest latency access to frequently accessed training data stored in S3, and their compute resources are in a single Availability Zone. Which S3 storage class is MOST appropriate?
    1. S3 Standard
    2. S3 Intelligent-Tiering
    3. S3 Express One Zone
    4. S3 One Zone-Infrequent Access
  16. An organization wants to grant S3 data access to users based on their corporate directory identity without creating individual IAM users. Which S3 feature enables this? [Choose 1]
    1. S3 Bucket Policies with IAM conditions
    2. S3 ACLs with cross-account access
    3. S3 Access Grants with IAM Identity Center
    4. S3 Object Lambda Access Points
  17. Which of the following are S3 bucket types available as of 2025? (Choose 3)
    1. General purpose buckets
    2. Directory buckets
    3. Archive buckets
    4. Table buckets
    5. Compute buckets
  18. A data engineering team needs to automatically track and query metadata about millions of objects in their S3 bucket for data discovery. Which service should they use?
    1. S3 Inventory with Athena
    2. S3 Select with SQL queries
    3. S3 Metadata with managed Apache Iceberg tables
    4. AWS Glue Data Catalog