AWS Storage Options – S3 & Glacier

Amazon S3

  • highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
  • provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.
  • allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
  • number of objects you can store in an Amazon S3 bucket is virtually unlimited.
  • highly secure, supporting encryption at rest, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.
  • highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
  • provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.

Ideal Use Cases

  • Storage & Distribution of static web content and media
    • frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
    • works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites as no storage provisioning is required
    • content can be served directly from Amazon S3, since each object in Amazon S3 has a unique HTTP URL
    • can also act as an origin store for a Content Delivery Network (CDN) such as Amazon CloudFront
    • it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
  • Data Store for Large Objects
    • can be paired with RDS or a NoSQL database and used to store large objects, e.g. files, while the associated metadata, e.g. name, tags, comments, can be stored in RDS or the NoSQL database where it can be indexed and queried, providing faster access to relevant data
  • Data store for computation and large-scale analytics
    • commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
    • data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
  • Backup and Archival of critical data
    • used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
    • because S3 stores objects redundantly on multiple devices across multiple facilities, it provides the highly durable storage infrastructure needed for these scenarios
    • its versioning capability is available to protect critical data from inadvertent deletion

Anti-Patterns

Amazon S3 has the following anti-patterns where it is not an optimal solution

  • Dynamic website hosting
    • While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server-side interaction, scripting, or database interaction cannot be hosted on S3 and should instead be hosted on Amazon EC2
  • Backup and archival storage
    • Data requiring long term archival storage with infrequent read access can be stored more cost effectively in Amazon Glacier
  • Structured Data Query
    • Amazon S3 doesn't offer query capabilities, so the exact object key must be known to retrieve an object. Instead, pair S3 with RDS or DynamoDB to store, index, and query metadata about Amazon S3 objects
  • Rapidly Changing Data
    • Data that needs to be updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS, or DynamoDB.
  • File System
    • Amazon S3 uses a flat namespace and isn't meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly the '/' character) in key names, you are able to construct keys that emulate the hierarchical folder structure of a file system within a given bucket.

Performance

  • Access to Amazon S3 from within Amazon EC2 in the same region is fast.
  • Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.
  • Amazon S3 is also built to scale storage, requests, and users to support a virtually unlimited number of web-scale applications.
  • If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.

Durability & Availability

  • Amazon S3 storage provides the highest level of data durability and availability by automatically and synchronously storing your data across both multiple devices and multiple facilities within the selected geographical region
  • Error correction is built-in, and there are no single points of failure. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.
  • Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
  • Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
  • Versioning can be enabled with MFA (Multi Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object
  • For non-critical and reproducible data, e.g. thumbnails, transcoded media, etc., S3 Reduced Redundancy Storage (RRS) can be used, which provides a lower level of durability at a lower storage cost
  • RRS is designed to provide 99.99% durability per object over a given year. While RRS is less durable than standard Amazon S3, it is still designed to provide 400 times more durability than a typical disk drive
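
As a sketch (the bucket name is a placeholder), versioning can be enabled with the Python SDK (boto3); MFA Delete additionally requires the root account's MFA device and is shown only as a comment:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning so overwrites and deletes create new versions
# instead of destroying data.
s3.put_bucket_versioning(
    Bucket="example-critical-data-bucket",          # placeholder bucket name
    VersioningConfiguration={"Status": "Enabled"},
)

# MFA Delete can additionally be enabled (root credentials and an MFA
# device are required), e.g.:
# s3.put_bucket_versioning(
#     Bucket="example-critical-data-bucket",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
#     MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
# )
```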

Cost Model

  • With Amazon S3, you pay only for what you use and there is no minimum fee.
  • Amazon S3 has three pricing components: storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

Scalability & Elasticity

  • Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically
  • Amazon S3 supports a virtually unlimited number of files in any bucket
  • Amazon S3 bucket can store a virtually unlimited number of bytes
  • Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information to other servers in other locations in the same region, all using Amazon’s high-performance infrastructure.

Interfaces

  • Amazon S3 provides standards-based REST and SOAP web services APIs for both management and data operations.
  • Amazon S3 provides easier-to-use, higher-level SDKs in different languages (Java, .NET, PHP, and Ruby) that wrap the underlying APIs
  • The Amazon S3 Command Line Interface (CLI) provides a set of high-level, Linux-like file commands for common operations, such as ls, cp, mv, sync, etc. It also provides the ability to perform recursive uploads and downloads using a single folder-level command, and supports parallel transfers.
  • AWS Management Console provides the ability to easily create and manage Amazon S3 buckets, upload and download objects, and browse the contents of your Amazon S3 buckets using a simple web-based user interface
  • All interfaces provide the ability to store Amazon S3 objects (files) in uniquely named buckets (top-level folders), with each object identified by a unique object key within that bucket (see the SDK sketch below).
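
For example, a minimal boto3 (Python SDK) sketch that creates a bucket and stores, retrieves, and lists objects; the bucket name, region, and keys are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket; names are globally unique, and the region is set
# through a location constraint (omit it for US East (N. Virginia)).
s3.create_bucket(
    Bucket="example-unique-bucket-name",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Store an object under a key and read it back.
s3.put_object(Bucket="example-unique-bucket-name",
              Key="docs/hello.txt", Body=b"hello s3")
body = s3.get_object(Bucket="example-unique-bucket-name",
                     Key="docs/hello.txt")["Body"].read()

# List the keys in the bucket (a single call returns up to 1000 keys).
for item in s3.list_objects_v2(Bucket="example-unique-bucket-name").get("Contents", []):
    print(item["Key"], item["Size"])
```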

Glacier

  • extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival
  • customers can reliably store data for as little as $0.01 per gigabyte per month
  • offloads the administrative burdens of operating and scaling storage to AWS, such as capacity planning, hardware provisioning, data replication, hardware failure detection and repair, and time-consuming hardware migrations
  • Data is stored in Amazon Glacier as Archives where an archive can represent a single file or multiple files combined into a single archive
  • Archives are stored in Vaults for which the access can be controlled through IAM
  • Retrieving archives from Vaults requires initiation of a job and typically takes 3-5 hours
  • Amazon Glacier integrates seamlessly with Amazon S3 by using S3 data lifecycle management policies to move data from S3 to Glacier
  • AWS Import/Export can also be used to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport
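
As a sketch of the S3 lifecycle integration mentioned above (bucket name, prefix, and day counts are placeholder choices), a boto3 rule can transition objects to Glacier and later expire them:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",                 # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},       # apply only to this key prefix
                "Status": "Enabled",
                # Move objects to Glacier 30 days after creation...
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # ...and delete them altogether after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```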

Ideal Usage Patterns

  • Amazon Glacier is ideally suited as a long-term archival solution for infrequently accessed data, such as archiving offsite enterprise information, media assets, research and scientific data, digital preservation, and magnetic tape replacement

Anti-Patterns

Amazon Glacier has the following anti-patterns where it is not an optimal solution

  • Rapidly changing data
    • Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies such as Amazon EBS or a Database
  • Real time access
    • Data stored in Glacier cannot be accessed in real time and requires initiation of a job for object retrieval, with retrieval times ranging from 3-5 hours. If immediate access is needed, Amazon S3 is a better choice.

Performance

  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived.
  • Amazon Glacier jobs typically complete in 3 to 5 hours

Durability and Availability

  • Amazon Glacier redundantly stores data in multiple facilities and on multiple devices within each facility
  • Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive
  • Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
  • Amazon Glacier also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • Amazon Glacier has three pricing components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand UPLOAD and RETRIEVAL requests per month).
  • Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual and that data will be stored for extended periods of time; it allows you to retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. Any additional data retrieved is charged per GB
  • Amazon Glacier also charges a pro-rated charge (per GB) for items deleted prior to 90 days

Scalability & Elasticity

  • A single archive is limited to 40 TB, but there is no limit to the total amount of data you can store in the service.
  • Amazon Glacier scales to meet growing and often unpredictable storage requirements; whether you’re storing petabytes or gigabytes, Amazon Glacier automatically scales your storage up or down as needed.

Interfaces

  • Amazon Glacier provides a native, standards-based REST web services interface, as well as Java and .NET SDKs.
  • AWS Management Console or the Amazon Glacier APIs can be used to create vaults to organize the archives in Amazon Glacier.
  • Amazon Glacier APIs can be used to upload and retrieve archives, monitor the status of your jobs and also configure your vault to send you a notification via Amazon Simple Notification Service (Amazon SNS) when your jobs complete.
  • Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management to provide automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
  • The Amazon S3 API provides a RESTORE operation, and the retrieval process takes the same 3-5 hours
  • On retrieval, a copy of the retrieved object is placed in Amazon S3 RRS storage for a specified retention period; the original archived object remains stored in Amazon Glacier and you are charged for both the archive and the copy.
  • When using Amazon Glacier as a storage class in Amazon S3, use the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the Amazon Glacier APIs
  • Objects archived to Amazon Glacier via Amazon S3 can only be listed and retrieved via the Amazon S3 APIs or the AWS Management Console—they are not visible as archives in an Amazon Glacier vault.
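
A minimal boto3 sketch of the S3 RESTORE flow described above (bucket and key are placeholders); the restored copy is kept for the requested number of days while the archive remains in Glacier:

```python
import boto3

s3 = boto3.client("s3")

# Initiate a restore job for an object archived to Glacier by a lifecycle rule.
s3.restore_object(
    Bucket="example-archive-bucket",                 # placeholder
    Key="logs/2015/01/01/access.log",                # placeholder
    RestoreRequest={"Days": 7},                      # keep the restored copy for 7 days
)

# The job typically completes in 3-5 hours; the object's Restore header
# reports progress, e.g. 'ongoing-request="true"' while it is still running.
head = s3.head_object(Bucket="example-archive-bucket",
                      Key="logs/2015/01/01/access.log")
print(head.get("Restore"))
```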

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You want to pass queue messages that are 1GB each. How should you achieve this?
    1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
    2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Refer link)
    3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
    4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
  2. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image


AWS S3 Best Practices – Certification

S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
  • If the typical workload involves only occasional bursts of 100 requests per second and fewer than 800 requests per second, S3 scales automatically to handle it.
  • If the typical workload exceeds 300 PUT/LIST/DELETE requests per second or 800 GET requests per second for a bucket, it is recommended to open a support case to prepare for the workload and avoid any temporary limits on your request rate.
  • These S3 best practice guidelines apply only if you are routinely processing 100 or more requests per second
  • Workloads that include a mix of request types
    • If the request workload is typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for the objects ensures better performance by providing low-latency access to the S3 index
    • This behavior is driven by how S3 stores key names.
      • S3 maintains an index of object key names in each AWS region.
      • Object keys are stored lexicographically (UTF-8 binary ordering) across multiple partitions in the index, i.e. S3 stores key names in alphabetical order.
      • The key name dictates which index partition the key is stored in
      • Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that S3 will target a specific partition for a large number of keys, overwhelming the I/O capacity of the partition.
    • Introducing some randomness in the key name prefixes distributes the key names, and therefore the I/O load, across multiple index partitions (see the sketch after this list).
    • It also ensures scalability regardless of the number of requests sent per second.
  • Workloads that are GET-intensive
    • Cloudfront can be used for performance optimization and can help by
      • distributing content with low latency and high data transfer rate.
      • caching the content and thereby reducing the number of direct requests to S3
      • providing multiple endpoints (Edge locations) for data availability
      • available in two flavors as Web distribution or RTMP distribution
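
A minimal illustration of the key-prefix randomization described above for mixed request workloads; the hash length and key layout are arbitrary choices, not an AWS requirement:

```python
import hashlib

def randomized_key(original_key: str) -> str:
    """Prepend a short hash so keys spread across multiple index partitions."""
    prefix = hashlib.md5(original_key.encode("utf-8")).hexdigest()[:4]
    return prefix + "/" + original_key

# A purely sequential, timestamp-based key targets a single partition;
# the 4-character hash prefix distributes such keys across the index.
print(randomized_key("2015-08-15-09-00-00/cust1234/photo1.jpg"))
```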

PUTs/GETs for Large Objects

  • AWS allows parallelizing PUT/GET requests to improve upload and download performance, as well as the ability to recover in case a transfer fails
  • For PUTs, Multipart upload can help improve the uploads by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload needs to be re-uploaded
    • ability to pause and resume uploads
    • begin an upload before the Object size is known
  • For GETs, range http header can help to improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retrieved again
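
A boto3 sketch of both ideas (bucket, file names, and part sizes are placeholder choices): the high-level transfer manager parallelizes multipart PUTs, and a Range header retrieves only part of an object:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Parallel multipart PUT: split into 100 MB parts, up to 10 concurrent threads.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=100 * 1024 * 1024,
                        max_concurrency=10)
s3.upload_file("video.mp4", "example-media-bucket", "videos/video.mp4",
               Config=config)

# Ranged GET: fetch only the first 1 MB of the object.
part = s3.get_object(Bucket="example-media-bucket",
                     Key="videos/video.mp4",
                     Range="bytes=0-1048575")
first_mb = part["Body"].read()
```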

List Operations

  • Object key names are stored lexicographically in Amazon S3 indexes, making it hard to sort and manipulate the contents of LIST
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain a secondary index outside of S3, e.g. in DynamoDB or RDS, to store, index, and query object metadata rather than performing list operations on S3

Security

  • Use Versioning
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or rollback to previous versions
  • Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) delete
  • Versioning does not prevent bucket deletion; the bucket must be backed up, because if it is accidentally or maliciously deleted the data is lost
  • Use Cross Region replication feature to backup data to a different region
  • When using VPC with S3, use VPC S3 endpoints as
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between VPC and S3 and the traffic never leaves the Amazon network

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects
  • Configure appropriate lifecycle management rules to move objects to different storage classes and expire them

Tracking

  • Use Event Notifications to be notified for any put or delete request on the S3 objects
  • Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket
  • Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored and configure appropriate actions

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don’t have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of image and videos in Amazon S3. The company is lean on funding, and wants to minimize operational costs, however, they have an aggressive marketing plan, and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases to traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers that company has today, because this is critical in understanding what their customer base will be in two years. (No. of customers do not matter)
    2. You must find out total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needs for each S3 bucket. (S3 provides unlimited storage; the key namespace design would depend on the number of objects, not the amount of storage)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should utilize an amazon simple workflow service activity worker that updates the users data counter in amazon dynamo DB. The activity worker will use simple email service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an amazon relational data base service relational database with a store objects table that has a row for each stored object along with size of each object. The upload server will query the aggregate consumption of the user in questions (by first determining the files store by the user, and then querying the stored objects table for respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Good Approach to use RDS but with so many objects might not be a good option)
    3. The company should write both the content length and the username of the files owner as S3 metadata for the object. They should then create a file watcher to iterate over each object and aggregate the size for each user and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (List operations on S3 not feasible)
    4. The company should create two separated amazon simple storage service buckets one for data storage for free tier users and another for data storage for premium tier users. An amazon simple workflow service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary (List operations on S3 not feasible as well as SNS does not address email requirement)
  7. Your company host a social media website for storing and sharing documents. the web application allow users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your php front end backed by Elastic Load Balancing and an autoscaling fleet of amazon elastic compute cloud (EC2) instances that scale upon average of bytes received (NetworkIn) After a file has been uploaded. it is copied to amazon simple storage service(S3). Amazon Ec2 instances use an AWS Identity and Access Management (AMI) role that allows Amazon s3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the auto scaling groups Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Autoscaling launch Configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10gbps. (no info of current size and might increase cost)
    2. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to amazon s3 using the given credentials and S3 Prefix. (will not provide the ability to handle pause and restarts)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address for each EC2 instance (using the auto scaling launch configuration settings). Use Amazon Route 53 round robin records set and http health check to DNS load balance the app request this approach will significantly reduce the cost by bypassing elastic load balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic that used the S3 multipart upload API to directly upload the file to Amazon s3 using the given credentials and s3 Prefix. (multipart allows one to start uploading directly to S3 before the actual size is known or complete data is downloaded)
  8. If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH will give some randomness to start with instead of instanceID where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID


AWS Simple Storage Service – S3 – Certification

AWS Simple Storage Service – S3 Overview

  • Amazon S3 is a simple key, value object store designed for the Internet
  • S3 provides unlimited storage space and works on the pay-as-you-use model. Service rates get cheaper as the usage volume increases
  • S3 is an Object level storage (not a Block level storage) and cannot be used to host OS or dynamic websites
  • S3 resources for e.g. buckets and objects are private by default

S3 Buckets & Objects

Buckets

  • A bucket is a container for objects stored in S3 and helps organize the S3 namespace.
  • A bucket is owned by the AWS account that creates it and helps identify the account responsible for storage and data transfer charges. Bucket ownership is not transferable
  • S3 bucket names are globally unique, regardless of the AWS region in which you create the bucket
  • Even though S3 is a global service, buckets are created within a region specified during the creation of the bucket
  • Every object is contained in a bucket
  • There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets to store your objects or a single bucket to store all your objects
  • S3 data model is a flat structure i.e. there are no hierarchies or folders within the buckets. However, logical hierarchy can be inferred using the keyname prefix e.g. Folder1/Object1
  • Restrictions
    • 100 buckets (soft limit) can be created in each AWS account
    • Bucket names should be globally unique and DNS compliant
    • Bucket ownership is not transferable
    • Buckets cannot be nested and cannot have bucket within another bucket
  • You can delete an empty or a non-empty bucket
  • S3 allows retrieval of a maximum of 1000 objects per listing request and provides pagination support

Objects

  • Objects are the fundamental entities stored in S3 bucket
  • Object is uniquely identified within a bucket by a keyname and version ID
  • Objects consist of object data, metadata and others
    • Key is object name
    • Value is the data portion and is opaque to S3
    • Metadata is the data about the data and is a set of name-value pairs that describe the object for e.g. content-type, size, last modified. Custom metadata can also be specified at the time the object is stored.
    • Version ID is the version ID for the object and, in combination with the key, helps uniquely identify an object within a bucket
    • Subresources help provide additional information for an object
    • Access Control Information helps control access to the objects
  • Metadata for an object cannot be modified after the object is uploaded and it can be only modified by performing the copy operation and setting the metadata
  • Objects belonging to a bucket reside in a specific AWS region and never leave that region, unless explicitly copied using Cross Region Replication
  • An object can be retrieved as a whole or partially
  • With Versioning enabled, you can retrieve current as well as previous versions of an object

Bucket & Object Operations

  • Listing
    • S3 allows listing of all the keys within a bucket
    • A single listing request would return a max of 1000 object keys with pagination support using an indicator in the response to indicate if the response was truncated
    • Keys within a bucket can be listed using Prefix and Delimiter.
    • Prefix limits results to only those keys (a kind of filtering) that begin with the specified prefix, and delimiter causes the list to roll up all keys that share a common prefix into a single summary list result (see the listing sketch after this list).
  • Retrieval
    • Object can be retrieved as a whole
    • Object can be retrieved in parts or partially (specific range of bytes) by using the Range HTTP header.
    • Range HTTP header is helpful
      • if only a partial object is needed, e.g. when multiple files were uploaded as a single archive
      • for fault tolerant downloads where the network connectivity is poor
    • Objects can also be downloaded by sharing Pre-Signed urls
    • Metadata of the object is returned in the response headers
  • Object Uploads
    • Single Operation – objects of size up to 5 GB can be uploaded in a single PUT operation
    • Multipart upload – must be used for objects larger than 5 GB, supports a maximum object size of 5 TB, and is recommended for objects above 100 MB
    • Pre-Signed URLs can also be shared for uploading objects
    • A successful upload can be verified by checking that the request received a success response. Additionally, the returned ETag can be compared to the calculated MD5 value of the uploaded object
  • Copying Objects
    • Copying of objects up to 5 GB can be performed using a single operation, and multipart upload can be used for copies up to 5 TB
    • When an object is copied
      • user-controlled system metadata e.g. storage class and user-defined metadata are also copied.
      • system controlled metadata e.g. the creation date etc is reset
    • Copying Objects can be needed
      • Create multiple object copies
      • Copy object across locations
      • Renaming of the objects
      • Change object metadata for e.g. storage class, server-side encryption etc
      • Updating any metadata for an object requires all the metadata fields to be specified again
  • Deleting Objects
    • S3 allows deletion of a single object or multiple objects (max 1000) in a single call
    • For Non Versioned buckets,
      • the object key needs to be provided and object is permanently deleted
    • For Versioned buckets,
      • if only an object key is provided, S3 inserts a delete marker and the previous current version becomes a noncurrent version
      • if an object key with a version ID is provided, the object version is permanently deleted
      • if the version ID refers to a delete marker, the delete marker is removed and the previous noncurrent version becomes the current version of the object
    • Deletion can be MFA enabled for adding extra security
  • Restoring Objects from Glacier
    • An archived object must be restored before it can be accessed
    • Restoration of an Object can take about 3 to 5 hours
    • Restoration request also needs to specify the number of days for which the object copy needs to be maintained.
    • During this period, the storage cost for both the archive and the copy is charged
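
A boto3 sketch of listing with a prefix and delimiter, as referenced above (bucket and prefixes are placeholders); common prefixes come back rolled up like folders, and the truncation fields drive pagination:

```python
import boto3

s3 = boto3.client("s3")

resp = s3.list_objects_v2(
    Bucket="example-bucket",        # placeholder
    Prefix="photos/2015/",          # only keys beginning with this prefix
    Delimiter="/",                  # roll up deeper keys into CommonPrefixes
    MaxKeys=1000,
)

for obj in resp.get("Contents", []):
    print("key:", obj["Key"])
for cp in resp.get("CommonPrefixes", []):
    print("folder:", cp["Prefix"])

# Pagination: when more than 1000 keys match, the response is truncated
# and a continuation token fetches the next page.
if resp.get("IsTruncated"):
    resp = s3.list_objects_v2(Bucket="example-bucket",
                              Prefix="photos/2015/",
                              Delimiter="/",
                              ContinuationToken=resp["NextContinuationToken"])
```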

Pre-Signed URLs

  • All buckets and objects are by default private
  • Pre-signed URLs allow a user to download or upload a specific object without requiring AWS security credentials or permissions
  • Pre-signed URL allows anyone access to the object identified in the URL, provided the creator of the URL has permissions to access that object
  • Creation of a pre-signed URL requires the creator to provide security credentials, a bucket name, an object key, an HTTP method (GET for downloading objects, PUT for uploading objects), and an expiration date and time
  • Pre-signed URLs are valid only until the expiration date & time
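
A boto3 sketch of generating pre-signed URLs (bucket, keys, and expiry values are placeholders); anyone holding the URL can perform just that one operation until it expires:

```python
import boto3

s3 = boto3.client("s3")

# Pre-signed GET: lets the holder download this one object for the next hour.
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-private-bucket", "Key": "videos/movie.mp4"},
    ExpiresIn=3600,
)

# Pre-signed PUT: lets the holder upload to this exact key for 15 minutes.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-private-bucket", "Key": "uploads/movie.mp4"},
    ExpiresIn=900,
)
print(download_url)
print(upload_url)
```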

Multipart Upload

  • Multipart upload allows the user to upload a single object as a set of parts. Each part is a contiguous portion of the object’s data.
  • Multipart upload supports 1 to 10,000 parts and each part can be from 5 MB to 5 GB, with the last part allowed to be less than 5 MB
  • Multipart uploads allows max upload size of 5TB (10000 parts * 5GB/part theoretically)
  • Object parts can be uploaded independently and in any order. If transmission of any part fails, it can be retransmitted without affecting other parts.
  • After all parts of the object are uploaded and the complete request is issued, S3 assembles the parts and creates the object.
  • Using multipart upload provides the following advantages:
    • Improved throughput – parallel upload of parts to improve throughput
    • Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error.
    • Pause and resume object uploads – Object parts can be uploaded over time. Once a multipart upload is initiated there is no expiry; you must explicitly complete or abort the multipart upload.
    • Begin an upload before the final object size is known – an object can be uploaded as it is being created
  • Three Step process
    • Multipart Upload Initiation
      • Initiation of a Multipart upload request to S3 returns a unique ID for each multipart upload.
      • This ID needs to be provided for each part upload, the completion or abort request, and the list parts call.
      • All the Object metadata required needs to be provided during the Initiation call
    • Parts Upload
      • Parts upload of objects can be performed using the unique upload ID
      • A part number (between 1 – 10000) needs to be specified with each request which identifies each part and its position in the object
      • If a part with the same part number is uploaded, the previous part would be overwritten
      • After the part upload is successful, S3 returns an ETag header in the response which must be recorded along with the part number to be provided during the multipart completion request
    • Multipart Upload Completion or Abort
      • On Multipart Upload Completion request, S3 creates an object by concatenating the parts in ascending order based on the part number and associates the metadata with the object
      • Multipart Upload Completion request should include the unique upload ID with all the parts and the ETag information
      • S3 response includes an ETag that uniquely identifies the combined object data
      • On a Multipart Upload Abort request, the upload is aborted and all parts are removed. Any new part upload would fail. However, any in-progress part upload may still complete, and hence an abort request should be sent after all part uploads have finished
      • S3 must receive a multipart upload completion or abort request, else it will not delete the parts and their storage will continue to be charged
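
The three-step flow above maps directly onto the low-level API; a minimal boto3 sketch follows (bucket, key, and the in-memory parts are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-backup-bucket", "backups/big-archive.bin"   # placeholders

# Two illustrative parts: every part except the last must be at least 5 MB.
chunks = [b"\0" * (5 * 1024 * 1024), b"tail of the archive"]

# Step 1: initiation returns the upload ID that every later call must carry.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# Step 2: upload the parts (numbered 1-10000), recording each returned ETag.
parts = []
for part_number, chunk in enumerate(chunks, start=1):
    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                          PartNumber=part_number, Body=chunk)
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

# Step 3: completion - S3 assembles the parts, in part-number order, into one object.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
# On failure, abort so the uploaded parts are deleted and no longer charged:
# s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
```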

Virtual Hosted Style vs Path-Style Request

S3 allows the buckets and objects to be referred in Path-style or Virtual hosted-style URLs

Path-style

  • Bucket name is not part of the domain (unless you use a region specific endpoint)
  • the endpoint used must match the region in which the bucket resides
  • e.g., if you have a bucket called mybucket that resides in the EU (Ireland) region with an object named puppy.jpg, the correct path-style syntax URI is http://s3-eu-west-1.amazonaws.com/mybucket/puppy.jpg.
  • A “PermanentRedirect” error with an HTTP 301 response code, and a message indicating the correct URI for the resource, is received if a bucket is accessed outside the US East (N. Virginia) region with path-style syntax that uses either of the following:
    • http://s3.amazonaws.com
    • An endpoint for a region different from the one where the bucket resides. For example, if you use http://s3-eu-west-1.amazonaws.com for a bucket that was created in the US West (N. California) region

Virtual hosted-style

  • S3 supports virtual hosted-style and path-style access in all regions.
  • In a virtual-hosted-style URL, the bucket name is part of the domain name in the URL
  • for e.g. http://bucketname.s3.amazonaws.com/objectname
  • S3 virtual hosting can be used to address a bucket in a REST API call by using the HTTP Host header
  • Benefits
    • attractiveness of customized URLs,
    • provides an ability to publish to the “root directory” of the bucket’s virtual server. This ability can be important because many existing applications search for files in this standard location.
  • S3 updates DNS to reroute the request to the correct location when a bucket is created in any region, which might take time.
  • S3 routes any virtual hosted-style requests to the US East (N.Virginia) region, by default, if the US East (N. Virginia) endpoint s3.amazonaws.com is used, instead of the region-specific endpoint (for example, s3-eu-west-1.amazonaws.com) and S3 redirects it with HTTP 307 redirect to the correct region.
  • When using virtual hosted-style buckets with SSL, the SSL wildcard certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic.
  • If you make a request to the http://bucket.s3.amazonaws.com endpoint, the DNS has sufficient information to route your request directly to the region where your bucket resides.

S3 Pricing

  • Amazon S3 costs vary by Region
  • Charges in S3 are incurred for
    • Storage – cost is per GB/month
    • Requests – per request cost varies depending on the request type GET, PUT
    • Data Transfer
      • data transfer in is free
      • data transfer out is charged per GB (except to the same region or to Amazon CloudFront, which is free)


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon S3 stand for?
    1. Simple Storage Solution.
    2. Storage Storage Storage (triple redundancy Storage).
    3. Storage Server Solution.
    4. Simple Storage Service
  2. What are characteristics of Amazon S3? Choose 2 answers
    1. Objects are directly accessible via a URL
    2. S3 should be used to host a relational database
    3. S3 allows you to store objects or virtually unlimited size
    4. S3 allows you to store virtually unlimited amounts of data
    5. S3 offers Provisioned IOPS
  3. You are building an automated transcription service in which Amazon EC2 worker instances process an uploaded audio file and generate a text file. You must store both of these files in the same durable storage until the text file is retrieved. You do not know what the storage capacity requirements are. Which storage option is both cost-efficient and scalable?
    1. Multiple Amazon EBS volume with snapshots
    2. A single Amazon Glacier vault
    3. A single Amazon S3 bucket
    4. Multiple instance stores
  4. A user wants to upload a complete folder to AWS S3 using the S3 Management console. How can the user perform this activity?
    1. Just drag and drop the folder using the flash tool provided by S3
    2. Use the Enable Enhanced Folder option from the S3 console while uploading objects
    3. The user cannot upload the whole folder in one go with the S3 management console
    4. Use the Enable Enhanced Uploader option from the S3 console while uploading objects
  5. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using mulipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  6. A company is deploying a two-tier, highly available web application to AWS. Which service provides durable storage for static content while utilizing lower Overall CPU resources for the web tier?
    1. Amazon EBS volume
    2. Amazon S3
    3. Amazon EC2 instance store
    4. Amazon RDS instance
  7. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  8. When you put objects in Amazon S3, what is the indication that an object was successfully stored?
    1. Each S3 account has a special bucket named_s3_logs. Success codes are written to this bucket with a timestamp and checksum.
    2. A success code is inserted into the S3 object metadata.
    3. A HTTP 200 result code and MD5 checksum, taken together, indicate that the operation was successful.
    4. Amazon S3 is engineered for 99.999999999% durability. Therefore there is no need to confirm that data was inserted.
  9. You have private video content in S3 that you want to serve to subscribed users on the Internet. User IDs, credentials, and subscriptions are stored in an Amazon RDS database. Which configuration will allow you to securely serve private content to your users?
    1. Generate pre-signed URLs for each user as they request access to protected S3 content
    2. Create an IAM user for each subscribed user and assign the GetObject permission to each IAM user
    3. Create an S3 bucket policy that limits access to your private content to only your subscribed users’ credentials
    4. Create a CloudFront Origin Identity user for your subscribed users and assign the GetObject permission to this user
  10. You run an ad-supported photo sharing website using S3 to serve photos to visitors of your site. At some point you find out that other sites have been linking to the photos on your site, causing loss to your business. What is an effective method to mitigate this?
    1. Remove public read access and use signed URLs with expiry dates.
    2. Use CloudFront distributions for static content.
    3. Block the IPs of the offending websites in Security Groups.
    4. Store photos on an EBS volume of the web server.
  11. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  12. What is the maximum number of S3 buckets available per AWS Account?
    1. 100 Per region
    2. There is no Limit
    3. 100 Per Account (Refer documentation)
    4. 500 Per Account
    5. 100 Per IAM User
  13. Your customer needs to create an application to allow contractors to upload videos to Amazon Simple Storage Service (S3) so they can be transcoded into a different format. She creates AWS Identity and Access Management (IAM) users for her application developers, and in just one week, they have the application hosted on a fleet of Amazon Elastic Compute Cloud (EC2) instances. The attached IAM role is assigned to the instances. As expected, a contractor who authenticates to the application is given a pre-signed URL that points to the location for video upload. However, contractors are reporting that they cannot upload their videos. Which of the following are valid reasons for this behavior? Choose 2 answers { “Version”: “2012-10-17”, “Statement”: [ { “Effect”: “Allow”, “Action”: “s3:*”, “Resource”: “*” } ] }
    1. The IAM role does not explicitly grant permission to upload the object. (The role has all permissions for all activities on S3)
    2. The contractorsˈ accounts have not been granted “write” access to the S3 bucket. (using pre-signed urls the contractors account don’t need to have access but only the creator of the pre-signed urls)
    3. The application is not using valid security credentials to generate the pre-signed URL.
    4. The developers do not have access to upload objects to the S3 bucket. (developers are not uploading the objects but its using pre-signed urls)
    5. The S3 bucket still has the associated default permissions. (does not matter as long as the user has permission to upload)
    6. The pre-signed URL has expired.

AWS S3 Subresources – Certification

AWS S3 Subresources

  • Amazon S3 Subresources provide support to store and manage the bucket configuration information
  • S3 subresources only exist in the context of a specific bucket or object
  • S3 defines a set of subresources associated with buckets and objects.
  • S3 Subresources are subordinates to objects; that is, they do not exist on their own and are always associated with some other entity, such as an object or a bucket
  • S3 supports various options to configure a bucket, e.g. a bucket can be configured for website hosting, configuration can be added to manage the lifecycle of objects in the bucket, or to log all access to the bucket.

Bucket Subresources

Lifecycle

Refer to My Blog Post about S3 Object Lifecycle Management

Static Website hosting

  • S3 can be used for Static Website hosting with Client side scripts.
  • S3 does not support server-side scripting
  • S3, in conjunction with Route 53, supports hosting a website at the root domain which can point to the S3 website endpoint
  • S3 website endpoints do not support https.
  • For S3 website hosting the content should be made publicly readable, which can be done using a bucket policy or an ACL on an object
  • Users can configure the index document and error document, as well as conditional redirects based on the object key name
  • Bucket policy applies only to objects owned by the bucket owner. If your bucket contains objects not owned by the bucket owner, then public READ permission on those objects should be granted using the object ACL.
  • Requester Pays buckets or DevPay buckets do not allow access through the website endpoint. Any request to such a bucket will receive a 403 -Access Denied response
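
A boto3 sketch of enabling website hosting on a bucket (bucket name and document names are placeholders); a bucket policy or object ACLs must still make the content publicly readable:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="example-website-bucket",                 # placeholder
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},   # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},      # served on 4xx errors
    },
)
# The site is then served from the region-specific website endpoint, e.g.
# http://example-website-bucket.s3-website-eu-west-1.amazonaws.com (HTTP only).
```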

Versioning

Refer to My Blog Post about S3 Object Versioning

Policy & Access Control List (ACL)

Refer to My Blog Post about S3 Permissions

CORS (Cross Origin Resource Sharing)

  • All browsers implement the same-origin policy, for security reasons, where a web page from one domain can only request resources from the same domain
  • CORS allows client web applications loaded in one domain to access restricted resources in another domain
  • With CORS support in S3, you can selectively allow cross-origin access to your S3 resources
  • CORS configuration rules identify the origins allowed to access the bucket, the operations (HTTP methods) that would be supported for each origin, and other operation-specific information
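
A boto3 sketch of a CORS configuration (the origin and bucket name are placeholders) allowing cross-origin GET and PUT from a single web application's domain:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="example-assets-bucket",                   # placeholder
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://www.example.com"],  # origin allowed to call
                "AllowedMethods": ["GET", "PUT"],               # HTTP methods supported
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,                          # preflight cache time
            }
        ]
    },
)
```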

Logging

  • Logging, disabled by default, enables tracking access requests to S3 bucket
  • Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, if any.
  • Access log information can be useful in security and access audits and also help learn about the customer base and understand the S3 bill
  • S3 periodically collects access log records, consolidates the records in log files, and then uploads log files to a target bucket as log objects.
  • If logging is enabled on multiple source buckets with same target bucket, the target bucket will have access logs for all those source buckets, but each log object will report access log records for a specific source bucket
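
A boto3 sketch of enabling access logging (bucket names and prefix are placeholders); the target bucket must grant write permission to the S3 Log Delivery group:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="example-source-bucket",                   # bucket being logged (placeholder)
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",     # where log objects are delivered
            "TargetPrefix": "logs/example-source-bucket/",
        }
    },
)
```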

Tagging

  • S3 provides the tagging subresource to store and manage tags on a bucket
  • Cost allocation tags can be added to the bucket to categorize and track AWS costs
  • AWS can generate a cost allocation report with usage and costs aggregated by the tags applied to the buckets
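
A boto3 sketch of adding cost allocation tags to a bucket (tag keys and values are placeholders):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_tagging(
    Bucket="example-bucket",                          # placeholder
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "photo-sharing"},
            {"Key": "cost-center", "Value": "12345"},
        ]
    },
)
```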

Location

  • When you create a bucket, AWS region needs to be specified where the S3 bucket will be created
  • S3 stores this information in the location subresource and provides an API for retrieving this information

Notification

  • S3 notification feature enables notifications to be triggered when certain events happen in your bucket
  • Notifications are enabled at Bucket level
  • Notifications can be configured to be filtered by the prefix and suffix of the key name of objects. However, filtering rules cannot be defined with overlapping prefixes, overlapping suffixes, or overlapping prefix and suffix combinations (see the configuration sketch after this list)
  • S3 can publish the following events
    • New Objects created event
      • Can be enabled for PUT, POST or COPY operations
      • You will not receive event notifications from failed operations
    • Object Removal event
      • Can publish delete events for object deletion, version object deletion or insertion of delete marker
      • You will not receive event notifications from automatic deletes from lifecycle policies or from failed operations.
    • Reduced Redundancy Storage (RRS) object lost event
      • Can be used to reproduce/recreate the Object
  • S3 can publish events to the following destinations
    • Amazon SNS topic
    • Amazon SQS queue
    • AWS Lambda
  • For S3 to be able to publish events to the destination, S3 principal should be granted necessary permissions
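
A boto3 sketch of the notification configuration described in this section (bucket, filter values, and Lambda function ARN are placeholders); S3 must also be granted permission to invoke the destination:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-images-bucket",                   # placeholder
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                # placeholder function ARN
                "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:make-thumbnail",
                "Events": ["s3:ObjectCreated:*"],     # PUT, POST, COPY and multipart completions
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "uploads/"},
                            {"Name": "suffix", "Value": ".jpg"},
                        ]
                    }
                },
            }
        ]
    },
)
```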

Cross Region Replication

  • Cross-region replication is a bucket-level feature that enables automatic, asynchronous copying of objects across buckets in different AWS regions
  • S3 can replicate all objects or a subset of objects with specific key name prefixes (see the configuration sketch after this list)
  • S3 encrypts all data in transit across AWS regions using SSL
  • Object replicas in the destination bucket are exact replicas of the objects in the source bucket with the same key names and the same metadata.
  • Cross Region Replication can be useful for the following scenarios :-
    • Compliance requirement to have data backed up across regions
    • Minimize latency to allow users across geography to access objects
    • Operational reasons, e.g. compute clusters in two different regions that analyze the same set of objects
  • Requirements
    • source and destination buckets must be versioning-enabled
    • source and destination buckets must be in different AWS regions
    • objects can be replicated from a source bucket to only one destination bucket
    • S3 must have permission to replicate objects from that source bucket to the destination bucket on your behalf.
    • If the source bucket owner also owns the object, the bucket owner has full permissions to replicate the object. If not, the source bucket owner must have permission for the S3 actions s3:GetObjectVersion and s3:GetObjectVersionACL to read the object and object ACL
    • If you are setting up cross-region replication in a cross-account scenario (where the source and destination buckets are owned by different AWS accounts), the source bucket owner must have permission to replicate objects in the destination bucket.
  • Replicated & Not Replicated
    • Any new objects created after you add a replication configuration are replicated
    • S3 does not retroactively replicate objects that existed before you added replication configuration.
    • Only objects created with SSE-S3 are replicated, using server-side encryption with the Amazon S3-managed encryption key.
    • Objects created with server-side encryption using either customer-provided (SSE-C) or AWS KMS–managed encryption (SSE-KMS) keys are not replicated
    • S3 replicates only objects in the source bucket for which the bucket owner has permission to read objects and read ACLs
    • S3 does not replicate objects in the source bucket for which the bucket owner does not have permissions.
    • Any object ACL updates are replicated, although there can be some delay before Amazon S3 can bring the two in sync. This applies only to objects created after you add a replication configuration to the bucket.
    • Updates to bucket-level S3 subresources are not replicated, allowing different bucket configurations on the source and destination buckets
    • Only customer actions are replicated & actions performed by lifecycle configuration are not replicated
    • Objects in the source bucket that are replicas, created by another cross-region replication, are not replicated.
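A minimal boto3 sketch of a prefix-based replication rule, assuming both buckets are already versioning-enabled and the IAM role ARN (a placeholder below) grants S3 the required replication permissions:

```python
# Replicate keys under a prefix from a source bucket to a destination bucket
# in another region. Bucket names, role ARN and prefix are placeholders.
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
    Bucket='source-bucket',          # e.g. in us-east-1
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-crr-role',
        'Rules': [{
            'ID': 'replicate-logs',
            'Prefix': 'logs/',       # replicate only this key name prefix
            'Status': 'Enabled',
            'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'}  # e.g. in eu-west-1
        }]
    }
)
```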

Requester Pays

  • By default, a bucket is owned by the AWS account that created it (the bucket owner) and that AWS account pays for storage costs, downloads and data transfer charges associated with the bucket.
  • Using Requester Pays subresource :-
    • Bucket owner specifies that the requester requesting the download will be charged for the download
    • However, the bucket owner still pays the storage costs
  • Enabling Requester Pays on a bucket
    • disables anonymous access to that bucket
    • does not support BitTorrent
    • does not support SOAP requests
    • cannot be enabled for an end user logging bucket
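A minimal boto3 sketch (bucket and key names are placeholders) of enabling Requester Pays and of a requester acknowledging the charges on a download:

```python
# Turn on Requester Pays for a bucket; requesters must then acknowledge the
# charges, which boto3 exposes as the RequestPayer parameter on object calls.
import boto3

s3 = boto3.client('s3')
s3.put_bucket_request_payment(
    Bucket='example-bucket',
    RequestPaymentConfiguration={'Payer': 'Requester'}
)

# A requester accepting the download charges
obj = s3.get_object(Bucket='example-bucket', Key='data.csv', RequestPayer='requester')
```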

Object Subresources

Torrent

  • Default distribution mechanism for S3 data is via client/server download
  • Bucket owner bears the cost of Storage as well as the request and transfer charges, which can increase linearly for a popular object
  • S3 also supports the BitTorrent protocol
    • BitTorrent is an open source Internet distribution protocol
    • BitTorrent addresses this problem by recruiting the very clients that are downloading the object as distributors themselves
    • S3 bandwidth rates are inexpensive, but BitTorrent allows developers to further save on bandwidth costs for a popular piece of data by letting users download from Amazon and other users simultaneously
  • Benefit for publisher is that for large, popular files the amount of data actually supplied by S3 can be substantially lower than what it would have been serving the same clients via client/server download
  • Any object in S3 that is publicly available and can be read anonymously can be downloaded via BitTorrent
  • A torrent file can be retrieved for any publicly available object by simply adding a “?torrent” query string parameter at the end of the REST GET request for the object
  • Generating the .torrent for an object takes time proportional to the size of that object, so it's recommended to make the first torrent request yourself to generate the file so that subsequent requests are faster
  • Torrents are enabled only for objects that are less than 5 GB in size.
  • Torrent subresource can only be retrieved, and cannot be created, updated or deleted
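A minimal boto3 sketch of fetching the torrent subresource for a publicly readable object (bucket and key are placeholders):

```python
# Retrieve the .torrent subresource; boto3 maps "?torrent" to get_object_torrent.
import boto3

s3 = boto3.client('s3')
resp = s3.get_object_torrent(Bucket='example-bucket', Key='video.mp4')
with open('video.mp4.torrent', 'wb') as f:
    f.write(resp['Body'].read())

# Equivalent anonymous REST form (the object must be publicly readable):
# GET https://example-bucket.s3.amazonaws.com/video.mp4?torrent
```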

Object ACL

Refer to My Blog Post about S3 Permissions

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. An organization’s security policy requires multiple copies of all critical data to be replicated across at least a primary and backup data center. The organization has decided to store some critical data on Amazon S3. Which option should you implement to ensure this requirement is met?
    1. Use the S3 copy API to replicate data between two S3 buckets in different regions
    2. You do not need to implement anything since S3 data is automatically replicated between regions
    3. Use the S3 copy API to replicate data between two S3 buckets in different facilities within an AWS Region
    4. You do not need to implement anything since S3 data is automatically replicated between multiple facilities within an AWS Region
  2. A customer wants to track access to their Amazon Simple Storage Service (S3) buckets and also use this information for their internal security and access audits. Which of the following will meet the Customer requirement?
    1. Enable AWS CloudTrail to audit all Amazon S3 bucket access.
    2. Enable server access logging for all required Amazon S3 buckets
    3. Enable the Requester Pays option to track access via AWS Billing
    4. Enable Amazon S3 event notifications for Put and Post.
  3. A user is enabling a static website hosting on an S3 bucket. Which of the below mentioned parameters cannot be configured by the user?
    1. Error document
    2. Conditional error on object name
    3. Index document
    4. Conditional redirection on object name
  4. Company ABCD is running their corporate website on Amazon S3 accessed from http://www.companyabcd.com. Their marketing team has published new web fonts to a separate S3 bucket accessed by the S3 endpoint: https://s3-us-west1.amazonaws.com/abcdfonts. While testing the new web fonts, Company ABCD recognized the web fonts are being blocked by the browser. What should Company ABCD do to prevent the web fonts from being blocked by the browser?
    1. Enable versioning on the abcdfonts bucket for each web font
    2. Create a policy on the abcdfonts bucket to enable access to everyone
    3. Add the Content-MD5 header to the request for webfonts in the abcdfonts bucket from the website
    4. Configure the abcdfonts bucket to allow cross-origin requests by creating a CORS configuration
  5. Company ABCD is currently hosting their corporate site in an Amazon S3 bucket with Static Website Hosting enabled. Currently, when visitors go to http://www.companyabcd.com the index.html page is returned. Company C now would like a new page welcome.html to be returned when a visitor enters http://www.companyabcd.com in the browser. Which of the following steps will allow Company ABCD to meet this requirement? Choose 2 answers.
    1. Upload an html page named welcome.html to their S3 bucket
    2. Create a welcome subfolder in their S3 bucket
    3. Set the Index Document property to welcome.html
    4. Move the index.html page to a welcome subfolder
    5. Set the Error Document property to welcome.html

AWS S3 Storage Classes – Certification

AWS S3 Storage Classes Overview

  • Amazon S3 storage classes are designed to sustain the concurrent loss of data in one or two facilities
  • S3 storage classes allows lifecycle management for automatic migration of objects for cost savings
  • S3 storage classes support SSL encryption of data in transit and data encryption at rest
  • S3 also regularly verifies the integrity of your data using checksums and provides auto healing capability

AWS S3 Storage Classes Comparison

Standard

  • Storage class is ideal for performance-sensitive use cases and frequently accessed data and is designed to sustain the loss of data in two facilities
  • STANDARD is the default storage class, if none specified during upload
  • Low latency and high throughput performance
  • Designed for durability of 99.999999999% of objects
  • Designed for 99.99% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability.

Standard IA

  • S3 STANDARD_IA (Infrequent Access) storage class is optimized for long-lived and less frequently accessed data for e.g. backups and older data where access is limited, but the use case still demands high performance
  • STANDARD_IA is designed to sustain the loss of data in two facilities
  • STANDARD_IA objects are available for real-time access.
  • STANDARD_IA storage class is suitable for larger objects greater than 128 KB (smaller objects are charged for 128KB only) kept for at least 30 days.
  • Same low latency and high throughput performance of Standard
  • Designed for durability of 99.999999999% of objects
  • Designed for 99.9% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability

Reduced Redundancy Storage – RRS

  • Reduced Redundancy Storage (RRS) storage class is designed for noncritical, reproducible data stored at lower levels of redundancy than the STANDARD storage class, which reduces storage costs
  • Designed for durability of 99.99% of objects
  • Designed for 99.99% availability over a given year
  • Lower level of redundancy results in less durability and availability
  • RRS stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive
  • RRS does not replicate objects as many times as S3 standard storage and is designed to sustain the loss of data in a single facility.
  • If an RRS object is lost, S3 returns a 405 error on requests made to that object
  • S3 can send an event notification, configured on the bucket, to alert a user or start a workflow when it detects that an RRS object is lost which can be used to replace the lost object

Glacier

  • GLACIER storage class is suitable for archiving data where data access is infrequent and a retrieval time of several (3-5) hours is acceptable.
  • GLACIER storage class uses the very low-cost Amazon Glacier storage service, but the objects in this storage class are still managed through S3
  • Designed for durability of 99.999999999% of objects
  • GLACIER cannot be specified as the storage class at the object creation time but has to be transitioned from STANDARD, RRS, or STANDARD_IA to GLACIER storage class using lifecycle management.
  • For accessing GLACIER objects,
    • object must be restored which can take anywhere between 3-5 hours
    • objects are only available for the time period (number of days) specified during the restoration request
    • object’s storage class remains GLACIER
    • charges are levied for both the archive (GLACIER rate) and the copy restored temporarily (RRS rate)
  • Vault Lock feature enforces compliance via a lockable policy
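A minimal boto3 sketch tying the classes together (bucket and keys are placeholders): upload directly into STANDARD_IA, and request a temporary restore of an object that lifecycle management has already transitioned to GLACIER:

```python
import boto3

s3 = boto3.client('s3')

# Infrequently accessed but still needing real-time access -> STANDARD_IA
s3.put_object(Bucket='example-bucket', Key='backups/2016-01.tar.gz',
              Body=b'...', StorageClass='STANDARD_IA')

# Object already in the GLACIER storage class: request a temporary restored copy
s3.restore_object(Bucket='example-bucket', Key='archive/2014-logs.tar.gz',
                  RestoreRequest={'Days': 7})
# The object's storage class remains GLACIER; head_object() exposes a 'Restore'
# field once the temporary copy is ready (typically after 3-5 hours).
```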

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does RRS stand for when talking about S3?
    1. Redundancy Removal System
    2. Relational Rights Storage
    3. Regional Rights Standard
    4. Reduced Redundancy Storage
  2. What is the durability of S3 RRS?
    1. 99.99%
    2. 99.95%
    3. 99.995%
    4. 99.999999999%
  3. What is the Reduced Redundancy option in Amazon S3?
    1. Less redundancy for a lower cost
    2. It doesn’t exist in Amazon S3, but in Amazon EBS.
    3. It allows you to destroy any copy of your files outside a specific jurisdiction.
    4. It doesn’t exist at all
  4. An application is generating a log file every 5 minutes. The log file is not critical but may be required only for verification in case of some major issue. The file should be accessible over the internet whenever required. Which of the below mentioned options is a best possible storage solution for it?
    1. AWS S3
    2. AWS Glacier
    3. AWS RDS
    4. AWS S3 RRS
  5. A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
    1. The archive will be available as an object for the duration specified by the user during the restoration request
    2. The restored object’s storage class will be RRS (After the object is restored the storage class still remains GLACIER. Read more)
    3. The user can modify the restoration period only by issuing a new restore request with the updated period
    4. The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
  6. Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
    1. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
    2. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot instances and Reserved Instances for Amazon EMR jobs. Use Reserved instances for Amazon Redshift (Combination of the Spot and reserved with guarantee performance and help reduce cost. Also, RRS would reduce cost and guarantee data integrity, which is different from data durability )
    3. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
    4. Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
  7. Which of the below mentioned options can be a good use case for storing content in AWS RRS?
    1. Storing mission critical data Files
    2. Storing infrequently used log files
    3. Storing a video file which is not reproducible
    4. Storing image thumbnails
  8. A newspaper organization has an on-premises application which allows the public to search its back catalogue and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers into JPEGs (approx. 17TB) and used Optical Character Recognition (OCR) to populate a commercial search product. The hosting platform and software is now end of life and the organization wants to migrate its archive to AWS and produce a cost efficient architecture and still be designed for availability and durability. Which is the most appropriate? [PROFESSIONAL]
    1. Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 Instances and configure with auto-scaling and an Elastic Load Balancer. (RRS impacts durability and commercial search would add to cost)
    2. Model the environment using CloudFormation. Use an EC2 instance running Apache webserver and an open source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index. (Using EBS is not cost effective for storing files)
    3. Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones. (Standard S3 and Elastic Beanstalk provides availability and durability, Standard S3 and CloudSearch provides cost effective storage and search)
    4. Use a single-AZ RDS MySQL instance to store the search index and the JPEG images use an EC2 instance to serve the website and translate user queries into SQL. (RDS is not ideal and cost effective to store files, Single AZ impacts availability)
    5. Use a CloudFront download distribution to serve the JPEGs to the end users and Install the current commercial search product, along with a Java Container for the website on EC2 instances and use Route53 with DNS round-robin. (CloudFront needs a source and using commercial search product is not cost effective)
  9. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
    1. Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
    2. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Master and Core must be RI or On Demand. Cannot be Spot)
    3. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Need better durability for ingest file. Spot instances can be used for task nodes for cost saving. RI will not provide cost saving in this case)
    4. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (Input must be in S3 standard)

AWS S3 Data Protection – Certification

AWS S3 Data Protection Overview

  • Amazon S3 provides data protection using a highly durable storage infrastructure designed for mission-critical and primary data storage.
  • Objects are redundantly stored on multiple devices across multiple facilities in an S3 region.
  • Amazon S3 PUT and PUT Object copy operations synchronously store the data across multiple facilities before returning SUCCESS.
  • Once the objects are stored, S3 maintains its durability by quickly detecting and repairing any lost redundancy.
  • S3 also regularly verifies the integrity of data stored using checksums. If Amazon S3 detects data corruption, it is repaired using redundant data.
  • In addition, S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data
  • Data protection against accidental overwrites and deletions can be added by enabling Versioning to preserve, retrieve and restore every version of the object stored
  • S3 also provides the ability to protect data in-transit (as it travels to and from S3) and at rest (while it is stored in S3)

Data Protection

Data in-transit

S3 allows protection of data in-transit by enabling communication via SSL or by using client-side encryption

Data at Rest

  • S3 supports both client side encryption and server side encryption for protecting data at rest
  • Using Server-Side Encryption, S3 encrypts the object before saving it on disks in its data centers and decrypts it when the objects are downloaded
  • Using Client-Side Encryption, you can encrypt data client-side and upload the encrypted data to S3. In this case, you manage the encryption process, the encryption keys, and related tools.

Server-Side Encryption

  • Server-side encryption is about data encryption at rest
  • Server-side encryption encrypts only the object data. Any object metadata is not encrypted.
  • S3 handles the encryption (as it writes to disks) and decryption (when you access the objects) of the data objects
  • There is no difference in the access mechanism for encrypted and unencrypted objects; encryption and decryption are handled transparently by S3
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
  • Each object is encrypted with a unique data key employing strong multi-factor encryption.
  • SSE-S3 encrypts the data key with a master key that is regularly rotated.
  • S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt the data.
  • Whether or not objects are encrypted with SSE-S3 can’t be enforced when they are uploaded using pre-signed URLs, because the only way you can specify server-side encryption is through the AWS Management Console or through an HTTP request header
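A minimal boto3 sketch of SSE-S3 (bucket and key are placeholders): the request header is the only difference from a normal upload, and downloads need nothing extra:

```python
import boto3

s3 = boto3.client('s3')
# Ask S3 to encrypt the object with an S3-managed key (AES-256)
s3.put_object(Bucket='example-bucket', Key='report.pdf', Body=b'...',
              ServerSideEncryption='AES256')   # sets x-amz-server-side-encryption: AES256

# Download needs no extra parameters; S3 decrypts transparently
obj = s3.get_object(Bucket='example-bucket', Key='report.pdf')
```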

Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)

  • SSE-KMS is similar to SSE-S3, but it uses AWS Key Management Service (KMS) which provides additional benefits along with additional charges
    • KMS is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud.
    • KMS uses customer master keys (CMKs) to encrypt the S3 objects.
    • Master key is never made available
    • KMS enables you to centrally create encryption keys, define the policies that control how keys can be used
    • Allows auditing of key usage to prove keys are being used correctly, by inspecting logs in AWS CloudTrail
    • Allows keys to be temporarily disabled and re-enabled
    • Allows keys to be rotated regularly
    • Security controls in AWS KMS can help meet encryption-related compliance requirements.
  • SSE-KMS enables separate permissions for the use of an envelope key (that is, a key that protects the data’s encryption key) that provides added protection against unauthorized access of the objects in S3.
  • SSE-KMS provides the option to create and manage encryption keys yourself, or use a default customer master key (CMK) that is unique to you, the service you’re using, and the region you’re working in.
  • Creating and Managing your own CMK gives you more flexibility, including the ability to create, rotate, disable, and define access controls, and to audit the encryption keys used to protect your data.
  • Data keys used to encrypt your data are also encrypted and stored alongside the data they protect and are unique to each object
  • Process flow
    • An application or AWS service client requests an encryption key to encrypt data and passes a reference to a master key under the account.
    • Client requests are authenticated based on whether they have access to use the master key.
    • A new data encryption key is created, and a copy of it is encrypted under the master key.
    • Both the data key and encrypted data key are returned to the client.
    • Data key is used to encrypt customer data and then deleted as soon as is practical.
    • Encrypted data key is stored for later use and sent back to AWS KMS when the source data needs to be decrypted.
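A minimal boto3 sketch of SSE-KMS (bucket, key and CMK ARN are placeholders; omitting SSEKMSKeyId falls back to the default aws/s3 CMK):

```python
import boto3

s3 = boto3.client('s3')
s3.put_object(
    Bucket='example-bucket',
    Key='payroll.csv',
    Body=b'...',
    ServerSideEncryption='aws:kms',
    SSEKMSKeyId='arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555'
)
# Decryption is transparent on get_object, provided the caller also has
# kms:Decrypt permission on the CMK; key usage shows up in CloudTrail.
```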
Server-Side Encryption with Customer-Provided Keys (SSE-C)

AWS S3 Server Side Encryption using Customer Provided Keys SSE-C

  • Encryption keys can be managed and provided by the Customer and S3 manages the encryption, as it writes to disks, and decryption, when you access the objects
  • When you upload an object, the encryption key is provided as a part of the request and S3 uses that encryption key to apply AES-256 encryption to the data and removes the encryption key from memory.
  • When you download an object, the same encryption key must be provided as a part of the request. S3 first verifies the encryption key and, if it matches, decrypts the object before returning it to you
  • As each object and each object’s version can be encrypted with a different key, you are responsible for maintaining the mapping between the object and the encryption key used.
  • SSE-C request must be done through HTTPS and S3 will reject any requests made over http when using SSE-C.
  • For security considerations, AWS recommends treating any key sent erroneously over HTTP as compromised, and discarding or rotating it
  • S3 does not store the encryption key provided. Instead, it stores a randomly salted HMAC value of the encryption key which can be used to validate future requests. The salted HMAC value cannot be used to derive the value of the encryption key or to decrypt the contents of the encrypted object. That means, if you lose the encryption key, you lose the object.
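A minimal boto3 sketch of SSE-C (bucket and key are placeholders; the key handling is illustrative only and you remain responsible for storing the key safely):

```python
import os
import boto3

s3 = boto3.client('s3')                # boto3 uses HTTPS endpoints by default
customer_key = os.urandom(32)          # 256-bit key that you must keep safe

s3.put_object(Bucket='example-bucket', Key='secret.bin', Body=b'...',
              SSECustomerAlgorithm='AES256',
              SSECustomerKey=customer_key)   # boto3 base64-encodes the key and adds its MD5 header

# The same key must be supplied again to read the object back
obj = s3.get_object(Bucket='example-bucket', Key='secret.bin',
                    SSECustomerAlgorithm='AES256',
                    SSECustomerKey=customer_key)
```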

Client-Side Encryption

Client-side encryption refers to encrypting data before sending it to Amazon S3 and decrypting the data after downloading it

AWS KMS-managed customer master key (CMK)
  • Customer can maintain the encryption CMK with AWS KMS and can provide the CMK id to the client to encrypt the data
  • Uploading Object
    • AWS S3 encryption client first sends a request to AWS KMS for the key to encrypt the object data
    • AWS KMS returns a randomly generated data encryption key in two versions: a plaintext version for encrypting the data and a cipher blob to be uploaded with the object as object metadata
    • Client obtains a unique data encryption key for each object it uploads.
    • AWS S3 encryption client uploads the encrypted data and the cipher blob with object metadata
  • Download Object
    • AWS Client first downloads the encrypted object from Amazon S3 along with the cipher blob version of the data encryption key stored as object metadata.
    • AWS Client then sends the cipher blob to AWS KMS to get the plain text version of the same, so that it can decrypt the object data.
Client-Side master key
  • Encryption master keys are completely maintained at Client-side
  • Uploading Object
    • Amazon S3 encryption client (for e.g. AmazonS3EncryptionClient in the AWS SDK for Java) locally generates a random one-time-use symmetric key (also known as a data encryption key or data key).
    • Client encrypts the data encryption key using the customer provided master key
    • Client uses this data encryption key to encrypt the data of a single S3 object (for each object, the client generates a separate data key).
    • Client then uploads the encrypted data to Amazon S3 and also saves the encrypted data key and its material description as object metadata (x-amz-meta-x-amz-key) in Amazon S3 by default
  • Downloading Object
    • Client first downloads the encrypted object from Amazon S3 along with the object metadata.
    • Using the material description in the metadata, the client first determines which master key to use to decrypt the encrypted data key.
    • Using that master key, the client decrypts the data key and uses it to decrypt the object
  • Client-side master keys and your unencrypted data are never sent to AWS
  • If the master key is lost the data cannot be decrypted
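The AWS SDK for Python has no built-in S3 encryption client (unlike the Java SDK's AmazonS3EncryptionClient), so the sketch below hand-rolls the CSE-KMS envelope flow using KMS and the third-party cryptography package. The CMK alias, bucket, key and the x-enc-key metadata name are placeholders; the real SDK clients use their own metadata format (x-amz-meta-x-amz-key) and material description:

```python
# Illustrative CSE-KMS envelope flow: get a data key from KMS, encrypt locally,
# store only the *encrypted* data key alongside the object.
import base64
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client('kms')
s3 = boto3.client('s3')

# 1. Ask KMS for a data key (plaintext + cipher blob) under a placeholder CMK
dk = kms.generate_data_key(KeyId='alias/example-cse-key', KeySpec='AES_256')

# 2. Encrypt the object data locally with the plaintext data key
nonce = os.urandom(12)
ciphertext = AESGCM(dk['Plaintext']).encrypt(nonce, b'object data', None)

# 3. Upload ciphertext; the plaintext key is never sent to AWS
s3.put_object(Bucket='example-bucket', Key='cse-object', Body=nonce + ciphertext,
              Metadata={'x-enc-key': base64.b64encode(dk['CiphertextBlob']).decode()})

# 4. To decrypt: download, send the cipher blob back to KMS, then decrypt locally
obj = s3.get_object(Bucket='example-bucket', Key='cse-object')
blob = base64.b64decode(obj['Metadata']['x-enc-key'])
plaintext_key = kms.decrypt(CiphertextBlob=blob)['Plaintext']
body = obj['Body'].read()
data = AESGCM(plaintext_key).decrypt(body[:12], body[12:], None)
```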

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers
    1. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys
    2. Use Amazon S3 server-side encryption with customer-provided keys
    3. Use Amazon S3 server-side encryption with EC2 key pair.
    4. Use Amazon S3 bucket policies to restrict access to the data at rest.
    5. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key
    6. Use SSL to encrypt the data while in transit to Amazon S3.
  2. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at Rest. If the user is supplying his own keys for encryption (SSE-C) which of the below mentioned statements is true?
    1. The user should use the same encryption key for all versions of the same object
    2. It is possible to have different encryption keys for different versions of the same object
    3. AWS S3 does not allow the user to upload his own keys for server side encryption
    4. The SSE-C does not work when versioning is enabled
  3. A storage admin wants to encrypt all the objects stored in S3 using server side encryption. The user does not want to use the AES 256 encryption key provided by S3. How can the user achieve this?
    1. The admin should upload his secret key to the AWS console and let S3 decrypt the objects
    2. The admin should use CLI or API to upload the encryption key to the S3 bucket. When making a call to the S3 API mention the encryption key URL in each request
    3. S3 does not support client supplied encryption keys for server side encryption
    4. The admin should send the keys and encryption algorithm with each API call
  4. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at rest. If the user is supplying his own keys for encryption (SSE-C), what is recommended to the user for the purpose of security?
    1. User should not use his own security key as it is not secure
    2. Configure S3 to rotate the user’s encryption key at regular intervals
    3. Configure S3 to store the user’s keys securely with SSL
    4. Keep rotating the encryption key manually at the client side
  5. A system admin is planning to encrypt all objects being uploaded to S3 from an application. The system admin does not want to implement his own encryption algorithm; instead he is planning to use server side encryption by supplying his own key (SSE-C). Which parameter is not required while making a call for SSE-C?
    1. x-amz-server-side-encryption-customer-key-AES-256
    2. x-amz-server-side-encryption-customer-key
    3. x-amz-server-side-encryption-customer-algorithm
    4. x-amz-server-side-encryption-customer-key-MD5
  6. A customer is leveraging Amazon Simple Storage Service in eu-west-1 to store static content for a web-based property. The customer is storing objects using the Standard Storage class. Where are the customers objects replicated?
    1. A single facility in eu-west-1 and a single facility in eu-central-1
    2. A single facility in eu-west-1 and a single facility in us-east-1
    3. Multiple facilities in eu-west-1
    4. A single facility in eu-west-1
  7. You are designing a personal document-archiving solution for your global enterprise with thousands of employee. Each employee has potentially gigabytes of data to be backed up in this archiving solution. The solution will be exposed to he employees as an application, where they can just drag and drop their files to the archiving system. Employees can retrieve their archives through a web interface. The corporate network has high bandwidth AWS DirectConnect connectivity to AWS. You have regulatory requirements that all data needs to be encrypted before being uploaded to the cloud. How do you implement this in a highly available and cost efficient way?
    1. Manage encryption keys on-premise in an encrypted relational database. Set up an on-premises server with sufficient storage to temporarily store files and then upload them to Amazon S3, providing a client-side master key. (Storing temporary increases cost and not a high availability option)
    2. Manage encryption keys in a Hardware Security Module(HSM) appliance on-premise server with sufficient storage to temporarily store, encrypt, and upload files directly into amazon Glacier. (Not cost effective)
    3. Manage encryption keys in amazon Key Management Service (KMS), upload to amazon simple storage service (s3) with client-side encryption using a KMS customer master key ID and configure Amazon S3 lifecycle policies to store each object using the amazon glacier storage tier. (with CSE-KMS the encryption happens at client side before the object is upload to S3 and KMS is cost effective as well)
    4. Manage encryption keys in an AWS CloudHSM appliance. Encrypt files prior to uploading on the employee desktop and then upload directly into amazon glacier (Not cost effective)
  8. A user has enabled server side encryption with S3. The user downloads the encrypted object from S3. How can the user decrypt it?
    1. S3 does not support server side encryption
    2. S3 provides a server side key to decrypt the object
    3. The user needs to decrypt the object using their own private key
    4. S3 manages encryption and decryption automatically
  9. When uploading an object, what request header can be explicitly specified in a request to Amazon S3 to encrypt object data when saved on the server side?
    1. x-amz-storage-class
    2. Content-MD5
    3. x-amz-security-token
    4. x-amz-server-side-encryption

AWS S3 Permissions – Certification

S3 Permissions Overview

  • By default, all S3 buckets, objects and related subresources are private
  • User is the AWS Account or the IAM user who accesses the resource
  • Bucket owner is the AWS account that created a bucket
  • Object owner is the AWS account that uploads the object to a bucket, even if the bucket is not owned by that account
  • Only the Resource owner, the AWS account that creates the resource, can access the resource
  • Resource owner can be
    • AWS account that creates the bucket or object owns those resources
    • If an IAM user creates the bucket or object, the AWS account of the IAM user owns the resource
    • If the bucket owner grants cross-account permissions to other AWS account users to upload objects to the buckets, the objects are owned by the AWS account of the user who uploaded the object and not the bucket owner except for the following conditions
      • Bucket owner can deny access to the object, as it's still the bucket owner who pays for the object
      • Bucket owner can delete or apply archival rules to the object and perform restoration

S3 Permissions Classification

S3 permissions are classified into Resource based policies and User policies

User policies

  • User based policies use IAM with S3 to control the type of access a user or group of users has to specific parts of an S3 bucket the AWS account owns
  • User based policy is always attached to a User, Group or a Role; anonymous permissions cannot be granted
  • If an AWS account that owns a bucket wants to grant permission to users in its account, it can use either a bucket policy or a user policy

Resource based policies

Bucket policies and access control lists (ACLs) are resource-based because they are attached to the Amazon S3 resources

Bucket Policy & Bucket ACL

Bucket Policies

  • Bucket policy can be used to grant cross-account access to other AWS accounts or IAM users in other accounts for the bucket and objects in it.
  • Bucket policies provide centralized, access control to buckets and objects based on a variety of conditions, including S3 operations, requesters, resources, and aspects of the request (e.g. IP address)
  • If an AWS account that owns a bucket wants to grant permission to users in its account, it can use either a bucket policy or a user policy
  • Permissions attached to a bucket apply to all of the objects in that bucket created and owned by the bucket owner
  • Policies can either add or deny permissions across all (or a subset) of objects within a bucket
  • Only the bucket owner is allowed to associate a policy with a bucket
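A minimal sketch of a bucket policy applied with boto3 (the account ID, bucket name and prefix are placeholders) that grants another AWS account read access to a subset of objects:

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::999999999999:root"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/shared/*"
    }]
}

# Only the bucket owner can attach a policy to the bucket
boto3.client('s3').put_bucket_policy(Bucket='example-bucket', Policy=json.dumps(policy))
```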

Access Control Lists (ACLs)

  • Each bucket and object has an ACL associated with it.
  • An ACL is a list of grants identifying grantee and permission granted
  • ACLs are used to grant basic read/write permissions on resources to other AWS accounts.
  • ACL supports a limited set of permissions and
    • cannot grant conditional permissions, nor can you explicitly deny permissions
    • cannot be used to grant permissions for bucket subresources
  • Permission can be granted to an AWS account by the email address or the canonical user ID (which is just an obfuscated Account ID). If an email address is provided, S3 will still find the canonical user ID for the user and add it to the ACL.
  • It is recommended to use the Canonical user ID, as granting by email address is not supported in all regions
  • Bucket ACL
    • Only recommended use case for the bucket ACL is to grant write permission to S3 Log Delivery group to write access log objects to the bucket
    • Bucket ACL will help grant write permission on the bucket to the Log Delivery group if access log delivery is needed to your bucket
    • The only way you can grant the necessary permissions to the Log Delivery group is via a bucket ACL
  • Object ACL
    • Object ACLs control only Object-level Permissions
    • Object ACL is the only way to manage permission to an object in the bucket not owned by the bucket owner i.e. If the bucket owner allows cross-account object uploads and if the object owner is different from the bucket owner, the only way for the object owner to grant permissions on the object is through Object ACL
    • If the Bucket and Object is owned by the same AWS account, Bucket policy can be used to manage the permissions
    • If the Object and User is owned by the same AWS account, User policy can be used to manage the permissions
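A minimal boto3 sketch of an object owner granting read access on a single object through its ACL (bucket, key and the canonical user ID are placeholders):

```python
import boto3

s3 = boto3.client('s3')
# Grant READ on one object to another account, identified by canonical user ID
s3.put_object_acl(
    Bucket='example-bucket',
    Key='uploads/report.csv',
    GrantRead='id="79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be"'
)

# Verify what was applied
print(s3.get_object_acl(Bucket='example-bucket', Key='uploads/report.csv')['Grants'])
```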

Amazon S3 Request Authorization

When Amazon S3 receives a request, it must evaluate all the user policies, bucket policies and ACLs to determine whether to authorize or deny the request.

S3 evaluates the policies in 3 contexts

  • User context is basically the context in which S3 evaluates the User policy that the parent AWS account (context authority) attaches to the user
  • Bucket context is the context in which S3 evaluates the access policies owned by the bucket owner (context authority) to check if the bucket owner has not explicitly denied access to the resource
  • Object context is the context where S3 evaluates policies owned by the Object owner (context authority)

Analogy

  • Consider 3 Parents (AWS Account) A, B and C with Child (IAM User) AA, BA and CA respectively
  • Parent A owns a Toy box (Bucket) with Toy AAA and also allows toys (Objects) to be dropped and picked up
  • Parent A can grant permission (User Policy OR Bucket policy OR both) to his Child AA to access the Toy box and the toys
  • Parent A can grant permissions (Bucket policy) to Parent B (different AWS account) to drop toys into the toys box. Parent B can grant permissions (User policy) to his Child BA to drop Toy BAA
  • Parent B can grant permissions (Object ACL) to Parent A to access Toy BAA
  • Parent A can grant permissions (Bucket Policy) to Parent C to pick up the Toy AAA who in turn can grant permission (User Policy) to his Child CA to access the toy
  • Parent A can grant permission (through IAM Role) to Parent C to pick up the Toy BAA who in turn can grant permission (User Policy) to his Child CA to access the toy

Bucket Operation Authorization


  1. If the requester is an IAM user, the user must have permission (User Policy) from the parent AWS account to which it belongs
  2. Amazon S3 evaluates a subset of policies owned by the parent account. This subset of policies includes the user policy that the parent account attaches to the user.
  3. If the parent also owns the resource in the request (in this case, the bucket), Amazon S3 also evaluates the corresponding resource policies (bucket policy and bucket ACL) at the same time.
  4. Requester must also have permissions (Bucket Policy or ACL) from the bucket owner to perform a specific bucket operation.
  5. Amazon S3 evaluates a subset of policies owned by the AWS account that owns the bucket. The bucket owner can grant permission by using a bucket policy or bucket ACL.
  6. Note that, if the AWS account that owns the bucket is also the parent account of an IAM user, then it can configure bucket permissions in a user policy or bucket policy or both

Object Operation Authorization


  1. If the requester is an IAM user, the user must have permission (User Policy) from the parent AWS account to which it belongs.
  2. Amazon S3 evaluates a subset of policies owned by the parent account. This subset of policies includes the user policy that the parent attaches to the user.
  3. If the parent also owns the resource in the request (bucket, object), Amazon S3 evaluates the corresponding resource policies (bucket policy, bucket ACL, and object ACL) at the same time.
  4. If the parent AWS account owns the resource (bucket or object), it can grant resource permissions to its IAM user by using either the user policy or the resource policy.
  5. S3 evaluates policies owned by the AWS account that owns the bucket.
  6. If the AWS account that owns the object in the request is not same as the bucket owner, in the bucket context Amazon S3 checks the policies if the bucket owner has explicitly denied access to the object.
  7. If there is an explicit deny set on the object, Amazon S3 does not authorize the request.
  8. Requester must have permissions from the object owner (Object ACL) to perform a specific object operation.
  9. Amazon S3 evaluates the object ACL.
  10. If bucket and object owners are the same, access to the object can be granted in the bucket policy, which is evaluated at the bucket context.
  11. If the owners are different, the object owners must use an object ACL to grant permissions.
  12. If the AWS account that owns the object is also the parent account to which the IAM user belongs, it can configure object permissions in a user policy, which is evaluated at the user context.

Permission Delegation

  • If an AWS account owns a resource, it can grant those permissions to another AWS account.
  • That account can then delegate those permissions, or a subset of them, to users in the account. This is referred to as permission delegation.
  • But an account that receives permissions from another account cannot delegate permission cross-account to another AWS account.
  • If the Bucket owner wants to grant permission on an object it does not own to another AWS account, it cannot do so through cross-account permissions and needs to define an IAM role which can be assumed by the other AWS account to gain access

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which features can be used to restrict access to data in S3? Choose 2 answers
    1. Set an S3 ACL on the bucket or the object.
    2. Create a CloudFront distribution for the bucket.
    3. Set an S3 bucket policy.
    4. Enable IAM Identity Federation
    5. Use S3 Virtual Hosting
  2. Which method can be used to prevent an IP address block from accessing public objects in an S3 bucket?
    1. Create a bucket policy and apply it to the bucket
    2. Create a NACL and attach it to the VPC of the bucket
    3. Create an ACL and apply it to all objects in the bucket
    4. Modify the IAM policies of any users that would access the bucket
  3. A user has granted read/write permission of his S3 bucket using ACL. Which of the below mentioned options is a valid ID to grant permission to other AWS accounts (grantee) using ACL?
    1. IAM User ID
    2. S3 Secure ID
    3. Access ID
    4. Canonical user ID
  4. A root account owner has given full access of his S3 bucket to one of the IAM users using the bucket ACL. When the IAM user logs in to the S3 console, which actions can he perform?
    1. He can just view the content of the bucket
    2. He can do all the operations on the bucket
    3. It is not possible to give access to an IAM user using ACL
    4. The IAM user can perform all operations on the bucket using only API/SDK
  5. A root AWS account owner is trying to understand various options to set the permission to AWS S3. Which of the below mentioned options is not the right option to grant permission for S3?
    1. User Access Policy
    2. S3 Object Policy
    3. S3 Bucket Policy
    4. S3 ACL
  6. A system admin is managing buckets, objects and folders with AWS S3. Which of the below mentioned statements is true and should be taken in consideration by the sysadmin?
    1. Folders support only ACL
    2. Both the object and bucket can have an Access Policy but folder cannot have policy
    3. Folders can have a policy
    4. Both the object and bucket can have ACL but folders cannot have ACL
  7. A user has created an S3 bucket which is not publicly accessible. The bucket is having thirty objects which are also private. If the user wants to make the objects public, how can he configure this with minimal efforts?
    1. User should select all objects from the console and apply a single policy to mark them public
    2. User can write a program which programmatically makes all objects public using S3 SDK
    3. Set the AWS bucket policy which marks all objects as public
    4. Make the bucket ACL as public so it will also mark all objects as public
  8. You need to configure an Amazon S3 bucket to serve static assets for your public-facing web application. Which methods ensure that all objects uploaded to the bucket are set to public read? Choose 2 answers
    1. Set permissions on the object to public read during upload.
    2. Configure the bucket ACL to set all objects to public read.
    3. Configure the bucket policy to set all objects to public read.
    4. Use AWS Identity and Access Management roles to set the bucket to public read.
    5. Amazon S3 objects default to public read, so no action is needed.
  9. Amazon S3 doesn’t automatically give a user who creates _____ permission to perform other actions on that bucket or object.
    1. a file
    2. a bucket or object
    3. a bucket or file
    4. a object or file
  10. A root account owner is trying to understand the S3 bucket ACL. Which of the below mentioned options cannot be used to grant ACL on the object using the authorized predefined group?
    1. Authenticated user group
    2. All users group
    3. Log Delivery Group
    4. Canonical user group
  11. A user is enabling logging on a particular bucket. Which of the below mentioned options may be best suitable to allow access to the log bucket?
    1. Create an IAM policy and allow log access
    2. It is not possible to enable logging on the S3 bucket
    3. Create an IAM Role, which has access to the log bucket
    4. Provide ACL for the logging group
  12. A user is trying to configure access with S3. Which of the following options is not possible to provide access to the S3 bucket / object?
    1. Define the policy for the IAM user
    2. Define the ACL for the object
    3. Define the policy for the object
    4. Define the policy for the bucket
  13. A user is having access to objects of an S3 bucket, which is not owned by him. If he is trying to set the objects of that bucket public, which of the below mentioned options may be a right fit for this action?
    1. Make the bucket public with full access
    2. Define the policy for the bucket
    3. Provide ACL on the object
    4. Create an IAM user with permission
  14. A bucket owner has allowed another account’s IAM users to upload or access objects in his bucket. The IAM user of Account A is trying to access an object created by the IAM user of account B. What will happen in this scenario?
    1. The bucket policy may not be created as S3 will give error due to conflict of Access Rights
    2. It is not possible to give permission to multiple IAM users
    3. AWS S3 will verify proper rights given by the owner of Account A, the bucket owner as well as by the IAM user B to the object
    4. It is not possible that the IAM user of one account accesses objects of the other IAM user

AWS S3 Object Lifecycle Management – Certification

S3 Object Lifecycle Overview

  • S3 Object lifecycle can be managed by using a lifecycle configuration, which defines how S3 manages objects during their lifetime.
  • Lifecycle configuration enables simplification of object lifecycle management, for e.g. moving of less frequently accessed objects, backup or archival of data for several years or permanent deletion of objects; all transitions can be controlled automatically
  • 1000 lifecycle rules can be configured per bucket
  • S3 Object Lifecycle Management rules applied to a bucket are applicable to all the existing objects in the bucket as well as the ones added later
  • S3 Object lifecycle management allows 2 types of behavior
    • Transition in which the storage class for the objects change
    • Expiration where the objects are permanently deleted
  • Lifecycle Management can be configured with Versioning, which allows storage of one current object version and zero or more non current object versions
  • Object’s lifecycle management applies to both Non Versioning and Versioning enabled buckets
  • For Non Versioned buckets
    • Transitioning period is considered from the object’s creation date
  • For Versioned buckets,
    • Transitioning period for current object is calculated for the object creation date
    • Transitioning period for non current object is calculated for the date when the object became a noncurrent versioned object
    • S3 uses the number of days since its successor was created as the number of days an object is noncurrent.
  • S3 calculates the time by adding the number of days specified in the rule to the object creation time and rounding the resulting time up to the next day midnight UTC. For e.g., if an object was created at 15/1/2016 10:30 AM UTC and you specify 3 days in a transition rule, the result is 18/1/2016 10:30 AM UTC, rounded off to the next day midnight, 19/1/2016 00:00 UTC.
  • Lifecycle configuration on MFA-enabled buckets is not supported.

S3 Object Lifecycle Management Rules

  1. STANDARD or REDUCED_REDUNDANCY -> (128 KB & 30 days) -> STANDARD_IA
    • Only objects with size more than 128 KB can be transitioned, as cost benefits for transitioning to STANDARD_IA can be realized only for larger objects
    • Objects must be stored for at least 30 days in the current storage class before being transitioned to the STANDARD_IA, as younger objects are accessed more frequently or deleted sooner than is suitable for STANDARD_IA
  2. STANDARD_IA -> X -> STANDARD or REDUCED_REDUNDANCY
    • Cannot transition from STANDARD_IA to STANDARD or REDUCED_REDUNDANCY
  3. STANDARD or REDUCED_REDUNDANCY or STANDARD_IA -> GLACIER
    • Any Storage class can be transitioned to GLACIER
  4. STANDARD or REDUCED_REDUNDANCY -> (1 day) -> GLACIER
    • Transitioning from Standard or RRS to Glacier can be done in a day
  5. STANDARD_IA -> (30 days) -> GLACIER
    • Transitioning from Standard IA to Glacier can be done only after the object has been in STANDARD_IA for at least 30 days, i.e. at least 60 days from the object creation date or the noncurrent version date
  6. GLACIER-> X -> STANDARD or REDUCED_REDUNDANCY or STANDARD_IA
    • Transition of objects to the GLACIER storage class is one-way
    • Cannot transition from GLACIER to any other storage class.
  7. GLACIER -> (90 days) -> Permanent Deletion
    • Deleting data that is archived to Glacier is free, if the objects you delete are archived for three months or longer.
    • Amazon S3 charges a prorated early deletion fee, if the object is deleted or overwritten within three months of archiving it.
  8. STANDARD or STANDARD_IA or GLACIER -> X-> REDUCED_REDUNDANCY
    • Cannot transition from any storage class to REDUCED_REDUNDANCY.
  9. Archival of objects to Amazon Glacier by using object lifecycle management is performed asynchronously and there may be a delay between the transition date in the lifecycle configuration rule and the date of the physical transition. However, AWS charges Amazon Glacier prices based on the transition date specified in the rule
  10. For a versioning-enabled bucket
    • Transition and Expiration actions apply to current versions.
    • NoncurrentVersionTransition and NoncurrentVersionExpiration actions apply to noncurrent versions and works similar to the non versioned objects except the time period is from the time the objects became noncurrent
  11. Expiration Rules
    • For Non Versioned bucket
      • Object is permanently deleted
    • For Versioned bucket
      • Expiration is applicable to the Current object only and does not impact any of the non current objects
      • S3 will insert a Delete Marker object with unique id and the previous current object becomes a non current version
      • S3 will not take any action if the Current object is a Delete Marker
      • If the bucket has a single object which is the Delete Marker (referred to as expired object delete marker), S3 removes the Delete Marker
    • For Versioned Suspended bucket
      • S3 will insert a Delete Marker object with version ID null and overwrite any existing object with version ID null
  12. When an object reaches the end of its lifetime, Amazon S3 queues it for removal and removes it asynchronously. There may be a delay between the expiration date and the date at which S3 removes an object. You are not charged for storage time associated with an object that has expired.
  13. There are additional cost considerations if you put lifecycle policy to expire objects that have been in STANDARD_IA for less than 30 days, or GLACIER for less than 90 days.
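A minimal boto3 sketch of a lifecycle configuration that respects the transition limits above, 30 days to STANDARD_IA, 90 days to GLACIER and expiry after a year (bucket name and prefix are placeholders; newer configurations may use a Filter element instead of Prefix):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='example-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-logs',
            'Prefix': 'logs/',
            'Status': 'Enabled',
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},   # after the 30-day minimum
                {'Days': 90, 'StorageClass': 'GLACIER'},
            ],
            'Expiration': {'Days': 365},                       # permanent deletion
        }]
    }
)
```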

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your firm has uploaded a large amount of aerial image data to S3. In the past, in your on-premises environment, you used a dedicated group of servers to oaten process this data and used Rabbit MQ, an open source messaging system, to get job information to the servers. Once processed the data would go to tape and be shipped offsite. Your manager told you to stay with the current design, and leverage AWS archival storage and messaging services to minimize cost. Which is correct?
    1. Use SQS for passing job messages, use Cloud Watch alarms to terminate EC2 worker instances when they become idle. Once data is processed, change the storage class of the S3 objects to Reduced Redundancy Storage (Need to replace On-Premises Tape functionality)
    2. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Reduced Redundancy Storage (Need to replace On-Premises Tape functionality)
    3. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Glacier (Glacier suitable for Tape backup)
    4. Use SNS to pass job messages use Cloud Watch alarms to terminate spot worker instances when they become idle. Once data is processed, change the storage class of the S3 object to Glacier.
  2. You have a proprietary data store on-premises that must be backed up daily by dumping the data store contents to a single compressed 50GB file and sending the file to AWS. Your SLAs state that any dump file backed up within the past 7 days can be retrieved within 2 hours. Your compliance department has stated that all data must be held indefinitely. The time required to restore the data store from a backup is approximately 1 hour. Your on-premise network connection is capable of sustaining 1gbps to AWS. Which backup methods to AWS would be most cost-effective while still meeting all of your requirements?
    1. Send the daily backup files to Glacier immediately after being generated (will not meet the RTO)
    2. Transfer the daily backup files to an EBS volume in AWS and take daily snapshots of the volume (Not cost effective)
    3. Transfer the daily backup files to S3 and use appropriate bucket lifecycle policies to send to Glacier (Store in S3 for seven days and then archive to Glacier)
    4. Host the backup files on a Storage Gateway with Gateway-Cached Volumes and take daily snapshots (Not Cost effective as local storage as well as S3 storage)

AWS S3 Object Versioning

S3 Object Versioning

  • S3 Object Versioning can be used to protect from unintended overwrites and deletions
  • Versioning helps to keep multiple variants of an object in the same bucket and can be used to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket.
  • Versioning maintains full copies of each version of an object, and you accrue charges for all of them e.g. a 1GB file with 5 versions that differ only slightly consumes 5GB of S3 storage space, and you are charged for all 5GB.
  • Versioning is not enabled by default and has to be explicitly enabled for each bucket
  • Versioning once enabled, cannot be disabled and can only be suspended
  • Versioning enabled on a bucket applies to all the objects within the bucket
  • Permissions are set at the version level. Each version has its own object owner; the AWS account that creates the object version is its owner. So, you can set different permissions for different versions of the same object.
  • Irrespective of whether Versioning is enabled, each object in the bucket has a version ID.
    • For Non Versioned bucket, the version ID for each object is null
    • For Versioned buckets, a unique version ID is assigned to each object
  • With Versioning, the version ID forms a key element defining the uniqueness of an object within a bucket, along with the bucket name and object key
  • Object Retrieval
    • For Non Versioned bucket
      • An object retrieval always returns the only object available
    • For Versioned bucket
      • An object retrieval returns the Current object.
      • A Non Current object can be retrieved by specifying its version ID.
  • Object Addition
    • For Non Versioned bucket
      • If an object with the same key is uploaded again, it overwrites the existing object
    • For Versioned bucket
      • If an object with the same key is uploaded, the newly uploaded object becomes the Current version and the previous object becomes a Non current version.
      • A Non current version can be retrieved and restored, hence protecting against accidental overwrites
  • When an object in a bucket is deleted
    • For Non Versioned bucket
      • An object is permanently deleted and cannot be recovered
    • For Versioned bucket,
      • All versions remain in the bucket and Amazon S3 inserts a delete marker which becomes the Current version
      • The Non current versions can be retrieved and restored, hence protecting against accidental deletion
      • If an object with a specific version ID is deleted, it is permanently deleted and cannot be recovered
  • Delete marker
    • A Delete Marker does not have any data or ACL associated with it, just a key and a version ID
    • An object retrieval on a key whose Current version is a Delete Marker returns a 404 (Not Found) error
    • Only a DELETE operation is allowed on a Delete Marker object
    • If the Delete Marker is deleted by specifying its version ID, the previous Non current version becomes the Current version again
    • If a DELETE request (without a version ID) is issued on an object whose Current version is already a Delete Marker, the existing Delete Marker is not deleted; another Delete Marker is added
  • Restoring Previous Versions (see the versioning walkthrough sketch after this list)
    • Copy a previous version of the object into the same bucket. The copied object becomes the Current version of that object and all object versions are preserved – Recommended, as you still keep all the versions
    • Permanently delete the Current version of the object. When you delete the Current object version, you, in effect, turn the previous version into the Current version of that object.
  • Versioning Suspended Bucket
    • Versioning can be suspended to stop accruing new versions of the same object in a bucket
    • Existing objects in the bucket do not change; only the behavior of future requests changes
    • For each new object addition, an object with version ID null is added.
    • For each object addition with the same key name, the object with the version ID null is overwritten
    • An object retrieval request will always return the current version of the object
    • A DELETE request (without a version ID) permanently deletes the object with version ID null, if any, and inserts a Delete Marker
    • If the bucket has no object with version ID null, the DELETE request removes nothing, but a Delete Marker is still inserted
    • A DELETE request can still be issued with a specific version ID to permanently delete any previously stored version
  • MFA Delete
    • Additional security can be enabled by configuring a bucket to enable MFA (Multi-Factor Authentication) delete
    • MFA Delete can be enabled on a bucket to ensure that data in your bucket cannot be accidentally deleted
    • The bucket owner, i.e. the AWS account that created the bucket (root account), and all authorized IAM users can enable versioning, but only the bucket owner (root account) can enable MFA Delete (see the MFA Delete sketch after this list)
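
A minimal boto3 sketch of the versioning behaviour described above (the bucket name and key are hypothetical): enable versioning, overwrite an object, read a non-current version by its version ID, delete the object to create a delete marker, and remove the delete marker to restore the object.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket, key = "example-versioned-bucket", "report.txt"  # hypothetical names

# Enable versioning (it can later only be suspended, never disabled)
s3.put_bucket_versioning(
    Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
)

# Upload twice with the same key: the second upload becomes the Current
# version, the first becomes a Non current version
v1 = s3.put_object(Bucket=bucket, Key=key, Body=b"version 1")["VersionId"]
v2 = s3.put_object(Bucket=bucket, Key=key, Body=b"version 2")["VersionId"]

# A plain GET returns the Current version; a Non current version needs its ID
assert s3.get_object(Bucket=bucket, Key=key)["Body"].read() == b"version 2"
assert s3.get_object(Bucket=bucket, Key=key, VersionId=v1)["Body"].read() == b"version 1"

# A plain DELETE removes no data; it adds a Delete Marker as the Current version
marker = s3.delete_object(Bucket=bucket, Key=key)["VersionId"]

# A GET on the key now fails with 404 because the Current version is a Delete Marker
try:
    s3.get_object(Bucket=bucket, Key=key)
except ClientError as err:
    assert err.response["ResponseMetadata"]["HTTPStatusCode"] == 404

# Deleting the Delete Marker by its version ID restores the previous version
s3.delete_object(Bucket=bucket, Key=key, VersionId=marker)
assert s3.get_object(Bucket=bucket, Key=key)["Body"].read() == b"version 2"
```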
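
MFA Delete is configured through the same versioning API, but the request must be made with the bucket owner's (root account) credentials and carry the MFA device serial number plus a current token. A minimal sketch, assuming a hypothetical bucket and placeholder MFA device values:

```python
import boto3

# Must be called with the bucket owner's (root account) credentials
s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="example-versioned-bucket",                       # hypothetical bucket
    MFA="arn:aws:iam::123456789012:mfa/root-device 123456",  # "<serial> <token>" placeholder
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)

# Once enabled, permanently deleting a version or changing the bucket's
# versioning state also requires the MFA token in the request.
```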

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, so both the questions and answers might soon be outdated; research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which set of Amazon S3 features helps to prevent and recover from accidental data loss?
    1. Object lifecycle and service access logging
    2. Object versioning and Multi-factor authentication
    3. Access controls and server-side encryption
    4. Website hosting and Amazon S3 policies
  2. You use S3 to store critical data for your company. Several users within your group currently have full permissions to your S3 buckets. You need to come up with a solution that does not impact your users and also protects against the accidental deletion of objects. Which two options will address this issue? Choose 2 answers
    1. Enable versioning on your S3 Buckets
    2. Configure your S3 Buckets with MFA delete
    3. Create a Bucket policy and only allow read only permissions to all users at the bucket level
    4. Enable object life cycle policies and configure the data older than 3 months to be archived in Glacier
  3. To protect S3 data from both accidental deletion and accidental overwriting, you should
    1. enable S3 versioning on the bucket
    2. access S3 data using only signed URLs
    3. disable S3 delete using an IAM bucket policy
    4. enable S3 Reduced Redundancy Storage
    5. enable Multi-Factor Authentication (MFA) protected access
  4. A user has not enabled versioning on an S3 bucket. What will be the version ID of the object inside that bucket?
    1. 0
    2. There will be no version attached
    3. Null
    4. Blank
  5. A user is trying to find the state of an S3 bucket with respect to versioning. Which of the below mentioned states will AWS not return when queried?
    1. versioning-enabled
    2. versioning-suspended
    3. unversioned
    4. versioned

AWS S3 Data Durability

Sample Exam Question :-
A customer is leveraging Amazon Simple Storage Service in eu-west-1 to store static content for a web-based property. The customer is storing objects using the Standard Storage class. Where are the customer's objects replicated?
  1. Single facility in eu-west-1 and a single facility in eu-central-1
  2. Single facility in eu-west-1 and a single facility in us-east-1
  3. Multiple facilities in eu-west-1
  4. A single facility in eu-west-1
Answer :-
The question targets S3 data durability: objects are stored redundantly across multiple facilities within the same Amazon S3 region, so the correct choice is multiple facilities in eu-west-1.

Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 region. To help better ensure data durability, Amazon S3 PUT and PUT Object copy operations synchronously store your data across multiple facilities before returning SUCCESS. Once the objects are stored, Amazon S3 maintains their durability by quickly detecting and repairing any lost redundancy.