S3 STANDARD_IA (Infrequent Access) storage class is optimized for long-lived, less frequently accessed data, e.g. backups and older data where access is limited but the use case still demands high performance
STANDARD_IA is designed to sustain the loss of data in two facilities
STANDARD_IA objects are available for real-time access.
STANDARD_IA storage class is suitable for larger objects greater than 128 KB (smaller objects are charged as if they were 128 KB) kept for at least 30 days.
Same low latency and high throughput performance as the STANDARD storage class
Designed for durability of 99.999999999% of objects
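The 128 KB minimum charge above can be illustrated with a small helper. This is only a sketch of the billing rule, not an AWS API; the function name and framing are illustrative:

```python
# Sketch (not an official AWS formula): STANDARD_IA bills each object
# as at least 128 KB, regardless of its actual stored size.
MIN_BILLABLE_BYTES = 128 * 1024  # 128 KB minimum billable size

def billable_size(object_size_bytes: int) -> int:
    """Return the size STANDARD_IA bills for, given the actual object size."""
    return max(object_size_bytes, MIN_BILLABLE_BYTES)

# A 10 KB thumbnail is billed as 128 KB; a 1 MB backup is billed as-is.
print(billable_size(10 * 1024))    # 131072
print(billable_size(1024 * 1024))  # 1048576
```

This is why STANDARD_IA suits larger, long-lived objects: many small objects would each incur the 128 KB minimum.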
Reduced Redundancy Storage (RRS) storage class is designed for noncritical, reproducible data stored at lower levels of redundancy than the STANDARD storage class, which reduces storage costs
Designed for durability of 99.99% of objects
Designed for 99.99% availability over a given year
Lower level of redundancy results in less durability and availability
RRS stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive.
RRS does not replicate objects as many times as S3 standard storage and is designed to sustain the loss of data in a single facility.
If an RRS object is lost, S3 returns a 405 (Method Not Allowed) error on requests made to that object
S3 can send an event notification, configured on the bucket, to alert a user or start a workflow when it detects that an RRS object is lost; this can be used to replace the lost object
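The lost-object notification above is configured on the bucket using the `s3:ReducedRedundancyLostObject` event type. A minimal notification configuration might look like the following (the configuration ID and SNS topic ARN are hypothetical placeholders):

```json
{
  "TopicConfigurations": [
    {
      "Id": "rrs-object-lost-alert",
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:rrs-lost-topic",
      "Events": ["s3:ReducedRedundancyLostObject"]
    }
  ]
}
```

A subscriber to the SNS topic (a person or an automated workflow) can then re-upload the lost object from its reproducible source.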
GLACIER storage class is suitable for archiving data where access is infrequent and a retrieval time of several (3-5) hours is acceptable.
GLACIER storage class uses the very low-cost Amazon Glacier storage service, but the objects in this storage class are still managed through S3
Designed for durability of 99.999999999% of objects
GLACIER cannot be specified as the storage class at object creation time; objects have to be transitioned from STANDARD, RRS, or STANDARD_IA to the GLACIER storage class using lifecycle management.
For accessing GLACIER objects,
the object must first be restored, which can take anywhere between 3 and 5 hours
restored objects are available only for the time period (number of days) specified in the restoration request
object’s storage class remains GLACIER
charges are levied for both the archive (GLACIER rate) and the copy restored temporarily (RRS rate)
Vault Lock feature enforces compliance via a lockable policy
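The lifecycle transition described above can be expressed as a bucket lifecycle rule. The rule ID, prefix, and day counts below are example values, e.g. moving log objects to STANDARD_IA after 30 days and to GLACIER after 90:

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

Once an object has transitioned to GLACIER, access requires a per-object restore request (e.g. `aws s3api restore-object` with a `Days` value in the restore request); the temporary copy then expires after the specified number of days while the archive remains in GLACIER.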
AWS Certification Exam Practice Questions
Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
AWS services are updated every day, so both the questions and answers might soon be outdated; research accordingly.
AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be.
Open to further feedback, discussion and correction.
What does RRS stand for when talking about S3?
Redundancy Removal System
Relational Rights Storage
Regional Rights Standard
Reduced Redundancy Storage
What is the durability of S3 RRS? (99.99%)
What is the Reduced Redundancy option in Amazon S3?
Less redundancy for a lower cost
It doesn’t exist in Amazon S3, but in Amazon EBS.
It allows you to destroy any copy of your files outside a specific jurisdiction.
It doesn’t exist at all
An application is generating a log file every 5 minutes. The log file is not critical but may be required only for verification in case of some major issue. The file should be accessible over the internet whenever required. Which of the below mentioned options is the best possible storage solution for it?
AWS S3 RRS
A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
The archive will be available as an object for the duration specified by the user during the restoration request
The restored object’s storage class will be RRS (After the object is restored, the storage class still remains GLACIER)
The user can modify the restoration period only by issuing a new restore request with the updated period
The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot Instances and Reserved Instances for Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (A combination of Spot and Reserved Instances guarantees performance and helps reduce cost. Also, RRS would reduce cost while preserving data integrity, which is different from data durability)
Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
Which of the below mentioned options can be a good use case for storing content in AWS RRS?
Storing mission critical data Files
Storing infrequently used log files
Storing a video file which is not reproducible
Storing image thumbnails
A newspaper organization has an on-premises application which allows the public to search its back catalogue and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers into JPEGs (approx. 17TB) and used Optical Character Recognition (OCR) to populate a commercial search product. The hosting platform and software are now end of life and the organization wants to migrate its archive to AWS and produce a cost-efficient architecture that is still designed for availability and durability. Which is the most appropriate? [PROFESSIONAL]
Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 Instances and configure with auto-scaling and an Elastic Load Balancer. (RRS impacts durability and commercial search would add to cost)
Model the environment using CloudFormation. Use an EC2 instance running Apache webserver and an open source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index. (Using EBS is not cost effective for storing files)
Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones. (Standard S3 and Elastic Beanstalk provides availability and durability, Standard S3 and CloudSearch provides cost effective storage and search)
Use a single-AZ RDS MySQL instance to store the search index and the JPEG images, and use an EC2 instance to serve the website and translate user queries into SQL. (RDS is not ideal or cost effective for storing files, and single AZ impacts availability)
Use a CloudFront download distribution to serve the JPEGs to the end users and Install the current commercial search product, along with a Java Container for the website on EC2 instances and use Route53 with DNS round-robin. (CloudFront needs a source and using commercial search product is not cost effective)
A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Master and Core must be RI or On Demand. Cannot be Spot)
Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Need better durability for ingest files. Spot Instances can be used for task nodes for cost savings. RIs will not provide cost savings for a one-time run)
Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (Input must be in S3 standard)