AWS S3 Storage Classes – Certification

Udemy Braincert-AWS-Certified-SA-Professional-Practice-Exam

AWS S3 Storage Classes Overview

  • Amazon S3 storage classes are designed to sustain the concurrent loss of data in one or two facilities
  • S3 storage classes allows lifecycle management for automatic migration of objects for cost savings
  • S3 storage classes support SSL encryption of data in transit and data encryption at rest
  • S3 also regularly verifies the integrity of your data using checksums and provides auto healing capability

AWS S3 Storage Classes Comparision

Standard

  • Storage class is ideal for performance-sensitive use cases and frequently accessed data and is designed to sustain the loss of data in a two facilities
  • STANDARD is the default storage class, if none specified during upload
  • Low latency and high throughput performance
  • Designed for durability of 99.999999999% of objects
  • Designed for 99.99% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability.

Standard IA

  • S3 STANDARD_IA (Infrequent Access) storage class is optimized for long-lived and less frequently accessed data for e.g. backups and older data where access is limited, but the use case still demands high performance
  • STANDARD_IA is designed to sustain the loss of data in a two facilities
  • STANDARD_IA objects are available for real-time access.
  • STANDARD_IA storage class is suitable for larger objects greater than 128 KB (smaller objects are charged for 128KB only) kept for at least 30 days.
  • Same low latency and high throughput performance of Standard
  • Designed for durability of 99.999999999% of objects
  • Designed for 99.9% availability over a given year
  • Backed with the Amazon S3 Service Level Agreement for availability

Reduced Redundancy Storage – RRS

  • Reduced Redundancy Storage (RRS) storage class is designed for noncritical, reproducible data stored at lower levels of redundancy than the STANDARD storage class, which reduces storage costs
  • Designed for durability of 99.99% of objects
  • Designed for 99.99% availability over a given year
  • Lower level of redundancy results in less durability and availability
  • RRS stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive,
  • RRS does not replicate objects as many times as S3 standard storage and is designed to sustain the loss of data in a single facility.
  • If an RRS object is lost, S3 returns a 405 error on requests made to that object
  • S3 can send an event notification, configured on the bucket, to alert a user or start a workflow when it detects that an RRS object is lost which can be used to replace the lost object

Glacier

  • GLACIER storage class is suitable for archiving data where data access is infrequent and retrieval time of several (3-5) hours  is acceptable.
  • GLACIER storage class uses the very low-cost Amazon Glacier storage service, but the objects in this storage class are still managed through S3
  • Designed for durability of 99.999999999% of objects
  • GLACIER cannot be specified as the storage class at the object creation time but has to be transitioned fromSTANDARD, RRS, or STANDARD_IA to GLACIER storage class using lifecycle management.
  • For accessing GLACIER objects,
    • object must be restored which can taken anywhere between 3-5 hours
    • objects are only available for the time period (number of days) specified during the restoration request
    • object’s storage class remains GLACIER
    • charges are levied for both the archive (GLACIER rate) and the copy restored temporarily (RRS rate)
  • Vault Lock feature enforces compliance via a lockable policy

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does RRS stand for when talking about S3?
    1. Redundancy Removal System
    2. Relational Rights Storage
    3. Regional Rights Standard
    4. Reduced Redundancy Storage
  2. What is the durability of S3 RRS?
    1. 99.99%
    2. 99.95%
    3. 99.995%
    4. 99.999999999%
  3. What is the Reduced Redundancy option in Amazon S3?
    1. Less redundancy for a lower cost
    2. It doesn’t exist in Amazon S3, but in Amazon EBS.
    3. It allows you to destroy any copy of your files outside a specific jurisdiction.
    4. It doesn’t exist at all
  4. An application is generating a log file every 5 minutes. The log file is not critical but may be required only for verification in case of some major issue. The file should be accessible over the internet whenever required. Which of the below mentioned options is a best possible storage solution for it?
    1. AWS S3
    2. AWS Glacier
    3. AWS RDS
    4. AWS S3 RRS
  5. A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
    1. The archive will be available as an object for the duration specified by the user during the restoration request
    2. The restored object’s storage class will be RRS (After the object is restored the storage class still remains GLACIER. Read more)
    3. The user can modify the restoration period only by issuing a new restore request with the updated period
    4. The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
  6. Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
    1. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
    2. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot instances and Reserved Instances for Amazon EMR jobs. Use Reserved instances for Amazon Redshift (Combination of the Spot and reserved with guarantee performance and help reduce cost. Also, RRS would reduce cost and guarantee data integrity, which is different from data durability )
    3. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
    4. Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
  7. Which of the below mentioned options can be a good use case for storing content in AWS RRS?
    1. Storing mission critical data Files
    2. Storing infrequently used log files
    3. Storing a video file which is not reproducible
    4. Storing image thumbnails
  8. A newspaper organization has an on-premises application which allows the public to search its back catalogue and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers into JPEGs (approx. 17TB) and used Optical Character Recognition (OCR) to populate a commercial search product. The hosting platform and software is now end of life and the organization wants to migrate its archive to AWS and produce a cost efficient architecture and still be designed for availability and durability. Which is the most appropriate? [PROFESSIONAL]
    1. Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 Instances and configure with auto-scaling and an Elastic Load Balancer. (RRS impacts durability and commercial search would add to cost)
    2. Model the environment using CloudFormation. Use an EC2 instance running Apache webserver and an open source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index. (Using EBS is not cost effective for storing files)
    3. Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones. (Standard S3 and Elastic Beanstalk provides availability and durability, Standard S3 and CloudSearch provides cost effective storage and search)
    4. Use a single-AZ RDS MySQL instance to store the search index and the JPEG images use an EC2 instance to serve the website and translate user queries into SQL. (RDS is not ideal and cost effective to store files, Single AZ impacts availability)
    5. Use a CloudFront download distribution to serve the JPEGs to the end users and Install the current commercial search product, along with a Java Container for the website on EC2 instances and use Route53 with DNS round-robin. (CloudFront needs a source and using commercial search product is not cost effective)
  9. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
    1. Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
    2. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Master and Core must be RI or On Demand. Cannot be Spot)
    3. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Need better durability for ingest file. Spot instances can be used for task nodes for cost saving. RI will not provide cost saving in this case)
    4. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (Input must be in S3 standard)

4 thoughts on “AWS S3 Storage Classes – Certification

Leave a Reply

Your email address will not be published. Required fields are marked *