AWS Tags – Resource Groups – Tag Editor

AWS Tags

  • Tags are key/value pairs that can be attached to AWS resources
  • Tags are metadata: that means that they don’t actually do anything, they’re purely for labeling purposes and helps to organize AWS resources
  • Tagging allows the user to assign her own (words/phrases/labels) metadata to each resource in the form of tags.
  • Tags don’t have any semantic meaning to the resources it is assigned and are interpreted strictly as a string of characters
  • Tags can
    • help to manage AWS resources & services for e.g. instances, images, security groups, etc.
    • help categorize AWS resources in different ways, for e.g., by purpose, owner (Developer, Finance, etc), or environment (DEV, TEST, PROD, etc).
    • help search and filter the resources
    • be used as a mechanism to organize resource costs on the cost allocation report.
  • Tags are not automatically assigned to the resources, however, are (sometimes) inherited for e.g. services such as Auto Scaling, Elastic Beanstalk, and CloudFormation can create other resources, such as RDS or EC2 instances, and usually tag that resource with a reference to itself. These tags do count toward the total tag limit for a resource
  • Tags can be defined using the
    • AWS Management Console,
    • AWS CLI
    • Amazon API.
  • Tags can be assigned only to resources that already exist and cannot be assigned when you create a resource; for e.g., when you use the run-instances AWS CLI command.
  • However, when using the AWS Management console, some resource creation screens enable you to specify tags that are applied immediately after the resource is created.
  • Each tag consists of a key and value
    • key and an optional value, both of which are user-controlled
    • defining a new tag that has the same key as an existing tag on that resource, the new value overwrites the old value.
    • keys and values can be edited, removed from a resource at any time.
    • value can be defined as an empty string, but can’t be set to null.
  • IAM allows you the ability to control which users in the AWS account have permission to create, edit, or delete tags.
  • Common examples of tags are Environment, Application, Owner, Cost Center, Purpose, Stack, etc.

Tags Restriction

  • Maximum number of tags per resource – 50
  • Maximum key length – 128 Unicode characters in UTF-8
  • Maximum value length – 256 Unicode characters in UTF-8
  • Tag keys and values are case-sensitive.
  • Do not use the aws: prefix in the tag names or values because it is reserved for AWS use. Tags with this prefix can’t be edited or deleted and they do not count against the tags per resource limit.
  • Tags allowed characters are: letters, spaces, and numbers representable in UTF-8, plus the following special characters: + – = . _ : / @.

Tagging Strategy

  • AWS does not enforce any tagging naming conventions and can be used as per the user convenience
  • As the number of tags allows per resource are limited, Complex Tagging can be used for e.g. keyName = value1|value2|value3 or keyName = key1|value1;key2|value2

EC2 Resources Tags

  • For tags on EC2 instances, instances can’t terminate, stop, or delete a resource based solely on its tags; the resource identifier must be specified
  • Public or shared resources can be tagged, but the tags assigned are available only to the AWS account and not to the other accounts sharing the resource.
  • Almost all resources can be tagged, with some can only be tagged using API actions or the command line or during creation.

Cost Allocation Tags

  • Tags can be used as a mechanism to organize the resource costs on the cost allocation report.
  • Cost allocation tags can be used to categorize and track AWS costs.
  • When tags are applied to AWS resources such as EC2 instances or S3 buckets and activated in the billing console, AWS generates a cost allocation report as a (CSV file) with the usage and costs aggregated by active tags.
  • Tags can be applied so that they represent business categories (such as cost centers, application names, or owners) to organize costs across multiple services.
  • Cost allocation report includes all of the AWS costs for each billing period and includes both tagged and untagged resources
  • Tags can also be used to filter views in Cost Explorer

Access Control Tags

Resource Groups

  • A Resource Group is a collection of resources that share one or more tags
  • Resource groups help combine information for multiple resources and services on a single screen for e.g. for a Dev tag there might be multiple resources for ELB, EC2, and RDS. Using Resource Groups all the resources and their status can be views on a single page

Tag Editor

  • Tag Editor allows the addition of tags to multiple resources at once
  • Tag Editor allows searching of resources using tags and then add, edit, remove tags for these resources

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Fill in the blanks: _________ let you categorize your EC2 resources in different ways, for example, by purpose, owner, or environment.
    1. Wildcards
    2. Pointers
    3. Tags
    4. Special filters
  2. Please select the Amazon EC2 resource, which can be tagged.
    1. Key pairs
    2. Elastic IP addresses
    3. Placement groups
    4. Amazon EBS snapshots
  3. Can the string value of ‘Key’ be prefixed with aws:?
    1. No
    2. Only for EC2 not S3
    3. Yes
    4. Only for S3 not EC
  4. What is the maximum key length of a tag?
    1. 512 Unicode characters
    2. 64 Unicode characters
    3. 256 Unicode characters
    4. 128 Unicode characters
  5. An organization has launched 5 instances: 2 for production and 3 for testing. The organization wants that one particular group of IAM users should only access the test instances and not the production ones. How can the organization set that as a part of the policy?
    1. Launch the test and production instances in separate regions and allow region wise access to the group (possible using location constraint condition but not flexible)
    2. Define the IAM policy which allows access based on the instance ID (not flexible as it would change)
    3. Create an IAM policy with a condition which allows access to only small instances (not flexible as it would change)
    4. Define the tags on the test and production servers and add a condition to the IAM policy which allows access to specific tags (possible using ResourceTag condition)
  6. A user has launched multiple EC2 instances for the purpose of development and testing in the same region. The user wants to find the separate cost for the production and development instances. How can the user find the cost distribution?
    1. The user should download the activity report of the EC2 services as it has the instance ID wise data
    2. It is not possible to get the AWS cost usage data of single region instances separately
    3. User should use Cost Distribution Metadata and AWS detailed billing
    4. User should use Cost Allocation Tags and AWS billing reports
  7. An organization is using cost allocation tags to find the cost distribution of different departments and projects. One of the instances has two separate tags with the key/value as “InstanceName/HR”, “CostCenter/HR”. What will AWS do in this case?
    1. InstanceName is a reserved tag for AWS. Thus, AWS will not allow this tag
    2. AWS will not allow the tags as the value is the same for different keys
    3. AWS will allow tags but will not show correctly in the cost allocation report due to the same value of the two separate keys
    4. AWS will allow both the tags and show properly in the cost distribution report
  8. A user is launching an instance. He is on the “Tag the instance” screen. Which of the below mentioned information will not help the user understand the functionality of an AWS tag?
    1. Each tag will have a key and value
    2. The user can apply tags to the S3 bucket
    3. The maximum value of the tag key length is 64 unicode characters
    4. AWS tags are used to find the cost distribution of various resources
  9. Your system recently experienced down time during the troubleshooting process. You found that a new administrator mistakenly terminated several production EC2 instances. Which of the following strategies will help prevent a similar situation in the future? The administrator still must be able to:- launch, start stop, and terminate development resources. – launch and start production instances.
    1. Create an IAM user, which is not allowed to terminate instances by leveraging production EC2 termination protection. (EC2 termination protection is enabled on EC2 instance)
    2. Leverage resource based tagging along with an IAM user, which can prevent specific users from terminating production EC2 resources. (Identify production resources using tags and add explicit deny)
    3. Leverage EC2 termination protection and multi-factor authentication, which together require users to authenticate before terminating EC2 instances. (Does not still prevent user from terminating instance)
    4. Create an IAM user and apply an IAM role, which prevents users from terminating production EC2 instances. (Role is not applied to User but assumed by the User also need a way to identify production EC2 instances)
  10. Your manager has requested you to tag EC2 instances to organize and manage a load balancer. Which of the following statements about tag restrictions is incorrect?
    1. The maximum key length is 127 Unicode characters.
    2. The maximum value length is 255 Unicode characters.
    3. Tag keys and values are case sensitive.
    4. The maximum number of tags per load balancer is 20. (50 is the limit)
  11. What is the maximum number of tags that a user can assign to an EC2 instance?
    1. 50
    2. 10
    3. 5
    4. 25

 

CloudWatch Monitoring Supported AWS Services

CloudWatch Monitoring Supported AWS Services

  • CloudWatch offers either basic or detailed monitoring for supported AWS services.
  • Basic monitoring means that a service sends data points to CloudWatch every five minutes.
  • Detailed monitoring means that a service sends data points to CloudWatch every minute.
  • If the AWS service supports both basic and detailed monitoring, the basic would be enabled by default and the detailed monitoring needs to be enabled for details metrics

AWS Services with Monitoring support

  • Auto Scaling
    • By default, basic monitoring is enabled when the launch configuration is created using the AWS Management Console, and detailed monitoring is enabled when the launch configuration is created using the AWS CLI or an API
    • Auto Scaling sends data to CloudWatch every 5 minutes by default when created from Console.
    • For an additional charge, you can enable detailed monitoring for Auto Scaling, which sends data to CloudWatch every minute.
  • Amazon CloudFront
    • Amazon CloudFront sends data to CloudWatch every minute by default.
  • Amazon CloudSearch
    • Amazon CloudSearch sends data to CloudWatch every minute by default.
  • Amazon CloudWatch Events
    • Amazon CloudWatch Events sends data to CloudWatch every minute by default.
  • Amazon CloudWatch Logs
    • Amazon CloudWatch Logs sends data to CloudWatch every minute by default.
  • Amazon DynamoDB
    • Amazon DynamoDB sends data to CloudWatch every minute for some metrics and every 5 minutes for other metrics.
  • Amazon EC2 Container Service
    • Amazon EC2 Container Service sends data to CloudWatch every minute.
  • Amazon ElastiCache
    • Amazon ElastiCache sends data to CloudWatch every minute.
  • Amazon Elastic Block Store
    • Amazon Elastic Block Store sends data to CloudWatch every 5 minutes.
    • Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch.
  • Amazon Elastic Compute Cloud
    • Amazon EC2 sends data to CloudWatch every 5 minutes by default. For an additional charge, you can enable detailed monitoring for Amazon EC2, which sends data to CloudWatch every minute.
  • Elastic Load Balancing
    • Elastic Load Balancing sends data to CloudWatch every minute.
  • Amazon Elastic MapReduce
    • Amazon Elastic MapReduce sends data to CloudWatch every 5 minutes.
  • Amazon Elasticsearch Service
    • Amazon Elasticsearch Service sends data to CloudWatch every minute.
  • Amazon Kinesis Streams
    • Amazon Kinesis Streams sends data to CloudWatch every minute.
  • Amazon Kinesis Firehose
    • Amazon Kinesis Firehose sends data to CloudWatch every minute.
  • AWS Lambda
    • AWS Lambda sends data to CloudWatch every minute.
  • Amazon Machine Learning
    • Amazon Machine Learning sends data to CloudWatch every 5 minutes.
  • AWS OpsWorks
    • AWS OpsWorks sends data to CloudWatch every minute.
  • Amazon Redshift
    • Amazon Redshift sends data to CloudWatch every minute.
  • Amazon Relational Database Service
    • Amazon Relational Database Service sends data to CloudWatch every minute.
  • Amazon Route 53
    • Amazon Route 53 sends data to CloudWatch every minute.
  • Amazon Simple Notification Service
    • Amazon Simple Notification Service sends data to CloudWatch every 5 minutes.
  • Amazon Simple Queue Service
    • Amazon Simple Queue Service sends data to CloudWatch every 5 minutes.
  • Amazon Simple Storage Service
    • Amazon Simple Storage Service sends data to CloudWatch once a day.
  • Amazon Simple Workflow Service
    • Amazon Simple Workflow Service sends data to CloudWatch every 5 minutes.
  • AWS Storage Gateway
    • AWS Storage Gateway sends data to CloudWatch every 5 minutes.
  • AWS WAF
    • AWS WAF sends data to CloudWatch every minute.
  • Amazon WorkSpaces
    • Amazon WorkSpaces sends data to CloudWatch every 5 minutes.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What is the minimum time Interval for the data that Amazon CloudWatch receives and aggregates?
    1. One second
    2. Five seconds
    3. One minute
    4. Three minutes
    5. Five minutes
  2. In the ‘Detailed’ monitoring data available for your Amazon EBS volumes, Provisioned IOPS volumes automatically send _____ minute metrics to Amazon CloudWatch.
    1. 3
    2. 1
    3. 5
    4. 2
  3. Using Amazon CloudWatch’s Free Tier, what is the frequency of metric updates, which you receive?
    1. 5 minutes
    2. 500 milliseconds.
    3. 30 seconds
    4. 1 minute
  4. What is the type of monitoring data (for Amazon EBS volumes) which is available automatically in 5-minute periods at no charge called?
    1. Basic
    2. Primary
    3. Detailed
    4. Local
  5. A user has created an Auto Scaling group using CLI. The user wants to enable CloudWatch detailed monitoring for that group. How can the user configure this?
    1. When the user sets an alarm on the Auto Scaling group, it automatically enables detail monitoring
    2. By default detailed monitoring is enabled for Auto Scaling (Detailed monitoring is enabled when you create the launch configuration using the AWS CLI or an API)
    3. Auto Scaling does not support detailed monitoring
    4. Enable detail monitoring from the AWS console
  6. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services provides detailed monitoring with CloudWatch without charging the user extra?
    1. AWS Auto Scaling
    2. AWS Route 53
    3. AWS EMR
    4. AWS SNS
  7. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services does not provide detailed monitoring with CloudWatch?
    1. AWS EMR
    2. AWS RDS
    3. AWS ELB
    4. AWS Route53
  8. A user has enabled detailed CloudWatch monitoring with the AWS Simple Notification Service. Which of the below mentioned statements helps the user understand detailed monitoring better?
    1. SNS will send data every minute after configuration
    2. There is no need to enable since SNS provides data every minute
    3. AWS CloudWatch does not support monitoring for SNS
    4. SNS cannot provide data every minute
  9. A user has configured an Auto Scaling group with ELB. The user has enabled detailed CloudWatch monitoring on Auto Scaling. Which of the below mentioned statements will help the user understand the functionality better?
    1. It is not possible to setup detailed monitoring for Auto Scaling
    2. In this case, Auto Scaling will send data every minute and will charge the user extra
    3. Detailed monitoring will send data every minute without additional charges
    4. Auto Scaling sends data every minute only and does not charge the user

References

AWS Storage Options – Whitepaper – Certification

Storage Options Whitepaper

AWS Storage Options is one of the most important Whitepaper for AWS Solution Architect Professional Certification exam and covers a brief summary of each AWS storage options, their ideal usage patterns, anti-patterns, performance, durability and availability, scalability etc.

Overview

  • AWS offers multiple cloud-based storage options. Each has a unique combination of performance, durability, availability, cost, and interface, as well as other characteristics such as scalability and elasticity
  • All storage options are ideally suited for some uses cases and there are certain Anti-Patterns which should be taken in account while making a storage choice

AWS Various Storage Options

Amazon S3 & Amazon Glacier

More Details @ AWS Storage Options – S3 & Glacier

Amazon Elastic Block Store (EBS) & Instance Store Volumes

More details @ AWS Storage Options – EBS & Instance Store

Amazon RDS, DynamoDB & Database on EC2

More details @ AWS Storage Options – RDS, DynamoDB & Database on EC2

Amazon SQS & Redshift

More details @ AWS Storage Options – SQS & Redshift

Amazon CloudFront & Elasticache

More details @ AWS Storage Options – CloudFront & ElastiCache

Amazon Storage Gateway & Import/Export

More details @ AWS Storage Options – Storage Gateway & Import/Export

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are developing a highly available web application using stateless web servers. Which services are suitable for storing session state data? Choose 3 answers.
    1. Elastic Load Balancing
    2. Amazon Relational Database Service (RDS)
    3. Amazon CloudWatch
    4. Amazon ElastiCache
    5. Amazon DynamoDB
    6. AWS Storage Gateway
  2. Your firm has uploaded a large amount of aerial image data to S3. In the past, in your on-premises environment, you used a dedicated group of servers to oaten process this data and used Rabbit MQ, an open source messaging system, to get job information to the servers. Once processed the data would go to tape and be shipped offsite. Your manager told you to stay with the current design, and leverage AWS archival storage and messaging services to minimize cost. Which is correct? [PROFESSIONAL]
    1. Use SQS for passing job messages, use Cloud Watch alarms to terminate EC2 worker instances when they become idle. Once data is processed, change the storage class of the S3 objects to Reduced Redundancy Storage.
    2. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Reduced Redundancy Storage.
    3. Setup Auto-Scaled workers triggered by queue depth that use spot instances to process messages in SQS. Once data is processed, change the storage class of the S3 objects to Glacier.
    4. Use SNS to pass job messages use Cloud Watch alarms to terminate spot worker instances when they become idle. Once data is processed, change the storage class of the S3 object to Glacier.
  3. You are developing a new mobile application and are considering storing user preferences in AWS, which would provide a more uniform cross-device experience to users using multiple mobile devices to access the application. The preference data for each user is estimated to be 50KB in size. Additionally 5 million customers are expected to use the application on a regular basis. The solution needs to be cost-effective, highly available, scalable and secure, how would you design a solution to meet the above requirements? [PROFESSIONAL]
    1. Setup an RDS MySQL instance in 2 availability zones to store the user preference data. Deploy a public facing application on a server in front of the database to manage security and access credentials
    2. Setup a DynamoDB table with an item for each user having the necessary attributes to hold the user preferences. The mobile application will query the user preferences directly from the DynamoDB table. Utilize STS. Web Identity Federation, and DynamoDB Fine Grained Access Control to authenticate and authorize access
    3. Setup an RDS MySQL instance with multiple read replicas in 2 availability zones to store the user preference data .The mobile application will query the user preferences from the read replicas. Leverage the MySQL user management and access privilege system to manage security and access credentials.
    4. Store the user preference data in S3 Setup a DynamoDB table with an item for each user and an item attribute pointing to the user’ S3 object. The mobile application will retrieve the S3 URL from DynamoDB and then access the S3 object directly utilize STS, Web identity Federation, and S3 ACLs to authenticate and authorize access.
  4. A company is building a voting system for a popular TV show, viewers would watch the performances then visit the show’s website to vote for their favorite performer. It is expected that in a short period of time after the show has finished the site will receive millions of visitors. The visitors will first login to the site using their Amazon.com credentials and then submit their vote. After the voting is completed the page will display the vote totals. The company needs to build the site such that can handle the rapid influx of traffic while maintaining good performance but also wants to keep costs to a minimum. Which of the design patterns below should they use? [PROFESSIONAL]
    1. Use CloudFront and an Elastic Load balancer in front of an auto-scaled set of web servers, the web servers will first can the Login With Amazon service to authenticate the user then process the users vote and store the result into a multi-AZ Relational Database Service instance.
    2. Use CloudFront and the static website hosting feature of S3 with the Javascript SDK to call the Login With Amazon service to authenticate the user, use IAM Roles to gain permissions to a DynamoDB table to store the users vote.
    3. Use CloudFront and an Elastic Load Balancer in front of an auto-scaled set of web servers, the web servers will first call the Login with Amazon service to authenticate the user, the web servers will process the users vote and store the result into a DynamoDB table using IAM Roles for EC2 instances to gain permissions to the DynamoDB table.
    4. Use CloudFront and an Elastic Load Balancer in front of an auto-scaled set of web servers, the web servers will first call the Login. With Amazon service to authenticate the user, the web servers would process the users vote and store the result into an SQS queue using IAM Roles for EC2 Instances to gain permissions to the SQS queue. A set of application servers will then retrieve the items from the queue and store the result into a DynamoDB table
  5. A large real-estate brokerage is exploring the option to adding a cost-effective location-based alert to their existing mobile application. The application backend infrastructure currently runs on AWS. Users who opt in to this service will receive alerts on their mobile device regarding real-estate offers in proximity to their location. For the alerts to be relevant delivery time needs to be in the low minute count. The existing mobile app has 5 million users across the US. Which one of the following architectural suggestions would you make to the customer? [PROFESSIONAL]
    1. Mobile application will submit its location to a web service endpoint utilizing Elastic Load Balancing and EC2 instances. DynamoDB will be used to store and retrieve relevant offers. EC2 instances will communicate with mobile earners/device providers to push alerts back to mobile application. —
    2. Use AWS Direct Connect or VPN to establish connectivity with mobile carriers EC2 instances will receive the mobile applications location through carrier connection: RDS will be used to store and relevant offers. EC2 instances will communicate with mobile carriers to push alerts back to the mobile application
    3. Mobile application will send device location using SQS. EC2 instances will retrieve the relevant offers from DynamoDB. AWS Mobile Push will be used to send offers to the mobile application
    4. Mobile application will send device location using AWS Mobile Push. EC2 instances will retrieve the relevant offers from DynamoDB. EC2 instances will communicate with mobile carriers/device providers to push alerts back to the mobile application.
  6. You are running a news website in the eu-west-1 region that updates every 15 minutes. The website has a worldwide audience and it uses an Auto Scaling group behind an Elastic Load Balancer and an Amazon RDS database. Static content resides on Amazon S3, and is distributed through Amazon CloudFront. Your Auto Scaling group is set to trigger a scale up event at 60% CPU utilization; you use an Amazon RDS extra-large DB instance with 10.000 Provisioned IOPS its CPU utilization is around 80%. While freeable memory is in the 2 GB range. Web analytics reports show that the average load time of your web pages is around 1.5 to 2 seconds, but your SEO consultant wants to bring down the average load time to under 0.5 seconds. How would you improve page load times for your users? (Choose 3 answers) [PROFESSIONAL]
    1. Lower the scale up trigger of your Auto Scaling group to 30% so it scales more aggressively.
    2. Add an Amazon ElastiCache caching layer to your application for storing sessions and frequent DB queries
    3. Configure Amazon CloudFront dynamic content support to enable caching of re-usable content from your site
    4. Switch Amazon RDS database to the high memory extra-large Instance type
    5. Set up a second installation in another region, and use the Amazon Route 53 latency-based routing feature to select the right region.
  7. A read only news reporting site with a combined web and application tier and a database tier that receives large and unpredictable traffic demands must be able to respond to these traffic fluctuations automatically. What AWS services should be used meet these requirements? [PROFESSIONAL]
    1. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch. And RDS with read replicas.
    2. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch and RDS with read replicas
    3. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch. And multi-AZ RDS
    4. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch and multi-AZ RDS
  8. You have a periodic Image analysis application that gets some files as input, analyzes them and for each file writes some data in output to a ten file. The number of files in input per day is high and concentrated in a few hours of the day. Currently you have a server on EC2 with a large EBS volume that hosts the input data and the results it takes almost 20 hours per day to complete the process. What services could be used to reduce the elaboration time and improve the availability of the solution? [PROFESSIONAL]
    1. S3 to store I/O files. SQS to distribute elaboration commands to a group of hosts working in parallel. Auto scaling to dynamically size the group of hosts depending on the length of the SQS queue
    2. EBS with Provisioned IOPS (PIOPS) to store I/O files. SNS to distribute elaboration commands to a group of hosts working in parallel Auto Scaling to dynamically size the group of hosts depending on the number of SNS notifications
    3. S3 to store I/O files, SNS to distribute evaporation commands to a group of hosts working in parallel. Auto scaling to dynamically size the group of hosts depending on the number of SNS notifications
    4. EBS with Provisioned IOPS (PIOPS) to store I/O files SOS to distribute elaboration commands to a group of hosts working in parallel Auto Scaling to dynamically size the group to hosts depending on the length of the SQS queue.
  9. A 3-tier e-commerce web application is current deployed on-premises and will be migrated to AWS for greater scalability and elasticity. The web server currently shares read-only data using a network distributed file system The app server tier uses a clustering mechanism for discovery and shared session state that depends on IP multicast The database tier uses shared-storage clustering to provide database fail over capability, and uses several read slaves for scaling. Data on all servers and the distributed file system directory is backed up weekly to off-site tapes. Which AWS storage and database architecture meets the requirements of the application? [PROFESSIONAL]
    1. Web servers store read-only data in S3, and copy from S3 to root volume at boot time. App servers share state using a combination of DynamoDB and IP unicast. Database use RDS with multi-AZ deployment and one or more Read Replicas. Backup web and app servers backed up weekly via AMIs, database backed up via DB snapshots.
    2. Web servers store read-only data in S3, and copy from S3 to root volume at boot time. App servers share state using a combination of DynamoDB and IP unicast. Database use RDS with multi-AZ deployment and one or more Read replicas. Backup web servers app servers, and database backed up weekly to Glacier using snapshots (Snapshots to Glacier don’t work directly with EBS snapshots)
    3. Web servers store read-only data in S3 and copy from S3 to root volume at boot time. App servers share state using a combination of DynamoDB and IP unicast. Database use RDS with multi-AZ deployment. Backup web and app servers backed up weekly via AMIs. Database backed up via DB snapshots (Need Read replicas for scalability and elasticity)
    4. Web servers, store read-only data in an EC2 NFS server, mount to each web server at boot time App servers share state using a combination of DynamoDB and IP multicast Database use RDS with multi-AZ deployment and one or more Read Replicas Backup web and app servers backed up weekly via AMIs database backed up via DB snapshots (IP multicast not available in AWS)
  10. Our company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers) [PROFESSIONAL]
    1. Deploy ElasticCache in-memory cache running in each availability zone
    2. Implement sharding to distribute load to multiple RDS MySQL instances (Would distributed read write both, focus is on read contention)
    3. Increase the RDS MySQL Instance size and Implement provisioned IOPS (Would distributed read write both, focus is on read contention)
    4. Add an RDS MySQL read replica in each availability zone
  11. Run 2-tier app with the following: an ELB, three web app server on EC2, and 1 MySQL RDS db. With grown load, db queries take longer and longer and slow down the overall response time for user request. What Options could speed up performance? (Choose 3) [PROFESSIONAL]
    1. Create an RDS read-replica and redirect half of the database read request to it
    2. Cache database queries in amazon ElastiCache
    3. Setup RDS in multi-availability zone mode.
    4. Shard the database and distribute loads between shards.
    5. Use amazon CloudFront to cache database queries.
  12. You have a web application leveraging an Elastic Load Balancer (ELB) In front of the web servers deployed using an Auto Scaling Group Your database is running on Relational Database Service (RDS) The application serves out technical articles and responses to them in general there are more views of an article than there are responses to the article. On occasion, an article on the site becomes extremely popular resulting in significant traffic Increases that causes the site to go down. What could you do to help alleviate the pressure on the infrastructure while maintaining availability during these events? Choose 3 answers [PROFESSIONAL]
    1. Leverage CloudFront for the delivery of the articles.
    2. Add RDS read-replicas for the read traffic going to your relational database
    3. Leverage Elastic Cache for caching the most frequently used data.
    4. Use SQS to queue up the requests for the technical posts and deliver them out of the queue (does not process and would not be real time)
    5. Use Route53 health checks to fail over to an S3 bucket for an error page (more of an error handling then availability)
  13. Your website is serving on-demand training videos to your workforce. Videos are uploaded monthly in high resolution MP4 format. Your workforce is distributed globally often on the move and using company-provided tablets that require the HTTP Live Streaming (HLS) protocol to watch a video. Your company has no video transcoding expertise and it required you might need to pay for a consultant. How do you implement the most cost-efficient architecture without compromising high availability and quality of video delivery? [PROFESSIONAL]
    1. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS. S3 to host videos with lifecycle Management to archive original flies to Glacier after a few days. CloudFront to serve HLS transcoded videos from S3 (Elastic Transcoder for High quality, S3 to host videos cheaply, Glacier for archives and CloudFront for high availability)
    2. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number or nodes depending on the length of the queue S3 to host videos with Lifecycle Management to archive all files to Glacier after a few days CloudFront to serve HLS transcoding videos from Glacier
    3. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS EBS volumes to host videos and EBS snapshots to incrementally backup original rues after a few days. CloudFront to serve HLS transcoded videos from EC2.
    4. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number of nodes depending on the length of the queue. EBS volumes to host videos and EBS snapshots to incrementally backup original files after a few days. CloudFront to serve HLS transcoded videos from EC2
  14. To meet regulatory requirements, a pharmaceuticals company needs to archive data after a drug trial test is concluded. Each drug trial test may generate up to several thousands of files, with compressed file sizes ranging from 1 byte to 100MB. Once archived, data rarely needs to be restored, and on the rare occasion when restoration is needed, the company has 24 hours to restore specific files that match certain metadata. Searches must be possible by numeric file ID, drug name, participant names, date ranges, and other metadata. Which is the most cost-effective architectural approach that can meet the requirements? [PROFESSIONAL]
    1. Store individual files in Amazon Glacier, using the file ID as the archive name. When restoring data, query the Amazon Glacier vault for files matching the search criteria. (Individual files are expensive and does not allow searching by participant names etc)
    2. Store individual files in Amazon S3, and store search metadata in an Amazon Relational Database Service (RDS) multi-AZ database. Create a lifecycle rule to move the data to Amazon Glacier after a certain number of days. When restoring data, query the Amazon RDS database for files matching the search criteria, and move the files matching the search criteria back to S3 Standard class. (As the data is not needed can be stored to Glacier directly and the data need not be moved back to S3 standard)
    3. Store individual files in Amazon Glacier, and store the search metadata in an Amazon RDS multi-AZ database. When restoring data, query the Amazon RDS database for files matching the search criteria, and retrieve the archive name that matches the file ID returned from the database query. (Individual files and Multi-AZ is expensive)
    4. First, compress and then concatenate all files for a completed drug trial test into a single Amazon Glacier archive. Store the associated byte ranges for the compressed files along with other search metadata in an Amazon RDS database with regular snapshotting. When restoring data, query the database for files that match the search criteria, and create restored files from the retrieved byte ranges.
    5. Store individual compressed files and search metadata in Amazon Simple Storage Service (S3). Create a lifecycle rule to move the data to Amazon Glacier, after a certain number of days. When restoring data, query the Amazon S3 bucket for files matching the search criteria, and retrieve the file to S3 reduced redundancy in order to move it back to S3 Standard class. (Once the data is moved from S3 to Glacier the metadata is lost, as Glacier does not have metadata and must be maintained externally)
  15. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application? [PROFESSIONAL]
    1. The company should utilize an amazon simple work flow service activity worker that updates the users data counter in amazon dynamo DB. The activity worker will use simple email service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an amazon relational data base service relational database with a store objects table that has a row for each stored object along with size of each object. The upload server will query the aggregate consumption of the user in questions by first determining the files store by the user, and then querying the stored objects table for respective file sizes) and send an email via amazon simple email service if the thresholds are breached.
    3. The company should write both the content length and the username of the files owner as S3 metadata for the object. They should then create a file watcher to iterate over each object and aggregate the size for each user and send a notification via amazon simple queue service to an emailing service if the storage threshold is exceeded.
    4. The company should create two separated amazon simple storage service buckets one for data storage for free tier users and another for data storage for premium tier users. An amazon simple workflow service activity worker will query all objects for a given user based on the bucket the data is stored
  16. Your company has been contracted to develop and operate a website that tracks NBA basketball statistics. Statistical data to derive reports like “best game-winning shots from the regular season” and more frequently built reports like “top shots of the game” need to be stored durably for repeated lookup. Leveraging social media techniques, NBA fans submit and vote on new report types from the existing data set so the system needs to accommodate variability in data queries and new static reports must be generated and posted daily. Initial research in the design phase indicates that there will be over 3 million report queries on game day by end users and other applications that use this application as a data source. It is expected that this system will gain in popularity over time and reach peaks of 10-15 million report queries of the system on game days. Select the answer that will allow your application to best meet these requirements while minimizing costs. [PROFESSIONAL]
    1. Launch a multi-AZ MySQL Amazon Relational Database Service (RDS) Read Replica connected to your multi AZ master database and generate reports by querying the Read Replica. Perform a daily table cleanup.
    2. Implement a multi-AZ MySQL RDS deployment and have the application generate reports from Amazon ElastiCache for in-memory performance results. Utilize the default expire parameter for items in the cache.
    3. Generate reports from a multi-AZ MySQL Amazon RDS deployment and have an offline task put reports in Amazon Simple Storage Service (S3) and use CloudFront to cache the content. Use a TTL to expire objects daily. (Offline task with S3 storage and CloudFront cache)
    4. Query a multi-AZ MySQL RDS instance and store the results in a DynamoDB table. Generate reports from the DynamoDB table. Remove stale tables daily.

References

AWS Storage Options – SQS & Redshift

SQS

  • is a temporary data repository for messages  and provides a reliable, highly scalable, hosted message queuing service for temporary storage and delivery of short (up to 256 KB) text-based data messages.
  • supports a virtually unlimited number of queues and supports unordered, at-least-once delivery of messages.

Ideal Usage patterns

  • is ideally suited to any scenario where multiple application components must communicate and coordinate their work in a loosely coupled manner particularly producer consumer scenarios
  • can be used to coordinate a multi-step processing pipeline, where each message is associated with a task that must be processed.
  • enables the number of worker instances to scale up or down, and also enable the processing power of each single worker instance to scale up or down, to suit the total workload, without any application changes.

Anti-Patterns

  • Binary or Large Messages
    • SQS is suited for text messages with maximum size of 64 KB. If the application requires binary or messages exceeding the length, it is best to use Amazon S3 or RDS and use SQS to store the pointer
  • Long Term storage
    • SQS stores messages for max 14 days and if application requires storage period longer than 14 days, Amazon S3 or other storage options should be preferred
  • High-speed message queuing or very short tasks
    • If the application requires a very high-speed message send and receive response from a single producer or consumer, use of Amazon DynamoDB or a message-queuing system hosted on Amazon EC2 may be more appropriate.

Performance

  • is a distributed queuing system that is optimized for horizontal scalability, not for single-threaded sending or receiving speeds.
  • A single client can send or receive Amazon SQS messages at a rate of about 5 to 50 messages per second. Higher receive performance can be achieved by requesting multiple messages (up to 10) in a single call.

Durability & Availability

  • are highly durable but temporary.
  • stores all messages redundantly across multiple servers and data centers.
  • Message retention time is configurable on a per-queue basis, from a minimum of one minute to a maximum of 14 days.
  • Messages are retained in a queue until they are explicitly deleted, or until they are automatically deleted upon expiration of the retention time.

Cost Model

  • pricing is based on
    • number of requests and
    • the amount of data transferred in and out (priced per GB per month).

Scalability & Elasticity

  • is both highly elastic and massively scalable.
  • is designed to enable a virtually unlimited number of computers to read and write a virtually unlimited number of messages at any time.
  • supports virtually unlimited numbers of queues and messages per queue for any user.

Amazon Redshift

  • is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
  • is optimized for datasets that range from a few hundred gigabytes to a petabyte or more.
  • manages the work needed to set up, operate, and scale a data warehouse, from provisioning the infrastructure capacity to automating ongoing administrative tasks such as backups and patching.

Ideal Usage Pattern

  • is ideal for analyzing large datasets using the existing business intelligence tools
  • Common use cases include
    • Analyze global sales data for multiple products
    • Store historical stock trade data
    • Analyze ad impressions and clicks
    • Aggregate gaming data
    • Analyze social trends
    • Measure clinical quality, operation efficiency, and financial
    • performance in the health care space

Anti-Pattern

  • OLTP workloads
    • Redshift is a column-oriented database and more suited for data warehousing and analytics. If application involves online transaction processing, Amazon RDS would be a better choice.
  • Blob data
    • For Blob storage, Amazon S3 would be a better choice with metadata in other storage as RDS or DynamoDB

Performance

  • Amazon Redshift allows a very high query performance on datasets ranging in size from hundreds of gigabytes to a petabyte or more.
  • It uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries.
  • It has a massively parallel processing (MPP) architecture that parallelizes and distributes SQL operations to take advantage of all available resources.
  • Underlying hardware is designed for high performance data processing that uses local attached storage to maximize throughput.

Durability & Availability

  • Amazon Redshift stores three copies of your data—all data written to a node in your cluster is automatically replicated to other nodes within the cluster, and all data is continuously backed up to Amazon S3.
  • Snapshots are automated, incremental, and continuous and stored for a user-defined period (1-35 days)
  • Manual snapshots can be created and are retained until explicitly deleted.
  • Amazon Redshift also continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.

Cost Model

  • has three pricing components:
    • data warehouse node hours – total number of hours run across all the compute node
    • backup storage – storage cost for automated and manual snapshots
    • data transfer
      • There is no data transfer charge for data transferred to or from Amazon Redshift outside of Amazon VPC
      • Data transfer to or from Amazon Redshift in Amazon VPC accrues standard AWS data transfer charges.

Scalability & Elasticity

  • provides push button scaling and the number of nodes can be easily scaled in the data warehouse cluster as the demand changes.
  • Redshift places the existing cluster in the read only mode, so the existing queries can continue to run, while is provisions a new cluster with chosen size and copies the data to it. Once the data is copied, it automatically redirects queries to the new cluster

AWS Storage Options – CloudFront & ElastiCache

Amazon CloudFront

  • is a webservice for content delivery
  • provides low latency by caching and delivering content from a global network of edge locations located nearest to the user
  • supports both HTTP to allows static, dynamic content and Real Time Messaging Protocol (RTMP) for streaming of videos
  • optimized to work as with Amazon services like S3, ELB etc. as well as works seamlessly with any non-AWS origin server

Ideal Usage Patterns

  • is ideal for distribution of frequently accessed static content, or dynamic content or for streaming audio or video that benefits from edge delivery

Anti-Pattern

  • Infrequently accessed data
    • If the data is infrequently accessed, it would be better to serve the data from the Origin server
  • Programmatic cache invalidation
    • CloudFront supports cache invalidation, however AWS recommends using object versioning rather than programmatic cache invalidation.

Performance

  • is designed for low latency and high bandwidth delivery of content by redirecting the user to the nearest edge location in terms of latency and caching the content preventing the round trip to the origin server

Durability & Availability

  • provides high Availability by delivering content from a distributed global network of edge locations. Amazon also constantly monitors the network paths connecting Origin servers to CloudFront
  • does not provide durable storage, which is more of the responsibility of the underlying Origin server providing the content for e.g. S3

Cost Model

  • has two pricing components:
    • regional data transfer out (per GB) and
    • requests (per 10,000)

Scalability & Elasticity

  • provides seamless scalability & elasticity by automatically responding to the increase or the decrease in the demand

ElastiCache

  • is a webservice that makes it easy to deploy, operate, and scale a distributed, in-memory cache in the cloud
  • helps improves performance of the applications by allowing retrieval of data from fast, managed, in-memory caching system
  • supports Memcached (object caching) & Redis (key value store that supports data structure) open source caching engines

Ideal Usage Patterns

  • improving application performance by storing critical data in-memory for low latency access
  • use cases involve usage as a database front end for read heavy applications, improving performance and reducing load on databases, or managing user session data, cache dynamically generated pages, or compute intensive calculations etc.

Anti-Patterns

  • Persistent Data
    • If the application needs fast access to data coupled with strong data durability, Amazon DynamoDB would be a better option

Performance

  • Although ElastiCache provides low latency access to the data, the performance depends on the caching strategy and the hit ratio at the application level

Durability & Availability

  • stores transient data or transient copies of durable data, so the data durability is managed by the source
  • With the Memcached engine
    • all ElastiCache nodes in a single cache cluster are provisioned in a single Availability Zone.
    • ElastiCache automatically monitors the health of your cache nodes and replaces them in the event of network partitioning, host hardware, or software failure.
    • In the event of cache node failure, the cluster remains available, but performance may be reduced due to time needed to repopulate the cache in the new “cold” cache nodes.
    • To provide enhanced fault-tolerance for Availability Zone failures or cold-cache effects, you can run redundant cache clusters in different Availability Zones.
  • With the Redis engine,
    • ElastiCache supports replication to up to five read replicas for scaling. To improve availability, you can place read replicas in other Availability Zones.
    • ElastiCache monitors the primary node, and if the node becomes unavailable, ElastiCache will repair or replace the primary node if possible, using the same DNS name.
    • If the primary cache node recovery fails or its Availability Zone is unavailable, primary node can be failed over to one of the read replicas with an API call.

Cost Model

  • has a single pricing component:
    • pricing is per cache node-hour consumed

Scalability & Elasticity

  • ElastiCache is highly scalable and elastic.
  • Cache node can be added or deleted to the cache cluster
  • Auto Discovery enables automatic discovery of Memcached cache nodes by ElastiCache Clients when the nodes are added to or removed from an ElastiCache cluster.

 

Storage Options Whitepaper – Storage Gateway – Import/Export – AWS Certification

AWS Storage Options Whitepaper cont.

Provides a brief summary for the Ideal Use cases and Anti-Patterns for Storage Gateway and Import/Export AWS storage options

AWS Storage Gateway

  • Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between the organization’s on-premises IT environment and AWS’s storage infrastructure.
  • Storage Gateway enables store data securely to the AWS cloud for scalable and cost-effective storage.
  • It provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in S3.
  • For disaster recovery scenarios, it can serve as a cloud-hosted solution, together with EC2, that mirrors your entire production environment.
  • Storage Gateway can be configured as
    • Gateway-cached volumes
      • Gateway-cached volumes utilizes S3 for primary data backup, while retaining frequently accessed data locally in a cache.
      • These volumes minimize the need to scale the on-premises storage infrastructure, while still providing applications with low-latency access to their frequently accessed data.
      • Data written to the volumes is stored in S3, with only a cache of recently written and recently read data is stored locally on the on-premises storage hardware.
    • Gateway-stored volumes
      • Gateway-stored volumes stores the complete primary data locally, while asynchronously backing up that data to AWS.
      • These volumes provide the on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups.
      • Data written to the gateway-stored volumes is stored on the on-premises storage hardware, and asynchronously backed up to S3 in the form of EBS snapshots.

Ideal Usage Patterns

  • AWS Storage Gateway use cases include
    • corporate file sharing,
    • enabling existing on-premises backup applications to store primary backups on S3,
    • disaster recovery, and
    • data mirroring to cloud-based compute resources.

Anti-Patterns

  • Database storage
    • For Database backup or storage, EC2 instances using EBS volumes are a natural choice for database storage and workloads.

Performance

  • As the Storage Gateway VM sits between the application, underlying on-premises storage and S3, the performance experienced will be dependent upon a number of factors, including the speed and configuration of the underlying local disks, the network bandwidth between the iSCSI initiator and gateway VM, the amount of local storage allocated to the gateway VM, and the bandwidth between the gateway VM and S3.
  • For gateway-cached volumes, to provide low-latency read access to the on-premises applications, it’s important to provide enough local cache storage to store the recently accessed data.
  • Storage Gateway efficiently uses the Internet bandwidth to speed up the upload of on-premises application data to AWS.
  • Storage Gateway only uploads incremental changes (data that has changed), which minimizes the amount of data sent over the Internet.
  • AWS Direct Connect can be used to further increase throughput and reduce the network costs by establishing a dedicated network connection between the on-premises gateway and AWS.

Durability and Availability

  • AWS Storage Gateway durably stores on-premises application data by uploading it to S3.
  • S3 stores data in multiple facilities and on multiple devices within each facility.
  • S3 also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • AWS Storage Gateway has four pricing components:
    • gateway usage (per gateway per month),
    • snapshot storage usage (per GB per month),
    • volume storage usage (per GB per month), and
    • data transfer out (per GB per month).

Scalability and Elasticity

  • AWS Storage Gateway stores data in Amazon S3, which has been designed to offer a very high level of scalability and elasticity automatically.

Interfaces

  • AWS Management Console can be used to download the AWS Storage Gateway VM image, select between a gateway-cached or gateway-stored configuration, activate the on-premises by associating the gateway’s IP Address with your AWS account, select an AWS region, and create AWS Storage Gateway volumes and attach these volumes as iSCSI devices to your on-premises application servers.

AWS Import/Export (Upgraded to Snowball)

  • AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport.
  • AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network and bypassing the Internet and can be much faster and more cost effective than upgrading connectivity.
  • AWS Import/Export supports importing into several types of AWS storage, including EBS snapshots, S3 buckets, and Glacier vaults and exporting data from S3.

Ideal Usage Patterns

  • AWS Import/Export is ideal for transferring large amounts of data in and out of the AWS cloud, especially in cases where transferring the data over the Internet would be too slow (a week or more) or too costly.
  • Common use cases include
    • initial data upload to AWS,
    • content distribution or regular data interchange to/from your customers or business associates,
    • transfer to Amazon S3 or Amazon Glacier for off-site backup and archival storage, and quick retrieval of large backups from Amazon S3 or Amazon Glacier for disaster recovery.

Anti-Patterns

  • AWS Import/Export may not be the ideal solution for data that is more easily transferred over the Internet in less than one week.

Performance

  • Each AWS Import/Export station is capable of loading data at over 100 MB per second
  • Rate of the data load will be bounded by a combination of the read or write speed of the portable storage device and, for Amazon S3 data loads, the average object (file) size.

Durability and Availability

  • Durability and availability characteristics of the target storage i.e. EBS, S3 or Glacier applies, after the data has been imported

Cost Model

  • AWS Import/Export has three pricing components: a per-device fee, a data load time charge (per data-loading-hour), and possible return shipping charges (for expedited shipping, or shipping to destinations not local to that AWS Import/Export region).
  • Storage pricing applies for the destination storage, the standard Amazon EBS snapshot, Amazon S3, and Amazon Glacier request and storage pricing applies.

Scalability and Elasticity

  • Total amount of data you can load using AWS Import/Export is limited only by the capacity of the devices sent to AWS.
  • For Amazon S3, individual files will be loaded as objects in Amazon S3, and may range up to 5 terabytes in size.
  • For Amazon Glacier, individual devices will be loaded as a single archive, and may range up to 4 terabytes in size.
  • Aggregate total amount of data that can be imported is virtually unlimited.

Interfaces

  • To upload or download data, AWS Import/Export job for each storage device shipped need to be created and submitted
  • Jobs can be created using AWS CLI, AWS SDK or native REST API
  • Each job request requires a manifest file, a YAML-formatted text file that contains a set of key-value pairs that supply the required information—such as your device ID, secret access key, and return address—necessary to complete the job.
  • Job request is tied to the storage device through a signature file in the root directory (for Amazon S3 import jobs), or by a barcode taped to the device (for Amazon EBS and Amazon Glacier jobs).

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are working with a customer who has 10 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?
    1. Amazon Glacier multipart upload
    2. AWS Storage Gateway
    3. VM Import/Export
    4. AWS Import/Export

AWS Storage Options – RDS, DynamoDB & Database on EC2

AWS Storage Options Whitepaper with RDS, DynamoDB & Database on EC2 Cont.

Provides a brief summary for the Ideal Use cases, Anti-Patterns and other factors for Amazon RDS, DynamoDB & Databases on EC2 storage options

Amazon RDS

  • RDS is a web service that provides the capabilities of MySQL, Oracle, MariaDB, Postgres or Microsoft SQL Server relational database as a managed, cloud-based service
  • RDS eliminates much of the administrative overhead associated with launching, managing, and scaling your own relational database on Amazon EC2 or in another computing environment.

Ideal Usage Patterns

  • RDS is a great solution for cloud-based fully-managed relational database
  • RDS is also optimal for new applications with structured data that requires more sophisticated querying and joining capabilities than that provided by Amazon’s NoSQL database offering, DynamoDB.
  • RDS provides full compatibility with the databases supported and direct access to native database engines, code and libraries and is ideal for existing applications that rely on these databases

Anti-Patterns

  • Index and query-focused data
    • If the applications don’t require advanced features such as joins and complex transactions and is more oriented toward indexing and querying data, DynamoDB would be more appropriate for this needs
  • Numerous BLOBs
    • If the application makes heavy use of files (audio files, videos, images, etc), it is a better choice to use S3 to store the objects instead of database engines Blob feature and use RDS or DynamoDB only to save the metadata
  • Automated scalability
    • RDS provides pushbutton scaling and it only scales up and has limited scale out ability. If fully-automated scaling is needed, DynamoDB may be a better choice.
  • Complete control
    • RDS does not provide admin access and does not enable the full feature set of the database engines.
    • So if the application requires complete, OS-level control of the database server with full root or admin login privileges, a self-managed database on EC2 may be a better match.
  • Other database platforms
    • RDS, at this time, provides a MySQL, Oracle, MariaDB, PostgreSQL and SQL Server databases.
    • If any other database platform (such as IBM DB2, Informix, or Sybase) is needed, it should be deployed on a self-managed database on an EC2 instance by using a relational database AMI, or by installing database software on an EC2 instance.

Performance

  • RDS Provisioned IOPS, where the IOPS can be specified when the instance is launched and is guaranteed over the life of the instance, provides a high-performance storage option designed to deliver fast, predictable, and consistent performance for I/O intensive transactional database workload

Durability and Availability

  • RDS leverages Amazon EBS volumes as its data store
  • RDS provides database backups, for enhanced durability, which are replicated across multiple AZ’s
    • Automated backups
      • If enabled, RDS will automatically perform a full daily backup of your data during the specified backup window, and will also capture DB transaction logs
    • User initiated backups
      • User can initiate backups at time and they are not deleted unless deleted explicitly by the user
  • RDS Multi AZ’s feature enhances both the durability and the availability of the database by synchronously replicating the data between a primary RDS DB instance and a standby instance in another Availability Zone, which prevents data loss,
  • RDS provides a DNS endpoint and in case of an failure on the primary, it automatically fails over to the standby instance
  • RDS also allows Read replicas for the supported databases, which are replicated asynchronously

Cost Model

  • RDS offers a tiered pricing structure, based on the size of the database instance, the deployment type (Single-AZ/Multi-AZ), and the AWS region.
  • Pricing for RDS is based on several factors: the DB instance hours (per hour), the amount of provisioned database storage (per GB-month and per million I/O requests), additional backup storage (per GB-month), and data transfer in/out (per GB per month)

Scalability and Elasticity

  • RDS resources can be scaled elastically in several dimensions: database storage size, database storage IOPS rate, database instance compute capacity, and the number of read replicas
  • RDS supports “pushbutton scaling” of both database storage and compute resources. Additional storage can either be added immediately or during the next maintenance cycle
  • RDS for MySQL also enables you to scale out beyond the capacity of a single database deployment for read-heavy database workloads by creating one or more read replicas.
  • Multiple RDS instances can also be configured to leverage database partitioning or sharding to spread the workload over multiple DB instances, achieving even greater database scalability and elasticity.

Interfaces

  • RDS APIs and the AWS Management Console provide a management interface that allows you to create, delete, modify, and terminate RDS DB instances; to create DB snapshots; and to perform point-in-time restores
  • There is no AWS data API for Amazon RDS.
  • Once a database is created, RDS provides a DNS endpoint for the database which can be used to connect to the database.
  • Endpoint does not change over the lifetime of the instance even during the failover in case of Multi-AZ configuration

Amazon DynamoDB

  • Amazon DynamoDB is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
  • DynamoDB being a managed service helps offload the administrative burden of operating and scaling a highly-available distributed database cluster.
  • DynamoDB helps meet the latency and throughput requirements of highly demanding applications by providing extremely fast and predictable performance with seamless throughput and storage scalability.
  • DynamoDB provides both eventually-consistent reads (by default), and strongly-consistent reads (optional), as well as implicit item-level transactions for item put, update, delete, conditional operations, and increment/decrement.
  • Amazon DynamoDB handles the data as below :-
    • DynamoDB stores structured data in tables, indexed by primary key, and allows low-latency read and write access to items.
    • DynamoDB supports three data types: number, string, and binary, in both scalar and multi-valued sets.
    • Tables do not have a fixed schema, so each data item can have a different number of attributes.
    • Primary key can either be a single-attribute hash key or a composite hash-range key.
    • Local secondary indexes provide additional flexibility for querying against attributes other than the primary key.

Ideal Usage Patterns

  • DynamoDB is ideal for existing or new applications that need a flexible NoSQL database with low read and write latencies, and the ability to scale storage and throughput up or down as needed without code changes or downtime.
  • Use cases require a highly available and scalable database because downtime or performance degradation has an immediate negative impact on an organization’s business. for e.g. mobile apps, gaming, digital ad serving, live voting and audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for S3 objects, e-commerce shopping carts, and web session management

Anti-Patterns

  • Structured data with Join and/or Complex Transactions
    • If the application uses structured data and required joins, complex transactions or other relationship infrastructure provided by traditional database platforms, it is better to use RDS or Database installed on an EC2 instance
  • Large Blob data
    • If the application uses large blob data for e.g. media, files, videos etc., it is better to use S3 to store the objects and use DynamoDB to store metadata for e.g. name, size, content-type etc
  • Large Objects with Low I/O rate
    • DynamoDB uses SSD drives and is optimized for workloads with a high I/O rate per GB stored. If the applications stores very large amounts of data that are infrequently accessed, S3 might be a better choice
  • Prewritten application with databases
    • For Porting an existing application using databases, RDS or database installed on the EC2 instance would be a better and seamless solution

Performance

  • SSDs and limited indexing on attributes provides high throughput and low latency and drastically reduces the cost of read and write operations.
  • Predictable performance can be achieved by defining the provisioned throughput capacity required for a given table.
  • DynamoDB handles the provisioning of resources to achieve the requested throughput rate, taking away the burden to think about instances, hardware, memory, and other factors that can affect an application’s throughput rate.
  • Provisioned throughput capacity reservations are elastic and can be increased or decreased on demand.

Durability and Availability

  • DynamoDB has built-in fault tolerance that automatically and synchronously replicates data across three AZ’s in a region for high availability and to help protect data against individual machine, or even facility failures.

Cost Model

  • DynamoDB has three pricing components: provisioned throughput capacity (per hour), indexed data storage (per GB per month), data transfer in or out (per GB per month)

Scalability and Elasticity

  • DynamoDB is both highly-scalable and elastic.
  • DynamoDB provides unlimited storage capacity, and the service automatically allocates more storage as the demand increases
  • Data is automatically partitioned and re-partitioned as needed, while the use of SSDs provides predictable low-latency response times at any scale.
  • DynamoDB is also elastic, in that you can simply “dial-up” or “dial-down” the read and write capacity of a table as your needs change.

Interfaces

  • DynamoDB provides a low-level REST API, as well as higher-level SDKs in different languages
  • APIs provide both a management and data interface for Amazon DynamoDB, that enable table management (creating, listing, deleting, and obtaining metadata) and working with attributes (getting, writing, and deleting attributes; query using an index, and full scan).

Databases on EC2

  • EC2 with EBS volumes allows hosting a self managed relational database
  • Ready to use, prebuilt AMIs are also available from leading database solutions

Ideal Usage Patterns

  • Self managed database on EC2 is an ideal scenario for users whose application requires a specific traditional relational database not supported by Amazon RDS for e.g. IBM DB2, Informix, or Sybase
  • Users or applications that require a maximum level of administrative control and configurability which is not provided by RDS

Anti-Patterns

  • Index and query-focused data
    • If the applications don’t require advanced features such as joins and complex transactions and is more oriented toward indexing and querying data, DynamoDB would be more appropriate for this needs
  • Numerous BLOBs
    • If the application makes heavy use of files (audio files, videos, images, and so on), it is a better choice to use S3 to store the objects instead of database engines Blob feature and use RDS or DynamoDB only to save the metadata
  • Automated scalability
    • Relational databases on EC2 leverages the scalability and elasticity of the underlying AWS platform, but this requires system administrators or DBAs to perform a manual or scripted task. If you need pushbutton scaling or fully-automated scaling, DynamoDB or RDS may be a better choice.
  • RDS supported database platforms
    • If the application using RDS supported database engine and all the features are available, RDS would be a better choice instead of self managed relational database on EC2

Performance

  • Performance depends on the size of the underlying EC2 instance, the number and configuration of the EBS volumes and the database itself
  • Performance can be increased by scaling up memory and compute resources by choosing a larger Amazon EC2 instance size.
  • For database storage, it is usually best to use EBS Provisioned IOPS volumes. To scale up I/O performance, the Provisioned IOPS can be increased, the number of EBS volumes changed, or use software RAID 0 (disk striping) across multiple EBS volumes, which will aggregate total IOPS and bandwidth.

Durability & Availability

  • As the database on EC2 uses EBS as storage, it has the same durability and availability provided by EBS and can be further enhanced by using EBS snapshots or by using third-party database backup utilities (such as Oracle’s RMAN) to store database backups in Amazon S3

Cost Model

  • Cost for running a database on EC2 instance is mainly determined by the size and the number of EC2 instance running, the size of the EBS volume used for database storage and any third party licensing cost for the database

Scalability & Elasticity

  • Users of traditional relational database solutions on Amazon EC2 can take advantage of the scalability and elasticity of the underlying AWS platform by creating AMI and spawning multiple instances

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which of the following are use cases for Amazon DynamoDB? Choose 3 answers
    1. Storing BLOB data.
    2. Managing web sessions
    3. Storing JSON documents
    4. Storing metadata for Amazon S3 objects
    5. Running relational joins and complex updates.
    6. Storing large amounts of infrequently accessed data.
  2. A client application requires operating system privileges on a relational database server. What is an appropriate configuration for highly available database architecture?
    1. A standalone Amazon EC2 instance
    2. Amazon RDS in a Multi-AZ configuration
    3. Amazon EC2 instances in a replication configuration utilizing a single Availability Zone
    4. Amazon EC2 instances in a replication configuration utilizing two different Availability Zones
  3. You are developing a new mobile application and are considering storing user preferences in AWS, which would provide a more uniform cross-device experience to users using multiple mobile devices to access the application. The preference data for each user is estimated to be 50KB in size. Additionally 5 million customers are expected to use the application on a regular basis. The solution needs to be cost-effective, highly available, scalable and secure, how would you design a solution to meet the above requirements?
    1. Setup an RDS MySQL instance in 2 availability zones to store the user preference data. Deploy a public facing application on a server in front of the database to manage security and access credentials
    2. Setup a DynamoDB table with an item for each user having the necessary attributes to hold the user preferences. The mobile application will query the user preferences directly from the DynamoDB table. Utilize STS. Web Identity Federation, and DynamoDB Fine Grained Access Control to authenticate and authorize access (DynamoDB provides high availability as it synchronously replicates data across three facilities within an AWS Region and scalability as it is designed to scale its provisioned throughput up or down while still remaining available. Also suitable for storing user preference data)
    3. Setup an RDS MySQL instance with multiple read replicas in 2 availability zones to store the user preference data .The mobile application will query the user preferences from the read replicas. Leverage the MySQL user management and access privilege system to manage security and access credentials.
    4. Store the user preference data in S3 Setup a DynamoDB table with an item for each user and an item attribute pointing to the user’ S3 object. The mobile application will retrieve the S3 URL from DynamoDB and then access the S3 object directly utilize STS, Web identity Federation, and S3 ACLs to authenticate and authorize access.
  4. A customer is running an application in US-West (Northern California) region and wants to setup disaster recovery failover to the Asian Pacific (Singapore) region. The customer is interested in achieving a low Recovery Point Objective (RPO) for an Amazon RDS multi-AZ MySQL database instance. Which approach is best suited to this need?
    1. Synchronous replication
    2. Asynchronous replication
    3. Route53 health checks
    4. Copying of RDS incremental snapshots
  5. You are designing a file -sharing service. This service will have millions of files in it. Revenue for the service will come from fees based on how much storage a user is using. You also want to store metadata on each file, such as title, description and whether the object is public or private. How do you achieve all of these goals in a way that is economical and can scale to millions of users?
    1. Store all files in Amazon Simple Storage Service (53). Create a bucket for each user. Store metadata in the filename of each object, and access it with LIST commands against the S3 API.
    2. Store all files in Amazon 53. Create Amazon DynamoDB tables for the corresponding key -value pairs on the associated metadata, when objects are uploaded.
    3. Create a striped set of 4000 IOPS Elastic Load Balancing volumes to store the data. Use a database running in Amazon Relational Database Service (RDS) to store the metadata.
    4. Create a striped set of 4000 IOPS Elastic Load Balancing volumes to store the data. Create Amazon DynamoDB tables for the corresponding key-value pairs on the associated metadata, when objects are uploaded.
  6. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

AWS Storage Options – S3 & Glacier

Amazon S3

  • highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
  • provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.
  • allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
  • number of objects you can store in an Amazon S3 bucket is virtually unlimited.
  • highly secure, supporting encryption at rest, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.
  • highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
  • provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.

Ideal Use Cases

  • Storage & Distribution of static web content and media
    • frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
    • works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites as no storage provisioning is required
    • content can either be directly served from Amazon S3 since each object in Amazon S3 has a unique HTTP URL address
    • can also act as an Origin store for the Content Delivery Network (CDN) such as Amazon CloudFront
    • it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
  • Data Store for Large Objects
    • can be paired with RDS or NoSQL database and used to store large objects for e.g. file or objects, while the associated metadata for e.g. name, tags, comments etc. can be stored in RDS or NoSQL database where it can be indexed and queried providing faster access to relevant data
  • Data store for computation and large-scale analytics
    • commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
    • data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
  • Backup and Archival of critical data
    • used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
    • stores objects redundantly on multiple devices across multiple facilities, it provides the highly-durable storage infrastructure needed for these scenarios.
    • it’s versioning capability is available to protect critical data from inadvertent deletion

Anti-Patterns

Amazon S3 has following Anti-Patterns where it is not an optimal solution

  • Dynamic website hosting
    • While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server side interaction, scripting or database interaction cannot be hosted and should rather be hosted on Amazon EC2
  • Backup and archival storage
    • Data requiring long term archival storage with infrequent read access can be stored more cost effectively in Amazon Glacier
  • Structured Data Query
    • Amazon S3 doesn’t offer query capabilities, so to read an object the object name and key must be known. Instead pair up S3 with RDS or Dynamo DB to store, index and query metadata about Amazon S3 objects
    • NOTE – S3 now provides query capabilities and also Athena can be used
  • Rapidly Changing Data
    • Data that needs to updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS or Dynamo DB.
  • File System
    • Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly either the ‘/’ or ‘’ character) you are able construct your keys to emulate the hierarchical folder structure of file system within a given bucket.

Performance

  • Access to Amazon S3 from within Amazon EC2 in the same region is fast.
  • Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.
  • Amazon S3 is also built to scale storage, requests, and users to support a virtually unlimited number of web-scale applications.
  • If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.

Durability & Availability

  • Amazon S3 storage provides provides the highest level of data durability and availability, by automatically and synchronously storing your data across both multiple devices and multiple facilities within the selected geographical region
  • Error correction is built-in, and there are no single points of failure. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.
  • Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
  • Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
  • Versioning can be enabled with MFA (Multi Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object
  • For Non Critical and Reproducible data for e.g. thumbnails, transcoded media etc., S3 Reduced Redundancy Storage (RRS) can be used, which provides a lower level of durability at a lower storage cost
  • RRS is designed to provide 99.99% durability per object over a given year. While RRS is less durable than standard Amazon S3, it is still designed to provide 400 times more durability than a typical disk drive

Cost Model

  • With Amazon S3, you pay only for what you use and there is no minimum fee.
  • Amazon S3 has three pricing components: storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

Scalability & Elasticity

  • Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically
  • Amazon S3 supports a virtually unlimited number of files in any bucket
  • Amazon S3 bucket can store a virtually unlimited number of bytes
  • Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information to other servers in other locations in the same region, all using Amazon’s high-performance infrastructure.

Interfaces

  • Amazon S3 provides standards-based REST and SOAP web services APIs for both management and data operations.
  • NOTE – SOAP support over HTTP is deprecated, but it is still available over HTTPS. New Amazon S3 features will not be supported for SOAP. We recommend that you use either the REST API or the AWS SDKs.
  • Amazon S3 provides easier to use higher level toolkit or SDK in different languages (Java, .NET, PHP, and Ruby) that wraps the underlying APIs
  • Amazon S3 Command Line Interface (CLI) provides a set of high-level, Linux-like Amazon S3 file commands for common operations, such as ls, cp, mv, sync, etc. They also provide the ability to perform recursive uploads and downloads using a single folder-level Amazon S3 command, and supports parallel transfers.
  • AWS Management Console provides the ability to easily create and manage Amazon S3 buckets, upload and download objects, and browse the contents of your Amazon S3 buckets using a simple web-based user interface
  • All interfaces provide the ability to store Amazon S3 objects (files) in uniquely-named buckets (top-level folders), with each object identified by an unique Object key within that bucket.

Glacier

  • extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival
  • can reliably store their data for as little as $0.01 per gigabyte per month.
  • to offload the administrative burdens of operating and scaling storage to AWS such as capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time consuming hardware migrations
  • Data is stored in Amazon Glacier as Archives where an archive can represent a single file or multiple files combined into a single archive
  • Archives are stored in Vaults for which the access can be controlled through IAM
  • Retrieving archives from Vaults require initiation of a job and can take anywhere around 3-5 hours
  • Amazon Glacier integrates seamlessly with Amazon S3 by using S3 data lifecycle management policies to move data from S3 to Glacier
  • AWS Import/Export can also be used to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport

Ideal Usage Patterns

  • Amazon Glacier is ideally suited for long term archival solution for infrequently accessed data with archiving offsite enterprise information, media assets, research and scientific data, digital preservation and magnetic tape replacement

Anti-Patterns

Amazon Glacier has following Anti-Patterns where it is not an optimal solution

  • Rapidly changing data
    • Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies such as Amazon EBS or a Database
  • Real time access
    • Data stored in Glacier can not be accessed at real time and requires an initiation of a job for object retrieval with retrieval times ranging from 3-5 hours. If immediate access is needed, Amazon S3 is a better choice.

Performance

  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived.
  • Amazon Glacier jobs typically complete in 3 to 5 hours

Durability and Availability

  • Amazon Glacier redundantly stores data in multiple facilities and on multiple devices within each facility
  • Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive
  • Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
  • Amazon Glacier also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • Amazon Glacier has three pricing components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand UPLOAD and RETRIEVAL requests per month).
  • Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time and allows you to retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. Any additional amount of data retrieved is charged per GB
  • Amazon Glacier also charges a pro-rated charge (per GB) for items deleted prior to 90 days

Scalability & Elasticity

  • A single archive is limited to 40 TBs, but there is no limit to the total amount of data you can store in the service.
  • Amazon Glacier scales to meet your growing and often unpredictable storage requirements whether you’re storing petabytes or gigabytes, Amazon Glacier automatically scales your storage up or down as needed.

Interfaces

  • Amazon Glacier provides a native, standards-based REST web services interface, as well as Java and .NET SDKs.
  • AWS Management Console or the Amazon Glacier APIs can be used to create vaults to organize the archives in Amazon Glacier.
  • Amazon Glacier APIs can be used to upload and retrieve archives, monitor the status of your jobs and also configure your vault to send you a notification via Amazon Simple Notification Service (Amazon SNS) when your jobs complete.
  • Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management to provide automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
  • Amazon S3 api provides a RESTORE operation and the retrieval process takes the same 3-5 hours
  • On retrieval, a copy of the retrieved object is placed in Amazon S3 RRS storage for a specified retention period; the original archived object remains stored in Amazon Glacier and you are charged for both the storage.
  • When using Amazon Glacier as a storage class in Amazon S3, use the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the Amazon Glacier APIs
  • Objects archived to Amazon Glacier via Amazon S3 can only be listed and retrieved via the Amazon S3 APIs or the AWS Management Console—they are not visible as archives in an Amazon Glacier vault.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You want to pass queue messages that are 1GB each. How should you achieve this?
    1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
    2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Refer link)
    3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
    4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
  2. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

References

AWS EC2 Best Practices

AWS EC2 Best Practices

AWS recommends the following best practices to get maximum benefit and satisfaction from EC2

Security & Network

  • Implement the least permissive rules for the security group.
  • Regularly patch, update, and secure the operating system and applications on the instance
  • Manage access to AWS resources and APIs using identity federation, IAM users, and IAM roles
  • Establish credential management policies and procedures for creating, distributing, rotating, and revoking AWS access credentials
  • Launch the instances into a VPC instead of EC2-Classic (If AWS account is newly created VPC is used by default)
  • Encrypt EBS volumes and snapshots.

Storage

  • EC2 supports Instance store and EBS volumes, so its best to understand the implications of the root device type for data persistence, backup, and recovery
  • Use separate Amazon EBS volumes for the operating system (root device) versus your data.
  • Ensure that the data volume (with the data) persists after instance termination.
  • Use the instance store available for the instance to only store temporary data. Remember that the data stored in the instance store is deleted when an instance is stopped or terminated.
  • If you use instance store for database storage, ensure that you have a cluster with a replication factor that ensures fault tolerance.

Resource Management

  • Use instance metadata and custom resource tags to track and identify your AWS resources
  • View your current limits for Amazon EC2. Plan to request any limit increases in advance of the time that you’ll need them.

Backup & Recovery

  • Regularly back up the instance using Amazon EBS snapshots (not done automatically) or a backup tool.
  • Data Lifecycle Manager (DLM) to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes
  • Create an Amazon Machine Image (AMI) from the instance to save the configuration as a template for launching future instances.
  • Implement High Availability by deploying critical components of the application across multiple Availability Zones, and replicate the data appropriately
  • Monitor and respond to events.
  • Design the applications to handle dynamic IP addressing when the instance restarts.
  • Implement failover. For a basic solution, you can manually attach a network interface or Elastic IP address to a replacement instance
  • Regularly test the process of recovering your instances and Amazon EBS volumes if they fail.

References

AWS Encrypting Data at Rest – Whitepaper – Certification

Encrypting Data at Rest

  • AWS delivers a secure, scalable cloud computing platform with high availability, offering the flexibility for you to build a wide range of applications
  • AWS allows several options for encrypting data at rest, for additional layer of security, ranging from completely automated AWS encryption solution to manual client-side options
  • Encryption requires 3 things
    • Data to encrypt
    • Encryption keys
    • Cryptographic algorithm method to encrypt the data
  • AWS provides different models for Securing data at rest on the following parameters
    • Encryption method
      • Encryption algorithm selection involves evaluating security, performance, and compliance requirements specific to your application
    • Key Management Infrastructure (KMI)
      • KMI enables managing & protecting the encryption keys from unauthorized access
      • KMI provides
        • Storage layer that protects plain text keys
        • Management layer that authorize key usage
  • Hardware Security Module (HSM)
    • Common way to protect keys in a KMI is using HSM
    • An HSM is a dedicated storage and data processing device that performs cryptographic operations using keys on the device.
    • An HSM typically provides tamper evidence, or resistance, to protect keys from unauthorized use.
    • A software-based authorization layer controls who can administer the HSM and which users or applications can use which keys in the HSM
  • AWS CloudHSM
    • AWS CloudHSM appliance has both physical and logical tamper detection and response mechanisms that trigger zeroization of the appliance.
    • Zeroization erases the HSM’s volatile memory where any keys in the process of being decrypted were stored and destroys the key that encrypts stored objects, effectively causing all keys on the HSM to be inaccessible and unrecoverable.
    • AWS CloudHSM can be used to generate and store key material and can perform encryption and decryption operations,
    • AWS CloudHSM, however, does not perform any key lifecycle management functions (e.g., access control policy, key rotation) and needs a compatible KMI.
    • KMI can be deployed either on-premises or within Amazon EC2 and can communicate to the AWS CloudHSM instance securely over SSL to help protect data and encryption keys.
    • AWS CloudHSM service uses SafeNet Luna appliances, any key management server that supports the SafeNet Luna platform can also be used with AWS CloudHSM
  • AWS Key Management Service (KMS)
    • AWS KMS is a managed encryption service that allows you to provision and use keys to encrypt data in AWS services and your applications.
    • Masters key, after creation, are designed to never be exported from the service.
    • AWS KMS gives you centralized control over who can access your master keys to encrypt and decrypt data, and it gives you the ability to audit this access.
    • Data can be sent into the KMS to be encrypted or decrypted under a specific master key under you account.
    • AWS KMS is natively integrated with other AWS services (for e.g. Amazon EBS, Amazon S3, and Amazon Redshift) and AWS SDKs to simplify encryption of your data within those services or custom applications
    • AWS KMS provides global availability, low latency, and a high level of durability for your keys.

Encryption Models in AWS

Encryption models in AWS depends on the on how you/AWS provides the encryption method and the KMI

  • You control the encryption method and the entire KMI
  • You control the encryption method, AWS provides the storage component of the KMI, and you provide the management layer of the KMI.
  • AWS controls the encryption method and the entire KMI.

Screen Shot 2016-04-08 at 7.39.04 AM

Model A: You control the encryption method and the entire KMI

  • You use your own KMI to generate, store, and manage access to keys as well as control all encryption methods in your applications
  • Proper storage, management, and use of keys to ensure the confidentiality, integrity, and availability of your data is your responsibility
  • AWS has no access to your keys and cannot perform encryption or decryption on your behalf.
  • Amazon S3
    • Encryption of the data is done before the object is sent to AWS S3
    • Encryption of the data can be done using any encryption method and the encrypted data can be uploaded using the PUT request in the Amazon S3 API
    • Key used to encrypt the data needs to be stored securely in your KMI
    • To decrypt this data, the encrypted object can be downloaded from Amazon S3 using the GET request in the Amazon S3 API and then decrypted using the key in your KMI
    • AWS provide Client-side encryption handling, where you can provide your key to the AWS S3 encryption client which will encrypt and decrypt the data on your behalf. However, AWS never has access to the keys or the unencrypted data
    • Screen Shot 2016-04-08 at 6.51.32 PM.png
  • Amazon EBS
    • Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are network-attached, and persist independently from the life of an instance.
    • Because Amazon EBS volumes are presented to an instance as a block device, you can leverage most standard encryption tools for file system-level or block-level encryption
    • Block level encryption
      • Block level encryption tools usually operate below the file system layer using kernel space device drivers to perform encryption and decryption of data.
      • These tools are useful when you want all data written to a volume to be encrypted regardless of what directory the data is stored in
    • File System level encryption
      • File system level encryption usually works by stacking an encrypted file system on top of an existing file system.
      • This method is typically used to encrypt a specific directory
    • These solutions require you to provide keys, either manually or from your KMI.
    • Both block-level and file system-level encryption tools can only be used to encrypt data volumes that are not Amazon EBS boot volumes, as they don’t allow you to automatically make a trusted key available to the boot volume at startup
    • There are third party solutions available, which can help encrypt both the boot and data volumes as well as supplying and protecting keys
  • AWS Storage Gateway
    • AWS Storage Gateway is a service connecting an on-premises software appliance with Amazon S3. Data on disk volumes attached to the AWS Storage Gateway will be automatically uploaded to Amazon S3 based on policy
    • Encryption of the source data on the disk volumes can be either done before writing to the disk or using block level encryption on the iSCSI endpoint that AWS Storage Gateway exposes to encrypt all data on the disk volume.
  • Amazon RDS
    • Amazon RDS doesn’t expose the attached disk it uses for data storage, transparent disk encryption using techniques for EBS section cannot be applied.
    • However, individual fields data can be encrypted before the data is written to RDS and decrypted after reading it.

Model B: You control the encryption method, AWS provides the KMI storage component, and you provide the KMI management layer

  • Model B is similar to Model A where the encryption method is managed by you
  • Model B differs in the approach to Model A where the keys are maintained in AWS CloudHSM rather than than the on-premise key storage system
  • Only you have access to the cryptographic partitions within the dedicated HSM to use the keys

Screen Shot 2016-04-10 at 1.01.54 PM.png

Model C: AWS controls the encryption method and the entire KMI

  • AWS provides and manages the server-side encryption of your data, transparently managing the encryption method and the keys.
  • AWS KMS and other services that encrypt your data directly use a method called envelope encryption to provide a balance between performance and security.
  • Envelope Encryption method
    • A master key is defined either by you or AWS
    • A data key (data encryption key) is generated by the AWS service at the time when data encryption is requested
    • Data key is used to encrypt your data.
    • Data key is then encrypted with a key-encrypting key (master key) unique to the service storing your data.
    • Encrypted data key and the encrypted data are then stored by the AWS storage service on your behalf.
  • Master key (key-encrypting keys) used to encrypt data keys are stored and managed separately from the data and the data keys
  • For decryption of the data, the process is reversed. Encrypted data key is decrypted using the key-encrypting key; the data key is then used to decrypt your data
  • Authorized use of encryption keys is done automatically and is securely managed by AWS.
  • Because unauthorized access to those keys could lead to the disclosure of your data, AWS has built systems and processes with strong access controls that minimize the chance of unauthorized access and had these systems verified by third-party audits to achieve security certifications including SOC 1, 2, and 3, PCI-DSS, and FedRAMP.
  • Amazon S3
    • SSE-S3
      • AWS encrypts each object using a unique data key
      • Data key is encrypted with a periodically rotated master key managed by S3
      • Amazon S3 server-side encryption uses 256-bit Advanced Encryption Standard (AES) keys for both object and master keys
    • SSE-KMS
      • Master keys are defined and managed in KMS for your account
      • Object Encryption
        • When an object is uploaded, a request is sent to KMS to create an object key.
        • KMS generates a unique object key and encrypts it using the master key; KMS then returns this encrypted object key along with the plaintext object key to Amazon S3.
        • Amazon S3 web server encrypts your object using the plaintext object key and stores the now encrypted object (with the encrypted object key) and deletes the plaintext object key from memory.
      • Object Decryption
        • To retrieve the encrypted object, Amazon S3 sends the encrypted object key to AWS KMS.
        • AWS KMS decrypts the object key using the correct master key and returns the decrypted (plaintext) object key to S3.
        • Amazon S3 decrypts the encrypted object, with the plaintext object key, and returns it to you.
    • SSE-C
      • Amazon S3 is provided an encryption key, while uploading the object
      • Encryption key is used by Amazon S3 to encrypt your data using AES-256
      • After object encryption, Amazon S3 deletes the encryption key
      • For downloading, you need to provide the same encryption key, which AWS matches, decrypts and returns the object
  • Amazon EBS
    • When Amazon EBS volume is created, you can choose the master key in KMS to be used for encrypting the volume
    • Volume encryption
      • Amazon EC2 server sends an authenticated request to AWS KMS to create a volume key.
      • AWS KMS generates this volume key, encrypts it using the master key, and returns the plaintext volume key and the encrypted volume key to the Amazon EC2 server.
      • Plaintext volume key is stored in memory to encrypt and decrypt all data going to and from your attached EBS volume.
    • Volume decryption
      • When the encrypted volume (or any encrypted snapshots derived
        from that volume) needs to be re-attached to an instance, a call is made to AWS KMS to decrypt the encrypted volume key.
      • AWS KMS decrypts this encrypted volume key with the correct master key and returns the decrypted volume key to Amazon EC2.
  • Amazon Glacier
    • Glacier provide encryption of the data, by default
    • Before it’s written to disk, data is always automatically encrypted using 256-bit AES keys unique to the Amazon Glacier service that are stored in separate systems under AWS control
  • AWS Storage Gateway
    • AWS Storage Gateway transfers your data to AWS over SSL
    • AWS Storage Gateway stores data encrypted at rest in Amazon S3 or Amazon Glacier using their respective server side encryption schemes.
  • Amazon RDS – Oracle
    • Oracle Advanced Security option for Oracle on Amazon RDS can be used to leverage the native Transparent Data Encryption (TDE) and Native Network Encryption (NNE) features
    • Oracle encryption module creates data and key-encrypting keys to encrypt the database
    • Key-encrypting keys specific to your Oracle instance on Amazon RDS are themselves encrypted by a periodically rotated 256-bit AES master key.
    • Master key is unique to the Amazon RDS service and is stored in separate systems under AWS control
  • Amazon RDS -SQL server
    • Transparent Data Encryption (TDE) can be provisioned for Microsoft SQL Server on Amazon RDS.
    • SQL Server encryption module creates data and keyencrypting keys to encrypt the database.
    • Key-encrypting keys specific to your SQL Server instance on Amazon RDS are themselves encrypted by a periodically rotated, regional 256-bit AES master key
    • Master key is unique to the Amazon RDS service and is stored in separate systems under AWS control

Sample Exam Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. How can you secure data at rest on an EBS volume?
    1. Encrypt the volume using the S3 server-side encryption service
    2. Attach the volume to an instance using EC2’s SSL interface.
    3. Create an IAM policy that restricts read and write access to the volume.
    4. Write the data randomly instead of sequentially.
    5. Use an encrypted file system on top of the EBS volume
  2. Your company policies require encryption of sensitive data at rest. You are considering the possible options for protecting data while storing it at rest on an EBS data volume, attached to an EC2 instance. Which of these options would allow you to encrypt your data at rest? (Choose 3 answers)
    1. Implement third party volume encryption tools
    2. Do nothing as EBS volumes are encrypted by default
    3. Encrypt data inside your applications before storing it on EBS
    4. Encrypt data using native data encryption drivers at the file system level
    5. Implement SSL/TLS for all services running on the server
  3. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers
    1. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys
    2. Use Amazon S3 server-side encryption with customer-provided keys
    3. Use Amazon S3 server-side encryption with EC2 key pair.
    4. Use Amazon S3 bucket policies to restrict access to the data at rest.
    5. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key
    6. Use SSL to encrypt the data while in transit to Amazon S3.
  4. Which 2 services provide native encryption
    1. Amazon EBS
    2. Amazon Glacier
    3. Amazon Redshift (is optional)
    4. Amazon RDS (is optional)
    5. Amazon Storage Gateway
  5. With which AWS services CloudHSM can be used (select 2)
    1. S3
    2. DynamoDb
    3. RDS
    4. ElastiCache
    5. Amazon Redshift

References