AWS Storage Options – S3 & Glacier

Amazon S3

  • highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
  • provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.
  • allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
  • number of objects you can store in an Amazon S3 bucket is virtually unlimited.
  • highly secure, supporting encryption at rest, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.
  • highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
  • provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.
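
As an illustration of the lifecycle management capability above, here is a minimal sketch using the Python boto3 SDK; the bucket name, prefix, and day counts are assumptions for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to Glacier after 30 days,
# and delete them at end of life after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```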

Ideal Use Cases

  • Storage & Distribution of static web content and media
    • frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
    • works well for fast-growing websites hosting data-intensive, user-generated content, such as video and photo sharing sites, as no storage provisioning is required
    • content can be served directly from Amazon S3, since each object in Amazon S3 has a unique HTTP URL address
    • can also act as an origin store for a Content Delivery Network (CDN) such as Amazon CloudFront
    • it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
  • Data Store for Large Objects
    • can be paired with an RDS or NoSQL database to store large objects, e.g. files, while the associated metadata, e.g. name, tags, comments, etc., is stored in the RDS or NoSQL database, where it can be indexed and queried, providing faster access to relevant data
  • Data store for computation and large-scale analytics
    • commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
    • data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
  • Backup and Archival of critical data
    • used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
    • stores objects redundantly on multiple devices across multiple facilities, providing the highly durable storage infrastructure needed for these scenarios.
    • its versioning capability is available to protect critical data from inadvertent deletion

Anti-Patterns

Amazon S3 has the following anti-patterns where it is not an optimal solution

  • Dynamic website hosting
    • While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server-side scripting or database interaction cannot be hosted on S3 and should instead be hosted on Amazon EC2
  • Backup and archival storage
    • Data requiring long term archival storage with infrequent read access can be stored more cost effectively in Amazon Glacier
  • Structured Data Query
    • Amazon S3 doesn’t offer query capabilities, so to read an object the object name and key must be known. Instead, pair S3 with RDS or DynamoDB to store, index, and query metadata about Amazon S3 objects
    • NOTE – S3 now provides query capabilities (S3 Select), and Athena can also be used
  • Rapidly Changing Data
    • Data that needs to be updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS, or DynamoDB.
  • File System
    • Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly either the ‘/’ or ‘\’ character), you can construct your keys to emulate the hierarchical folder structure of a file system within a given bucket.

Performance

  • Access to Amazon S3 from within Amazon EC2 in the same region is fast.
  • Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.
  • Amazon S3 is also built to scale storage, requests, and users to support a virtually unlimited number of web-scale applications.
  • If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.

Durability & Availability

  • Amazon S3 storage provides the highest level of data durability and availability, by automatically and synchronously storing your data across both multiple devices and multiple facilities within the selected geographical region
  • Error correction is built-in, and there are no single points of failure. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.
  • Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
  • Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
  • Versioning can be enabled with MFA (Multi-Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object
  • For noncritical and reproducible data, e.g. thumbnails, transcoded media, etc., S3 Reduced Redundancy Storage (RRS) can be used, which provides a lower level of durability at a lower storage cost
  • RRS is designed to provide 99.99% durability per object over a given year. While RRS is less durable than standard Amazon S3, it is still designed to provide 400 times more durability than a typical disk drive

Cost Model

  • With Amazon S3, you pay only for what you use and there is no minimum fee.
  • Amazon S3 has three pricing components: storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

Scalability & Elasticity

  • Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically
  • Amazon S3 supports a virtually unlimited number of files in any bucket
  • Amazon S3 bucket can store a virtually unlimited number of bytes
  • Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information to other servers in other locations in the same region, all using Amazon’s high-performance infrastructure.

Interfaces

  • Amazon S3 provides standards-based REST and SOAP web services APIs for both management and data operations.
  • NOTE – SOAP support over HTTP is deprecated, but it is still available over HTTPS. New Amazon S3 features will not be supported for SOAP. We recommend that you use either the REST API or the AWS SDKs.
  • Amazon S3 provides easier-to-use, higher-level toolkits (SDKs) in different languages (Java, .NET, PHP, and Ruby) that wrap the underlying APIs
  • Amazon S3 Command Line Interface (CLI) provides a set of high-level, Linux-like Amazon S3 file commands for common operations, such as ls, cp, mv, sync, etc. The commands also provide the ability to perform recursive uploads and downloads using a single folder-level Amazon S3 command, and support parallel transfers.
  • AWS Management Console provides the ability to easily create and manage Amazon S3 buckets, upload and download objects, and browse the contents of your Amazon S3 buckets using a simple web-based user interface
  • All interfaces provide the ability to store Amazon S3 objects (files) in uniquely-named buckets (top-level folders), with each object identified by a unique object key within that bucket.
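
As a minimal sketch of these data operations using the Python (boto3) SDK; the bucket and key names here are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Buckets are uniquely-named, top-level containers
# (regions other than us-east-1 need a CreateBucketConfiguration)
s3.create_bucket(Bucket="example-unique-bucket")

# Each object is identified by a unique key within the bucket
s3.put_object(Bucket="example-unique-bucket", Key="docs/hello.txt", Body=b"hello world")

obj = s3.get_object(Bucket="example-unique-bucket", Key="docs/hello.txt")
print(obj["Body"].read())  # b'hello world'
```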

Glacier

  • extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival
  • data can be stored reliably for as little as $0.01 per gigabyte per month.
  • allows offloading the administrative burdens of operating and scaling storage to AWS, such as capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time-consuming hardware migrations
  • Data is stored in Amazon Glacier as Archives where an archive can represent a single file or multiple files combined into a single archive
  • Archives are stored in Vaults for which the access can be controlled through IAM
  • Retrieving archives from vaults requires the initiation of a job and can take around 3-5 hours (see the sketch after this list)
  • Amazon Glacier integrates seamlessly with Amazon S3 by using S3 data lifecycle management policies to move data from S3 to Glacier
  • AWS Import/Export can also be used to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport
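
A minimal sketch of the native Glacier flow with boto3 (the vault and file names are assumptions): create a vault, upload an archive, and initiate a retrieval job:

```python
import boto3

glacier = boto3.client("glacier")

glacier.create_vault(vaultName="example-vault")

# An archive can be a single file or multiple files combined (e.g. a tar)
with open("backup.tar", "rb") as f:
    resp = glacier.upload_archive(vaultName="example-vault", body=f)
archive_id = resp["archiveId"]

# Retrieval requires initiating a job; results are ready after ~3-5 hours
job = glacier.initiate_job(
    vaultName="example-vault",
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
)
print(job["jobId"])
```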

Ideal Usage Patterns

  • Amazon Glacier is ideally suited as a long-term archival solution for infrequently accessed data, such as archiving offsite enterprise information, media assets, research and scientific data, digital preservation, and magnetic tape replacement

Anti-Patterns

Amazon Glacier has the following anti-patterns where it is not an optimal solution

  • Rapidly changing data
    • Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies such as Amazon EBS or a Database
  • Real time access
    • Data stored in Glacier cannot be accessed in real time; it requires the initiation of a job for object retrieval, with retrieval times ranging from 3-5 hours. If immediate access is needed, Amazon S3 is a better choice.

Performance

  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived.
  • Amazon Glacier jobs typically complete in 3 to 5 hours

Durability and Availability

  • Amazon Glacier redundantly stores data in multiple facilities and on multiple devices within each facility
  • Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive
  • Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
  • Amazon Glacier also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • Amazon Glacier has three pricing components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand UPLOAD and RETRIEVAL requests per month).
  • Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual, and that data will be stored for extended periods of time. It allows you to retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month; any additional data retrieved is charged per GB
  • Amazon Glacier also charges a pro-rated fee (per GB) for items deleted prior to 90 days

Scalability & Elasticity

  • A single archive is limited to 40 TB, but there is no limit to the total amount of data you can store in the service.
  • Amazon Glacier scales to meet growing and often unpredictable storage requirements: whether you’re storing petabytes or gigabytes, it automatically scales your storage up or down as needed.

Interfaces

  • Amazon Glacier provides a native, standards-based REST web services interface, as well as Java and .NET SDKs.
  • AWS Management Console or the Amazon Glacier APIs can be used to create vaults to organize the archives in Amazon Glacier.
  • Amazon Glacier APIs can be used to upload and retrieve archives, monitor the status of your jobs and also configure your vault to send you a notification via Amazon Simple Notification Service (Amazon SNS) when your jobs complete.
  • Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management to provide automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
  • The Amazon S3 API provides a RESTORE operation, and the retrieval process takes the same 3-5 hours (see the sketch after this list)
  • On retrieval, a copy of the retrieved object is placed in Amazon S3 RRS storage for a specified retention period; the original archived object remains stored in Amazon Glacier, and you are charged for both
  • When using Amazon Glacier as a storage class in Amazon S3, use the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the Amazon Glacier APIs
  • Objects archived to Amazon Glacier via Amazon S3 can only be listed and retrieved via the Amazon S3 APIs or the AWS Management Console—they are not visible as archives in an Amazon Glacier vault.
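
A minimal sketch of the S3 RESTORE operation via boto3 (the bucket and key are hypothetical); the restored copy is kept for the requested number of days while the original stays in Glacier:

```python
import boto3

s3 = boto3.client("s3")

# Initiate the restore; the retrieval job takes roughly 3-5 hours
s3.restore_object(
    Bucket="example-bucket",
    Key="archives/backup.tar",
    RestoreRequest={"Days": 7},  # retention period for the temporary copy
)

# Poll the object's Restore status until the copy is available
head = s3.head_object(Bucket="example-bucket", Key="archives/backup.tar")
print(head.get("Restore"))  # e.g. 'ongoing-request="true"'
```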

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You want to pass queue messages that are 1GB each. How should you achieve this?
    1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
    2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Refer link)
    3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
    4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
  2. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

References

AWS S3 Best Practices

S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
  • S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
  • If the typical workload involves only occasional bursts of 100 requests per second and fewer than 800 requests per second, AWS scales to handle it.
  • If the typical workload involves a request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, it’s recommended to open a support case to prepare for the workload and avoid any temporary limits on your request rate.
  • S3 best practice guidelines can be applied only if you are routinely processing 100 or more requests per second
  • Workloads that include a mix of request types
    • If the request workload is typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for the objects ensures better performance by providing low-latency access to the S3 index
    • This behavior is driven by how S3 stores key names.
      • S3 maintains an index of object key names in each AWS region.
      • Object keys are stored lexicographically (UTF-8 binary ordering) across multiple partitions in the index i.e. S3 stores key names in alphabetical order.
      • Object keys are stored across multiple partitions in the index, and the key name dictates which partition the key is stored in
      • Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that S3 will target a specific partition for a large number of keys, overwhelming the I/O capacity of the partition.
    • Introducing some randomness in the key name prefixes distributes the key names, and hence the I/O load, across multiple index partitions (see the sketch after this list).
    • It also ensures scalability regardless of the number of requests sent per second.
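
A sketch of one common way (an assumption, not the only scheme) to introduce this randomness: prepend a short hash of the natural key, so lexicographically sorted keys spread across index partitions:

```python
import hashlib

def randomized_key(instance_id: str, timestamp: str) -> str:
    """Prefix the key with a short hash so lexicographically
    sorted keys land on different index partitions."""
    prefix = hashlib.md5(instance_id.encode()).hexdigest()[:4]
    return f"{prefix}/{instance_id}/log-{timestamp}"

print(randomized_key("i-0123456789abcdef0", "2014-01-01-10"))
# e.g. '5b2c/i-0123456789abcdef0/log-2014-01-01-10'
```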

Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
  • Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path.
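
A minimal sketch (boto3; the bucket name is assumed) of enabling Transfer Acceleration on a bucket and sending uploads through the accelerate endpoint:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Subsequent transfers should use the accelerate endpoint
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("video.mp4", "example-bucket", "uploads/video.mp4")
```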

GET-intensive Workloads

  • CloudFront can be used for performance optimization and can help by
    • distributing content with low latency and high data transfer rate.
    • caching the content and thereby reducing the number of direct requests to S3
    • providing multiple endpoints (Edge locations) for data availability
    • available in two flavors as Web distribution or RTMP distribution
  • For fast data transport over long distances between a client and an S3 bucket, use S3 Transfer Acceleration. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographical distances

PUTs/GETs for Large Objects

  • AWS allows parallelizing PUT/GET requests to improve upload and download performance, as well as the ability to recover in case of failures
  • For PUTs, Multipart upload can help improve the uploads by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload, needs to be re-uploaded
    • ability to pause and resume uploads
    • begin an upload before the Object size is known
  • For GETs, the range HTTP header can help to improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retried.
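
A minimal sketch of both techniques with boto3 (the file, bucket, and sizes are illustrative): multipart upload is handled automatically above a size threshold, and the Range header retrieves a part of an object:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart upload: parts above the threshold are uploaded in parallel,
# and only a failed part needs to be re-uploaded
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # use multipart above 8 MB
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=10,
)
s3.upload_file("video.mp4", "example-bucket", "videos/video.mp4", Config=config)

# Ranged GET: retrieve only the first megabyte of the object
part = s3.get_object(
    Bucket="example-bucket", Key="videos/video.mp4", Range="bytes=0-1048575"
)
first_mb = part["Body"].read()
```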

List Operations

  • Object key names are stored lexicographically in S3 indexes, making it hard to sort and manipulate the contents of LIST
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain a secondary index outside of S3, e.g. in DynamoDB or RDS, to store, index, and query object metadata rather than performing list operations on S3

Security

  • Use Versioning
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or roll back to previous versions (see the sketch after this list)
  • Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) to delete
  • Versioning does not prevent bucket deletion; the data must be backed up, as it is lost if the bucket is accidentally or maliciously deleted
  • Use Same Region Replication or Cross Region replication feature to backup data to a different region
  • When using VPC with S3, use VPC S3 endpoints as
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between VPC and S3 and the traffic never leaves the Amazon network
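
A minimal sketch of enabling versioning with boto3 (the bucket name is assumed); MFA Delete is shown commented out because it can only be set by the root account using its MFA device:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# MFA Delete (root account + MFA device only):
# s3.put_bucket_versioning(
#     Bucket="example-bucket",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
#     MFA="arn:aws:iam::123456789012:mfa/root-device 123456",  # hypothetical
# )
```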

Refer blog post @ S3 Security Best Practices

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects
  • Configure appropriate lifecycle management rules to move objects to different storage classes and expire them

Tracking

  • Use Event Notifications to be notified of any put or delete request on the S3 objects (see the sketch after this list)
  • Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket
  • Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored, and configure appropriate actions
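
A minimal sketch of configuring event notifications to an SNS topic with boto3 (the topic ARN and bucket name are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",  # hypothetical
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }]
    },
)
```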

S3 Monitoring and Auditing Best Practices

Refer blog post @ S3 Monitoring and Auditing Best Practices

AWS Certification Exam Practice Questions

  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don’t have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of image and videos in Amazon S3. The company is lean on funding, and wants to minimize operational costs, however, they have an aggressive marketing plan, and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases to traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers that company has today, because this is critical in understanding what their customer base will be in two years. (No. of customers do not matter)
    2. You must find out total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needs for each S3 bucket. (S3 provides unlimited storage; the key namespace design would depend on the number of objects, not the total storage)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should utilize an amazon simple workflow service activity worker that updates the users data counter in amazon dynamo DB. The activity worker will use simple email service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an amazon relational data base service relational database with a store objects table that has a row for each stored object along with size of each object. The upload server will query the aggregate consumption of the user in questions (by first determining the files store by the user, and then querying the stored objects table for respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Good Approach to use RDS but with so many objects might not be a good option)
    3. The company should write both the content length and the username of the files owner as S3 metadata for the object. They should then create a file watcher to iterate over each object and aggregate the size for each user and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (List operations on S3 not feasible)
    4. The company should create two separated amazon simple storage service buckets one for data storage for free tier users and another for data storage for premium tier users. An amazon simple workflow service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary (List operations on S3 not feasible as well as SNS does not address email requirement)
  7. Your company host a social media website for storing and sharing documents. the web application allow users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your php front end backed by Elastic Load Balancing and an autoscaling fleet of amazon elastic compute cloud (EC2) instances that scale upon average of bytes received (NetworkIn) After a file has been uploaded. it is copied to amazon simple storage service(S3). Amazon Ec2 instances use an AWS Identity and Access Management (AMI) role that allows Amazon s3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the auto scaling groups Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Autoscaling launch Configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10gbps. (no info of current size and might increase cost)
    2. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to amazon s3 using the given credentials and S3 Prefix. (will not provide the ability to handle pause and restarts)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address for each EC2 instance (using the auto scaling launch configuration settings). Use Amazon Route 53 round robin records set and http health check to DNS load balance the app request this approach will significantly reduce the cost by bypassing elastic load balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic that used the S3 multipart upload API to directly upload the file to Amazon s3 using the given credentials and s3 Prefix. (multipart allows one to start uploading directly to S3 before the actual size is known or complete data is downloaded)
  8. If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH will give some randomness to start with, instead of instanceID where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID

Reference

S3_Optimizing_Performance

AWS EC2 – Elastic Compute Cloud

Elastic Compute Cloud – EC2

  • Elastic Compute Cloud – EC2 provides scalable computing capacity in AWS
  • Elastic Compute Cloud – EC2
    • eliminates the need to invest in hardware upfront, so applications can be developed and deployed faster.
    • can be used to launch as many or as few virtual servers as you need, configure security and networking, and manage storage.
    • enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing the need to forecast traffic.

EC2 features

  • EC2 instances – Virtual computing environments
  • Amazon Machine Images (AMIs) – Preconfigured templates for the instances that package the bits needed for a server (including the operating system and additional software)
  • Instance types – Various configurations of CPU, memory, storage, and networking capacity for the instances
  • Key Pairs – Secure login information for the instances (AWS stores the public key, and you store the private key in a secure place)
  • Instance Store Volumes – Storage volumes for temporary data that are deleted when you stop or terminate your instance
  • EBS Volumes – Persistent storage volumes for the data using Elastic Block Store (EBS)
  • Regions and Availability Zones – Multiple physical locations for the resources, such as instances and EBS volumes
  • Security Groups – A firewall that enables you to specify the protocols, ports, and source IP ranges that can reach the instances
  • Elastic IP addresses – Static IP addresses for dynamic cloud computing
  • Tags – Metadata can be created and assigned to EC2 resources

Accessing EC2

  • Amazon EC2 console
    • Amazon EC2 console is the web-based user interface that can be accessed from the AWS management console
  • AWS Command line Interface (CLI)
    • Provides commands for a broad set of AWS products, and is supported on Windows, Mac, and Linux.
  • Amazon EC2 Command Line Interface (CLI) tools
    • Provides commands for Amazon EC2, Amazon EBS, and Amazon VPC, and is supported on Windows, Mac, and Linux
  • AWS Tools for Windows Powershell
    • Provides commands for a broad set of AWS products for those who script in the PowerShell environment
  • AWS Query API
    • Query API allows HTTP or HTTPS requests that use the HTTP verbs GET or POST and a Query parameter named Action
  • AWS SDK libraries
    • AWS provides libraries in various languages which provide basic functions that automate tasks such as cryptographically signing your requests, retrying requests, and handling error responses

AWS Certification Exam Practice Questions

  1. What are the Amazon EC2 API tools?
    1. They don’t exist. The Amazon EC2 AMI tools, instead, are used to manage permissions.
    2. Command-line tools to the Amazon EC2 web service
    3. They are a set of graphical tools to manage EC2 instances.
    4. They don’t exist. The Amazon API tools are a client interface to Amazon Web Services.
  2. When a user is launching an instance with EC2, which of the below mentioned options is not available during the instance launch console for a key pair?
    1. Proceed without the key pair
    2. Upload a new key pair
    3. Select an existing key pair
    4. Create a new key pair

References

AWS_EC2

AWS EC2 Security

  • IAM helps control whether users in the organization can perform a task using specific EC2 API actions and whether they can use specific AWS resources.
  • Use IAM roles to prevent the need to share as well as manage, and rotate the security credentials that the applications use.
  • Security groups act as a virtual firewall that controls the traffic to the EC2 instances. They can help specify rules that control the inbound traffic that’s allowed to reach the instances and the outbound traffic that’s allowed to leave the instance.
  • Use AWS Systems Manager Session Manager to connect to the instance as it provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
  • Use EC2 Instance Connect to connect to your instances using Secure Shell (SSH) without the need to share and manage SSH keys.
  • Use AWS Systems Manager Run Command to automate common administrative tasks instead of opening inbound SSH ports and managing SSH keys.
  • Systems Manager Patch Manager can be used to automate the process of patching and installing security-related updates for both the operating system and applications.

EC2 Key Pairs

  • EC2 uses public-key cryptography to encrypt & decrypt login information
  • Public-key cryptography uses a public key to encrypt a piece of data, such as a password, then the recipient uses the private key to decrypt the data.
  • Public and private keys are known as a key pair.
  • To log in to an EC2 instance, a key pair needs to be created and specified when the instance is launched, and the private key can be used to connect to the instance.
  • Linux instances have no password, and the key pair is used for ssh log in
  • For Windows instances, the key pair can be used to obtain the administrator password and then log in using RDP
  • EC2 stores the public key only, and the private key resides with the user. EC2 doesn’t keep a copy of your private key
  • Public key content (on Linux instances) is placed in an entry within  ~/.ssh/authorized_keys at boot time and enables the user to securely access the instance without passwords
  • Public key specified for an instance when launched is also available through its instance metadata http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
  • EC2 Security Best Practice: Store the private keys in a secure place as anyone who possesses the private key can decrypt the login information
  • Also, if the private key is lost, there is no way to recover it.
    • For instance store-backed instances, the instance can no longer be accessed
    • For EBS-backed Linux instances, access can be regained.
      • EBS-backed instance can be stopped, its root volume detached and attached to another instance as a data volume
      • Modify the authorized_keys file, move the volume back to the original instance, and restart the instance
  • Key pair associated with the instances can either be
    • Generated by EC2
      • Keys that EC2 uses are 2048-bit SSH-2 RSA keys (a key-pair creation sketch follows this list).
    • Created separately (using third-party tools) and Imported into EC2
      • EC2 only accepts RSA keys and does not accept DSA keys
      • Supported lengths: 1024, 2048, and 4096
  • supports five thousand key pairs per region
  • Deleting a key pair only deletes the public key and does not impact the servers already launched with the key.
  • Use AWS Systems Manager Session Manager to connect to the instance as it provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
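
A minimal sketch of generating a key pair with boto3 (the key name and file path are assumptions); note that the private key material is returned only once, at creation:

```python
import os
import boto3

ec2 = boto3.client("ec2")
resp = ec2.create_key_pair(KeyName="example-key")

# EC2 stores only the public key; save the private key securely now,
# as it cannot be retrieved later
with open("example-key.pem", "w") as f:
    f.write(resp["KeyMaterial"])
os.chmod("example-key.pem", 0o400)  # restrict permissions for ssh
```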

EC2 Security Groups

  • An EC2 instance, when launched, can be associated with one or more security groups, which acts as a virtual firewall that controls the traffic to that instance
  • Security groups help specify rules that control the inbound traffic that’s allowed to reach the instances and the outbound traffic that’s allowed to leave the instance
  • Security groups are associated with network interfaces. Changing an instance’s security groups changes the security groups associated with the primary network interface (eth0)
  • An ENI can be associated with up to 5 security groups, with up to 60 rules per security group
  • Rules for a security group can be modified at any time; the new rules are automatically applied to all instances associated with the security group.
  • All the rules from all associated security groups are evaluated to decide whether to allow traffic to an instance
  • Security Group features
    • For the VPC default security group, it allows all inbound traffic from other instances associated with the default security group
    • By default, VPC default security groups or newly created security groups allow all outbound traffic
    • Security group rules are always permissive; deny rules can’t be created
    • Rules can be added and removed any time.
    • Any modification to the rules is automatically applied to the instances associated with the security group after a short period, depending on the connection tracking for the traffic
    • Security groups are stateful — if you send a request from your instance, the response traffic for that request is allowed to flow in regardless of inbound security group rules. For VPC security groups, this also means that responses to allowed inbound traffic are allowed to flow out, regardless of outbound rules
    • If multiple rules are defined for the same protocol and port, the most permissive rule is applied; e.g. with multiple rules for TCP port 22, one for a specific IP and one for everyone, everyone is granted access, being the most permissive rule (a sketch of creating a security group and adding a rule follows this list)
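
A minimal sketch of creating a security group and adding an inbound rule with boto3 (the VPC ID and port are illustrative); note that only allow rules can be expressed:

```python
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="web-sg",
    Description="Allow inbound HTTP",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC
)

# Permissive (allow) rule only; security groups have no deny rules
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
```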

Connection Tracking

  • Security groups are Stateful and they use Connection tracking to track information about traffic to and from the instance.
  • This allows responses to inbound traffic to flow out of the instance regardless of outbound security group rules, and vice versa.
  • Connection Tracking is maintained only if there is no explicit Outbound rule for an Inbound request (and vice versa)
  • However, if there is an explicit Outbound rule for an Inbound request, the response traffic is allowed on the basis of the Outbound rule and not on the Tracking information
  • Any existing flow of traffic, that is tracked, is not interrupted even if the rules for the security groups are changed. To ensure traffic is immediately interrupted, use NACL as they are stateless and therefore do not allow automatic response traffic.
  • Also, If the instance (host A) initiates traffic to host B and uses a protocol other than TCP, UDP, or ICMP,  the instance’s firewall only tracks the IP address and protocol number for the purpose of allowing response traffic from host B. If host B initiates traffic to your instance in a separate request within 600 seconds of the original request or response, your instance accepts it regardless of inbound security group rules, because it’s regarded as response traffic.
  • can be controlled by modifying the security group’s outbound rules to permit only certain types of outbound traffic or using NACL

IAM with EC2

  • IAM policy can be defined to allow or deny a user access to the EC2 resources and actions
  • EC2 partially supports resource-level permissions. For some EC2 API actions, you cannot specify which resource a user is allowed to work with for that action; instead, you have to allow users to work with all resources for that action
  • IAM only controls what actions a user can perform on the EC2 resources; it cannot be used to grant users access to log in to the instances

EC2 with IAM Role

  • EC2 instances can be launched with IAM roles so that the applications can securely make API requests from the instances.
  • IAM roles prevent the need to share as well as manage, rotate the security credentials that the applications use.
  • IAM role can be added to an existing running EC2 instance.
  • EC2 uses an instance profile as a container for an IAM role.
    • Creation of an IAM role using the console, creates an instance profile automatically and gives it the same name as the role it corresponds to.
    • When using the AWS CLI, API, or an AWS SDK to create a role, the role and instance profile needs to be created as separate actions, and they can be given different names.
  • To launch an instance with an IAM role, the name of its instance profile needs to be specified.
  • An application on the instance can retrieve the security credentials provided by the role from the instance metadata item http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name (see the sketch after this list).
  • Security credentials are temporary and are rotated automatically and new credentials are made available at least five minutes prior to the expiration of the old credentials.
  • Best Practice: Always launch EC2 instance with IAM role instead of hardcoded credentials
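
A minimal sketch of reading the role’s temporary credentials from instance metadata (IMDSv1-style, matching the URL above; newer instances enforcing IMDSv2 require a session token first). In practice the AWS SDKs and CLI do this automatically:

```python
import json
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

# The listing returns the name of the role attached to the instance
role_name = urllib.request.urlopen(BASE).read().decode()

# The role document contains temporary, auto-rotated credentials
creds = json.loads(urllib.request.urlopen(BASE + role_name).read())
print(creds["AccessKeyId"], creds["Expiration"])
```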

EC2 IAM Role S3 Access

EC2 Resiliency

  • EC2 offers the following features to support your data resiliency:
    • Copying AMIs across Regions
    • Copying EBS snapshots across Regions
    • Automating EBS-backed AMIs using Data Lifecycle Manager
    • Automating EBS snapshots using Data Lifecycle Manager
    • Maintaining the health and availability of the fleet using EC2 Auto Scaling
    • Distributing incoming traffic across multiple instances in a single AZ or multiple AZs using Elastic Load Balancing

AWS Certification Exam Practice Questions

  1. You launch an Amazon EC2 instance without an assigned AWS identity and Access Management (IAM) role. Later, you decide that the instance should be running with an IAM role. Which action must you take in order to have a running Amazon EC2 instance with an IAM role assigned to it?
    1. Create an image of the instance, and register the image with an IAM role assigned and an Amazon EBS volume mapping.
    2. Create a new IAM role with the same permissions as an existing IAM role, and assign it to the running instance. (As per AWS latest enhancement, this is possible now)
    3. Create an image of the instance, add a new IAM role with the same permissions as the desired IAM role, and deregister the image with the new role assigned.
    4. Create an image of the instance, and use this image to launch a new instance with the desired IAM role assigned (This was correct before, as it was not possible to add an IAM role to an existing instance)
  2. What does the following command do with respect to the Amazon EC2 security groups? ec2-revoke RevokeSecurityGroupIngress
    1. Removes one or more security groups from a rule.
    2. Removes one or more security groups from an Amazon EC2 instance.
    3. Removes one or more rules from a security group
    4. Removes a security group from our account.
  3. Which of the following cannot be used in Amazon EC2 to control who has access to specific Amazon EC2 instances?
    1. Security Groups
    2. IAM System
    3. SSH keys
    4. Windows passwords
  4. You must assign each server to at least _____ security group
    1. 3
    2. 2
    3. 4
    4. 1
  5. A company is building software on AWS that requires access to various AWS services. Which configuration should be used to ensure that AWS credentials (i.e., Access Key ID/Secret Access Key combination) are not compromised?
    1. Enable Multi-Factor Authentication for your AWS root account.
    2. Assign an IAM role to the Amazon EC2 instance
    3. Store the AWS Access Key ID/Secret Access Key combination in software comments.
    4. Assign an IAM user to the Amazon EC2 Instance.
  6. Which of the following items are required to allow an application deployed on an EC2 instance to write data to a DynamoDB table? Assume that no security keys are allowed to be stored on the EC2 instance. (Choose 2 answers)
    1. Create an IAM Role that allows write access to the DynamoDB table
    2. Add an IAM Role to a running EC2 instance. (As per AWS latest enhancement, this is possible now)
    3. Create an IAM User that allows write access to the DynamoDB table.
    4. Add an IAM User to a running EC2 instance.
    5. Launch an EC2 Instance with the IAM Role included in the launch configuration (This was correct before, as it was not possible to add an IAM role to an existing instance)
  7. You have an application running on an EC2 Instance, which will allow users to download files from a private S3 bucket using a pre-assigned URL. Before generating the URL the application should verify the existence of the file in S3. How should the application use AWS credentials to access the S3 bucket securely?
    1. Use the AWS account access Keys the application retrieves the credentials from the source code of the application.
    2. Create a IAM user for the application with permissions that allow list access to the S3 bucket launch the instance as the IAM user and retrieve the IAM user’s credentials from the EC2 instance user data.
    3. Create an IAM role for EC2 that allows list access to objects in the S3 bucket. Launch the instance with the role, and retrieve the role’s credentials from the EC2 Instance metadata
    4. Create an IAM user for the application with permissions that allow list access to the S3 bucket. The application retrieves the IAM user credentials from a temporary directory with permissions that allow read access only to the application user.
  8. A user has created an application, which will be hosted on EC2. The application makes calls to DynamoDB to fetch certain data. The application is using the DynamoDB SDK to connect with from the EC2 instance. Which of the below mentioned statements is true with respect to the best practice for security in this scenario?
    1. The user should attach an IAM role with DynamoDB access to the EC2 instance
    2. The user should create an IAM user with DynamoDB access and use its credentials within the application to connect with DynamoDB
    3. The user should create an IAM role, which has EC2 access so that it will allow deploying the application
    4. The user should create an IAM user with DynamoDB and EC2 access. Attach the user with the application so that it does not use the root account credentials
  9. Your application is leveraging IAM Roles for EC2 for accessing objects stored in S3. Which two of the following IAM policies control access to your S3 objects?
    1. An IAM trust policy allows the EC2 instance to assume an EC2 instance role.
    2. An IAM access policy allows the EC2 role to access S3 objects
    3. An IAM bucket policy allows the EC2 role to access S3 objects. (Bucket policy is defined with S3 and not with IAM)
    4. An IAM trust policy allows applications running on the EC2 instance to assume as EC2 role (Trust policy allows EC2 instance to assume the role)
    5. An IAM trust policy allows applications running on the EC2 instance to access S3 objects. (Applications can access S3 through EC2 assuming the role)

AWS EC2 Instance Lifecycle

EC2 Instance Lifecycle Overview

  • EC2 instance lifecycle determines how an EC2 instance transitions through different states from the moment it is launched to its termination

EC2 Instance Lifecycle

Instance Launch

  • Pending
    • When the instance is first launched, it enters the pending state
  • Running
    • After the instance is launched, it enters into the running state
    • Charges are incurred for each second, with a one-minute minimum, that the instance is running, even if it remains idle

Instance Start & Stop (EBS-backed instances only)

  • Only an EBS-backed instance can be stopped and started.
  • Instance store-backed instance cannot be stopped and started.
  • An instance can be stopped & started in case the instance fails a status check or is not running as expected
  • Stop
    • After the instance is stopped, it enters the stopping state and then the stopped state.
    • Charges are only incurred for the EBS storage and not for the instance hourly charge or data transfer.
    • While the instance is stopped, its root volume can be treated like any other volume and modified, e.g. to repair file system problems or update software, or to change the instance type, user data, EBS optimization attributes, etc.
    • Volume can be detached from the stopped instance, and attached to a running instance, modified, detached from the running instance, and then reattached to the stopped instance. It should be reattached using the storage device name that’s specified as the root device in the block device mapping for the instance.
  • Start
    • When the instance is started, it enters the pending state and then the running state
    • An instance, when stopped and started, is launched on a new host
    • Any data on an instance store volume (not root volume) would be lost while data on the EBS volume persists
  • EC2 instance retains its private IP address as well as the Elastic IP address.
  • If the instance has an IPv6 address, it retains its IPv6 address.
  • However, the public IP address, if assigned instead of the Elastic IP address, would be released
  • For each transition of an instance from stopped to running, charges per second are incurred when the instance is running, with a minimum of one minute every time the instance is started

Instance Hibernate

  • Instance hibernation signals the operating system to perform hibernation (suspend-to-disk), which saves the contents from the instance memory (RAM) to the EBS root volume
  • Instance’s EBS root volume and any attached EBS data volumes are persisted, including the saved contents of the RAM.
  • Any EC2 instance store volumes remain attached to the instance, but the data on the instance store volumes is lost.
  • When the instance is restarted, the EBS root volume is restored to its previous state and the RAM contents are reloaded. Previously attached data volumes are reattached and the instance retains its instance ID.
  • After the instance is hibernated, it enters the stopping state and then the stopped state.
  • When the instance is restarted
    • It enters the pending state and the instance is moved to a new host computer (though in some cases, it remains on the current host).
    • EBS root volume is restored to its previous state
    • RAM contents are reloaded
    • Processes that were previously running on the instance are resumed
    • Previously attached data volumes are reattached and the instance retains its instance ID
    • Instance retains private IPv4 addresses and any IPv6 addresses
    • Instance retains its Elastic IP address
    • Instance releases its Public IPv4 address and would get a new one
  • Hibernation prerequisites
    • Supported instance families – C3, C4, C5, M3, M4, M5, R3, R4, R5, & T2
    • Instance RAM size – must be less than 150 GB.
    • Instance size – not supported for bare metal instances.
    • Supported AMIs must be an HVM AMI that supports hibernation
    • Root volume type – must be EBS volume and not instance store
    • EBS root volume size – must be large enough to store the RAM contents
    • EBS root volume MUST be encrypted to ensure the protection of sensitive content that is in memory at the time of hibernation
    • Enable hibernation at launch, as changing it is not supported on an existing instance
    • Purchasing options – Only On-Demand Instances and Reserved Instances supported
  • Limitations or Unsupported Actions
    • Changing the instance type or size of a hibernated instance
    • Creating snapshots or AMIs from hibernated instances or instances for which hibernation is enabled
    • the data on any instance store volumes is lost
    • can’t hibernate an instance that has more than 150 GB of RAM.
    • can’t hibernate an instance that is in an Auto Scaling group or used by ECS. If an instance in an Auto Scaling group is hibernated, the EC2 Auto Scaling service marks the stopped instance as unhealthy, and may terminate it and launch a replacement instance.
    • An instance cannot be hibernated for more than 60 days.

Instance Reboot

  • Both EBS-backed and Instance store-backed instances can be rebooted
  • An instance remains on the same host computer and maintains its public DNS name and private IP address
  • Data on the EBS and Instance store volume is also retained
  • AWS recommends using EC2 to reboot the instance instead of running the operating system reboot command from the instance, as it performs a hard reboot if the instance does not cleanly shut down within four minutes, and also creates an API record in CloudTrail, if enabled.

Instance Retirement

  • An instance is scheduled to be retired when AWS detects an irreparable failure of the underlying hardware hosting the instance.
  • When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS.
  • If the instance root device is an EBS volume, the instance is stopped and can be started again at any time.
  • If the instance root device is an instance store volume, the instance is terminated, and cannot be used again.

Instance Termination

  • An instance can be terminated, and it enters the shutting-down and then the terminated state
  • After an instance is terminated, it can’t be connected to, and no charges are incurred
  • Instance Shutdown behavior
    • Each EBS-backed instance supports the InstanceInitiatedShutdownBehavior attribute, which determines whether the instance would be stopped or terminated when a shutdown command is initiated from the instance itself, e.g. the shutdown, halt, or poweroff command in Linux
    • Default behavior is for the instance to be stopped.
    • A shutdown command for an Instance store-backed instance will always terminate the instance
  • Termination protection
    • Termination protection (DisableApiTermination attribute) can be enabled on the instance to prevent it from being accidentally terminated
    • The DisableApiTermination attribute can be enabled or disabled from the Console, CLI, or API.
    • To terminate a protected instance, the attribute must first be disabled; the instance can then be terminated through the Console, CLI, or API.
    • Termination protection does not work for instances when
      • part of an Autoscaling group
      • launched as Spot instances
      • a shutdown is initiated from the instance itself (with InstanceInitiatedShutdownBehavior set to terminate)
  • Data persistence
    • Each EBS volume has a DeleteOnTermination attribute, which determines whether the volume is persisted or deleted when the instance it is attached to is terminated (see the sketch after this list)
    • Data on instance store volumes does not persist
    • Default is to delete the root device volume and preserve any other attached EBS volumes, i.e.
      • EBS root volumes have the DeleteOnTermination flag set to true and would be deleted by default
      • Additional attached EBS volumes have the DeleteOnTermination flag set to false and are not deleted, just detached from the instance
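A minimal boto3 sketch (the instance ID and device name are hypothetical) covering the three attributes discussed above — termination protection, instance-initiated shutdown behavior, and DeleteOnTermination:

    import boto3

    ec2 = boto3.client("ec2")
    iid = "i-0123456789abcdef0"

    # prevent accidental termination from the Console, CLI, or API
    ec2.modify_instance_attribute(InstanceId=iid,
                                  DisableApiTermination={"Value": True})

    # make an OS-initiated shutdown stop (rather than terminate) the instance
    ec2.modify_instance_attribute(InstanceId=iid,
                                  InstanceInitiatedShutdownBehavior={"Value": "stop"})

    # keep a non-root EBS volume around after termination
    ec2.modify_instance_attribute(InstanceId=iid, BlockDeviceMappings=[
        {"DeviceName": "/dev/sdf", "Ebs": {"DeleteOnTermination": False}},
    ])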

EC2 Instance Lifecycle States and Billing

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon EC2 provide?
    1. Virtual servers in the Cloud
    2. A platform to run code (Java, PHP, Python), paying on an hourly basis.
    3. Computer Clusters in the Cloud.
    4. Physical servers, remotely managed by the customer.
  2. A user has enabled termination protection on an EC2 instance. The user has also set Instance initiated shutdown behavior to terminate. When the user shuts down the instance from the OS, what will happen?
    1. The OS will shutdown but the instance will not be terminated due to protection
    2. It will terminate the instance
    3. It will not allow the user to shutdown the instance from the OS
    4. It is not possible to set the termination protection when an Instance initiated shutdown is set to Terminate
  3. A user has launched an EC2 instance and deployed a production application in it. The user wants to prohibit any mistakes from the production team to avoid accidental termination. How can the user achieve this?
    1. The user can set the DisableApiTermination attribute to avoid accidental termination
    2. It is not possible to avoid accidental termination
    3. The user can set the Deletion termination flag to avoid accidental termination
    4. The user can set the InstanceInitiatedShutdownBehavior flag to avoid accidental termination
  4. You have been doing a lot of testing of your VPC Network by deliberately failing EC2 instances to test whether instances are failing over properly. Your customer, who will be paying the AWS bill for all this, asks you if he is being charged for all these instances. You try to explain to him how billing works on EC2 instances to the best of your knowledge. What would be an appropriate response to give to the customer in regard to this?
    1. Billing commences when Amazon EC2 AMI instance is completely up and billing ends as soon as the instance starts to shutdown.
    2. Billing commences when Amazon EC2 initiates the boot sequence of an AMI instance and billing ends when the instance shuts down.
    3. Billing only commences only after 1 hour of uptime and billing ends when the instance terminates.
    4. Billing commences when Amazon EC2 initiates the boot sequence of an AMI instance and billing ends as soon as the instance starts to shutdown.

References

AWS EC2 Instance Purchasing Option

AWS EC2 Instance Purchasing Option

  • Amazon provides different ways to pay for the EC2 instances
    • On-Demand Instances
    • Reserved Instances
    • Spot Instances
    • Dedicated Hosts
    • Dedicated Instances
    • Capacity Reservations
  • EC2 instances can be launched on shared or dedicated tenancy

On-Demand Instances

  • Pay for the instances and the compute capacity used by the hour or the second, depending on which instances you run
  • No long-term commitments or up-front payments
  • Instances can be scaled accordingly as per the demand
  • Although AWS makes efforts to have the capacity to launch On-Demand Instances, there might be times during peak demand when an instance cannot be launched
  • Well suited for
    • Users that want the low cost and flexibility of EC2 without any up-front payment or long-term commitment
    • Applications with short term, spiky, or unpredictable workloads that cannot be interrupted
    • Applications being developed or tested on EC2 for the first time

Reserved Instances

  • Reserved Instances provide lower hourly running costs through a billing discount (up to 75%) as well as a capacity reservation that is applied to instances, so there would never be a case of insufficient capacity
  • Discounted usage price is fixed as long as you own the Reserved Instance, allowing compute cost prediction over the term of the reservation
  • Reserved Instances are best suited if consistent, heavy use is expected, and they can provide savings over owning the hardware or running only On-Demand Instances.
  • Well Suited for
    • Applications with steady state or predictable usage
    • Applications that require reserved capacity
    • Users are able to make upfront payments to reduce their total computing costs even further
  • Reserved instance is not a physical instance that is launched, but rather a billing discount applied to the use of On-Demand Instances
  • On-Demand Instances must match certain attributes, such as instance type and Region, in order to benefit from the billing discount.
  • Reserved Instances do not renew automatically; the EC2 instances can continue to be used but are charged On-Demand rates
  • Auto Scaling or other AWS services can be used to launch the On-Demand instances that use the Reserved Instance benefits
  • With Reserved Instances
    • You pay for the entire term, regardless of the usage
    • Once purchased, the reservation cannot be canceled but can be sold in the Reserved Instance Marketplace
    • Reserved Instance pricing tier discounts only apply to purchases made from AWS, and not to the third party Reserved instances

Reserved Instance Pricing Key Variables

Instance attributes

A Reserved Instance has four instance attributes that determine its price.

  • Instance type: Instance family + Instance size, e.g. m4.large is composed of the instance family (m4) and the instance size (large).
  • Region: Region in which the Reserved Instance is purchased.
  • Tenancy: Whether your instance runs on shared (default) or single-tenant (dedicated) hardware.
  • Platform: Operating system; for example, Windows or Linux/Unix.

Term commitment

Reserved Instance can be purchased for a one-year or three-year commitment, with the three-year commitment offering a bigger discount.

  • One-year: A year is defined as 31536000 seconds (365 days).
  • Three-year: Three years is defined as 94608000 seconds (1095 days).

Payment options

  • No Upfront
    • No upfront payment is required and the account is charged at a discounted hourly rate for every hour, regardless of the usage
    • Only available as a 1-year reservation
  • Partial Upfront
    • A portion of the cost is paid upfront and the remaining hours in the term are charged at an hourly discounted rate, regardless of the usage
  • Full Upfront
    • Full payment is made at the start of the term, with no costs for the remainder of the term, regardless of the usage

Offering class

  • Standard: Provide the most significant discount, but can only be modified, not exchanged.
  • Convertible: These provide a lower discount than Standard Reserved Instances, but can be exchanged for another Convertible Reserved Instance with different instance attributes. Convertible Reserved Instances can also be modified.

How Reserved Instances work

Billing Benefits & Payment Options

  • Reserved Instance purchase reservation is automatically applied to running instances that match the specified parameters
  • Reserved Instance can also be utilized by launching On-Demand instances with the same configuration as the purchased reserved capacity

Understanding Hourly Billing

  • Reserved Instances are billed for every clock-hour during the term that you select, regardless of whether the instance is running or not.
  • A Reserved Instance billing benefit can be applied to a running instance on a per-second basis. Per-second billing is available for instances using an open-source Linux distribution, such as Amazon Linux and Ubuntu.
  • Per-hour billing is used for commercial Linux distributions, such as Red Hat Enterprise Linux and SUSE Linux Enterprise Server.
  • A Reserved Instance billing benefit can apply to a maximum of 3600 seconds (one hour) of instance usage per clock-hour. You can run multiple instances concurrently, but can only receive the benefit of the Reserved Instance discount for a total of 3600 seconds per clock-hour; instance usage that exceeds 3600 seconds in a clock-hour is billed at the On-Demand rate.
  • Reservations and discounted rates only apply to one instance-hour per hour. If an instance restarts during the first hour of a reservation and runs for two hours before stopping, the first instance-hour is charged at the discounted rate but three instance-hours are charged at the On-Demand rate. If the instance restarts during one hour and again the next hour before running the remainder of the reservation, one instance-hour is charged at the On-Demand rate but the discounted rate is applied to previous and subsequent instance-hours.

Consolidated Billing

  • Pricing benefits of Reserved Instances are shared when the purchasing account is part of a set of accounts billed under one consolidated billing payer account
  • Consolidated billing account aggregates the list value of member accounts within a region.
  • When the list value of all active Reserved Instances for the consolidated billing account reaches a discount pricing tier, any Reserved Instances purchased after this point by any member of the consolidated billing account are charged at the discounted rate (as long as the list value for that consolidated account stays above the discount pricing tier threshold)

Buying Reserved Instances

Buying Reserved Instances needs a selection of the following

  • Platform (for example, Linux)
  • Instance type (for example, m1.small)
  • Availability Zone in which to run the instance for Zonal reserved instance
  • Term (time period) over which you want to reserve capacity
  • Tenancy: whether to reserve capacity for the instance to run on single-tenant hardware (dedicated tenancy, as opposed to shared).
  • Offering (No Upfront, Partial Upfront, All Upfront).
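A minimal boto3 sketch of the selection above (the filter values are illustrative): find a matching offering, then purchase it.

    import boto3

    ec2 = boto3.client("ec2")

    offers = ec2.describe_reserved_instances_offerings(
        InstanceType="m5.large",
        ProductDescription="Linux/UNIX",
        OfferingClass="standard",
        OfferingType="No Upfront",
        MaxResults=10,
    )
    offering_id = offers["ReservedInstancesOfferings"][0]["ReservedInstancesOfferingId"]

    ec2.purchase_reserved_instances_offering(
        ReservedInstancesOfferingId=offering_id,
        InstanceCount=1,
    )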

Modifying Reserved Instances

  • Standard or Convertible Reserved Instances can be modified and continue to benefit from the capacity reservation as the computing needs change.
  • Availability Zone, instance size (within the same instance family), and scope of the Reserved Instance can be modified
  • All or a subset of the Reserved Instances can be modified
  • Two or more Reserved Instances can be merged into a single Reserved Instance
  • Modification does not change the remaining term of the Reserved Instances; their end dates remain the same.
  • There is no fee, and you do not receive any new bills or invoices.
  • Modification is separate from purchasing and does not affect how you use, purchase, or sell Reserved Instances.
  • Complete reservation or a subset of it can be modified in one or more of the following ways:
    • Switch Availability Zones within the same region
    • Change between EC2-VPC and EC2-Classic
    • Change the instance size within the same instance family, provided the instance size footprint remains the same, e.g. a reservation for four m1.medium instances (4 x 2 = 8 units) can be turned into a reservation for eight m1.small instances (8 x 1 = 8 units) and vice versa; however, a reservation for a single m1.small instance (1 x 1) cannot be converted into a reservation for an m1.large instance (1 x 4). See the sketch after this list.
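A minimal boto3 sketch (the Reserved Instance ID and target values are hypothetical) of such a modification — splitting one m1.medium reservation into two m1.small reservations with the same footprint:

    import boto3

    ec2 = boto3.client("ec2")

    ec2.modify_reserved_instances(
        ReservedInstancesIds=["<reserved-instances-id>"],
        TargetConfigurations=[{
            "AvailabilityZone": "us-east-1a",
            "InstanceCount": 2,
            "InstanceType": "m1.small",
            "Platform": "EC2-VPC",
        }],
    )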

Scheduled Reserved Instances

  • AWS does not have any capacity available for Scheduled Reserved Instances or any plans to make it available in the future. To reserve capacity, use On-Demand Capacity Reservations instead
  • Scheduled Reserved Instances (Scheduled Instances) enable capacity reservations purchase that recurs on a daily, weekly, or monthly basis, with a specified start time and duration, for a one-year term.
  • Capacity is reserved in advance and is always available when needed
  • Charges are incurred for the time that the instances are scheduled, even if they are not used
  • Scheduled Instances are a good choice for workloads that do not run continuously, but do run on a regular schedule for e.g. weekly or monthly batch jobs
  • EC2 launches the instances, based on the launch specification during their scheduled time periods
  • EC2 terminates the EC2 instances three minutes before the end of the current scheduled time period to ensure the capacity is available for any other Scheduled Instances it is reserved for.
  • Scheduled Reserved instances cannot be stopped or rebooted, however, they can be terminated and relaunched within minutes of termination
  • Scheduled Reserved instances limits or restrictions
    • after purchase cannot be modified, canceled, or resold
    • only supported instance types: C3, C4, M4, and R3
    • the required term is 365 days (one year).
    • minimum required utilization is 1,200 hours per year
    • purchase up to three months in advance

On-Demand Capacity Reservations

  • On-Demand Capacity Reservations enable you to reserve compute capacity for the EC2 instances in a specific AZ for any duration.
  • This gives you the ability to create and manage Capacity Reservations independently from the billing discounts offered by Savings Plans or regional Reserved Instances.
  • By creating Capacity Reservations, you ensure that you always have access to EC2 capacity when you need it, for as long as you need it.
  • Capacity Reservations can be created at any time, without entering into a one-year or three-year term commitment, and the capacity is available immediately.
  • Billing starts as soon as the capacity is provisioned and the Capacity Reservation enters the active state.
  • When no longer needed, the Capacity Reservation can be canceled to stop incurring charges.
  • Capacity Reservation creation requires
    • AZ in which to reserve the capacity
    • Number of instances for which to reserve capacity
    • Instance attributes, including the instance type, tenancy, and platform/OS
  • Capacity Reservations can only be used by instances that match their attributes. By default, they are automatically used by running instances that match the attributes. If you don’t have any running instances that match the attributes of the Capacity Reservation, it remains unused until you launch an instance with matching attributes.
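A minimal boto3 sketch (values are illustrative) of creating, and later cancelling, a Capacity Reservation:

    import boto3

    ec2 = boto3.client("ec2")

    cr = ec2.create_capacity_reservation(
        InstanceType="m5.large",
        InstancePlatform="Linux/UNIX",
        AvailabilityZone="us-east-1a",
        InstanceCount=2,
        EndDateType="unlimited",   # kept until explicitly cancelled
    )

    # cancel when no longer needed, to stop incurring charges
    ec2.cancel_capacity_reservation(
        CapacityReservationId=cr["CapacityReservation"]["CapacityReservationId"])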

Spot Instances

Refer blog post @ EC2 Spot Instances

Dedicated Instances

  • Dedicated Instances are EC2 instances that run in a VPC on hardware that’s dedicated to a single customer
  • Dedicated Instances are physically isolated at the host hardware level from the instances that aren’t Dedicated Instances and from instances that belong to other AWS accounts.
  • Each VPC has a related instance tenancy attribute.
    • default
      • default is shared.
      • the tenancy can be changed to dedicated after creation
      • all instances launched would be shared, unless you explicitly specify a different tenancy during instance launch.
    • dedicated
      • all instances launched would be dedicated
      • the tenancy can’t be changed to default after creation
  • Each instance launched into a VPC has a tenancy attribute. Default tenancy depends on the VPC tenancy, which by default is shared.
    • default – instance runs on shared hardware.
    • dedicated – instance runs on single-tenant hardware.
    • host – instance runs on a Dedicated Host, which is an isolated server with configurations that you can control.
    • default tenancy cannot be changed to dedicated or host and vice versa.
    • dedicated tenancy can be changed to host and vice versa
  • Dedicated Instances can be launched using
    • Create the VPC with the instance tenancy set to dedicated; all instances launched into this VPC are Dedicated Instances even if you mark the tenancy as shared at launch.
    • Create the VPC with the instance tenancy set to default, and specify dedicated tenancy for any instances that should be Dedicated Instances when launched.
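A minimal boto3 sketch of both launch paths (the CIDR block, AMI ID, and subnet ID are hypothetical):

    import boto3

    ec2 = boto3.client("ec2")

    # path 1: a dedicated-tenancy VPC makes every instance launched into it dedicated
    ec2.create_vpc(CidrBlock="10.0.0.0/16", InstanceTenancy="dedicated")

    # path 2: a default-tenancy VPC, with dedicated tenancy requested per instance
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=1,
        SubnetId="subnet-0123456789abcdef0",
        Placement={"Tenancy": "dedicated"},
    )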

Dedicated Hosts

  • EC2 Dedicated Host is a physical server with EC2 instance capacity fully dedicated to your use
  • Dedicated Hosts allow using existing per-socket, per-core, or per-VM software licenses, including Windows Server, Microsoft SQL Server, SUSE, and Linux Enterprise Server.
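A minimal boto3 sketch (values are hypothetical) of allocating a Dedicated Host and targeting it at launch:

    import boto3

    ec2 = boto3.client("ec2")

    host = ec2.allocate_hosts(AvailabilityZone="us-east-1a",
                              InstanceType="m5.large", Quantity=1)

    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=1,
        Placement={"Tenancy": "host", "HostId": host["HostIds"][0]},
    )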

Dedicated Hosts vs Dedicated Instances

EC2 Dedicated Host vs Dedicated Instances

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. If I want my instance to run on a single-tenant hardware, which value do I have to set the instance’s tenancy attribute to?
    1. dedicated
    2. isolated
    3. one
    4. reserved
  2. You have a video transcoding application running on Amazon EC2. Each instance polls a queue to find out which video should be transcoded, and then runs a transcoding process. If this process is interrupted, the video will be transcoded by another instance based on the queuing system. You have a large backlog of videos, which need to be transcoded, and would like to reduce this backlog by adding more instances. You will need these instances only until the backlog is reduced. Which type of Amazon EC2 instances should you use to reduce the backlog in the most cost efficient way?
    1. Reserved instances
    2. Spot instances
    3. Dedicated instances
    4. On-demand instances
  3. The one-time payment for Reserved Instances is __________ refundable if the reservation is cancelled.
    1. always
    2. in some circumstances
    3. never
  4. You run a web application where web servers on EC2 Instances are in an Auto Scaling group. Monitoring over the last 6 months shows that 6 web servers are necessary to handle the minimum load. During the day up to 12 servers are needed. Five to six days per year, the number of web servers required might go up to 15. What would you recommend to minimize costs while being able to provide high availability?
    1. 6 Reserved instances (heavy utilization). 6 Reserved instances (medium utilization), rest covered by On-Demand instances
    2. 6 Reserved instances (heavy utilization). 6 On-Demand instances, rest covered by Spot Instances (don’t go for spot as availability not guaranteed)
    3. 6 Reserved instances (heavy utilization) 6 Spot instances, rest covered by On-Demand instances (don’t go for spot as availability not guaranteed)
    4. 6 Reserved instances (heavy utilization) 6 Reserved instances (medium utilization) rest covered by Spot instances (don’t go for spot as availability not guaranteed)
  5. A user is running one instance for only 3 hours every day. The user wants to save some cost with the instance. Which of the below mentioned Reserved Instance categories is advised in this case?
    1. The user should not use RI; instead only go with the on-demand pricing (seems question before the introduction of the Scheduled Reserved instances in Jan 2016, which can be used in this case)
    2. The user should use the AWS high utilized RI
    3. The user should use the AWS medium utilized RI
    4. The user should use the AWS low utilized RI
  6. Which of the following are characteristics of a reserved instance? Choose 3 answers (but 4 answers seem correct)
    1. It can be migrated across Availability Zones (can be modified)
    2. It is specific to an Amazon Machine Image (AMI) (specific to platform)
    3. It can be applied to instances launched by Auto Scaling (are allowed)
    4. It is specific to an instance Type (specific to instance family but instance type can be changed)
    5. It can be used to lower Total Cost of Ownership (TCO) of a system (helps to reduce cost)
  7. You have a distributed application that periodically processes large volumes of data across multiple Amazon EC2 Instances. The application is designed to recover gracefully from Amazon EC2 instance failures. You are required to accomplish this task in the most cost-effective way. Which of the following will meet your requirements?
    1. Spot Instances
    2. Reserved instances
    3. Dedicated instances
    4. On-Demand instances
  8. Can I move a Reserved Instance from one Region to another?
    1. No
    2. Only if they are moving into GovCloud
    3. Yes
    4. Only if they are moving to US East from another region
  9. An application you maintain consists of multiple EC2 instances in a default tenancy VPC. This application has undergone an internal audit and has been determined to require dedicated hardware for one instance. Your compliance team has given you a week to move this instance to single-tenant hardware. Which process will have minimal impact on your application while complying with this requirement?
    1. Create a new VPC with tenancy=dedicated and migrate to the new VPC (possible but impact not minimal)
    2. Use ec2-reboot-instances command line and set the parameter dedicated=true
    3. Right click on the instance, select properties and check the box for dedicated tenancy
    4. Stop the instance, create an AMI, launch a new instance with tenancy=dedicated, and terminate the old instance
  10. Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
    1. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
    2. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot instances and Reserved Instances for Amazon EMR jobs. Use Reserved instances for Amazon Redshift (Combination of the Spot and reserved with guarantee performance and help reduce cost. Also, RRS would reduce cost and guarantee data integrity, which is different from data durability )
    3. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
    4. Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
  11. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
    1. Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
    2. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Reserved Instance not cost effective for 4 hour job and data not needed in S3 once moved to RDS)
    3. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Reserved Instance not cost effective)
    4. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (RRS provides not much cost benefits for a 4 hour job while the amount of input data would take time to upload and Output data to reproduce)
  12. A company currently has a highly available web application running in production. The application’s web front-end utilizes an Elastic Load Balancer and Auto scaling across 3 availability zones. During peak load, your web servers operate at 90% utilization and leverage a combination of heavy utilization reserved instances for steady state load and on-demand and spot instances for peak load. You are asked with designing a cost effective architecture to allow the application to recover quickly in the event that an availability zone is unavailable during peak load. Which option provides the most cost effective high availability architectural design for this application? [PROFESSIONAL]
    1. Increase auto scaling capacity and scaling thresholds to allow the web-front to cost-effectively scale across all availability zones to lower aggregate utilization levels that will allow an availability zone to fail during peak load without affecting the applications availability. (Ideal for HA to reduce and distribute load)
    2. Continue to run your web front-end at 90% utilization, but purchase an appropriate number of utilization RIs in each availability zone to cover the loss of any of the other availability zones during peak load. (90% is not recommended as well RIs would increase the cost)
    3. Continue to run your web front-end at 90% utilization, but leverage a high bid price strategy to cover the loss of any of the other availability zones during peak load. (90% is not recommended as high bid price would not guarantee instances and would increase cost)
    4. Increase use of spot instances to cost effectively to scale the web front-end across all availability zones to lower aggregate utilization levels that will allow an availability zone to fail during peak load without affecting the applications availability. (Availability cannot be guaranteed)
  13. You run accounting software in the AWS cloud. This software needs to be online continuously during the day every day of the week, and has a very static requirement for compute resources. You also have other, unrelated batch jobs that need to run once per day at any time of your choosing. How should you minimize cost? [PROFESSIONAL]
    1. Purchase a Heavy Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs. (Because the instance will always be online during the day, in a predictable manner, and there are sequences of batch jobs to perform at any time, we should run the batch jobs when the account software is off. We can achieve Heavy Utilization by alternating these times, so we should purchase the reservation as such, as this represents the lowest cost. There is no such thing a “Full” level utilization purchases on EC2.)
    2. Purchase a Medium Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.
    3. Purchase a Light Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.
    4. Purchase a Full Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.

References

AWS Elastic Block Store Storage – EBS

EC2 Elastic Block Storage – EBS

  • Elastic Block Storage – EBS provides highly available, reliable, durable, block-level storage volumes that can be attached to an EC2 instance.
  • EBS as a primary storage device is recommended for data that requires frequent and granular updates e.g. running a database or filesystem.
  • An EBS volume
    • behaves like a raw, unformatted, external block device that can be attached to a single EC2 instance at a time.
    • persists independently from the running life of an instance.
    • is Zonal and can be attached to any instance within the same Availability Zone and can be used like any other physical hard drive.
    • is particularly well-suited for use as the primary storage for file systems, databases, or any applications that require fine granular updates and access to raw, unformatted, block-level storage.

Elastic Block Storage Features

  • EBS Volumes are created in a specific Availability Zone and can be attached to any instance in that same AZ.
  • Volumes can be backed up by creating a snapshot of the volume, which is stored in S3.
  • Volumes can be created from a snapshot and then attached to another instance within the same region.
  • Volumes can be made available outside of the AZ by creating and restoring the snapshot to a new volume anywhere in that region.
  • Snapshots can also be copied to other regions and then restored to new volumes, making it easier to leverage multiple AWS regions for geographical expansion, data center migration, and disaster recovery.
  • Volumes allow encryption using the EBS encryption feature. All data stored at rest, disk I/O, and snapshots created from the volume are encrypted.
  • Encryption occurs on the servers that host EC2 instances, providing encryption of data-in-transit from EC2 instances to EBS storage.
  • Elastic Volumes help easily adapt the volumes as the needs of the applications change. Elastic Volumes allow you to dynamically increase capacity, tune performance, and change the type of any new or existing current generation volume with no downtime or performance impact.
  • You can dynamically increase size, modify the provisioned IOPS capacity, and change volume type on live production volumes.
  • General Purpose (SSD) volumes support up to 16,000 IOPS and 250 MB/s of throughput, and Provisioned IOPS (SSD) volumes support up to 64,000 IOPS and 1,000 MB/s of throughput.
  • EBS Magnetic volumes can be created from 1 GiB to 1 TiB in size; EBS General Purpose (SSD) and Provisioned IOPS (SSD) volumes can be created up to 16 TiB in size.

EBS Benefits

  • Data Availability
    • Data is automatically replicated in an Availability Zone to prevent data loss due to the failure of any single hardware component.
  • Data Persistence
    • persists independently of the running life of an EC2 instance
    • persists when an instance is stopped, started, or rebooted
    • Root volume is deleted, by default, on instance termination, but the behavior can be changed using the DeleteOnTermination flag
    • All other attached volumes persist, by default, on instance termination
  • Data Encryption
    • can be encrypted by the EBS encryption feature
    • uses AES-256 and an Amazon-managed key infrastructure.
    • Encryption occurs on the server that hosts the EC2 instance, providing encryption of data-in-transit from the EC2 instance to EBS storage
    • Snapshots of encrypted EBS volumes are automatically encrypted.
  • Snapshots
    • provides the ability to create snapshots (backups) of any EBS volume and write a copy of the data in the volume to S3, where it is stored redundantly in multiple Availability Zones.
    • can be used to create new volumes, increase the size of the volumes or replicate data across Availability Zones or Regions.
    • are incremental backups and store only the data that was changed from the time the last snapshot was taken.
    • Snapshot size can be smaller than the volume size, as the data is compressed before being saved to S3.
    • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.

EBS Volume Types

Refer blog post @ EBS Volume Types

EBS Volume

EBS Volume Creation

  • Creating New volumes
    • Created completely new from the console or command-line tools, and then attached to an EC2 instance in the same Availability Zone.
  • Restore volume from Snapshots
    • Volumes can also be restored from previously created snapshots
    • New volumes created from existing snapshots are loaded lazily in the background.
    • There is no need to wait for all of the data to transfer from S3 to the volume before the attached instance can start accessing the volume and all its data.
    • If the instance accesses the data that hasn’t yet been loaded, the volume immediately downloads the requested data from S3, and continues loading the rest of the data in the background.
    • Volumes restored from encrypted snapshots are always encrypted, by default.
  • Volumes can be created and attached to a running EC2 instance by specifying a block device mapping
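A minimal boto3 sketch of both creation paths (IDs, sizes, and the AZ are hypothetical), followed by attachment to an instance in the same Availability Zone:

    import boto3

    ec2 = boto3.client("ec2")

    # brand-new, empty volume
    vol = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100,
                            VolumeType="gp2")

    # volume restored (lazily loaded) from an existing snapshot
    restored = ec2.create_volume(AvailabilityZone="us-east-1a",
                                 SnapshotId="snap-0123456789abcdef0")

    ec2.attach_volume(VolumeId=vol["VolumeId"],
                      InstanceId="i-0123456789abcdef0", Device="/dev/sdf")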

EBS Volume Detachment

  • EBS volumes can be detached from an instance explicitly or by terminating the instance.
  • EBS root volumes can be detached only by first stopping the instance.
  • EBS data volumes attached to a running instance can be detached by first unmounting the volume from the instance.
  • If the volume is detached without being unmounted, it might result in the volume being stuck in a busy state and could possibly damage the file system or the data it contains.
  • EBS volume can be force detached from an instance, using the Force Detach option, but it might lead to data loss or a corrupted file system as the instance does not get an opportunity to flush file system caches or file system metadata.
  • Charges are still incurred for the volume after its detachment
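A minimal sketch (the volume ID is hypothetical); unmount inside the OS first, and treat Force as a last resort:

    import boto3

    ec2 = boto3.client("ec2")
    ec2.detach_volume(VolumeId="vol-0123456789abcdef0")
    # last resort only; risks data loss or a corrupted file system:
    # ec2.detach_volume(VolumeId="vol-0123456789abcdef0", Force=True)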

EBS Volume Deletion

  • EBS volume deletion would wipe out its data and the volume can’t be attached to any instance. However, it can be backed up before deletion using EBS snapshots

EBS Volume Resize

  • EBS Elastic Volumes can be modified to increase the volume size, change the volume type, or adjust the performance of your EBS volumes.
  • If the instance supports Elastic Volumes, changes can be performed without detaching the volume or restarting the instance.
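A minimal Elastic Volumes sketch (the volume ID and target values are hypothetical): grow a live volume and raise its provisioned performance without detaching it.

    import boto3

    ec2 = boto3.client("ec2")
    ec2.modify_volume(VolumeId="vol-0123456789abcdef0",
                      Size=200, VolumeType="io1", Iops=10000)
    # progress can be tracked with ec2.describe_volumes_modifications(...)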

EBS Volume Snapshots

Refer blog post @ EBS Snapshot

EBS Encryption

  • Encrypted EBS volumes can be created and attached to a supported instance type, and the following types of data are encrypted
    • Data at rest
    • All disk I/O, i.e. all data moving between the volume and the instance
    • All snapshots created from the volume
    • All volumes created from those snapshots
  • Encryption occurs on the servers that host EC2 instances, providing encryption of data-in-transit from EC2 instances to EBS storage.
  • EBS encryption is supported with all EBS volume types (gp2, io1, st1, and sc1), and has the same IOPS performance on encrypted volumes as with unencrypted volumes, with a minimal effect on latency
  • EBS encryption is only available on select instance types.
  • Volumes created from encrypted snapshots and snapshots of encrypted volumes are automatically encrypted using the same encryption key.
  • EBS encryption uses AWS KMS customer master keys (CMK) when creating encrypted volumes and any snapshots created from the encrypted volumes.
  • EBS volumes can be encrypted using either
    • a default CMK created for you automatically.
    • a CMK that you created separately using AWS KMS, giving you more flexibility, including the ability to create, rotate, disable, define access controls, and audit the encryption keys used to protect your data.
  • Public or shared snapshots of encrypted volumes are not supported, because other accounts would be able to decrypt your data; the data needs to be migrated to an unencrypted state before sharing.
  • Existing unencrypted volumes cannot be encrypted directly, but can be migrated by
    • Option 1
      • create an unencrypted snapshot from the volume
      • create an encrypted copy of an unencrypted snapshot
      • create an encrypted volume from the encrypted snapshot
    • Option 2
      • create an unencrypted snapshot from the volume
      • create an encrypted volume from an unencrypted snapshot
  • An encrypted snapshot can be created from an unencrypted snapshot by creating an encrypted copy of the unencrypted snapshot.
  • Unencrypted volume cannot be created from an encrypted volume directly but needs to be migrated
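A minimal boto3 sketch of Option 1 above (IDs and the region are hypothetical): snapshot the unencrypted volume, copy the snapshot with encryption enabled, then restore an encrypted volume from the copy.

    import boto3

    ec2 = boto3.client("ec2")

    snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    copy = ec2.copy_snapshot(SourceSnapshotId=snap["SnapshotId"],
                             SourceRegion="us-east-1",
                             Encrypted=True)  # default CMK unless KmsKeyId is given
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

    ec2.create_volume(AvailabilityZone="us-east-1a",
                      SnapshotId=copy["SnapshotId"])  # resulting volume is encrypted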

EBS Multi-Attach

Refer blog Post @ EBS Multi-Attach

EBS Performance

Refer blog Post @ EBS Performance

EBS vs Instance Store

Refer blog post @ EBS vs Instance Store

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. _____ is a durable, block-level storage volume that you can attach to a single, running Amazon EC2 instance.
    1. Amazon S3
    2. Amazon EBS
    3. None of these
    4. All of these
  2. Which Amazon storage do you think is the best for my database-style applications that frequently encounter many random reads and writes across the dataset?
    1. None of these.
    2. Amazon Instance Storage
    3. Any of these
    4. Amazon EBS
  3. What does Amazon EBS stand for?
    1. Elastic Block Storage
    2. Elastic Business Server
    3. Elastic Blade Server
    4. Elastic Block Store
  4. Which Amazon Storage behaves like raw, unformatted, external block devices that you can attach to your instances?
    1. None of these.
    2. Amazon Instance Storage
    3. Amazon EBS
    4. All of these
  5. A user has created numerous EBS volumes. What is the general limit for each AWS account for the maximum number of EBS volumes that can be created?
    1. 10000
    2. 5000
    3. 100
    4. 1000
  6. Select the correct set of steps for exposing the snapshot only to specific AWS accounts
    1. Select Public for all the accounts and check mark those accounts with whom you want to expose the snapshots and click save.
    2. Select Private and enter the IDs of those AWS accounts, and click Save.
    3. Select Public, enter the IDs of those AWS accounts, and click Save.
    4. Select Public, mark the IDs of those AWS accounts as private, and click Save.
  7. If an Amazon EBS volume is the root device of an instance, can I detach it without stopping the instance?
    1. Yes but only if Windows instance
    2. No
    3. Yes
    4. Yes but only if a Linux instance
  8. Can we attach an EBS volume to more than one EC2 instance at the same time?
    1. Yes
    2. No
    3. Only EC2-optimized EBS volumes.
    4. Only in read mode.
  9. Do the Amazon EBS volumes persist independently from the running life of an Amazon EC2 instance?
    1. Only if instructed to when created
    2. Yes
    3. No
  10. Can I delete a snapshot of the root device of an EBS volume used by a registered AMI?
    1. Only via API
    2. Only via Console
    3. Yes
    4. No
  11. By default, EBS volumes that are created and attached to an instance at launch are deleted when that instance is terminated. You can modify this behavior by changing the value of the flag _____ to false when you launch the instance
    1. DeleteOnTermination
    2. RemoveOnDeletion
    3. RemoveOnTermination
    4. TerminateOnDeletion
  12. Your company policies require encryption of sensitive data at rest. You are considering the possible options for protecting data while storing it at rest on an EBS data volume, attached to an EC2 instance. Which of these options would allow you to encrypt your data at rest? (Choose 3 answers)
    1. Implement third party volume encryption tools
    2. Do nothing as EBS volumes are encrypted by default
    3. Encrypt data inside your applications before storing it on EBS
    4. Encrypt data using native data encryption drivers at the file system level
    5. Implement SSL/TLS for all services running on the server
  13. Which of the following are true regarding encrypted Amazon Elastic Block Store (EBS) volumes? Choose 2 answers
    1. Supported on all Amazon EBS volume types
    2. Snapshots are automatically encrypted
    3. Available to all instance types
    4. Existing volumes can be encrypted
    5. Shared volumes can be encrypted
  14. How can you secure data at rest on an EBS volume?
    1. Encrypt the volume using the S3 server-side encryption service
    2. Attach the volume to an instance using EC2’s SSL interface.
    3. Create an IAM policy that restricts read and write access to the volume.
    4. Write the data randomly instead of sequentially.
    5. Use an encrypted file system on top of the EBS volume
  15. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS
  16. A user is trying to launch an EBS backed EC2 instance under free usage. The user wants to achieve encryption of the EBS volume. How can the user encrypt the data at rest?
    1. Use AWS EBS encryption to encrypt the data at rest (Encryption is allowed on micro instances)
    2. User cannot use EBS encryption and has to encrypt the data manually or using a third party tool (Encryption was not allowed on micro instances before)
    3. The user has to select the encryption enabled flag while launching the EC2 instance
    4. Encryption of volume is not available as a part of the free usage tier
  17. A user is planning to schedule a backup for an EBS volume. The user wants security of the snapshot data. How can the user achieve data encryption with a snapshot?
    1. Use encrypted EBS volumes so that the snapshot will be encrypted by AWS
    2. While creating a snapshot select the snapshot with encryption
    3. By default the snapshot is encrypted by AWS
    4. Enable server side encryption for the snapshot using S3
  18. A user has launched an EBS backed EC2 instance. The user has rebooted the instance. Which of the below mentioned statements is not true with respect to the reboot action?
    1. The private and public address remains the same
    2. The Elastic IP remains associated with the instance
    3. The volume is preserved
    4. The instance runs on a new host computer
  19. A user has launched an EBS backed EC2 instance. What will be the difference while performing the restart or stop/start options on that instance?
    1. For restart it does not charge for an extra hour, while every stop/start it will be charged as a separate hour
    2. Every restart is charged by AWS as a separate hour, while multiple start/stop actions during a single hour will be counted as a single hour
    3. For every restart or start/stop it will be charged as a separate hour
    4. For restart it charges extra only once, while for every stop/start it will be charged as a separate hour
  20. A user has launched an EBS backed instance. The user started the instance at 9 AM in the morning. Between 9 AM to 10 AM, the user is testing some script. Thus, he stopped the instance twice and restarted it. In the same hour the user rebooted the instance once. For how many instance hours will AWS charge the user?
    1. 3 hours
    2. 4 hours
    3. 2 hours
    4. 1 hour
  21. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  22. An organization wants to move to Cloud. They are looking for a secure encrypted database storage option. Which of the below mentioned AWS functionalities helps them to achieve this?
    1. AWS MFA with EBS
    2. AWS EBS encryption
    3. Multi-tier encryption with Redshift
    4. AWS S3 server-side storage
  23. A user has stored data on an encrypted EBS volume. The user wants to share the data with his friend’s AWS account. How can user achieve this?
    1. Create an AMI from the volume and share the AMI
    2. Copy the data to an unencrypted volume and then share
    3. Take a snapshot and share the snapshot with a friend
    4. If both the accounts are using the same encryption key then the user can share the volume directly
  24. A user is using an EBS backed instance. Which of the below mentioned statements is true?
    1. The user will be charged for volume and instance only when the instance is running
    2. The user will be charged for the volume even if the instance is stopped
    3. The user will be charged only for the instance running cost
    4. The user will not be charged for the volume if the instance is stopped
  25. A user is planning to use EBS for his DB requirement. The user already has an EC2 instance running in the VPC private subnet. How can the user attach the EBS volume to a running instance?
    1. The user must create EBS within the same VPC and then attach it to a running instance.
    2. The user can create EBS in the same zone as the subnet of instance and attach that EBS to instance. (Should be in the same AZ)
    3. It is not possible to attach an EBS to an instance running in VPC until the instance is stopped.
    4. The user can specify the same subnet while creating EBS and then attach it to a running instance.
  26. A user is creating an EBS volume. He asks for your advice. Which advice mentioned below should you not give to the user for creating an EBS volume?
    1. Take the snapshot of the volume when the instance is stopped
    2. Stripe multiple volumes attached to the same instance
    3. Create an AMI from the attached volume (AMI is created from the snapshot)
    4. Attach multiple volumes to the same instance
  27. An EC2 instance has one additional EBS volume attached to it. How can a user attach the same volume to another running instance in the same AZ?
    1. Terminate the first instance and only then attach to the new instance
    2. Attach the volume as read only to the second instance
    3. Detach the volume first and attach to new instance
    4. No need to detach. Just select the volume and attach it to the new instance, it will take care of mapping internally
  28. What is the scope of an EBS volume?
    1. VPC
    2. Region
    3. Placement Group
    4. Availability Zone

Reference

Amazon_EBS

AWS EC2 Instance Types

EC2 Instance Types

  • EC2 Instance types determine the hardware of the host computer used for the instance.
  • EC2 Instance types offer different compute, memory & storage capabilities and are grouped in instance families based on these capabilities.
  • EC2 provides each instance with a consistent and predictable amount of CPU capacity, regardless of its underlying hardware.
  • EC2 dedicates some resources of the host computer, such as CPU, memory, and instance storage, to a particular instance.
  • EC2 shares other resources of the host computer, such as the network and the disk subsystem, among instances. If each instance on a host computer tries to use as much of one of these shared resources as possible, each receives an equal share of that resource. However, when a resource is under-utilized, an instance can consume a higher share of that resource while it’s available.

EC2 Instance Types Selection criteria

  • Some Instance types support only the HVM virtualization type while others support both the PV and HVM virtualization types. AWS, however, recommends using HVM to take advantage of the underlying hardware
  • All EC2 instance types are available in a VPC, however, a few are not available in EC2-Classic. AWS recommends using VPC to take advantage of enhanced networking, multiple IP addresses, finer security control, etc.
  • Some instances support only EBS volumes, while others support both EBS and Instance store volumes. Some instances that support instance store volumes use solid-state drives (SSD) to deliver very high random I/O performance.
  • Some EC2 instance types can be launched as EBS optimized instances with a dedicated capacity for EBS I/O.
  • Some EC2 Instance types can be launched in placement group to optimize instances for High-Performance Computing (HPC)
  • Some instances support Enhanced Networking to get significantly higher packet per second (PPS) performance, lower network jitter, and lower latencies
  • Some Instances allow EBS volumes to be encrypted

EBS-Optimized

  • EBS-optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for EBS I/O.
  • EBS-optimized instances enable you to get consistently high performance for the EBS volumes by eliminating contention between EBS I/O and other network traffic from the instance.
  • EBS-optimized instances deliver dedicated throughput between Amazon EC2 and EBS, with options between 500 and 60,000 Megabits per second (Mbps) depending on the instance type used.
  • When attached to an EBS–optimized instance, General Purpose (SSD) volumes are designed to deliver within 10 percent of their baseline and burst performance 99.9 percent of the time in a given year, and Provisioned IOPS (SSD) volumes are designed to deliver within 10 percent of their provisioned performance 99.9 percent of the time in a given year.
  • EBS optimization can be enabled for an instance that is not EBS-optimized by default
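A minimal sketch (the instance ID is hypothetical) of enabling EBS optimization on a supported instance; the instance must be stopped before the attribute can be modified:

    import boto3

    ec2 = boto3.client("ec2")
    ec2.modify_instance_attribute(InstanceId="i-0123456789abcdef0",
                                  EbsOptimized={"Value": True})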

Placement Groups

  • EC2 Placement groups determine how the instances are placed on the underlying hardware.
  • AWS now provides three types of placement groups
    • Cluster – clusters instances into a low-latency group in a single AZ
    • Partition – spreads instances across logical partitions, ensuring that instances in one partition do not share underlying hardware with instances in other partitions
    • Spread – strictly places a small group of instances across distinct underlying hardware to reduce correlated failures
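A minimal boto3 sketch (the group name, AMI ID, and instance type are hypothetical) of creating a cluster placement group and launching instances into it:

    import boto3

    ec2 = boto3.client("ec2")

    ec2.create_placement_group(GroupName="hpc-group", Strategy="cluster")
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="c5.large",
        MinCount=2,
        MaxCount=2,
        Placement={"GroupName": "hpc-group"},
    )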

NOTE – AWS keeps on releasing new instance types, please refer AWS documentation for the same.

EC2 Instance Types – Current Generation

EC2 Instance Types Comparison

T2 Instances (General Purpose)

  • T2 instances are designed to provide moderate baseline performance and the capability to burst to significantly higher performance as required
  • Mainly intended for workloads that don’t use the full CPU often or consistently, but occasionally need to burst.
  • T2 instances are well suited for
    • general-purpose workloads, such as web servers, developer environments, remote desktops, and small databases
  • Requirements
    • can be launched only with an HVM AMI
    • can be launched into a VPC only, and not supported on the EC2-Classic platform
    • are available as EBS-backed instances only
    • are available as On-Demand Instances, Reserved Instances, Dedicated Instances (T3 only), and Spot Instances
    • By default, 20 (soft limit) T2 instances can run simultaneously
    • cannot be launched as a Dedicated host
  • T2 Unlimited Instances
    • can sustain high CPU performance for as long as a workload needs it.
    • for most general-purpose workloads, it provides ample performance without any additional charges.
    • If the instance needs to run at higher CPU utilization for a prolonged period, it can also do so at a flat additional rate

CPU Credits

  • CPU Credits (similar to I/O Credits in the case of EBS general-purpose storage) provide the performance of a full CPU core for one minute
  • T2 instances provide a baseline level of CPU performance, while CPU credits govern the ability to burst above the baseline level
  • One CPU credit is equal to one vCPU running at 100% utilization for one minute, e.g. one vCPU running at 100% for one minute, OR one vCPU running at 50% for two minutes, OR two vCPUs running at 25% for two minutes
  • Each T2 instance receives a healthy initial credit balance for startup performance
  • Initial CPU credits do not expire, but they are used first when an instance uses CPU credits.
  • Each T2 instance then continuously (at a millisecond-level resolution) receives a set rate of CPU credits per hour, depending on instance size for e.g. t2.nano earns 3/hour while a t2.large earns 36/hour
  • Each T2 instance accumulates the CPU credit when it uses fewer CPU resources than its allowed baseline performance levels
  • Maximum earned credit balance for an instance is equal to the number of CPU credits received per hour times 24 hours for e.g. t2.nano can earn max 72 (24 * 3) credits
  • CPU credits accrue in the credit balance for a period of 24 hours; they expire 24 hours after they were earned, and expired credits are removed from the balance before new ones are added
  • CPU credits do not persist between an instance stop-start; however, after the start, the instance receives its initial CPU credits again
  • When the credit balance is completely exhausted, the instance will perform at its baseline performance

C4 Instances (Compute Intensive)

  • C4 instances are ideal for compute-bound applications that benefit from high-performance processors
  • Well suited for
    • Batch processing workloads,
    • Media transcoding,
    • High-traffic web servers, massively multiplayer online (MMO) gaming servers, and ad serving engines
  • Features
    • are EBS-optimized, by default
    • can be enabled for Enhanced Networking capabilities
    • can be clustered in a placement group
  • Requirements
    • requires 64-bit HVM AMI
    • can be launched into a VPC only, and not supported on the EC2-Classic platform

G2 Instances (Graphic Intensive)

  • GPU instances provide high parallel processing capability
  • Well suited for
    • to accelerate many scientific, engineering, and rendering applications by leveraging the Compute Unified Device Architecture (CUDA) or OpenCL parallel computing frameworks
    • graphics applications, including game streaming, 3-D application streaming, and other graphics workloads
  • Requirements
    • requires HVM AMI
    • can’t access GPU unless NVIDIA drivers installed
  • Features
    • can be clustered in a placement group

I2 Instances (I/O Intensive)

  • I2 instances are optimized to deliver tens of thousands of low-latency, random I/O operations per second (IOPS) to applications.
  • Well suited for applications
    • NoSQL databases (for example, Cassandra and MongoDB)
    • Clustered databases
    • Online transaction processing (OLTP) systems
  • Features
    • Primary data storage is SSD-based instance storage.
    • can be enabled for Enhanced Networking capabilities
    • can be clustered in a placement group
    • can enable EBS-optimization to obtain additional, dedicated capacity for Amazon EBS I/O (a minimal sketch follows this section)
  • Requirements
    • requires HVM AMI
  • HI1 is the equivalent previous generation instance
    • supports both PV and HVM AMIs
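
A rough boto3 sketch of toggling EBS-optimization on an existing instance; the instance ID is a placeholder, and the attribute can generally only be changed while the instance is stopped.

```python
# Hypothetical sketch: enabling EBS optimization on an existing instance.
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

# Dedicated throughput between EC2 and EBS, separate from network traffic.
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID,
    EbsOptimized={"Value": True},
)
ec2.start_instances(InstanceIds=[INSTANCE_ID])
```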

D2 Instances (Density Intensive)

  • D2 instances are designed for workloads with very high storage density and that require high sequential read and write access to very large data sets on local storage.
  • Well suited for applications
    • Massive parallel processing (MPP) data warehouse
    • MapReduce and Hadoop distributed computing
    • Log or data processing applications
  • Features
    • Primary data storage for D2 instances is HDD-based instance storage
    • are EBS-optimized, by default
    • can be enabled for Enhanced Networking capabilities
    • can be clustered in a placement group
  • Requirements
    • requires 64-bit HVM AMI
  • HS1 is the equivalent previous generation instance
    • supports both EBS and Instance store backed AMIs
    • supports both PV and HVM AMIs

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which of the following instance types are available as Amazon EBS-backed only? Choose 2 answers
    1. General purpose T2
    2. General purpose M3
    3. Compute-optimized C4
    4. Compute-optimized C3
    5. Storage-optimized I2
  2. A t2.medium EC2 instance type must be launched with what type of Amazon Machine Image (AMI)?
    1. An Instance store Hardware Virtual Machine AMI
    2. An Instance store Paravirtual AMI
    3. An Amazon EBS-backed Hardware Virtual Machine AMI
    4. An Amazon EBS-backed Paravirtual AMI
  3. You have identified network throughput as a bottleneck on your m1.small EC2 instance when uploading data into Amazon S3 in the same region. How do you remedy this situation?
    1. Add an additional ENI
    2. Change to a larger Instance
    3. Use DirectConnect between EC2 and S3
    4. Use EBS PIOPS on the local volume
  4. You are using an m1.small EC2 Instance with one 300 GB EBS volume to host a relational database. You determined that write throughput to the database needs to be increased. Which of the following approaches can help achieve this? Choose 2 answers
    1. Use an array of EBS volumes (Striping to increase throughput)
    2. Enable Multi-AZ mode.
    3. Place the instance in an Auto Scaling Group
    4. Add an EBS volume and place into RAID 5 (RAID 5 is not recommended: parity writes consume volume IOPS, and EBS volumes are already replicated across multiple servers in an Availability Zone for availability and durability, so AWS recommends striping for performance rather than RAID for durability)
    5. Increase the size of the EC2 Instance.
    6. Put the database behind an Elastic Load Balancer.
  5. You are tasked with setting up a cluster of EC2 instances for a NoSQL database. The database requires random read IO disk performance of up to 100,000 IOPS at 4 KB block size per node. Which of the following EC2 instances will perform the best for this workload?
    1. A High-Memory Quadruple Extra Large (m2.4xlarge) with EBS-Optimized set to true and a PIOPs EBS volume
    2. A Cluster Compute Eight Extra Large (cc2.8xlarge) using instance storage
    3. High I/O Quadruple Extra Large (hi1.4xlarge) using instance storage
    4. A Cluster GPU Quadruple Extra Large (cg1.4xlarge) using four separate 4000 PIOPS EBS volumes in a RAID 0 configuration
  6. You are implementing a URL whitelisting system for a company that wants to restrict outbound HTTPS connections to specific domains from their EC2-hosted applications. You deploy a single EC2 instance running proxy software and configure it to accept traffic from all subnets and EC2 instances in the VPC. You configure the proxy to only pass through traffic to domains that you define in its whitelist configuration. You have a nightly maintenance window of 10 minutes where all instances fetch new software updates. Each update is about 200 MB in size and there are 500 instances in the VPC that routinely fetch updates. After a few days you notice that some machines are failing to successfully download some, but not all, of their updates within the maintenance window. The download URLs used for these updates are correctly listed in the proxy’s whitelist configuration and you are able to access them manually using a web browser on the instances. What might be happening? (Choose 2 answers) [PROFESSIONAL]
    1. You are running the proxy on an undersized EC2 instance type so network throughput is not sufficient for all instances to download their updates in time.
    2. You have not allocated enough storage to the EC2 instance running the proxy, so the network buffer is filling up, causing some requests to fail
    3. You are running the proxy in a public subnet but have not allocated enough EIPs to support the needed network throughput through the Internet Gateway (IGW)
    4. You are running the proxy on a sufficiently-sized EC2 instance in a private subnet and its network throughput is being throttled by a NAT running on an undersized EC2 instance
    5. The route table for the subnets containing the affected EC2 instances is not configured to direct network traffic for the software update locations to the proxy.
  7. You have been asked to design the storage layer for an application. The application requires disk performance of at least 100,000 IOPS in addition; the storage layer must be able to survive the loss of an individual disk, EC2 instance, or Availability Zone without any data loss. The volume you provide must have a capacity of at least 3TB. Which of the following designs will meet these objectives? [PROFESSIONAL]
    1. Instantiate an i2.8xlarge instance in us-east-1a. Create a RAID 0 volume using the four 800GB SSD ephemeral disks provided with the instance. Provision 3x1TB EBS volumes, attach them to the instance, and configure them as a second RAID 0 volume. Configure synchronous, block-level replication from the ephemeral-backed volume to the EBS-backed volume. (Same AZ will not survive the AZ loss)
    2. Instantiate an i2.8xlarge instance in us-east-1a. Create a RAID 0 volume using the four 800GB SSD ephemeral disks provided with the instance. Configure synchronous block-level replication to an identically configured instance in us-east-1b.
    3. Instantiate a c3.8xlarge instance in us-east-1. Provision an AWS Storage Gateway and configure it for 3 TB of storage and 100,000 IOPS. Attach the volume to the instance. (Need synchronous replication to prevent any data loss)
    4. Instantiate a c3.8xlarge instance in us-east-1, provision 4x1TB EBS volumes, attach them to the instance, and configure them as a single RAID 5 volume. Ensure that EBS snapshots are performed every 15 minutes. (RAID 5 is not recommended by AWS, and synchronous replication is needed to prevent any data loss)
    5. Instantiate a c3.8xlarge instance in us-east-1, provision 3x1TB EBS volumes, attach them to the instance, and configure them as a single RAID 0 volume. Ensure that EBS snapshots are performed every 15 minutes. (Need synchronous replication to prevent any data loss)

AWS EC2 Best Practices

AWS recommends the following best practices to get maximum benefit and satisfaction from EC2

Security & Network

  • Implement the least-permissive rules for security groups (a minimal sketch follows this list)
  • Regularly patch, update, and secure the operating system and applications on the instance
  • Manage access to AWS resources and APIs using identity federation, IAM users, and IAM roles
  • Establish credential management policies and procedures for creating, distributing, rotating, and revoking AWS access credentials
  • Launch the instances into a VPC instead of EC2-Classic (newly created AWS accounts use a VPC by default)
  • Encrypt EBS volumes and snapshots.
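
As a sketch of the least-permissive rule, the following hypothetical boto3 snippet creates a security group that admits only HTTPS from a single known CIDR; the VPC ID and CIDR range are placeholders.

```python
# Hypothetical least-permissive security group: allow only HTTPS from a
# known CIDR, nothing else inbound. IDs and CIDRs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="web-tier-minimal",
    Description="Allow HTTPS from the corporate network only",
    VpcId="vpc-0123456789abcdef0",
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24",
                      "Description": "corporate egress range"}],
    }],
)
# No other ingress rules are added; security groups deny by default.
```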

Storage

  • EC2 supports Instance store and EBS volumes, so it’s best to understand the implications of the root device type for data persistence, backup, and recovery
  • Use separate Amazon EBS volumes for the operating system (root device) versus your data.
  • Ensure that the data volume (with the data) persists after instance termination (see the sketch after this list)
  • Use the instance store available to the instance only for temporary data. Remember that data stored in the instance store is deleted when the instance is stopped or terminated.
  • If you use instance store for database storage, ensure that you have a cluster with a replication factor that ensures fault tolerance.
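
One way to make a data volume survive termination is to clear the DeleteOnTermination flag on its block device mapping. A minimal boto3 sketch, with a placeholder instance ID and device name:

```python
# Hypothetical sketch: keep a data volume after instance termination by
# clearing DeleteOnTermination on its block device mapping.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",    # placeholder
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sdf",        # the data volume, not the root device
        "Ebs": {"DeleteOnTermination": False},
    }],
)
```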

Resource Management

  • Use instance metadata and custom resource tags to track and identify your AWS resources (a tagging sketch follows this list)
  • View your current limits for Amazon EC2. Plan to request any limit increases in advance of the time that you’ll need them.
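
A minimal boto3 sketch of both practices: tagging resources for tracking, and checking the account's current On-Demand instance limit. The resource IDs and tag values are placeholders.

```python
# Hypothetical sketch: tag resources and check the EC2 instance limit.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Tags make it easy to find, report on, and allocate costs for resources.
ec2.create_tags(
    Resources=["i-0123456789abcdef0", "vol-0123456789abcdef0"],
    Tags=[{"Key": "Project", "Value": "payments"},
          {"Key": "Environment", "Value": "production"}],
)

# "max-instances" reports the account's On-Demand instance limit.
attrs = ec2.describe_account_attributes(AttributeNames=["max-instances"])
print(attrs["AccountAttributes"][0]["AttributeValues"][0]["AttributeValue"])
```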

Backup & Recovery

  • Regularly back up the instance using Amazon EBS snapshots (not done automatically) or a backup tool.
  • Use Data Lifecycle Manager (DLM) to automate the creation, retention, and deletion of the snapshots taken to back up the EBS volumes (a sample policy sketch follows this list)
  • Create an Amazon Machine Image (AMI) from the instance to save the configuration as a template for launching future instances.
  • Implement High Availability by deploying critical components of the application across multiple Availability Zones, and replicate the data appropriately
  • Monitor and respond to events.
  • Design the applications to handle dynamic IP addressing when the instance restarts.
  • Implement failover. For a basic solution, you can manually attach a network interface or Elastic IP address to a replacement instance
  • Regularly test the process of recovering your instances and Amazon EBS volumes if they fail.
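
As a sketch of the DLM practice above, the following hypothetical boto3 snippet creates a policy that snapshots all volumes tagged Backup=true every 24 hours and retains the last seven snapshots; the role ARN and tag values are placeholder assumptions.

```python
# Hypothetical DLM policy: daily snapshots of tagged volumes, 7 retained.
import boto3

dlm = boto3.client("dlm", region_name="us-east-1")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS snapshots, 7-day retention",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],
        "Schedules": [{
            "Name": "DailySnapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS",
                           "Times": ["03:00"]},
            "RetainRule": {"Count": 7},
            "CopyTags": True,
        }],
    },
)
```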

AWS Encrypting Data at Rest – Whitepaper – Certification

Encrypting Data at Rest

  • AWS delivers a secure, scalable cloud computing platform with high availability, offering the flexibility for you to build a wide range of applications
  • AWS offers several options for encrypting data at rest, for an additional layer of security, ranging from completely automated AWS encryption solutions to manual client-side options
  • Encryption requires 3 things
    • Data to encrypt
    • Encryption keys
    • Cryptographic algorithm method to encrypt the data
  • AWS provides different models for securing data at rest, based on the following parameters
    • Encryption method
      • Encryption algorithm selection involves evaluating security, performance, and compliance requirements specific to your application
    • Key Management Infrastructure (KMI)
      • KMI enables managing & protecting the encryption keys from unauthorized access
      • KMI provides
        • Storage layer that protects plain text keys
        • Management layer that authorizes key usage
  • Hardware Security Module (HSM)
    • Common way to protect keys in a KMI is using HSM
    • An HSM is a dedicated storage and data processing device that performs cryptographic operations using keys on the device.
    • An HSM typically provides tamper evidence, or resistance, to protect keys from unauthorized use.
    • A software-based authorization layer controls who can administer the HSM and which users or applications can use which keys in the HSM
  • AWS CloudHSM
    • AWS CloudHSM appliance has both physical and logical tamper detection and response mechanisms that trigger zeroization of the appliance.
    • Zeroization erases the HSM’s volatile memory where any keys in the process of being decrypted were stored and destroys the key that encrypts stored objects, effectively causing all keys on the HSM to be inaccessible and unrecoverable.
    • AWS CloudHSM can be used to generate and store key material and can perform encryption and decryption operations
    • AWS CloudHSM, however, does not perform any key lifecycle management functions (e.g., access control policy, key rotation) and needs a compatible KMI.
    • KMI can be deployed either on-premises or within Amazon EC2 and can communicate to the AWS CloudHSM instance securely over SSL to help protect data and encryption keys.
    • Since the AWS CloudHSM service uses SafeNet Luna appliances, any key management server that supports the SafeNet Luna platform can also be used with AWS CloudHSM
  • AWS Key Management Service (KMS)
    • AWS KMS is a managed encryption service that allows you to provision and use keys to encrypt data in AWS services and your applications.
    • Master keys, after creation, are designed never to be exported from the service
    • AWS KMS gives you centralized control over who can access your master keys to encrypt and decrypt data, and it gives you the ability to audit this access.
    • Data can be sent to KMS to be encrypted or decrypted under a specific master key in your account (a minimal sketch follows this list)
    • AWS KMS is natively integrated with other AWS services (for e.g. Amazon EBS, Amazon S3, and Amazon Redshift) and AWS SDKs to simplify encryption of your data within those services or custom applications
    • AWS KMS provides global availability, low latency, and a high level of durability for your keys.
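
A minimal boto3 sketch of sending data to KMS for encryption and decryption under a master key. The key alias is a placeholder; note that direct Encrypt calls accept only small payloads (up to 4 KB), so larger data uses envelope encryption, described under Model C below.

```python
# Hypothetical sketch: encrypt/decrypt a small payload directly under a
# KMS master key. The key alias is a placeholder.
import boto3

kms = boto3.client("kms", region_name="us-east-1")

ciphertext = kms.encrypt(
    KeyId="alias/my-app-key",        # placeholder alias for a master key
    Plaintext=b"database password",  # direct Encrypt is limited to 4 KB
)["CiphertextBlob"]

# The ciphertext embeds a reference to the key, so KeyId is optional here.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
assert plaintext == b"database password"
```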

Encryption Models in AWS

The encryption model to use in AWS depends on how you and/or AWS provide the encryption method and the KMI

  • You control the encryption method and the entire KMI
  • You control the encryption method, AWS provides the storage component of the KMI, and you provide the management layer of the KMI.
  • AWS controls the encryption method and the entire KMI.

Model A: You control the encryption method and the entire KMI

  • You use your own KMI to generate, store, and manage access to keys as well as control all encryption methods in your applications
  • Proper storage, management, and use of keys to ensure the confidentiality, integrity, and availability of your data is your responsibility
  • AWS has no access to your keys and cannot perform encryption or decryption on your behalf.
  • Amazon S3
    • Encryption of the data is done before the object is sent to AWS S3
    • Encryption of the data can be done using any encryption method and the encrypted data can be uploaded using the PUT request in the Amazon S3 API
    • Key used to encrypt the data needs to be stored securely in your KMI
    • To decrypt this data, the encrypted object can be downloaded from Amazon S3 using the GET request in the Amazon S3 API and then decrypted using the key in your KMI
    • AWS provides client-side encryption handling, where you can provide your key to the S3 encryption client, which will encrypt and decrypt the data on your behalf; however, AWS never has access to the keys or the unencrypted data (a minimal client-side sketch appears after this list)
  • Amazon EBS
    • Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are network-attached, and persist independently from the life of an instance.
    • Because Amazon EBS volumes are presented to an instance as a block device, you can leverage most standard encryption tools for file system-level or block-level encryption
    • Block level encryption
      • Block level encryption tools usually operate below the file system layer using kernel space device drivers to perform encryption and decryption of data.
      • These tools are useful when you want all data written to a volume to be encrypted regardless of what directory the data is stored in
    • File System level encryption
      • File system level encryption usually works by stacking an encrypted file system on top of an existing file system.
      • This method is typically used to encrypt a specific directory
    • These solutions require you to provide keys, either manually or from your KMI.
    • Both block-level and file system-level encryption tools can only be used to encrypt data volumes that are not Amazon EBS boot volumes, as they don’t allow you to automatically make a trusted key available to the boot volume at startup
    • There are third party solutions available, which can help encrypt both the boot and data volumes as well as supplying and protecting keys
  • AWS Storage Gateway
    • AWS Storage Gateway is a service connecting an on-premises software appliance with Amazon S3. Data on disk volumes attached to the AWS Storage Gateway will be automatically uploaded to Amazon S3 based on policy
    • Encryption of the source data on the disk volumes can be either done before writing to the disk or using block level encryption on the iSCSI endpoint that AWS Storage Gateway exposes to encrypt all data on the disk volume.
  • Amazon RDS
    • Since Amazon RDS doesn’t expose the attached disks it uses for data storage, the transparent disk-encryption techniques described in the EBS section cannot be applied
    • However, individual fields of data can be encrypted before the data is written to RDS and decrypted after reading it
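
A minimal Model A sketch: encrypt locally with a key from your own KMI, upload the ciphertext with a plain PUT, and reverse the process on GET. The bucket, object key, and the use of the third-party cryptography library are illustrative assumptions.

```python
# Hypothetical Model A flow: AWS never sees the key or the plaintext.
# Requires the third-party library: pip install cryptography
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, fetched from your own KMI
f = Fernet(key)

ciphertext = f.encrypt(b"sensitive report contents")

s3 = boto3.client("s3")
s3.put_object(Bucket="my-bucket", Key="reports/q1.bin", Body=ciphertext)

# Decrypt: GET the object, then decrypt with the key from your KMI.
obj = s3.get_object(Bucket="my-bucket", Key="reports/q1.bin")
plaintext = f.decrypt(obj["Body"].read())
```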

Model B: You control the encryption method, AWS provides the KMI storage component, and you provide the KMI management layer

  • Model B is similar to Model A in that the encryption method is managed by you
  • Model B differs from Model A in that the keys are maintained in AWS CloudHSM rather than in an on-premises key storage system
  • Only you have access to the cryptographic partitions within the dedicated HSM to use the keys

Model C: AWS controls the encryption method and the entire KMI

  • AWS provides and manages the server-side encryption of your data, transparently managing the encryption method and the keys.
  • AWS KMS and other services that encrypt your data directly use a method called envelope encryption to provide a balance between performance and security.
  • Envelope Encryption method (sketched in code after this section)
    • A master key is defined either by you or AWS
    • A data key (data encryption key) is generated by the AWS service at the time when data encryption is requested
    • Data key is used to encrypt your data.
    • Data key is then encrypted with a key-encrypting key (master key) unique to the service storing your data.
    • Encrypted data key and the encrypted data are then stored by the AWS storage service on your behalf.
  • Master keys (key-encrypting keys) used to encrypt data keys are stored and managed separately from the data and the data keys
  • For decryption of the data, the process is reversed. Encrypted data key is decrypted using the key-encrypting key; the data key is then used to decrypt your data
  • Authorized use of encryption keys is done automatically and is securely managed by AWS.
  • Because unauthorized access to those keys could lead to the disclosure of your data, AWS has built systems and processes with strong access controls that minimize the chance of unauthorized access, and has had these systems verified by third-party audits to achieve security certifications including SOC 1, 2, and 3, PCI-DSS, and FedRAMP.
  • Amazon S3
    • SSE-S3
      • AWS encrypts each object using a unique data key
      • Data key is encrypted with a periodically rotated master key managed by S3
      • Amazon S3 server-side encryption uses 256-bit Advanced Encryption Standard (AES) keys for both object and master keys
    • SSE-KMS
      • Master keys are defined and managed in KMS for your account
      • Object Encryption
        • When an object is uploaded, a request is sent to KMS to create an object key.
        • KMS generates a unique object key and encrypts it using the master key; KMS then returns this encrypted object key along with the plaintext object key to Amazon S3.
        • Amazon S3 web server encrypts your object using the plaintext object key and stores the now encrypted object (with the encrypted object key) and deletes the plaintext object key from memory.
      • Object Decryption
        • To retrieve the encrypted object, Amazon S3 sends the encrypted object key to AWS KMS.
        • AWS KMS decrypts the object key using the correct master key and returns the decrypted (plaintext) object key to S3.
        • Amazon S3 decrypts the encrypted object, with the plaintext object key, and returns it to you.
    • SSE-C
      • Amazon S3 is provided an encryption key while uploading the object
      • Encryption key is used by Amazon S3 to encrypt your data using AES-256
      • After object encryption, Amazon S3 deletes the encryption key
      • For downloading, you must provide the same encryption key, which Amazon S3 validates before decrypting and returning the object
  • Amazon EBS
    • When Amazon EBS volume is created, you can choose the master key in KMS to be used for encrypting the volume
    • Volume encryption
      • Amazon EC2 server sends an authenticated request to AWS KMS to create a volume key.
      • AWS KMS generates this volume key, encrypts it using the master key, and returns the plaintext volume key and the encrypted volume key to the Amazon EC2 server.
      • Plaintext volume key is stored in memory to encrypt and decrypt all data going to and from your attached EBS volume.
    • Volume decryption
      • When the encrypted volume (or any encrypted snapshot derived from that volume) needs to be re-attached to an instance, a call is made to AWS KMS to decrypt the encrypted volume key
      • AWS KMS decrypts this encrypted volume key with the correct master key and returns the decrypted volume key to Amazon EC2.
  • Amazon Glacier
    • Glacier provides encryption of the data by default
    • Before it’s written to disk, data is always automatically encrypted using 256-bit AES keys unique to the Amazon Glacier service that are stored in separate systems under AWS control
  • AWS Storage Gateway
    • AWS Storage Gateway transfers your data to AWS over SSL
    • AWS Storage Gateway stores data encrypted at rest in Amazon S3 or Amazon Glacier using their respective server side encryption schemes.
  • Amazon RDS – Oracle
    • Oracle Advanced Security option for Oracle on Amazon RDS can be used to leverage the native Transparent Data Encryption (TDE) and Native Network Encryption (NNE) features
    • Oracle encryption module creates data and key-encrypting keys to encrypt the database
    • Key-encrypting keys specific to your Oracle instance on Amazon RDS are themselves encrypted by a periodically rotated 256-bit AES master key.
    • Master key is unique to the Amazon RDS service and is stored in separate systems under AWS control
  • Amazon RDS – SQL Server
    • Transparent Data Encryption (TDE) can be provisioned for Microsoft SQL Server on Amazon RDS.
    • SQL Server encryption module creates data and key-encrypting keys to encrypt the database.
    • Key-encrypting keys specific to your SQL Server instance on Amazon RDS are themselves encrypted by a periodically rotated, regional 256-bit AES master key
    • Master key is unique to the Amazon RDS service and is stored in separate systems under AWS control
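
The envelope-encryption flow described under Model C can be sketched with KMS GenerateDataKey. This is an illustrative client-side rendition, not how S3 or EBS implement it internally; the key alias is a placeholder and the third-party cryptography library stands in for an AES implementation.

```python
# Hypothetical envelope encryption: a data key encrypts the data locally;
# only the *encrypted* copy of the data key is stored with the ciphertext.
# Requires the third-party library: pip install cryptography
import base64
import boto3
from cryptography.fernet import Fernet

kms = boto3.client("kms", region_name="us-east-1")

# KMS returns the data key in both plaintext and encrypted forms.
resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")
plaintext_key, encrypted_key = resp["Plaintext"], resp["CiphertextBlob"]

# Encrypt the data locally with the plaintext data key, then discard it.
f = Fernet(base64.urlsafe_b64encode(plaintext_key))
ciphertext = f.encrypt(b"large payload ...")
del plaintext_key, f

# Store ciphertext + encrypted_key together. To decrypt, ask KMS to unwrap
# the data key under the master key, then decrypt locally.
data_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
plaintext = Fernet(base64.urlsafe_b64encode(data_key)).decrypt(ciphertext)
```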

Sample Exam Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. How can you secure data at rest on an EBS volume?
    1. Encrypt the volume using the S3 server-side encryption service
    2. Attach the volume to an instance using EC2’s SSL interface.
    3. Create an IAM policy that restricts read and write access to the volume.
    4. Write the data randomly instead of sequentially.
    5. Use an encrypted file system on top of the EBS volume
  2. Your company policies require encryption of sensitive data at rest. You are considering the possible options for protecting data while storing it at rest on an EBS data volume, attached to an EC2 instance. Which of these options would allow you to encrypt your data at rest? (Choose 3 answers)
    1. Implement third party volume encryption tools
    2. Do nothing as EBS volumes are encrypted by default
    3. Encrypt data inside your applications before storing it on EBS
    4. Encrypt data using native data encryption drivers at the file system level
    5. Implement SSL/TLS for all services running on the server
  3. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers
    1. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys
    2. Use Amazon S3 server-side encryption with customer-provided keys
    3. Use Amazon S3 server-side encryption with EC2 key pair.
    4. Use Amazon S3 bucket policies to restrict access to the data at rest.
    5. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key
    6. Use SSL to encrypt the data while in transit to Amazon S3.
  4. Which 2 services provide native encryption?
    1. Amazon EBS
    2. Amazon Glacier
    3. Amazon Redshift (is optional)
    4. Amazon RDS (is optional)
    5. Amazon Storage Gateway
  5. With which AWS services can CloudHSM be used? (Select 2)
    1. S3
    2. DynamoDb
    3. RDS
    4. ElastiCache
    5. Amazon Redshift
