AWS Storage Options – CloudFront & ElastiCache

Amazon CloudFront

  • is a webservice for content delivery
  • provides low latency by caching and delivering content from a global network of edge locations located nearest to the user
  • supports both HTTP to allows static, dynamic content and Real Time Messaging Protocol (RTMP) for streaming of videos
  • optimized to work as with Amazon services like S3, ELB etc. as well as works seamlessly with any non-AWS origin server

Ideal Usage Patterns

  • is ideal for distribution of frequently accessed static content, or dynamic content or for streaming audio or video that benefits from edge delivery

Anti-Pattern

  • Infrequently accessed data
    • If the data is infrequently accessed, it would be better to serve the data from the Origin server
  • Programmatic cache invalidation
    • CloudFront supports cache invalidation, however AWS recommends using object versioning rather than programmatic cache invalidation.

Performance

  • is designed for low latency and high bandwidth delivery of content by redirecting the user to the nearest edge location in terms of latency and caching the content preventing the round trip to the origin server

Durability & Availability

  • provides high Availability by delivering content from a distributed global network of edge locations. Amazon also constantly monitors the network paths connecting Origin servers to CloudFront
  • does not provide durable storage, which is more of the responsibility of the underlying Origin server providing the content for e.g. S3

Cost Model

  • has two pricing components:
    • regional data transfer out (per GB) and
    • requests (per 10,000)

Scalability & Elasticity

  • provides seamless scalability & elasticity by automatically responding to the increase or the decrease in the demand

ElastiCache

  • is a webservice that makes it easy to deploy, operate, and scale a distributed, in-memory cache in the cloud
  • helps improves performance of the applications by allowing retrieval of data from fast, managed, in-memory caching system
  • supports Memcached (object caching) & Redis (key value store that supports data structure) open source caching engines

Ideal Usage Patterns

  • improving application performance by storing critical data in-memory for low latency access
  • use cases involve usage as a database front end for read heavy applications, improving performance and reducing load on databases, or managing user session data, cache dynamically generated pages, or compute intensive calculations etc.

Anti-Patterns

  • Persistent Data
    • If the application needs fast access to data coupled with strong data durability, Amazon DynamoDB would be a better option

Performance

  • Although ElastiCache provides low latency access to the data, the performance depends on the caching strategy and the hit ratio at the application level

Durability & Availability

  • stores transient data or transient copies of durable data, so the data durability is managed by the source
  • With the Memcached engine
    • all ElastiCache nodes in a single cache cluster are provisioned in a single Availability Zone.
    • ElastiCache automatically monitors the health of your cache nodes and replaces them in the event of network partitioning, host hardware, or software failure.
    • In the event of cache node failure, the cluster remains available, but performance may be reduced due to time needed to repopulate the cache in the new “cold” cache nodes.
    • To provide enhanced fault-tolerance for Availability Zone failures or cold-cache effects, you can run redundant cache clusters in different Availability Zones.
  • With the Redis engine,
    • ElastiCache supports replication to up to five read replicas for scaling. To improve availability, you can place read replicas in other Availability Zones.
    • ElastiCache monitors the primary node, and if the node becomes unavailable, ElastiCache will repair or replace the primary node if possible, using the same DNS name.
    • If the primary cache node recovery fails or its Availability Zone is unavailable, primary node can be failed over to one of the read replicas with an API call.

Cost Model

  • has a single pricing component:
    • pricing is per cache node-hour consumed

Scalability & Elasticity

  • ElastiCache is highly scalable and elastic.
  • Cache node can be added or deleted to the cache cluster
  • Auto Discovery enables automatic discovery of Memcached cache nodes by ElastiCache Clients when the nodes are added to or removed from an ElastiCache cluster.

 

Storage Options Whitepaper – Storage Gateway – Import/Export – AWS Certification

AWS Storage Options Whitepaper cont.

Provides a brief summary for the Ideal Use cases and Anti-Patterns for Storage Gateway and Import/Export AWS storage options

AWS Storage Gateway

  • Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between the organization’s on-premises IT environment and AWS’s storage infrastructure.
  • Storage Gateway enables store data securely to the AWS cloud for scalable and cost-effective storage.
  • It provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in S3.
  • For disaster recovery scenarios, it can serve as a cloud-hosted solution, together with EC2, that mirrors your entire production environment.
  • Storage Gateway can be configured as
    • Gateway-cached volumes
      • Gateway-cached volumes utilizes S3 for primary data backup, while retaining frequently accessed data locally in a cache.
      • These volumes minimize the need to scale the on-premises storage infrastructure, while still providing applications with low-latency access to their frequently accessed data.
      • Data written to the volumes is stored in S3, with only a cache of recently written and recently read data is stored locally on the on-premises storage hardware.
    • Gateway-stored volumes
      • Gateway-stored volumes stores the complete primary data locally, while asynchronously backing up that data to AWS.
      • These volumes provide the on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups.
      • Data written to the gateway-stored volumes is stored on the on-premises storage hardware, and asynchronously backed up to S3 in the form of EBS snapshots.

Ideal Usage Patterns

  • AWS Storage Gateway use cases include
    • corporate file sharing,
    • enabling existing on-premises backup applications to store primary backups on S3,
    • disaster recovery, and
    • data mirroring to cloud-based compute resources.

Anti-Patterns

  • Database storage
    • For Database backup or storage, EC2 instances using EBS volumes are a natural choice for database storage and workloads.

Performance

  • As the Storage Gateway VM sits between the application, underlying on-premises storage and S3, the performance experienced will be dependent upon a number of factors, including the speed and configuration of the underlying local disks, the network bandwidth between the iSCSI initiator and gateway VM, the amount of local storage allocated to the gateway VM, and the bandwidth between the gateway VM and S3.
  • For gateway-cached volumes, to provide low-latency read access to the on-premises applications, it’s important to provide enough local cache storage to store the recently accessed data.
  • Storage Gateway efficiently uses the Internet bandwidth to speed up the upload of on-premises application data to AWS.
  • Storage Gateway only uploads incremental changes (data that has changed), which minimizes the amount of data sent over the Internet.
  • AWS Direct Connect can be used to further increase throughput and reduce the network costs by establishing a dedicated network connection between the on-premises gateway and AWS.

Durability and Availability

  • AWS Storage Gateway durably stores on-premises application data by uploading it to S3.
  • S3 stores data in multiple facilities and on multiple devices within each facility.
  • S3 also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • AWS Storage Gateway has four pricing components:
    • gateway usage (per gateway per month),
    • snapshot storage usage (per GB per month),
    • volume storage usage (per GB per month), and
    • data transfer out (per GB per month).

Scalability and Elasticity

  • AWS Storage Gateway stores data in Amazon S3, which has been designed to offer a very high level of scalability and elasticity automatically.

Interfaces

  • AWS Management Console can be used to download the AWS Storage Gateway VM image, select between a gateway-cached or gateway-stored configuration, activate the on-premises by associating the gateway’s IP Address with your AWS account, select an AWS region, and create AWS Storage Gateway volumes and attach these volumes as iSCSI devices to your on-premises application servers.

AWS Import/Export (Upgraded to Snowball)

  • AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport.
  • AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network and bypassing the Internet and can be much faster and more cost effective than upgrading connectivity.
  • AWS Import/Export supports importing into several types of AWS storage, including EBS snapshots, S3 buckets, and Glacier vaults and exporting data from S3.

Ideal Usage Patterns

  • AWS Import/Export is ideal for transferring large amounts of data in and out of the AWS cloud, especially in cases where transferring the data over the Internet would be too slow (a week or more) or too costly.
  • Common use cases include
    • initial data upload to AWS,
    • content distribution or regular data interchange to/from your customers or business associates,
    • transfer to Amazon S3 or Amazon Glacier for off-site backup and archival storage, and quick retrieval of large backups from Amazon S3 or Amazon Glacier for disaster recovery.

Anti-Patterns

  • AWS Import/Export may not be the ideal solution for data that is more easily transferred over the Internet in less than one week.

Performance

  • Each AWS Import/Export station is capable of loading data at over 100 MB per second
  • Rate of the data load will be bounded by a combination of the read or write speed of the portable storage device and, for Amazon S3 data loads, the average object (file) size.

Durability and Availability

  • Durability and availability characteristics of the target storage i.e. EBS, S3 or Glacier applies, after the data has been imported

Cost Model

  • AWS Import/Export has three pricing components: a per-device fee, a data load time charge (per data-loading-hour), and possible return shipping charges (for expedited shipping, or shipping to destinations not local to that AWS Import/Export region).
  • Storage pricing applies for the destination storage, the standard Amazon EBS snapshot, Amazon S3, and Amazon Glacier request and storage pricing applies.

Scalability and Elasticity

  • Total amount of data you can load using AWS Import/Export is limited only by the capacity of the devices sent to AWS.
  • For Amazon S3, individual files will be loaded as objects in Amazon S3, and may range up to 5 terabytes in size.
  • For Amazon Glacier, individual devices will be loaded as a single archive, and may range up to 4 terabytes in size.
  • Aggregate total amount of data that can be imported is virtually unlimited.

Interfaces

  • To upload or download data, AWS Import/Export job for each storage device shipped need to be created and submitted
  • Jobs can be created using AWS CLI, AWS SDK or native REST API
  • Each job request requires a manifest file, a YAML-formatted text file that contains a set of key-value pairs that supply the required information—such as your device ID, secret access key, and return address—necessary to complete the job.
  • Job request is tied to the storage device through a signature file in the root directory (for Amazon S3 import jobs), or by a barcode taped to the device (for Amazon EBS and Amazon Glacier jobs).

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are working with a customer who has 10 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?
    1. Amazon Glacier multipart upload
    2. AWS Storage Gateway
    3. VM Import/Export
    4. AWS Import/Export

AWS Storage Options – RDS, DynamoDB & Database on EC2

AWS Storage Options Whitepaper with RDS, DynamoDB & Database on EC2 Cont.

Provides a brief summary for the Ideal Use cases, Anti-Patterns and other factors for Amazon RDS, DynamoDB & Databases on EC2 storage options

Amazon RDS

  • RDS is a web service that provides the capabilities of MySQL, Oracle, MariaDB, Postgres or Microsoft SQL Server relational database as a managed, cloud-based service
  • RDS eliminates much of the administrative overhead associated with launching, managing, and scaling your own relational database on Amazon EC2 or in another computing environment.

Ideal Usage Patterns

  • RDS is a great solution for cloud-based fully-managed relational database
  • RDS is also optimal for new applications with structured data that requires more sophisticated querying and joining capabilities than that provided by Amazon’s NoSQL database offering, DynamoDB.
  • RDS provides full compatibility with the databases supported and direct access to native database engines, code and libraries and is ideal for existing applications that rely on these databases

Anti-Patterns

  • Index and query-focused data
    • If the applications don’t require advanced features such as joins and complex transactions and is more oriented toward indexing and querying data, DynamoDB would be more appropriate for this needs
  • Numerous BLOBs
    • If the application makes heavy use of files (audio files, videos, images, etc), it is a better choice to use S3 to store the objects instead of database engines Blob feature and use RDS or DynamoDB only to save the metadata
  • Automated scalability
    • RDS provides pushbutton scaling and it only scales up and has limited scale out ability. If fully-automated scaling is needed, DynamoDB may be a better choice.
  • Complete control
    • RDS does not provide admin access and does not enable the full feature set of the database engines.
    • So if the application requires complete, OS-level control of the database server with full root or admin login privileges, a self-managed database on EC2 may be a better match.
  • Other database platforms
    • RDS, at this time, provides a MySQL, Oracle, MariaDB, PostgreSQL and SQL Server databases.
    • If any other database platform (such as IBM DB2, Informix, or Sybase) is needed, it should be deployed on a self-managed database on an EC2 instance by using a relational database AMI, or by installing database software on an EC2 instance.

Performance

  • RDS Provisioned IOPS, where the IOPS can be specified when the instance is launched and is guaranteed over the life of the instance, provides a high-performance storage option designed to deliver fast, predictable, and consistent performance for I/O intensive transactional database workload

Durability and Availability

  • RDS leverages Amazon EBS volumes as its data store
  • RDS provides database backups, for enhanced durability, which are replicated across multiple AZ’s
    • Automated backups
      • If enabled, RDS will automatically perform a full daily backup of your data during the specified backup window, and will also capture DB transaction logs
    • User initiated backups
      • User can initiate backups at time and they are not deleted unless deleted explicitly by the user
  • RDS Multi AZ’s feature enhances both the durability and the availability of the database by synchronously replicating the data between a primary RDS DB instance and a standby instance in another Availability Zone, which prevents data loss,
  • RDS provides a DNS endpoint and in case of an failure on the primary, it automatically fails over to the standby instance
  • RDS also allows Read replicas for the supported databases, which are replicated asynchronously

Cost Model

  • RDS offers a tiered pricing structure, based on the size of the database instance, the deployment type (Single-AZ/Multi-AZ), and the AWS region.
  • Pricing for RDS is based on several factors: the DB instance hours (per hour), the amount of provisioned database storage (per GB-month and per million I/O requests), additional backup storage (per GB-month), and data transfer in/out (per GB per month)

Scalability and Elasticity

  • RDS resources can be scaled elastically in several dimensions: database storage size, database storage IOPS rate, database instance compute capacity, and the number of read replicas
  • RDS supports “pushbutton scaling” of both database storage and compute resources. Additional storage can either be added immediately or during the next maintenance cycle
  • RDS for MySQL also enables you to scale out beyond the capacity of a single database deployment for read-heavy database workloads by creating one or more read replicas.
  • Multiple RDS instances can also be configured to leverage database partitioning or sharding to spread the workload over multiple DB instances, achieving even greater database scalability and elasticity.

Interfaces

  • RDS APIs and the AWS Management Console provide a management interface that allows you to create, delete, modify, and terminate RDS DB instances; to create DB snapshots; and to perform point-in-time restores
  • There is no AWS data API for Amazon RDS.
  • Once a database is created, RDS provides a DNS endpoint for the database which can be used to connect to the database.
  • Endpoint does not change over the lifetime of the instance even during the failover in case of Multi-AZ configuration

Amazon DynamoDB

  • Amazon DynamoDB is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
  • DynamoDB being a managed service helps offload the administrative burden of operating and scaling a highly-available distributed database cluster.
  • DynamoDB helps meet the latency and throughput requirements of highly demanding applications by providing extremely fast and predictable performance with seamless throughput and storage scalability.
  • DynamoDB provides both eventually-consistent reads (by default), and strongly-consistent reads (optional), as well as implicit item-level transactions for item put, update, delete, conditional operations, and increment/decrement.
  • Amazon DynamoDB handles the data as below :-
    • DynamoDB stores structured data in tables, indexed by primary key, and allows low-latency read and write access to items.
    • DynamoDB supports three data types: number, string, and binary, in both scalar and multi-valued sets.
    • Tables do not have a fixed schema, so each data item can have a different number of attributes.
    • Primary key can either be a single-attribute hash key or a composite hash-range key.
    • Local secondary indexes provide additional flexibility for querying against attributes other than the primary key.

Ideal Usage Patterns

  • DynamoDB is ideal for existing or new applications that need a flexible NoSQL database with low read and write latencies, and the ability to scale storage and throughput up or down as needed without code changes or downtime.
  • Use cases require a highly available and scalable database because downtime or performance degradation has an immediate negative impact on an organization’s business. for e.g. mobile apps, gaming, digital ad serving, live voting and audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for S3 objects, e-commerce shopping carts, and web session management

Anti-Patterns

  • Structured data with Join and/or Complex Transactions
    • If the application uses structured data and required joins, complex transactions or other relationship infrastructure provided by traditional database platforms, it is better to use RDS or Database installed on an EC2 instance
  • Large Blob data
    • If the application uses large blob data for e.g. media, files, videos etc., it is better to use S3 to store the objects and use DynamoDB to store metadata for e.g. name, size, content-type etc
  • Large Objects with Low I/O rate
    • DynamoDB uses SSD drives and is optimized for workloads with a high I/O rate per GB stored. If the applications stores very large amounts of data that are infrequently accessed, S3 might be a better choice
  • Prewritten application with databases
    • For Porting an existing application using databases, RDS or database installed on the EC2 instance would be a better and seamless solution

Performance

  • SSDs and limited indexing on attributes provides high throughput and low latency and drastically reduces the cost of read and write operations.
  • Predictable performance can be achieved by defining the provisioned throughput capacity required for a given table.
  • DynamoDB handles the provisioning of resources to achieve the requested throughput rate, taking away the burden to think about instances, hardware, memory, and other factors that can affect an application’s throughput rate.
  • Provisioned throughput capacity reservations are elastic and can be increased or decreased on demand.

Durability and Availability

  • DynamoDB has built-in fault tolerance that automatically and synchronously replicates data across three AZ’s in a region for high availability and to help protect data against individual machine, or even facility failures.

Cost Model

  • DynamoDB has three pricing components: provisioned throughput capacity (per hour), indexed data storage (per GB per month), data transfer in or out (per GB per month)

Scalability and Elasticity

  • DynamoDB is both highly-scalable and elastic.
  • DynamoDB provides unlimited storage capacity, and the service automatically allocates more storage as the demand increases
  • Data is automatically partitioned and re-partitioned as needed, while the use of SSDs provides predictable low-latency response times at any scale.
  • DynamoDB is also elastic, in that you can simply “dial-up” or “dial-down” the read and write capacity of a table as your needs change.

Interfaces

  • DynamoDB provides a low-level REST API, as well as higher-level SDKs in different languages
  • APIs provide both a management and data interface for Amazon DynamoDB, that enable table management (creating, listing, deleting, and obtaining metadata) and working with attributes (getting, writing, and deleting attributes; query using an index, and full scan).

Databases on EC2

  • EC2 with EBS volumes allows hosting a self managed relational database
  • Ready to use, prebuilt AMIs are also available from leading database solutions

Ideal Usage Patterns

  • Self managed database on EC2 is an ideal scenario for users whose application requires a specific traditional relational database not supported by Amazon RDS for e.g. IBM DB2, Informix, or Sybase
  • Users or applications that require a maximum level of administrative control and configurability which is not provided by RDS

Anti-Patterns

  • Index and query-focused data
    • If the applications don’t require advanced features such as joins and complex transactions and is more oriented toward indexing and querying data, DynamoDB would be more appropriate for this needs
  • Numerous BLOBs
    • If the application makes heavy use of files (audio files, videos, images, and so on), it is a better choice to use S3 to store the objects instead of database engines Blob feature and use RDS or DynamoDB only to save the metadata
  • Automated scalability
    • Relational databases on EC2 leverages the scalability and elasticity of the underlying AWS platform, but this requires system administrators or DBAs to perform a manual or scripted task. If you need pushbutton scaling or fully-automated scaling, DynamoDB or RDS may be a better choice.
  • RDS supported database platforms
    • If the application using RDS supported database engine and all the features are available, RDS would be a better choice instead of self managed relational database on EC2

Performance

  • Performance depends on the size of the underlying EC2 instance, the number and configuration of the EBS volumes and the database itself
  • Performance can be increased by scaling up memory and compute resources by choosing a larger Amazon EC2 instance size.
  • For database storage, it is usually best to use EBS Provisioned IOPS volumes. To scale up I/O performance, the Provisioned IOPS can be increased, the number of EBS volumes changed, or use software RAID 0 (disk striping) across multiple EBS volumes, which will aggregate total IOPS and bandwidth.

Durability & Availability

  • As the database on EC2 uses EBS as storage, it has the same durability and availability provided by EBS and can be further enhanced by using EBS snapshots or by using third-party database backup utilities (such as Oracle’s RMAN) to store database backups in Amazon S3

Cost Model

  • Cost for running a database on EC2 instance is mainly determined by the size and the number of EC2 instance running, the size of the EBS volume used for database storage and any third party licensing cost for the database

Scalability & Elasticity

  • Users of traditional relational database solutions on Amazon EC2 can take advantage of the scalability and elasticity of the underlying AWS platform by creating AMI and spawning multiple instances

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which of the following are use cases for Amazon DynamoDB? Choose 3 answers
    1. Storing BLOB data.
    2. Managing web sessions
    3. Storing JSON documents
    4. Storing metadata for Amazon S3 objects
    5. Running relational joins and complex updates.
    6. Storing large amounts of infrequently accessed data.
  2. A client application requires operating system privileges on a relational database server. What is an appropriate configuration for highly available database architecture?
    1. A standalone Amazon EC2 instance
    2. Amazon RDS in a Multi-AZ configuration
    3. Amazon EC2 instances in a replication configuration utilizing a single Availability Zone
    4. Amazon EC2 instances in a replication configuration utilizing two different Availability Zones
  3. You are developing a new mobile application and are considering storing user preferences in AWS, which would provide a more uniform cross-device experience to users using multiple mobile devices to access the application. The preference data for each user is estimated to be 50KB in size. Additionally 5 million customers are expected to use the application on a regular basis. The solution needs to be cost-effective, highly available, scalable and secure, how would you design a solution to meet the above requirements?
    1. Setup an RDS MySQL instance in 2 availability zones to store the user preference data. Deploy a public facing application on a server in front of the database to manage security and access credentials
    2. Setup a DynamoDB table with an item for each user having the necessary attributes to hold the user preferences. The mobile application will query the user preferences directly from the DynamoDB table. Utilize STS. Web Identity Federation, and DynamoDB Fine Grained Access Control to authenticate and authorize access (DynamoDB provides high availability as it synchronously replicates data across three facilities within an AWS Region and scalability as it is designed to scale its provisioned throughput up or down while still remaining available. Also suitable for storing user preference data)
    3. Setup an RDS MySQL instance with multiple read replicas in 2 availability zones to store the user preference data .The mobile application will query the user preferences from the read replicas. Leverage the MySQL user management and access privilege system to manage security and access credentials.
    4. Store the user preference data in S3 Setup a DynamoDB table with an item for each user and an item attribute pointing to the user’ S3 object. The mobile application will retrieve the S3 URL from DynamoDB and then access the S3 object directly utilize STS, Web identity Federation, and S3 ACLs to authenticate and authorize access.
  4. A customer is running an application in US-West (Northern California) region and wants to setup disaster recovery failover to the Asian Pacific (Singapore) region. The customer is interested in achieving a low Recovery Point Objective (RPO) for an Amazon RDS multi-AZ MySQL database instance. Which approach is best suited to this need?
    1. Synchronous replication
    2. Asynchronous replication
    3. Route53 health checks
    4. Copying of RDS incremental snapshots
  5. You are designing a file -sharing service. This service will have millions of files in it. Revenue for the service will come from fees based on how much storage a user is using. You also want to store metadata on each file, such as title, description and whether the object is public or private. How do you achieve all of these goals in a way that is economical and can scale to millions of users?
    1. Store all files in Amazon Simple Storage Service (53). Create a bucket for each user. Store metadata in the filename of each object, and access it with LIST commands against the S3 API.
    2. Store all files in Amazon 53. Create Amazon DynamoDB tables for the corresponding key -value pairs on the associated metadata, when objects are uploaded.
    3. Create a striped set of 4000 IOPS Elastic Load Balancing volumes to store the data. Use a database running in Amazon Relational Database Service (RDS) to store the metadata.
    4. Create a striped set of 4000 IOPS Elastic Load Balancing volumes to store the data. Create Amazon DynamoDB tables for the corresponding key-value pairs on the associated metadata, when objects are uploaded.
  6. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

AWS Storage Options – S3 & Glacier

Amazon S3

  • highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
  • provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.
  • allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
  • number of objects you can store in an Amazon S3 bucket is virtually unlimited.
  • highly secure, supporting encryption at rest, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.
  • highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
  • provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.

Ideal Use Cases

  • Storage & Distribution of static web content and media
    • frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
    • works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites as no storage provisioning is required
    • content can either be directly served from Amazon S3 since each object in Amazon S3 has a unique HTTP URL address
    • can also act as an Origin store for the Content Delivery Network (CDN) such as Amazon CloudFront
    • it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
  • Data Store for Large Objects
    • can be paired with RDS or NoSQL database and used to store large objects for e.g. file or objects, while the associated metadata for e.g. name, tags, comments etc. can be stored in RDS or NoSQL database where it can be indexed and queried providing faster access to relevant data
  • Data store for computation and large-scale analytics
    • commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
    • data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
  • Backup and Archival of critical data
    • used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
    • stores objects redundantly on multiple devices across multiple facilities, it provides the highly-durable storage infrastructure needed for these scenarios.
    • it’s versioning capability is available to protect critical data from inadvertent deletion

Anti-Patterns

Amazon S3 has following Anti-Patterns where it is not an optimal solution

  • Dynamic website hosting
    • While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server side interaction, scripting or database interaction cannot be hosted and should rather be hosted on Amazon EC2
  • Backup and archival storage
    • Data requiring long term archival storage with infrequent read access can be stored more cost effectively in Amazon Glacier
  • Structured Data Query
    • Amazon S3 doesn’t offer query capabilities, so to read an object the object name and key must be known. Instead pair up S3 with RDS or Dynamo DB to store, index and query metadata about Amazon S3 objects
    • NOTE – S3 now provides query capabilities and also Athena can be used
  • Rapidly Changing Data
    • Data that needs to updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS or Dynamo DB.
  • File System
    • Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly either the ‘/’ or ‘’ character) you are able construct your keys to emulate the hierarchical folder structure of file system within a given bucket.

Performance

  • Access to Amazon S3 from within Amazon EC2 in the same region is fast.
  • Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.
  • Amazon S3 is also built to scale storage, requests, and users to support a virtually unlimited number of web-scale applications.
  • If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.

Durability & Availability

  • Amazon S3 storage provides provides the highest level of data durability and availability, by automatically and synchronously storing your data across both multiple devices and multiple facilities within the selected geographical region
  • Error correction is built-in, and there are no single points of failure. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.
  • Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
  • Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
  • Versioning can be enabled with MFA (Multi Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object
  • For Non Critical and Reproducible data for e.g. thumbnails, transcoded media etc., S3 Reduced Redundancy Storage (RRS) can be used, which provides a lower level of durability at a lower storage cost
  • RRS is designed to provide 99.99% durability per object over a given year. While RRS is less durable than standard Amazon S3, it is still designed to provide 400 times more durability than a typical disk drive

Cost Model

  • With Amazon S3, you pay only for what you use and there is no minimum fee.
  • Amazon S3 has three pricing components: storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

Scalability & Elasticity

  • Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically
  • Amazon S3 supports a virtually unlimited number of files in any bucket
  • Amazon S3 bucket can store a virtually unlimited number of bytes
  • Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information to other servers in other locations in the same region, all using Amazon’s high-performance infrastructure.

Interfaces

  • Amazon S3 provides standards-based REST and SOAP web services APIs for both management and data operations.
  • NOTE – SOAP support over HTTP is deprecated, but it is still available over HTTPS. New Amazon S3 features will not be supported for SOAP. We recommend that you use either the REST API or the AWS SDKs.
  • Amazon S3 provides easier to use higher level toolkit or SDK in different languages (Java, .NET, PHP, and Ruby) that wraps the underlying APIs
  • Amazon S3 Command Line Interface (CLI) provides a set of high-level, Linux-like Amazon S3 file commands for common operations, such as ls, cp, mv, sync, etc. They also provide the ability to perform recursive uploads and downloads using a single folder-level Amazon S3 command, and supports parallel transfers.
  • AWS Management Console provides the ability to easily create and manage Amazon S3 buckets, upload and download objects, and browse the contents of your Amazon S3 buckets using a simple web-based user interface
  • All interfaces provide the ability to store Amazon S3 objects (files) in uniquely-named buckets (top-level folders), with each object identified by an unique Object key within that bucket.

Glacier

  • extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival
  • can reliably store their data for as little as $0.01 per gigabyte per month.
  • to offload the administrative burdens of operating and scaling storage to AWS such as capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time consuming hardware migrations
  • Data is stored in Amazon Glacier as Archives where an archive can represent a single file or multiple files combined into a single archive
  • Archives are stored in Vaults for which the access can be controlled through IAM
  • Retrieving archives from Vaults require initiation of a job and can take anywhere around 3-5 hours
  • Amazon Glacier integrates seamlessly with Amazon S3 by using S3 data lifecycle management policies to move data from S3 to Glacier
  • AWS Import/Export can also be used to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport

Ideal Usage Patterns

  • Amazon Glacier is ideally suited for long term archival solution for infrequently accessed data with archiving offsite enterprise information, media assets, research and scientific data, digital preservation and magnetic tape replacement

Anti-Patterns

Amazon Glacier has following Anti-Patterns where it is not an optimal solution

  • Rapidly changing data
    • Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies such as Amazon EBS or a Database
  • Real time access
    • Data stored in Glacier can not be accessed at real time and requires an initiation of a job for object retrieval with retrieval times ranging from 3-5 hours. If immediate access is needed, Amazon S3 is a better choice.

Performance

  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived.
  • Amazon Glacier jobs typically complete in 3 to 5 hours

Durability and Availability

  • Amazon Glacier redundantly stores data in multiple facilities and on multiple devices within each facility
  • Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive
  • Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
  • Amazon Glacier also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • Amazon Glacier has three pricing components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand UPLOAD and RETRIEVAL requests per month).
  • Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time and allows you to retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. Any additional amount of data retrieved is charged per GB
  • Amazon Glacier also charges a pro-rated charge (per GB) for items deleted prior to 90 days

Scalability & Elasticity

  • A single archive is limited to 40 TBs, but there is no limit to the total amount of data you can store in the service.
  • Amazon Glacier scales to meet your growing and often unpredictable storage requirements whether you’re storing petabytes or gigabytes, Amazon Glacier automatically scales your storage up or down as needed.

Interfaces

  • Amazon Glacier provides a native, standards-based REST web services interface, as well as Java and .NET SDKs.
  • AWS Management Console or the Amazon Glacier APIs can be used to create vaults to organize the archives in Amazon Glacier.
  • Amazon Glacier APIs can be used to upload and retrieve archives, monitor the status of your jobs and also configure your vault to send you a notification via Amazon Simple Notification Service (Amazon SNS) when your jobs complete.
  • Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management to provide automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
  • Amazon S3 api provides a RESTORE operation and the retrieval process takes the same 3-5 hours
  • On retrieval, a copy of the retrieved object is placed in Amazon S3 RRS storage for a specified retention period; the original archived object remains stored in Amazon Glacier and you are charged for both the storage.
  • When using Amazon Glacier as a storage class in Amazon S3, use the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the Amazon Glacier APIs
  • Objects archived to Amazon Glacier via Amazon S3 can only be listed and retrieved via the Amazon S3 APIs or the AWS Management Console—they are not visible as archives in an Amazon Glacier vault.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You want to pass queue messages that are 1GB each. How should you achieve this?
    1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
    2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Refer link)
    3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
    4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
  2. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

References

AWS S3 Best Practices

S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
  • S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
  • If the typical workload involves only occasional bursts of 100 requests per second and less than 800 requests per second, AWS scales and handle it.
  • If the typical workload involves a request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, it’s recommended to open a support case to prepare for the workload and avoid any temporary limits on your request rate.
  • S3 best practice guidelines can be applied only if you are routinely processing 100 or more requests per second
  • Workloads that include a mix of request types
    • If the request workload is typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for the objects ensures better performance by providing low-latency access to the S3 index
    • This behavior is driven by how S3 stores key names.
      • S3 maintains an index of object key names in each AWS region.
      • Object keys are stored lexicographically (UTF-8 binary ordering) across multiple partitions in the index i.e. S3 stores key names in alphabetical order.
      • Object keys are stored in across multiple partitions in the index and the key name dictates which partition the key is stored in
      • Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that S3 will target a specific partition for a large number of keys, overwhelming the I/O capacity of the partition.
    • Introduce some randomness in the key name prefixes, the key names, and the I/O load, will be distributed across multiple index partitions.
    • It also ensures scalability regardless of the number of requests sent per second.

Transfer Acceleration

  • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
  • Transfer Acceleration takes advantage of CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to S3 over an optimized network path.

GET-intensive Workloads

  • CloudFront can be used for performance optimization and can help by
    • distributing content with low latency and high data transfer rate.
    • caching the content and thereby reducing the number of direct requests to S3
    • providing multiple endpoints (Edge locations) for data availability
    • available in two flavors as Web distribution or RTMP distribution
  • To fast data transport over long distances between a client and an S3 bucket, use S3 Transfer Acceleration. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographical distances

PUTs/GETs for Large Objects

  • AWS allows Parallelizing the PUTs/GETs request to improve the upload and download performance as well as the ability to recover in case it fails
  • For PUTs, Multipart upload can help improve the uploads by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload, needs to be re-uploaded
    • ability to pause and resume uploads
    • begin an upload before the Object size is known
  • For GETs, the range HTTP header can help to improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retried.

List Operations

  • Object key names are stored lexicographically in S3 indexes, making it hard to sort and manipulate the contents of LIST
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain Secondary Index outside of S3 for e.g. DynamoDB or RDS to store, index and query objects metadata rather than performing operations on S3

Security

  • Use Versioning
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or rollback to previous versions
  • Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) to delete
  • Versioning does not prevent Bucket deletion and must be backed up as if accidentally or maliciously deleted the data is lost
  • Use Same Region Replication or Cross Region replication feature to backup data to a different region
  • When using VPC with S3, use VPC S3 endpoints as
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between VPC and S3 and the traffic never leaves the Amazon network

Refer blog post @ S3 Security Best Practices

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects
  • Configure appropriate lifecycle management rules to move objects to different storage classes and expire them

Tracking

  • Use Event Notifications to be notified for any put or delete request on the S3 objects
  • Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket
  • Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored, and configure appropriate actions

S3 Monitoring and Auditing Best Practices

Refer blog post @ S3 Monitoring and Auditing Best Practices

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don’t have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of image and videos in Amazon S3. The company is lean on funding, and wants to minimize operational costs, however, they have an aggressive marketing plan, and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases to traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers that company has today, because this is critical in understanding what their customer base will be in two years. (No. of customers do not matter)
    2. You must find out total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needs for each S3 bucket. (S3 provided unlimited storage the key namespace design would depend on the number)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should utilize an amazon simple workflow service activity worker that updates the users data counter in amazon dynamo DB. The activity worker will use simple email service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an amazon relational data base service relational database with a store objects table that has a row for each stored object along with size of each object. The upload server will query the aggregate consumption of the user in questions (by first determining the files store by the user, and then querying the stored objects table for respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Good Approach to use RDS but with so many objects might not be a good option)
    3. The company should write both the content length and the username of the files owner as S3 metadata for the object. They should then create a file watcher to iterate over each object and aggregate the size for each user and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (List operations on S3 not feasible)
    4. The company should create two separated amazon simple storage service buckets one for data storage for free tier users and another for data storage for premium tier users. An amazon simple workflow service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary (List operations on S3 not feasible as well as SNS does not address email requirement)
  7. Your company host a social media website for storing and sharing documents. the web application allow users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your php front end backed by Elastic Load Balancing and an autoscaling fleet of amazon elastic compute cloud (EC2) instances that scale upon average of bytes received (NetworkIn) After a file has been uploaded. it is copied to amazon simple storage service(S3). Amazon Ec2 instances use an AWS Identity and Access Management (AMI) role that allows Amazon s3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the auto scaling groups Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Autoscaling launch Configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10gbps. (no info of current size and might increase cost)
    2. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to amazon s3 using the given credentials and S3 Prefix. (will not provide the ability to handle pause and restarts)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address for each EC2 instance (using the auto scaling launch configuration settings). Use Amazon Route 53 round robin records set and http health check to DNS load balance the app request this approach will significantly reduce the cost by bypassing elastic load balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic that used the S3 multipart upload API to directly upload the file to Amazon s3 using the given credentials and s3 Prefix. (multipart allows one to start uploading directly to S3 before the actual size is known or complete data is downloaded)
  8. If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH will give some randomness to start with instead of instaneId where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID

Reference

S3_Optimizing_Performance

AWS EC2 – Elastic Cloud Compute

Elastic Cloud Compute – EC2

  • Elastic Compute Cloud – EC2 provides scalable computing capacity in AWS
  • Elastic Compute Cloud – EC2
    • eliminates the need to invest in hardware upfront, so applications can be developed and deployed faster.
    • can be used to launch as many or as few virtual servers as you need, configure security and networking, and manage storage.
    • enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing the need to forecast traffic.

EC2 features

  • EC2 instances – Virtual computing environments
  • Amazon Machine Images (AMIs) – Preconfigured templates for the instances that package the bits needed for a server (including the operating system and additional software)
  • Instance types – Various configurations of CPU, memory, storage, and networking capacity for the instances
  • Key Pairs – Secure login information for the instances (AWS stores the public key, and you store the private key in a secure place)
  • Instance Store VolumesStorage volumes for temporary data that are deleted when you stop or terminate your instance, known as
  • EBS Volumes – Persistent storage volumes for the data using Elastic Block Store (EBS)
  • Regions and Availability ZonesMultiple physical locations for the resources, such as instances and EBS volumes
  • Security GroupsA firewall that enables you to specify the protocols, ports, and source IP ranges that can reach the instances
  • Elastic IP addresses – Static IP addresses for dynamic cloud computing
  • Tags – Metadata can be created and assigned to EC2 resources

Accessing EC2

  • Amazon EC2 console
    • Amazon EC2 console is the web-based user interface that can be accessed from the AWS management console
  • AWS Command line Interface (CLI)
    • Provides commands for a broad set of AWS products, and is supported on Windows, Mac, and Linux.
  • Amazon EC2 Command Line Interface (CLI) tools
    • Provides commands for Amazon EC2, Amazon EBS, and Amazon VPC, and is supported on Windows, Mac, and Linux
  • AWS Tools for Windows Powershell
    • Provides commands for a broad set of AWS products for those who script in the PowerShell environment
  • AWS Query API
    • Query API allows for requests are HTTP or HTTPS requests that use the HTTP verbs GET or POST and a Query parameter named Action
  • AWS SDK libraries
    • AWS provides libraries in various languages which provide basic functions that automate tasks such as cryptographically signing your requests, retrying requests, and handling error responses

Additional Reading

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What are the Amazon EC2 API tools?
    1. They don’t exist. The Amazon EC2 AMI tools, instead, are used to manage permissions.
    2. Command-line tools to the Amazon EC2 web service
    3. They are a set of graphical tools to manage EC2 instances.
    4. They don’t exist. The Amazon API tools are a client interface to Amazon Web Services.
  2. When a user is launching an instance with EC2, which of the below mentioned options is not available during the instance launch console for a key pair?
    1. Proceed without the key pair
    2. Upload a new key pair
    3. Select an existing key pair
    4. Create a new key pair

References

AWS_EC2

AWS EC2 Security

AWS EC2 Security

  • IAM helps control whether users in the organization can perform a task using specific EC2 API actions and whether they can use specific AWS resources.
  • Use IAM roles to prevent the need to share as well as manage, and rotate the security credentials that the applications use.
  • Security groups act as a virtual firewall that controls the traffic to the EC2 instances. They can help specify rules that control the inbound traffic that’s allowed to reach the instances and the outbound traffic that’s allowed to leave the instance.
  • Use AWS Systems Manager Session Manager to connect to the instance as it provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
  • Use EC2 Instance Connect to connect to your instances using Secure Shell (SSH) without the need to share and manage SSH keys.
  • Use AWS Systems Manager Run Command to automate common administrative tasks instead of opening inbound SSH ports and managing SSH keys.
  • Use Systems Manager Patch Manager can be used to automate the process of patching, installing security-related updates for both the operating system and applications.

EC2 Key Pairs

  • EC2 uses public-key cryptography to encrypt & decrypt login information
  • Public-key cryptography uses a public key to encrypt a piece of data, such as a password, then the recipient uses the private key to decrypt the data.
  • Public and private keys are known as a key pair.
  • To log in to an EC2 instance, a key pair needs to be created and specified when the instance is launched, and the private key can be used to connect to the instance.
  • Linux instances have no password, and the key pair is used for ssh log in
  • For Windows instances, the key pair can be used to obtain the administrator password and then log in using RDP
  • EC2 stores the public key only, and the private key resides with the user. EC2 doesn’t keep a copy of your private key
  • Public key content (on Linux instances) is placed in an entry within  ~/.ssh/authorized_keys at boot time and enables the user to securely access the instance without passwords
  • Public key specified for an instance when launched is also available through its instance metadata http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
  • EC2 Security Best Practice: Store the private keys in a secure place as anyone who possesses the private key can decrypt the login information
  • Also, if the private key is lost, there is no way to recover the same.
    • For instance store, you cannot access the instance
    • For EBS-backed Linux instances, access can be regained.
      • EBS-backed instance can be stopped, its root volume detached and attached to another instance as a data volume
      • Modify the authorized_keys file, move the volume back to the original instance, and restart the instance
  • Key pair associated with the instances can either be
    • Generated by EC2
      • Keys that EC2 uses are 2048-bit SSH-2 RSA keys.
    • Created separately (using third-party tools) and Imported into EC2
      • EC2 only accepts RSA keys and does not accept DSA keys
      • Supported lengths: 1024, 2048, and 4096
  • supports five thousand key pairs per region
  • Deleting a key pair only deletes the public key and does not impact the servers already launched with the key.
  • Use AWS Systems Manager Session Manager to connect to the instance as it provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.

EC2 Security Groups

  • An EC2 instance, when launched, can be associated with one or more security groups, which acts as a virtual firewall that controls the traffic to that instance
  • Security groups help specify rules that control the inbound traffic that’s allowed to reach the instances and the outbound traffic that’s allowed to leave the instance
  • Security groups are associated with network interfaces. Changing an instance’s security groups changes the security groups associated with the primary network interface (eth0)
  • An ENI can be associated with 5 security groups and with 50 60 rules per security group
  • Rules for a security group can be modified at any time; the new rules are automatically applied to all instances associated with the security group.
  • All the rules from all associated security groups are evaluated to decide where to allow traffic to an instance
  • Security Group features
    • For the VPC default security group, it allows all inbound traffic from other instances associated with the default security group
    • By default, VPC default security groups or newly created security groups allow all outbound traffic
    • Security group rules are always permissive; deny rules can’t be created
    • Rules can be added and removed any time.
    • Any modification to the rules are automatically applied to the instances associated with the security group after a short period, depending on the connection tracking for the traffic
    • Security groups are stateful — if you send a request from your instance, the response traffic for that request is allowed to flow in regardless of inbound security group rules. For VPC security groups, this also means that responses to allowed inbound traffic are allowed to flow out, regardless of outbound rules
    • If multiple rules are defined for the same protocol and port, the Most permissive rule is applied for e.g. for multiple rules for tcp and port 22 for specific IP and Everyone, everyone is granted access being the most permissive rule

Connection Tracking

  • Security groups are Stateful and they use Connection tracking to track information about traffic to and from the instance.
  • This allows responses to inbound traffic to flow out of the instance regardless of outbound security group rules, and vice versa.
  • Connection Tracking is maintained only if there is no explicit Outbound rule for an Inbound request (and vice versa)
  • However, if there is an explicit Outbound rule for an Inbound request, the response traffic is allowed on the basis of the Outbound rule and not on the Tracking information
  • Any existing flow of traffic, that is tracked, is not interrupted even if the rules for the security groups are changed. To ensure traffic is immediately interrupted, use NACL as they are stateless and therefore do not allow automatic response traffic.
  • Also, If the instance (host A) initiates traffic to host B and uses a protocol other than TCP, UDP, or ICMP,  the instance’s firewall only tracks the IP address and protocol number for the purpose of allowing response traffic from host B. If host B initiates traffic to your instance in a separate request within 600 seconds of the original request or response, your instance accepts it regardless of inbound security group rules, because it’s regarded as response traffic.
  • can be controlled by modifying the security group’s outbound rules to permit only certain types of outbound traffic or using NACL

IAM with EC2

  • IAM policy can be defined to allow or deny a user access to the EC2 resources and actions
  • EC2 partially supports resource-level permissions. For some EC2 API actions, you cannot specify which resource a user is allowed to work with for that action; instead, you have to allow users to work with all resources for that action
  • IAM allows to control only what actions a user can perform on the EC2 resources but cannot be used to grant access for users to be able to access or login to the instances

EC2 with IAM Role

  • EC2 instances can be launched with IAM roles so that the applications can securely make API requests from your instances,
  • IAM roles prevent the need to share as well as manage, rotate the security credentials that the applications use.
  • IAM role can be added to an existing running EC2 instance.
  • EC2 uses an instance profile as a container for an IAM role.
    • Creation of an IAM role using the console, creates an instance profile automatically and gives it the same name as the role it corresponds to.
    • When using the AWS CLI, API, or an AWS SDK to create a role, the role and instance profile needs to be created as separate actions, and they can be given different names.
  • To launch an instance with an IAM role, the name of its instance profile needs to be specified.
  • An application on the instance can retrieve the security credentials provided by the role from the instance metadata item http://169.254.169.254/latest/meta-data/iam/security-credentials/role-name.
  • Security credentials are temporary and are rotated automatically and new credentials are made available at least five minutes prior to the expiration of the old credentials.
  • Best Practice: Always launch EC2 instance with IAM role instead of hardcoded credentials

EC2 IAM Role S3 Access

EC2 Resiliency

  • EC2 offers the following features to support your data resiliency:
    • Copying AMIs across Regions
    • Copying EBS snapshots across Regions
    • Automating EBS-backed AMIs using Data Lifecycle Manager
    • Automating EBS snapshots using Data Lifecycle Manager
    • Maintaining the health and availability of the fleet using EC2 Auto Scaling
    • Distributing incoming traffic across multiple instances in a single AZ or multiple AZs using Elastic Load Balancing

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You launch an Amazon EC2 instance without an assigned AWS identity and Access Management (IAM) role. Later, you decide that the instance should be running with an IAM role. Which action must you take in order to have a running Amazon EC2 instance with an IAM role assigned to it?
    1. Create an image of the instance, and register the image with an IAM role assigned and an Amazon EBS volume mapping.
    2. Create a new IAM role with the same permissions as an existing IAM role, and assign it to the running instance. (As per AWS latest enhancement, this is possible now)
    3. Create an image of the instance, add a new IAM role with the same permissions as the desired IAM role, and deregister the image with the new role assigned.
    4. Create an image of the instance, and use this image to launch a new instance with the desired IAM role assigned (This was correct before, as it was not possible to add an IAM role to an existing instance)
  2. What does the following command do with respect to the Amazon EC2 security groups? ec2-revoke RevokeSecurityGroupIngress
    1. Removes one or more security groups from a rule.
    2. Removes one or more security groups from an Amazon EC2 instance.
    3. Removes one or more rules from a security group
    4. Removes a security group from our account.
  3. Which of the following cannot be used in Amazon EC2 to control who has access to specific Amazon EC2 instances?
    1. Security Groups
    2. IAM System
    3. SSH keys
    4. Windows passwords
  4. You must assign each server to at least _____ security group
    1. 3
    2. 2
    3. 4
    4. 1
  5. A company is building software on AWS that requires access to various AWS services. Which configuration should be used to ensure that AWS credentials (i.e., Access Key ID/Secret Access Key combination) are not compromised?
    1. Enable Multi-Factor Authentication for your AWS root account.
    2. Assign an IAM role to the Amazon EC2 instance
    3. Store the AWS Access Key ID/Secret Access Key combination in software comments.
    4. Assign an IAM user to the Amazon EC2 Instance.
  6. Which of the following items are required to allow an application deployed on an EC2 instance to write data to a DynamoDB table? Assume that no security keys are allowed to be stored on the EC2 instance. (Choose 2 answers)
    1. Create an IAM Role that allows write access to the DynamoDB table
    2. Add an IAM Role to a running EC2 instance. (As per AWS latest enhancement, this is possible now)
    3. Create an IAM User that allows write access to the DynamoDB table.
    4. Add an IAM User to a running EC2 instance.
    5. Launch an EC2 Instance with the IAM Role included in the launch configuration (This was correct before, as it was not possible to add an IAM role to an existing instance)
  7. You have an application running on an EC2 Instance, which will allow users to download files from a private S3 bucket using a pre-assigned URL. Before generating the URL the application should verify the existence of the file in S3. How should the application use AWS credentials to access the S3 bucket securely?
    1. Use the AWS account access Keys the application retrieves the credentials from the source code of the application.
    2. Create a IAM user for the application with permissions that allow list access to the S3 bucket launch the instance as the IAM user and retrieve the IAM user’s credentials from the EC2 instance user data.
    3. Create an IAM role for EC2 that allows list access to objects in the S3 bucket. Launch the instance with the role, and retrieve the role’s credentials from the EC2 Instance metadata
    4. Create an IAM user for the application with permissions that allow list access to the S3 bucket. The application retrieves the IAM user credentials from a temporary directory with permissions that allow read access only to the application user.
  8. A user has created an application, which will be hosted on EC2. The application makes calls to DynamoDB to fetch certain data. The application is using the DynamoDB SDK to connect with from the EC2 instance. Which of the below mentioned statements is true with respect to the best practice for security in this scenario?
    1. The user should attach an IAM role with DynamoDB access to the EC2 instance
    2. The user should create an IAM user with DynamoDB access and use its credentials within the application to connect with DynamoDB
    3. The user should create an IAM role, which has EC2 access so that it will allow deploying the application
    4. The user should create an IAM user with DynamoDB and EC2 access. Attach the user with the application so that it does not use the root account credentials
  9. Your application is leveraging IAM Roles for EC2 for accessing object stored in S3. Which two of the following IAM policies control access to you S3 objects.
    1. An IAM trust policy allows the EC2 instance to assume an EC2 instance role.
    2. An IAM access policy allows the EC2 role to access S3 objects
    3. An IAM bucket policy allows the EC2 role to access S3 objects. (Bucket policy is defined with S3 and not with IAM)
    4. An IAM trust policy allows applications running on the EC2 instance to assume as EC2 role (Trust policy allows EC2 instance to assume the role)
    5. An IAM trust policy allows applications running on the EC2 instance to access S3 objects. (Applications can access S3 through EC2 assuming the role)
  10. You have an application running on an EC2 Instance, which will allow users to download files from a private S3 bucket using a pre-assigned URL. Before generating the URL the application should verify the existence of the file in S3. How should the application use AWS credentials to access the S3 bucket securely?
    1. Use the AWS account access Keys the application retrieves the credentials from the source code of the application.
    2. Create a IAM user for the application with permissions that allow list access to the S3 bucket launch the instance as the IAM user and retrieve the IAM user’s credentials from the EC2 instance user data.
    3. Create an IAM role for EC2 that allows list access to objects in the S3 bucket. Launch the instance with the role, and retrieve the role’s credentials from the EC2 Instance metadata
    4. Create an IAM user for the application with permissions that allow list access to the S3 bucket. The application retrieves the IAM user credentials from a temporary directory with permissions that allow read access only to the application user.

AWS EC2 Instance Lifecycle

EC2 Instance Lifecycle Overview

  • EC2 instance lifecycle determines how an EC2 instance transitions through different states from the moment it is launched to its termination

EC2 Instance Lifecycle

Instance Launch

  •  Pending
    • When the instance is first launched is enters into the pending state
  • Running
    • After the instance is launched, it enters into the running state
    • Charges are incurred for each second, with a one-minute minimum, that the instance running is running, even if the instance remains idle

Instance Start & Stop (EBS-backed instances only)

  • Only an EBS-backed instance can be stopped and started.
  • Instance store-backed instance cannot be stopped and started.
  • An instance can be stopped & started in case the instance fails a status check or is not running as expected
  • Stop
    • After the instance is stopped, it enters in stopping state and then to stopped state.
    • Charges are only incurred for the EBS storage and not for the instance hourly charge or data transfer.
    • While the instance is stopped, its root volume can be treated like any other volume, and modified for e.g. repair file system problems or update software or change the instance type, user data, EBS optimization attributes, etc
    • Volume can be detached from the stopped instance, and attached to a running instance, modified, detached from the running instance, and then reattached to the stopped instance. It should be reattached using the storage device name that’s specified as the root device in the block device mapping for the instance.
  • Start
    • When the instance is started, it enters into pending state and then into running
    • An instance, when stopped and started, is launched on a new host
    • Any data on an instance store volume (not root volume) would be lost while data on the EBS volume persists
  • EC2 instance retains its private IP address as well as the Elastic IP address.
  • If the instance has an IPv6 address, it retains its IPv6 address.
  • However, the public IP address, if assigned instead of the Elastic IP address, would be released
  • For each transition of an instance from stopped to running, charges per second are incurred when the instance is running, with a minimum of one minute every time the instance is started

Instance Hibernate

  • Instance hibernation signals the operating system to perform hibernation (suspend-to-disk), which saves the contents from the instance memory (RAM) to the EBS root volume
  • Instance’s EBS root volume and any attached EBS data volumes are persisted, including the saved contents of the RAM.
  • Any EC2 instance store volumes remain attached to the instance, but the data on the instance store volumes is lost.
  • When the instance is restarted, the EBS root volume is restored to its previous state and the RAM contents are reloaded. Previously attached data volumes are reattached and the instance retains its instance ID.
  • After the instance is hibernated, it enters in stopping state and then to stopped state.
  • When the instance is restarted
    • It enters the pending state and the instance is moved to a new host computer (though in some cases, it remains on the current host).
    • EBS root volume is restored to its previous state
    • RAM contents are reloaded
    • Processes that were previously running on the instance are resumed
    • Previously attached data volumes are reattached and the instance retains its instance ID
    • Instance retains private IPv4 addresses and any IPv6 addresses
    • Instance retains its Elastic IP address
    • Instance releases its Public IPv4 address and would get a new one
  • Hibernation prerequisites
    • Supported instance families – C3, C4, C5, M3, M4, M5, R3, R4, R5, & T2
    • Instance RAM size – must be less than 150 GB.
    • Instance size – not supported for bare metal instances.
    • Supported AMIs must be an HVM AMI that supports hibernation
    • Root volume type – must be EBS volume and not instance store
    • EBS root volume size – must be large enough to store the RAM contents
    • EBS root volume MUST be encrypted to ensure the protection of sensitive content that is in memory at the time of hibernation
    • Enable hibernation at launch, as changing it is not supported on an existing instance
    • Purchasing options – Only On-Demand Instances and Reserved Instances supported
  • Limitations or Unsupported Actions
    • Changing the instance type or size of a hibernated instance
    • Creating snapshots or AMIs from hibernated instances or instances for which hibernation is enabled
    • the data on any instance store volumes is lost
    • can’t hibernate an instance that has more than 150 GB of RAM.
    • can’t hibernate an instance that is in an Auto Scaling group or used by ECS. If the instance is in an Auto Scaling group is hibernated, the EC2 Auto Scaling service marks the stopped instance as unhealthy, and may terminate it and launch a replacement instance.
    • An instance cannot be hibernated for more than 60 days.

Instance Reboot

  • Both EBS-backed and Instance store-backed instances can be rebooted
  • An instance remains on the same host computer and maintains its public DNS name, private IP address
  • Data on the EBS and Instance store volume is also retained
  • AWS recommends using EC2 to reboot the instance instead of running the operating system reboot command from the instance as it performs a hard reboot if the instance does not cleanly shut down within four minutes also creates an API record in CloudTrail if enabled.

Instance Retirement

  • An instance is scheduled to be retired when AWS detects an irreparable failure of the underlying hardware hosting the instance.
  • When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS.
  • If the instance root device is an EBS volume, the instance is stopped and can be started again at any time.
  • If the instance root device is an instance store volume, the instance is terminated, and cannot be used again.

Instance Termination

  • An instance can be terminated, and it enters into the shutting-down, and then the terminated state
  • After an instance is terminated, it can’t be connected and no charges are incurred
  • Instance Shutdown behavior
    • Each EBS-backed instance supports the InstanceInitiatedShutdownBehavior attribute which determines whether the instance would be stopped or terminated when a shutdown command is initiated from the instance itself for e.g. shutdown, halt or poweroff command in linux
    • Default behavior for the instance to be stopped.
    • A shutdown command for an Instance store-backed instance will always terminate the instance
  • Termination protection
    • Termination protection ( DisableApiTermination attribute) can be enabled on the instance to prevent it from being accidentally terminated
    • DisableApiTermination from the Console, CLI or API.
    • Instance can be terminated through EC2 CLI.
    • Termination protection does not work for instances when
      • part of an Autoscaling group
      • launched as Spot instances
      • terminating an instance by initiating shutdown from the instance
  • Data persistence
    • EBS volume has a DeleteOnTermination  attribute which determines whether the volumes would be persisted or deleted when an instance they are associated with are terminated
    • Data on Instance store volume data does not persist
    • Default is to delete the root device volume and preserve any other EBS volumes. i.e.
      • Data on EBS root volumes have the DeleteOnTermination flag set to true and would be deleted by default
      • Additional EBS volumes attached have the DeleteOnTermination flag set to false are not deleted but just detached from the instance

EC2 Instance Lifecycle States and Billing

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon EC2 provide?
    1. Virtual servers in the Cloud
    2. A platform to run code (Java, PHP, Python), paying on an hourly basis.
    3. Computer Clusters in the Cloud.
    4. Physical servers, remotely managed by the customer.
  2. A user has enabled termination protection on an EC2 instance. The user has also set Instance initiated shutdown behavior to terminate. When the user shuts down the instance from the OS, what will happen?
    1. The OS will shutdown but the instance will not be terminated due to protection
    2. It will terminate the instance
    3. It will not allow the user to shutdown the instance from the OS
    4. It is not possible to set the termination protection when an Instance initiated shutdown is set to Terminate
  3. A user has launched an EC2 instance and deployed a production application in it. The user wants to prohibit any mistakes from the production team to avoid accidental termination. How can the user achieve this?
    1. The user can the set DisableApiTermination attribute to avoid accidental termination
    2. It is not possible to avoid accidental termination
    3. The user can set the Deletion termination flag to avoid accidental termination
    4. The user can set the InstanceInitiatedShutdownBehavior flag to avoid accidental termination
  4. You have been doing a lot of testing of your VPC Network by deliberately failing EC2 instances to test whether instances are failing over properly. Your customer who will be paying the AWS bill for all this asks you if he being charged for all these instances. You try to explain to him how the billing works on EC2 instances to the best of your knowledge. What would be an appropriate response to give to the customer in regards to this?
    1. Billing commences when Amazon EC2 AMI instance is completely up and billing ends as soon as the instance starts to shutdown.
    2. Billing commences when Amazon EC2 initiates the boot sequence of an AMI instance and billing ends when the instance shuts down.
    3. Billing only commences only after 1 hour of uptime and billing ends when the instance terminates.
    4. Billing commences when Amazon EC2 initiates the boot sequence of an AMI instance and billing ends as soon as the instance starts to shutdown.

References

AWS EC2 Instance Purchasing Option

AWS EC2 Instance Purchasing Option

  • Amazon provides different ways to pay for the EC2 instances
    • On-Demand Instances
    • Reserved Instances
    • Spot Instances
    • Dedicated Hosts
    • Dedicated Instances
    • Capacity reservations
  • EC2 instances can be launched on shared or dedicated tenancy

On-Demand Instances

  • Pay for the instances and the compute capacity used by the hour or the second, depending on which instances you run
  • No long-term commitments or up-front payments
  • Instances can be scaled accordingly as per the demand
  • Although AWS makes effort to have the capacity to launch On-Demand instances, there might be instances during peak demand where the instance cannot be launched
  • Well suited for
    • Users that want the low cost and flexibility of EC2 without any up-front payment or long-term commitment
    • Applications with short term, spiky, or unpredictable workloads that cannot be interrupted
    • Applications being developed or tested on EC2 for the first time

Reserved Instances

  • Reserved Instances provides lower hourly running costs by providing a billing discount (up to 75%) as well as capacity reservation that is applied to instances and there would never be a case of insufficient capacity
  • Discounted usage price is fixed as long as you own the Reserved Instance, allowing compute costs prediction over the term of the reservation
  • Reserved instances are best suited if consistent, heavy, use is expected and they can provide savings over owning the hardware or running only On-Demand instances.
  • Well Suited for
    • Applications with steady state or predictable usage
    • Applications that require reserved capacity
    • Users are able to make upfront payments to reduce their total computing costs even further
  • Reserved instance is not a physical instance that is launched, but rather a billing discount applied to the use of On-Demand Instances
  • On-Demand Instances must match certain attributes, such as instance type and Region, in order to benefit from the billing discount.
  • Reserved Instances do not renew automatically, and the EC2 instances can be continued to be used but charged On-Demand rates
  • Auto Scaling or other AWS services can be used to launch the On-Demand instances that use the Reserved Instance benefits
  • With Reserved Instances
    • You pay for the entire term, regardless of the usage
    • Once purchased, the reservation cannot be canceled but can be sold in the Reserved Instance Marketplace
    • Reserved Instance pricing tier discounts only apply to purchases made from AWS, and not to the third party Reserved instances

Reserved Instance Pricing Key Variables

Instance attributes

A Reserved Instance has four instance attributes that determine its price.

  • Instance type: Instance family + Instance size e.g.m4.large composed of the instance family (m4) and the instance size (large).
  • Region: Region in which the Reserved Instance is purchased.
  • Tenancy: Whether your instance runs on shared (default) or single-tenant (dedicated) hardware.
  • Platform: Operating system; for example, Windows or Linux/Unix.

Term commitment

Reserved Instance can be purchased for a one-year or three-year commitment, with the three-year commitment offering a bigger discount.

  • One-year: A year is defined as 31536000 seconds (365 days).
  • Three-year: Three years is defined as 94608000 seconds (1095 days).

Payment options

  • No Upfront
    • No upfront payment is required and the account is charged at a discounted hourly rate for every hour, regardless of the usage
    • Only available as a 1-year reservation
  • Partial Upfront
    • A portion of the cost is paid upfront and the remaining hours in the term are charged at an hourly discounted rate, regardless of the usage
  • Full Upfront
    • Full payment is made at the start of the term, with no costs for the remainder of the term, regardless of the usage

Offering class

  • Standard: Provide the most significant discount, but can only be modified.
  • Convertible: These provide a lower discount than Standard Reserved Instances, but can be exchanged for another Convertible Reserved Instance with different instance attributes. Convertible Reserved Instances can also be modified.

How Reserved Instances work

Billing Benefits & Payment Options

  • Reserved Instance purchase reservation is automatically applied to running instances that match the specified parameters
  • Reserved Instance can also be utilized by launching On-Demand instances with the same configuration as to the purchased reserved capacity

Understanding Hourly Billing

  • Reserved Instances are billed for every clock-hour during the term that you select, regardless of whether the instance is running or not.
  • A Reserved Instance billing benefit can be applied to a running instance on a per-second basis. Per-second billing is available for instances using an open-source Linux distribution, such as Amazon Linux and Ubuntu.
  • Per-hour billing is used for commercial Linux distributions, such as Red Hat Enterprise Linux and SUSE Linux Enterprise Server.
  • A Reserved Instance billing benefit can apply to a maximum of 3600 seconds (one hour) of instance usage per clock-hour. You can run multiple instances concurrently, but can only receive the benefit of the Reserved Instance discount for a total of 3600 seconds per clock-hour; instance usage that exceeds 3600 seconds in a clock-hour is billed at the On-Demand rate.
  • Reservations and discounted rates only apply to one instance-hour per hour. If an instance restarts during the first hour of a reservation and runs for two hours before stopping, the first instance-hour is charged at the discounted rate but three instance-hours are charged at the On-Demand rate. If the instance restarts during one hour and again the next hour before running the remainder of the reservation, one instance-hour is charged at the On-Demand rate but the discounted rate is applied to previous and subsequent instance-hours.

Consolidated Billing

  • Pricing benefits of Reserved Instances are shared when the purchasing account is part of a set of accounts billed under one consolidated billing payer account
  • Consolidated billing account aggregates the list value of member accounts within a region.
  • When the list value of all active Reserved Instances for the consolidated billing account reaches a discount pricing tier, any Reserved Instances purchased after this point by any member of the consolidated billing account are charged at the discounted rate (as long as the list value for that consolidated account stays above the discount pricing tier threshold)

Buying Reserved Instances

Buying Reserved Instances need a selection of the following

  • Platform (for example, Linux)
  • Instance type (for example, m1.small)
  • Availability Zone in which to run the instance for Zonal reserved instance
  • Term (time period) over which you want to reserve capacity
  • Tenancy You can reserve capacity for your instance to run in single-tenant hardware (dedicated tenancy, as opposed to shared).
  • Offering (No Upfront, Partial Upfront, All Upfront).

Modifying Reserved Instances

  • Standard or Convertible Reserved Instances can be modified and continue to benefit from the capacity reservation as the computing needs change.
  • Availability Zone, instance size (within the same instance family), and scope of the Reserved Instance can be modified
  • All or a subset of the Reserved Instances can be modified
  • Two or more Reserved Instances can be merged into a single Reserved Instance
  • Modification does not change the remaining term of the Reserved Instances; their end dates remain the same.
  • There is no fee, and you do not receive any new bills or invoices.
  • Modification is separate from purchasing and does not affect how you use, purchase, or sell Reserved Instances.
  • Complete reservation or a subset of it can be modified in one or more of the following ways:
    • Switch Availability Zones within the same region
    • Change between EC2-VPC and EC2-Classic
    • Change the instance size within the same instance type, given the instance size footprint remains the same for e.g. four m1.medium instances (4 x 2), you can turn it into a reservation for eight m1.small instances (8 x 1) and vice versa. However, you cannot convert a reservation for a single m1.small instance (1 x 1) into a reservation for an m1.large instance (1 x 4).

Screen Shot 2016-04-26 at 7.07.24 AM.png

Scheduled Reserved Instances

  • AWS does not have any capacity available for Scheduled Reserved Instances or any plans to make it available in the future. To reserve capacity, use On-Demand Capacity Reservations instead
  • Scheduled Reserved Instances (Scheduled Instances) enable capacity reservations purchase that recurs on a daily, weekly, or monthly basis, with a specified start time and duration, for a one-year term.
  • Capacity is reserved in advance and is always available when needed
  • Charges are incurred for the time that the instances are scheduled, even if they are not used
  • Scheduled Instances are a good choice for workloads that do not run continuously, but do run on a regular schedule for e.g. weekly or monthly batch jobs
  • EC2 launches the instances, based on the launch specification during their scheduled time periods
  • EC2 terminates the EC2 instances three minutes before the end of the current scheduled time period to ensure the capacity is available for any other Scheduled Instances it is reserved for.
  • Scheduled Reserved instances cannot be stopped or rebooted, however, they can be terminated and relaunched within minutes of termination
  • Scheduled Reserved instances limits or restrictions
    • after purchase cannot be modified, canceled, or resold
    • only supported instance types: C3, C4, M4, and R3
    • the required term is 365 days (one year).
    • minimum required utilization is 1,200 hours per year
    • purchase up to three months in advance

On-Demand Capacity Reservations

  • On-Demand Capacity Reservations enable you to reserve compute capacity for the EC2 instances in a specific AZ for any duration.
  • This gives you the ability to create and manage Capacity Reservations independently from the billing discounts offered by Savings Plans or regional Reserved Instances.
  • By creating Capacity Reservations, you ensure that you always have access to EC2 capacity when you need it, for as long as you need it.
  • Capacity Reservations can be created at any time, without entering into a one-year or three-year term commitment, and the capacity is available immediately.
  • Billing starts as soon as the capacity is provisioned and the Capacity Reservation enters the active state.
  • When no longer needed, the Capacity Reservation can be canceled to stop incurring charges.
  • Capacity Reservation creation requires
    • AZ in which to reserve the capacity
    • Number of instances for which to reserve capacity
    • Instance attributes, including the instance type, tenancy, and platform/OS
  • Capacity Reservations can only be used by instances that match their attributes. By default, they are automatically used by running instances that match the attributes. If you don’t have any running instances that match the attributes of the Capacity Reservation, it remains unused until you launch an instance with matching attributes.

Spot Instances

Refer blog post @ EC2 Spot Instances

Dedicated Instances

  • Dedicated Instances are EC2 instances that run in a VPC on hardware that’s dedicated to a single customer
  • Dedicated Instances are physically isolated at the host hardware level from the instances that aren’t Dedicated Instances and from instances that belong to other AWS accounts.
  • Each VPC has a related instance tenancy attribute.
    • default
      • default is shared.
      • the tenancy can be changed to dedicated after creation
      • all instances launched would be shared, unless you explicitly specify a different tenancy during instance launch.
    • dedicated
      • all instances launched would be dedicated
      • the tenancy can’t be changed to default after creation
  • Each instance launched into a VPC has a tenancy attribute. Default tenancy depends on the VPC tenancy, which by default is shared.
    • default – instance runs on shared hardware.
    • dedicated – instance runs on single-tenant hardware.
    • host – instance runs on a Dedicated Host, which is an isolated server with configurations that you can control.
    • default tenancy cannot be changed to dedicatedor hostand vice versa.
    • dedicatedtenancy can be changed to hostand vice version
  • Dedicated Instances can be launched using
    • Create the VPC with the instance tenancy set to dedicated, all instances launched into this VPC are Dedicated Instances even though if you mark the tenancy as shared.
    • Create the VPC with the instance tenancy set to default, and specify dedicated tenancy for any instances that should be Dedicated Instances when launched.

Dedicated Hosts

  • EC2 Dedicated Host is a physical server with EC2 instance capacity fully dedicated to your use
  • Dedicated Hosts allow using existing per-socket, per-core, or per-VM software licenses, including Windows Server, Microsoft SQL Server, SUSE, and Linux Enterprise Server.

Dedicated Hosts vs Dedicated Instances

EC2 Dedicated Host vs Dedicated Instances

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. If I want my instance to run on a single-tenant hardware, which value do I have to set the instance’s tenancy attribute to?
    1. dedicated
    2. isolated
    3. one
    4. reserved
  2. You have a video transcoding application running on Amazon EC2. Each instance polls a queue to find out which video should be transcoded, and then runs a transcoding process. If this process is interrupted, the video will be transcoded by another instance based on the queuing system. You have a large backlog of videos, which need to be transcoded, and would like to reduce this backlog by adding more instances. You will need these instances only until the backlog is reduced. Which type of Amazon EC2 instances should you use to reduce the backlog in the most cost efficient way?
    1. Reserved instances
    2. Spot instances
    3. Dedicated instances
    4. On-demand instances
  3. The one-time payment for Reserved Instances is __________ refundable if the reservation is cancelled.
    1. always
    2. in some circumstances
    3. never
  4. You run a web application where web servers on EC2 Instances are In an Auto Scaling group Monitoring over the last 6 months shows that 6 web servers are necessary to handle the minimum load. During the day up to 12 servers are needed Five to six days per year, the number of web servers required might go up to 15. What would you recommend to minimize costs while being able to provide hill availability?
    1. 6 Reserved instances (heavy utilization). 6 Reserved instances (medium utilization), rest covered by On-Demand instances
    2. 6 Reserved instances (heavy utilization). 6 On-Demand instances, rest covered by Spot Instances (don’t go for spot as availability not guaranteed)
    3. 6 Reserved instances (heavy utilization) 6 Spot instances, rest covered by On-Demand instances (don’t go for spot as availability not guaranteed)
    4. 6 Reserved instances (heavy utilization) 6 Reserved instances (medium utilization) rest covered by Spot instances (don’t go for spot as availability not guaranteed)
  5. A user is running one instance for only 3 hours every day. The user wants to save some cost with the instance. Which of the below mentioned Reserved Instance categories is advised in this case?
    1. The user should not use RI; instead only go with the on-demand pricing (seems question before the introduction of the Scheduled Reserved instances in Jan 2016, which can be used in this case)
    2. The user should use the AWS high utilized RI
    3. The user should use the AWS medium utilized RI
    4. The user should use the AWS low utilized RI
  6. Which of the following are characteristics of a reserved instance? Choose 3 answers (but 4 answers seem correct)
    1. It can be migrated across Availability Zones (can be modified)
    2. It is specific to an Amazon Machine Image (AMI) (specific to platform)
    3. It can be applied to instances launched by Auto Scaling (are allowed)
    4. It is specific to an instance Type (specific to instance family but instance type can be changed)
    5. It can be used to lower Total Cost of Ownership (TCO) of a system (helps to reduce cost)
  7. You have a distributed application that periodically processes large volumes of data across multiple Amazon EC2 Instances. The application is designed to recover gracefully from Amazon EC2 instance failures. You are required to accomplish this task in the most cost-effective way. Which of the following will meet your requirements?
    1. Spot Instances
    2. Reserved instances
    3. Dedicated instances
    4. On-Demand instances
  8. Can I move a Reserved Instance from one Region to another?
    1. No
    2. Only if they are moving into GovCloud
    3. Yes
    4. Only if they are moving to US East from another region
  9. An application you maintain consists of multiple EC2 instances in a default tenancy VPC. This application has undergone an internal audit and has been determined to require dedicated hardware for one instance. Your compliance team has given you a week to move this instance to single-tenant hardware. Which process will have minimal impact on your application while complying with this requirement?
    1. Create a new VPC with tenancy=dedicated and migrate to the new VPC (possible but impact not minimal)
    2. Use ec2-reboot-instances command line and set the parameter dedicated=true
    3. Right click on the instance, select properties and check the box for dedicated tenancy
    4. Stop the instance, create an AMI, launch a new instance with tenancy=dedicated, and terminate the old instance
  10. Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
    1. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
    2. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot instances and Reserved Instances for Amazon EMR jobs. Use Reserved instances for Amazon Redshift (Combination of the Spot and reserved with guarantee performance and help reduce cost. Also, RRS would reduce cost and guarantee data integrity, which is different from data durability )
    3. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
    4. Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
  11. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
    1. Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
    2. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Reserved Instance not cost effective for 4 hour job and data not needed in S3 once moved to RDS)
    3. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Reserved Instance not cost effective)
    4. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (RRS provides not much cost benefits for a 4 hour job while the amount of input data would take time to upload and Output data to reproduce)
  12. A company currently has a highly available web application running in production. The application’s web front-end utilizes an Elastic Load Balancer and Auto scaling across 3 availability zones. During peak load, your web servers operate at 90% utilization and leverage a combination of heavy utilization reserved instances for steady state load and on-demand and spot instances for peak load. You are asked with designing a cost effective architecture to allow the application to recover quickly in the event that an availability zone is unavailable during peak load. Which option provides the most cost effective high availability architectural design for this application? [PROFESSIONAL]
    1. Increase auto scaling capacity and scaling thresholds to allow the web-front to cost-effectively scale across all availability zones to lower aggregate utilization levels that will allow an availability zone to fail during peak load without affecting the applications availability. (Ideal for HA to reduce and distribute load)
    2. Continue to run your web front-end at 90% utilization, but purchase an appropriate number of utilization RIs in each availability zone to cover the loss of any of the other availability zones during peak load. (90% is not recommended as well RIs would increase the cost)
    3. Continue to run your web front-end at 90% utilization, but leverage a high bid price strategy to cover the loss of any of the other availability zones during peak load. (90% is not recommended as high bid price would not guarantee instances and would increase cost)
    4. Increase use of spot instances to cost effectively to scale the web front-end across all availability zones to lower aggregate utilization levels that will allow an availability zone to fail during peak load without affecting the applications availability. (Availability cannot be guaranteed)
  13. You run accounting software in the AWS cloud. This software needs to be online continuously during the day every day of the week, and has a very static requirement for compute resources. You also have other, unrelated batch jobs that need to run once per day at any time of your choosing. How should you minimize cost? [PROFESSIONAL]
    1. Purchase a Heavy Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs. (Because the instance will always be online during the day, in a predictable manner, and there are sequences of batch jobs to perform at any time, we should run the batch jobs when the account software is off. We can achieve Heavy Utilization by alternating these times, so we should purchase the reservation as such, as this represents the lowest cost. There is no such thing a “Full” level utilization purchases on EC2.)
    2. Purchase a Medium Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.
    3. Purchase a Light Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.
    4. Purchase a Full Utilization Reserved Instance to run the accounting software. Turn it off after hours. Run the batch jobs with the same instance class, so the Reserved Instance credits are also applied to the batch jobs.

References

AWS Elastic Block Store Storage – EBS

EC2 Elastic Block Storage – EBS

  • Elastic Block Storage – EBS provides highly available, reliable, durable, block-level storage volumes that can be attached to an EC2 instance.
  • EBS as a primary storage device is recommended for data that requires frequent and granular updates e.g. running a database or filesystem.
  • An EBS volume
    • behaves like a raw, unformatted, external block device that can be attached to a single EC2 instance at a time.
    • persists independently from the running life of an instance.
    • is Zonal and can be attached to any instance within the same Availability Zone and can be used like any other physical hard drive.
    • is particularly well-suited for use as the primary storage for file systems, databases, or any applications that require fine granular updates and access to raw, unformatted, block-level storage.

Elastic Block Storage Features

  • EBS Volumes are created in a specific Availability Zone and can be attached to any instance in that same AZ.
  • Volumes can be backed up by creating a snapshot of the volume, which is stored in S3.
  • Volumes can be created from a snapshot that can be attached to another instance within the same region.
  • Volumes can be made available outside of the AZ by creating and restoring the snapshot to a new volume anywhere in that region.
  • Snapshots can also be copied to other regions and then restored to new volumes, making it easier to leverage multiple AWS regions for geographical expansion, data center migration, and disaster recovery.
  • Volumes allow encryption using the EBS encryption feature. All data stored at rest, disk I/O, and snapshots created from the volume are encrypted.
  • Encryption occurs on the EC2 instance, providing encryption of data-in-transit from EC2 to the EBS volume.
  • Elastic Volumes help easily adapt the volumes as the needs of the applications change. Elastic Volumes allow you to dynamically increase capacity, tune performance, and change the type of any new or existing current generation volume with no downtime or performance impact.
  • You can dynamically increase size, modify the provisioned IOPS capacity, and change volume type on live production volumes.
  • General Purpose (SSD) volumes support up to 10,000 16000 IOPS and 160 250 MB/s of throughput and Provisioned IOPS (SSD) volumes support up to 20,000 64000 IOPS and 320 1000 MB/s of throughput.
  • EBS Magnetic volumes can be created from 1 GiB to 1 TiB in size; EBS General Purpose (SSD) and Provisioned IOPS (SSD) volumes can be created up to 16 TiB in size.

EBS Benefits

  • Data Availability
    • Data is automatically replicated in an Availability Zone to prevent data loss due to the failure of any single hardware component.
  • Data Persistence
    • persists independently of the running life of an EC2 instance
    • persists when an instance is stopped, started, or rebooted
    • Root volume is deleted, by default, on Instance termination but the behaviour can be changed using the DeleteOnTermination flag
    • All attached volumes persist, by default, on instance termination
  • Data Encryption
    • can be encrypted by the EBS encryption feature
    • uses 256-bit AES-256 and an Amazon-managed key infrastructure.
    • Encryption occurs on the server that hosts the EC2 instance, providing encryption of data-in-transit from the EC2 instance to EBS storage
    • Snapshots of encrypted EBS volumes are automatically encrypted.
  • Snapshots
    • provides the ability to create snapshots (backups) of any EBS volume and write a copy of the data in the volume to S3, where it is stored redundantly in multiple Availability Zones.
    • can be used to create new volumes, increase the size of the volumes or replicate data across Availability Zones or Regions.
    • are incremental backups and store only the data that was changed from the time the last snapshot was taken.
    • Snapshot size can probably be smaller than the volume size as the data is compressed before being saved to S3.
    • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.

EBS Volume Types

Refer blog post @ EBS Volume Types

EBS Volume

EBS Volume Creation

  • Creating New volumes
    • Completely new from console or command line tools and can then be attached to an EC2 instance in the same Availability Zone.
  • Restore volume from Snapshots
    • Volumes can also be restored from previously created snapshots
    • New volumes created from existing snapshots are loaded lazily in the background.
    • There is no need to wait for all of the data to transfer from S3 to the volume before the attached instance can start accessing the volume and all its data.
    • If the instance accesses the data that hasn’t yet been loaded, the volume immediately downloads the requested data from S3, and continues loading the rest of the data in the background.
    • Volumes restored from encrypted snapshots are always encrypted, by default.
  • Volumes can be created and attached to a running EC2 instance by specifying a block device mapping

EBS Volume Detachment

  • EBS volumes can be detached from an instance explicitly or by terminating the instance.
  • EBS root volumes can be detached by stopping the instance.
  • EBS data volumes, attached to a running instance, can be detached by unmounting the volume from the instance first.
  • If the volume is detached without being unmounted, it might result in the volume being stuck in a busy state and could possibly damage the file system or the data it contains.
  • EBS volume can be force detached from an instance, using the Force Detach option, but it might lead to data loss or a corrupted file system as the instance does not get an opportunity to flush file system caches or file system metadata.
  • Charges are still incurred for the volume after its detachment

EBS Volume Deletion

  • EBS volume deletion would wipe out its data and the volume can’t be attached to any instance. However, it can be backed up before deletion using EBS snapshots

EBS Volume Resize

  • EBS Elastic Volumes can be modified to increase the volume size, change the volume type, or adjust the performance of your EBS volumes.
  • If the instance supports Elastic Volumes, changes can be performed without detaching the volume or restarting the instance.

EBS Volume Snapshots

Refer blog post @ EBS Snapshot

EBS Encryption

  • EBS volumes can be created and attached to a supported instance type and support the following types of data
    • Data at rest
    • All disk I/O i.e All data moving between the volume and the instance
    • All snapshots created from the volume
    • All volumes created from those snapshots
  • Encryption occurs on the servers that host EC2 instances, providing encryption of data-in-transit from EC2 instances to EBS storage.
  • EBS encryption is supported with all EBS volume types (gp2, io1, st1, and sc1), and has the same IOPS performance on encrypted volumes as with unencrypted volumes, with a minimal effect on latency
  • EBS encryption is only available on select instance types.
  • Volumes created from encrypted snapshots and snapshots of encrypted volumes are automatically encrypted using the same encryption key.
  • EBS encryption uses AWS KMS customer master keys (CMK) when creating encrypted volumes and any snapshots created from the encrypted volumes.
  • EBS volumes can be encrypted using either
    • a default CMK created for you automatically.
    • a CMK that you created separately using AWS KMS, giving you more flexibility, including the ability to create, rotate, disable, define access controls, and audit the encryption keys used to protect your data.
  • Public or shared snapshots of encrypted volumes are not supported, because other accounts would be able to decrypt your data and needs to be migrated to an unencrypted status before sharing.
  • Existing unencrypted volumes cannot be encrypted directly, but can be migrated by
    • Option 1
      • create an unencrypted snapshot from the volume
      • create an encrypted copy of an unencrypted snapshot
      • create an encrypted volume from the encrypted snapshot
    • Option 2
      • create an unencrypted snapshot from the volume
      • create an encrypted volume from an unencrypted snapshot
  • An encrypted snapshot can be created from an unencrypted snapshot by creating an encrypted copy of the unencrypted snapshot.
  • Unencrypted volume cannot be created from an encrypted volume directly but needs to be migrated

EBS Multi-Attach

Refer blog Post @ EBS Multi-Attach

EBS Performance

Refer blog Post @ EBS Performance

EBS vs Instance Store

Refer blog post @ EBS vs Instance Store

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. _____ is a durable, block-level storage volume that you can attach to a single, running Amazon EC2 instance.
    1. Amazon S3
    2. Amazon EBS
    3. None of these
    4. All of these
  2. Which Amazon storage do you think is the best for my database-style applications that frequently encounter many random reads and writes across the dataset?
    1. None of these.
    2. Amazon Instance Storage
    3. Any of these
    4. Amazon EBS
  3. What does Amazon EBS stand for?
    1. Elastic Block Storage
    2. Elastic Business Server
    3. Elastic Blade Server
    4. Elastic Block Store
  4. Which Amazon Storage behaves like raw, unformatted, external block devices that you can attach to your instances?
    1. None of these.
    2. Amazon Instance Storage
    3. Amazon EBS
    4. All of these
  5. A user has created numerous EBS volumes. What is the general limit for each AWS account for the maximum number of EBS volumes that can be created?
    1. 10000
    2. 5000
    3. 100
    4. 1000
  6. Select the correct set of steps for exposing the snapshot only to specific AWS accounts
    1. Select Public for all the accounts and check mark those accounts with whom you want to expose the snapshots and click save.
    2. Select Private and enter the IDs of those AWS accounts, and click Save.
    3. Select Public, enter the IDs of those AWS accounts, and click Save.
    4. Select Public, mark the IDs of those AWS accounts as private, and click Save.
  7. If an Amazon EBS volume is the root device of an instance, can I detach it without stopping the instance?
    1. Yes but only if Windows instance
    2. No
    3. Yes
    4. Yes but only if a Linux instance
  8. Can we attach an EBS volume to more than one EC2 instance at the same time?
    1. Yes
    2. No
    3. Only EC2-optimized EBS volumes.
    4. Only in read mode.
  9. Do the Amazon EBS volumes persist independently from the running life of an Amazon EC2 instance?
    1. Only if instructed to when created
    2. Yes
    3. No
  10. Can I delete a snapshot of the root device of an EBS volume used by a registered AMI?
    1. Only via API
    2. Only via Console
    3. Yes
    4. No
  11. By default, EBS volumes that are created and attached to an instance at launch are deleted when that instance is terminated. You can modify this behavior by changing the value of the flag_____ to false when you launch the instance
    1. DeleteOnTermination
    2. RemoveOnDeletion
    3. RemoveOnTermination
    4. TerminateOnDeletion
  12. Your company policies require encryption of sensitive data at rest. You are considering the possible options for protecting data while storing it at rest on an EBS data volume, attached to an EC2 instance. Which of these options would allow you to encrypt your data at rest? (Choose 3 answers)
    1. Implement third party volume encryption tools
    2. Do nothing as EBS volumes are encrypted by default
    3. Encrypt data inside your applications before storing it on EBS
    4. Encrypt data using native data encryption drivers at the file system level
    5. Implement SSL/TLS for all services running on the server
  13. Which of the following are true regarding encrypted Amazon Elastic Block Store (EBS) volumes? Choose 2 answers
    1. Supported on all Amazon EBS volume types
    2. Snapshots are automatically encrypted
    3. Available to all instance types
    4. Existing volumes can be encrypted
    5. Shared volumes can be encrypted
  14. How can you secure data at rest on an EBS volume?
    1. Encrypt the volume using the S3 server-side encryption service
    2. Attach the volume to an instance using EC2’s SSL interface.
    3. Create an IAM policy that restricts read and write access to the volume.
    4. Write the data randomly instead of sequentially.
    5. Use an encrypted file system on top of the EBS volume
  15. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS
  16. A user is trying to launch an EBS backed EC2 instance under free usage. The user wants to achieve encryption of the EBS volume. How can the user encrypt the data at rest?
    1. Use AWS EBS encryption to encrypt the data at rest (Encryption is allowed on micro instances)
    2. User cannot use EBS encryption and has to encrypt the data manually or using a third party tool (Encryption was not allowed on micro instances before)
    3. The user has to select the encryption enabled flag while launching the EC2 instance
    4. Encryption of volume is not available as a part of the free usage tier
  17. A user is planning to schedule a backup for an EBS volume. The user wants security of the snapshot data. How can the user achieve data encryption with a snapshot?
    1. Use encrypted EBS volumes so that the snapshot will be encrypted by AWS
    2. While creating a snapshot select the snapshot with encryption
    3. By default the snapshot is encrypted by AWS
    4. Enable server side encryption for the snapshot using S3
  18. A user has launched an EBS backed EC2 instance. The user has rebooted the instance. Which of the below mentioned statements is not true with respect to the reboot action?
    1. The private and public address remains the same
    2. The Elastic IP remains associated with the instance
    3. The volume is preserved
    4. The instance runs on a new host computer
  19. A user has launched an EBS backed EC2 instance. What will be the difference while performing the restart or stop/start options on that instance?
    1. For restart it does not charge for an extra hour, while every stop/start it will be charged as a separate hour
    2. Every restart is charged by AWS as a separate hour, while multiple start/stop actions during a single hour will be counted as a single hour
    3. For every restart or start/stop it will be charged as a separate hour
    4. For restart it charges extra only once, while for every stop/start it will be charged as a separate hour
  20. A user has launched an EBS backed instance. The user started the instance at 9 AM in the morning. Between 9 AM to 10 AM, the user is testing some script. Thus, he stopped the instance twice and restarted it. In the same hour the user rebooted the instance once. For how many instance hours will AWS charge the user?
    1. 3 hours
    2. 4 hours
    3. 2 hours
    4. 1 hour
  21. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence At times throughout the day, you are seeing large variance in the response times of the database queries Looking into the instance with the isolate command you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  22. An organization wants to move to Cloud. They are looking for a secure encrypted database storage option. Which of the below mentioned AWS functionalities helps them to achieve this?
    1. AWS MFA with EBS
    2. AWS EBS encryption
    3. Multi-tier encryption with Redshift
    4. AWS S3 server-side storage
  23. A user has stored data on an encrypted EBS volume. The user wants to share the data with his friend’s AWS account. How can user achieve this?
    1. Create an AMI from the volume and share the AMI
    2. Copy the data to an unencrypted volume and then share
    3. Take a snapshot and share the snapshot with a friend
    4. If both the accounts are using the same encryption key then the user can share the volume directly
  24. A user is using an EBS backed instance. Which of the below mentioned statements is true?
    1. The user will be charged for volume and instance only when the instance is running
    2. The user will be charged for the volume even if the instance is stopped
    3. The user will be charged only for the instance running cost
    4. The user will not be charged for the volume if the instance is stopped
  25. A user is planning to use EBS for his DB requirement. The user already has an EC2 instance running in the VPC private subnet. How can the user attach the EBS volume to a running instance?
    1. The user must create EBS within the same VPC and then attach it to a running instance.
    2. The user can create EBS in the same zone as the subnet of instance and attach that EBS to instance. (Should be in the same AZ)
    3. It is not possible to attach an EBS to an instance running in VPC until the instance is stopped.
    4. The user can specify the same subnet while creating EBS and then attach it to a running instance.
  26. A user is creating an EBS volume. He asks for your advice. Which advice mentioned below should you not give to the user for creating an EBS volume?
    1. Take the snapshot of the volume when the instance is stopped
    2. Stripe multiple volumes attached to the same instance
    3. Create an AMI from the attached volume (AMI is created from the snapshot)
    4. Attach multiple volumes to the same instance
  27. An EC2 instance has one additional EBS volume attached to it. How can a user attach the same volume to another running instance in the same AZ?
    1. Terminate the first instance and only then attach to the new instance
    2. Attach the volume as read only to the second instance
    3. Detach the volume first and attach to new instance
    4. No need to detach. Just select the volume and attach it to the new instance, it will take care of mapping internally
  28. What is the scope of an EBS volume?
    1. VPC
    2. Region
    3. Placement Group
    4. Availability Zone

Reference

Amazon_EBS