AWS S3 vs EBS vs EFS

S3 vs EBS vs EFS

Amazon EFS, Amazon EBS, and Amazon S3 are three different AWS storage services, each suited to different types of workload needs.


Amazon S3

    • is an object store with a simple key-value design, well suited to storing vast numbers of backups or user files.
    • offers pay only for the storage you actually use. Offers cost-saving storage classes ideal for infrequently accessed data or for data archival
    • provides unlimited storage
    • provides durability, as the data is replicated and stored across at least three geographically-dispersed AZs, and is designed for 99.999999999% (11 9’s) durability
    • provides high availability, designed for 99.99%
    • provides security with a range of access control mechanisms and the ability to encrypt data at rest and in transit
    • data can be accessed programmatically or directly through services such as Amazon CloudFront (see the sketch after this list)
    • provides backup capability using versioning and cross-region replication
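A minimal boto3 sketch of these capabilities, assuming credentials are configured and the bucket name and key are placeholders; the object is encrypted at rest and then shared for a limited time via a pre-signed URL:

```python
import boto3

s3 = boto3.client("s3")

# Store an object encrypted at rest (SSE-S3); S3 replicates it across AZs
s3.put_object(
    Bucket="my-backup-bucket",          # placeholder bucket name
    Key="backups/2024-01-01/app.log",
    Body=b"log contents",
    ServerSideEncryption="AES256",
)

# Share the object for one hour without exposing AWS credentials
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-backup-bucket", "Key": "backups/2024-01-01/app.log"},
    ExpiresIn=3600,
)
print(url)
```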

Amazon EBS

    • delivers high-availability block-level storage volumes for Amazon Elastic Compute Cloud (EC2) instances.
    • charges for the provisioned storage, even if you do not use it (see the provisioning sketch after this list)
    • provides limited storage capability and cannot scale infinitely
    • stores data that can be retained after the EC2 instance is shut down.
    • provides durability by replicating data across multiple servers in an AZ to prevent the loss of data from the failure of any single component
    • designed for 99.999% availability
    • provides low-latency performance – using SSD EBS volumes, it offers reliable I/O performance scaled to meet your workload needs.
    • provides secure storage with access control and encryption of data at rest and in transit
    • is only accessible from a single EC2 instance in a particular AWS region and AZ
    • provides backup capability using snapshots
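A minimal boto3 sketch of provisioning and attaching a volume, assuming placeholder region, AZ, and instance ID; note the volume must live in the same AZ as the instance:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Provision a 100 GiB encrypted gp3 volume; you pay for this whether or not it is used
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",      # must match the instance's AZ
    Size=100,
    VolumeType="gp3",
    Encrypted=True,
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach to a single EC2 instance in the same AZ (placeholder instance ID)
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```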

Amazon EFS

    • scalable file storage, also optimized for EC2.
    • offers pay for the storage you actually use. There’s no advance provisioning, up-front fees, or commitments
    • multiple instances can be configured to mount the file system (see the sketch after this list).
    • allows mounting the file system across multiple AZs and instances, and from other regions via VPC peering.
    • is designed to be highly durable and highly available. Data is redundantly stored across multiple AZs.
    • provides elasticity – scales up and down automatically, even to meet the most abrupt workload spikes.
    • provides performance that scales to support any workload: EFS offers the throughput changing workloads need. It can provide higher throughput in spurts that match sudden file system growth, even for workloads up to 500,000 IOPS or 10 GB per second.
    • provides accessible file storage, which can be accessed by on-premises servers and EC2 instances concurrently.
    • provides security and compliance – access to the file system can be secured with the current security solution, or by controlling access to EFS file systems using IAM, VPC, or POSIX permissions.
    • provides data encryption at rest and in transit.
    • allows EC2 instances to access EFS file systems located in other AWS regions through VPC peering.
    • a file system can be accessed concurrently from all AZs in the region where it is located, which means the application can be architected to fail over from one AZ to other AZs in the region to ensure the highest level of application availability. Mount targets themselves are designed to be highly available.
    • used as a common data source for any application or workload that runs on numerous instances.
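A minimal boto3 sketch of creating a file system and one mount target, assuming placeholder subnet and security group IDs; instances in that AZ can then mount it over NFS:

```python
import time
import boto3

efs = boto3.client("efs")

# Create the file system; no capacity to provision, it grows and shrinks with usage
fs = efs.create_file_system(
    CreationToken="shared-content-fs",  # placeholder idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
)

# Wait until the file system is available (simple polling; EFS has no built-in waiter)
while efs.describe_file_systems(
        FileSystemId=fs["FileSystemId"])["FileSystems"][0]["LifeCycleState"] != "available":
    time.sleep(5)

# One mount target per AZ lets instances in that AZ mount the file system over NFS
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",      # placeholder subnet
    SecurityGroups=["sg-0123456789abcdef0"],  # must allow NFS (TCP 2049)
)
```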


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A company runs an application on a group of Amazon Linux EC2 instances. The application writes log files using standard API calls. For compliance reasons, all log files must be retained indefinitely and will be analyzed by a reporting tool that must access all files concurrently. Which storage service should a solutions architect use to provide the MOST cost-effective solution?
    1. Amazon EBS
    2. Amazon EFS
    3. Amazon EC2 instance store
    4. Amazon S3
  2. A new application is being deployed on Amazon EC2. The application needs to read and write up to 3 TB of data to an external data store and requires read-after-write consistency across all AWS regions for writing new objects into this data store. Which data store should be used?
    1. Amazon EBS
    2. Amazon Glacier
    3. Amazon EFS
    4. Amazon S3
  3. To meet the requirements of an application, an organization needs to save a constantly increasing volume of files on a cloud storage system with the following features and abilities. Which AWS service below will meet these requirements?
      1. Pay only for the storage used
      2. Create different security policies for different groups of files
      3. Allow access to the public
      4. Retrieve the files at any time
      5. Store an unlimited number of files
    1. Amazon EBS
    2. Amazon S3
    3. Amazon Glacier
    4. Amazon EFS

AWS Certification – Storage & Content Delivery – Cheat Sheet

Instance Store

  • provides temporary block-level storage for an EC2 instance, also referred to as ephemeral storage
  • is physically attached to the instance
  • delivers very high random I/O performance, which is a good option when storage with very low latency is needed
  • cannot be dynamically resized
  • data persists when an instance is rebooted
  • data does not persist if the
    • underlying disk drive fails
    • instance stops i.e. if the EBS backed instance with instance store volumes attached is stopped
    • instance terminates
  • can be attached to an EC2 instance only when the instance is launched

Elastic Block Store – EBS

  • is virtual network attached block storage
  • volumes CANNOT be shared with multiple EC2 instances, use EFS instead
  • persists and is independent of EC2 lifecycle
  • multiple volumes can be attached to a single EC2 instance
  • can be detached & attached to another EC2 instance in that same AZ only
  • volumes are created in a specific AZ and CANNOT span across AZs
  • snapshots CANNOT span across regions
  • for making a volume available in a different AZ, create a snapshot of the volume and restore it to a new volume in any AZ within the region (see the sketch after this list)
  • for making a volume available in a different region, the snapshot of the volume can be copied to a different region and restored as a volume
  • provides high durability and redundancy within an AZ, as the data is automatically replicated within that AZ to prevent data loss due to any single hardware component failure
  • Provisioned IOPS (PIOPS) volumes are designed for transactional applications that require high and consistent I/O, e.g. relational databases, NoSQL, etc.
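A minimal boto3 sketch of the snapshot-and-restore path to another AZ, assuming placeholder volume ID and AZs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshot the source volume (placeholder ID); snapshots are stored regionally in S3
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="move to us-east-1b",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# Restore into a different AZ within the same region
new_vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-east-1b",
)
```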

EBS Snapshots

  • helps create backups of EBS volumes
  • are incremental
  • occur asynchronously, consume the instance IOPS
  • are regional, and can be copied across regions to make it easier to leverage multiple regions for geographical expansion, data center migration, and disaster recovery
  • can be shared by making them public or with specific AWS accounts by modifying the access permissions of the snapshots
  • support EBS encryption
    • Snapshots of encrypted volumes are automatically encrypted
    • Volumes created from encrypted snapshots are automatically encrypted
    • All data in flight between the instance and the volume is encrypted
    • Volumes created from an unencrypted snapshot that you own or have access to can be encrypted on-the-fly.
    • An encrypted snapshot that you own or have access to can be encrypted with a different key during the copy process.
  • can be automated using AWS Data Lifecycle Manager

EBS vs Instance Store

Refer blog post @ EBS vs Instance Store

Simple Storage Service – S3

  • Key-value based object storage for the Internet, with unlimited storage and an unlimited number of objects, each up to 5 TB in size
  • is an Object level storage (not a Block level storage) and cannot be used to host OS or dynamic websites (but can work with Javascript SDK)
  • provides durability by redundantly storing objects on multiple facilities within a region
  • support SSL encryption of data in transit and data encryption at rest
  • regularly verifies the integrity of data using checksums and provides auto healing capability
  • integrates with CloudTrail, CloudWatch and SNS for event notifications
  • S3 resources
    • consist of buckets and objects stored in the buckets, which can be retrieved via a unique, developer-assigned key
    • bucket names are globally unique
    • data model is a flat structure with no hierarchies or folders
    • Logical hierarchy can be inferred using the keyname prefix e.g. Folder1/Object1
  • Bucket & Object Operations
    • list operations return up to 1,000 objects per call with pagination support, and are NOT suited for list or prefix queries with a large number of objects
    • with a single PUT operation, an object of up to 5 GB can be uploaded
    • use Multipart upload to upload large objects up to 5 TB; it is recommended for objects over 100 MB for fault-tolerant uploads
    • supports the Range HTTP header to retrieve partial objects for fault-tolerant downloads where the network connectivity is poor
    • Pre-Signed URLs can also be shared for uploading/downloading objects for a limited time without requiring AWS security credentials
    • allows deletion of a single object or multiple objects (max 1000) in a single call
  • Multipart Uploads allows
    • parallel uploads with improved throughput and bandwidth utilization
    • fault tolerance and quick recovery from network issues
    • ability to pause and resume uploads
    • begin an upload before the final object size is known
  • Versioning
    • allows preserving, retrieving, and restoring every version of every object
    • protects individual files but does NOT protect from Bucket deletion
  • Storage tiers
    • Standard
      • default storage class
      • 99.999999999% durability & 99.99% availability
      • Low latency and high throughput performance
      • designed to sustain the concurrent loss of data in two facilities
    • Standard IA
      • optimized for long-lived and less frequently accessed data
      • designed to sustain the concurrent loss of data in two facilities
      • 99.999999999% durability & 99.9% availability
      • suitable for objects greater than 128 KB kept for at least 30 days
    • Reduced Redundancy Storage
      • designed for noncritical, reproducible data stored at lower levels of redundancy than the STANDARD storage class
      • reduces storage costs
      • 99.99% durability & 99.99% availability
      • designed to sustain the loss of data in a single facility
    • Glacier
      • suitable for archiving data where data access is infrequent and retrieval time of several (3-5) hours  is acceptable
      • 99.999999999% durability
  • allows Lifecycle Management policies
    • transition to move objects to different storage classes and Glacier
    • expiration to remove objects
  • Data Consistency Model
    • provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES
    • for new objects, synchronously stores data across multiple facilities before returning success
    • updates to a single key are atomic
  • Security
    • IAM policies – grant users within your own AWS account permission to access S3 resources
    • Bucket and Object ACL – grant other AWS accounts (not specific users) access to  S3 resources
    • Bucket policies – allows to add or deny permissions across some or all of the objects within a single bucket
  • Data Protection – Pending
  • Best Practices
    • use a random hash prefix for keys to ensure a random access pattern; as S3 stores objects lexicographically, randomness helps distribute the contents across multiple partitions for better performance
    • use parallel threads and Multipart upload for faster writes (see the sketch after this list)
    • use parallel threads and Range header GETs for faster reads
    • for list operations with a large number of objects, it’s better to build a secondary index in DynamoDB
    • use Versioning to protect from unintended overwrites and deletions, but this does not protect against bucket deletion
    • use VPC S3 Endpoints with a VPC to transfer data over the Amazon internal network
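A minimal boto3 sketch of the multipart-upload and Range-GET practices, assuming placeholder file, bucket, and key names; the high-level transfer manager splits the upload into parallel parts automatically above the threshold:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Multipart upload kicks in automatically above the threshold; parts upload in parallel
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # recommended for objects > 100 MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8,
)
s3.upload_file("large-backup.bin", "my-bucket", "backups/large-backup.bin", Config=config)

# Range GET for a fault-tolerant partial read (first 1 MiB only)
resp = s3.get_object(
    Bucket="my-bucket",
    Key="backups/large-backup.bin",
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
```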

Glacier

  • suitable for archiving data, where data access is infrequent and a retrieval time of several hours (3 to 5 hours) is acceptable (Not true anymore with enhancements from AWS)
  • provides high durability by storing archives in multiple facilities and on multiple devices at a very low storage cost
  • performs regular, systematic data integrity checks and is built to be automatically self healing
  • aggregate files into bigger files before sending them to Glacier and use range retrievals to retrieve partial files and reduce costs
  • improve speed and reliability with multipart upload
  • automatically encrypts the data using AES-256
  • upload or download data to Glacier via SSL encrypted endpoints

EFS

  • fully-managed, easy to set up, scale, and cost-optimize file storage
  • can automatically scale from gigabytes to petabytes of data without needing to provision storage
  • provides managed NFS (network file system) that can be mounted on and accessed by multiple EC2 in multiple AZs simultaneously
  • highly durable, highly scalable and highly available.
    • stores data redundantly across multiple Availability Zones
    • grows and shrinks automatically as files are added and removed, so there is no need to manage storage procurement or provisioning.
  • expensive (3x gp2), but you pay per use
  • uses the Network File System version 4 (NFS v4) protocol
  • is compatible with all Linux-based AMIs for EC2; a POSIX file system (~Linux) with a standard file API
  • does not support Windows AMIs
  • offers the ability to encrypt data at rest using KMS and in transit.
  • can be accessed from on-premises using an AWS Direct Connect or AWS VPN connection between the on-premises datacenter and VPC.
  • can be accessed concurrently from servers in the on-premises datacenter as well as EC2 instances in the Amazon VPC
  • Performance mode
    • General purpose (default)
      • latency-sensitive use cases (web server, CMS, etc…)
    • Max I/O
      • higher latency, throughput, highly parallel (big data, media processing)
  • Storage Tiers
    • Standard
      • for frequently accessed files
      • ideal for active file system workloads and you pay only for the file system storage you use per month
    • Infrequent access (EFS-IA)
      • a lower cost storage class that’s cost-optimized for files accessed infrequently, i.e. not every day
      • there is a cost to retrieve files, but a lower price to store them
    • EFS Lifecycle Management with a chosen age-off policy allows moving files to EFS IA
    • Lifecycle Management automatically moves data to the EFS IA storage class according to the lifecycle policy, e.g. files can be moved automatically into EFS IA after fourteen days of not being accessed (see the sketch after this list).
    • EFS is a shared POSIX system for Linux systems and does not work for Windows
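A minimal boto3 sketch of such an age-off policy, assuming a placeholder file system ID; the 14-day value mirrors the example above:

```python
import boto3

efs = boto3.client("efs")

# Move files to EFS-IA after 14 days without access (placeholder file system ID)
efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",
    LifecyclePolicies=[{"TransitionToIA": "AFTER_14_DAYS"}],
)
```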

Amazon FSx for Windows

  • is a fully managed, highly reliable, and scalable Windows file system share drive
  • supports SMB protocol & Windows NTFS
  • supports Microsoft Active Directory integration, ACLs, user quotas
  • built on SSD, scale up to 10s of GB/s, millions of IOPS, 100s PB of data
  • is accessible from Windows, Linux, and MacOS compute instances
  • can be accessed from on-premises infrastructure
  • can be configured to be Multi-AZ (high availability)
  • supports encryption of data at rest and in transit
  • provides data deduplication, which enables further cost optimization by removing redundant data.
  • data is backed-up daily to S3

Amazon FSx for Lustre

  • provides an easy and cost-effective way to launch and run the world’s most popular high-performance file system.
  • is a type of parallel distributed file system, for large-scale computing
  • Lustre is derived from “Linux” and “cluster”
  • use cases include Machine Learning, High Performance Computing (HPC) esp. Video Processing, Financial Modeling, and Electronic Design Automation
  • scales up to 100s GB/s, millions of IOPS, sub-ms latencies
  • seamless integration with S3: it transparently presents S3 objects as files and allows you to write changed data back to S3 (see the sketch after this list).
  • can “read S3” as a file system (through FSx)
  • can write the output of the computations back to S3 (through FSx)
  • supports encryption of data at rest and in transit
  • can be used from on-premises servers
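A minimal boto3 sketch of an S3-linked Lustre file system, assuming placeholder subnet and bucket names; the import/export paths apply to scratch deployment types:

```python
import boto3

fsx = boto3.client("fsx")

# Lustre file system linked to an S3 bucket: S3 objects appear as files,
# and processed output can be exported back to the bucket
fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,                    # GiB
    SubnetIds=["subnet-0123456789abcdef0"],  # placeholder subnet
    LustreConfiguration={
        "ImportPath": "s3://my-hpc-dataset",             # placeholder bucket
        "ExportPath": "s3://my-hpc-dataset/results",
    },
)
```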

CloudFront

  • provides low latency and high data transfer speeds for distribution of static, dynamic web or streaming content to web users
  • delivers the content through a worldwide network of data centers called Edge Locations
  • keeps persistent connections with the origin servers so that the files can be fetched from the origin servers as quickly as possible.
  • dramatically reduces the number of network hops that users’ requests must pass through
  • supports multiple origin server options, like AWS hosted services e.g. S3, EC2, ELB, or an on-premises server, which stores the original, definitive version of the objects
  • single distribution can have multiple origins and Path pattern in a cache behavior determines which requests are routed to the origin
  • supports Web Download distribution and RTMP Streaming distribution
    • Web distribution supports static, dynamic web content, on demand using progressive download & HLS and live streaming video content
    • RTMP supports streaming of media files using Adobe Media Server and the Adobe Real-Time Messaging Protocol (RTMP) ONLY
  • supports HTTPS using either
    • dedicated IP address, which is expensive as dedicated IP address is assigned to each CloudFront edge location
    • Server Name Indication (SNI), which is free but supported by modern browsers only with the domain name available in the request header
  • For E2E HTTPS connection,
    • Viewers -> CloudFront needs either self signed certificate, or certificate issued by CA or ACM
    • CloudFront -> Origin needs certificate issued by ACM for ELB and by CA for other origins
  •  Security
    • Origin Access Identity (OAI) can be used to restrict the content from S3 origin to be accessible from CloudFront only
    • supports Geo restriction (Geo-Blocking) to whitelist or blacklist countries that can access the content
    • Signed URLs 
      • for RTMP distribution as signed cookies aren’t supported
      • to restrict access to individual files, for e.g., an installation download for your application.
      • users using a client, for e.g. a custom HTTP client, that doesn’t support cookies
    • Signed Cookies
      • provide access to multiple restricted files, for e.g., video part files in HLS format or all of the files in the subscribers’ area of a website.
      • don’t want to change the current URLs
    • integrates with AWS WAF, a web application firewall that helps protect web applications from attacks by allowing rules configured based on IP addresses, HTTP headers, and custom URI strings
  • supports GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE to get object & object headers, add, update, and delete objects
    • only caches responses to GET and HEAD requests and, optionally, OPTIONS requests
    • does not cache responses to PUT, POST, PATCH, DELETE request methods and these requests are proxied back to the origin
  • object removal from cache
    • objects are removed upon expiry (TTL) from the cache, by default 24 hrs
    • objects can be invalidated explicitly, which has an associated cost; users might continue to see the old version until it expires from those caches (see the sketch after this list)
    • objects can be invalidated only for Web distributions
    • change the object name (versioning) to serve a different version
  • supports adding or modifying custom headers before the request is sent to origin which can be used to
    • validate if user is accessing the content from CDN
    • identify the CDN from which the request was forwarded, in case of multiple CloudFront distributions
    • for viewers not supporting CORS to return the Access-Control-Allow-Origin header for every request
  • supports Partial GET requests using range header to download object in smaller units improving the efficiency of partial downloads and recovery from partially failed transfers
  • supports compression to compress and serve compressed files when viewer requests include Accept-Encoding: gzip in the request header
  • supports different price classes: include all regions, include only the least expensive regions, or exclude the most expensive regions
  • supports access logs which contain detailed information about every user request for both web and RTMP distribution
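A minimal boto3 sketch of an explicit invalidation, assuming a placeholder distribution ID and path:

```python
import time
import boto3

cf = boto3.client("cloudfront")

# Explicitly invalidate a cached object before its TTL expires (has an associated cost)
cf.create_invalidation(
    DistributionId="E1ABCDEXAMPLE",           # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/index.html"]},
        "CallerReference": str(time.time()),  # unique token to prevent accidental replays
    },
)
```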

AWS Import/Export

  • accelerates moving large amounts of data into and out of AWS using portable storage devices for transport and transfers data directly using Amazon’s high speed internal network, bypassing the internet.
  • suitable for use cases with
    • large datasets
    • low bandwidth connections
    • first time migration of data
  • imports data to several types of AWS storage, including EBS snapshots, S3 buckets, and Glacier vaults.
  • exports data from S3 only; with versioning enabled, only the latest version is exported
  • import data can be encrypted (optional but recommended) while export is always encrypted using TrueCrypt
  • Amazon will wipe the device if specified; however, it will not destroy the device

AWS EBS Snapshot

EBS Snapshot

  • EBS provides the ability to create snapshots (backups) of any EBS volume and write a copy of the data in the volume to S3, where it is stored redundantly in multiple Availability Zones
  • Snapshots can be used to create new volumes, increase the size of the volumes or replicate data across Availability Zones
  • Snapshots are incremental backups and store only the data that was changed from the time the last snapshot was taken.
  • Snapshot size is likely to be smaller than the volume size, as the data is compressed before being saved to S3
  • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.
  • EBS Snapshots can be used to migrate or create EBS Volumes in different AZs or regions.

Multi-Volume Snapshots

  • Snapshots can be used to create a backup of critical workloads, such as a large database or a file system that spans across multiple EBS volumes.
  • Multi-volume snapshots helps take exact point-in-time, data coordinated, and crash-consistent snapshots across multiple EBS volumes attached to an EC2 instance.
  • It is no longer required to stop the instance or to coordinate between volumes to ensure crash consistency, because snapshots are automatically taken across multiple EBS volumes (see the sketch after this list).
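A minimal boto3 sketch of a multi-volume snapshot, assuming a placeholder instance ID:

```python
import boto3

ec2 = boto3.client("ec2")

# Crash-consistent snapshots across all EBS volumes attached to one instance
ec2.create_snapshots(
    InstanceSpecification={
        "InstanceId": "i-0123456789abcdef0",  # placeholder instance ID
        "ExcludeBootVolume": False,
    },
    Description="point-in-time backup of all attached volumes",
    CopyTagsFromSource="volume",
)
```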

EBS Snapshot creation

  • Snapshots can be created from EBS volumes periodically and are point-in-time snapshots.
  • Snapshots are incremental and only store the blocks on the device that changed since the last snapshot was taken
  • Snapshots occur asynchronously; the point-in-time snapshot is created immediately while it takes time to upload the modified blocks to S3
  • Snapshots can be taken from in-use volumes. However, snapshots will only capture data that was written to the EBS volume at the time the snapshot command is issued, excluding data cached by any applications or the OS
  • Recommended ways to create a Snapshot from an EBS volume are
    • Pause all file writes to the volume
    • Unmount the Volume -> Take Snapshot -> Remount the Volume
    • Stop the instance – Take Snapshot (for root EBS volumes)
  • EBS volume created based on a snapshot
    • begins as an exact replica of the original volume that was used to create the snapshot.
    • replicated volume loads data in the background so that it can be used immediately.
    • If data that hasn’t been loaded yet is accessed, the volume immediately downloads the requested data from S3, and then continues loading the rest of the volume’s data in the background.

EBS Snapshot Deletion

  • When a snapshot is deleted only the data exclusive to that snapshot is removed.
  • Deleting previous snapshots of a volume does not affect your ability to restore volumes from later snapshots of that volume.
  • Active snapshots contain all of the information needed to restore your data (from the time the snapshot was taken) to a new EBS volume.
  • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.
  • Snapshot of the root device of an EBS volume used by a registered AMI can’t be deleted. AMI needs to be deregistered to be able to delete the snapshot

EBS Snapshot Copy

  • Snapshots are constrained to the region in which they are created and can be used to launch EBS volumes within the same region only
  • Snapshots can be copied across regions to make it easier to leverage multiple regions for geographical expansion, data center migration, and disaster recovery
  • Snapshots are copied with S3 server-side encryption (256-bit Advanced Encryption Standard) to encrypt the data and the snapshot copy receives a snapshot ID that’s different from the original snapshot’s ID.
  • User-defined tags are not copied from the source to the new snapshot.
  • First Snapshot copy to another region is always a full copy, while the rest are always incremental.
  • When a snapshot is copied,
    • it can be encrypted if currently unencrypted or
    • can be encrypted using a different encryption key. Changing the encryption status of a snapshot or using a non-default EBS CMK during a copy operation always results in a full copy (not incremental); see the sketch after this list
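A minimal boto3 sketch of a cross-region copy that also changes the encryption key, assuming placeholder regions, snapshot ID, and key alias; note the call is made in the destination region:

```python
import boto3

# copy_snapshot is called in the DESTINATION region
ec2_west = boto3.client("ec2", region_name="us-west-2")

copy = ec2_west.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",  # placeholder snapshot ID
    Description="DR copy",
    Encrypted=True,                             # encrypt on the fly during the copy
    KmsKeyId="alias/my-ebs-key",                # non-default CMK forces a full copy
)
print(copy["SnapshotId"])                       # the copy gets a new snapshot ID
```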

EBS Snapshot Sharing

  • Snapshots can be shared by making them public or with specific AWS accounts by modifying the access permissions of the snapshots
  • Encrypted snapshots cannot be made available publicly.
  • Unencrypted snapshots can be shared publicly or with specific accounts; snapshots encrypted with the default CMK cannot be shared at all
  • An encrypted snapshot can be shared with specific AWS accounts, provided the custom CMK used to encrypt it is also shared (see the sketch after this list)
  • Cross-account permissions may be applied to a custom key either when it is created or at a later time.
  • Users, with access to snapshots, can copy the snapshot and create their own EBS volumes based on the snapshot while the original snapshot remains unaffected
  • AWS prevents you from sharing snapshots that were encrypted with the default CMK
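A minimal boto3 sketch of snapshot sharing, assuming placeholder snapshot and account IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Share an unencrypted snapshot with one specific AWS account
ec2.modify_snapshot_attribute(
    SnapshotId="snap-0123456789abcdef0",  # placeholder snapshot ID
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=["123456789012"],             # placeholder account ID
)

# Or make it public (not possible for encrypted snapshots)
ec2.modify_snapshot_attribute(
    SnapshotId="snap-0123456789abcdef0",
    Attribute="createVolumePermission",
    OperationType="add",
    GroupNames=["all"],
)
```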

EBS Snapshot Encryption

  • EBS snapshots fully support EBS encryption.
  • Snapshots of encrypted volumes are automatically encrypted
  • Volumes created from encrypted snapshots are automatically encrypted
  • All data in flight between the instance and the volume is encrypted
  • Volumes created from an unencrypted snapshot that you own or have access to can be encrypted on-the-fly.
  • Unencrypted snapshot you own, can be encrypted during the copy process
  • Encrypted snapshot that you own or have access to, can be encrypted with a different key during the copy process.
  • First snapshot of an encrypted volume that has been created from an unencrypted snapshot is always a full snapshot.
  • First snapshot of a reencrypted volume, which has a different CMK compared to the source snapshot, is always a full snapshot.

EBS Snapshot Lifecycle Automation

  • Amazon Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
  • Automating snapshot management helps you to:
    • Protect valuable data by enforcing a regular backup schedule.
    • Retain backups as required by auditors or internal compliance.
    • Reduce storage costs by deleting outdated backups (see the sketch after this list).
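A minimal boto3 sketch of a Data Lifecycle Manager policy, assuming placeholder role ARN and tag values; it snapshots all volumes tagged Backup=true daily at 03:00 UTC and keeps the last 7:

```python
import boto3

dlm = boto3.client("dlm")

dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="daily EBS volume backups",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],  # placeholder tag
        "Schedules": [{
            "Name": "DailySnapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},   # delete outdated backups automatically
            "CopyTags": True,
        }],
    },
)
```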


AWS Certification Exam Practice Questions

    • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
    • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
    • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
    • Open to further feedback, discussion and correction.
  1. An existing application stores sensitive information on a non-boot Amazon EBS data volume attached to an Amazon Elastic Compute Cloud instance. Which of the following approaches would protect the sensitive data on an Amazon EBS volume?
    1. Upload your customer keys to AWS CloudHSM. Associate the Amazon EBS volume with AWS CloudHSM. Remount the Amazon EBS volume.
    2. Create and mount a new, encrypted Amazon EBS volume. Move the data to the new volume. Delete the old Amazon EBS volume.
    3. Unmount the EBS volume. Toggle the encryption attribute to True. Re-mount the Amazon EBS volume.
    4. Snapshot the current Amazon EBS volume. Restore the snapshot to a new, encrypted Amazon EBS volume. Mount the Amazon EBS volume (Need to create a snapshot, create an encrypted copy of the snapshot and then create an EBS volume and mount it)
  2. Is it possible to access your EBS snapshots?
    1. Yes, through the Amazon S3 APIs.
    2. Yes, through the Amazon EC2 APIs
    3. No, EBS snapshots cannot be accessed; they can only be used to create a new EBS volume.
    4. EBS doesn’t provide snapshots.
  3. Which of the following approaches provides the lowest cost for Amazon Elastic Block Store snapshots while giving you the ability to fully restore data?
    1. Maintain two snapshots: the original snapshot and the latest incremental snapshot
    2. Maintain a volume snapshot; subsequent snapshots will overwrite one another
    3. Maintain a single snapshot; the latest snapshot is both incremental and complete
    4. Maintain the most current snapshot, archive the original and incremental to Amazon Glacier.
  4. Which procedure for backing up a relational database on EC2 that is using a set of RAIDed EBS volumes for storage minimizes the time during which the database cannot be written to and results in a consistent backup?
    1. Detach EBS volumes, 2. Start EBS snapshot of volumes, 3. Re-attach EBS volumes
    2. Stop the EC2 Instance. 2. Snapshot the EBS volumes
    3. Suspend disk I/O, 2. Create an image of the EC2 Instance, 3. Resume disk I/O
    4. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Resume disk I/O
    5. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Wait for snapshots to complete, 4. Resume disk I/O
  5. How can an EBS volume that is currently attached to an EC2 instance be migrated from one Availability Zone to another?
    1. Detach the volume and attach it to another EC2 instance in the other AZ.
    2. Simply create a new volume in the other AZ and specify the original volume as the source.
    3. Create a snapshot of the volume, and create a new volume from the snapshot in the other AZ
    4. Detach the volume, then use the ec2-migrate-volume command to move it to another AZ.
  6. How are the EBS snapshots saved on Amazon S3?
    1. Exponentially
    2. Incrementally
    3. EBS snapshots are not stored in the Amazon S3
    4. Decrementally
  7. EBS Snapshots occur _____
    1. Asynchronously
    2. Synchronously
    3. Weekly
  8. What will be the status of the snapshot until the snapshot is complete?
    1. Running
    2. Working
    3. Progressing
    4. Pending
  9. Before I delete an EBS volume, what can I do if I want to recreate the volume later?
    1. Create a copy of the EBS volume (not a snapshot)
    2. Create and Store a snapshot of the volume
    3. Download the content to an EC2 instance
    4. Back up the data in to a physical disk
  10. Which of the following are true regarding encrypted Amazon Elastic Block Store (EBS) volumes? Choose 2 answers
    1. Supported on all Amazon EBS volume types
    2. Snapshots are automatically encrypted
    3. Available to all instance types
    4. Existing volumes can be encrypted
    5. Shared volumes can be encrypted
  11. Amazon EBS snapshots have which of the following two characteristics? (Choose 2.) Choose 2 answers
    1. EBS snapshots only save incremental changes from snapshot to snapshot
    2. EBS snapshots can be created in real-time without stopping an EC2 instance (the snapshot can be taken real time however it will not be consistent and the recommended way is to stop or freeze the IO)
    3. EBS snapshots can only be restored to an EBS volume of the same size or smaller (EBS volumes restored from snapshots need to be of the same or larger size)
    4. EBS snapshots can only be restored and mounted to an instance in the same Availability Zone as the original EBS volume (Snapshots are specific to a region and can be used to create a volume in any AZ; they do not depend on the original EBS volume’s AZ)
  12. A user is planning to schedule a backup for an EBS volume. The user wants security of the snapshot data. How can the user achieve data encryption with a snapshot?
    1. Use encrypted EBS volumes so that the snapshot will be encrypted by AWS (Refer link)
    2. While creating a snapshot select the snapshot with encryption
    3. By default the snapshot is encrypted by AWS
    4. Enable server side encryption for the snapshot using S3
  13. A sys admin is trying to understand EBS snapshots. Which of the below mentioned statements will not be useful to the admin to understand the concepts about a snapshot?
    1. Snapshot is synchronous
    2. It is recommended to stop the instance before taking a snapshot for consistent data
    3. Snapshot is incremental
    4. Snapshot captures the data that has been written to the hard disk when the snapshot command was executed
  14. When creation of an EBS snapshot is initiated but not completed, the EBS volume
    1. Cannot be detached or attached to an EC2 instance until the snapshot completes
    2. Can be used in read-only mode while the snapshot is in progress
    3. Can be used while the snapshot is in progress
    4. Cannot be used until the snapshot completes
  15. You have a server with a 500 GB Amazon EBS data volume. The volume is 80% full. You need to back up the volume at regular intervals and be able to re-create the volume in a new Availability Zone in the shortest time possible. All applications using the volume can be paused for a period of a few minutes with no discernible user impact. Which of the following backup methods will best fulfill your requirements?
    1. Take periodic snapshots of the EBS volume
    2. Use a third-party Incremental backup application to back up to Amazon Glacier
    3. Periodically back up all data to a single compressed archive and archive to Amazon S3 using a parallelized multi-part upload
    4. Create another EBS volume in the second Availability Zone, attach it to the Amazon EC2 instance, and use a disk manager to mirror the two disks
  16. A user is creating a snapshot of an EBS volume. Which of the below statements is incorrect in relation to the creation of an EBS snapshot?
    1. It’s incremental
    2. It can be used to launch a new instance
    3. It is stored in the same AZ as the volume (stored in the same region)
    4. It is a point in time backup of the EBS volume
  17. A user has created a snapshot of an EBS volume. Which of the below mentioned usage cases is not possible with respect to a snapshot?
    1. Mirroring the volume from one AZ to another AZ
    2. Launch an instance
    3. Decrease the volume size
    4. Increase the size of the volume
  18. What is true of the way that encryption works with EBS?
    1. Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    2. Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    3. Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot always creates an encrypted volume.
    4. Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot always creates an encrypted volume.
  19. Why are more frequent snapshots of EBS Volumes faster?
    1. Blocks in EBS Volumes are allocated lazily, since while logically separated from other EBS Volumes, Volumes often share the same physical hardware. Snapshotting the first time forces full block range allocation, so the second snapshot doesn’t need to perform the allocation phase and is faster.
    2. The snapshots are incremental so that only the blocks on the device that have changed after your last snapshot are saved in the new snapshot.
    3. AWS provisions more disk throughput for burst capacity during snapshots if the drive has been pre-warmed by snapshotting and reading all blocks.
    4. The drive is pre-warmed, so block access is more rapid for volumes when every block on the device has already been read at least one time.
  20. Which is not a restriction on AWS EBS Snapshots?
    1. Snapshots which are shared cannot be used as a basis for other snapshots (Snapshots shared with other users are usable in full by the recipient, including but not limited to the ability to base modified volumes and snapshots on them)
    2. You cannot share a snapshot containing an AWS Access Key ID or AWS Secret Access Key
    3. You cannot share encrypted snapshots (NOTE: this has been updated; you can now share an encrypted snapshot with specific accounts)
    4. Snapshot restorations are restricted to the region in which the snapshots are created
  21. There is a very serious outage at AWS. EC2 is not affected, but your EC2 instance deployment scripts stopped working in the region with the outage. What might be the issue?
    1. The AWS Console is down, so your CLI commands do not work.
    2. S3 is unavailable, so you can’t create EBS volumes from a snapshot you use to deploy new volumes. (EBS volume snapshots are stored in S3. If S3 is unavailable, snapshots are unavailable)
    3. AWS turns off the DeployCode API call when there are major outages, to protect from system floods.
    4. None of the other answers make sense. If EC2 is not affected, it must be some other issue.

AWS EBS Performance – Certification

AWS EBS Performance Tips

EBS performance depends on several factors, including I/O characteristics and the configuration of instances and volumes, and can be improved using PIOPS, EBS-Optimized instances, pre-warming, and RAIDed configurations.

EBS-Optimized or 10 Gigabit Network Instances

  • An EBS-Optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for EBS I/O.
  • Optimization provides the best performance for the EBS volumes by minimizing contention between EBS I/O and other traffic from an instance.
  • EBS-Optimized instances deliver dedicated throughput to EBS, with options between 500 Mbps and 4,000 Mbps, depending on the instance type used
  • Not all instance types support EBS-Optimization
  • Some instance types enable EBS-Optimization by default, while for others it can be enabled explicitly (see the sketch after this list).
  • When EBS-Optimization is enabled for an instance that is not EBS-Optimized by default, an additional low hourly fee for the dedicated capacity is charged
  • When attached to an EBS–optimized instance,
    • General Purpose (SSD) volumes are designed to deliver within 10% of their baseline and burst performance 99.9% of the time in a given year
    • Provisioned IOPS (SSD) volumes are designed to deliver within 10% of their provisioned performance 99.9 percent of the time in a given year.
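A minimal boto3 sketch of enabling EBS-Optimization explicitly, assuming a placeholder instance ID; the instance must be stopped before the attribute can be changed:

```python
import boto3

ec2 = boto3.client("ec2")

# The attribute can only be modified while the instance is stopped
ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholder instance ID
ec2.get_waiter("instance_stopped").wait(InstanceIds=["i-0123456789abcdef0"])

# Enable dedicated EBS throughput on an instance type that supports it
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    EbsOptimized={"Value": True},
)
ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])
```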

EBS Volume Initialization – Pre-warming

  • New EBS volumes receive their maximum performance the moment that they are available and DO NOT require initialization (pre-warming).
  • Previously, EBS volumes needed pre-warming before being used, to deliver maximum performance from the start. Pre-warming was done by writing zeroes to the entire volume for new volumes, or by reading the entire volume for volumes restored from snapshots
  • Storage blocks on volumes that were restored from snapshots must be initialized (pulled down from S3 and written to the volume) before the block can be accessed
  • This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed.

RAID Configuration

  • EBS volumes can be striped if a single EBS volume does not meet the required performance.
  • Striping volumes allows pushing tens of thousands of IOPS.
  • EBS volumes are already replicated across multiple servers in an AZ for availability and durability, so AWS generally recommend striping for performance rather than durability.
  • For greater I/O performance than can be achieved with a single volume, RAID 0 can stripe multiple volumes together; for on-instance redundancy, RAID 1 can mirror two volumes together.
  • RAID 0 allows I/O distribution across all volumes in a stripe, allowing straight gains with each addition.
  • RAID 1 can be used for durability to mirror volumes, but in this case, it requires more EC2 to EBS bandwidth as the data is written to multiple volumes simultaneously and should be used with EBS–optimization.
  • EBS volume data is replicated across multiple servers in an AZ to prevent the loss of data from the failure of any single component
  • AWS doesn’t recommend RAID 5 and 6 because the parity write operations of these modes consume the IOPS available for the volumes and can result in 20-30% fewer usable IOPS than a RAID 0.
  • A 2-volume RAID 0 config can outperform a 4-volume RAID 6 that costs twice as much.


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A user is trying to pre-warm a blank EBS volume attached to a Linux instance. Which of the below mentioned steps should be performed by the user?
    1. There is no need to pre-warm an EBS volume (with latest update no pre-warming is needed)
    2. Contact AWS support to pre-warm (This used to be the case before, but pre warming is not necessary now)
    3. Unmount the volume before pre-warming
    4. Format the device
  2. A user has created an EBS volume of 10 GB and attached it to a running instance. The user is trying to access the EBS volume for the first time. Which of the below mentioned options is the correct statement with respect to first-time EBS access?
    1. The volume will show a size of 8 GB
    2. The volume will show a loss of the IOPS performance the first time (new volumes previously needed to be wiped clean; pre-warming is not needed any more)
    3. The volume will be blank
    4. If the EBS is mounted it will ask the user to create a file system
  3. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  4. You have launched an EC2 instance with four (4) 500 GB EBS Provisioned IOPS volumes attached. The EC2 instance is EBS-Optimized and supports 500 Mbps throughput between EC2 and EBS. The four EBS volumes are configured as a single RAID 0 device, and each Provisioned IOPS volume is provisioned with 4,000 IOPS (4,000 16KB reads or writes) for a total of 16,000 random IOPS on the instance. The EC2 instance initially delivers the expected 16,000 IOPS random read and write performance. Sometime later, in order to increase the total random I/O performance of the instance, you add an additional two 500 GB EBS Provisioned IOPS volumes to the RAID. Each volume is provisioned to 4,000 IOPS like the original four, for a total of 24,000 IOPS on the EC2 instance. Monitoring shows that the EC2 instance CPU utilization increased from 50% to 70%, but the total random IOPS measured at the instance level does not increase at all. What is the problem and a valid solution?
    1. Larger storage volumes support higher Provisioned IOPS rates: increase the provisioned volume storage of each of the 6 EBS volumes to 1TB.
    2. EBS-Optimized throughput limits the total IOPS that can be utilized use an EBS-Optimized instance that provides larger throughput. (EC2 Instance types have limit on max throughput and would require larger instance types to provide 24000 IOPS)
    3. Small block sizes cause performance degradation, limiting the I/O throughput; configure the instance device driver and file system to use 64KB blocks to increase throughput.
    4. RAID 0 only scales linearly to about 4 devices; use RAID 0 with 4 EBS Provisioned IOPS volumes but increase each Provisioned IOPS EBS volume to 6,000 IOPS.
    5. The standard EBS instance root volume limits the total IOPS rate; change the instance root volume to also be a 500GB 4,000 Provisioned IOPS volume
  5. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS provisioned with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS

AWS Glacier – Certification

AWS Glacier

  • Amazon Glacier is a storage service optimized for archival, infrequently used data, or “cold data.”
  • Glacier is an extremely low-cost storage service that provides durable storage with security features for data archiving and backup.
  • Glacier is designed to provide average annual durability of 99.999999999% for an archive.
  • Glacier redundantly stores data in multiple facilities and on multiple devices within each facility.
  • To increase durability, Glacier synchronously stores the data across multiple facilities before returning SUCCESS on uploading archives.
  • Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.
  • Glacier enables customers to offload the administrative burdens of operating and scaling storage to AWS, without having to worry about capacity planning, hardware provisioning, data replication, hardware failure detection and recovery, or time-consuming hardware migrations.
  • Glacier is a great storage choice when low storage cost is paramount, with data rarely retrieved, and retrieval latency of several hours is acceptable.
  • Glacier now offers a range of data retrieval options where the retrieval time varies from a few minutes to several hours
  • S3 should be used if applications require fast, frequent, real-time access to the data
  • Glacier can store virtually any kind of data in any format.
  • All data is encrypted on the server side with Glacier handling key management and key protection. It uses AES-256, one of the strongest block ciphers available
  • Glacier allows interaction through the AWS Management Console, Command Line Interface (CLI), SDKs, or REST-based APIs.
    • the management console can only be used to create and delete vaults.
    • the rest of the operations to upload and download data, or create jobs for retrieval, need the CLI, SDKs, or REST-based APIs
  • Use cases include
    • Digital media archives
    • Data that must be retained for regulatory compliance
    • Financial and healthcare records
    • Raw genomic sequence data
    • Long-term database backups

Amazon Glacier Data Model

  • Amazon Glacier data model core concepts include vaults and archives, and also include job and notification-configuration resources
    • Vault
      • A vault is a container for storing archives
      • Each vault resource has a unique address, which comprises the region the vault was created in and the unique vault name within the region and account, e.g. https://glacier.us-west-2.amazonaws.com/111122223333/vaults/examplevault
      • Vault allows storage of unlimited number of archives
      • Glacier supports various vault operations which are region specific
      • An AWS account can create up to 1,000 vaults per region.
    • Archive
      • An archive can be any data such as a photo, video, or document and is a base unit of storage in Glacier.
      • Each archive has a unique ID and an optional description, which can only be specified during the upload of an archive.
      • Glacier assigns the archive an ID, which is unique in the AWS region in which it is stored.
      • An archive can be uploaded in a single request, while for large archives Glacier provides a multipart upload API that enables uploading an archive in parts.
    • Jobs
      • A Job is required to retrieve an Archive and vault inventory list
      • Data retrieval requests are asynchronous operations, are queued and most jobs take about four hours to complete.
      • A job is first initiated and then the output of the job is downloaded after the job completes
      • Vault inventory jobs need the vault name
      • Data retrieval jobs need both the vault name and the archive ID, with an optional description
      • A vault can have multiple jobs in progress at any point in time; each can be identified by the Job ID assigned when it is created, for tracking
      • Glacier maintains job information such as job type, description, creation date, completion date, and job status and can be queried
      • After the job completes, the job output can be downloaded in full or partially by specifying a byte range.
    • Notification Configuration
      • As the jobs are asynchronous, Glacier supports a notification mechanism that publishes to an SNS topic when a job completes (see the sketch after this list)
      • SNS topic for notification can either be specified with each individual job request or with the vault
      • Glacier stores the notification configuration as a JSON document
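A minimal boto3 sketch of a vault and an asynchronous inventory-retrieval job with an SNS completion notification, assuming placeholder vault name and topic ARN:

```python
import boto3

glacier = boto3.client("glacier")

# Create a vault (the account ID defaults to the caller's own account)
glacier.create_vault(vaultName="examplevault")

# Initiate an asynchronous vault inventory job, notifying an SNS topic on completion
job = glacier.initiate_job(
    vaultName="examplevault",
    jobParameters={
        "Type": "inventory-retrieval",
        "SNSTopic": "arn:aws:sns:us-west-2:111122223333:glacier-jobs",  # placeholder topic
    },
)
print(job["jobId"])  # used later to poll status and fetch the job output
```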

Glacier Data Retrievals Options

Glacier provides three options for retrieving data with varying access times and cost: Expedited, Standard, and Bulk retrievals (see the sketch after these options).

Standard retrievals

  • Standard retrievals allow access to any of the archives within several hours.
  • Standard retrievals typically complete within 3-5 hours.

Bulk retrievals

  • Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling retrieval of large amounts, even petabytes, of data inexpensively in a day.
  • Bulk retrievals typically complete within 5 – 12 hours.

Expedited retrievals

  • Expedited retrievals allows quick access to the data when occasional urgent requests for a subset of archives are required.
  • For all but the largest archives (250MB+), data accessed using Expedited retrievals are typically made available within 1 – 5 minutes.
  • There are two types of Expedited retrievals: On-Demand and Provisioned.
    • On-Demand requests are like EC2 On-Demand instances and are available the vast majority of the time.
    • Provisioned requests are guaranteed to be available when needed
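A minimal boto3 sketch of choosing a retrieval tier per job, assuming placeholder vault name and archive ID:

```python
import boto3

glacier = boto3.client("glacier")

# Tier is chosen per retrieval job: "Expedited" (minutes),
# "Standard" (3-5 hours), or "Bulk" (5-12 hours, cheapest)
job = glacier.initiate_job(
    vaultName="examplevault",
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",  # placeholder archive ID
        "Tier": "Bulk",
    },
)
print(job["jobId"])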

Glacier Supported Operations

Vault Operations

  • Glacier provides operations to create and delete vaults.
  • A vault can be deleted only if there are no archives in the vault as of the last computed inventory and there have been no writes to the vault since the last inventory (as the inventory is prepared periodically)
  • Vault Inventory
    • Vault inventory helps retrieve a list of archives in a vault with information such as archive ID, creation date, and size for each archive
    • Inventory for each vault is prepared periodically, every 24 hours
    • Vault inventory is updated approximately once a day, starting on the day the first archive is uploaded to the vault.
    • When a vault inventory job is run, Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.
  • Vault Metadata or Description can also be obtained for a specific vault or for all vaults in a region, which provides information such as
    • creation date,
    • number of archives in the vault,
    • total size in bytes used by all the archives in the vault,
    • and the date the vault inventory was generated
  • Glacier also provides operations to set, retrieve, and delete a notification configuration on the vault. Notifications can be used to identify vault events.

Archive Operations

  • Glacier provides operations to upload, download and delete archives.

Uploading an Archive

  • An archive can be uploaded in a single operation (1 byte up to 4 GB in size) or in parts, referred to as multipart upload (up to 40 TB); see the sketch after this list
  • Multipart Upload helps to
    • improve the upload experience for larger archives.
    • upload archives in parts, independently, in parallel, and in any order
    • recover faster by needing to upload only the part that failed, not the entire archive.
    • upload archives without even knowing the final size
    • upload archives from 1 byte to about 40,000 GB (10,000 parts * 4 GB) in size
  • To upload existing data to Glacier, consider using the AWS Import/Export service, which accelerates moving large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network, bypassing the Internet.
  • Glacier returns a response that includes an archive ID which is unique in the region in which the archive is stored
  • Glacier does not support any additional metadata information apart from an optional description. Any additional metadata information required should be maintained on the client side
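A minimal boto3 sketch of a single-operation upload, assuming placeholder vault and file names; note the description is the only metadata Glacier keeps, so the returned archive ID must be stored client-side:

```python
import boto3

glacier = boto3.client("glacier")

# Single-operation upload (1 byte up to 4 GB)
with open("trial-results.tar", "rb") as f:
    resp = glacier.upload_archive(
        vaultName="examplevault",
        archiveDescription="drug trial archive",  # optional, set only at upload time
        body=f,
    )

# Persist the archive ID client-side; it is required for any later retrieval or deletion
archive_id = resp["archiveId"]
```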

Downloading an Archive

  • Downloading an archive is an asynchronous operation and is a two-step process (see the sketch after this list)
    • Initiate an archive retrieval job
      • When a Job is initiated, a job ID is returned as a part of the response
      • Job is executed asynchronously and the output can be downloaded after the job completes
      • Job can be initiated to download the entire archive or a portion of the archive
    • After the job completes, download the bytes
      • The archive can be downloaded in full, or a specific byte range can be specified to download only a portion of the output
      • Downloading the archive in chunks helps in the event of a download failure, as only the failed part needs to be downloaded again
      • Job completion status can be checked by
        • Check status explicitly (Not Recommended)
          • periodically poll the describe job operation request to obtain job information
        • Completion notification
          • An SNS topic can be specified, when the job is initiated or with the vault, to be used to notify job completion

About Range Retrievals

  • Amazon Glacier allows retrieving an archive either in whole (the default) or as a range, i.e. a portion
  • Range retrievals need a range to be provided that is megabyte aligned
  • Glacier returns a checksum in the response, which can be used to detect errors in the download by comparing it with a checksum computed on the client side
  • Specifying a range of bytes can be helpful when:
    • Control bandwidth costs
      • Glacier allows retrieval of up to 5 percent of the average monthly storage (pro-rated daily) for free each month
      • Scheduling range retrievals can help in two ways.
        • meet the monthly free allowance of 5 percent by spreading out the data requested
        • if the amount of data to be retrieved exceeds the free allowance, scheduling range retrievals enables reduction of the peak retrieval rate, which determines the retrieval fees.
    • Manage your data downloads
      • Glacier allows retrieved data to be downloaded for 24 hours after the retrieval request completes
      • Retrieving only portions of the archive allows the schedule of downloads to be managed within the given download window.
    • Retrieve a targeted part of a large archive
      • Retrieving an archive in ranges can be useful if an archive was uploaded as an aggregate of multiple individual files, and only a few files need to be retrieved

Deleting an Archive

  • Archives can be deleted from a vault only one at a time
  • This operation is idempotent. Deleting an already-deleted archive does not result in an error
  • AWS applies a pro-rated charge for items deleted prior to 90 days, as Glacier is meant for long-term storage

Updating an Archive

  • An existing archive cannot be updated; it must be deleted and re-uploaded, and the new upload is assigned a new archive ID

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What is Amazon Glacier?
    1. You mean Amazon “Iceberg”: it’s a low-cost storage service.
    2. A security tool that allows to “freeze” an EBS volume and perform computer forensics on it.
    3. A low-cost storage service that provides secure and durable storage for data archiving and backup
    4. It’s a security tool that allows to “freeze” an EC2 instance and perform computer forensics on it.
  2. Amazon Glacier is designed for: (Choose 2 answers)
    1. Active database storage
    2. Infrequently accessed data
    3. Data archives
    4. Frequently accessed data
    5. Cached session data
  3. An organization is generating digital policy files which are required by the admins for verification. Once the files are verified they may not be required in the future unless there is some compliance issue. If the organization wants to save them in a cost effective way, which is the best possible solution?
    1. AWS RRS
    2. AWS S3
    3. AWS RDS
    4. AWS Glacier
  4. A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
    1. The archive will be available as an object for the duration specified by the user during the restoration request
    2. The restored object’s storage class will be RRS (After the object is restored, the storage class still remains GLACIER)
    3. The user can modify the restoration period only by issuing a new restore request with the updated period
    4. The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
  5. To meet regulatory requirements, a pharmaceuticals company needs to archive data after a drug trial test is concluded. Each drug trial test may generate up to several thousands of files, with compressed file sizes ranging from 1 byte to 100MB. Once archived, data rarely needs to be restored, and on the rare occasion when restoration is needed, the company has 24 hours to restore specific files that match certain metadata. Searches must be possible by numeric file ID, drug name, participant names, date ranges, and other metadata. Which is the most cost-effective architectural approach that can meet the requirements?
    1. Store individual files in Amazon Glacier, using the file ID as the archive name. When restoring data, query the Amazon Glacier vault for files matching the search criteria. (Individual files are expensive and do not allow searching by participant names etc.)
    2. Store individual files in Amazon S3, and store search metadata in an Amazon Relational Database Service (RDS) multi-AZ database. Create a lifecycle rule to move the data to Amazon Glacier after a certain number of days. When restoring data, query the Amazon RDS database for files matching the search criteria, and move the files matching the search criteria back to S3 Standard class. (As the data is rarely needed, it can be stored in Glacier directly, and the data need not be moved back to S3 Standard)
    3. Store individual files in Amazon Glacier, and store the search metadata in an Amazon RDS multi-AZ database. When restoring data, query the Amazon RDS database for files matching the search criteria, and retrieve the archive name that matches the file ID returned from the database query. (Individual files and Multi-AZ is expensive)
    4. First, compress and then concatenate all files for a completed drug trial test into a single Amazon Glacier archive. Store the associated byte ranges for the compressed files along with other search metadata in an Amazon RDS database with regular snapshotting. When restoring data, query the database for files that match the search criteria, and create restored files from the retrieved byte ranges.
    5. Store individual compressed files and search metadata in Amazon Simple Storage Service (S3). Create a lifecycle rule to move the data to Amazon Glacier, after a certain number of days. When restoring data, query the Amazon S3 bucket for files matching the search criteria, and retrieve the file to S3 reduced redundancy in order to move it back to S3 Standard class. (Once the data is moved from S3 to Glacier the metadata is lost, as Glacier does not support metadata, which must be maintained externally)
  6. A user is uploading archives to Glacier. The user is trying to understand key Glacier resources. Which of the below mentioned options is not a Glacier resource?
    1. Notification configuration
    2. Archive ID
    3. Job
    4. Archive

AWS Storage Options – EBS & Instance Store

Amazon Elastic Block Store (EBS) volumes

  • EBS provides a durable block-level storage for use with Amazon EC2 instances (virtual machines)
  • EBS volumes are off-instance, network-attached block storage that persists independently from the running life of a single Amazon EC2 instance
  • Once an Amazon EBS volume is attached to an instance, it can be used like a physical hard drive, typically by formatting it with the file system of your choice and using the file I/O interface provided by the instance operating system.
  • An Amazon EBS volume can be used to boot an Amazon EC2 instance (Amazon EBS-backed AMIs only), and multiple Amazon EBS volumes can be attached to a single Amazon EC2 instance.
  • An Amazon EBS volume can be attached to a single EC2 instance only at any point in time
  • Amazon EBS provides the ability to take point-in-time snapshots, which are persisted in S3. These snapshots can be used to instantiate new volumes and to protect data for long-term durability
  • EBS snapshots can be copied across AWS regions as well, making it easier to leverage multiple AWS regions for geographical expansion, data center migration and disaster recovery (a snapshot sketch follows below)
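
For instance, snapshotting a volume and copying it to another region takes two boto3 calls; the volume ID and regions below are hypothetical.

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Point-in-time snapshot of the volume, persisted in S3 behind the scenes
snap = ec2.create_snapshot(VolumeId='vol-0123456789abcdef0',  # hypothetical volume
                           Description='pre-migration backup')
ec2.get_waiter('snapshot_completed').wait(SnapshotIds=[snap['SnapshotId']])

# Copy the completed snapshot into another region for DR or migration
ec2_west = boto3.client('ec2', region_name='us-west-2')
copy = ec2_west.copy_snapshot(SourceRegion='us-east-1',
                              SourceSnapshotId=snap['SnapshotId'],
                              Description='DR copy')
print(copy['SnapshotId'])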

Ideal Usage Patterns

  • Amazon EBS is meant for data that changes relatively frequently and requires long-term persistence.
  • Amazon EBS volumes provide access to raw block-level storage and are particularly well-suited for use as the primary storage for a database or file system
  • Amazon EBS Provisioned IOPS volumes are particularly well-suited for use with database applications that require a high and consistent rate of random disk reads and writes

Anti-Patterns

  • Temporary Storage
    • Amazon EBS volume persists independent of the attached EC2 life cycle.
    • For temporary storage such as caches, buffers, queues etc. it is better to use local instance store volumes, SQS or ElastiCache
  • Highly-durable storage
    • Amazon EBS volumes with less than 20 GB of modified data since the last snapshot are designed for between 99.5% and 99.9% annual durability; volumes with more modified data can be expected to have proportionally lower durability
    • For highly-durable storage, use Amazon S3 or Amazon Glacier which provides 99.999999999% annual durability per object
  • Static data or web content
    • For static web content, where data changes infrequently, EBS with EC2 would require a web server to serve the pages.
    • Amazon S3 may represent a more cost-effective and scalable solution for storing this fixed information, which can be served directly out of S3.

Performance

  • Amazon EBS provides two volume types: standard volumes and Provisioned IOPS volumes which differ in performance characteristics and pricing model, allowing you to tailor your storage performance and cost to the needs of your applications.
  • Data can be striped across multiple similarly-provisioned Amazon EBS volumes using RAID 0 or logical volume manager software, thus aggregating available IOPS, total volume throughput, and total volume size.
  • Standard volumes offer cost effective storage for applications with moderate or bursty I/O requirements. Standard volumes are also well suited for use as boot volumes, where the burst capability provides fast instance start-up times.
  • Provisioned IOPS volumes are designed to deliver predictable, high performance for I/O intensive workloads such as databases. With Provisioned IOPS, you specify an IOPS rate when creating a volume, and then Amazon EBS provisions that rate for the lifetime of the volume.
  • As Amazon EBS volumes are network-attached devices, other network I/O performed by the instance, as well as the total load on the shared network, can affect individual Amazon EBS volume performance. EBS-optimized instances can be launched, which deliver dedicated throughput between Amazon EC2 and Amazon EBS and enable instances to fully utilize the Provisioned IOPS on an Amazon EBS volume.
  • Each separate Amazon EBS volume can be configured as Amazon EBS standard or Amazon EBS Provisioned IOPS as needed. Alternatively, data can be striped across multiple volumes as described above.

Durability & Availability

  • Amazon EBS volumes are designed to be highly available and reliable
  • EBS volume data is replicated across multiple servers in a single Availability Zone to prevent the loss of data from the failure of any single component
  • Durability of your Amazon EBS volume depends on both the size of your volume and the amount of data that has changed since your last snapshot
  • Amazon EBS snapshots are incremental, point-in-time backups, containing only the data blocks changed since the last snapshot. Amazon EBS volumes that operate with 20 GB or less of modified data since their most recent snapshot can expect an annual failure rate (AFR) between 0.1% and 0.5%. Amazon EBS volumes with more than 20 GB of modified data since the last snapshot should expect higher failure rates that are roughly proportional to the increase in modified data.
  • It is therefore recommended to create frequent snapshots to maximize both the durability and availability of Amazon EBS data
  • Amazon EBS snapshots provides an easy-to-use disk clone or disk image mechanism for backup, sharing, and disaster-recovery.

Cost Model

  • Amazon EBS pricing has 3 components: provisioned storage, I/O requests and snapshot storage
  • Standard volumes are charged per GB-month of provisioned storage and per million I/O requests
  • EBS Provisioned IOPS volumes are charged per GB-month of provisioned storage and per Provisioned IOPS-month
  • For both volumes, Amazon EBS snapshots are charged per GB-month of data stored. Amazon EBS snapshot copy is charged for the data transferred between regions, and for the standard Amazon EBS snapshot charges in the destination region.
  • For an EBS volume, all storage is allocated at the time of volume creation, and you are charged for this allocated storage even if you don’t write data to it.
  • For Amazon EBS snapshots, you are charged only for storage actually used (consumed). Note that Amazon EBS snapshots are incremental and compressed, so the storage used in any snapshot is generally much less than the storage consumed on an Amazon EBS volume

Scalability and Elasticity

  • Amazon EBS volumes can easily and rapidly be provisioned and released to scale in and out with the changing total storage demands
  • Amazon EBS volumes cannot be resized, and if additional storage is needed either
    • An additional volume can be attached
    • Create a snapshot and create a new volume from the snapshot with a higher volume size
  • NOTE: With the latest AWS enhancements, EBS volumes can now be resized and enlarged in place (a resize sketch follows).
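
A hedged boto3 sketch of such an in-place resize; the volume ID and target size are placeholders.

import boto3

ec2 = boto3.client('ec2')
VOLUME_ID = 'vol-0123456789abcdef0'  # hypothetical volume

# Grow the volume in place; no detach, snapshot, or downtime required
ec2.modify_volume(VolumeId=VOLUME_ID, Size=200)  # new size in GiB

# Track the modification until the extra capacity is usable
mods = ec2.describe_volumes_modifications(VolumeIds=[VOLUME_ID])
print(mods['VolumesModifications'][0]['ModificationState'])

# Note: the file system must still be extended from within the instance
# afterwards (e.g. growpart plus resize2fs or xfs_growfs on Linux)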

Interfaces

  • Amazon offers management APIs for Amazon EBS in both SOAP and REST formats which can be used to create, delete, describe, attach, and detach Amazon EBS volumes for your Amazon EC2 instances as well as to create, delete, and describe snapshots from Amazon EBS to Amazon S3; and to copy snapshots from one region to another
  • Amazon also offers the same capabilities through AWS Management Console

Instance Store Volumes

  • Instance Store volumes, also referred to as Ephemeral Storage, provide temporary block-level storage and consist of a preconfigured and pre-attached block of disk storage on the same physical server as the EC2 instance
  • The amount of instance store disk storage depends on the instance type, and larger instances provide both more and larger instance store volumes. Smaller instance types such as micro instances can only be launched with EBS volumes.
  • Storage-optimized instances provide special-purpose instance storage targeted at specific use cases, e.g. HI1 instances provide very fast solid-state drive (SSD) backed instance storage capable of supporting over 120,000 random read IOPS, optimized for very high random I/O performance and low cost per IOPS, while HS1 instances are optimized for very high storage density, low storage cost, and high sequential I/O performance.
  • Instance store volumes, unlike EBS volumes, cannot be detached or attached to another instance

Ideal Usage Patterns

  • Amazon EC2 local instance store volumes are fast, free (that is, included in the price of the Amazon EC2 instance) “scratch volumes” best suited for storing temporary data that is continually changing (such as buffers, caches and scratch data), data that can easily be regenerated, or data that is replicated for durability
  • High I/O instances provide instance store volumes backed by SSD, and are ideally suited for many high-performance database workloads, e.g. NoSQL databases like Cassandra and MongoDB.
  • High storage instances support much higher storage density per Amazon EC2 instance, and are ideally suited for applications that benefit from high sequential I/O performance across very large datasets, e.g. data warehouses, Hadoop storage nodes, seismic analysis, cluster file systems, etc.

Anti-Patterns

  • Persistent storage
    • For persistent virtual disk storage similar to a physical disk drive for files or other data that must persist longer than the lifetime of a single Amazon EC2 instance, Amazon EBS volumes or Amazon S3 are more appropriate.
  • Relational database storage
    • In most cases, relational databases require storage that persists beyond the lifetime of a single Amazon EC2 instance, making Amazon EBS volumes the natural choice.
  • Shared storage
    • Instance store volumes are dedicated to a single Amazon EC2 instance, and cannot be shared with other systems or users. If you need storage that can be detached from one instance and attached to a different instance, or if you need the ability to share data easily, Amazon S3 or Amazon EBS volumes are the better choice.
  • Snapshots
    • If you need the convenience, long-term durability, availability, and shareability of point-in-time disk snapshots, Amazon EBS volumes are a better choice.

Performance

  • Non-SSD-based instance store volumes in most Amazon EC2 instance families have performance characteristics similar to standard Amazon EBS volumes.
  • Because the Amazon EC2 instance virtual machine and the local instance store volumes are located on the same physical server, interaction with the storage is very fast, particularly for sequential access.
  • To further increase aggregate IOPS, or to improve sequential disk throughput, multiple instance store volumes can be grouped together using RAID 0 (disk striping) software.
  • Because the bandwidth to the disks is not limited by the network, aggregate sequential throughput for multiple instance volumes can be higher than for the same number of Amazon EBS volumes.
  • SSD instance store volumes in the Amazon EC2 high I/O instances provide tens of thousands to hundreds of thousands of low-latency, random 4 KB IOPS. Because of the I/O characteristics of SSD devices, write performance can be variable.
  • Instance store volumes on Amazon EC2 high storage instances provide very high storage density and high sequential read and write performance. High storage instances are capable of delivering 2.6 GB/sec of sequential read and write performance when using a block size of 2 MB.

Durability and Availability

  • Amazon EC2 local instance store volumes are not intended to be used as durable disk storage and they persist only during the life of the associated Amazon EC2 instance

Cost Model

  • Cost of the Amazon EC2 instance includes any local instance store volumes, if the instance type provides them.
  • While there is no additional charge for data storage on local instance store volumes, note that data transferred to and from Amazon EC2 instance store volumes from other Availability Zones or outside of an Amazon EC2 region may incur data transfer charges, and additional charges will apply for use of any persistent storage, such as Amazon S3, Amazon Glacier, Amazon EBS volumes, and Amazon EBS snapshots

Scalability and Elasticity

  • Local instance store volumes are tied to a particular Amazon EC2 instance, and are fixed in number and size for a given Amazon EC2 instance type, so the scalability and elasticity of this storage is tied to the number of Amazon EC2 instances.

Interfaces

  • Instance store volumes are specified using the block device mapping feature of the Amazon EC2 API and the AWS Management Console
  • To the Amazon EC2 instance, an instance store volume appears just like a local disk drive. To write to and read data from instance store volumes, use the native file system I/O interfaces of the chosen operating system.
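
As a sketch of the block device mapping interface, the boto3 call below launches an instance with an instance store volume attached at launch; the AMI ID is a placeholder, and the instance type must be one that actually offers ephemeral disks.

import boto3

ec2 = boto3.client('ec2')

# Instance store volumes can only be mapped at launch time, unlike EBS volumes
resp = ec2.run_instances(
    ImageId='ami-0123456789abcdef0',  # hypothetical instance-store-capable AMI
    InstanceType='m3.medium',         # the instance type determines the available ephemeral disks
    MinCount=1, MaxCount=1,
    BlockDeviceMappings=[
        {'DeviceName': '/dev/sdb', 'VirtualName': 'ephemeral0'},
    ])
print(resp['Instances'][0]['InstanceId'])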

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which of the following provides the fastest storage medium?
    1. Amazon S3
    2. Amazon EBS using Provisioned IOPS (PIOPS)
    3. SSD Instance (ephemeral) store (SSD Instance Storage provides 100,000 IOPS on some instance types, much faster than any network-attached storage)
    4. AWS Storage Gateway

AWS Storage Options – S3 & Glacier

Amazon S3

  • highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
  • provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.
  • allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
  • number of objects you can store in an Amazon S3 bucket is virtually unlimited.
  • highly secure, supporting encryption at rest, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.
  • highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
  • provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.
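
A minimal boto3 sketch of such a lifecycle rule, assuming a hypothetical bucket and prefix: objects under logs/ are archived to Glacier after 30 days and expired after a year.

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',  # hypothetical bucket
    LifecycleConfiguration={'Rules': [{
        'ID': 'archive-then-expire-logs',
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
        # Move objects to the Glacier storage class after 30 days...
        'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
        # ...and delete them at end of life
        'Expiration': {'Days': 365},
    }]})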

Ideal Use Cases

  • Storage & Distribution of static web content and media
    • frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
    • works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites as no storage provisioning is required
    • content can be served directly from Amazon S3, since each object in Amazon S3 has a unique HTTP URL address
    • can also act as an Origin store for the Content Delivery Network (CDN) such as Amazon CloudFront
    • it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
  • Data Store for Large Objects
    • can be paired with an RDS or NoSQL database to store large objects, e.g. files, while the associated metadata, e.g. name, tags, comments etc., is stored in the RDS or NoSQL database where it can be indexed and queried, providing faster access to relevant data
  • Data store for computation and large-scale analytics
    • commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
    • data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
  • Backup and Archival of critical data
    • used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
    • because it stores objects redundantly on multiple devices across multiple facilities, it provides the highly-durable storage infrastructure needed for these scenarios.
    • its versioning capability is available to protect critical data from inadvertent deletion

Anti-Patterns

Amazon S3 has the following anti-patterns where it is not an optimal solution

  • Dynamic website hosting
    • While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server-side interaction, scripting or database interaction should instead be hosted on Amazon EC2
  • Backup and archival storage
    • Data requiring long term archival storage with infrequent read access can be stored more cost effectively in Amazon Glacier
  • Structured Data Query
    • Amazon S3 doesn’t offer query capabilities, so to read an object the object name and key must be known. Instead, pair S3 with RDS or DynamoDB to store, index and query metadata about Amazon S3 objects
    • NOTE – S3 now provides query capabilities (S3 Select), and Athena can also be used
  • Rapidly Changing Data
    • Data that needs to be updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS or DynamoDB.
  • File System
    • Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly the ‘/’ or ‘\’ character), you are able to construct keys that emulate the hierarchical folder structure of a file system within a given bucket (a listing sketch follows).
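
A short boto3 sketch of the folder emulation, with a hypothetical bucket whose keys look like photos/2023/cat.jpg:

import boto3

s3 = boto3.client('s3')

# Delimiter='/' makes S3 roll keys up into folder-like CommonPrefixes
resp = s3.list_objects_v2(Bucket='my-bucket', Prefix='photos/', Delimiter='/')

for cp in resp.get('CommonPrefixes', []):  # the emulated sub-folders
    print('folder:', cp['Prefix'])
for obj in resp.get('Contents', []):       # objects directly under photos/
    print('object:', obj['Key'])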

Performance

  • Access to Amazon S3 from within Amazon EC2 in the same region is fast.
  • Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.
  • Amazon S3 is also built to scale storage, requests, and users to support a virtually unlimited number of web-scale applications.
  • If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.

Durability & Availability

  • Amazon S3 storage provides the highest level of data durability and availability, by automatically and synchronously storing your data across both multiple devices and multiple facilities within the selected geographical region
  • Error correction is built-in, and there are no single points of failure. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.
  • Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
  • Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
  • Versioning can be enabled with MFA (Multi Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object
  • For Non Critical and Reproducible data for e.g. thumbnails, transcoded media etc., S3 Reduced Redundancy Storage (RRS) can be used, which provides a lower level of durability at a lower storage cost
  • RRS is designed to provide 99.99% durability per object over a given year. While RRS is less durable than standard Amazon S3, it is still designed to provide 400 times more durability than a typical disk drive
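
Versioning and MFA Delete from the bullets above can each be enabled with a single boto3 call; the bucket name and MFA device serial below are placeholders, and MFA Delete can only be enabled with the bucket owner's root credentials.

import boto3

s3 = boto3.client('s3')

# Keep prior object versions on every overwrite and delete
s3.put_bucket_versioning(
    Bucket='my-bucket',  # hypothetical bucket
    VersioningConfiguration={'Status': 'Enabled'})

# MFA Delete additionally requires the root account's MFA device:
# s3.put_bucket_versioning(
#     Bucket='my-bucket',
#     VersioningConfiguration={'Status': 'Enabled', 'MFADelete': 'Enabled'},
#     MFA='arn:aws:iam::111122223333:mfa/root-device 123456')  # serial + current code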

Cost Model

  • With Amazon S3, you pay only for what you use and there is no minimum fee.
  • Amazon S3 has three pricing components: storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

Scalability & Elasticity

  • Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically
  • Amazon S3 supports a virtually unlimited number of files in any bucket
  • Amazon S3 bucket can store a virtually unlimited number of bytes
  • Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information to other servers in other locations in the same region, all using Amazon’s high-performance infrastructure.

Interfaces

  • Amazon S3 provides standards-based REST and SOAP web services APIs for both management and data operations.
  • NOTE – SOAP support over HTTP is deprecated, but it is still available over HTTPS. New Amazon S3 features will not be supported for SOAP. We recommend that you use either the REST API or the AWS SDKs.
  • Amazon S3 provides easier-to-use, higher-level toolkits or SDKs in different languages (Java, .NET, PHP, and Ruby) that wrap the underlying APIs
  • The AWS Command Line Interface (CLI) provides a set of high-level, Linux-like Amazon S3 file commands for common operations, such as ls, cp, mv, sync, etc. It also provides the ability to perform recursive uploads and downloads using a single folder-level Amazon S3 command, and supports parallel transfers.
  • AWS Management Console provides the ability to easily create and manage Amazon S3 buckets, upload and download objects, and browse the contents of your Amazon S3 buckets using a simple web-based user interface
  • All interfaces provide the ability to store Amazon S3 objects (files) in uniquely-named buckets (top-level folders), with each object identified by a unique object key within that bucket.

Glacier

  • extremely low-cost storage service that provides highly secure, durable, and flexible storage for data backup and archival
  • data can be reliably stored for as little as $0.01 per gigabyte per month.
  • to offload the administrative burdens of operating and scaling storage to AWS such as capacity planning, hardware provisioning, data replication, hardware failure detection and repair, or time consuming hardware migrations
  • Data is stored in Amazon Glacier as Archives where an archive can represent a single file or multiple files combined into a single archive
  • Archives are stored in Vaults for which the access can be controlled through IAM
  • Retrieving archives from Vaults requires initiation of a job and typically takes 3-5 hours
  • Amazon Glacier integrates seamlessly with Amazon S3 by using S3 data lifecycle management policies to move data from S3 to Glacier
  • AWS Import/Export can also be used to accelerate moving large amounts of data into Amazon Glacier using portable storage devices for transport

Ideal Usage Patterns

  • Amazon Glacier is ideally suited as a long-term archival solution for infrequently accessed data, such as archiving offsite enterprise information, media assets, research and scientific data, digital preservation and magnetic tape replacement

Anti-Patterns

Amazon Glacier has the following anti-patterns where it is not an optimal solution

  • Rapidly changing data
    • Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies such as Amazon EBS or a Database
  • Real time access
    • Data stored in Glacier cannot be accessed in real time and requires initiation of a job for object retrieval, with retrieval times ranging from 3-5 hours. If immediate access is needed, Amazon S3 is a better choice.

Performance

  • Amazon Glacier is a low-cost storage service designed to store data that is infrequently accessed and long lived.
  • Amazon Glacier jobs typically complete in 3 to 5 hours

Durability and Availability

  • Amazon Glacier redundantly stores data in multiple facilities and on multiple devices within each facility
  • Amazon Glacier is designed to provide average annual durability of 99.999999999% (11 nines) for an archive
  • Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives.
  • Amazon Glacier also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • Amazon Glacier has three pricing components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand UPLOAD and RETRIEVAL requests per month).
  • Amazon Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time and allows you to retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. Any additional amount of data retrieved is charged per GB
  • Amazon Glacier also charges a pro-rated charge (per GB) for items deleted prior to 90 days

Scalability & Elasticity

  • A single archive is limited to 40 TBs, but there is no limit to the total amount of data you can store in the service.
  • Amazon Glacier scales to meet your growing and often unpredictable storage requirements. Whether you’re storing petabytes or gigabytes, Amazon Glacier automatically scales your storage up or down as needed.

Interfaces

  • Amazon Glacier provides a native, standards-based REST web services interface, as well as Java and .NET SDKs.
  • AWS Management Console or the Amazon Glacier APIs can be used to create vaults to organize the archives in Amazon Glacier.
  • Amazon Glacier APIs can be used to upload and retrieve archives, monitor the status of your jobs and also configure your vault to send you a notification via Amazon Simple Notification Service (Amazon SNS) when your jobs complete.
  • Amazon Glacier can be used as a storage class in Amazon S3 by using object lifecycle management to provide automatic, policy-driven archiving from Amazon S3 to Amazon Glacier.
  • The Amazon S3 API provides a RESTORE operation, and the retrieval process takes the same 3-5 hours (a restore sketch follows this list)
  • On retrieval, a copy of the retrieved object is placed in Amazon S3 RRS storage for a specified retention period; the original archived object remains stored in Amazon Glacier and you are charged for both
  • When using Amazon Glacier as a storage class in Amazon S3, use the Amazon S3 APIs, and when using “native” Amazon Glacier, you use the Amazon Glacier APIs
  • Objects archived to Amazon Glacier via Amazon S3 can only be listed and retrieved via the Amazon S3 APIs or the AWS Management Console—they are not visible as archives in an Amazon Glacier vault.
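
A hedged boto3 sketch of the S3-side RESTORE flow, with a hypothetical bucket and key:

import boto3

s3 = boto3.client('s3')

# Ask S3 to restore a temporary copy of an object archived to the GLACIER
# storage class; the copy is kept for the requested number of days while the
# original stays archived in Glacier
s3.restore_object(
    Bucket='my-bucket',
    Key='archives/2015/report.tar.gz',
    RestoreRequest={'Days': 7})

# The Restore header flips to ongoing-request="false" once the copy is ready
head = s3.head_object(Bucket='my-bucket', Key='archives/2015/report.tar.gz')
print(head.get('Restore'))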

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You want to pass queue messages that are 1GB each. How should you achieve this?
    1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
    2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Refer link)
    3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
    4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
  2. Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
    1. Serialize the image and store it in multiple DynamoDB tables
    2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
    3. Add an image data type to the “Product” table to store the images in binary format
    4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

AWS S3 Best Practices – Certification

S3 Best Practices

Performance

Multiple Concurrent PUTs/GETs

  • S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
  • S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
  • If the typical workload involves only occasional bursts of 100 requests per second and less than 800 requests per second, AWS scales automatically to handle it.
  • If the typical workload involves a request rate for a bucket of more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, it is recommended to open a support case to prepare for the workload and avoid any temporary limits on your request rate.
  • The S3 best practice guidelines below apply only if you are routinely processing 100 or more requests per second
  • Workloads that include a mix of request types
    • If the request workload is typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for the objects ensures better performance by providing low-latency access to the S3 index
    • This behavior is driven by how S3 stores key names.
      • S3 maintains an index of object key names in each AWS region.
      • Object keys are stored lexicographically (UTF-8 binary ordering) across multiple partitions in the index i.e. S3 stores key names in alphabetical order.
      • Object keys are stored across multiple partitions in the index, and the key name dictates which partition the key is stored in
      • Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that S3 will target a specific partition for a large number of keys, overwhelming the I/O capacity of the partition.
    • By introducing some randomness in the key name prefixes, the key names, and hence the I/O load, will be distributed across multiple index partitions (see the key-naming sketch after this list).
    • This also ensures scalability regardless of the number of requests sent per second.
  • Workloads that are GET-intensive
    • CloudFront can be used for performance optimization and can help by
      • distributing content with low latency and high data transfer rate.
      • caching the content and thereby reducing the number of direct requests to S3
      • providing multiple endpoints (Edge locations) for data availability
      • being available in two flavors: Web distribution or RTMP distribution
    • For fast data transport over long distances between a client and an S3 bucket, use Amazon S3 Transfer Acceleration. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographical distances
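
The key-naming sketch below illustrates the randomness idea from the mixed-workload bullets: a short hash prefix derived from an otherwise sequential name spreads keys across index partitions. The helper name and prefix length are illustrative only.

import hashlib

def randomized_key(natural_key):
    # 4 hex characters of the MD5 of the natural key give 65,536 possible
    # prefixes, spreading sequential names across S3 index partitions
    prefix = hashlib.md5(natural_key.encode()).hexdigest()[:4]
    return '{}-{}'.format(prefix, natural_key)

for key in ('2015-11-30-23-00-logfile', '2015-11-30-23-01-logfile'):
    print(randomized_key(key))  # e.g. '8c3f-2015-11-30-23-00-logfile' (illustrative output)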

PUTs/GETs for Large Objects

  • AWS allows parallelizing PUT/GET requests to improve upload and download performance, as well as the ability to recover in case of failures
  • For PUTs, Multipart upload can help improve the uploads by
    • performing multiple uploads at the same time and maximizing network bandwidth utilization
    • quick recovery from failures, as only the part that failed to upload needs to be re-uploaded
    • ability to pause and resume uploads
    • begin an upload before the Object size is known
  • For GETs, the Range HTTP header can help to improve the downloads by
    • allowing the object to be retrieved in parts instead of the whole object
    • quick recovery from failures, as only the part that failed to download needs to be retried.
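
boto3's managed transfers implement exactly this parallelization; the sketch below (bucket and file names hypothetical) splits objects over 100 MB into 25 MB parts moved by up to 10 threads, using multipart PUTs for uploads and ranged GETs for downloads under the hood.

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

config = TransferConfig(multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
                        multipart_chunksize=25 * 1024 * 1024,   # 25 MB parts
                        max_concurrency=10)                     # up to 10 parallel threads

s3.upload_file('video.mp4', 'my-bucket', 'uploads/video.mp4', Config=config)
s3.download_file('my-bucket', 'uploads/video.mp4', 'video-copy.mp4', Config=config)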

List Operations

  • Object key names are stored lexicographically in Amazon S3 indexes, making it hard to sort and manipulate the contents of LIST results
  • S3 maintains a single lexicographically sorted list of indexes
  • Build and maintain a secondary index outside of S3, e.g. in DynamoDB or RDS, to store, index and query object metadata, rather than performing LIST operations on S3

Security

  • Use Versioning
    • can be used to protect from unintended overwrites and deletions
    • allows the ability to retrieve and restore deleted objects or rollback to previous versions
  • Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) delete
  • Versioning does not prevent bucket deletion; the data must be backed up separately, as it is lost if the bucket is accidentally or maliciously deleted
  • Use Cross Region replication feature to backup data to a different region
  • When using VPC with S3, use VPC S3 endpoints, which
    • are horizontally scaled, redundant, and highly available VPC components
    • help establish a private connection between VPC and S3 and the traffic never leaves the Amazon network

Cost

  • Optimize S3 storage cost by selecting an appropriate storage class for objects
  • Configure appropriate lifecycle management rules to move objects to different storage classes and expire them

Tracking

  • Use Event Notifications to be notified for any put or delete request on the S3 objects
  • Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket
  • Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored and configure appropriate actions
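
A minimal boto3 sketch wiring a bucket's put and delete events to an SNS topic; the bucket and topic ARN are hypothetical, and the topic's policy must already allow S3 to publish to it.

import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='my-bucket',
    NotificationConfiguration={
        'TopicConfigurations': [{
            'TopicArn': 'arn:aws:sns:us-east-1:111122223333:s3-object-events',
            # Fire on every object creation (PUT/POST/COPY/multipart) and removal
            'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*'],
        }]})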

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?
    1. Increase your network bandwidth to provide faster throughput to S3
    2. Upload the files in parallel to S3 using multipart upload
    3. Pack all files into a single archive, upload it to S3, then extract the files in AWS
    4. Use AWS Import/Export to transfer the video files
  2. You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?
    1. Use multi-part upload.
    2. Add a random prefix to the key names.
    3. Amazon S3 will automatically manage performance at this scale.
    4. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names
  3. You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?
    1. Enable enhanced networking
    2. Use Amazon S3 multipart upload
    3. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
    4. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance
  4. Which of the following methods gives you protection against accidental loss of data stored in Amazon S3? (Choose 2)
    1. Set bucket policies to restrict deletes, and also enable versioning
    2. By default, versioning is enabled on a new bucket so you don’t have to worry about it (Not enabled by default)
    3. Build a secondary index of your keys to protect the data (improves performance only)
    4. Back up your bucket to a bucket owned by another AWS account for redundancy
  5. A startup company hired you to help them build a mobile application that will ultimately store billions of image and videos in Amazon S3. The company is lean on funding, and wants to minimize operational costs, however, they have an aggressive marketing plan, and expect to double their current installation base every six months. Due to the nature of their business, they are expecting sudden and large increases to traffic to and from S3, and need to ensure that it can handle the performance needs of their application. What other information must you gather from this customer in order to determine whether S3 is the right option?
    1. You must know how many customers that company has today, because this is critical in understanding what their customer base will be in two years. (The number of customers does not matter)
    2. You must find out total number of requests per second at peak usage.
    3. You must know the size of the individual objects being written to S3 in order to properly design the key namespace. (Size does not relate to the key namespace design but the count does)
    4. In order to build the key namespace correctly, you must understand the total amount of storage needs for each S3 bucket. (S3 provides unlimited storage; the key namespace design depends on the request count, not the storage size)
  6. A document storage company is deploying their application to AWS and changing their business model to support both free tier and premium tier users. The premium tier users will be allowed to store up to 200GB of data and free tier customers will be allowed to store only 5GB. The customer expects that billions of files will be stored. All users need to be alerted when approaching 75 percent quota utilization and again at 90 percent quota use. To support the free tier and premium tier users, how should they architect their application?
    1. The company should utilize an amazon simple workflow service activity worker that updates the users data counter in amazon dynamo DB. The activity worker will use simple email service to send an email if the counter increases above the appropriate thresholds.
    2. The company should deploy an amazon relational data base service relational database with a store objects table that has a row for each stored object along with size of each object. The upload server will query the aggregate consumption of the user in questions (by first determining the files store by the user, and then querying the stored objects table for respective file sizes) and send an email via Amazon Simple Email Service if the thresholds are breached. (Using RDS is a good approach, but with so many objects it might not be a good option)
    3. The company should write both the content length and the username of the files owner as S3 metadata for the object. They should then create a file watcher to iterate over each object and aggregate the size for each user and send a notification via Amazon Simple Queue Service to an emailing service if the storage threshold is exceeded. (List operations on S3 not feasible)
    4. The company should create two separated amazon simple storage service buckets one for data storage for free tier users and another for data storage for premium tier users. An amazon simple workflow service activity worker will query all objects for a given user based on the bucket the data is stored in and aggregate storage. The activity worker will notify the user via Amazon Simple Notification Service when necessary (List operations on S3 not feasible as well as SNS does not address email requirement)
  7. Your company host a social media website for storing and sharing documents. the web application allow users to upload large files while resuming and pausing the upload as needed. Currently, files are uploaded to your php front end backed by Elastic Load Balancing and an autoscaling fleet of amazon elastic compute cloud (EC2) instances that scale upon average of bytes received (NetworkIn) After a file has been uploaded. it is copied to amazon simple storage service(S3). Amazon Ec2 instances use an AWS Identity and Access Management (AMI) role that allows Amazon s3 uploads. Over the last six months, your user base and scale have increased significantly, forcing you to increase the auto scaling groups Max parameter a few times. Your CFO is concerned about the rising costs and has asked you to adjust the architecture where needed to better optimize costs. Which architecture change could you introduce to reduce cost and still keep your web application secure and scalable?
    1. Replace the Autoscaling launch Configuration to include c3.8xlarge instances; those instances can potentially yield a network throughput of 10gbps. (no info of current size and might increase cost)
    2. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic to directly upload the file to amazon s3 using the given credentials and S3 Prefix. (will not provide the ability to handle pause and restarts)
    3. Re-architect your ingest pattern, and move your web application instances into a VPC public subnet. Attach a public IP address for each EC2 instance (using the auto scaling launch configuration settings). Use Amazon Route 53 round robin records set and http health check to DNS load balance the app request this approach will significantly reduce the cost by bypassing elastic load balancing. (ELB is not the bottleneck)
    4. Re-architect your ingest pattern, have the app authenticate against your identity provider as a broker fetching temporary AWS credentials from AWS Secure token service (GetFederation Token). Securely pass the credentials and s3 endpoint/prefix to your app. Implement client-side logic that used the S3 multipart upload API to directly upload the file to Amazon s3 using the given credentials and s3 Prefix. (multipart allows one to start uploading directly to S3 before the actual size is known or complete data is downloaded)
  8. If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?
    1. Sequential
    2. instanceID_log-HH-DD-MM-YYYY
    3. instanceID_log-YYYY-MM-DD-HH
    4. HH-DD-MM-YYYY-log_instanceID (HH gives some randomness to start with, instead of instanceID where the first characters would be i-)
    5. YYYY-MM-DD-HH-log_instanceID

AWS EC2 Storage – Certification

EC2 Storage Overview

  • Amazon EC2 provides flexible, cost effective and easy-to-use EC2 storage options with a unique combination of performance and durability
    • Amazon Elastic Block Store (EBS)
    • Amazon EC2 Instance Store
    • Amazon Simple Storage Service (S3)
  • While EBS and Instance Store provide block-level storage, Amazon S3 is object-level storage

EC2 Storage Options - EBS, S3 & Instance Store

Storage Types

Amazon EBS

More details @ AWS EC2 EBS Storage

Amazon Instance Store

More details @ AWS EC2 Instance Store Storage

Amazon EBS vs Instance Store

More details @ Comparison of EBS vs Instance Store

Amazon S3

More details @ AWS S3

Block Device Mapping

  • A block device is a storage device that moves data in sequences of bytes or bits (blocks), supports random access, and generally uses buffered I/O, e.g. hard disks, CD-ROM drives, etc.
  • Block devices can be physically attached to a computer (like an instance store volume) or can be accessed remotely as if it was attached (like an EBS volume)
  • Block device mapping defines the block devices to be attached to an instance, which can be done either while creating an AMI or when an instance is launched
  • A block device must be mounted on the instance, after being attached to the instance, to be able to be accessed
  • When a block device is detached from an instance, it is unmounted by the operating system and you can no longer access the storage device.
  • Additional instance store volumes can be attached only when the instance is launched, while EBS volumes can be attached to a running instance.
  • When viewing the block device mapping for an instance, only the EBS volumes can be seen, not the instance store volumes. Instance metadata can be used to query the complete block device mapping (a sketch follows).
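
A small sketch of that metadata query, runnable only from within an instance (instances enforcing IMDSv2 would additionally need a session token):

import urllib.request

BASE = 'http://169.254.169.254/latest/meta-data/block-device-mapping/'

def fetch(url):
    return urllib.request.urlopen(url, timeout=2).read().decode()

# Lists every mapped device (ami, root, ephemeral0, ebs1, ...), including the
# instance store volumes that the console and describe-instances do not show
for device in fetch(BASE).splitlines():
    print(device, '->', fetch(BASE + device))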

Public Data Sets

  • Amazon Web Services provides a repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
  • Amazon stores the data sets at no charge to the community and, as with all AWS services, you pay only for the compute and storage you use for your own applications.

Sample Exam Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. When you view the block device mapping for your instance, you can see only the EBS volumes, not the instance store volumes.
    1. Depends on the instance type
    2. FALSE
    3. Depends on whether you use API call
    4. TRUE
  2. Amazon EC2 provides a repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. What is the monthly charge for using the public data sets?
    1. A 1 time charge of 10$ for all the datasets.
    2. 1$ per dataset per month
    3. 10$ per month for all the datasets
    4. There is no charge for using the public data sets
  3. How many types of block devices does Amazon EC2 support?
    1. 2
    2. 4
    3. 3
    4. 1

AWS Elastic Block Store Storage – EBS

EC2 Elastic Block Storage – EBS Overview

  • Amazon EBS provides highly available, reliable, durable, block-level storage volumes that can be attached to a running instance
  • EBS as a primary storage device is recommended for data that requires frequent and granular updates for e.g. running a database or filesystems
  • An EBS volume behaves like a raw, unformatted, external block device that can be attached to a single EC2 instance at a time
  • EBS volume persists independently from the running life of an instance.
  • An EBS volume can be attached to any instance within the same Availability Zone, and can be used like any other physical hard drive.
  • EBS volumes allow encryption using the EBS encryption feature. All data stored at rest, disk I/O, and snapshots created from the volume are encrypted. Encryption occurs on the EC2 instance, providing encryption of data-in-transit from EC2 to the EBS volume
  • EBS volumes can be backed up by creating a snapshot of the volume, which is stored in S3. A volume created from a snapshot can be attached to another instance within the same region
  • EBS volumes are created in a specific Availability Zone, and can then be attached to any instances in that same Availability Zone. To make a volume available outside of the Availability Zone, create a snapshot and restore that snapshot to a new volume anywhere in that region
  • Snapshots can also be copied to other regions and then restored to new volumes, making it easier to leverage multiple AWS regions for geographical expansion, data center migration, and disaster recovery.
  • General Purpose (SSD) volumes support up to 16,000 IOPS and 250 MB/s of throughput, and Provisioned IOPS (SSD) volumes support up to 64,000 IOPS and 1,000 MB/s of throughput.
  • EBS Magnetic volumes can be created from 1 GiB to 1 TiB in size; EBS General Purpose (SSD) and Provisioned IOPS (SSD) volumes can be created up to 16 TiB in size

EBS Benefits

  • Data Availability
    • EBS volume is automatically replicated in an Availability Zone to prevent data loss due to failure of any single hardware component.
  • Data Persistence
    • persists independently of the running life of an EC2 instance
    • persists when an instance is stopped and started or rebooted
    • Root EBS volume is deleted, by default, on Instance termination but can be modified by changing the DeleteOnTermination flag
    • All other attached volumes persist, by default, on instance termination
  • Data Encryption
    • can be encrypted by EBS encryption feature
    • EBS encryption uses 256-bit Advanced Encryption Standard algorithms (AES-256) and an Amazon-managed key infrastructure.
    • Encryption occurs on the server that hosts the EC2 instance, providing encryption of data-in-transit from the EC2 instance to EBS storage
    • Snapshots of encrypted EBS volumes are automatically encrypted.
  • Snapshots
    • EBS provides the ability to create snapshots (backups) of any EBS volume and write a copy of the data in the volume to Amazon S3, where it is stored redundantly in multiple Availability Zones
    • Snapshots can be used to create new volumes, increase the size of the volumes or replicate data across Availability Zones or regions
    • Snapshots are incremental backups and store only the data that was changed from the time the last snapshot was taken.
    • A snapshot’s size can be smaller than the volume size, as the data is compressed before being saved to S3
    • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.

EBS Volume Types

Refer blog post @ EBS Volume Types

EBS Volume

EBS Volume Creation

  • An EBS volume can be created either by
    • Creating new volumes
      • Created completely new from the console or command line tools, and can then be attached to an EC2 instance in the same Availability Zone
    • Restoring volumes from snapshots
      • EBS volumes can also be restored from previously created snapshots (see the sketch after this list)
      • New volumes created from existing EBS snapshots load lazily in the background.
      • There is no need to wait for all of the data to transfer from S3 to the EBS volume before the attached instance can start accessing the volume and all its data.
      • If the instance accesses the data that hasn’t yet been loaded, the volume immediately downloads the requested data from S3, and continues loading the rest of the data in the background.
      • EBS volumes restored from encrypted snapshots are encrypted, by default
    • EBS volumes can be created and attached to a running EC2 instance by specifying a block device mapping
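
A minimal boto3 sketch of restoring a volume from a snapshot and attaching it to a running instance, relying on the lazy-loading behavior described above; the snapshot and instance IDs are hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# The volume loads lazily from S3 in the background; it can be attached
# and used as soon as it becomes "available"
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",  # placeholder snapshot ID
    AvailabilityZone="us-east-1a",        # must match the instance's AZ
    VolumeType="gp2",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",     # placeholder instance ID
    Device="/dev/sdg",
)
```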

EBS Volume Detachment

  • EBS volumes can be detached from an instance explicitly or by terminating the instance
  • EBS root volumes can be detached only by stopping the instance first
  • EBS data volumes, attached to a running instance, can be detached by unmounting the volume from the instance first.
  • If the volume is detached without being unmounted, it might result in the volume being stuck in a busy state and could possibly damage the file system or the data it contains
  • An EBS volume can be force detached from an instance, using the Force Detach option, but this might lead to data loss or a corrupted file system, as the instance does not get an opportunity to flush file system caches or file system metadata (a minimal sketch follows this list)
  • Charges are still incurred for the volume after its detachment
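
A minimal boto3 sketch of detaching a data volume; the volume ID is a hypothetical placeholder, and the volume should be unmounted inside the guest OS first:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Normal detach, after the volume has been unmounted inside the OS
ec2.detach_volume(VolumeId="vol-0123456789abcdef0")

# Last resort, only if the volume is stuck in the "detaching" state;
# risks data loss or a corrupted file system:
# ec2.detach_volume(VolumeId="vol-0123456789abcdef0", Force=True)
```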

EBS Volume Deletion

  • Deleting an EBS volume wipes out its data, and the volume can no longer be attached to any instance. It can, however, be backed up before deletion using EBS snapshots (a minimal sketch follows)
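
A minimal boto3 sketch of backing up a volume with a snapshot before deleting it; the volume ID is a hypothetical placeholder, and the volume must already be detached (in the available state) for delete_volume to succeed:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Final backup before the volume is destroyed
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="final backup before deletion",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Irreversible: the volume's data is gone once this completes
ec2.delete_volume(VolumeId="vol-0123456789abcdef0")
```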

EBS Volume Snapshots

Refer blog post @ EBS Snapshot

EBS Encryption

  • Encrypted EBS volumes can be created and attached to a supported instance type; encryption covers the following types of data
    • Data at rest
    • All disk I/O, i.e. all data moving between the volume and the instance
    • All snapshots created from the volume
    • All volumes created from those snapshots
  • Encryption occurs on the servers that host EC2 instances, providing encryption of data-in-transit from EC2 instances to EBS storage.
  • EBS encryption is supported with all EBS volume types (gp2, io1, st1 and sc1), and offers the same IOPS performance on encrypted volumes as on unencrypted volumes, with a minimal effect on latency
  • EBS encryption is only available on select instance types
  • Snapshots of encrypted volumes and volumes created from encrypted snapshots are automatically encrypted using the same volume encryption key
  • EBS encryption uses AWS Key Management Service (AWS KMS) customer master keys (CMK) when creating encrypted volumes and any snapshots created from the encrypted volumes.
  • EBS volumes can be encrypted using either
    • the default CMK, which is created automatically, or
    • a CMK created separately using AWS KMS, which gives more flexibility, including the ability to create, rotate, disable, define access controls for, and audit the encryption keys used to protect the data.
  • Public snapshots of encrypted volumes are not supported, because other accounts would be able to decrypt the data; snapshots encrypted with the default CMK cannot be shared at all, while snapshots encrypted with a customer-managed CMK can be shared with specific accounts.
  • Existing unencrypted volumes cannot be encrypted directly, but can be migrated by the following steps (sketched below)
    • create an unencrypted snapshot from the volume
    • create an encrypted copy of the unencrypted snapshot
    • create an encrypted volume from the encrypted snapshot
  • An encrypted snapshot can be created from an unencrypted snapshot by creating an encrypted copy of it
  • An unencrypted volume cannot be created directly from an encrypted volume or snapshot; the data has to be migrated at the OS level to a new unencrypted volume
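
A minimal boto3 sketch of the three migration steps above; the IDs are hypothetical placeholders, and omitting KmsKeyId falls back to the default CMK:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Unencrypted snapshot of the existing unencrypted volume
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Encrypted copy of the unencrypted snapshot (same-region copy;
#    SourceRegion is still required by the API)
copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snap["SnapshotId"],
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Encrypted volume restored from the encrypted snapshot
volume = ec2.create_volume(
    SnapshotId=copy["SnapshotId"],
    AvailabilityZone="us-east-1a",
)
```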

EBS Performance

Refer blog Post @ EBS Performance

EBS vs Instance Store

Refer blog post @ EBS vs Instance Store


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. _____ is a durable, block-level storage volume that you can attach to a single, running Amazon EC2 instance.
    1. Amazon S3
    2. Amazon EBS
    3. None of these
    4. All of these
  2. Which Amazon storage do you think is the best for my database-style applications that frequently encounter many random reads and writes across the dataset?
    1. None of these.
    2. Amazon Instance Storage
    3. Any of these
    4. Amazon EBS
  3. What does Amazon EBS stand for?
    1. Elastic Block Storage
    2. Elastic Business Server
    3. Elastic Blade Server
    4. Elastic Block Store
  4. Which Amazon Storage behaves like raw, unformatted, external block devices that you can attach to your instances?
    1. None of these.
    2. Amazon Instance Storage
    3. Amazon EBS
    4. All of these
  5. A user has created numerous EBS volumes. What is the general limit for each AWS account for the maximum number of EBS volumes that can be created?
    1. 10000
    2. 5000
    3. 100
    4. 1000
  6. Select the correct set of steps for exposing the snapshot only to specific AWS accounts
    1. Select Public for all the accounts and check mark those accounts with whom you want to expose the snapshots and click save.
    2. Select Private and enter the IDs of those AWS accounts, and click Save.
    3. Select Public, enter the IDs of those AWS accounts, and click Save.
    4. Select Public, mark the IDs of those AWS accounts as private, and click Save.
  7. If an Amazon EBS volume is the root device of an instance, can I detach it without stopping the instance?
    1. Yes but only if Windows instance
    2. No
    3. Yes
    4. Yes but only if a Linux instance
  8. Can we attach an EBS volume to more than one EC2 instance at the same time?
    1. Yes
    2. No
    3. Only EC2-optimized EBS volumes.
    4. Only in read mode.
  9. Do the Amazon EBS volumes persist independently from the running life of an Amazon EC2 instance?
    1. Only if instructed to when created
    2. Yes
    3. No
  10. Can I delete a snapshot of the root device of an EBS volume used by a registered AMI?
    1. Only via API
    2. Only via Console
    3. Yes
    4. No
  11. By default, EBS volumes that are created and attached to an instance at launch are deleted when that instance is terminated. You can modify this behavior by changing the value of the flag _____ to false when you launch the instance
    1. DeleteOnTermination
    2. RemoveOnDeletion
    3. RemoveOnTermination
    4. TerminateOnDeletion
  12. Your company policies require encryption of sensitive data at rest. You are considering the possible options for protecting data while storing it at rest on an EBS data volume, attached to an EC2 instance. Which of these options would allow you to encrypt your data at rest? (Choose 3 answers)
    1. Implement third party volume encryption tools
    2. Do nothing as EBS volumes are encrypted by default
    3. Encrypt data inside your applications before storing it on EBS
    4. Encrypt data using native data encryption drivers at the file system level
    5. Implement SSL/TLS for all services running on the server
  13. Which of the following are true regarding encrypted Amazon Elastic Block Store (EBS) volumes? Choose 2 answers
    1. Supported on all Amazon EBS volume types
    2. Snapshots are automatically encrypted
    3. Available to all instance types
    4. Existing volumes can be encrypted
    5. Shared volumes can be encrypted
  14. How can you secure data at rest on an EBS volume?
    1. Encrypt the volume using the S3 server-side encryption service
    2. Attach the volume to an instance using EC2’s SSL interface.
    3. Create an IAM policy that restricts read and write access to the volume.
    4. Write the data randomly instead of sequentially.
    5. Use an encrypted file system on top of the EBS volume
  15. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS
  16. A user is trying to launch an EBS backed EC2 instance under free usage. The user wants to achieve encryption of the EBS volume. How can the user encrypt the data at rest?
    1. Use AWS EBS encryption to encrypt the data at rest (Encryption is allowed on micro instances)
    2. User cannot use EBS encryption and has to encrypt the data manually or using a third party tool (Encryption was not allowed on micro instances before)
    3. The user has to select the encryption enabled flag while launching the EC2 instance
    4. Encryption of volume is not available as a part of the free usage tier
  17. A user is planning to schedule a backup for an EBS volume. The user wants security of the snapshot data. How can the user achieve data encryption with a snapshot?
    1. Use encrypted EBS volumes so that the snapshot will be encrypted by AWS
    2. While creating a snapshot select the snapshot with encryption
    3. By default the snapshot is encrypted by AWS
    4. Enable server side encryption for the snapshot using S3
  18. A user has launched an EBS backed EC2 instance. The user has rebooted the instance. Which of the below mentioned statements is not true with respect to the reboot action?
    1. The private and public address remains the same
    2. The Elastic IP remains associated with the instance
    3. The volume is preserved
    4. The instance runs on a new host computer
  19. A user has launched an EBS backed EC2 instance. What will be the difference while performing the restart or stop/start options on that instance?
    1. For restart it does not charge for an extra hour, while every stop/start it will be charged as a separate hour
    2. Every restart is charged by AWS as a separate hour, while multiple start/stop actions during a single hour will be counted as a single hour
    3. For every restart or start/stop it will be charged as a separate hour
    4. For restart it charges extra only once, while for every stop/start it will be charged as a separate hour
  20. A user has launched an EBS backed instance. The user started the instance at 9 AM in the morning. Between 9 AM to 10 AM, the user is testing some script. Thus, he stopped the instance twice and restarted it. In the same hour the user rebooted the instance once. For how many instance hours will AWS charge the user?
    1. 3 hours
    2. 4 hours
    3. 2 hours
    4. 1 hour
  21. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  22. An organization wants to move to Cloud. They are looking for a secure encrypted database storage option. Which of the below mentioned AWS functionalities helps them to achieve this?
    1. AWS MFA with EBS
    2. AWS EBS encryption
    3. Multi-tier encryption with Redshift
    4. AWS S3 server-side storage
  23. A user has stored data on an encrypted EBS volume. The user wants to share the data with his friend’s AWS account. How can the user achieve this?
    1. Create an AMI from the volume and share the AMI
    2. Copy the data to an unencrypted volume and then share
    3. Take a snapshot and share the snapshot with a friend
    4. If both the accounts are using the same encryption key then the user can share the volume directly
  24. A user is using an EBS backed instance. Which of the below mentioned statements is true?
    1. The user will be charged for volume and instance only when the instance is running
    2. The user will be charged for the volume even if the instance is stopped
    3. The user will be charged only for the instance running cost
    4. The user will not be charged for the volume if the instance is stopped
  25. A user is planning to use EBS for his DB requirement. The user already has an EC2 instance running in the VPC private subnet. How can the user attach the EBS volume to a running instance?
    1. The user must create EBS within the same VPC and then attach it to a running instance.
    2. The user can create EBS in the same zone as the subnet of instance and attach that EBS to instance. (Should be in the same AZ)
    3. It is not possible to attach an EBS to an instance running in VPC until the instance is stopped.
    4. The user can specify the same subnet while creating EBS and then attach it to a running instance.
  26. A user is creating an EBS volume. He asks for your advice. Which advice mentioned below should you not give to the user for creating an EBS volume?
    1. Take the snapshot of the volume when the instance is stopped
    2. Stripe multiple volumes attached to the same instance
    3. Create an AMI from the attached volume (AMI is created from the snapshot)
    4. Attach multiple volumes to the same instance
  27. An EC2 instance has one additional EBS volume attached to it. How can a user attach the same volume to another running instance in the same AZ?
    1. Terminate the first instance and only then attach to the new instance
    2. Attach the volume as read only to the second instance
    3. Detach the volume first and attach to new instance
    4. No need to detach. Just select the volume and attach it to the new instance, it will take care of mapping internally
  28. What is the scope of an EBS volume?
    1. VPC
    2. Region
    3. Placement Group
    4. Availability Zone