AWS S3 Encryption

  • S3 supports encryption of data both at rest and in transit.
  • Data in Transit
    • S3 protects data in transit by allowing communication over SSL/TLS or by using client-side encryption.
  • Data at Rest
    • Server-Side Encryption
      • S3 encrypts the object before saving it on disks in its data centers and decrypts it when the objects are downloaded.
    • Client-Side Encryption
      • data is encrypted on the client side and then uploaded to S3.
      • the encryption process, the encryption keys, and related tools are managed by the user.

S3 Server-Side Encryption

  • Server-side encryption is about data encryption at rest.
  • Server-side encryption encrypts only the object data.
  • Object metadata is not encrypted.
  • S3 handles the encryption (as it writes to disks) and decryption (when objects are accessed) of the data objects.
  • There is no difference in the access mechanism for encrypted and unencrypted objects; encryption and decryption are handled transparently by S3.

Server-Side Encryption with S3-Managed Keys – SSE-S3

  • Encryption keys are handled and managed by AWS
  • Each object is encrypted with a unique data key employing strong multi-factor encryption.
  • SSE-S3 encrypts the data key with a master key that is regularly rotated.
  • S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt the data.
  • Whether or not objects are encrypted with SSE-S3 can’t be enforced when they are uploaded using pre-signed URLs, because the only way server-side encryption can be specified is through the AWS Management Console or through an HTTP request header.
  • Requests must set the header x-amz-server-side-encryption to AES256.
  • To enforce server-side encryption for all of the objects stored in a bucket, use a bucket policy that denies permission to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption.
SSE-S3: Server-Side Encryption using S3-Managed Keys (Source: O'Reilly)
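
As a concrete illustration, here is a minimal boto3 sketch (the bucket name and object key are hypothetical) of uploading an object with SSE-S3; the ServerSideEncryption parameter is what sets the x-amz-server-side-encryption header:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with SSE-S3; boto3 sends the
# x-amz-server-side-encryption: AES256 header for you.
s3.put_object(
    Bucket="my-example-bucket",        # hypothetical bucket
    Key="reports/summary.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="AES256",
)

# Verify the encryption applied to the stored object.
head = s3.head_object(Bucket="my-example-bucket", Key="reports/summary.csv")
print(head["ServerSideEncryption"])    # -> "AES256"
```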

Server-Side Encryption with AWS KMS-Managed Keys – SSE-KMS

  • SSE-KMS is similar to SSE-S3, but it uses the AWS Key Management Service (KMS), which provides additional benefits along with additional charges
    • KMS is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud.
    • KMS uses customer master keys (CMKs) to encrypt the S3 objects.
    • The master key is never made available.
    • KMS enables you to centrally create encryption keys, and define the policies that control how keys can be used.
    • Allows audit of keys used to prove they are being used correctly, by inspecting logs in AWS CloudTrail.
    • Allows keys to be temporarily disabled and re-enabled.
    • Allows keys to be rotated regularly.
    • Security controls in AWS KMS can help meet encryption-related compliance requirements.
  • SSE-KMS enables separate permissions for the use of an envelope key (that is, a key that protects the data’s encryption key) that provides added protection against unauthorized access to the objects in S3.
  • SSE-KMS provides the option to create and manage encryption keys yourself, or use a default customer master key (CMK) that is unique to you, the service you’re using, and the region you’re working in.
  • Creating and Managing CMK gives more flexibility, including the ability to create, rotate, disable, and define access controls, and audit the encryption keys used to protect the data.
  • Data keys used to encrypt the data are also encrypted and stored alongside the data they protect and are unique to each object.
  • Process flow
    • An application or AWS service client requests an encryption key to encrypt data and passes a reference to a master key under the account.
    • Client requests are authenticated based on whether they have access to use the master key.
    • A new data encryption key is created, and a copy of it is encrypted under the master key.
    • Both the data key and encrypted data key are returned to the client.
    • Data key is used to encrypt customer data and is then deleted as soon as practical.
    • Encrypted data key is stored for later use and sent back to AWS KMS when the source data needs to be decrypted.
  • S3 supports only symmetric KMS keys, not asymmetric keys.
  • Requests must set the header x-amz-server-side-encryption to aws:kms.
SSE-KMS: Server-Side Encryption using AWS KMS-Managed Keys (Source: O'Reilly)
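
A minimal boto3 sketch of an SSE-KMS upload (the bucket name and CMK ARN are hypothetical); omitting SSEKMSKeyId makes S3 fall back to the default aws/s3 CMK for the account:

```python
import boto3

s3 = boto3.client("s3")

# Upload with SSE-KMS; boto3 sets x-amz-server-side-encryption: aws:kms.
s3.put_object(
    Bucket="my-example-bucket",   # hypothetical bucket
    Key="payroll/salaries.csv",
    Body=b"...",
    ServerSideEncryption="aws:kms",
    # Hypothetical customer-managed CMK; omit to use the default aws/s3 key.
    SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/1111-2222-3333",
)
```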

Server-Side Encryption with Customer-Provided Keys – SSE-C

  • Encryption keys are managed and provided by the customer, while S3 manages the encryption (as it writes to disks) and decryption (when the objects are accessed).
  • When you upload an object, the encryption key is provided as a part of the request; S3 uses that encryption key to apply AES-256 encryption to the data and then removes the encryption key from memory.
  • When you download an object, the same encryption key must be provided as a part of the request. S3 first verifies the encryption key and, if it matches, decrypts the object before returning it to you.
  • As each object and each object’s version can be encrypted with a different key, you are responsible for maintaining the mapping between the object and the encryption key used.
  • SSE-C requests must be done through HTTPS and S3 will reject any requests made over HTTP when using SSE-C.
  • For security considerations, AWS recommends that any key sent erroneously over HTTP be considered compromised and be discarded or rotated.
  • S3 does not store the encryption key provided. Instead, a randomly salted HMAC value of the encryption key is stored which can be used to validate future requests. The salted HMAC value cannot be used to decrypt the contents of the encrypted object or to derive the value of the encryption key. That means, if you lose the encryption key, you lose the object.
SSE-C: Server-Side Encryption with Customer-Provided Keys (Source: O'Reilly)
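
A boto3 sketch of the SSE-C round trip (bucket and key names are hypothetical); the same 256-bit key must be supplied on download, and boto3 derives the required base64 key and MD5 headers from SSECustomerKey:

```python
import os
import boto3

s3 = boto3.client("s3")
key = os.urandom(32)  # 256-bit customer key; you must store this safely yourself

# Upload: the key travels in the request headers, so HTTPS is mandatory.
s3.put_object(
    Bucket="my-example-bucket",    # hypothetical bucket
    Key="secret.bin",
    Body=b"sensitive payload",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,            # boto3 base64-encodes the key and adds its MD5
)

# Download: the exact same key must be supplied again.
obj = s3.get_object(
    Bucket="my-example-bucket",
    Key="secret.bin",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())
```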

Client-Side Encryption

Client-side encryption refers to encrypting data before sending it to S3 and decrypting the data after downloading it

AWS KMS-managed Customer Master Key – CMK

  • The customer maintains the CMK in AWS KMS and provides the CMK ID to the client to encrypt the data
  • Uploading Object
    • AWS S3 encryption client first sends a request to AWS KMS for the key to encrypt the object data.
    • AWS KMS returns a randomly generated data encryption key in two versions: a plaintext version for encrypting the data and a cipher blob to be uploaded with the object as object metadata
    • Client obtains a unique data encryption key for each object it uploads.
    • AWS S3 encryption client uploads the encrypted data and the cipher blob with object metadata
  • Download Object
    • The client first downloads the encrypted object along with the cipher blob version of the data encryption key stored as object metadata
    • The client then sends the cipher blob to AWS KMS to get the plaintext version of the data key, so that it can decrypt the object data.
CSE-CMK: Client-Side Encryption using KMS-Managed Customer Master Keys (Source: O'Reilly)
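
The flow above can be sketched with boto3 and the cryptography package. This is an illustrative approximation of the envelope-encryption pattern, not the exact wire format used by the AWS SDK encryption clients; the CMK alias, bucket name, and metadata key are hypothetical:

```python
import base64
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
s3 = boto3.client("s3")
CMK_ID = "alias/my-app-key"  # hypothetical CMK alias

# 1. Ask KMS for a data key: a plaintext copy for local use and an
#    encrypted copy (cipher blob) to store alongside the object.
resp = kms.generate_data_key(KeyId=CMK_ID, KeySpec="AES_256")
plaintext_key, cipher_blob = resp["Plaintext"], resp["CiphertextBlob"]

# 2. Encrypt locally, then upload ciphertext with the cipher blob as
#    user metadata (stored under the x-amz-meta- prefix).
nonce = os.urandom(12)
ciphertext = nonce + AESGCM(plaintext_key).encrypt(nonce, b"my secret data", None)
s3.put_object(
    Bucket="my-example-bucket",  # hypothetical bucket
    Key="secret.bin",
    Body=ciphertext,
    Metadata={"x-amz-key-v2": base64.b64encode(cipher_blob).decode()},
)
del plaintext_key  # discard the plaintext key as soon as practical

# 3. Download: recover the data key via KMS, then decrypt locally.
obj = s3.get_object(Bucket="my-example-bucket", Key="secret.bin")
blob = base64.b64decode(obj["Metadata"]["x-amz-key-v2"])
data_key = kms.decrypt(CiphertextBlob=blob)["Plaintext"]
body = obj["Body"].read()
print(AESGCM(data_key).decrypt(body[:12], body[12:], None))
```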

Client-Side Master Key

  • Encryption master keys are maintained entirely on the client side
  • Uploading Object
    • S3 encryption client (e.g., AmazonS3EncryptionClient in the AWS SDK for Java) locally generates a random one-time-use symmetric key (also known as a data encryption key or data key).
    • Client encrypts the data encryption key using the customer-provided master key
    • Client uses this data encryption key to encrypt the data of a single S3 object (for each object, the client generates a separate data key).
    • Client then uploads the encrypted data to S3 and also saves the encrypted data key and its material description as object metadata (x-amz-meta-x-amz-key) in S3 by default
  • Downloading Object
    • Client first downloads the encrypted object from S3 along with the object metadata.
    • Using the material description in the metadata, the client first determines which master key to use to decrypt the encrypted data key.
    • Using that master key, the client decrypts the data key and uses it to decrypt the object
  • Client-side master keys and your unencrypted data are never sent to AWS
  • If the master key is lost the data cannot be decrypted

Enforcing S3 Encryption

  • S3 Encryption in Transit
    • S3 Bucket Policy can be used to enforce SSL communication with S3 using the effect Deny with the condition aws:SecureTransport set to false.
  • S3 Default Encryption
    • helps set the default encryption behavior for an S3 bucket so that all new objects are encrypted when they are stored in the bucket.
    • Objects are encrypted using SSE with either S3-managed keys (SSE-S3) or AWS KMS keys stored in AWS KMS (SSE-KMS).
  • S3 Bucket Policy
    • can be applied to deny permission to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption.
    • is not required if S3 default encryption is enabled.
    • is evaluated before the default encryption.

S3 Bucket Policy to Enforce Encryption
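
A minimal boto3 sketch (the bucket name is hypothetical) that applies both controls: default encryption plus a bucket policy denying non-SSL access and unencrypted uploads:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket

# Default encryption: new objects get SSE-S3 unless the request says otherwise.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)

# Bucket policy: deny plain-HTTP access and deny uploads that do not
# request server-side encryption.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        },
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```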

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers
    1. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys
    2. Use Amazon S3 server-side encryption with customer-provided keys
    3. Use Amazon S3 server-side encryption with EC2 key pair.
    4. Use Amazon S3 bucket policies to restrict access to the data at rest.
    5. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key
    6. Use SSL to encrypt the data while in transit to Amazon S3.
  2. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at Rest. If the user is supplying his own keys for encryption (SSE-C) which of the below mentioned statements is true?
    1. The user should use the same encryption key for all versions of the same object
    2. It is possible to have different encryption keys for different versions of the same object
    3. AWS S3 does not allow the user to upload his own keys for server side encryption
    4. The SSE-C does not work when versioning is enabled
  3. A storage admin wants to encrypt all the objects stored in S3 using server side encryption. The user does not want to use the AES 256 encryption key provided by S3. How can the user achieve this?
    1. The admin should upload his secret key to the AWS console and let S3 decrypt the objects
    2. The admin should use CLI or API to upload the encryption key to the S3 bucket. When making a call to the S3 API mention the encryption key URL in each request
    3. S3 does not support client supplied encryption keys for server side encryption
    4. The admin should send the keys and encryption algorithm with each API call
  4. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at rest. If the user is supplying his own keys for encryption (SSE-C), what is recommended to the user for the purpose of security?
    1. User should not use his own security key as it is not secure
    2. Configure S3 to rotate the user’s encryption key at regular intervals
    3. Configure S3 to store the user’s keys securely with SSL
    4. Keep rotating the encryption key manually at the client side
  5. A system admin is planning to encrypt all objects being uploaded to S3 from an application. The system admin does not want to implement his own encryption algorithm; instead he is planning to use server side encryption by supplying his own key (SSE-C). Which parameter is not required while making a call for SSE-C?
    1. x-amz-server-side-encryption-customer-key-AES-256
    2. x-amz-server-side-encryption-customer-key
    3. x-amz-server-side-encryption-customer-algorithm
    4. x-amz-server-side-encryption-customer-key-MD5
  6. You are designing a personal document-archiving solution for your global enterprise with thousands of employees. Each employee has potentially gigabytes of data to be backed up in this archiving solution. The solution will be exposed to the employees as an application, where they can just drag and drop their files to the archiving system. Employees can retrieve their archives through a web interface. The corporate network has high bandwidth AWS DirectConnect connectivity to AWS. You have regulatory requirements that all data needs to be encrypted before being uploaded to the cloud. How do you implement this in a highly available and cost efficient way?
    1. Manage encryption keys on-premises in an encrypted relational database. Set up an on-premises server with sufficient storage to temporarily store files and then upload them to Amazon S3, providing a client-side master key. (Storing temporarily increases cost and is not a high-availability option)
    2. Manage encryption keys in a Hardware Security Module(HSM) appliance on-premise server with sufficient storage to temporarily store, encrypt, and upload files directly into amazon Glacier. (Not cost effective)
    3. Manage encryption keys in amazon Key Management Service (KMS), upload to amazon simple storage service (s3) with client-side encryption using a KMS customer master key ID and configure Amazon S3 lifecycle policies to store each object using the amazon glacier storage tier. (with CSE-KMS the encryption happens at the client side before the object is uploaded to S3, and KMS is cost effective as well)
    4. Manage encryption keys in an AWS CloudHSM appliance. Encrypt files prior to uploading on the employee desktop and then upload directly into amazon glacier (Not cost effective)
  7. A user has enabled server side encryption with S3. The user downloads the encrypted object from S3. How can the user decrypt it?
    1. S3 does not support server side encryption
    2. S3 provides a server side key to decrypt the object
    3. The user needs to decrypt the object using their own private key
    4. S3 manages encryption and decryption automatically
  8. When uploading an object, what request header can be explicitly specified in a request to Amazon S3 to encrypt object data when saved on the server side?
    1. x-amz-storage-class
    2. Content-MD5
    3. x-amz-security-token
    4. x-amz-server-side-encryption
  9. A company must ensure that any objects uploaded to an S3 bucket are encrypted. Which of the following actions should the SysOps Administrator take to meet this requirement? (Select TWO.)
    1. Implement AWS Shield to protect against unencrypted objects stored in S3 buckets.
    2. Implement Object access control list (ACL) to deny unencrypted objects from being uploaded to the S3 bucket.
    3. Implement Amazon S3 default encryption to make sure that any object being uploaded is encrypted before it is stored.
    4. Implement Amazon Inspector to inspect objects uploaded to the S3 bucket to make sure that they are encrypted.
    5. Implement S3 bucket policies to deny unencrypted objects from being uploaded to the buckets.

References

AWS_S3_Encryption

AWS S3 Versioning

S3 Versioning

  • S3 Versioning helps to keep multiple variants of an object in the same bucket and can be used to preserve, retrieve, and restore every version of every object stored in the S3 bucket.
  • S3 Object Versioning can be used to protect from unintended overwrites and accidental deletions
  • As Versioning maintains multiple copies of the same object as a whole, charges accrue for each version; e.g., a 1GB file with 5 versions with minor differences would consume 5GB of S3 storage space and you would be charged for the same.
  • Buckets can be in one of three states
    • Unversioned (the default)
    • Versioning-enabled
    • Versioning-suspended
  • S3 Object Versioning is not enabled by default and has to be explicitly enabled for each bucket.
  • Versioning, once enabled, cannot be disabled and can only be suspended
  • Versioning enabled on a bucket applies to all the objects within the bucket
  • Permissions are set at the version level. Each version has its own object owner; an AWS account that creates the object version is the owner. So, you can set different permissions for different versions of the same object.
  • Irrespective of the versioning state, each object in the bucket has a version ID.
    • For Non Versioned bucket, the version ID for each object is null
    • For Versioned buckets, a unique version ID is assigned to each object
  • With Versioning, the version ID forms a key element to define the uniqueness of an object within a bucket, along with the bucket name and object key.
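
Enabling versioning is a one-call operation; a minimal boto3 sketch with a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; the only other state you can set later is "Suspended".
s3.put_bucket_versioning(
    Bucket="my-example-bucket",  # hypothetical bucket
    VersioningConfiguration={"Status": "Enabled"},
)
print(s3.get_bucket_versioning(Bucket="my-example-bucket").get("Status"))
```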

Object Retrieval

  • For Non Versioned bucket
    • An object retrieval always returns the only available object.
  • For Versioned bucket
    • An object retrieval returns the current (latest) version of the object.
    • Non-current versions can be retrieved by specifying the version ID.
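
A boto3 sketch of both retrieval modes (bucket and key are hypothetical): GET without a version ID returns the current version, while list_object_versions exposes the non-current ones:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "notes.txt"  # hypothetical names

# Without VersionId, S3 returns the current (latest) version.
current = s3.get_object(Bucket=BUCKET, Key=KEY)

# List all versions of the key, newest first.
versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY).get("Versions", [])
for v in versions:
    print(v["VersionId"], v["IsLatest"], v["LastModified"])

# Fetch a specific non-current version by its version ID.
if len(versions) > 1:
    old = s3.get_object(Bucket=BUCKET, Key=KEY, VersionId=versions[-1]["VersionId"])
```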

Object Addition

  • For Non Versioned bucket
    • If an object with the same key is uploaded again it overwrites the object
  • For Versioned bucket
    • If an object with the same key is uploaded, the newly uploaded object becomes the current version and the previous object becomes the non-current version.
    • A non-current versioned object can be retrieved and restored hence protecting against accidental overwrites

Object Deletion

  • For Non Versioned bucket
    • An object is permanently deleted and cannot be recovered
  • For the Versioned bucket,
    • All versions remain in the bucket and S3 inserts a delete marker which becomes the current version
    • A non-current versioned object can be retrieved and restored, hence protecting against accidental deletion
    • If an object with a specific version ID is deleted, a permanent deletion happens and the object cannot be recovered

Delete marker

  • Delete Marker object does not have any data or ACL associated with it, just the key and the version ID
  • An object retrieval on a bucket with a delete marker as the Current version would return a 404
  • Only a DELETE operation is allowed on the Delete Marker object
  • If the Delete marker object is deleted by specifying its version ID, the previous non-current version object becomes the current version object
  • If a DELETE request is fired on an object with Delete Marker as the current version, the Delete marker object is not deleted but a Delete Marker is added again

S3 Versioning - Delete Operation
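
A boto3 sketch of the delete-marker behavior described above (bucket and key are hypothetical): a plain DELETE inserts a marker, and deleting the marker by its version ID restores the previous version:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-example-bucket", "notes.txt"  # hypothetical names

# A simple DELETE on a versioned bucket only inserts a delete marker.
resp = s3.delete_object(Bucket=BUCKET, Key=KEY)
marker_id = resp["VersionId"]
print(resp.get("DeleteMarker"))  # -> True

# A GET now fails with a 404 because the current version is a delete marker.

# Removing the delete marker by its version ID "undeletes" the object:
# the previous non-current version becomes current again.
s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=marker_id)
```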

Restoring Previous Versions

  • Copy a previous version of the object into the same bucket. The copied object becomes the current version of that object and all object versions are preserved – Recommended as it keeps all the versions.
  • Permanently delete the current version of the object. When you delete the current object version, you, in effect, turn the previous version into the current version of that object.

Versioning Suspended Bucket

  • Versioning can be suspended to stop accruing new versions of the same object in a bucket.
  • Existing objects in the bucket do not change; only the behavior of future requests changes.
  • An object with version ID null is added for each new object addition.
  • For each object addition with the same key name, the object with the version ID null is overwritten.
  • An object retrieval request will always return the current version of the object.
  • A DELETE request on an object would permanently delete the version ID null object and insert a Delete Marker
  • A DELETE request does not delete anything if the bucket does not have an object with version ID null
  • A DELETE request can still be fired with a specific version ID for any previous object with version IDs stored

MFA Delete

  • Additional security can be enabled by configuring a bucket to enable MFA (Multi-Factor Authentication) for the deletion of objects.
  • When MFA Delete is enabled, additional authentication is required for the following operations
    • Changing the versioning state of the bucket
    • Permanently deleting an object version
  • MFA Delete can be enabled on a bucket to ensure that data in the bucket cannot be accidentally deleted
  • While the bucket owner, the AWS account that created the bucket (root account), and all authorized IAM users can enable versioning, only the bucket owner (root account) can enable MFA Delete.
  • MFA Delete, however, does not prevent deletion or allow restoration.
  • MFA Delete cannot be enabled using the AWS Management Console. You must use the AWS Command Line Interface (AWS CLI) or the API.
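
A boto3 sketch of enabling MFA Delete (bucket name, MFA device ARN, and token code are hypothetical); the call must be made with root-account credentials:

```python
import boto3

s3 = boto3.client("s3")

# The MFA argument is the device serial/ARN and the current token code,
# separated by a space. Both values below are hypothetical.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)
```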

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Which set of Amazon S3 features helps to prevent and recover from accidental data loss?
    1. Object lifecycle and service access logging
    2. Object versioning and Multi-factor authentication
    3. Access controls and server-side encryption
    4. Website hosting and Amazon S3 policies
  2. You use S3 to store critical data for your company. Several users within your group currently have full permissions to your S3 buckets. You need to come up with a solution that does not impact your users and also protects against the accidental deletion of objects. Which two options will address this issue? Choose 2 answers
    1. Enable versioning on your S3 Buckets
    2. Configure your S3 Buckets with MFA delete
    3. Create a Bucket policy and only allow read only permissions to all users at the bucket level
    4. Enable object life cycle policies and configure the data older than 3 months to be archived in Glacier
  3. To protect S3 data from both accidental deletion and accidental overwriting, you should
    1. enable S3 versioning on the bucket
    2. access S3 data using only signed URLs
    3. disable S3 delete using an IAM bucket policy
    4. enable S3 Reduced Redundancy Storage
    5. enable Multi-Factor Authentication (MFA) protected access
  4. A user has not enabled versioning on an S3 bucket. What will be the version ID of the object inside that bucket?
    1. 0
    2. There will be no version attached
    3. Null
    4. Blank
  5. A user is trying to find the state of an S3 bucket with respect to versioning. Which of the below mentioned states AWS will not return when queried?
    1. versioning-enabled
    2. versioning-suspended
    3. unversioned
    4. versioned

References

AWS S3 Versioning

AWS Storage Gateway

  • AWS Storage Gateway connects on-premises software appliances with cloud-based storage to provide seamless integration with data security features between on-premises and the AWS storage infrastructure.
  • AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
  • Storage Gateway allows storage of data in the AWS cloud for scalable and cost-effective storage while maintaining data security.
  • Storage Gateway can run either on-premises, as a VM appliance, or in AWS, as an EC2 instance. So if the on-premises data center goes offline and there is no available host, the gateway can be deployed on an EC2 instance.
  • Gateways hosted on EC2 instances can be used for disaster recovery, data mirroring, and providing storage for applications hosted on EC2
  • Storage Gateway, by default, uploads data using SSL and provides data encryption at rest when stored in S3 or Glacier using AES-256
  • Storage Gateway performs encryption of data-in-transit and at-rest.
  • Storage Gateway offers three types
    • File Gateway
    • Volume Gateway
    • Tape Gateway

S3 File Gateway

  • supports a file interface into S3 and combines a service and a virtual software appliance.
  • allows storing and retrieving of objects in S3 using industry-standard file protocols such as NFS and SMB.
  • The software appliance, or gateway, is deployed into the on-premises environment as a VM running on a VMware ESXi or Microsoft Hyper-V hypervisor.
  • provides access to objects in S3 as files or file share mount points. It can be considered as a file system mount on S3.
  • durably stores POSIX-style metadata, including ownership, permissions, and timestamps in S3 as object user metadata associated with the file.
  • provides a cost-effective alternative to on-premises storage.
  • provides low-latency access to data through transparent local caching.
  • manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.
  • easily integrates with services like IAM, KMS, CloudWatch, CloudTrail, etc.
  • File Gateway allows you to
    • store and retrieve files directly using the NFS version 3 or 4.1 protocol.
    • store and retrieve files directly using the SMB protocol, versions 2 and 3.
    • access the data directly in S3 from any AWS Cloud application or service.
    • manage S3 data using lifecycle policies, cross-region replication, and versioning.

Volume Gateways

  • Volume gateways provide cloud-backed storage volumes that can be mounted as Internet Small Computer System Interface (iSCSI) devices from the on-premises application servers.
  • While all data is securely stored in AWS, the approaches differ in how much data is stored on-premises.
  • expose a compatible iSCSI interface on the front end to easily integrate with existing backup applications and represent another disk drive
  • back up the data incrementally by taking snapshots which are stored as EBS snapshots in S3. These snapshots can be restored as a gateway storage volume or used to create EBS volumes to be attached to an EC2 instance

Gateway Cached Volumes

Storage Gateway Cached Volume
  • Gateway Cached Volumes store data in S3, which acts as the primary data storage, and retain a copy of recently read data locally for low-latency access to frequently accessed data
  • Gateway-cached volumes offer substantial cost savings on primary storage and minimize the need to scale the storage on-premises.
  • All gateway-cached volume data and snapshot data are stored in S3 encrypted at rest using server-side encryption (SSE) and it cannot be accessed with S3 API or any other tools.
  • Each gateway configured for gateway-cached volumes can support up to 32 volumes, with each volume ranging from 1GiB to 32TiB, for a total maximum storage volume of 1,024 TiB (1 PiB).
  • Gateway VM can be allocated disks
    • Cache storage
      • Cache storage acts as the on-premises durable storage and stores the data before uploading it to S3
      • Cache storage also stores recently read data for low-latency access
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to S3
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3

Gateway Stored Volumes

Storage Gateway Stored Volume
  • Gateway stored volumes maintain the entire data set locally to provide low-latency access.
  • Gateway asynchronously backs up point-in-time snapshots (in the form of EBS snapshots) of the data to S3 which provides durable off-site backups
  • Gateway stored volume configuration provides durable and inexpensive off-site backups that you can recover to your local data center or EC2 for e.g., if you need replacement capacity for disaster recovery, you can recover the backups to EC2.
  • Each gateway configured for gateway-stored volumes can support up to 32 volumes, ranging from 1 GiB to 16 TiB, for a total volume storage of 512 TiB
  • Gateway VM can be allocated disks
    • Volume Storage
      • For storing the actual data
      • Can be mapped to on-premises direct-attached storage (DAS) or storage area network (SAN) disks
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to S3
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in Amazon S3

Tape Gateway – Gateway-Virtual Tape Library (VTL)

Storage Gateway VTL
  • Tape Gateway offers a durable, cost-effective data archival solution.
  • VTL interface can help leverage existing tape-based backup application infrastructure to store data on virtual tape cartridges created on the tape gateway.
  • Each Tape Gateway is preconfigured with a media changer and tape drives, which are available to the existing client backup applications as iSCSI devices. Tape cartridges can be added as needed to archive the data.
  • Gateway-VTL provides a virtual tape infrastructure that scales seamlessly with the business needs and eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure.
  • Gateway VTL has the following components:
    • Virtual Tape
      • Virtual tape is similar to the physical tape cartridge, except that the data is stored in the AWS storage solution
      • Each gateway can contain up to 1,500 tapes or up to 1 PiB of total tape data, with each tape ranging from 100 GiB to 2.5 TiB
    • Virtual Tape Library
      • Virtual tape library is similar to the physical tape library with tape drives (replaced with VTL tape drive) and robotic arms (replaced with Media changer)
      • Tapes in the Virtual tape library are backed up in S3
      • Backup software writes data to the gateway, the gateway stores data locally, and then asynchronously uploads it to virtual tapes in S3.
    • Archive or Virtual Tape Shelf (VTS)
      • Virtual tape shelf is similar to an offsite tape holding facility
      • Tapes archived to the Virtual tape shelf are stored in Glacier, providing an extremely low-cost storage service for data archiving and backup
      • VTS is located in the same region where the gateway was created and every region would have a single VTS irrespective of the number of gateways
      • Archiving tapes
        • When the backup software ejects a tape, the gateway moves the tape to the VTS for long term storage
      • Retrieving tapes
        • A tape can be retrieved from the VTS only by first retrieving it into the VTL; it would be available in the VTL in about 24 hours
  • Gateway VM can be allocated disks for
    • Cache storage
      • Cache storage acts as the on-premises durable storage, stores the data before uploading it to S3.
      • Cache storage also stores recently read data for low-latency access
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to the Virtual tape.
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Which of the following services natively encrypts data at rest within an AWS region? Choose 2 answers
    1. AWS Storage Gateway
    2. Amazon DynamoDB
    3. Amazon CloudFront
    4. Amazon Glacier
    5. Amazon Simple Queue Service
  2. What does the AWS Storage Gateway provide?
    1. It allows to integrate on-premises IT environments with Cloud Storage
    2. A direct encrypted connection to Amazon S3.
    3. It’s a backup solution that provides an on-premises Cloud storage.
    4. It provides an encrypted SSL endpoint for backups in the Cloud.
  3. You’re running an application on-premises due to its dependency on non-x86 hardware and want to use AWS for data backup. Your backup application is only able to write to POSIX-compatible block-based storage. You have 140TB of data and would like to mount it as a single folder on your file server. Users must be able to access portions of this data while the backups are taking place. What backup solution would be most appropriate for this use case?
    1. Use Storage Gateway and configure it to use Gateway Cached volumes.
    2. Configure your backup software to use S3 as the target for your data backups.
    3. Configure your backup software to use Glacier as the target for your data backups
    4. Use Storage Gateway and configure it to use Gateway Stored volumes (Data is hosted on the on-premises server as well. The 140TB requirement is for the on-premises file server, added to confuse, and not for AWS. Only a backup solution is needed, hence stored instead of cached volumes)
  4. A customer has a single 3-TB volume on-premises that is used to hold a large repository of images and print layout files. This repository is growing at 500 GB a year and must be presented as a single logical volume. The customer is becoming increasingly constrained with their local storage capacity and wants an off-site backup of this data, while maintaining low-latency access to their frequently accessed data. Which AWS Storage Gateway configuration meets the customer requirements?
    1. Gateway-Cached volumes with snapshots scheduled to Amazon S3
    2. Gateway-Stored volumes with snapshots scheduled to Amazon S3
    3. Gateway-Virtual Tape Library with snapshots to Amazon S3
    4. Gateway-Virtual Tape Library with snapshots to Amazon Glacier
  5. You have a proprietary data store on-premises that must be backed up daily by dumping the data store contents to a single compressed 50GB file and sending the file to AWS. Your SLAs state that any dump file backed up within the past 7 days can be retrieved within 2 hours. Your compliance department has stated that all data must be held indefinitely. The time required to restore the data store from a backup is approximately 1 hour. Your on-premises network connection is capable of sustaining 1 Gbps to AWS. Which backup methods to AWS would be most cost-effective while still meeting all of your requirements?
    1. Send the daily backup files to Glacier immediately after being generated (will not meet the RTO)
    2. Transfer the daily backup files to an EBS volume in AWS and take daily snapshots of the volume (Not cost effective)
    3. Transfer the daily backup files to S3 and use appropriate bucket lifecycle policies to send to Glacier (Store in S3 for seven days and then archive to Glacier)
    4. Host the backup files on a Storage Gateway with Gateway-Cached Volumes and take daily snapshots (Not Cost effective as local storage as well as S3 storage)
  6. A customer implemented AWS Storage Gateway with a gateway-cached volume at their main office. An event takes the link between the main and branch office offline. Which methods will enable the branch office to access their data? Choose 3 answers
    1. Use a HTTPS GET to the Amazon S3 bucket where the files are located (gateway volumes are only accessible from the AWS Storage Gateway and cannot be directly accessed using Amazon S3 APIs)
    2. Restore by implementing a lifecycle policy on the Amazon S3 bucket.
    3. Make an Amazon Glacier Restore API call to load the files into another Amazon S3 bucket within four to six hours.
    4. Launch a new AWS Storage Gateway instance AMI in Amazon EC2, and restore from a gateway snapshot
    5. Create an Amazon EBS volume from a gateway snapshot, and mount it to an Amazon EC2 instance.
    6. Launch an AWS Storage Gateway virtual iSCSI device at the branch office, and restore from a gateway snapshot
  7. A company uses on-premises servers to host its applications. The company is running out of storage capacity. The applications use both block storage and NFS storage. The company needs a high-performing solution that supports local caching without rearchitecting its existing applications. Which combination of actions should a solutions architect take to meet these requirements? (Choose two.)
    1. Mount Amazon S3 as a file system to the on-premises servers.
    2. Deploy an AWS Storage Gateway file gateway to replace NFS storage.
    3. Deploy AWS Snowball Edge to provision NFS mounts to on-premises servers.
    4. Deploy an AWS Storage Gateway volume gateway to replace the block storage.
    5. Deploy Amazon Elastic File System (Amazon EFS) volumes and mount them to on-premises servers.

References

  1. AWS_Storage_Gateway_User_Guide
  2. https://www.youtube.com/watch?v=AkehuRl5YPg

Amazon EBS Multi-Attach

EBS Multi-Attach

  • EBS Multi-Attach enables attaching a single Provisioned IOPS SSD (io1 or io2) volume to multiple instances that are in the same AZ.
  • Multiple Multi-Attach enabled volumes can be attached to an instance or set of instances.
  • Each instance to which the volume is attached has full read and write permission to the shared volume.
  • Multi-Attach helps achieve higher application availability in clustered Linux applications that manage concurrent write operations.

EBS Multi-Attach Considerations & Limitations

  • Multi-Attach is supported exclusively on Provisioned IOPS SSD volumes.
  • Multi-Attach enabled volumes can be attached
    • to up to 16 Linux instances built on the Nitro System that are in the same AZ.
    • to Windows instances, but the operating system does not recognize the data on the volume that is shared between the instances, which can result in data inconsistency.
  • Multi-Attach enabled volumes can be attached to one block device mapping per instance.
  • Multi-Attach enabled volumes are deleted on instance termination if the last attached instance is terminated and if that instance is configured to delete the volume on termination.
  • Multi-Attach enabled volumes can’t be created as boot volumes.
  • Multi-Attach can’t be enabled during instance launch using either the EC2 console or RunInstances API.
  • Multi-Attach enabled volumes do not support I/O fencing. I/O fencing protocols control write access in a shared storage environment to maintain data consistency
  • Multi-Attach can’t be enabled or disabled while the volume is attached to an instance.
  • Multi-Attach option is disabled by default.
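
A minimal boto3 sketch of creating a Multi-Attach enabled io2 volume and attaching it to two Nitro instances in the same AZ (all IDs are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

# Multi-Attach must be enabled at creation time on an io1/io2 volume.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # all attached instances must be in this AZ
    VolumeType="io2",
    Size=100,                       # GiB
    Iops=3000,
    MultiAttachEnabled=True,
)

# Attach the same volume to several Nitro instances (IDs are hypothetical).
for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device="/dev/sdf")
```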

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.

References

Amazon_EBS_Multi-Attach

AWS EC2 Instance Store Storage

EC2 Instance Store

  • An instance store provides temporary or ephemeral block-level storage for an Elastic Compute Cloud (EC2) instance.
  • is located on the disks that are physically attached to the host computer.
  • consists of one or more instance store volumes exposed as block devices.
  • The size of an instance store varies by instance type.
  • Virtual devices for instance store volumes are named ephemeral[0-23], starting with the first one as ephemeral0 and so on.
  • While an instance store is dedicated to a particular instance, the disk subsystem is shared among instances on a host computer.
  • is ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers.
  • delivers very high random I/O performance and is a good option for storage with very low latency requirements, where the data does not need to persist when the instance terminates or where a fault-tolerant architecture can tolerate data loss.

Instance Store Lifecycle

  • Instance store data lifetime is dependent on the lifecycle of the Instance to which it is attached.
  • Data on the Instance store persists when an instance is rebooted.
  • However, the data on the instance store does not persist if the
    • underlying disk drive fails
    • instance terminates
    • instance hibernates
    • instance stops, i.e., if an EBS-backed instance with instance store volumes attached is stopped
  • Stopping, hibernating, or terminating an instance causes every block of storage in the instance store to be reset.
  • If an AMI is created from an Instance with an Instance store volume, the data on its instance store volume isn’t preserved.

Instance Store Volumes

  • Instance type of an instance determines the size of the instance store available for the instance and the type of hardware used for the instance store volumes.
  • Instance store volumes are included as part of the instance’s hourly cost.
  • Some instance types use solid-state drives (SSD) to deliver very high random I/O performance, which is a good option when storage with very low latency is needed, but the data does not need to be persisted when the instance terminates or architecture is fault tolerant.

Instance Store Volumes with EC2 instances

  • EBS volumes and instance store volumes for an instance are specified using a block device mapping.
  • Instance store volume
    • can be attached to an EC2 instance only when the instance is launched.
    • cannot be detached and reattached to a different instance.
  • After an instance is launched, the instance store volumes for the instance should be formatted and mounted before they can be used.
  • Root volume of an instance store-backed instance is mounted automatically

Instance Store Optimizing Writes

  • Because of the way that EC2 virtualizes disks, the first write to any location on an instance store volume performs more slowly than subsequent writes.
  • Amortizing (gradually writing off) this cost over the lifetime of the instance might be acceptable.
  • However, if high disk performance is required, AWS recommends initializing the drives by writing once to every drive location before production use

EBS vs Instance Store

Refer blog post @ EBS vs Instance Store

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Please select the most correct answer regarding the persistence of the Amazon Instance Store
    1. The data on an instance store volume persists only during the life of the associated Amazon EC2 instance
    2. The data on an instance store volume is lost when the security group rule of the associated instance is changed.
    3. The data on an instance store volume persists even after associated Amazon EC2 instance is deleted
  2. A user has launched an EC2 instance from an instance store backed AMI. The user has attached an additional instance store volume to the instance. The user wants to create an AMI from the running instance. Will the AMI have the additional instance store volume data?
    1. Yes, the block device mapping will have information about the additional instance store volume
    2. No, since the instance store backed AMI can have only the root volume bundled
    3. It is not possible to attach an additional instance store volume to the existing instance store backed AMI instance
    4. No, since this is ephemeral storage it will not be a part of the AMI
  3. When an EC2 instance that is backed by an S3-based AMI is terminated, what happens to the data on the root volume?
    1. Data is automatically saved as an EBS volume.
    2. Data is automatically saved as an EBS snapshot.
    3. Data is automatically deleted
    4. Data is unavailable until the instance is restarted.
  4. A user has launched an EC2 instance from an instance store backed AMI. If the user restarts the instance, what will happen to the ephemeral storage data?
    1. All the data will be erased but the ephemeral storage will stay connected
    2. All data will be erased and the ephemeral storage is released
    3. It is not possible to restart an instance launched from an instance store backed AMI
    4. The data is preserved
  5. When an EC2 EBS-backed instance is stopped, what happens to the data on any ephemeral store volumes?
    1. Data will be deleted and will no longer be accessible
    2. Data is automatically saved in an EBS volume.
    3. Data is automatically saved as an EBS snapshot
    4. Data is unavailable until the instance is restarted
  6. A user has launched an EC2 Windows instance from an instance store backed AMI. The user has also set the Instance initiated shutdown behavior to stop. What will happen when the user shuts down the OS?
    1. It will not allow the user to shutdown the OS when the shutdown behavior is set to Stop
    2. It is not possible to set the termination behavior to Stop for an Instance store backed AMI instance
    3. The instance will stay running but the OS will be shutdown
    4. The instance will be terminated
  7. Which of the following will occur when an EC2 instance in a VPC (Virtual Private Cloud) with an associated Elastic IP is stopped and started? (Choose 2 answers)
    1. The Elastic IP will be dissociated from the instance
    2. All data on instance-store devices will be lost
    3. All data on EBS (Elastic Block Store) devices will be lost
    4. The ENI (Elastic Network Interface) is detached
    5. The underlying host for the instance is changed

References

AWS EBS Snapshot

EBS Snapshot

  • EBS provides the ability to create snapshots (backups) of any EBS volume and write a copy of the data in the volume to S3, where it is stored redundantly in multiple Availability Zones
  • Snapshots are incremental backups and store only the data that was changed from the time the last snapshot was taken.
  • Snapshots can be used to create new volumes, increase the size of the volumes or replicate data across Availability Zones.
  • Snapshot size can probably be smaller than the volume size, as the data is compressed before being saved to S3.
  • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.
  • EBS Snapshots can be used to migrate or create EBS volumes in different AZs or regions.

Multi-Volume Snapshots

  • Snapshots can be used to create a backup of critical workloads, such as a large database or a file system that spans across multiple EBS volumes.
  • Multi-volume snapshots help take exact point-in-time, data-coordinated, and crash-consistent snapshots across multiple EBS volumes attached to an EC2 instance.
  • It is no longer required to stop the instance or to coordinate between volumes to ensure crash consistency because snapshots are automatically taken across multiple EBS volumes.
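
A minimal boto3 sketch of a crash-consistent multi-volume snapshot using the create_snapshots API (the instance ID is hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

# One call snapshots every EBS volume attached to the instance at the
# same point in time (crash-consistent across volumes).
resp = ec2.create_snapshots(
    InstanceSpecification={
        "InstanceId": "i-0123456789abcdef0",  # hypothetical instance
        "ExcludeBootVolume": False,
    },
    Description="Crash-consistent multi-volume backup",
    CopyTagsFromSource="volume",
)
print([s["SnapshotId"] for s in resp["Snapshots"]])
```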

EBS Snapshot creation

  • Snapshots can be created from EBS volumes periodically and are point-in-time snapshots.
  • Snapshots are incremental and only store the blocks on the device that changed since the last snapshot was taken
  • Snapshots occur asynchronously; the point-in-time snapshot is created immediately while it takes time to upload the modified blocks to S3. While it is completing, an in-progress snapshot is not affected by ongoing reads and writes to the volume.
  • Snapshots can be taken from in-use volumes. However, snapshots will only capture the data that was written to the EBS volume at the time the snapshot command is issued, excluding any data cached by applications or the OS.
  • Recommended ways to create a Snapshot from an EBS volume are
    • Pause all file writes to the volume
    • Unmount the Volume -> Take Snapshot -> Remount the Volume
    • Stop the instance – Take Snapshot (for root EBS volumes)
  • EBS volume created based on a snapshot
    • begins as an exact replica of the original volume that was used to create the snapshot.
    • replicated volume loads data in the background so that it can be used immediately.
    • If data that hasn’t been loaded yet is accessed, the volume immediately downloads the requested data from S3 and then continues loading the rest of the volume’s data in the background.
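
A minimal boto3 sketch (the volume ID and AZs are hypothetical) of taking a point-in-time snapshot and then restoring it into a new volume in another AZ:

```python
import boto3

ec2 = boto3.client("ec2")

# Point-in-time snapshot; the call returns immediately while the
# modified blocks upload to S3 in the background.
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume
    Description="Pre-deployment backup",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# A new volume from the snapshot can be created in any AZ of the region
# and is usable immediately; remaining data is lazily loaded from S3.
ec2.create_volume(AvailabilityZone="us-east-1b", SnapshotId=snap["SnapshotId"])
```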

EBS Snapshot Deletion

  • When a snapshot is deleted only the data exclusive to that snapshot is removed.
  • Deleting previous snapshots of a volume does not affect the ability to restore volumes from later snapshots of that volume.
  • Active snapshots contain all of the information needed to restore your data (from the time the snapshot was taken) to a new EBS volume.
  • Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.
  • Snapshot of the root device volume used by a registered AMI can’t be deleted. The AMI needs to be deregistered to be able to delete the snapshot.

EBS Snapshot Copy

  • Snapshots are constrained to the region in which they are created and can be used to launch EBS volumes within the same region only
  • Snapshots can be copied across regions to make it easier to leverage multiple regions for geographical expansion, data center migration, and disaster recovery
  • Snapshots are copied with S3 server-side encryption (256-bit Advanced Encryption Standard) to encrypt the data and the snapshot copy receives a snapshot ID that’s different from the original snapshot’s ID.
  • User-defined tags are not copied from the source to the new snapshot.
  • First Snapshot copy to another region is always a full copy, while the rest are always incremental.
  • When a snapshot is copied,
    • it can be encrypted if currently unencrypted or
    • can be encrypted using a different encryption key. Changing the encryption status of a snapshot or using a non-default EBS CMK during a copy operation always results in a full copy (not incremental)
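
A boto3 sketch of a cross-region copy that also changes the encryption status (snapshot ID and CMK alias are hypothetical); note that the client is created in the destination region:

```python
import boto3

# copy_snapshot is called in the *destination* region.
ec2_dest = boto3.client("ec2", region_name="eu-west-1")

copy = ec2_dest.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",  # hypothetical snapshot
    Encrypted=True,                  # encrypt an unencrypted source on the fly
    KmsKeyId="alias/my-backup-key",  # hypothetical non-default CMK -> full copy
    Description="DR copy of the daily backup",
)
print(copy["SnapshotId"])  # the copy gets its own snapshot ID
```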

EBS Snapshot Sharing

  • Snapshots can be shared by making them public or with specific AWS accounts by modifying the access permissions of the snapshots
  • Encrypted snapshots cannot be made public.
  • Only unencrypted snapshots can be made public.
  • An encrypted snapshot can be shared with specific AWS accounts, but the custom CMK used to encrypt it must also be shared.
  • Cross-account permissions may be applied to a custom key either when it is created or at a later time.
  • Users, with access to snapshots, can copy the snapshot and create their own EBS volumes based on the snapshot while the original snapshot remains unaffected
  • AWS prevents you from sharing snapshots that were encrypted with the default CMK
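
A boto3 sketch of both sharing modes via modify_snapshot_attribute (the snapshot ID and account ID are hypothetical); the public option works only for unencrypted snapshots:

```python
import boto3

ec2 = boto3.client("ec2")
SNAPSHOT_ID = "snap-0123456789abcdef0"  # hypothetical snapshot

# Share a snapshot with one specific AWS account (hypothetical account ID).
ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=["444455556666"],
)

# Make the snapshot public (allowed for unencrypted snapshots only).
ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    GroupNames=["all"],
)
```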

EBS Snapshot Encryption

  • EBS snapshots fully support EBS encryption.
  • Snapshots of encrypted volumes are automatically encrypted
  • Volumes created from encrypted snapshots are automatically encrypted
  • All data in flight between the instance and the volume is encrypted
  • Volumes created from an unencrypted snapshot that you own or have access to can be encrypted on the fly.
  • Unencrypted snapshots can be encrypted during the copy process.
  • Encrypted snapshots that you own or have access to can be re-encrypted with a different key during the copy process.
  • First snapshot of an encrypted volume that has been created from an unencrypted snapshot is always a full snapshot.
  • First snapshot of a re-encrypted volume, which has a different CMK compared to the source snapshot, is always a full snapshot.

EBS Snapshot Lifecycle Automation

  • Amazon Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
  • Automating snapshot management helps you to:
    • Protect valuable data by enforcing a regular backup schedule.
    • Retain backups as required by auditors or internal compliance.
    • Reduce storage costs by deleting outdated backups.
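
A boto3 sketch of an Amazon Data Lifecycle Manager policy (the role ARN and target tag are hypothetical) that takes a daily snapshot of every tagged volume and retains the last 7:

```python
import boto3

dlm = boto3.client("dlm")

# Daily snapshots at 03:00 UTC for every volume tagged Backup=true,
# keeping the 7 most recent. The role ARN and tag are hypothetical.
dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::111122223333:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS backups",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "Backup", "Value": "true"}],
        "Schedules": [{
            "Name": "DailySnapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},
        }],
    },
)
```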

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. An existing application stores sensitive information on a non-boot Amazon EBS data volume attached to an Amazon Elastic Compute Cloud instance. Which of the following approaches would protect the sensitive data on an Amazon EBS volume?
    1. Upload your customer keys to AWS CloudHSM. Associate the Amazon EBS volume with AWS CloudHSM. Remount the Amazon EBS volume.
    2. Create and mount a new, encrypted Amazon EBS volume. Move the data to the new volume. Delete the old Amazon EBS volume.
    3. Unmount the EBS volume. Toggle the encryption attribute to True. Re-mount the Amazon EBS volume.
    4. Snapshot the current Amazon EBS volume. Restore the snapshot to a new, encrypted Amazon EBS volume. Mount the Amazon EBS volume (Need to create a snapshot, create an encrypted copy of snapshot and then create an EBS volume and mount it)
  2. Is it possible to access your EBS snapshots?
    1. Yes, through the Amazon S3 APIs.
    2. Yes, through the Amazon EC2 APIs
    3. No, EBS snapshots cannot be accessed; they can only be used to create a new EBS volume.
    4. EBS doesn’t provide snapshots.
  3. Which of the following approaches provides the lowest cost for Amazon Elastic Block Store snapshots while giving you the ability to fully restore data?
    1. Maintain two snapshots: the original snapshot and the latest incremental snapshot
    2. Maintain a volume snapshot; subsequent snapshots will overwrite one another
    3. Maintain a single snapshot the latest snapshot is both Incremental and complete
    4. Maintain the most current snapshot, archive the original and incremental to Amazon Glacier.
  4. Which procedure for backing up a relational database on EC2 that is using a set of RAIDed EBS volumes for storage minimizes the time during which the database cannot be written to and results in a consistent backup?
    1. Detach EBS volumes, 2. Start EBS snapshot of volumes, 3. Re-attach EBS volumes
    2. Stop the EC2 Instance. 2. Snapshot the EBS volumes
    3. Suspend disk I/O, 2. Create an image of the EC2 Instance, 3. Resume disk I/O
    4. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Resume disk I/O
    5. Suspend disk I/O, 2. Start EBS snapshot of volumes, 3. Wait for snapshots to complete, 4. Resume disk I/O
  5. How can an EBS volume that is currently attached to an EC2 instance be migrated from one Availability Zone to another?
    1. Detach the volume and attach it to another EC2 instance in the other AZ.
    2. Simply create a new volume in the other AZ and specify the original volume as the source.
    3. Create a snapshot of the volume, and create a new volume from the snapshot in the other AZ
    4. Detach the volume, then use the ec2-migrate-volume command to move it to another AZ.
  6. How are the EBS snapshots saved on Amazon S3?
    1. Exponentially
    2. Incrementally
    3. EBS snapshots are not stored in the Amazon S3
    4. Decrementally
  7. EBS Snapshots occur _____
    1. Asynchronously
    2. Synchronously
    3. Weekly
  8. What will be the status of the snapshot until the snapshot is complete?
    1. Running
    2. Working
    3. Progressing
    4. Pending
  9. Before I delete an EBS volume, what can I do if I want to recreate the volume later?
    1. Create a copy of the EBS volume (not a snapshot)
    2. Create and Store a snapshot of the volume
    3. Download the content to an EC2 instance
    4. Back up the data in to a physical disk
  10. Which of the following are true regarding encrypted Amazon Elastic Block Store (EBS) volumes? Choose 2 answers
    1. Supported on all Amazon EBS volume types
    2. Snapshots are automatically encrypted
    3. Available to all instance types
    4. Existing volumes can be encrypted
    5. Shared volumes can be encrypted
  11. Amazon EBS snapshots have which of the following two characteristics? (Choose 2.) Choose 2 answers
    1. EBS snapshots only save incremental changes from snapshot to snapshot
    2. EBS snapshots can be created in real-time without stopping an EC2 instance (the snapshot can be taken in real time; however, it will not be consistent, and the recommended way is to stop the instance or freeze the I/O)
    3. EBS snapshots can only be restored to an EBS volume of the same size or smaller (EBS volumes restored from snapshots need to be of the same or larger size)
    4. EBS snapshots can only be restored and mounted to an instance in the same Availability Zone as the original EBS volume (snapshots are specific to a region and can be used to create a volume in any AZ; they do not depend on the original EBS volume’s AZ)
  12. A user is planning to schedule a backup for an EBS volume. The user wants security of the snapshot data. How can the user achieve data encryption with a snapshot?
    1. Use encrypted EBS volumes so that the snapshot will be encrypted by AWS (snapshots of encrypted volumes are automatically encrypted)
    2. While creating a snapshot select the snapshot with encryption
    3. By default the snapshot is encrypted by AWS
    4. Enable server side encryption for the snapshot using S3
  13. A sys admin is trying to understand EBS snapshots. Which of the below mentioned statements will not be useful to the admin to understand the concepts about a snapshot?
    1. Snapshot is synchronous
    2. It is recommended to stop the instance before taking a snapshot for consistent data
    3. Snapshot is incremental
    4. Snapshot captures the data that has been written to the hard disk when the snapshot command was executed
  14. When creation of an EBS snapshot is initiated but not completed, the EBS volume
    1. Cannot be detached or attached to an EC2 instance until the snapshot completes
    2. Can be used in read-only mode while the snapshot is in progress
    3. Can be used while the snapshot is in progress
    4. Cannot be used until the snapshot completes
  15. You have a server with a 500 GB Amazon EBS data volume. The volume is 80% full. You need to back up the volume at regular intervals and be able to re-create the volume in a new Availability Zone in the shortest time possible. All applications using the volume can be paused for a period of a few minutes with no discernible user impact. Which of the following backup methods will best fulfill your requirements?
    1. Take periodic snapshots of the EBS volume
    2. Use a third-party incremental backup application to back up to Amazon Glacier
    3. Periodically back up all data to a single compressed archive and archive to Amazon S3 using a parallelized multi-part upload
    4. Create another EBS volume in the second Availability Zone, attach it to the Amazon EC2 instance, and use a disk manager to mirror the two disks
  16. A user is creating a snapshot of an EBS volume. Which of the below statements is incorrect in relation to the creation of an EBS snapshot?
    1. Its incremental
    2. It can be used to launch a new instance
    3. It is stored in the same AZ as the volume (stored in the same region)
    4. It is a point in time backup of the EBS volume
  17. A user has created a snapshot of an EBS volume. Which of the below mentioned usage cases is not possible with respect to a snapshot?
    1. Mirroring the volume from one AZ to another AZ
    2. Launch an instance
    3. Decrease the volume size
    4. Increase the size of the volume
  18. What is true of the way that encryption works with EBS?
    1. Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    2. Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot creates an encrypted volume when specified / requested.
    3. Snapshotting an encrypted volume makes an encrypted snapshot; restoring an encrypted snapshot always creates an encrypted volume.
    4. Snapshotting an encrypted volume makes an encrypted snapshot when specified / requested; restoring an encrypted snapshot always creates an encrypted volume.
  19. Why are more frequent snapshots of EBS Volumes faster?
    1. Blocks in EBS Volumes are allocated lazily, since while logically separated from other EBS Volumes, Volumes often share the same physical hardware. Snapshotting the first time forces full block range allocation, so the second snapshot doesn’t need to perform the allocation phase and is faster.
    2. The snapshots are incremental so that only the blocks on the device that have changed after your last snapshot are saved in the new snapshot.
    3. AWS provisions more disk throughput for burst capacity during snapshots if the drive has been pre-warmed by snapshotting and reading all blocks.
    4. The drive is pre-warmed, so block access is more rapid for volumes when every block on the device has already been read at least one time.
  20. Which is not a restriction on AWS EBS Snapshots?
    1. Snapshots which are shared cannot be used as a basis for other snapshots (snapshots shared with other users are usable in full by the recipient, including but not limited to the ability to create volumes and snapshots from them)
    2. You cannot share a snapshot containing an AWS Access Key ID or AWS Secret Access Key
    3. You cannot share encrypted snapshots (NOTE: this has since been updated; encrypted snapshots can now be shared with other accounts)
    4. Snapshot restorations are restricted to the region in which the snapshots are created
  21. There is a very serious outage at AWS. EC2 is not affected, but your EC2 instance deployment scripts stopped working in the region with the outage. What might be the issue?
    1. The AWS Console is down, so your CLI commands do not work.
    2. S3 is unavailable, so you can’t create EBS volumes from a snapshot you use to deploy new volumes. (EBS volume snapshots are stored in S3. If S3 is unavailable, snapshots are unavailable)
    3. AWS turns off the DeployCode API call when there are major outages, to protect from system floods.
    4. None of the other answers make sense. If EC2 is not affected, it must be some other issue.

AWS EBS Performance

AWS EBS Performance Tips

  • EBS performance depends on several factors, including I/O characteristics and instance and volume configuration, and can be improved using PIOPS, EBS-Optimized instances, pre-warming (initialization), and RAID configurations.

EBS-Optimized or 10 Gigabit Network Instances

  • An EBS-Optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for EBS I/O.
  • Optimization provides the best performance for the EBS volumes by minimizing contention between EBS I/O and other traffic from an instance
  • EBS-Optimized instances deliver dedicated throughput to EBS depending on the instance type used.
  • Not all instance types support EBS-Optimization
  • Some instance types are EBS-Optimized by default, while for others it can be enabled explicitly.
  • When EBS optimization is enabled for an instance that is not EBS-Optimized by default, an additional low hourly fee is charged for the dedicated capacity.
  • When attached to an EBS–optimized instance,
    • General Purpose (SSD) volumes are designed to deliver within 10% of their baseline and burst performance 99.9% of the time in a given year
    • Provisioned IOPS (SSD) volumes are designed to deliver within 10% of their provisioned performance 99.9 percent of the time in a given year.
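As a minimal sketch (boto3), EBS optimization can be requested explicitly at launch; the AMI ID is a placeholder, and c3.xlarge is assumed here as a type where optimization is optional and incurs the additional hourly fee.

    import boto3

    ec2 = boto3.client("ec2")

    # Launch an instance with dedicated EC2-to-EBS throughput.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
        InstanceType="c3.xlarge",         # EBS optimization optional on this type (fee applies)
        MinCount=1,
        MaxCount=1,
        EbsOptimized=True,
    )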

EBS Volume Initialization – Pre-warming

  • Empty EBS volumes receive their maximum performance the moment that they are available and DO NOT require initialization (pre-warming).
  • Previously, EBS volumes required pre-warming before use to achieve maximum performance: writing zeroes across the entire volume for new volumes, or reading the entire volume for volumes restored from snapshots.
  • Storage blocks on volumes that were restored from snapshots must be initialized (pulled down from S3 and written to the volume) before the block can be accessed.
  • This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed.
  • To avoid this initial performance hit in a production environment, the following options can be used
    • Force the immediate initialization of the entire volume by using the dd or fio utilities to read from all of the blocks on a volume.
    • Enable fast snapshot restore – FSR on a snapshot to ensure that the EBS volumes created from it are fully-initialized at creation and instantly deliver all of their provisioned performance.
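The dd or fio reads run on the instance itself, while fast snapshot restore can be enabled through the EC2 API. A minimal sketch (boto3) with placeholder snapshot ID and AZ:

    import boto3

    ec2 = boto3.client("ec2")

    # Volumes created from this snapshot in the listed AZs are fully initialized
    # at creation and deliver their provisioned performance immediately.
    ec2.enable_fast_snapshot_restores(
        AvailabilityZones=["us-east-1a"],              # FSR is enabled per AZ
        SourceSnapshotIds=["snap-0123456789abcdef0"],  # placeholder snapshot ID
    )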

RAID Configuration

  • EBS volumes can be striped together if a single EBS volume does not meet the performance requirements and more is needed.
  • Striping volumes allows pushing tens of thousands of IOPS.
  • EBS volumes are already replicated across multiple servers in an AZ for availability and durability, so AWS generally recommend striping for performance rather than durability.
  • For greater I/O performance than can be achieved with a single volume, RAID 0 can stripe multiple volumes together; for on-instance redundancy, RAID 1 can mirror two volumes together.
  • RAID 0 allows I/O distribution across all volumes in a stripe, with performance gains scaling nearly linearly as volumes are added.
  • RAID 1 can be used for durability to mirror volumes, but it requires more EC2-to-EBS bandwidth, as the data is written to multiple volumes simultaneously, and should be used with EBS-optimization.
  • EBS volume data is replicated across multiple servers in an AZ to prevent the loss of data from the failure of any single component
  • AWS doesn’t recommend RAID 5 and 6 because the parity write operations of these modes consume the IOPS available for the volumes and can result in 20-30% fewer usable IOPS than RAID 0.
  • A 2-volume RAID 0 config can outperform a 4-volume RAID 6 that costs twice as much.
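As a back-of-the-envelope sketch of why RAID 0 scales and where it stops: aggregate IOPS grow with each added volume until the instance's own EBS limit becomes the cap. The numbers below are illustrative.

    def raid0_iops(per_volume_iops: int, volumes: int, instance_max_iops: int) -> int:
        """RAID 0 sums the IOPS of its member volumes, capped by the instance's EBS limit."""
        return min(per_volume_iops * volumes, instance_max_iops)

    # Four 4,000 IOPS volumes behind an instance capped at 32,000 IOPS -> 16,000
    print(raid0_iops(4000, 4, 32000))
    # Adding volumes beyond the instance cap yields no further gain: still 32,000
    print(raid0_iops(4000, 10, 32000))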


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A user is trying to pre-warm a blank EBS volume attached to a Linux instance. Which of the below mentioned steps should be performed by the user?
    1. There is no need to pre-warm an EBS volume (with latest update no pre-warming is needed)
    2. Contact AWS support to pre-warm (This used to be the case before, but pre warming is not necessary now)
    3. Unmount the volume before pre-warming
    4. Format the device
  2. A user has created an EBS volume of 10 GB and attached it to a running instance. The user is trying to access EBS for first time. Which of the below mentioned options is the correct statement with respect to a first time EBS access?
    1. The volume will show a size of 8 GB
    2. The volume will show a loss of the IOPS performance the first time (the volume needed to be wiped cleaned before for new volumes, however pre warming is not needed any more)
    3. The volume will be blank
    4. If the EBS is mounted it will ask the user to create a file system
  3. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence At times throughout the day, you are seeing large variance in the response times of the database queries Looking into the instance with the isolate command you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  4. You have launched an EC2 instance with four (4) 500 GB EBS Provisioned IOPS volumes attached. The EC2 Instance is EBS-Optimized and supports 500 Mbps throughput between EC2 and EBS. The two EBS volumes are configured as a single RAID 0 device, and each Provisioned IOPS volume is provisioned with 4,000 IOPS (4000 16KB reads or writes) for a total of 16,000 random IOPS on the instance. The EC2 Instance initially delivers the expected 16,000 IOPS random read and write performance. Sometime later in order to increase the total random I/O performance of the instance, you add an additional two 500 GB EBS Provisioned IOPS volumes to the RAID. Each volume is provisioned to 4,000 IOPS like the original four for a total of 24,000 IOPS on the EC2 instance Monitoring shows that the EC2 instance CPU utilization increased from 50% to 70%, but the total random IOPS measured at the instance level does not increase at all. What is the problem and a valid solution?
    1. Larger storage volumes support higher Provisioned IOPS rates: increase the provisioned volume storage of each of the 6 EBS volumes to 1TB.
    2. EBS-Optimized throughput limits the total IOPS that can be utilized use an EBS-Optimized instance that provides larger throughput. (EC2 Instance types have limit on max throughput and would require larger instance types to provide 24000 IOPS)
    3. Small block sizes cause performance degradation, limiting the I’O throughput, configure the instance device driver and file system to use 64KB blocks to increase throughput.
    4. RAID 0 only scales linearly to about 4 devices, use RAID 0 with 4 EBS Provisioned IOPS volumes but increase each Provisioned IOPS EBS volume to 6.000 IOPS.
    5. The standard EBS instance root volume limits the total IOPS rate, change the instant root volume to also be a 500GB 4,000 Provisioned IOPS volume
  5. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS provisioned with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS

AWS EBS Volume Types


  • AWS provides the following EBS volume types, which differ in performance characteristics and price and can be tailored for storage performance and cost to the needs of the applications.
  • Solid state drives (SSD-backed) volumes optimized for transactional workloads involving frequent read/write operations with small I/O size, where the dominant performance attribute is IOPS
    • General Purpose SSD (gp2/gp3)
    • Provisioned IOPS SSD (io1/io2/io2 block express)
  • Hard disk drives (HDD-backed) volumes optimized for large streaming workloads where throughput (measured in MiB/s) is a better performance measure than IOPS
    • Throughput Optimized HDD (st1)
    • Cold HDD (sc1)
    • Magnetic Volumes (standard) (Previous Generation)

EBS Volume Types (New Generation)


Solid state drives (SSD-backed) volumes


General Purpose SSD Volumes (gp2/gp3)

  • General Purpose SSD volumes offer cost-effective storage that is ideal for a broad range of workloads.
  • General Purpose SSD volumes deliver single-digit millisecond latencies
  • General Purpose SSD volumes can range in size from 1 GiB to 16 TiB.
  • General Purpose SSD (gp2) volumes
    • has a maximum throughput of 160 MiB/s (at 214 GiB and larger).
    • provides a baseline performance of 3 IOPS/GiB
    • provides the ability to burst to 3,000 IOPS for extended periods of time for volume size less than 1 TiB and up to a maximum of 16,000 IOPS (at 5,334 GiB).
    • If the volume performance is frequently limited to the baseline level (due to an empty I/O credit balance),
      • consider using a larger General Purpose SSD volume (with a higher baseline performance level) or
      • switching to a Provisioned IOPS SSD volume for workloads that require sustained IOPS performance greater than 16,000 IOPS.
  • General Purpose SSD (gp3) volumes
    • deliver a consistent baseline rate of 3,000 IOPS and 125 MiB/s, included with the price of storage.
    • additional IOPS (up to 16,000) and throughput (up to 1,000 MiB/s) can be provisioned for an additional cost.
    • the maximum ratio of provisioned IOPS to provisioned volume size is 500 IOPS per GiB
    • the maximum ratio of provisioned throughput to provisioned IOPS is 0.25 MiB/s per IOPS.
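A minimal sketch (boto3) of provisioning a gp3 volume above its included baseline; the AZ is a placeholder and the values respect the ratios above.

    import boto3

    ec2 = boto3.client("ec2")

    ec2.create_volume(
        AvailabilityZone="us-east-1a",  # placeholder AZ
        Size=100,                       # GiB; 100 GiB x 500 IOPS/GiB comfortably covers 6,000 IOPS
        VolumeType="gp3",
        Iops=6000,                      # above the 3,000 IOPS baseline, below the 16,000 maximum
        Throughput=250,                 # MiB/s; within 0.25 MiB/s per provisioned IOPS
    )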

I/O Credits and Burst Performance

  • I/O credits represent the available bandwidth that the General Purpose SSD volume can use to burst large amounts of I/O when more than the baseline performance is needed.
  • General Purpose SSD (gp2) volume performance is governed by volume size, which dictates the baseline performance level of the volume, e.g. a 100 GiB volume has a baseline of 300 IOPS @ 3 IOPS/GiB
  • General Purpose SSD volume size also determines how quickly it accumulates I/O credits, e.g. a 100 GiB volume with a baseline of 300 IOPS accumulates 180,000 I/O credits every 10 minutes (300 × 60 × 10).
  • Larger volumes have higher baseline performance levels and accumulate I/O credits faster for e.g. 1 TiB has a baseline performance of 3000 IOPS
  • The more I/O credits a volume has, the longer it can burst beyond its baseline performance level and the better it performs when more performance is needed, e.g. a 300 GiB volume with 180,000 I/O credits can burst @ 3,000 IOPS for 1 minute (180,000/3,000)
  • Each volume receives an initial I/O credit balance of 5,400,000 I/O credits, which is enough to sustain the maximum burst performance of 3,000 IOPS for 30 minutes.
  • Initial credit balance is designed to provide a fast initial boot cycle for boot volumes and a good bootstrapping experience for other applications.
  • Each volume can accumulate I/O credits over a period of time, which can be used to burst to the required performance level, up to a max of 3,000 IOPS
  • Unused I/O credits cannot accumulate beyond the 5,400,000 I/O credit balance.

IOPS vs Volume size

  • Volumes up to 1 TiB can burst up to 3,000 IOPS, beyond their baseline performance
  • Volumes larger than 1 TiB have a baseline performance that is already equal to or greater than the maximum burst performance, and their I/O credit balance never depletes.
  • Baseline performance cannot exceed the 16,000 IOPS maximum for General Purpose SSD (gp2) volumes, a limit reached at 5,334 GiB


Baseline Performance

  • Formula – volume size in GiB × 3 IOPS/GiB
  • Calculation example
    • 1 GiB volume size = 3 IOPS (1 × 3)
    • 250 GiB volume size = 750 IOPS (250 × 3)

Maximum burst duration @ 3,000 IOPS

  • How long can the 5,400,000 I/O credit balance be sustained at the burst performance of 3,000 IOPS? Credits drain at the burst rate minus the baseline rate contributed by the volume size.
  • Formula – 5,400,000 / (3,000 − baseline performance)
  • Calculation example
    • 1 GiB volume size @ 3,000 IOPS with 5,400,000 credits: the burst can be maintained for 5,400,000/(3,000 − 3) = 1,802 secs
    • 250 GiB volume size @ 3,000 IOPS with 5,400,000 credits: the burst can be maintained for 5,400,000/(3,000 − 750) = 2,400 secs

Time to fill the 5,400,000 I/O credit balance

  • Formula – 5,400,000 / baseline performance
  • Calculation example
    • 1 GiB volume size @ 3 IOPS would require 5,400,000/3 = 1,800,000 secs
    • 250 GiB volume size @ 750 IOPS would require 5,400,000/750 = 7,200 secs
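The three formulas above fit in a few lines of Python; a worked sketch:

    BUCKET = 5_400_000  # initial and maximum I/O credit balance
    BURST = 3_000       # maximum burst IOPS for volumes under 1 TiB

    def baseline_iops(size_gib: int) -> int:
        return size_gib * 3  # 3 IOPS per GiB

    def burst_duration_secs(size_gib: int) -> float:
        # credits drain at (burst - baseline) while bursting at 3,000 IOPS
        return BUCKET / (BURST - baseline_iops(size_gib))

    def refill_secs(size_gib: int) -> float:
        # credits accumulate at the baseline rate
        return BUCKET / baseline_iops(size_gib)

    print(round(burst_duration_secs(250)))  # 2400 secs of burst for a 250 GiB volume
    print(round(refill_secs(250)))          # 7200 secs to refill from empty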

Provisioned IOPS SSD (io1/io2) Volumes

  • are designed to meet the needs of I/O intensive workloads, particularly database workloads, that are sensitive to storage performance and consistency in random access I/O throughput.
  • IOPS rate can be specified when the volume is created, and EBS delivers within 10% of the provisioned IOPS performance 99.9% of the time over a given year.
  • can range in size from 4 GiB to 16 TiB
  • have a throughput limit of 256 KiB/s for each IOPS provisioned, up to a maximum of 500 MiB/s (at 32,000 IOPS)
  • can be provisioned with up to 64,000 IOPS per volume.
  • Ratio of IOPS provisioned to the volume size requested can be a maximum of 50:1; e.g., a volume with 5,000 IOPS must be at least 100 GiB.
  • can be striped together in a RAID configuration for larger size and performance beyond the per-volume limits
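The 50:1 sizing rule translates directly into a minimum volume size for a requested IOPS figure; a small illustrative check:

    def min_volume_size_gib(requested_iops: int, max_ratio: int = 50) -> float:
        """Minimum Provisioned IOPS volume size that supports the requested IOPS."""
        return requested_iops / max_ratio

    print(min_volume_size_gib(5000))  # 100.0 GiB, matching the example above
    print(min_volume_size_gib(450))   # 9.0 GiB, so an 8 GiB volume cannot have 450 IOPS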

Hard disk drives (HDD-backed) volumes


Throughput Optimized HDD (st1) Volumes

  • provide low-cost magnetic storage that defines performance in terms of throughput rather than IOPS.
  • is a good fit for large, sequential workloads such as EMR, ETL, data warehouses, and log processing
  • cannot be used as boot volumes
  • are designed to support frequently accessed data
  • uses a burst-bucket model for performance similar to gp2. Volume size determines the baseline throughput of the volume, which is the rate at which the volume accumulates throughput credits. Volume size also determines the burst throughput of your volume, which is the rate at which you can spend credits when they are available.

Cold HDD (sc1) Volumes

  • provide low-cost magnetic storage that defines performance in terms of throughput rather than IOPS.
  • With a lower throughput limit than st1, sc1 is a good fit for large, sequential cold-data workloads.
  • ideal when data is accessed infrequently and cost savings are important; sc1 provides inexpensive block storage
  • cannot be used as boot volumes
  • though similar to Throughput Optimized HDD (st1) volumes, are designed to support infrequently accessed data.
  • uses a burst-bucket model for performance similar to gp2. Volume size determines the baseline throughput of the volume, which is the rate at which the volume accumulates throughput credits. Volume size also determines the burst throughput of your volume, which is the rate at which you can spend credits when they are available.

Magnetic Volumes (standard)

Magnetic volumes provide the lowest cost per gigabyte of all EBS volume types. Magnetic volumes are backed by magnetic drives and are ideal for workloads performing sequential reads, workloads where data is accessed infrequently, and scenarios where the lowest storage cost is important.

  • Magnetic volumes can range in size from 1 GiB to 1 TiB
  • These volumes deliver approximately 100 IOPS on average, with burst capability of up to hundreds of IOPS
  • Magnetic volumes can be striped together in a RAID configuration for larger size and greater performance.

EBS Volume Types (Previous Generation – Reference Only)

EBS Volume Types Comparison

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. You are designing an enterprise data storage system. Your data management software system requires mountable disks and a real filesystem, so you cannot use S3 for storage. You need persistence, so you will be using AWS EBS Volumes for your system. The system needs as low-cost storage as possible, and access is not frequent or high throughput, and is mostly sequential reads. Which is the most appropriate EBS Volume Type for this scenario?
    1. gp1
    2. io1
    3. standard (Standard or Magnetic volumes are suited for cold workloads where data is infrequently accessed, or scenarios where the lowest storage cost is important)
    4. gp2
  2. Which EBS volume type is best for high performance NoSQL cluster deployments?
    1. io1 (io1 volumes, or Provisioned IOPS (PIOPS) SSDs, are best for: Critical business applications that require sustained IOPS performance, or more than 10,000 IOPS or 160 MiB/s of throughput per volume, like large database workloads, such as MongoDB.)
    2. gp1
    3. standard
    4. gp2
  3. Provisioned IOPS Costs: you are charged for the IOPS and storage whether or not you use them in a given month.
    1. FALSE
    2. TRUE
  4. A user is trying to create a PIOPS EBS volume with 8 GB size and 450 IOPS. Will AWS create the volume?
    1. Yes, since the ratio between EBS and IOPS is less than 50
    2. No, since the PIOPS and EBS size ratio is less than 50
    3. No, the EBS size is less than 10 GB
    4. Yes, since PIOPS is higher than 100
  5. A user has provisioned 2000 IOPS to the EBS volume. The application hosted on that EBS is experiencing fewer IOPS than provisioned. Which of the below mentioned options does not affect the IOPS of the volume?
    1. The application does not have enough IO for the volume
    2. Instance is EBS optimized
    3. The EC2 instance has 10 Gigabit Network connectivity
    4. Volume size is too large
  6. A user is trying to create a PIOPS EBS volume with 6000 IOPS and 100 GB size. AWS does not allow the user to create this volume. What is the possible root cause for this?
    1. The ratio between IOPS and the EBS volume is higher than 50
    2. The maximum IOPS supported by EBS is 3000
    3. The ratio between IOPS and the EBS volume is lower than 100
    4. PIOPS is supported for EBS higher than 500 GB size


AWS EC2 Network – Enhanced Networking

EC2 Enhanced Networking

  • Enhanced networking results in higher bandwidth, higher packet-per-second (PPS) performance, lower latency, lower jitter, and greater consistency and scalability
  • EC2 provides enhanced networking capabilities using single root I/O virtualization (SR-IOV) only on supported instance types
    • SR-IOV is a method of device virtualization that provides higher I/O performance and lower CPU utilization
  • Amazon Linux AMIs and Windows Server 2012 R2 AMI already have the module installed with the attributes set and do not require any additional configurations.
  • It can be enabled for other OS distributions by installing the module with the correct attributes configured
  • Enhanced Networking is supported using
    • Elastic Network Adapter (ENA)
      • The Elastic Network Adapter (ENA) supports network speeds of up to 100 Gbps for supported instance types.
      • The current generation instances use ENA for enhanced networking, except for C4, D2, and M4 instances smaller than m4.16xlarge.
    • Intel 82599 Virtual Function (VF) interface
      • The Intel 82599 Virtual Function interface supports network speeds of up to 10 Gbps for supported instance types.
      • supported instance types: C3, C4, D2, I2, M4 (excl. m4.16xlarge), and R3.

VF Enhanced Networking Key Requirements

  • a VPC, as enhanced networking can’t be enabled for instances in EC2-Classic
  • an HVM virtualization type AMI
  • Instance kernel version
    • Linux kernel version of 2.6.32+
    • Windows: Server 2008 R2+
  • Appropriate Virtual Function (VF) driver
    • Linux – should have the ixgbevf module installed and the sriovNetSupport attribute set for the instance
    • Windows- Intel 82599 Virtual Function driver
  • supported instance types: C3, C4, D2, I2, M4 (excl. m4.16xlarge), and R3.
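A minimal sketch (boto3) of checking and setting the sriovNetSupport attribute for the Intel 82599 VF interface; the instance ID is a placeholder, and the instance must be stopped before the attribute can be modified.

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # placeholder

    # The attribute is empty until enhanced networking has been enabled.
    attr = ec2.describe_instance_attribute(InstanceId=instance_id,
                                           Attribute="sriovNetSupport")
    print(attr.get("SriovNetSupport", {}))

    # 'simple' enables the Intel 82599 VF enhanced networking interface.
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  SriovNetSupport={"Value": "simple"})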

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. You have multiple Amazon EC2 instances running in a cluster across multiple Availability Zones within the same region. What combination of the following should be used to ensure the highest network performance (packets per second), lowest latency, and lowest jitter? Choose 3 answers
    1. Amazon EC2 placement groups (would not work for multiple AZs)
    2. Enhanced networking (provides network performance, lowest latency)
    3. Amazon PV AMI (Requires HVM)
    4. Amazon HVM AMI (Requires HVM)
    5. Amazon Linux (Can be on others as well)
    6. Amazon VPC (works only in VPC, can’t enable enhanced networking if the instance is in EC2-Classic)
  2. A group of researchers is studying the migration pattern of a beetle that eats and destroys gram. The researchers must process massive amounts of data and run statistics. Which one of the following options provides high-performance computing for this purpose?
    1. Configure an Auto Scaling group to launch dozens of spot instances to run the statistical analysis simultaneously
    2. Launch AMI instances that support SR-IOV in a single Availability Zone
    3. Launch compute optimized (C4) instances in at least two Availability Zones
    4. Launch enhanced network type instances in a placement group


AWS Web Application Firewall – WAF


  • AWS WAF – Web Application Firewall protects web applications from attacks by allowing configuration of rules that allow, block, or monitor (count) web requests based on defined conditions.
  • helps protect from common attack techniques like SQL injection and cross-site scripting (XSS); conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
  • tightly integrates with CloudFront, API Gateway, AppSync, and the Application Load Balancer (ALB), the services used to deliver content for websites and applications.
    • AWS WAF with Amazon CloudFront
      • AWS WAF rules run in all AWS Edge Locations, located around the world close to the end users.
      • Blocked requests are stopped before they reach the web servers.
      • Helps support custom origins outside of AWS.
    • AWS WAF with Application Load Balancer
      • WAF rules run in the region and can be used to protect internet-facing as well as internal load balancers.
    • AWS WAF with API Gateway
      • Can help secure and protect the REST APIs.
  • helps protect applications and can inspect web requests transmitted over HTTP or HTTPS.
  • provides Managed Rules which are pre-configured rules to protect applications from common threats like application vulnerabilities like OWASP, bots, or Common Vulnerabilities and Exposures (CVE).
  • logs can be sent to the CloudWatch Logs log group, an S3 bucket, or Kinesis Data Firehose.

WAF Benefits

  • Additional protection against web attacks using specified conditions
  • Conditions can be defined by using characteristics of web requests such as the following:
    • IP addresses that the requests originate from
    • Values in request headers
    • Strings that appear in the requests
    • Length of requests
    • Presence of SQL code that is likely to be malicious (this is known as SQL injection)
    • Presence of a script that is likely to be malicious (this is known as cross-site scripting)
  • Managed Rules to get you started quickly
  • Rules that you can reuse for multiple web applications
  • Real-time metrics and sampled web requests
  • Automated administration using the WAF API

How WAF Works

WAF allows controlling the behaviour of web requests by creating conditions, rules, and web access control lists (web ACLs).


Conditions

  • Conditions define basic characteristics to watch for in a web request
    • Malicious script – XSS (Cross-Site Scripting) – attackers embed scripts that can exploit vulnerabilities in web applications
    • IP addresses or address ranges that requests originate from.
    • Size – Length of specified parts of the request, such as the query string.
    • Malicious SQL – SQL injection – Attackers try to extract data from the database by embedding malicious SQL code in a web request
    • Geographic match – Allow or block requests based on the country from which the requests originate.
    • Strings that appear in the request, for e.g., values that appear in the User-Agent header or text strings that appear in the query string.
      Some conditions take multiple values.

Actions

  • Allow all requests except the ones specified – blacklisting, e.g. block requests from the specified IP addresses
  • Block all requests except the ones specified – whitelisting, e.g. allow only requests from the specified IP addresses
  • Monitor (Count) the requests that match the specified properties – allows counting of the requests that match the defined properties, which can be useful when configuring and testing allow or block requests using new properties. After confirming that the config did not accidentally block all of the traffic to the website, the configuration can be applied to change the behaviour to allow or block requests.
  • CAPTCHA – runs a CAPTCHA check against the request.

Rules

  • AWS WAF rule defines how to inspect HTTP(S) web requests and the action to take on a request when it matches the inspection criteria.
  • Each rule requires one top-level rule statement, which might contain nested statements at any depth, depending on the rule and statement type.
  • AWS WAF also supports the logical statements AND, OR, and NOT that you use to combine statements in a rule, e.g.,
    • based on recent requests that you’ve seen from an attacker, you might create a rule that includes the following conditions combined with a logical AND (expressed as a WAFv2 rule statement in the sketch after this list):
      • The requests come from 192.0.2.44.
      • They contain the value BadBot in the User-Agent header.
      • They appear to include malicious SQL code in the query string.
    • All 3 conditions must be satisfied for the rule to match and the associated action to be taken.
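A hedged sketch of how that rule might be expressed as an AWS WAFv2 rule statement; the rule name, priority, metric name, and IP set ARN are placeholders, and the three nested statements mirror the three conditions above.

    rule = {
        "Name": "BlockBadBot",  # placeholder name
        "Priority": 0,
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "BlockBadBot",
        },
        "Statement": {
            "AndStatement": {
                "Statements": [
                    # 1. request originates from 192.0.2.44 (via an IP set containing it)
                    {"IPSetReferenceStatement": {"ARN": "arn:aws:wafv2:...:ipset/attackers"}},  # placeholder ARN
                    # 2. User-Agent header contains BadBot
                    {"ByteMatchStatement": {
                        "SearchString": b"BadBot",
                        "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
                        "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                        "PositionalConstraint": "CONTAINS",
                    }},
                    # 3. query string appears to contain malicious SQL
                    {"SqliMatchStatement": {
                        "FieldToMatch": {"QueryString": {}},
                        "TextTransformations": [{"Priority": 0, "Type": "URL_DECODE"}],
                    }},
                ]
            }
        },
    }
    # The rule would be supplied in the Rules list of wafv2 create_web_acl / update_web_acl.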

Rule Groups

  • A Rule Group is a reusable set of rules that can be added to a Web ACL.
  • Rule groups fall into the following main categories
    • Managed rule groups, which AWS Managed Rules and AWS Marketplace sellers create and maintain for you
    • Your own rule groups, which you create and maintain
    • Rule groups that are owned and managed by other services, like AWS Firewall Manager and Shield Advanced.

Web ACLs – Access Control Lists

  • A Web Access Control List – Web ACL provides fine-grained control over all of the HTTP(S) web requests that the protected resource responds to.
  • A Web ACL provides
    • Rule Groups OR Combination of Rules
    • Action – allow, block or count to perform for each rule
      • WAF compares a request with the rules in a web ACL in the order in which it is listed and takes the action that is associated with the first rule that the request matches.
      • For multiple rules in a web ACL, WAF evaluates each request against the rules in the order they are listed in the web ACL.
      • When a web request matches all of the conditions in a rule, WAF immediately takes the action – allow or block – and doesn’t evaluate the request against the remaining rules in the web ACL, if any.
    • Default action
      • determines whether WAF allows or blocks a request that does not match all of the conditions in any of the rules
  • Supports criteria like the following to allow or block requests
    • IP address origin of the request
    • Country of origin of the request
    • String match or regular expression (regex) match in a part of the request
    • Size of a particular part of the request
    • Detection of malicious SQL code or scripting
    • Rate based rules

AWS WAF based Architecture

AWS WAF Blacklist Example
  1. AWS WAF integration with CloudFront and Lambda to dynamically update WAF rules
  2. CloudFront receives requests on behalf of the web application, it sends access logs to an S3 bucket that contains detailed information about the requests.
  3. For every new access log stored in the S3 bucket, a Lambda function is triggered. The Lambda function parses the log files and looks for requests that resulted in error codes 400, 403, 404, and 405.
  4. Lambda function then counts the number of bad requests and temporarily stores results in the S3 bucket
  5. The Lambda function updates AWS WAF rules to block the IP addresses for a period of time that you specify (see the sketch after this list).
  6. After this blocking period has expired, AWS WAF allows those IP addresses to access your application again, but continues to monitor the requests from those IP addresses.
  7. Lambda function publishes execution metrics in CloudWatch, such as the number of requests analyzed and IP addresses blocked.
  8. CloudWatch metrics can be integrated with SNS for notification
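A hedged sketch of step 5 using the current WAFv2 API (the original architecture predates WAFv2); the IP set name, ID, and scope are placeholders.

    import boto3

    wafv2 = boto3.client("wafv2")

    IP_SET_NAME = "blocked-ips"                         # placeholder
    IP_SET_ID = "11111111-2222-3333-4444-555555555555"  # placeholder
    SCOPE = "CLOUDFRONT"                                # IP set used by a CloudFront web ACL

    def block_addresses(addresses):
        """Replace the IP set contents with the offending addresses, e.g. ['192.0.2.44/32']."""
        # WAFv2 updates are optimistically locked: fetch the current lock token first.
        current = wafv2.get_ip_set(Name=IP_SET_NAME, Scope=SCOPE, Id=IP_SET_ID)
        wafv2.update_ip_set(
            Name=IP_SET_NAME,
            Scope=SCOPE,
            Id=IP_SET_ID,
            Addresses=addresses,
            LockToken=current["LockToken"],
        )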

Web Application Firewall Sandwich Architecture

NOTE: This pattern, from the DDoS Resiliency whitepaper, does not use the AWS WAF service and is no longer valid.


  • DDoS attacks at the application layer commonly target web applications with lower volumes of traffic compared to infrastructure attacks.
  • WAF can be included as part of the infrastructure to mitigate these types of attacks
  • WAFs act as filters that apply a set of rules to web traffic, which cover exploits like XSS and SQL injection but can also help build resiliency against DDoS by mitigating HTTP GET or POST floods.
  • HTTP works as a request-response protocol between end users and applications, where end users request data (GET) and submit data to be processed (POST). GET floods work by requesting the same URL at a high rate or requesting all objects from your application. POST floods work by finding expensive application processes, i.e., logins or database searches, and triggering those processes to overwhelm your application.
  • WAFs have several features that may prevent these types of attacks from affecting the application availability for e.g. HTTP rate limiting which limits the number of requests per end user within a certain time period. Once the threshold is exceeded, WAFs can block or buffer new requests to ensure other end users have access to the application.
  • WAFs can also inspect HTTP requests and identify those that don’t conform to normal patterns
  • In the “WAF sandwich,” the EC2 instance running the WAF software (not the AWS WAF service) is included in an Auto Scaling group and placed in between two ELB load balancers. A basic load balancer in the default VPC is the frontend, public-facing load balancer that distributes all incoming traffic to the WAF EC2 instances.
  • With the WAF sandwich pattern, the Auto Scaling group can add additional WAF EC2 instances should the traffic spike to elevated levels.
  • Once the traffic has been inspected and filtered, the WAF EC2 instances forward traffic to the internal, backend load balancer, which then distributes it across the application EC2 instances.
  • This configuration allows the WAF EC2 instances to scale and meet capacity demands without affecting the availability of the application EC2 instances.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. The Web Application Development team is worried about malicious activity from 200 random IP addresses. Which action will ensure security and scalability from this type of threat?
    1. Use inbound security group rules to block the IP addresses.
    2. Use inbound network ACL rules to block the IP addresses.
    3. Use AWS WAF to block the IP addresses.
    4. Write iptables rules on the instance to block the IP addresses.
  2. You’ve been hired to enhance the overall security posture for a very large e-commerce site. They have a well architected multi-tier application running in a VPC that uses ELBs in front of both the web and the app tier with static assets served directly from S3. They are using a combination of RDS and DynamoDB for their dynamic data and then archiving nightly into S3 for further processing with EMR. They are concerned because they found questionable log entries and suspect someone is attempting to gain unauthorized access. Which approach provides a cost effective scalable mitigation to this kind of attack? [Old Exam Question]
    1. Recommend that they lease space at a Direct Connect partner location and establish a 1G Direct Connect connection to their VPC. They would then establish Internet connectivity into their space, filter the traffic through a hardware Web Application Firewall (WAF), and pass the traffic through the Direct Connect connection into their application running in their VPC. (Not cost effective)
    2. Add previously identified hostile source IPs as an explicit INBOUND DENY NACL to the web tier subnet. (does not protect against new source)
    3. Add a WAF tier by creating a new ELB and an AutoScaling group of EC2 Instances running a host-based WAF. They would redirect Route 53 to resolve to the new WAF tier ELB. The WAF tier would then pass the traffic to the current web tier. Web tier Security Groups would be updated to only allow traffic from the WAF tier Security Group
    4. Remove all but TLS 1.2 from the web tier ELB and enable Advanced Protocol Filtering. This will enable the ELB itself to perform WAF functionality. (No advanced protocol filtering in ELB)
