AWS S3 vs EBS vs EFS

June 18, 2020 ~ Last updated on : September 26, 2021 ~ jayendrapatil ~ 2 Comments

S3 vs EBS vs EFS

EFS, EBS, and S3 are three different AWS storage types, each suited to different workload needs

S3 vs EBS vs EFS Comparison


Simple Storage Service – S3

is an object store with a simple key-value design, well suited to storing vast numbers of backups or user files.

offers pay-for-what-you-use pricing, with cost-saving storage classes suited to infrequently accessed data or data archival
provides unlimited storage
provides durability, as data is replicated and stored across at least three geographically dispersed AZs, and is designed for 99.999999999% (11 9’s) durability

provides high availability, designed for up to 99.99%
provides security with a range of access control mechanisms and abilities to encrypt data at rest and in transit
data can be accessed programmatically or directly from services such as AWS CloudFront.

provides backup capability using versioning and cross-region replication
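To put the durability and availability figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes the 11 9's figure can be read as an average annual loss probability per object, which is a simplification, and the function names and the 10,000,000-object example are purely illustrative.

```python
from fractions import Fraction


def expected_years_per_lost_object(num_objects: int, nines: int = 11) -> Fraction:
    """Average years between single-object losses, assuming an annual
    per-object loss probability of 10**-nines (a simplified reading)."""
    annual_loss_probability = Fraction(1, 10 ** nines)
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1 / expected_losses_per_year


def max_downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Minutes per 365-day year that fall outside the availability target."""
    unavailable_fraction = 1 - Fraction(availability_percent) / 100
    return unavailable_fraction * 365 * 24 * 60


years = expected_years_per_lost_object(10_000_000)
downtime = max_downtime_minutes_per_year("99.99")
print(years)            # 10000
print(float(downtime))  # 52.56
```

Under these assumptions, storing ten million objects would mean losing one object roughly every 10,000 years on average, while a 99.99% availability target leaves a little under an hour of unavailability per year.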

Elastic Block Store – EBS

delivers high-availability block-level storage volumes for EC2 instances.
offers pay for the provisioned storage, even if you do not use it

provides limited storage capability and cannot scale infinitely
stores data on a file system which can be retained after the EC2 instance is shut down.
provides durability by replicating data across multiple servers in an AZ to prevent the loss of data from the failure of any single component

designed for 99.999% availability
provides low-latency performance – using SSD EBS volumes, it offers reliable I/O performance scaled to meet your workload needs.
provides secure storage with access control and encryption of data at rest and in transit

is, by default, only accessible from a single EC2 instance in the same AWS region and AZ as the volume
provides a Multi-Attach option to share storage across multiple EC2 instances, but only within the same AWS region and AZ
provides backup capability using backups and snapshots

Elastic File System – EFS

scalable file storage, also optimized for EC2.
offers pay for the storage you actually use. There’s no advance provisioning, up-front fees, or commitments
multiple instances can be configured to mount the file system.

allows mounting the file system across multiple regions and instances.
is designed to be highly durable and highly available. Data is redundantly stored across multiple AZs.
provides elasticity – scales up and down automatically, even to meet the most abrupt workload spikes.

provides performance that scales to support any workload: EFS offers the throughput changing workloads need. It can provide higher throughput in spurts that match sudden file system growth, even for workloads up to 500,000 IOPS or 10 GB per second.
provides accessible file storage, which can be accessed by On-premises servers and EC2 instances concurrently.
provides security and compliance – access to the file system can be secured with the existing security solution, or controlled using IAM, VPC, or POSIX permissions.

provides data encryption in transit or at rest.
allows EC2 instances to access EFS file systems located in other AWS regions through VPC peering.
a file system can be accessed concurrently from all AZs in the region where it is located, which means the application can be architected to failover from one AZ to other AZs in the region in order to ensure the highest level of application availability. Mount targets themselves are designed to be highly available.

used as a common data source for any application or workload that runs on numerous instances.

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A company runs an application on a group of Amazon Linux EC2 instances. The application writes log files using standard API calls. For compliance reasons, all log files must be retained indefinitely and will be analyzed by a reporting tool that must access all files concurrently. Which storage service should a solutions architect use to provide the MOST cost-effective solution?
1. Amazon EBS
2. Amazon EFS
3. Amazon EC2 instance store
4. Amazon S3

A new application is being deployed on Amazon EC2. The application needs to read and write up to 3 TB of data to an external data store and requires read-after-write consistency across all AWS regions for writing new objects into this data store.
1. Amazon EBS
2. Amazon Glacier
3. Amazon EFS
4. Amazon S3
To meet the requirements of an application, an organization needs to save a constantly increasing volume of files on a cloud storage system with the following features and abilities. Which of the following AWS services will meet these requirements?
  - Pay only for the storage used
  - Create different security policies for different groups of files
  - Allow access to the public
  - Retrieve the files at any time
  - Store an unlimited number of files
1. Amazon EBS
2. Amazon S3
3. Amazon Glacier
4. Amazon EFS
An administrator runs a highly available application in AWS. A file storage layer is needed that can share between instances and scale the platform more easily. The storage should also be POSIX compliant. Which AWS service can perform this action?
1. Amazon EBS
2. Amazon S3
3. Amazon EFS
4. Amazon EC2 Instance store

Reference

AWS_When_to_choose_EFS

AWS Elastic Load Balancer – ELB

March 18, 2020 ~ Last updated on : August 11, 2022 ~ jayendrapatil ~ 62 Comments

AWS Elastic Load Balancer – ELB

Elastic Load Balancer allows the incoming traffic to be distributed automatically across multiple healthy EC2 instances.

ELB serves as a single point of contact for the client.
ELB increases application availability by transparently allowing EC2 instances to be added or removed across one or more AZs, without disrupting the overall flow of information.

ELB benefits
- is a distributed system that is fault-tolerant and actively monitored
- abstracts out the complexity of managing, maintaining, and scaling load balancers
- serves as the first line of defence against attacks on the network
- can offload the work of encryption and decryption (SSL termination) so that the EC2 instances can focus on their main work
- offers integration with Auto Scaling, which ensures enough back-end capacity is available to meet varying traffic levels
- is engineered to not be a single point of failure
Elastic Load Balancer, by default, routes each request independently to the registered instance with the smallest load.
ELB automatically reroutes the traffic to the remaining running healthy EC2 instances, if an EC2 instance fails. If a failed EC2 instance is restored, ELB restores the traffic to that instance.
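The routing behaviour described above can be sketched as a small selection function. This is only an illustration: measuring "load" as the number of active requests is an assumption, and the `Instance` class and `pick_instance` function are hypothetical names, not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool
    active_requests: int  # assumed stand-in for "load"


def pick_instance(instances):
    """Return the healthy instance with the fewest active requests."""
    healthy = [inst for inst in instances if inst.healthy]
    if not healthy:
        raise RuntimeError("no healthy instances registered")
    return min(healthy, key=lambda inst: inst.active_requests)


fleet = [
    Instance("i-a", healthy=True, active_requests=4),
    Instance("i-b", healthy=False, active_requests=0),  # failed, so skipped
    Instance("i-c", healthy=True, active_requests=2),
]
print(pick_instance(fleet).instance_id)  # i-c

# Once the failed instance is restored, it becomes eligible again.
fleet[1].healthy = True
print(pick_instance(fleet).instance_id)  # i-b
```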

Load Balancers are regional and only work across AZs within a region

Application Load Balancer – ALB

Refer to Blog Post @ Application Load Balancer

Network Load Balancer – NLB

Refer to Blog Post @ Network Load Balancer

Gateway Load Balancer – GWLB

Refer to Blog Post @ Gateway Load Balancer

Classic Load Balancer vs Application Load Balancer vs Network Load Balancer

Refer to Blog Post @ Classic Load Balancer vs Application Load Balancer vs Network Load Balancer

Elastic Load Balancer Features

The following ELB key concepts apply to all Elastic Load Balancer types

Scaling ELB

Each ELB is allocated and configured with a default capacity.
ELB Controller is the service that stores all the configurations and also monitors the load balancer and manages the capacity that is used to handle the client requests.
As the traffic profile changes, the controller service scales the load balancers to handle more requests, scaling equally in all AZs.

ELB increases its capacity by utilizing either larger resources (scale up – resources with higher performance characteristics) or more individual resources (scale-out).
AWS handles the scaling of the ELB capacity and this scaling is different to the scaling of the EC2 instances to which the ELB routes its request, which is dealt with by Auto Scaling.
Time required for Elastic Load Balancing to scale can range from 1 to 7 minutes, depending on the changes in the traffic profile

When an Availability Zone is enabled for the load balancer, Elastic Load Balancing creates a load balancer node in the Availability Zone.
By default, each load balancer node distributes traffic across the registered targets in its Availability Zone only.

Pre-Warming ELB

NOTE – AWS documentation does not include Pre-warming now

~~ELB works best with a gradual increase in traffic~~
~~AWS is able to scale automatically and handle a vast majority of use cases~~
However, in certain scenarios, such as when a flash traffic spike is expected or a load test cannot be configured to gradually increase traffic, it is recommended to contact AWS support to have the load balancer “pre-warmed”

~~AWS will help Pre-warming the ELB, by configuring the load balancer to have the appropriate level of capacity based on the expected traffic~~
~~AWS would need the information for the start, end dates, and expected request rate per second with the total size of request/response.~~

DNS Resolution

ELB is scaled automatically depending on the traffic profile.

When scaled, the Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer so that the new resources have their respective IP addresses registered in DNS.
DNS record created includes a Time-to-Live (TTL) setting of 60 seconds
By default, ELB will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request.

It is recommended that clients re-lookup the DNS at least every 60 seconds to take advantage of the increased capacity
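A client-side sketch of the re-lookup recommendation might look like the following. The `ReResolvingClient` class is hypothetical, and the resolver and clock are injected so the example runs offline; in a real client the resolver would wrap an actual DNS lookup. The hostname and IP addresses are placeholders.

```python
import time


class ReResolvingClient:
    """Caches DNS answers and re-resolves once the TTL has elapsed."""

    def __init__(self, resolve, ttl_seconds=60, clock=time.monotonic):
        self._resolve = resolve  # callable: hostname -> iterable of IPs
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}         # hostname -> (expires_at, ips)

    def addresses(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is None or now >= cached[0]:
            ips = list(self._resolve(hostname))
            self._cache[hostname] = (now + self._ttl, ips)
            return ips
        return cached[1]


# Fake resolver and clock: the second answer models a scaled-out ELB.
answers = iter([["192.0.2.10"], ["192.0.2.10", "192.0.2.11"]])
fake_now = [0.0]
client = ReResolvingClient(lambda host: next(answers), clock=lambda: fake_now[0])

first = client.addresses("my-elb.example.com")   # resolves
fake_now[0] = 30.0
second = client.addresses("my-elb.example.com")  # served from cache
fake_now[0] = 61.0
third = client.addresses("my-elb.example.com")   # TTL expired, resolves again
print(first, second, third)
```

A client that caches the answer indefinitely would keep sending traffic only to the original IP address and never see the added capacity.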

Load Balancer Types

Internet-facing Load Balancer
- An Internet-facing load balancer takes requests from clients over the Internet and distributes them across the EC2 instances that are registered with the load balancer.

Internal Load Balancer
- An internal load balancer routes traffic to EC2 instances in private subnets.

Availability Zones/Subnets

Elastic Load Balancer should have at least one subnet attached.

Elastic Load Balancing allows subnets to be added and creates a load balancer node in each of the Availability Zone where the subnet resides.
Only one subnet per AZ can be attached to the ELB. Attaching a subnet with an AZ already attached replaces the existing subnet
Each subnet must have a CIDR block with at least a /27 bitmask and at least 8 free IP addresses, which ELB uses to establish connections with the back-end instances.

For High Availability, it is recommended to attach one subnet per AZ for at least two AZs, even if the instances are in a single subnet.
Subnets can be attached or detached from the ELB and it would start or stop sending requests to the instances in the subnet accordingly
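The subnet sizing rule above (at least a /27 and at least 8 free IP addresses) can be checked with a few lines of code. This is a sketch: `subnet_ok_for_elb` is a hypothetical helper, and it relies on the fact that AWS reserves 5 addresses in every subnet, so the caller only supplies how many of the remaining addresses are already in use.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in every subnet


def subnet_ok_for_elb(cidr: str, addresses_in_use: int = 0) -> bool:
    """Check the /27-or-larger and 8-free-addresses requirements."""
    network = ipaddress.ip_network(cidr)
    large_enough = network.prefixlen <= 27  # smaller prefix = bigger subnet
    free = network.num_addresses - AWS_RESERVED_PER_SUBNET - addresses_in_use
    return large_enough and free >= 8


print(subnet_ok_for_elb("10.0.1.0/27"))                        # True
print(subnet_ok_for_elb("10.0.1.0/28"))                        # False
print(subnet_ok_for_elb("10.0.2.0/24", addresses_in_use=245))  # False
```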

Security Groups & NACL

Security groups & NACLs should allow Inbound traffic, on the load balancer listener port, from the Client for an Internet ELB or VPC CIDR for an Internal ELB

Security groups & NACLs should allow Outbound traffic to the back-end instances on both the instance listener port and the health check port
NACLs, in addition, should allow responses on the ephemeral ports
All EC2 instances should allow incoming traffic from ELB

SSL Negotiation Configuration

For HTTPS load balancers, Elastic Load Balancing uses a Secure Socket Layer (SSL) negotiation configuration, known as a security policy, to negotiate SSL connections between a client and the load balancer.
A security policy is a combination of SSL protocols, SSL ciphers, and the Server Order Preference option
- Elastic Load Balancing supports the following SSL/TLS protocol versions: TLS 1.2, TLS 1.1, TLS 1.0, SSL 3.0, ~~SSL 2.0~~ (deprecated now)
- An SSL cipher is an encryption algorithm that uses encryption keys to create a coded message. SSL protocols use several SSL ciphers to encrypt data over the Internet.
- Elastic Load Balancing supports the Server Order Preference option for negotiating connections between a client and a load balancer.
- During the SSL connection negotiation process, this allows the load balancer to control and select the first cipher in its list that is in the client’s list of ciphers instead of the default behaviour of checking to match the first cipher in the client’s list with the server’s list.
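The effect of Server Order Preference on cipher selection can be illustrated with a simplified model. This is a sketch of the selection rule as described above, not of a real TLS handshake; `negotiate_cipher` is a hypothetical function and the cipher lists are example values.

```python
def negotiate_cipher(server_ciphers, client_ciphers, server_order_preference):
    """Pick the first mutually supported cipher.

    With Server Order Preference the server's list decides the order;
    otherwise the client's list does.
    """
    if server_order_preference:
        preferred, other = server_ciphers, client_ciphers
    else:
        preferred, other = client_ciphers, server_ciphers
    allowed = set(other)
    for cipher in preferred:
        if cipher in allowed:
            return cipher
    return None  # no cipher in common, so negotiation would fail


server = [
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "AES128-SHA",
]
client = ["AES128-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]

print(negotiate_cipher(server, client, server_order_preference=True))
# ECDHE-RSA-AES128-GCM-SHA256
print(negotiate_cipher(server, client, server_order_preference=False))
# AES128-SHA
```

With the option enabled, the load balancer's ordering wins, so a client that lists a weaker cipher first still ends up on the stronger one that both sides support.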
Elastic Load Balancer allows using Predefined Security Policies or creating a Custom Security Policy for specific needs. If none is specified, ELB selects the latest Predefined Security Policy.
Elastic Load Balancer supports multiple certificates using Server Name Indication (SNI)
- If the hostname provided by a client matches a single certificate in the certificate list, the load balancer selects this certificate.
- If a hostname provided by a client matches multiple certificates in the certificate list, the load balancer selects the best certificate that the client can support.
Classic Load Balancer does not support multiple certificates

ALB and NLB support multiple certificates

Health Checks

Load balancer performs health checks on all registered instances, whether the instance is in a healthy state or an unhealthy state.
Load balancer performs health checks to discover the availability of the EC2 instances and periodically sends pings, attempts connections, or sends requests to health check the EC2 instances.

Health check status is InService for healthy instances and OutOfService for unhealthy ones.
Load balancer sends a request to each registered instance at the Ping Protocol, Ping Port and Ping Path every HealthCheck Interval seconds. It waits for the instance to respond within the Response Timeout period. If the health checks exceed the Unhealthy Threshold for consecutive failed responses, the load balancer takes the instance out of service. When the health checks exceed the Healthy Threshold for consecutive successful responses, the load balancer puts the instance back in service.
Load balancer only sends requests to the healthy EC2 instances and stops routing requests to the unhealthy instances
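The threshold logic above can be modelled as a small state machine. This is a sketch under an assumption: a state change happens once the number of consecutive contradicting results reaches the configured threshold. The `InstanceHealth` class and the threshold values in the demo are illustrative only.

```python
class InstanceHealth:
    """Tracks InService/OutOfService from consecutive health check results."""

    def __init__(self, unhealthy_threshold: int, healthy_threshold: int):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.state = "InService"
        self._streak = 0  # consecutive results that contradict the state

    def record(self, check_passed: bool) -> str:
        in_service = self.state == "InService"
        if check_passed == in_service:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
            return self.state
        self._streak += 1
        limit = self.unhealthy_threshold if in_service else self.healthy_threshold
        if self._streak >= limit:
            self.state = "OutOfService" if in_service else "InService"
            self._streak = 0
        return self.state


health = InstanceHealth(unhealthy_threshold=2, healthy_threshold=3)
results = [health.record(ok) for ok in
           [False, True, False, False, True, True, True]]
print(results)
```

In this run a single failure followed by a success does not change the state; two failures in a row take the instance out of service, and three successes in a row bring it back.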

All ELB types support health checks

Listeners

Listeners are the processes that check for connection requests from clients
Listeners are configured with a protocol and a port for front-end (client to load balancer) connections, and a protocol and a port for back-end (load balancer to back-end instance) connections.

Listeners support HTTP, HTTPS, SSL, and TCP protocols
An X.509 certificate is required for HTTPS or SSL connections and the load balancer uses the certificate to terminate the connection and then decrypt requests from clients before sending them to the back-end instances.
If you want to use SSL, but don’t want to terminate the connection on the load balancer, use TCP for connections from the client to the load balancer, use the SSL protocol for connections from the load balancer to the back-end application, and deploy certificates on the back-end instances handling requests.

If you use an HTTPS/SSL connection for the back end, you can enable authentication on the back-end instance. This authentication can be used to ensure that back-end instances accept only encrypted communication, and to ensure that the back-end instance has the correct certificates.
ELB HTTPS listener does not support Client-Side SSL certificates

Idle Connection Timeout

For each request that a client makes through a load balancer, the load balancer maintains two connections: one with the client and the other with the back-end instance.

For each connection, the load balancer manages an idle timeout that is triggered when no data is sent over the connection for a specified time period. If no data has been sent or received, it closes the connection after the idle timeout period (defaults to 60 seconds) has elapsed
For lengthy operations, such as file uploads, the idle timeout setting for the connections should be adjusted to ensure that lengthy operations have time to complete.

X-Forwarded Headers & Proxy Protocol Support

As the Elastic Load Balancer intercepts the traffic between the client and the back-end servers, the back-end server does not know the IP address, Protocol, and the Port used between the Client and the Load balancer.

ELB provides X-Forwarded headers support to help back-end servers track the same when using the HTTP protocol
- X-Forwarded-For request header to help back-end servers identify the IP address of a client when you use an HTTP or HTTPS load balancer.
- X-Forwarded-Proto request header to help back-end servers identify the protocol (HTTP/S) that a client used to connect to the server
- X-Forwarded-Port request header to help back-end servers identify the port that the client used to connect to an HTTP or HTTPS load balancer.
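On the back-end side, reading these headers might look like the sketch below. `client_connection_info` is a hypothetical helper that takes a plain dictionary of request headers; it treats the left-most X-Forwarded-For entry as the original client, which is the usual convention when several proxies append to the header.

```python
def client_connection_info(headers: dict) -> dict:
    """Extract client IP, protocol and port from X-Forwarded-* headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded_for = lowered.get("x-forwarded-for", "")
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    port = lowered.get("x-forwarded-port")
    return {
        "client_ip": hops[0] if hops else None,  # left-most = original client
        "protocol": lowered.get("x-forwarded-proto"),
        "port": int(port) if port else None,
    }


info = client_connection_info({
    "X-Forwarded-For": "203.0.113.7, 10.0.0.5",
    "X-Forwarded-Proto": "https",
    "X-Forwarded-Port": "443",
})
print(info)
# {'client_ip': '203.0.113.7', 'protocol': 'https', 'port': 443}
```

Because a client can send its own X-Forwarded-For value, the left-most entry is best treated as informational rather than as a basis for security decisions.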
ELB provides Proxy Protocol support to help back-end servers track the same when using non-HTTP protocol or when using HTTPS and not terminating the SSL connection on the load balancer.
- Proxy Protocol is an Internet protocol used to carry connection information from the source requesting the connection to the destination for which the connection was requested.
- Elastic Load Balancing uses Proxy Protocol version 1, which uses a human-readable header format with connection information such as the source IP address, destination IP address, and port numbers
- If the ELB is already behind a Proxy with the Proxy protocol enabled, enabling the Proxy Protocol on ELB would add the header twice
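The human-readable Proxy Protocol version 1 header is a single line of the form `PROXY <protocol> <source IP> <destination IP> <source port> <destination port>` terminated by CRLF. A back-end that has Proxy Protocol enabled could parse it roughly as follows; `parse_proxy_v1` is a hypothetical helper and the addresses are example values.

```python
def parse_proxy_v1(line: bytes) -> dict:
    """Parse one Proxy Protocol v1 header line into its fields."""
    if not line.endswith(b"\r\n"):
        raise ValueError("Proxy Protocol v1 header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) < 2 or parts[0] != "PROXY":
        raise ValueError("not a Proxy Protocol v1 header")
    if parts[1] == "UNKNOWN":
        return {"protocol": "UNKNOWN"}
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed Proxy Protocol v1 header")
    protocol, src_ip, dst_ip, src_port, dst_port = parts[1:]
    return {
        "protocol": protocol,
        "source_ip": src_ip,
        "destination_ip": dst_ip,
        "source_port": int(src_port),
        "destination_port": int(dst_port),
    }


header = b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\n"
parsed = parse_proxy_v1(header)
print(parsed["source_ip"], parsed["source_port"])  # 198.51.100.22 35646
```

The header arrives before any application data, so the back-end has to consume it first; this is also why a header added twice (by an upstream proxy and by the ELB) would confuse a server expecting exactly one.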

Cross-Zone Load Balancing

By default, the load balancer distributes incoming requests evenly across its enabled Availability Zones, e.g. if AZ-a has 5 instances and AZ-b has 2 instances, the load will still be distributed 50% to each of the AZs

Enabling Cross-Zone load balancing allows the ELB to distribute incoming requests evenly across all the back-end instances, regardless of the AZ
Elastic Load Balancing creates a load balancer node in the AZ. By default, each load balancer node distributes traffic across the registered targets in its AZ only. If you enable cross-zone load balancing, each load balancer node distributes traffic across the registered targets in all enabled AZs.
Cross-zone load balancing reduces the need to maintain equivalent numbers of back-end instances in each AZ and improves the application’s ability to handle the loss of one or more back-end instances.

It is still recommended to maintain approximately equivalent numbers of instances in each Availability Zone for higher fault tolerance.
With cross-zone load balancing, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones.
ALB -> Cross Zone load balancing is enabled by default and free

CLB -> Cross Zone load balancing is disabled, by default, and can be enabled and free
NLB -> Cross Zone load balancing is disabled, by default, and can be enabled but is charged for inter-AZ data transfer.
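The 5-instance and 2-instance example from this section can be worked through numerically. The sketch below assumes traffic is split evenly across AZs first and then evenly across instances; `per_instance_share` is a hypothetical helper and every AZ is assumed to have at least one instance.

```python
from fractions import Fraction


def per_instance_share(instances_per_az: dict, cross_zone: bool) -> dict:
    """Fraction of total traffic each instance in an AZ receives."""
    total_instances = sum(instances_per_az.values())
    zone_count = len(instances_per_az)
    shares = {}
    for az, count in instances_per_az.items():
        if cross_zone:
            # Every instance is an equal target, regardless of AZ.
            shares[az] = Fraction(1, total_instances)
        else:
            # Each AZ gets an equal slice, divided among its own instances.
            shares[az] = Fraction(1, zone_count) / count
    return shares


layout = {"AZ-a": 5, "AZ-b": 2}
print(per_instance_share(layout, cross_zone=False))  # AZ-a: 1/10, AZ-b: 1/4
print(per_instance_share(layout, cross_zone=True))   # AZ-a: 1/7,  AZ-b: 1/7
```

Without cross-zone load balancing, each instance in AZ-b handles 25% of the traffic while each instance in AZ-a handles only 10%; with it enabled, every instance handles about 14.3%.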

Connection Draining (Deregistration Delay)

By default, if an EC2 instance registered with the ELB is deregistered or becomes unhealthy, the load balancer immediately closes the connection

Connection draining can help the load balancer to complete the in-flight requests made while keeping the existing connections open, and preventing any new requests from being sent to the instances that are de-registering or unhealthy.
Connection draining helps perform maintenance such as deploying software upgrades or replacing back-end instances without affecting customers’ experience
Connection draining allows you to specify a maximum time (between 1 and 3,600 seconds and default 300 seconds) to keep the connections alive before reporting the instance as de-registered. The maximum timeout limit does not apply to connections to unhealthy instances.

If the instances are part of an Auto Scaling group and connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete, or for the maximum timeout to expire, before terminating instances due to a scaling event or health check replacement.

Sticky Sessions (Session Affinity)

ELB can be configured to use Sticky Session feature (also called session affinity) which enables it to bind a user’s session to an instance and ensures all requests are sent to the same instance.
Stickiness remains for a period of time which can be controlled by the application’s session cookie, if one exists, or through a cookie named AWSELB, created by the Elastic Load Balancer.

Sticky sessions for CLB and ALB are disabled, by default.
NLB does not support sticky sessions

Requirements

An HTTP/HTTPS load balancer.

SSL traffic should be terminated on the ELB.
ELB does session stickiness on an HTTP/HTTPS listener by utilizing an HTTP cookie. ELB has no visibility into the HTTP headers if the SSL traffic is not terminated on the ELB and is terminated on the back-end instance.
At least one healthy instance in each Availability Zone.

Duration-Based Session Stickiness

Duration-Based Session Stickiness is maintained by ELB using a special cookie created to track the instance for each request to each listener.
When the load balancer receives a request,
- it first checks to see if this cookie is present in the request. If so, the request is sent to the instance specified in the cookie.
- If there is no cookie, the ELB chooses an instance based on the existing load balancing algorithm and a cookie is inserted into the response for binding subsequent requests from the same user to that instance.
Stickiness policy configuration defines a cookie expiration, which establishes the duration of validity for each cookie.
Cookie is automatically updated after its duration expires.
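The request flow described above can be sketched with a toy load balancer. Everything here is a simplified model: the `StickyBalancer` class is hypothetical, round-robin stands in for the real load balancing algorithm, and the cookie is a plain dictionary rather than the opaque value a real ELB issues.

```python
import itertools


class StickyBalancer:
    COOKIE = "AWSELB"  # cookie name used for ELB-generated stickiness

    def __init__(self, instances, cookie_ttl_seconds):
        self._instances = list(instances)
        self._next = itertools.cycle(self._instances)  # stand-in algorithm
        self._ttl = cookie_ttl_seconds

    def route(self, request_cookies: dict, now: float):
        """Return (instance, cookies_to_set) for a single request."""
        cookie = request_cookies.get(self.COOKIE)
        if (cookie and cookie["expires_at"] > now
                and cookie["instance"] in self._instances):
            return cookie["instance"], {}  # valid cookie: stay on instance
        instance = next(self._next)        # no or expired cookie: choose anew
        return instance, {
            self.COOKIE: {"instance": instance, "expires_at": now + self._ttl}
        }


lb = StickyBalancer(["i-a", "i-b"], cookie_ttl_seconds=60)
browser = {}

target1, set1 = lb.route(browser, now=0)   # no cookie yet
browser.update(set1)
target2, set2 = lb.route(browser, now=30)  # cookie still valid
target3, set3 = lb.route(browser, now=61)  # cookie expired, reissued
browser.update(set3)
print(target1, target2, target3)  # i-a i-a i-b
```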

Application-Controlled Session Stickiness

Load balancer uses a special cookie only to associate the session with the instance that handled the initial request, but follows the lifetime of the application cookie specified in the policy configuration.
Load balancer only inserts a new stickiness cookie if the application response includes a new application cookie. The load balancer stickiness cookie does not update with each request.
If the application cookie is explicitly removed or expires, the session stops being sticky until a new application cookie is issued.

If an instance fails or becomes unhealthy, the load balancer stops routing requests to that instance and instead chooses a new healthy instance based on the existing load balancing algorithm.
The load balancer treats the session as now “stuck” to the new healthy instance, and continues routing requests to that instance even if the failed instance comes back.

Load Balancer Deletion

Deleting a load balancer does not affect the instances registered with the load balancer and they would continue to run

ELB with Autoscaling

Refer to Blog Post @ ELB with Autoscaling

AWS Certification Exam Practice Questions


A user has configured an HTTPS listener on an ELB. The user has not configured any security policy which can help to negotiate SSL between the client and ELB. What will ELB do in this scenario?
1. By default ELB will select the first version of the security policy
2. By default ELB will select the latest version of the policy
3. ELB creation will fail without a security policy
4. It is not required to have a security policy since SSL is already installed

A user has configured ELB with SSL using a security policy for secure negotiation between the client and load balancer. The ELB security policy supports various ciphers. Which of the below mentioned options helps identify the matching cipher at the client side to the ELB cipher list when client is requesting ELB DNS over SSL
1. Cipher Protocol
2. Client Configuration Preference
3. Server Order Preference
4. Load Balancer Preference
A user has configured ELB with SSL using a security policy for secure negotiation between the client and load balancer. Which of the below mentioned security policies is supported by ELB?
1. Dynamic Security Policy
2. All the other options
3. Predefined Security Policy
4. Default Security Policy
A user has configured ELB with SSL using a security policy for secure negotiation between the client and load balancer. Which of the below mentioned SSL protocols is not supported by the security policy?
1. TLS 1.3
2. TLS 1.2
3. SSL 2.0
4. SSL 3.0

A user has configured ELB with a TCP listener at ELB as well as on the back-end instances. The user wants to enable a proxy protocol to capture the source and destination IP information in the header. Which of the below mentioned statements helps the user understand a proxy protocol with TCP configuration?
1. If the end user is requesting behind a proxy server then the user should not enable a proxy protocol on ELB
2. ELB does not support a proxy protocol when it is listening on both the load balancer and the back-end instances
3. Whether the end user is requesting from a proxy server or directly, it does not make a difference for the proxy protocol
4. If the end user is requesting behind the proxy then the user should add the “isproxy” flag to the ELB Configuration
A user has enabled session stickiness with ELB. The user does not want ELB to manage the cookie; instead he wants the application to manage the cookie. What will happen when the server instance, which is bound to a cookie, crashes?
1. The response will have a cookie but stickiness will be deleted
2. The session will not be sticky until a new cookie is inserted
3. ELB will throw an error due to cookie unavailability
4. The session will be sticky and ELB will route requests to another server as ELB keeps replicating the Cookie
A user has created an ELB with Auto Scaling. Which of the below mentioned offerings from ELB helps the user to stop sending new requests traffic from the load balancer to the EC2 instance when the instance is being deregistered while continuing in-flight requests?
1. ELB sticky session
2. ELB deregistration check
3. ELB connection draining
4. ELB auto registration Off
When using an Elastic Load Balancer to serve traffic to web servers, which one of the following is true?
1. Web servers must be publicly accessible
2. The same security group must be applied to both the ELB and EC2 instances
3. ELB and EC2 instance must be in the same subnet
4. ELB and EC2 instances must be in the same VPC
A user has configured Elastic Load Balancing by enabling a Secure Socket Layer (SSL) negotiation configuration known as a Security Policy. Which of the below mentioned options is not part of this secure policy while negotiating the SSL connection between the user and the client?
1. SSL Protocols
2. Client Order Preference
3. SSL Ciphers
4. Server Order Preference
A user has created an ELB with the availability zone us-east-1. The user wants to add more zones to ELB to achieve High Availability. How can the user add more zones to the existing ELB?
1. It is not possible to add more zones to the existing ELB
2. Only option is to launch instances in different zones and add to ELB
3. The user should stop the ELB and add zones and instances as required
4. The user can add zones on the fly from the AWS console
A user has launched an ELB which has 5 instances registered with it. The user deletes the ELB by mistake. What will happen to the instances?
1. ELB will ask the user whether to delete the instances or not
2. Instances will be terminated
3. ELB cannot be deleted if it has running instances registered with it
4. Instances will keep running
A Sys-admin has created a shopping cart application and hosted it on EC2. The EC2 instances are running behind ELB. The admin wants to ensure that the end user request will always go to the EC2 instance where the user session has been created. How can the admin configure this?
1. Enable ELB cross zone load balancing
2. Enable ELB cookie setup
3. Enable ELB sticky session
4. Enable ELB connection draining
A user has setup connection draining with ELB to allow in-flight requests to continue while the instance is being deregistered through Auto Scaling. If the user has not specified the draining time, how long will ELB allow inflight requests traffic to continue?
1. 600 seconds
2. 3600 seconds
3. 300 seconds
4. 0 seconds
A customer has a web application that uses cookie Based sessions to track logged in users. It is deployed on AWS using ELB and Auto Scaling. The customer observes that when load increases Auto Scaling launches new Instances but the load on the existing Instances does not decrease, causing all existing users to have a sluggish experience. Which two answer choices independently describe a behavior that could be the cause of the sluggish user experience?
1. ELB’s normal behavior sends requests from the same user to the same backend instance (it’s not by default)
2. ELB’s behavior when sticky sessions are enabled causes ELB to send requests in the same session to the same backend
3. A faulty browser is not honoring the TTL of the ELB DNS name (DNS TTL would only impact the ELB instances if scaled and not the EC2 instances to which the traffic is routed)
4. The web application uses long polling such as comet or websockets, thereby keeping a connection open to a web server for a long time
A customer has an online store that uses the cookie-based sessions to track logged-in customers. It is deployed on AWS using ELB and autoscaling. When the load increases, Auto scaling automatically launches new web servers, but the load on the web servers do not decrease. This causes the customers a poor experience. What could be causing the issue ?
1. ELB DNS records Time to Live is set too high (DNS TTL would only impact the ELB instances if scaled and not the EC2 instances to which the traffic is routed)
2. ELB is configured to send requests with previously established sessions
3. Website uses CloudFront which is keeping sessions alive
4. New Instances are not being added to the ELB during the Auto Scaling cool down period
You are designing a multi-platform web application for AWS. The application will run on EC2 instances and will be accessed from PCs, tablets and smart phones. Supported accessing platforms are Windows, macOS, iOS and Android. Separate sticky session and SSL certificate setups are required for different platform types. Which of the following describes the most cost effective and performance efficient architecture setup?
1. Setup a hybrid architecture to handle session state and SSL certificates on-prem and separate EC2 Instance groups running web applications for different platform types running in a VPC.
2. Set up one ELB for all platforms to distribute load among multiple instance under it. Each EC2 instance implements all functionality for a particular platform.
3. Set up two ELBs. The first ELB handles SSL certificates for all platforms and the second ELB handles session stickiness for all platforms. For each ELB, run separate EC2 instance groups to handle the web application for each platform.
4. Assign multiple ELBs to an EC2 instance or group of EC2 instances running the common components of the web application, one ELB for each platform type. Session stickiness and SSL termination are done at the ELBs. (Session stickiness requires HTTPS listener with SSL termination on the ELB and ELB does not support multiple SSL certs so one is required for each cert)
You are migrating a legacy client-server application to AWS. The application responds to a specific DNS domain (e.g. www.example.com) and has a 2-tier architecture, with multiple application servers and a database server. Remote clients use TCP to connect to the application servers. The application servers need to know the IP address of the clients in order to function properly and are currently taking that information from the TCP socket. A Multi-AZ RDS MySQL instance will be used for the database. During the migration you can change the application code but you have to file a change request. How would you implement the architecture on AWS in order to maximize scalability and high availability?
1. File a change request to implement Proxy Protocol support in the application. Use an ELB with a TCP Listener and Proxy Protocol enabled to distribute load on two application servers in different AZs. (ELB with a TCP listener and Proxy Protocol allows the client IP to be passed to the application)
2. File a change request to Implement Cross-Zone support in the application. Use an ELB with a TCP Listener and Cross-Zone Load Balancing enabled, two application servers in different AZs.
3. File a change request to implement Latency Based Routing support in the application. Use Route 53 with Latency Based Routing enabled to distribute load on two application servers in different AZs.
4. File a change request to implement Alias Resource support in the application. Use Route 53 Alias Resource Record to distribute load on two application servers in different AZs.
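For the Proxy Protocol option above, the application change amounts to reading one extra header line before the normal payload. The following is a minimal sketch of parsing the human-readable Proxy Protocol version 1 line; the names `ProxyInfo` and `parse_proxy_v1` are hypothetical, and the sample addresses are documentation placeholders rather than values from the question.

```python
from typing import NamedTuple


class ProxyInfo(NamedTuple):
    """Fields carried by a Proxy Protocol v1 header line (hypothetical helper type)."""
    protocol: str
    client_ip: str
    proxy_ip: str
    client_port: int
    proxy_port: int


def parse_proxy_v1(header_line: bytes) -> ProxyInfo:
    """Parse one 'PROXY TCP4 <src> <dst> <sport> <dport>' line.

    Only the six-field TCP4/TCP6 form is handled; the short
    'PROXY UNKNOWN' form is rejected with ValueError in this sketch.
    """
    text = header_line.decode("ascii").rstrip("\r\n")
    parts = text.split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError(f"not a supported Proxy Protocol v1 header: {text!r}")
    _, protocol, src_ip, dst_ip, src_port, dst_port = parts
    return ProxyInfo(protocol, src_ip, dst_ip, int(src_port), int(dst_port))


# The load balancer prepends this line to the TCP stream it forwards.
info = parse_proxy_v1(b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\n")
print(info.client_ip, info.client_port)
```

The application would read this line from the socket first and use `client_ip` instead of the peer address of the TCP connection, which now belongs to the load balancer.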
A user has created an ELB with three instances. How many security groups will ELB create by default?
1. 3
2. 5
3. 2 (One for ELB to allow inbound and Outbound to listener and health check port of instances and One for the Instances to allow inbound from ELB)
4. 1
You have a web-style application with a stateless but CPU and memory-intensive web tier running on a cc2 8xlarge EC2 instance inside of a VPC. The instance when under load is having problems returning requests within the SLA as defined by your business. The application maintains its state in a DynamoDB table, but the data tier is properly provisioned and responses are consistently fast. How can you best resolve the issue of the application responses not meeting your SLA?
1. Add another cc2 8xlarge application instance, and put both behind an Elastic Load Balancer
2. Move the cc2 8xlarge to the same Availability Zone as the DynamoDB table (Does not improve the response time and performance)
3. Cache the database responses in ElastiCache for more rapid access (Data tier is responding fast)
4. Move the database from DynamoDB to RDS MySQL in scale-out read-replica configuration (Data tier is responding fast)
An organization has configured a VPC with an Internet Gateway (IGW), pairs of public and private subnets (each with one subnet per Availability Zone), and an Elastic Load Balancer (ELB) configured to use the public subnets. The application’s web tier leverages the ELB, Auto Scaling and a Multi-AZ RDS database instance. The organization would like to eliminate any potential single points of failure in this design. What step should you take to achieve this organization’s objective?
1. Nothing, there are no single points of failure in this architecture.
2. Create and attach a second IGW to provide redundant internet connectivity. (Only one IGW can be attached to a VPC)
3. Create and configure a second Elastic Load Balancer to provide a redundant load balancer. (ELB scales by itself with multiple availability zones configured with it)
4. Create a second multi-AZ RDS instance in another Availability Zone and configure replication to provide a redundant database. (Multi AZ requires 2 different AZ for setup and already has a standby)
Your application currently leverages AWS Auto Scaling to grow and shrink as load increases/decreases and has been performing well. Your marketing team expects a steady ramp up in traffic to follow an upcoming campaign that will result in a 20x growth in traffic over 4 weeks. Your forecast for the approximate number of Amazon EC2 instances necessary to meet the peak demand is 175. What should you do to avoid potential service disruptions during the ramp up in traffic?
1. Ensure that you have pre-allocated 175 Elastic IP addresses so that each server will be able to obtain one as it launches (max limit 5 EIP and a service request needs to be submitted)
2. Check the service limits in Trusted Advisor and adjust as necessary so the forecasted count remains within limits.
3. Change your Auto Scaling configuration to set a desired capacity of 175 prior to the launch of the marketing campaign (Will cause 175 instances to be launched and running but not gradually scale)
4. Pre-warm your Elastic Load Balancer to match the requests per second anticipated during peak demand (Does not need pre warming as the load is increasing steadily)
Which of the following features ensures even distribution of traffic to Amazon EC2 instances in multiple Availability Zones registered with a load balancer?
1. Elastic Load Balancing request routing
2. An Amazon Route 53 weighted routing policy (does not control traffic to EC2 instance)
3. Elastic Load Balancing cross-zone load balancing
4. An Amazon Route 53 latency routing policy (does not control traffic to EC2 instance)
Your web application front end consists of multiple EC2 instances behind an Elastic Load Balancer. You configured ELB to perform health checks on these EC2 instances, if an instance fails to pass health checks, which statement will be true?
1. The instance gets terminated automatically by the ELB (it is done by Autoscaling)
2. The instance gets quarantined by the ELB for root cause analysis.
3. The instance is replaced automatically by the ELB. (it is done by Autoscaling)
4. The ELB stops sending traffic to the instance that failed its health check
You have a web application running on six Amazon EC2 instances, consuming about 45% of resources on each instance. You are using auto-scaling to make sure that six instances are running at all times. The number of requests this application processes is consistent and does not experience spikes. The application is critical to your business and you want high availability at all times. You want the load to be distributed evenly between all instances. You also want to use the same Amazon Machine Image (AMI) for all instances. Which of the following architectural choices should you make?
1. Deploy 6 EC2 instances in one availability zone and use Amazon Elastic Load Balancer. (Single AZ will not provide High Availability)
2. Deploy 3 EC2 instances in one region and 3 in another region and use Amazon Elastic Load Balancer. (Different region, AMI would not be available unless copied)
3. Deploy 3 EC2 instances in one availability zone and 3 in another availability zone and use Amazon Elastic Load Balancer.
4. Deploy 2 EC2 instances in three regions and use Amazon Elastic Load Balancer. (Different region, AMI would not be available unless copied)
You are designing an SSL/TLS solution that requires HTTPS clients to be authenticated by the Web server using client certificate authentication. The solution must be resilient. Which of the following options would you consider for configuring the web server infrastructure? (Choose 2 answers)
1. Configure ELB with TCP listeners on TCP/443. And place the Web servers behind it. (terminate SSL on the instance using client-side certificate)
2. Configure your Web servers with EIPs. Place the Web servers in a Route53 Record Set and configure health checks against all Web servers. (Remove ELB and use Web Servers directly with Route 53)
3. Configure ELB with HTTPS listeners, and place the Web servers behind it. (ELB with HTTPS does not support client-side certificates)
4. Configure your web servers as the origins for a CloudFront distribution. Use custom SSL certificates on your CloudFront distribution (CloudFront does not support client-side SSL certificates)
You are designing an application that contains protected health information. Security and compliance requirements for your application mandate that all protected health information in the application use encryption at rest and in transit. The application uses a three-tier architecture where data flows through the load balancer and is stored on Amazon EBS volumes for processing, and the results are stored in Amazon S3 using the AWS SDK. Which of the following two options satisfy the security requirements? Choose 2 answers
1. Use SSL termination on the load balancer, Amazon EBS encryption on Amazon EC2 instances, and Amazon S3 with server-side encryption. (connection between ELB and EC2 not encrypted)
2. Use SSL termination with a SAN SSL certificate on the load balancer, Amazon EC2 with all Amazon EBS volumes using Amazon EBS encryption, and Amazon S3 with server-side encryption with customer-managed keys.
3. Use TCP load balancing on the load balancer, SSL termination on the Amazon EC2 instances, OS-level disk encryption on the Amazon EBS volumes, and Amazon S3 with server-side encryption.
4. Use TCP load balancing on the load balancer, SSL termination on the Amazon EC2 instances, and Amazon S3 with server-side encryption. (Does not mention EBS encryption)
5. Use SSL termination on the load balancer, an SSL listener on the Amazon EC2 instances, Amazon EBS encryption on EBS volumes containing PHI, and Amazon S3 with server-side encryption.
A startup deploys its photo-sharing site in a VPC. An elastic load balancer distributes web traffic across two subnets. The load balancer session stickiness is configured to use the AWS-generated session cookie, with a session TTL of 5 minutes. The web server Auto Scaling group is configured as min-size=4, max-size=4. The startup is preparing for a public launch, by running load-testing software installed on a single Amazon Elastic Compute Cloud (EC2) instance running in us-west-2a. After 60 minutes of load-testing, the web server logs show the following:
WEBSERVER LOGS | # of HTTP requests from load-tester | # of HTTP requests from private beta users
webserver #1 (subnet in us-west-2a): | 19,210 | 434
webserver #2 (subnet in us-west-2a): | 21,790 | 490
webserver #3 (subnet in us-west-2b): | 0 | 410
webserver #4 (subnet in us-west-2b): | 0 | 428
Which recommendations can help ensure that load-testing HTTP requests are evenly distributed across the four web servers? Choose 2 answers
1. Launch and run the load-tester Amazon EC2 instance from us-east-1 instead.
2. Configure Elastic Load Balancing session stickiness to use the app-specific session cookie.
3. Re-configure the load-testing software to re-resolve DNS for each web request. (Refer link)
4. Configure Elastic Load Balancing and Auto Scaling to distribute across us-west-2a and us-west-2b.
5. Use a third-party load-testing service which offers globally distributed test clients. (Refer link)
To serve Web traffic for a popular product your chief financial officer and IT director have purchased 10 m1.large heavy utilization Reserved Instances (RIs) evenly spread across two availability zones. Route 53 is used to deliver the traffic to an Elastic Load Balancer (ELB). After several months, the product grows even more popular and you need additional capacity. As a result, your company purchases two c3.2xlarge medium utilization RIs. You register the two c3.2xlarge instances with your ELB and quickly find that the m1.large instances are at 100% of capacity and the c3.2xlarge instances have significant capacity that’s unused. Which option is the most cost effective and uses EC2 capacity most effectively?
1. Use a separate ELB for each instance type and distribute load to ELBs with Route 53 weighted round robin
2. Configure Auto Scaling group and Launch Configuration with ELB to add up to 10 more on-demand m1.large instances when triggered by CloudWatch. Shut off c3.2xlarge instances (increases cost as you still pay for the RI)
3. Route traffic to EC2 m1.large and c3.2xlarge instances directly using Route 53 latency based routing and health checks. Shut off ELB (will still not use the capacity effectively)
4. Configure ELB with two c3.2xlarge instances and use on-demand Auto Scaling group for up to two additional c3.2xlarge instances. Shut off m1.large instances (increases cost, as you still pay for the 10 m1.large RIs)
Which header received at the EC2 instance identifies the port used by the client while requesting ELB?
1. X-Forwarded-Proto
2. X-Requested-Proto
3. X-Forwarded-Port
4. X-Requested-Port
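As a small illustration of the header named in the question above, the sketch below reads `X-Forwarded-Port` from a request's headers on the instance side. The function name `forwarded_port` and the plain-dict header representation are assumptions made for this example; a real web framework exposes headers through its own request object.

```python
from typing import Mapping, Optional


def forwarded_port(headers: Mapping[str, str]) -> Optional[int]:
    """Return the port the client used to reach the load balancer, if present.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "x-forwarded-port":
            return int(value)
    return None


# Example headers as an instance behind the load balancer might receive them.
request_headers = {
    "Host": "www.example.com",
    "X-Forwarded-Proto": "https",
    "x-forwarded-port": "443",
}
print(forwarded_port(request_headers))
```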
A user has configured ELB with two instances running in separate AZs of the same region. Which of the below mentioned statements is true?
1. Multi AZ instances will provide HA with ELB (ELB provides HA by routing traffic to healthy instances only; it does not provide scalability)
2. Multi AZ instances are not possible with a single ELB
3. Multi AZ instances will provide scalability with ELB
4. The user can achieve both HA and scalability with ELB
A user is configuring the HTTPS protocol on a front end ELB and the SSL protocol for the back-end listener in ELB. What will ELB do?
1. It will allow you to create the configuration, but the instance will not pass the health check
2. Receives requests on HTTPS and sends it to the back end instance on SSL
3. It will not allow you to create this configuration (Will give error “Load Balancer protocol is an application layer protocol, but instance protocol is not. Both the Load Balancer protocol and the instance protocol should be at the same layer. Please fix.”)
4. It will allow you to create the configuration, but ELB will not work as expected
An ELB is diverting traffic across 5 instances. One of the instances was unhealthy only for 20 minutes. What will happen after 20 minutes when the instance becomes healthy?
1. ELB will never divert traffic back to the same instance
2. ELB will not automatically send traffic to the same instance. However, the user can configure to start sending traffic to the same instance
3. ELB starts sending traffic to the instance once it is healthy
4. ELB terminates the instance once it is unhealthy. Thus, the instance cannot be healthy after 10 minutes
A user has hosted a website on AWS and uses ELB to load balance the multiple instances. The user application does not have any cookie management. How can the user bind the session of the requestor with a particular instance?
1. Bind the IP address with a sticky cookie
2. Create a cookie at the application level to set at ELB
3. Use session synchronization with ELB
4. Let ELB generate a cookie for a specified duration
A user has configured a website and launched it using the Apache web server on port 80. The user is using ELB with the EC2 instances for Load Balancing. What should the user do to ensure that the EC2 instances accept requests only from ELB?
1. Open the port for an ELB static IP in the EC2 security group
2. Configure the security group of EC2, which allows access to the ELB source security group
3. Configure the EC2 instance so that it only listens on the ELB port
4. Configure the security group of EC2, which allows access only to the ELB listener
AWS Elastic Load Balancer supports SSL termination.
1. For specific availability zones only
2. False
3. For specific regions only
4. For all regions
User has launched five instances with ELB. How can the user add the sixth EC2 instance to ELB?
1. The user can add the sixth instance on the fly.
2. The user must stop the ELB and add the sixth instance.
3. The user can add the instance and change the ELB config file.
4. The ELB can only have a maximum of five instances.

AWS Certified Big Data -Speciality (BDS-C00) Exam Learning Path

August 30, 2019 ~ Last updated on : September 19, 2020 ~ jayendrapatil ~ 23 Comments

Clearing the AWS Certified Big Data – Speciality (BDS-C00) was a great feeling. This was my third Speciality certification and in terms of the difficulty level (compared to Network and Security Speciality exams), I would rate it between Network (being the toughest) and Security (being the simpler one).

Big Data in itself is a very vast topic and with AWS services, there is lots to cover and know for the exam. If you have worked on Big Data technologies including a bit of Visualization and Machine learning, it would be a great asset to pass this exam.

AWS Certified Big Data – Speciality (BDS-C00) exam basically validates

Implement core AWS Big Data services according to basic architectural best practices
Design and maintain Big Data
Leverage tools to automate Data Analysis

Refer to the AWS Certified Big Data – Speciality Exam Guide for details

AWS Certified Big Data – Speciality Domains

AWS Certified Big Data – Speciality (BDS-C00) Exam Summary

AWS Certified Big Data – Speciality exam, as its name suggests, covers a lot of Big Data concepts right from data transfer and collection techniques, storage, pre and post processing, analytics, visualization with the added concepts for data security at each layer.
One of the key tactics I follow when solving any AWS Certification exam is to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to spot where they differ, and that would help you reach the right answer or at least have a 50% chance of getting it right.

Be sure to cover the following topics
- Whitepapers and articles
  - AWS Analytics Services Cheat Sheet
  - Data Transfer Options – Need to be clear on the use cases for VPN vs Direct Connect vs Snowball
- Analytics
  - Make sure you know and cover all the services in depth, as 80% of the exam is focused on these topics
  - Elastic Map Reduce
    - Understand EMR in depth
    - Understand EMRFS (hint: Use Consistent view to make sure S3 objects referred by different applications are in sync)
    - Know EMR Best Practices (hint: start with many small nodes instead of a few large nodes)
    - Know Hive can be externally hosted using RDS, Aurora and AWS Glue Data Catalog
    - Know also different technologies
      - Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
      - D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS
      - Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
      - Zeppelin/Jupyter are notebooks for interactive data exploration; they provide an open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
      - Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
  - Kinesis
    - Understand Kinesis Data Streams and Kinesis Data Firehose in depth
    - Know Kinesis Data Streams vs Kinesis Firehose
      - Know Kinesis Data Streams is open ended on both producer and consumer. It supports KCL and works with Spark.
      - Know Kinesis Firehose is open ended for producer only. Data is stored in S3, Redshift and ElasticSearch.
      - Kinesis Firehose works in batches with a minimum 60-second interval.
    - Understand Kinesis Encryption (hint: use server side encryption or encrypt in producer for data streams)
    - Know difference between KPL vs SDK (hint: PutRecords calls are synchronous, while KPL supports batching)
    - Kinesis Best Practices (hint: increase performance by increasing the number of shards)
  - Know ElasticSearch is a search service which supports indexing, full text search, faceting etc.
  - Redshift
    - Understand Redshift in depth
    - Understand Redshift Advance topics like Workload Management, Distribution Style, Sort key
    - Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, COPY command which allows parallelism
    - Know Redshift views to control access to data.
  - Amazon Machine Learning
    - Know difference in algorithms esp. Binary classification vs Multiclass vs Regression
  - Know Data Pipeline for data transfer
  - QuickSight
    - Know Visual Types (hint: esp. plotting line, bar and story based visualizations)
    - Know Supported Data Sources (hint: supports files)
  - Know Glue as the ETL tool
- Security, Identity & Compliance
  - Data security is a key concept covered in the Big Data – Speciality exam
  - Identity and Access Management (IAM)
    - Understand IAM in depth
    - Understand IAM Roles
    - Understand Identity Providers & Federation (hint: restrict access based on assumed role)
    - Understand IAM Policies
  - Deep dive into Key Management Service (KMS). There would be quite a few questions on this.
    - Understand how KMS works
    - Understand IAM Policies, Key Policies, Grants
    - Know KMS keys are regional and how to use them in other regions.
  - Understand AWS Cognito esp. authentication across devices
- Management & Governance Tools
  - Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events provides more real-time alerts as compared to CloudTrail
- Storage
  - Data Storage Options – Know patterns for S3 vs RDS vs DynamoDB vs Redshift
  - Simple Storage Service
    - Know S3 Data Protection
    - Know S3 Access Control (hint: ACLs for fine grained access control)
  - DynamoDB
    - Know DynamoDB
    - Know DynamoDB Secondary Indexes
    - Know DynamoDB security (hint: allows fine grained access control)
    - Know DynamoDB Accelerator (DAX) for caching
- Compute
  - Know EC2 access to services using IAM Role and Lambda using Execution role.
  - Lambda esp. how to improve performance using batching, breaking up functions etc.
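The Kinesis best-practice hint in the list above (more shards for more performance) can be turned into a rough sizing calculation. This is a sketch only: the function name `shards_needed` is hypothetical, and the per-shard figures (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads) are the commonly published provisioned-mode limits, which should be checked against the current Kinesis Data Streams quotas before relying on them.

```python
import math

# Assumed per-shard limits for Kinesis Data Streams in provisioned mode.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Estimate the shard count as the tightest of the three per-shard limits."""
    by_write_volume = write_mb_per_s / WRITE_MB_PER_SHARD
    by_write_records = records_per_s / WRITE_RECORDS_PER_SHARD
    by_read_volume = read_mb_per_s / READ_MB_PER_SHARD
    return max(1, math.ceil(max(by_write_volume, by_write_records, by_read_volume)))


# Hypothetical workload: 5 MB/s in, 3,000 records/s, 12 MB/s read by consumers.
print(shards_needed(write_mb_per_s=5, records_per_s=3000, read_mb_per_s=12))
```

In this hypothetical workload the read side is the bottleneck, which is why the estimate takes the maximum across all three limits instead of looking at write volume alone.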

AWS Certified Big Data – Speciality (BDS-C00) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Big Data Specialty Exam – In Depth & Hands On [Recommended]
- Linux Academy – AWS Certified Big Data Specialty course

Practice tests
- Braincert – AWS Certified Big Data – Speciality BDS-C00 Practice Exams [Recommended]

AWS Data Transfer Services

August 30, 2019 ~ Last updated on : October 10, 2021 ~ jayendrapatil ~ 1 Comment

AWS Data Transfer Services

AWS provides a suite of data transfer services that includes many methods to migrate your data more effectively.

Data Transfer services work both Online and Offline and the usage depends on several factors like the amount of data, the time required, frequency, available bandwidth, and cost.
Online data transfer and hybrid cloud storage
- uses a network link to the VPC to transfer data to AWS or to use S3 for hybrid cloud storage with existing on-premises applications.
- helps both to lift and shift large datasets once, as well as help you integrate existing process flows like backup and recovery or continuous data streams directly with cloud storage.
Offline data migration to S3.
- shippable, ruggedized devices are ideal for moving large archives or data lakes, or for situations where bandwidth and data volumes cannot pass over your networks within your desired time frame.
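Whether an online transfer fits the desired time frame can be estimated from the data volume and the available bandwidth. The sketch below is a back-of-the-envelope calculation, not an AWS formula; the function name `transfer_days` and the default 80% link utilization are assumptions for illustration.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to move data_tb terabytes over a link of link_mbps megabits/s.

    utilization is the assumed fraction of the link actually available
    to the transfer (0.8 is an illustrative default, not a measured value).
    """
    total_bits = data_tb * 1e12 * 8            # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilization
    seconds = total_bits / effective_bps
    return seconds / 86_400                    # seconds per day


# 100 TB over a 1 Gbps link at the assumed 80% utilization.
print(f"{transfer_days(100, 1000):.1f} days")
```

If the estimate runs to many days or weeks, or the link is shared with production traffic, the offline devices described in this post become the more practical choice.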

Online data transfer

VPN

connect securely between data centers and AWS
quick to set up and cost-efficient

ideal for small data transfers and connectivity
not reliable as it still uses a shared Internet connection

Direct Connect

provides a dedicated physical connection to accelerate network transfers between data centers and AWS

provides reliable data transfer
ideal for regular large data transfer
needs time to set up

is not a cost-efficient solution
can be secured using VPN over Direct Connect

AWS S3 Transfer Acceleration

makes public Internet transfers to S3 faster.

helps maximize the available bandwidth regardless of distance or varying Internet weather, and there are no special clients or proprietary network protocols. Simply change the endpoint you use with your S3 bucket and acceleration is automatically applied.
ideal for recurring jobs that travel across the globe, such as media uploads, backups, and local data processing tasks that are regularly sent to a central location
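Changing the endpoint, as described above, means addressing the bucket through its accelerate hostname instead of the regional one. The sketch below only builds that hostname; the function name `accelerate_endpoint` and the bucket name are hypothetical, and Transfer Acceleration must already be enabled on the bucket for the endpoint to work.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the S3 Transfer Acceleration hostname for a bucket.

    Bucket names containing periods are rejected here because they are
    not supported with Transfer Acceleration.
    """
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    return f"{bucket}.s3-accelerate.amazonaws.com"


# Hypothetical bucket name used purely for illustration.
print(accelerate_endpoint("example-media-uploads"))
```

SDKs and the AWS CLI typically expose a setting that switches to this endpoint, so the hostname rarely has to be assembled by hand.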

AWS DataSync

automates moving data between on-premises storage and S3 or Elastic File System (Amazon EFS).

automatically handles many of the tasks related to data transfers that can slow down migrations or burden the IT operations, including running your own instances, handling encryption, managing scripts, network optimization, and data integrity validation.
helps transfer data at speeds up to 10 times faster than open-source tools.
uses AWS Direct Connect or internet links to AWS and is ideal for one-time data migrations, recurring data processing workflows, and automated replication for data protection and recovery.

Offline data transfer

AWS Snowcone

AWS Snowcone is a portable, rugged, and secure device that provides edge computing and data transfer.
Snowcone can be used to collect, process, and move data to AWS, either offline by shipping the device or online with AWS DataSync.
AWS Snowcone stores data securely in edge locations, and can run edge computing workloads that use AWS IoT Greengrass or EC2 instances.

Snowcone devices are small and weigh 4.5 lbs. (2.1 kg), so you can carry one in a backpack or fit it in tight spaces for IoT, vehicular, or even drone use cases.

AWS Snowball

AWS Snowball is a data migration and edge computing device that comes in two device options:
- Compute Optimized
  - Snowball Edge Compute Optimized devices provide 52 vCPUs, 42 terabytes of usable block or object storage, and an optional GPU for use cases such as advanced machine learning and full-motion video analysis in disconnected environments.
- Storage Optimized
  - Snowball Edge Storage Optimized devices provide 40 vCPUs of compute capacity coupled with 80 terabytes of usable block or S3-compatible object storage.
  - It is well-suited for local storage and large-scale data transfer.
Customers can use these two options for data collection, machine learning and processing, and storage in environments with intermittent connectivity (such as manufacturing, industrial, and transportation) or in extremely remote locations (such as military or maritime operations) before shipping it back to AWS.
Snowball devices may also be rack mounted and clustered together to build larger, temporary installations.

AWS Snowball Edge

~~is a petabyte to exabytes scale data transfer device with on-board storage and compute capabilities~~
~~move large amounts of data into and out of AWS, as a temporary storage tier for large local datasets, or to support local workloads in remote or offline locations.~~
~~ideal for one time large data transfers with limited network bandwidth, long transfer times, and security concerns~~

~~is simple, fast, and secure.~~
~~can be very cost and time efficient for large data transfer~~

AWS Snowmobile

AWS Snowmobile moves up to 100 PB of data in a 45-foot long ruggedized shipping container and is ideal for multi-petabyte or Exabyte-scale digital media migrations and data center shutdowns.

A Snowmobile arrives at the customer site and appears as a network-attached data store for more secure, high-speed data transfer.
After data is transferred to Snowmobile, it is driven back to an AWS Region where the data is loaded into S3.
Snowmobile is tamper-resistant, waterproof, and temperature controlled with multiple layers of logical and physical security – including encryption, fire suppression, dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an escort security vehicle during transit.

Data Transfer Chart – Bandwidth vs Time

Data Migration Speeds

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

An organization is moving non-business-critical applications to AWS while maintaining a mission critical application in an on-premises data center. An on-premises application must share limited confidential information with the applications in AWS. The Internet performance is unpredictable. Which configuration will ensure continued connectivity between sites MOST securely?
1. VPN and a cached storage gateway
2. AWS Snowball Edge
3. VPN Gateway over AWS Direct Connect
4. AWS Direct Connect
A company wants to transfer petabyte-scale data to AWS for their analytics, but is constrained by its internet connectivity. Which AWS service can help them transfer the data quickly?
1. S3 enhanced uploader
2. Snowmobile
3. Snowball
4. Direct Connect

A company wants to transfer its video library data, which runs in exabytes, to AWS. Which AWS service can help the company transfer the data?
1. Snowmobile
2. Snowball
3. S3 upload
4. S3 enhanced uploader
You are working with a customer who has 100 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Gbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?
1. Amazon Glacier multipart upload
2. AWS Storage Gateway
3. VM Import/Export
4. AWS Snowball


AWS Redshift Best Practices

August 30, 2019 ~ Last updated on : June 30, 2023 ~ jayendrapatil ~ 2 Comments

AWS Redshift Best Practices

Designing Tables

Distribution Style selection

Distribute the fact table and one dimension table on their common columns.
- A fact table can have only one distribution key. Any tables that join on another key aren’t collocated with the fact table.
- Choose one dimension to collocate based on how frequently it is joined and the size of the joining rows.
- Designate both the dimension table’s primary key and the fact table’s corresponding foreign key as the DISTKEY.
Choose the largest dimension based on the size of the filtered dataset.
- Only the rows that are used in the join need to be distributed, so consider the size of the dataset after filtering, not the size of the table.

Choose a column with high cardinality in the filtered result set.
- If you distribute a sales table on a date column, for example, you should probably get fairly even data distribution, unless most of the sales are seasonal.
- However, if you commonly use a range-restricted predicate to filter for a narrow date period, most of the filtered rows occur on a limited set of slices and the query workload is skewed.
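The effect of cardinality on distribution can be seen in a small simulation. This sketch hashes a column's values onto slices and compares the busiest slice with the average; it uses `zlib.crc32` as a stand-in hash, which is an assumption for illustration and not the hash function Redshift uses. The function names, the 8-slice cluster, and the sample columns are likewise hypothetical.

```python
import zlib
from typing import Iterable, List


def slice_row_counts(values: Iterable[object], num_slices: int) -> List[int]:
    """Count how many rows land on each slice when hashing the distribution key."""
    counts = [0] * num_slices
    for value in values:
        slice_id = zlib.crc32(str(value).encode("utf-8")) % num_slices
        counts[slice_id] += 1
    return counts


def skew_ratio(counts: List[int]) -> float:
    """Rows on the busiest slice divided by the average rows per slice."""
    return max(counts) / (sum(counts) / len(counts))


NUM_SLICES = 8
ROWS = 10_000

# High cardinality: a unique order id per row.
order_ids = range(ROWS)
# Low cardinality: only three distinct region codes across all rows.
regions = [("us", "eu", "ap")[i % 3] for i in range(ROWS)]

high_cardinality_skew = skew_ratio(slice_row_counts(order_ids, NUM_SLICES))
low_cardinality_skew = skew_ratio(slice_row_counts(regions, NUM_SLICES))
print(high_cardinality_skew, low_cardinality_skew)
```

With only three distinct values, at most three of the eight slices receive any rows, so the busiest slice carries several times the average load, while the unique key spreads rows much more evenly.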

Change some dimension tables to use ALL distribution.
- If a dimension table cannot be collocated with the fact table or other important joining tables, query performance can be improved significantly by distributing the entire table to all of the nodes.
- Using ALL distribution multiplies storage space requirements and increases load times and maintenance operations.

Sort Key Selection

Redshift stores the data on disk in sorted order according to the sort key, which helps the query optimizer to determine optimal query plans.
If recent data is queried most frequently, specify the timestamp column as the leading column for the sort key.
- Queries are more efficient because they can skip entire blocks that fall outside the time range.

If you do frequent range filtering or equality filtering on one column, specify that column as the sort key.
- Redshift can skip reading entire blocks of data for that column.
- Redshift tracks the minimum and maximum column values stored on each block and can skip blocks that don’t apply to the predicate range.
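The block-skipping idea can be sketched in a few lines of Python. This is a simplified model, not Redshift's storage engine: `Block`, `make_blocks`, and `blocks_to_scan` are hypothetical names, and the block size of 100 values is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A storage block with the min/max metadata kept for one column."""
    values: List[int]

    @property
    def lo(self) -> int:
        return min(self.values)

    @property
    def hi(self) -> int:
        return max(self.values)


def make_blocks(values: List[int], block_size: int) -> List[Block]:
    """Chop a column into fixed-size blocks in storage order."""
    return [Block(values[i:i + block_size]) for i in range(0, len(values), block_size)]


def blocks_to_scan(blocks: List[Block], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] range overlaps the predicate range."""
    return sum(1 for b in blocks if b.hi >= lo and b.lo <= hi)


column = [(i * 37) % 1000 for i in range(1000)]  # values 0..999 in scrambled order
unsorted_blocks = make_blocks(column, 100)
sorted_blocks = make_blocks(sorted(column), 100)

# Predicate: WHERE col BETWEEN 200 AND 299
unsorted_scans = blocks_to_scan(unsorted_blocks, 200, 299)
sorted_scans = blocks_to_scan(sorted_blocks, 200, 299)
print("unsorted:", unsorted_scans, "sorted:", sorted_scans)
```

With the column stored in sort order, the min/max ranges of the blocks do not overlap, so the range predicate touches far fewer blocks than it does on the scrambled column.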

If you frequently join a table, specify the join column as both the sort key and the distribution key.
- Doing this enables the query optimizer to choose a sort merge join instead of a slower hash join.
- As the data is already sorted on the join key, the query optimizer can bypass the sort phase of the sort merge join.
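As a rough illustration of why presorted inputs help, here is a minimal merge join in Python. It is a sketch under stated assumptions (both inputs already sorted on the join key, and unique keys on the dimension side); `merge_join` and the sample rows are hypothetical and say nothing about how Redshift implements the join internally.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Join two inputs already sorted on the key (first field), in one pass.

    Assumes keys in `right` are unique (a dimension table's primary key).
    """
    j = 0
    for key, left_val in left:
        # Advance the right side until it catches up with the left key.
        while j < len(right) and right[j][0] < key:
            j += 1
        if j < len(right) and right[j][0] == key:
            yield key, left_val, right[j][1]


# Fact rows and dimension rows, both stored in join-key order.
sales = [(1, "sale-a"), (1, "sale-b"), (3, "sale-c"), (4, "sale-d")]
products = [(1, "widget"), (2, "gadget"), (3, "gizmo")]

joined = list(merge_join(sales, products))
print(joined)
```

Neither input is sorted or hashed at join time; each side is read once, front to back.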

Other Practices

Automatic compression produces the best results
COPY command analyzes the data and applies compression encodings to an empty table automatically as part of the load operation
Define primary key and foreign key constraints between tables wherever appropriate. Even though they are informational only, the query optimizer uses those constraints to generate more efficient query plans.

Don’t use the maximum column size for convenience.

Loading Data

You can load data into the tables using the following methods:
- Using Multi-Row INSERT
- Using Bulk INSERT
- Using COPY command
- Staging tables

Copy Command
- COPY command loads data in parallel from S3, EMR, DynamoDB, or multiple data sources on remote hosts.
- COPY loads large amounts of data much more efficiently than using INSERT statements, and stores the data more effectively as well.
- Use a Single COPY Command to Load from Multiple Files
- DON’T use multiple concurrent COPY commands to load one table from multiple files as Redshift is forced to perform a serialized load, which is much slower.
Split the Load Data into Multiple Files
- divide the data into multiple files of equal size (between 1 MB and 1 GB)
- make the number of files a multiple of the number of slices in the cluster
- helps to distribute workload uniformly in the cluster.
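The sizing rule above can be turned into a small calculation. The sketch below is illustrative only: `plan_file_count` is a hypothetical helper, and the slice count of 20 is an assumption (slices per node depend on the node type).

```python
import math

MAX_FILE_BYTES = 1024**3  # 1 GB ceiling per file, from the guideline above


def plan_file_count(total_bytes: int, num_slices: int) -> int:
    """Pick a file count that is a multiple of num_slices and keeps every
    file at or below MAX_FILE_BYTES. Very small inputs may end up with
    files under 1 MB; this sketch does not handle that case."""
    if total_bytes <= 0 or num_slices <= 0:
        raise ValueError("total_bytes and num_slices must be positive")
    # Fewest files that respect the 1 GB ceiling.
    min_files = math.ceil(total_bytes / MAX_FILE_BYTES)
    # Round up to the next multiple of the slice count.
    return math.ceil(min_files / num_slices) * num_slices


# Hypothetical cluster: 10 nodes with 2 slices each = 20 slices.
files = plan_file_count(500 * 1024**3, 20)
print(files, "files of about", 500 * 1024**3 // files, "bytes each")
```

Because the result is a multiple of the slice count, every slice gets the same number of files to load.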

Use a Manifest File
- S3 historically provided eventual consistency for some operations, so new data might not be available immediately after the upload, which could result in an incomplete data load or loading stale data. (Since December 2020, S3 delivers strong read-after-write consistency.)
- Data consistency can be managed using a manifest file to load data.
- A manifest file helps specify different S3 locations in a more efficient way than the use of S3 prefixes.
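For illustration, a COPY manifest is a small JSON document with an `entries` list. The sketch below builds one in Python; the bucket and file names are hypothetical, and `build_manifest` is a helper invented for this example.

```python
import json
from typing import Iterable


def build_manifest(s3_urls: Iterable[str], mandatory: bool = True) -> str:
    """Return a COPY manifest as JSON text listing each file explicitly.

    With "mandatory": true, COPY fails if a listed file is missing
    instead of silently loading a partial data set.
    """
    entries = [{"url": url, "mandatory": mandatory} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


# Hypothetical bucket and file names, for illustration only.
urls = [f"s3://example-bucket/sales/part-{i:04d}.gz" for i in range(4)]
manifest_text = build_manifest(urls)
print(manifest_text)
```

The manifest would then be uploaded to S3 and referenced by the COPY command together with the MANIFEST option.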
Compress Data Files
- Individually compress the load files using gzip, lzop, bzip2, or Zstandard for large datasets
- Avoid compression for small amounts of data, because the benefit of compression would be outweighed by the processing cost of decompression
- If the priority is to reduce the time spent by COPY commands, use LZO compression. On the other hand, if the priority is to reduce the size of the files in S3 and the network bandwidth, use BZ2 compression.
Load Data in Sort Key Order
- Load the data in sort key order to avoid needing to vacuum.
- As long as each batch of new data follows the existing rows in the table, the data will be properly stored in sort order, and you will not need to run a vacuum.
- Presorting rows is not needed in each load because COPY sorts each batch of incoming data as it loads.

Load Data using IAM role

Designing Queries

Avoid using select *. Include only the columns you specifically need.
Use a CASE Expression to perform complex aggregations instead of selecting from the same table multiple times.
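The CASE technique is plain SQL, so it can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Redshift (an assumption made purely so the example runs anywhere); the `sales` table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 30), ("north", 5)],
)

# One scan of the table computes both aggregates, instead of running
# a separate SELECT ... WHERE region = '...' query for each one.
row = conn.execute(
    """
    SELECT
        SUM(CASE WHEN region = 'east' THEN amount ELSE 0 END) AS east_total,
        SUM(CASE WHEN region = 'west' THEN amount ELSE 0 END) AS west_total
    FROM sales
    """
).fetchone()

east_total, west_total = row
print(east_total, west_total)
conn.close()
```

The query also names only the columns it needs rather than using select *.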

Don’t use cross-joins unless absolutely necessary
Use subqueries in cases where one table in the query is used only for predicate conditions and the subquery returns a small number of rows (less than about 200).
Use predicates to restrict the dataset as much as possible.

In the predicate, use the least expensive operators that you can.
Avoid using functions in query predicates.
If possible, use a WHERE clause to restrict the dataset.

Add predicates to filter tables that participate in joins, even if the predicates apply the same filters.

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)
1. When the tables are highly denormalized and do NOT participate in frequent joins.
2. When data must be grouped based on a specific key on a defined slice.
3. When data transfer between nodes must be eliminated.
4. When a new table has been loaded and it is unclear how it will be joined to dimension.
An administrator has a 500-GB file in Amazon S3. The administrator runs a nightly COPY command into a 10-node Amazon Redshift cluster. The administrator wants to prepare the data to optimize performance of the COPY command. How should the administrator prepare the data?
1. Compress the file using gz compression.
2. Split the file into 500 smaller files.
3. Convert the file format to AVRO.
4. Split the file into 10 files of equal size.

AWS Systems Manager

July 24, 2019 ~ Last updated on : February 6, 2023 ~ jayendrapatil

Systems Manager provides visibility and control of the infrastructure on AWS.

helps to view operational data from multiple AWS services and automates operational tasks across AWS resources.
A managed instance is an EC2 instance or on-premises machine in your hybrid environment that has been configured for Systems Manager.

works with managed instances, which are configured for use with Systems Manager.
helps configure and maintain managed instances.
helps maintain security and compliance by scanning the managed instances and reporting on (or taking corrective action on) any policy violations it detects.

supported machine types include EC2 instances, on-premises servers, and virtual machines (VMs), including VMs in other cloud environments.
supported operating system types include Windows Server, multiple distributions of Linux, and Raspbian.

Operations Management

Capabilities that help manage the AWS resources

Trusted Advisor is an online tool that provides real-time guidance to help you provision the resources following AWS best practices.
AWS Personal Health Dashboard provides information about AWS Health events that can affect your account
OpsCenter provides a central location where operations engineers and IT professionals can view, investigate, and resolve operational work items (OpsItems) related to AWS resources

Application Management

SSM Parameter Store

SSM Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management.
can store data such as passwords, database strings, AMI IDs and license codes as parameter values.

supports values as plain text or encrypted data using the SecureString parameter.
uses AWS KMS to encrypt the parameter value.
parameters can be referenced by using the unique name specified during parameter creation.

supports versioning of configuration/secrets.
provides high availability as Parameter Store is hosted in multiple AZs in an AWS Region.
can be configured for change notifications and invoke automated actions for both parameters and parameter policies

is integrated with Secrets Manager and can be used to retrieve Secrets Manager secrets when using other AWS services that already support references to Parameter Store parameters
does not support password rotation, use Secrets Manager instead.
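A minimal sketch of reading a SecureString parameter from Python follows. `get_parameter` with `WithDecryption=True` is the boto3 SSM call; everything else here (the `get_secure_parameter` helper, the `FakeSsmClient` stand-in, and the parameter name) is hypothetical and exists only so the example runs without AWS credentials.

```python
from typing import Any


def get_secure_parameter(ssm_client: Any, name: str) -> str:
    """Fetch a SecureString parameter and return its decrypted value.

    `ssm_client` is expected to behave like boto3.client("ssm");
    WithDecryption=True asks Parameter Store to decrypt the value via KMS.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSsmClient:
    """Tiny in-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self, store: dict):
        self._store = store

    def get_parameter(self, Name: str, WithDecryption: bool = False) -> dict:
        return {"Parameter": {"Name": Name, "Value": self._store[Name]}}


# Hypothetical hierarchical parameter name.
fake = FakeSsmClient({"/myapp/prod/db-connection": "example-connection-string"})
value = get_secure_parameter(fake, "/myapp/prod/db-connection")
print(value)
```

In real use, the fake client would be replaced by `boto3.client("ssm")`, and the caller's IAM role would need permission to read the parameter and use the KMS key.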

SSM Parameter Store vs Secrets Manager

Change Management

Capabilities for taking action against or changing the AWS resources

Systems Manager Automation

helps automate common maintenance and deployment tasks, for example, creating and updating AMIs, applying driver and agent updates, resetting passwords on Windows instances, resetting SSH keys on Linux instances, and applying OS patches or application updates.

Maintenance Windows

helps set up recurring schedules for managed instances to run administrative tasks like installing patches and updates without interrupting business-critical operations.

Node Management

Capabilities for managing the EC2 instances, on-premises servers and virtual machines (VMs) in the hybrid environment, and other types of AWS resources (nodes)

Systems Manager Configuration Compliance

helps scan the fleet of managed instances for patch compliance and configuration inconsistencies.
helps collect and aggregate data from multiple AWS accounts and Regions, and then drill down into specific resources that aren’t compliant.

by default, displays compliance data about Patch Manager patching and State Manager associations, but can be customized

Session Manager

helps manage EC2 instances through an interactive one-click browser-based shell or through the AWS CLI.
provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.

helps comply with corporate policies that require controlled access to instances, strict security practices, and fully auditable logs with instance access details, while still providing end users with simple one-click cross-platform access to the EC2 instances.

Systems Manager Run Command

Run Command allows you to automate common administrative tasks and perform one-time configuration changes at scale.
helps to remotely and securely manage the configuration of the managed instances at scale.

helps perform on-demand changes like updating applications or running Linux shell scripts and Windows PowerShell commands on a target set of dozens or hundreds of instances.
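As a rough sketch, a Run Command request that targets instances by tag could be assembled like this. `Targets`, `DocumentName`, `Parameters`, and `Comment` are parameters of the SSM SendCommand API, and `AWS-RunShellScript` is an AWS-managed document; the helper name, tag, and command are assumptions for illustration.

```python
from typing import Dict, List


def build_send_command_request(tag_key: str, tag_values: List[str],
                               commands: List[str]) -> Dict:
    """Build keyword arguments for the SSM SendCommand API call.

    Targets instances by tag, and runs shell commands through the
    AWS-managed AWS-RunShellScript document.
    """
    return {
        "Targets": [{"Key": f"tag:{tag_key}", "Values": tag_values}],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
        "Comment": "one-time change via Run Command",
    }


# Hypothetical tag and command.
request = build_send_command_request("Environment", ["staging"], ["uptime"])
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ssm").send_command(**request)
```

Targeting by tag rather than by instance ID is what lets the same request reach dozens or hundreds of instances.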

Patch Manager

helps automate the process of patching managed instances with both security-related and other types of updates.
helps apply patches for both operating systems and applications. (On Windows Server, application support is limited to updates for Microsoft applications.)

enables scanning of instances for missing patches and applies them individually or to a large group of instances by using EC2 instance tags.
provides options to scan the instances and report compliance on a schedule, install available patches on a schedule, and patch or scan instances on-demand as needed.
Patch baselines
- defines which patches should and shouldn’t be installed
- can include rules for auto-approving patches within days of their release, as well as a list of approved and rejected patches
- helps install security patches on a regular basis by scheduling patching to run as a Systems Manager maintenance window task.

Patch group
- helps associate a set of instances with a specific patch baseline
- requires instances to be tagged with a tag key Patch Group
- an instance can only be part of one Patch Group
- a patch group can be registered with only one patch baseline
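The approval rules of a patch baseline can be modeled in a few lines. This is a simplified, hypothetical model (the `PatchBaseline` class, the patch IDs, and the precedence of rejected over approved are assumptions of the sketch), not Patch Manager's actual evaluation logic.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set


@dataclass
class PatchBaseline:
    """Simplified, hypothetical model of a patch baseline."""
    auto_approve_after_days: int
    approved: Set[str] = field(default_factory=set)
    rejected: Set[str] = field(default_factory=set)

    def is_approved(self, patch_id: str, released: date, today: date) -> bool:
        # Explicit rejections win over everything else in this sketch.
        if patch_id in self.rejected:
            return False
        if patch_id in self.approved:
            return True
        # Otherwise apply the auto-approval delay rule.
        return (today - released).days >= self.auto_approve_after_days


baseline = PatchBaseline(auto_approve_after_days=7,
                         approved={"KB-URGENT"}, rejected={"KB-BROKEN"})
today = date(2023, 2, 15)
decisions = {
    "KB-OLD": baseline.is_approved("KB-OLD", date(2023, 2, 1), today),
    "KB-NEW": baseline.is_approved("KB-NEW", date(2023, 2, 14), today),
    "KB-URGENT": baseline.is_approved("KB-URGENT", date(2023, 2, 14), today),
    "KB-BROKEN": baseline.is_approved("KB-BROKEN", date(2023, 1, 1), today),
}
print(decisions)
```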

Systems Manager Inventory

provides visibility into the EC2 and on-premises computing environment

collects metadata from the managed instances about applications, files, components, patches, and more
collects only metadata from the managed instances and doesn’t access proprietary information or data.
supports custom metadata in addition to the pre-configured metadata

supports inventory data collection from multiple regions and AWS Accounts
supports inventory data storage in a single centralized location like S3 which can then be queried using Athena.

Systems Manager State Manager

is a secure and scalable configuration management service that helps automate the process of keeping the managed instances in a defined state.

helps ensure that the instances are bootstrapped with specific software at startup, joined to a Windows domain (Windows instances only), or patched with specific software updates.
A State Manager association is a configuration that is assigned to the managed instances which defines the state that you want to maintain on the instances.

Shared Resources

Capabilities for managing and configuring the AWS resources

Systems Manager Document (SSM document)

SSM document defines the actions that the Systems Manager performs.
SSM document types include
- Command documents, which are used by State Manager and Run Command, and
- Automation documents, which are used by Systems Manager Automation.
SSM Document can be defined in JSON or YAML and define parameters and actions.
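To show what the parameters and actions look like, here is a minimal Command document built as a Python dict and serialized to JSON. The layout follows schema version 2.2 with the `aws:runShellScript` action; the description, parameter name, and step name are made up for the example.

```python
import json


def build_command_document(description: str) -> dict:
    """Return a minimal Command-type SSM document as a Python dict."""
    return {
        "schemaVersion": "2.2",
        "description": description,
        "parameters": {
            "Message": {
                "type": "String",
                "description": "Text to print on the instance",
                "default": "hello",
            }
        },
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "printMessage",  # hypothetical step name
                "inputs": {"runCommand": ["echo {{ Message }}"]},
            }
        ],
    }


document = build_command_document("Example document: print a message")
document_json = json.dumps(document, indent=2)
print(document_json)
```

The `{{ Message }}` placeholder is replaced with the parameter value when the document runs.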

Systems Manager Agent

is software that can be installed and configured on an EC2 instance, an on-premises server, or a virtual machine (VM)

makes it possible for the Systems Manager to update, manage, and configure these resources
must be installed on each instance to use with Systems Manager
comes preinstalled on many Amazon Machine Images (AMIs), but must be installed manually on other AMIs, and on on-premises servers and virtual machines in the hybrid environment

AWS Certification Exam Practice Questions

Which of the following tools from AWS allows the automatic collection of software inventory from EC2 instances and helps apply OS patches?
1. AWS Code Deploy
2. Systems Manager
3. EC2 AMI’s
4. AWS Code Pipeline
A Developer is writing several Lambda functions that each access data in a common RDS DB instance. They must share a connection string that contains the database credentials, which are a secret. A company policy requires that all secrets be stored encrypted. Which solution will minimize the amount of code the Developer must write?
1. Use common DynamoDB table to store settings
2. Use AWS Lambda environment variables
3. Use Systems Manager Parameter Store secure strings
4. Use a table in a separate RDS database
A company has a fleet of EC2 instances and needs to remotely execute scripts for all of the instances. Which Amazon EC2 systems Manager feature allows this?
1. Systems Manager Automation
2. Systems Manager Run Command
3. Systems Manager Parameter Store
4. Systems Manager Inventory

As part of a compliance check, it was found that EC2 instances launched by the deployment team were not compliant with the latest security patches. The team had tagged all the resources. Which AWS service can help make the instances compliant?
1. AWS Inspector
2. AWS GuardDuty
3. AWS Systems Manager
4. AWS Shield

References

AWS Systems Manager User Guide

AWS Cloud Migration

July 20, 2019 ~ Last updated on : December 30, 2022 ~ jayendrapatil ~ 5 Comments

Some of the key drivers for moving to the cloud are

Operational Costs – Key components of operational costs are unit price of infrastructure, the ability to match supply and demand, finding a pathway to optionality, employing an elastic cost base, and transparency
Workforce Productivity – getting up and ready in seconds and various service availability.

Cost Avoidance – eliminating the need for hardware refresh programs and constant maintenance programs
Operational Resilience – increases resilience and thereby reduces the organization’s risk profile
Business Agility – react to market conditions more quickly

Cloud Stages of Adoption

PROJECT

In the project phase, execute projects to get familiar with and experience benefits from the cloud.

FOUNDATION

After experiencing the benefits of cloud, build the foundation to scale the cloud adoption.
This includes creating a landing zone (a pre-configured, secure, multi-account AWS environment), Cloud Center of Excellence (CCoE), operations model, as well as assuring security and compliance readiness.

MIGRATION

Migrate existing applications including mission-critical applications or entire data centers to the cloud as you scale your adoption across a growing portion of the IT portfolio.

REINVENTION

Now that the operations are in the cloud, focus on reinvention by taking advantage of the flexibility and capabilities of AWS to transform business by speeding time to market and increasing the attention on innovation.

Migration Process

Phase 1: Migration Preparation and Business Planning

Determine the right objectives and begin to get an idea of the types of benefits you will see.

Starts with some foundational experience and developing a preliminary business case for a migration, which requires taking objectives into account, along with the age and architecture of the existing applications, and their constraints.

Phase 2: Portfolio Discovery and Planning

Understand the IT portfolio and the dependencies between applications, and begin to consider what types of migration strategies are needed to meet the business case objectives.
With the portfolio discovery and migration approach, you are in a good position to build a full business case.

Phase 3 & Phase 4: Designing, Migrating, and Validating Application

Move focus from the portfolio level to the individual application level and design, migrate, and validate each application.
Each application is designed, migrated, and validated according to one of the six common application strategies (“The 6 R’s”).
Once you have some foundational experience from migrating a few apps and a plan in place that the organization can get behind – it’s time to accelerate the migration and achieve scale.

AWS provides migration services that help move applications and data from on-premises to AWS – AWS Server Migration Service (SMS), AWS Database Migration Service (DMS)

Phase 5: Operate

Once applications are migrated, iterate on the new foundation, turn off old systems, and constantly iterate toward a modern operating model.
The operating model becomes an evergreen set of people, process, and technology that constantly improves as you migrate more applications.

Application Migration Strategies

Migration strategies depend upon what is in your environment and what is suitable for the portfolio, taking into account the business and technical requirements.

Below are the six common migration strategies employed, which build upon “The 5 R’s” that Gartner outlined in 2011.

1. Rehost (“lift and shift”)

Moving your application as is to the Cloud.

helps to quickly implement the migration and scale to meet a business case
provides a better opportunity to re-architect the applications once they are already running in the cloud, as the organization has already developed cloud skills and the application with its data is migrated and handling traffic.
Rehosting can be automated with tools such as AWS Server Migration Service, or can be done manually

2. Replatform (“lift, tinker and shift”)

Moving your application to the Cloud with optimizations, without any major changes.
Replatform helps achieve some tangible benefit without changing the core architecture of the application, for example, using RDS for the database or Elastic Beanstalk for applications.

3. Repurchase (“drop and shop”)

Dropping the application and moving to a completely new solution

More of a Buy in a Build vs Buy model; might be expensive in the short term but offers faster time to market.
Move to a different product, which likely means the organization is willing to change the licensing model it currently uses

4. Refactor / Re-architect

Moving the application to Cloud, with major changes.

More of a Build in a Build vs Buy model, and would take time.
driven by a strong business need to add features, scale, or performance with agility and improvement in business continuity that would otherwise be difficult to achieve in the application’s existing environment.

5. Retire

Decommission the applications that are no longer needed.

Identifying IT assets that are no longer useful and can be turned off will help boost your business case and direct your attention towards maintaining the resources that are widely used.

6. Retain

Keep the applications as is in the current environment
Retain portions of the IT portfolio that have tight dependencies, are difficult to migrate, or are not a priority or not ready for migration

AWS Certification Exam Practice Questions

A company is planning the migration of several lab environments used for software testing. An assortment of custom tooling is used to manage the test runs for each lab. The labs use immutable infrastructure for the software test runs, and the results are stored in a highly available SQL database cluster. Although completely rewriting the custom tooling is out of scope for the migration project, the company would like to optimize workloads during the migration. Which application migration strategy meets this requirement?
1. Re-host
2. Re-platform
3. Re-factor/re-architect
4. Retire

References

AWS documentation – Cloud_Migration

AWS Certified DevOps Engineer – Professional (DOP-C01) Exam Learning Path

June 10, 2019 ~ Last updated on : October 4, 2023 ~ jayendrapatil ~ 48 Comments

NOTE – Refer to DOP-C02 Learning Path

AWS Certified DevOps Engineer – Professional (DOP-C01) exam is the upgraded pattern of the DevOps Engineer – Professional exam which was released last year (2018). I recently attempted the latest pattern and AWS has done quite well in improving it further, as compared to the old one, to include more DevOps related questions and services.

AWS Certified DevOps Engineer – Professional (DOP-C01) exam basically validates

Implement and manage continuous delivery systems and methodologies on AWS

Implement and automate security controls, governance processes, and compliance validation
Define and deploy monitoring, metrics, and logging systems on AWS
Implement systems that are highly available, scalable, and self-healing on the AWS platform

Design, manage, and maintain tools to automate operational processes

Refer to AWS Certified DevOps Engineer – Professional Exam Guide

AWS Certified DevOps Engineer – Professional (DOP-C01) Exam Summary

AWS Certified DevOps Engineer – Professional exam was for a total of 170 minutes but it had 75 questions (I was always assuming it to be 65) and I just managed to complete the exam with 20 mins remaining. So be sure you are prepared and manage your time well. As always, mark the questions for review and move on and come back to them after you are done with all.

One of the key tactics I followed when solving the DevOps Engineer questions was to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check where they differ, and that would help you reach the right answer or at least have a 50% chance of getting it right.
AWS Certified DevOps Engineer – Professional exam covers a lot of concepts and services related to Automation, Deployments, Disaster Recovery, HA, Monitoring, Logging and Troubleshooting. It also covers security and compliance related topics.
Be sure to cover the following topics
- Whitepapers are the key to understand Deployments and DR
- Management Tools
  - DevOps professional exam cannot be cleared without knowledge of these topics
  - Deep dive into CloudFormation, Elastic Beanstalk and OpsWorks
  - Very important to understand CloudFormation vs Elastic Beanstalk vs OpsWorks
  - CloudFormation
    - Have an in-depth understanding of CloudFormation concepts
    - Know how to indicate completion of events using CloudFormation helper scripts.
    - Understand CloudFormation deployment strategies esp. rolling and replacing update with AutoScaling and update of launch configuration
    - Understand CloudFormation policies esp. Update and Deletion policies (hint : retain resources on stack deletion)
    - Understand CloudFormation Best Practices esp. Nested Stacks and logical grouping
    - Understand CloudFormation template anatomy – parameters, outputs, mappings
    - Understand CloudFormation Custom resource and its use cases (hint : you can use Custom resource to retrieve AMI IDs or interact with external services)
  - Elastic Beanstalk
    - Understand Elastic Beanstalk overall – Applications, Versions and Environments
    - Understand Elastic Beanstalk Deployment Strategies esp. the rolling, immutable and blue/green deployments
    - Know Custom AMIs can be supported
    - Know Elastic Beanstalk offers Docker support
  - OpsWorks
    - Understand OpsWorks overall – stacks, layers, recipes
    - Understand OpsWorks Lifecycle events esp. the Configure event and how it can be used.
    - Understand OpsWorks Deployment Strategies
    - Know OpsWorks auto-healing and how to be notified for it.
  - Development Tools
    - Unlike the previous DevOps Engineer – Professional exam, the latest pattern has a heavy focus on the Developer tools and be sure to deep dive into them
    - Understand CodePipeline, CodeCommit, CodeDeploy, CodeBuild and their use cases
    - CodePipeline
      - Understand how to build Pipelines and integration with other Code* services
      - Understand CodePipeline pipeline structure (Hint : run builds in parallel using runOrder)
      - Understand how to configure notifications on events and failures
      - Know CodePipeline supports Manual Approval
    - CodeCommit
      - How to handle deployments for code. (Hint : Same repository and branches for projects and environments)
      - Know CodeCommit IAM policies
    - CodeDeploy
      - Understand CodeDeploy Lifecycle events hooks
      - Understand CodeDeploy deployment configurations (hint : supports canary and linear deployment)
      - Understand CodeDeploy redeploy and rollbacks
- Monitoring & Governance tools
  - Very important to understand AWS CloudWatch vs AWS CloudTrail vs AWS Config
  - Very important to understand Trusted Advisor vs Systems Manager vs AWS Inspector
  - Know Personal Health Dashboard & Service Health Dashboard
  - CloudWatch
    - Deep dive CloudWatch
    - Understand CloudWatch logs
    - Understand CloudWatch Subscription Filters and its integration with other services.
    - Understand CloudWatch Events
    - Understand CloudWatch supports custom metrics
    - Know how to monitor AWS managed and on-premises instances
    - Know you can trigger events using CloudWatch scheduled events.
  - CloudTrail
    - Understand CloudTrail for audit and governance
    - Understand how to maintain CloudTrail logs integrity
  - Understand AWS Config and its use cases (hint : Config maintains history and can be used to revert the config)
  - Know Personal Health Dashboard (hint : it tracks events on your AWS resources)
  - Understand AWS Trusted Advisor and what it provides (hint : low utilization resources)
  - Systems Manager
    - Systems Manager is also covered heavily in the exams so be sure you know it well
    - Understand AWS Systems Manager and its various services like parameter store, patch manager
- Networking & Content Delivery
  - Networking is covered very lightly. Usually the questions are targeted towards troubleshooting of access or permissions.
  - Know VPC
    - Understand Security Groups, NACLs (Hint : know NACLs are stateless and how it is reflected in VPC Flow Logs)
    - Understand VPC Flow Logs and what information it provides
  - Route 53
    - Understand Route 53
    - Understand Routing Policies and their use cases. Focus on Weighted and Latency routing policies
  - Understand CloudFront and use cases (hint : S3 caching)
  - Load Balancer
    - Understand ELB, ALB and NLB
    - Understand ELB with Auto Scaling (hint : ELB with Auto Scaling for blue/green deployments)
- Security, Identity & Compliance
  - Identity and Access Management
    - Understand IAM Roles and use cases
    - Know IAM Best Practices
  - Know AWS Inspector
  - Know AWS Application Discovery Service
- Storage
  - Exam does not cover Storage services in depth
  - Focus on Simple Storage Service (S3)
    - Understand S3 Permissions (Hint – acl authenticated users provides access to all authenticated users. How to control access)
    - Know S3 disaster recovery across region. (hint : cross region replication)
    - Know CloudFront for caching to improve performance
  - Elastic Block Store
    - Focus mainly on EBS Backup using snapshots for HA and Disaster recovery
- Database
  - Exam covers Database mainly in terms of HA and Disaster Recovery.
  - Know Aurora DR & HA using Read Replicas and Global Database
  - Elasticsearch did appear in the exam, but it was only where search was relevant.
  - DynamoDB
    - Improve performance – Best practices (hint : one question for selection of keys)
    - DynamoDB Auto Scaling & DAX for caching
- Compute
  - Know EC2
    - Understand ENI for HA, user data, pre-baked AMIs for faster instance start times
    - Amazon Linux 2 Image (hint : it allows for replication of Amazon Linux behavior in on-premises)
    - Snapshot and sharing
  - Auto Scaling
    - Auto Scaling Lifecycle events
    - Blue/green deployments with Auto Scaling – With new launch configurations, new auto scaling groups or CloudFormation update policies.
  - Understand Lambda
    - Know Lambda Alias supports Canary deployments using Routing Config
  - ECS
    - Know Monitoring and deployments with image update
- Integration Tools
  - Know how CloudWatch integration with SNS and Lambda can help in notification (Topics are not required to be in detail)

AWS Certified DevOps Engineer – Professional (DOP-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified DevOps Engineer Professional
- Adrian Cantrill – AWS Certified DevOps Engineer – Professional
- Whizlabs – AWS Certified DevOps Engineer Professional Course
- Coursera – DevOps on AWS Specialization
Practice tests
- Braincert AWS Certified DevOps Engineer – Professional Practice Exams
- Stephane Maarek – AWS Certified DevOps Engineer Professional Practice Tests
- Whizlabs – AWS Certified DevOps Engineer Professional Practice Tests

AWS Certified Advanced Networking – Speciality (ANS-C00) Exam Learning Path

May 29, 2019 ~ Last updated on : October 4, 2023 ~ jayendrapatil ~ 29 Comments

I recently cleared the AWS Certified Advanced Networking – Speciality (ANS-C00), which was my first step on the path to the AWS Speciality certifications. Frankly, I feel the time I gave for preparation was still not enough, but I just about managed to get through. So a word of caution, this exam is on par with or tougher than the professional exams, especially because the Networking concepts it covers are not something you can easily get your hands dirty with.

AWS Certified Advanced Networking – Speciality (ANS-C00) exam focuses on the AWS Networking concepts. It basically validates

Design, develop, and deploy cloud-based solutions using AWS
Implement core AWS services according to basic architecture best practices

Design and maintain network architecture for all AWS services
Leverage tools to automate AWS networking tasks

Refer to AWS Certified Advanced Networking – Speciality Exam Guide

AWS Certified Advanced Networking – Speciality (ANS-C00) Exam Resources

Online Courses
- Stephane Maarek – Ultimate AWS Certified Advanced Networking Specialty 2021
- Zeal Vora – AWS Certified Advanced Networking Specialty course
- Linux Academy – AWS Certified Advanced Networking Specialty course
Practice Tests
- Braincert – AWS Certified Advanced Networking Specialty ANS-C00 Practice Tests
- Stephane Maarek – Practice Exam – AWS Certified Advanced Networking Specialty
- Whizlabs – AWS Certified Advanced Networking Specialty Practice tests

AWS Certified Advanced Networking – Speciality (ANS-C00) Exam Summary

AWS Certified Advanced Networking – Speciality exam covers a lot of Networking concepts like VPC, VPN, Direct Connect, Route 53, ALB, NLB.

One of the key tactics I followed when solving the exam questions was to read the question and use paper and pencil to draw a rough architecture and focus on the areas that need improvement. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check where they differ; that would help you reach the right answer or at least have a 50% chance of getting it right.
Be sure to cover the following topics
- Networking & Content Delivery
  - You should know everything in Networking.
  - Understand VPC in depth
    - Understand VPC, Subnets
    - Know that AWS allows you to extend your VPC by adding secondary CIDR blocks (hint: focus on the restrictions on the IP ranges that can be associated with an existing VPC)
    - Understand Security Groups, NACLs (Hint : know NACLs are stateless and how it is reflected in VPC Flow Logs)
    - Understand DHCP Option Sets esp. how to resolve DNS from both on-premises data center and AWS.
    - Understand VPC Peering, configuration and its limitations (Hint: try it yourself esp. cross account ones to know what's needed)
    - Understand Placement Groups, Enhanced Networking
    - Understand VPC Endpoints esp. services supported by Gateway and Interface Endpoints. Interface Endpoints are powered by AWS PrivateLink.
    - Know Transit VPC and its use case
    - Know CloudHub and its use case
  - Virtual Private Network to establish connectivity between on-premises data center and AWS VPC
  - Direct Connect to establish connectivity between on-premises data center and AWS VPC and Public Services
    - Make sure you understand Direct Connect in detail, without this you cannot clear the exam
    - Understand Direct Connect connections – Dedicated and Hosted connections
    - Understand how to create a Direct Connect connection (hint: LOA-CFA provides the details for partner to connect to AWS Direct Connect location)
    - Understand virtual interfaces options – Private Virtual Interface for VPC resources and Public Virtual Interface for Public resources
    - Understand how to set up Private and Public VIFs
    - Understand Route Propagation, propagation priority, BGP connectivity
    - Understand High Availability options based on cost and time i.e. Second Direct Connect connection OR VPN connection
    - Understand Direct Connect Gateway – it provides a way to connect to multiple VPCs from on-premises data center using the same Direct Connect connection
  - Route 53
    - Understand Route 53 and Routing Policies and their use cases. Focus on Weighted and Latency routing policies
    - Understand Route 53 Split View DNS to have the same DNS to access a site externally and internally
  - Understand CloudFront and use cases
  - Load Balancer
    - Understand ELB, ALB and NLB
    - Understand the difference between ELB, ALB and NLB esp. ALB provides Content, Host and Path based Routing while NLB provides the ability to have a static IP address
    - Know how to design VPC CIDR block with NLB (Hint – minimum number of IPs required is 8)
    - Know how to pass original Client IP to the backend instances (Hint – X-Forwarded-For and Proxy Protocol)
  - Know WorkSpaces requirements and setup
- Security
  - Know AWS GuardDuty as managed threat detection service
  - Know AWS Shield esp. the Shield Advanced option and the features it provides
  - Know WAF as Web Traffic Firewall – (Hint – WAF can be attached to your CloudFront, Application Load Balancer, API Gateway to dynamically detect and prevent attacks)
- Monitoring & Management Tools
  - Understand AWS CloudFormation esp. in terms of Network creation. (Hint – Know Custom resources can be used to handle activities not supported by AWS)
  - Understand CloudTrail for audit and governance
  - Understand AWS Config and its use case
- Integration Tools
  - Know how CloudWatch integration with SNS and Lambda can help in notification (Topics are not required to be in detail)
- Whitepapers and articles

AWS Network Connectivity Options

May 29, 2019 ~ Last updated on : September 7, 2022 ~ jayendrapatil ~ 3 Comments

Internet Gateway

provides Internet connectivity to VPC

is a horizontally scaled, redundant, and highly available component that allows communication between instances in your VPC and the internet.
imposes no availability risks or bandwidth constraints on your network traffic.

serves two purposes: to provide a target in the VPC route tables for internet-routable traffic and to perform NAT for instances that have been assigned public IPv4 addresses.
supports IPv4 and IPv6 traffic.
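
To make the "target in the VPC route tables" idea concrete, here is a minimal sketch of a route lookup for a public subnet. It assumes a hypothetical VPC CIDR of 10.0.0.0/16 and a placeholder gateway ID `igw-example`, and it picks the most specific matching route; it is a toy model, not an AWS API call.

```python
import ipaddress

# Hypothetical route table for a public subnet in a 10.0.0.0/16 VPC.
ROUTES = [
    ("10.0.0.0/16", "local"),      # traffic that stays inside the VPC
    ("0.0.0.0/0", "igw-example"),  # everything else goes to the internet gateway
]


def route_target(destination_ip, routes=ROUTES):
    """Return the target of the most specific route that matches destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix wins, so the local route beats the default route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(route_target("10.0.1.25"))      # local
print(route_target("93.184.216.34"))  # igw-example
```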

NAT Gateway

enables instances in a private subnet to connect to the internet or other AWS services, but prevents the Internet from initiating connections with the instances.

Private NAT gateway allows instances in private subnets to connect to other VPCs or the on-premises network.

Egress Only Internet Gateway

NAT devices are not supported for IPv6 traffic; use an Egress-only Internet gateway instead
Egress-only Internet gateway is a horizontally scaled, redundant, and highly available VPC component that allows outbound communication over IPv6 from instances in the VPC to the Internet and prevents the Internet from initiating an IPv6 connection with your instances.
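
The "outbound only" behaviour shared by NAT gateways and Egress-only Internet gateways can be pictured with a toy connection tracker. The class name and the documentation-range IPv6 addresses below are made up for illustration; this is a sketch of the idea, not how AWS implements it.

```python
class EgressOnlyGateway:
    """Toy model: outbound flows are remembered; inbound is allowed only as a reply."""

    def __init__(self):
        # (instance_addr, remote_addr) pairs for connections opened from inside the VPC
        self._flows = set()

    def outbound(self, instance_addr, remote_addr):
        """An instance opens a connection to the internet; always allowed."""
        self._flows.add((instance_addr, remote_addr))
        return True

    def inbound(self, remote_addr, instance_addr):
        """Traffic from the internet is allowed only if the instance initiated the flow."""
        return (instance_addr, remote_addr) in self._flows


gw = EgressOnlyGateway()
gw.outbound("2001:db8::10", "2001:db8:ffff::1")
print(gw.inbound("2001:db8:ffff::1", "2001:db8::10"))  # True: reply to an existing flow
print(gw.inbound("2001:db8:ffff::2", "2001:db8::10"))  # False: initiated from the internet
```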

VPC Endpoints

VPC endpoint provides a private connection from VPC to supported AWS services and VPC endpoint services powered by PrivateLink without requiring an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
Instances in the VPC do not require public IP addresses to communicate with resources in the service. Traffic between the VPC and the other service does not leave the Amazon network.

VPC Endpoints are virtual devices and are horizontally scaled, redundant, and highly available VPC components that allow communication between instances in the VPC and services without imposing availability risks or bandwidth constraints on the network traffic.
VPC Endpoints are of two types
- Interface Endpoints – an elastic network interface with a private IP address that serves as an entry point for traffic destined to supported services.
- Gateway Endpoints – a gateway that is a target for a specified route in your route table, used for traffic destined to a supported AWS service. Currently, only Amazon S3 and DynamoDB are supported.
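
The difference between the two endpoint types can be summarised in a small helper. This is only a sketch: the function `plan_endpoint` is hypothetical, it assumes every service other than S3 and DynamoDB would use an interface endpoint, and whether a given service actually offers one should be checked in the AWS documentation.

```python
# Services with gateway endpoints, per the list above.
GATEWAY_SERVICES = {"s3", "dynamodb"}


def plan_endpoint(service):
    """Describe which endpoint type to create and what it adds to the VPC."""
    name = service.lower()
    if name in GATEWAY_SERVICES:
        # Gateway endpoints are reached through a route in the route table.
        return {"service": name, "type": "Gateway", "adds": "route table entry"}
    # Assumption: any other service is reached through an interface endpoint (an ENI).
    return {"service": name, "type": "Interface", "adds": "elastic network interface"}


print(plan_endpoint("S3")["type"])   # Gateway
print(plan_endpoint("sqs")["type"])  # Interface
```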

VPC Private Links

provides private connectivity between VPCs, AWS services, and your on-premises networks without exposing your traffic to the public internet.
helps privately expose a service/application residing in one VPC (service provider) to other VPCs (consumer) within an AWS Region in a way that only consumer VPCs initiate connections to the service provider VPC.

With ALB as a target of NLB, ALB’s advanced routing capabilities can be combined with AWS PrivateLink.

VPC Peering

enables networking connection between two VPCs to route traffic between them using private IPv4 addresses or IPv6 addresses
connections can be created between your own VPCs, or with a VPC in another AWS account.

enables full bidirectional connectivity between the VPCs
supports inter-region VPC peering connection
uses existing underlying AWS infrastructure

does not have a single point of failure for communication or a bandwidth bottleneck.
VPC Peering connections have limitations
- cannot be used with Overlapping CIDR blocks
- does not provide Transitive peering
- does not support Edge to Edge routing through Gateway or private connection
is best used when resources in one VPC must communicate with resources in another VPC, the environment of both VPCs is controlled and secured, and the number of VPCs to be connected is less than 10
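
Two of the peering limitations above, overlapping CIDR blocks and the lack of transitive peering, are easy to check in a few lines. The VPC names and CIDR ranges are hypothetical, and the sketch models only these two rules.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Peering requires CIDR blocks that do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def can_reach(src, dst, peerings):
    """Peering is not transitive: only directly peered VPCs can communicate."""
    return frozenset((src, dst)) in peerings


# Hypothetical setup: A <-> B and B <-> C are peered, A <-> C is not.
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: ranges overlap
print(can_reach("vpc-a", "vpc-b", peerings))     # True
print(can_reach("vpc-a", "vpc-c", peerings))     # False: no transitive routing via vpc-b
```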

VPN CloudHub

AWS VPN CloudHub allows you to securely communicate from one site to another using AWS Managed VPN or Direct Connect
AWS VPN CloudHub operates on a simple hub-and-spoke model that can be used with or without a VPC
AWS VPN CloudHub can be used if you have multiple branch offices and existing internet connections and would like to implement a convenient, potentially low cost hub-and-spoke model for primary or backup connectivity between these remote offices.

AWS VPN CloudHub leverages a VPC virtual private gateway with multiple customer gateways, each using unique BGP autonomous system numbers (ASNs).
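
Since each site needs its own BGP ASN, a quick sanity check over a site plan can catch clashes before anything is configured. The site names and private ASNs below are invented for illustration.

```python
from collections import Counter


def duplicate_asns(site_asns):
    """Return ASNs assigned to more than one site (an empty dict means all are unique)."""
    counts = Counter(site_asns.values())
    return {
        asn: sorted(site for site, value in site_asns.items() if value == asn)
        for asn, count in counts.items()
        if count > 1
    }


# Hypothetical branch offices; sydney accidentally reuses new-york's ASN.
sites = {"new-york": 65001, "london": 65002, "sydney": 65001}
print(duplicate_asns(sites))  # {65001: ['new-york', 'sydney']}
```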

Transit VPC

A transit VPC is a common strategy for connecting multiple, geographically dispersed VPCs and remote networks in order to create a global network transit center.
A transit VPC simplifies network management and minimizes the number of connections required to connect multiple VPCs and remote networks

Transit VPC can be used to support important use cases
- Private Networking – You can build a private network that spans two or more AWS Regions.
- Shared Connectivity – Multiple VPCs can share connections to data centers, partner networks, and other clouds.
- Cross-Account AWS Usage – The VPCs and the AWS resources within them can reside in multiple AWS accounts.
Transit VPC design helps implement more complex routing rules, such as network address translation between overlapping network ranges, or to add additional network-level packet filtering or inspection.
Transit VPC
- supports Transitive routing using the overlay VPN network — allowing for a simpler hub and spoke design. Can be used to provide shared services for VPC Endpoints, Direct Connect connection, etc.
- supports network address translation between overlapping network ranges.
- supports vendor functionality around advanced security (layer 7 firewall/Intrusion Prevention System (IPS)/Intrusion Detection System (IDS) ) using third-party software on EC2
- leverages instance-based routing that increases costs while lowering availability and limiting the bandwidth.
- Customers are responsible for managing the HA and redundancy of EC2 instances running the third-party vendor virtual appliance

Transit Gateway

is a highly available and scalable service to consolidate the AWS VPC routing configuration for a region with a hub-and-spoke architecture.
is a Regional resource and can connect VPCs within the same AWS Region.
TGWs across different regions can peer with each other to enable VPC communications within the same or different regions.

provides simpler VPC-to-VPC communication management over VPC Peering with a large number of VPCs.
enables you to attach VPCs (across accounts) and VPN connections in the same Region and route traffic between them.
supports dynamic and static routing between attached VPCs and VPN connections

removes the need for using full mesh VPC Peering and Transit VPC
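
A little arithmetic shows why a hub-and-spoke Transit Gateway scales better than full mesh VPC Peering: a full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. The helper names below are made up for this sketch.

```python
def full_mesh_peerings(vpc_count):
    """Peering connections needed so that every VPC is peered with every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count):
    """With a hub-and-spoke Transit Gateway, each VPC needs a single attachment."""
    return vpc_count


for n in (3, 10, 50):
    print(n, full_mesh_peerings(n), transit_gateway_attachments(n))
# 3 3 3
# 10 45 10
# 50 1225 50
```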

Hybrid Connectivity

AWS Network Connectivity Decision Tree

Virtual Private Network (VPN)

VPC provides the option of creating an IPsec VPN connection between remote customer networks and their VPC over the internet

AWS managed VPN endpoint includes automated multi–data center redundancy & failover built into the AWS side of the VPN connection
AWS managed VPN consists of two parts
- Virtual Private Gateway (VPG) on AWS side
- Customer Gateway (CGW) on the on-premises data center
AWS Managed VPN only provides Site-to-Site VPN connectivity. It does not provide Point-to-Site VPN connectivity, e.g., from mobile devices
Virtual Private Gateway is highly available as it represents two distinct VPN endpoints, physically located in separate data centers to increase the availability of the VPN connection.

High Availability on the on-premises data center must be handled by creating an additional Customer Gateway.
AWS Managed VPN connections are low cost and quick to set up compared to Direct Connect. However, they are less reliable as they traverse the Internet.

Software VPN

VPC offers the flexibility to fully manage both sides of the VPC connectivity by creating a VPN connection between your remote network and a software VPN appliance running in your VPC network.
Software VPNs help manage both ends of the VPN connection either for compliance purposes or for leveraging gateway devices that are not currently supported by Amazon VPC’s VPN solution.
Software VPNs allow you to handle Point-to-Site connectivity

A software VPN appliance running on a single EC2 instance introduces a single point of failure, which needs to be handled.

Direct Connect – DX

AWS Direct Connect helps establish a dedicated private connection between an on-premises network and AWS.
Direct Connect can reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based or VPN connections

Direct Connect uses industry-standard VLANs to access EC2 instances running within a VPC using private IP addresses
Direct Connect lets you establish
- Dedicated Connection: A 1G, 10G, or 100G physical Ethernet connection associated with a single customer through AWS.
- Hosted Connection: A 1G or 10G physical Ethernet connection that an AWS Direct Connect Partner provisions on behalf of a customer.
Direct Connect provides the following Virtual Interfaces
- Private virtual interface – to access a VPC using private IP addresses.
- Public virtual interface – to access all AWS public services using public IP addresses.
- Transit virtual interface – to access one or more transit gateways associated with Direct Connect gateways.
Direct Connect connections are not redundant as each connection consists of a single dedicated connection between ports on your router and an Amazon router

Direct Connect High Availability can be configured using
- Multiple Direct Connect connections
- Back-up IPSec VPN connection

LAGs

Direct Connect link aggregation group (LAG) is a logical interface that uses the Link Aggregation Control Protocol (LACP) to aggregate multiple connections at a single AWS Direct Connect endpoint, allowing you to treat them as a single, managed connection.
LAGs need the following
- All connections in the LAG must use the same bandwidth.
- A maximum of four connections in a LAG. Each connection in the LAG counts toward the overall connection limit for the Region.
- All connections in the LAG must terminate at the same AWS Direct Connect endpoint.
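
The three LAG requirements above can be expressed as a small validator. The `Connection` fields and the IDs and endpoint names in the usage example are hypothetical; the limit of four connections is taken from the list above.

```python
from dataclasses import dataclass

MAX_LAG_CONNECTIONS = 4  # limit stated in the list above


@dataclass(frozen=True)
class Connection:
    connection_id: str
    bandwidth: str  # e.g. "10Gbps"
    endpoint: str   # the AWS Direct Connect endpoint the connection terminates at


def lag_violations(connections):
    """Return a list of rule violations; an empty list means the LAG is valid."""
    problems = []
    if len(connections) > MAX_LAG_CONNECTIONS:
        problems.append(f"too many connections: {len(connections)} > {MAX_LAG_CONNECTIONS}")
    if len({c.bandwidth for c in connections}) > 1:
        problems.append("connections use different bandwidths")
    if len({c.endpoint for c in connections}) > 1:
        problems.append("connections terminate at different endpoints")
    return problems


# Hypothetical connections: the third one breaks two of the rules.
lag = [
    Connection("dxcon-1", "10Gbps", "endpoint-a"),
    Connection("dxcon-2", "10Gbps", "endpoint-a"),
    Connection("dxcon-3", "1Gbps", "endpoint-b"),
]
for problem in lag_violations(lag):
    print(problem)
```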

Direct Connect Gateway

is a globally available resource to enable connections to multiple VPCs across different regions or AWS accounts.
allows you to connect an AWS Direct Connect connection to one or more VPCs in the account that are located in the same or different regions
allows connecting any participating VPCs from one private VIF, reducing Direct Connect management.

can be created in any public region and accessed from all other public regions
can also access the public resources in any AWS Region using a public virtual interface.

References

AWS VPC Connectivity Options