AWS EC2 Monitoring

EC2 Monitoring

Status Checks

  • Status monitoring helps quickly determine whether EC2 has detected any problems that might prevent instances from running applications.
  • EC2 performs automated checks on every running EC2 instance to identify hardware and software issues.
  • Status checks are performed every minute and each returns a pass or a fail status.
  • If all checks pass, the overall status of the instance is OK.
  • If one or more checks fail, the overall status is Impaired.
  • Status checks are built into EC2, so they cannot be disabled or deleted.
  • Status checks data augments the information that EC2 already provides about the intended state of each instance (such as pending, running, and stopping) as well as the utilization metrics that CloudWatch monitors (CPU utilization, network traffic, and disk activity).
  • Alarms can be created or deleted, that are triggered based on the result of the status checks. for e.g., an alarm can be created to warn if status checks fail on a specific instance.

System Status Checks

  • monitor the AWS systems, required to use the instance, to ensure they are working properly.
  • detect problems with the instance that require AWS involvement to repair.
  • System status checks failure might due to
    • Loss of network connectivity
    • Loss of system power
    • Software issues on the physical host
    • Hardware issues on the physical host
  • When a system status check fails, one can either
    • check Personal Health Dashboard for any scheduled critical maintenance by AWS to the instance’s host.
    • wait for AWS to fix the issue
    • or resolve it by stopping and restarting or terminating and replacing an instance

Instance Status Checks

  • monitor the software and network configuration of the individual instance
  • checks to detect problems that require involvement to repair.
  • Instance status checks failure might be due to
    • Failed system status checks
    • Misconfigured networking or startup configuration
    • Exhausted memory
    • Corrupted file system
    • Incompatible kernel
  • When an instance status check fails, it can be resolved by either rebooting the instance or by making modifications to the operating system

CloudWatch Monitoring

  • CloudWatch helps monitor EC2 instances, which collects and processes
    raw data from EC2 into readable, near real-time metrics.
  • Statistics are recorded for a period of two weeks so that historical information can be accessed and used to gain a better perspective on how
    the application or service is performing.
  • By default, Basic monitoring is enabled and EC2 metric data is sent to CloudWatch in 5-minute periods automatically
  • Detailed monitoring can be enabled on the EC2 instance, which sends data to CloudWatch in 1-minute periods.
  • Aggregating Statistics Across Instances/ASG/AMI ID
    • Aggregate statistics are available for the instances that have detailed monitoring (at an additional charge) enable, which provides data in 1-minute periods
    • Instances that use basic monitoring are not included in the aggregates.
    • CloudWatch does not aggregate data across Regions. Therefore, metrics are completely separate between regions.
    • CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace if no dimension is specified
    • The technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces published to CloudWatch.
    • Statistics include Sum, Average, Minimum, Maximum, Data Samples
    • With custom namespaces, the complete set of dimensions that are associated with any given data point to retrieve statistics that include the data point must be specified
  • CloudWatch alarms
    • can be created to monitor any one of the EC2 instance’s metrics.
    • can be configured to automatically send you a notification when the metric reaches a specified threshold.
    • can automatically stop, terminate, reboot, or recover EC2 instances
    • can automatically recover an EC2 instance when the instance becomes impaired due to an underlying hardware failure a problem that requires AWS involvement to repair
    • can automatically stop or terminate the instances to save costs (EC2 instances that use an EBS volume as the root device can be stopped
      or terminated, whereas instances that use the instance store as the root device can only be terminated)
    • can use EC2ActionsAccess IAM role, which enables AWS to perform stop, terminate, or reboot actions on EC2 instances
    • If you have read/write permissions for CloudWatch but not for EC2, alarms can still be created but the stop or terminate actions won’t be performed on the EC2 instance

EC2 Monitoring Metrics

  • CPUCreditUsage
    • (Only valid for T2 instances) The number of CPU credits consumed
      during the specified period.
    • This metric identifies the amount of time during which physical CPUs
      were used for processing instructions by virtual CPUs allocated to
      the instance.
    • CPU Credit metrics are available at a 5-minute frequency.
  • CPUCreditBalance
    • (Only valid for T2 instances) The number of CPU credits that an instance has accumulated.
    • This metric is used to determine how long an instance can burst beyond its baseline performance level at a given rate.
    • CPU Credit metrics are available at a 5-minute frequency.
  • CPUUtilization
    • % of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance.
  • DiskReadOps
    • Completed read operations from all instance store volumes available to the instance in a specified period of time.
  • DiskWriteOps
    • Completed write operations to all instance store volumes available to the instance in a specified period of time.
  • DiskReadBytes
    • Bytes read from all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application reads from the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • DiskWriteBytes
    • Bytes written to all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application writes onto the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • NetworkIn
    • The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to an application on a single instance.
  • NetworkOut
    • The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic to an application on a single instance.
  • NetworkPacketsIn
    • The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only
  • NetworkPacketsOut
    • The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only.
  • StatusCheckFailed
    • Reports if either of the status checks, StatusCheckFailed_Instance and StatusCheckFailed_System that has failed.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency
  • StatusCheckFailed_Instance
    • Reports whether the instance has passed the EC2 instance status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency
  • StatusCheckFailed_System
    • Reports whether the instance has passed the EC2 system status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. In the basic monitoring package for EC2, Amazon CloudWatch provides the following metrics:
    1. Web server visible metrics such as number failed transaction requests
    2. Operating system visible metrics such as memory utilization
    3. Database visible metrics such as number of connections
    4. Hypervisor visible metrics such as CPU utilization
  2. Which of the following requires a custom CloudWatch metric to monitor?
    1. Memory Utilization of an EC2 instance
    2. CPU Utilization of an EC2 instance
    3. Disk usage activity of an EC2 instance
    4. Data transfer of an EC2 instance
  3. A user has configured CloudWatch monitoring on an EBS backed EC2 instance. If the user has not attached any additional device, which of the below mentioned metrics will always show a 0 value?
    1. DiskReadBytes
    2. NetworkIn
    3. NetworkOut
    4. CPUUtilization
  4. A user is running a batch process on EBS backed EC2 instances. The batch process starts a few instances to process Hadoop Map reduce jobs, which can run between 50 – 600 minutes or sometimes for more time. The user wants to configure that the instance gets terminated only when the process is completed. How can the user configure this with CloudWatch?
    1. Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
    2. Setup the CloudWatch with Auto Scaling to terminate all the instances
    3. Setup a job which terminates all instances after 600 minutes
    4. It is not possible to terminate instances automatically
  5. An AWS account owner has setup multiple IAM users. One IAM user only has CloudWatch access. He has setup the alarm action, which stops the EC2 instances when the CPU utilization is below the threshold limit. What will happen in this case?
    1. It is not possible to stop the instance using the CloudWatch alarm
    2. CloudWatch will stop the instance when the action is executed
    3. The user cannot set an alarm on EC2 since he does not have the permission
    4. The user can setup the action but it will not be executed if the user does not have EC2 rights
  6. A user has launched 10 instances from the same AMI ID using Auto Scaling. The user is trying to see the average CPU utilization across all instances of the last 2 weeks under the CloudWatch console. How can the user achieve this?
    1. View the Auto Scaling CPU metrics (Refer AS Instance Monitoring)
    2. Aggregate the data over the instance AMI ID (Works but needs detailed monitoring enabled)
    3. The user has to use the CloudWatchanalyser to find the average data across instances
    4. It is not possible to see the average CPU utilization of the same AMI ID since the instance ID is different

References

EC2_Monitoring

AWS Simple Email Service – SES

AWS Simple Email Service – SES

  • SES is a fully managed service that provides an email platform with an easy, cost-effective way to send and receive email using your own email addresses and domains.
  • can be used to send both transactional and promotional emails securely, and globally at scale.
  • acts as an outbound email server and eliminates the need to support its own software or applications to do the heavy lifting of email transport.
  • acts as an inbound email server to receive emails that can help develop software solutions such as email autoresponders, email unsubscribe systems, and applications that generate customer support tickets from incoming emails.
  • existing email server can also be configured to send outgoing emails through SES with no change in any settings in the email clients
  • Maximum message size including attachments is 10 MB per message (after base64 encoding).
  • integrated with CloudWatch and CloudTrail

SES Characteristics

  • Compatible with SMTP
  • Applications can send email using a single API call in many supported languages Java, .Net, PHP, Perl, Ruby, HTTPS, etc
  • Optimized for the highest levels of uptime, availability, and scales as per the demand
  • Provides sandbox environment for testing
  • provides Reputation dashboard, performance insights, anti-spam feedback
  • provides statistics on email deliveries, bounces, feedback loop results, emails opened, etc.
  • supports DomainKeys Identified Mail (DKIM) and Sender Policy Framework (SPF)
  • supports flexible deployment: shared, dedicated, and customer-owned IPs
  • supports attachments with many popular content formats, including documents, images, audio, and video, and scans every attachment for viruses and malware.
  • integrates with KMS to provide the ability to encrypt the mail that it writes to the S3 bucket.
  • uses client-side encryption to encrypt the mail before it sends the email to  S3.

Sending Limits

  • Production SES has a set of sending limits which include
    • Sending Quota – max number of emails in a 24-hour period
    • Maximum Send Rate – max number of emails per second
  • SES automatically adjusts the limits upward as long as emails are of high quality and they are sent in a controlled manner, as any spike in the email sent might be considered to be spam.
  • Limits can also be raised by submitting a Quota increase request

SES Best Practices

  • Send high-quality and real production content that the recipients want
  • Only send to those who have signed up for the mail
  • Unsubscribe recipients who have not interacted with the business recently
  • Have low bounce and compliant rates and remove bounced or complained addresses, using SNS to monitor bounces and complaints, treating them as an opt-out
  • Monitor the sending activity

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon SES stand for?
    1. Simple Elastic Server
    2. Simple Email Service
    3. Software Email Solution
    4. Software Enabled Server
  2. Your startup wants to implement an order fulfillment process for selling a personalized gadget that needs an average of 3-4 days to produce with some orders taking up to 6 months you expect 10 orders per day on your first day. 1000 orders per day after 6 months and 10,000 orders after 12 months. Orders coming in are checked for consistency men dispatched to your manufacturing plant for production quality control packaging shipment and payment processing If the product does not meet the quality standards at any stage of the process employees may force the process to repeat a step Customers are notified via email about order status and any critical issues with their orders such as payment failure. Your case architecture includes AWS Elastic Beanstalk for your website with an RDS MySQL instance for customer data and orders. How can you implement the order fulfillment process while making sure that the emails are delivered reliably? [PROFESSIONAL]
    1. Add a business process management application to your Elastic Beanstalk app servers and re-use the ROS database for tracking order status use one of the Elastic Beanstalk instances to send emails to customers.
    2. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 Use the decider instance to send emails to customers.
    3. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 use SES to send emails to customers.
    4. Use an SQS queue to manage all process tasks Use an Auto Scaling group of EC2 Instances that poll the tasks and execute them. Use SES to send emails to customers.

References

AWS Aurora Global Database vs. DynamoDB Global Tables

AWS Aurora Global Database vs DynamoDB Global Tables

AWS Aurora Global Database vs. DynamoDB Global Tables

Aurora Global Database

  • Aurora Global Database provides a relational database supporting MySQL and PostgreSQL
  • Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
  • Aurora cluster in the primary AWS Region where the data is mastered performs both read and write operations. The clusters in the secondary Regions enable low-latency reads.
  • Aurora replicates data to the secondary AWS Regions with a typical latency of under a second.
  • Secondary clusters can be scaled independently by adding one or more DB instances (Aurora Replicas) to serve read-only workloads.
  • Aurora Global Database uses dedicated infrastructure to replicate the data, leaving database resources available entirely to serve applications.
  • Applications with a worldwide footprint can use reader instances in the secondary AWS Regions for low-latency reads.
  • Typical cross-region replication takes less than 1 second.
  • In case of a disaster or an outage, one of the clusters in a secondary AWS Region can be promoted to take full read/write workloads in under a minute.
  • However, the process is not automatic. If the primary region becomes unavailable, you can manually remove a secondary region from an Aurora Global Database and promote it to take full reads and writes. You will also need to point the application to the newly promoted region.

DynamoDB Global Tables

  • DynamoDB Global tables provide NoSQL database.
  • DynamoDB Global tables provide a fully managed, multi-Region, and multi-active database that delivers fast, local, read and write performance for massively scaled, global applications.
  • Global tables replicate the DynamoDB tables automatically across the choice of AWS Regions and enable reads and writes on all instances.
  • DynamoDB global table consists of multiple replica tables (one per AWS Region). Every replica has the same table name and the same primary key schema. When an application writes data to a replica table in one Region, DynamoDB propagates the write to the other replica tables in the other AWS Regions automatically.
  • Global tables enable the read and write of data locally providing single-digit-millisecond latency for the globally distributed application at any scale. It provides asynchronous replication with approximately 1-second replication latency for tables between two or more Regions.
  • DynamoDB Global tables are designed for 99.999% availability.
  • DynamoDB Global tables enable the applications to stay highly available even in the unlikely event of isolation or degradation of an entire Region, your application can redirect to a different Region and perform reads and writes against a different replica table.

AWS Aurora Global Database vs DynamoDB Global Tables

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A company needs to implement a relational database with a multi-region disaster recovery Recovery Point Objective (RPO) of 1 second and a Recovery Time Objective (RTO) of 1 minute. Which AWS solution can achieve this?
    1. Amazon Aurora Global Database
    2. Amazon DynamoDB global tables
    3. Amazon RDS for MySQL with Multi-AZ enabled
    4. Amazon RDS for MySQL with a cross-Region snapshot copy

References

AWS RDS Aurora

AWS Aurora Architecture

AWS RDS Aurora

  • AWS RDS Aurora is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
  • is a fully managed, MySQL- and PostgreSQL-compatible, relational database engine i.e. applications developed with MySQL can switch to Aurora with little or no changes.
  • delivers up to 5x the performance of MySQL and up to 3x the performance of PostgreSQL without requiring any changes to most MySQL applications
  • is fully managed as RDS manages the databases, handling time-consuming tasks such as provisioning, patching, backup, recovery, failure detection, and repair.
  • can scale storage automatically, based on the database usage, from 10GB to 128TiB in 10GB increments with no impact on database performance

Aurora DB Clusters

AWS Aurora Architecture

  • Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances.
  • A cluster volume is a virtual database storage volume that spans multiple AZs, with each AZ having a copy of the DB cluster data
  • Two types of DB instances make up an Aurora DB cluster:
    • Primary DB instance
      • Supports read and write operations, and performs all data modifications to the cluster volume.
      • Each DB cluster has one primary DB instance.
    • Aurora Replica
      • Connects to the same storage volume as the primary DB instance and supports only read operations.
      • Each DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance.
      • Provides high availability by locating Replicas in separate AZs
      • Aurora automatically fails over to a Replica in case the primary DB instance becomes unavailable.
      • Failover priority for Replicas can be specified.
      • Replicas can also offload read workloads from the primary DB instance
  • For Aurora multi-master clusters
    • all DB instances have read/write capability, with no difference between primary and replica.

Aurora Connection Endpoints

  • Aurora involves a cluster of DB instances instead of a single instance
  • Endpoint refers to an intermediate handler with the hostname and port specified to connect to the cluster
  • Aurora uses the endpoint mechanism to abstract these connections

Cluster endpoint

  • Cluster endpoint (or writer endpoint) for a DB cluster connects to the current primary DB instance for that DB cluster.
  • Cluster endpoint is the only one that can perform write operations such as DDL statements as well as read operations
  • Each DB cluster has one cluster endpoint and one primary DB instance
  • Cluster endpoint provides failover support for read/write connections to the DB cluster. If a DB cluster’s current primary DB instance fails, Aurora automatically fails over to a new primary DB instance.
  • During a failover, the DB cluster continues to serve connection requests to the cluster endpoint from the new primary DB instance, with minimal interruption of service.

Reader endpoint

  • Reader endpoint for a DB cluster provides load-balancing support for read-only connections to the DB cluster.
  • Use the reader endpoint for read operations, such as queries.
  • Reader endpoint reduces the overhead on the primary instance by processing the statements on the read-only Replicas.
  • Each DB cluster has one reader endpoint.
  • If the cluster contains one or more Replicas, the reader endpoint load balances each connection request among the Replicas.

Custom endpoint

  • Custom endpoint for a DB cluster represents a set of DB instances that you choose.
  • Aurora performs load balancing and chooses one of the instances in the group to handle the connection.
  • An Aurora DB cluster has no custom endpoints until one is created and up to five custom endpoints can be created for each provisioned cluster.
  • Aurora Serverless clusters do not support custom endpoints.

Instance endpoint

  • An instance endpoint connects to a specific DB instance within a cluster and provides direct control over connections to the DB cluster.
  • Each DB instance in a DB cluster has its own unique instance endpoint. So there is one instance endpoint for the current primary DB instance of the DB cluster, and there is one instance endpoint for each of the Replicas in the DB cluster.

High Availability and Replication

  • Aurora is designed to offer greater than 99.99% availability
  • provides data durability and reliability
    • by replicating the database volume six ways across three Availability Zones in a single region
    • backing up the data continuously to  S3.
  • transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.
  • automatically fails over to a new primary DB instance, if the primary DB instance fails, by either promoting an existing Replica to a new primary DB instance or creating a new primary DB instance
  • automatically divides the database volume into 10GB segments spread across many disks. Each 10GB chunk of the database volume is replicated six ways, across three Availability Zones
  • is designed to transparently handle
    • the loss of up to two copies of data without affecting database write availability and
    • up to three copies without affecting read availability.
  • provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.
  • Replicas share the same underlying volume as the primary instance. Updates made by the primary are visible to all Replicas.
  • As Replicas share the same data volume as the primary instance, there is virtually no replication lag.
  • Any Replica can be promoted to become primary without any data loss and therefore can be used for enhancing fault tolerance in the event of a primary DB Instance failure.
  • To increase database availability, 1 to 15 replicas can be created in any of 3 AZs, and RDS will automatically include them in failover primary selection in the event of a database outage.

Aurora Failovers

  • Aurora automatically fails over, if the primary instance in a DB cluster fails, in the following order:
    • If Aurora Read Replicas are available, promote an existing Read Replica to the new primary instance.
    • If no Read Replicas are available, then create a new primary instance.
  • If there are multiple Aurora Read Replicas, the criteria for promotion is based on the priority that is defined for the Read Replicas.
    • Priority numbers can vary from 0 to 15 and can be modified at any time.
    • PostgreSQL promotes the Aurora Replica with the highest priority to the new primary instance.
    • For Read Replicas with the same priority, PostgreSQL promotes the replica that is largest in size or in an arbitrary manner.
  • During the failover, AWS modifies the cluster endpoint to point to the newly created/promoted DB instance.
  • Applications experience a minimal interruption of service if they connect using the cluster endpoint and implement connection retry logic.

Security

  • Aurora uses SSL (AES-256) to secure the connection between the database instance and the application
  • allows database encryption using keys managed through AWS Key Management Service (KMS).
  • Encryption and decryption are handled seamlessly.
  • With encryption, data stored at rest in the underlying storage is encrypted, as are its automated backups, snapshots, and replicas in the same cluster.
  • Encryption of existing unencrypted Aurora instances is not supported. Create a new encrypted Aurora instance and migrate the data

Backup and Restore

  • Automated backups are always enabled on Aurora DB Instances.
  • Backups do not impact database performance.
  • Aurora also allows the creation of manual snapshots.
  • Aurora automatically maintains 6 copies of the data across 3 AZs and will automatically attempt to recover the database in a healthy AZ with no data loss.
  • If in any case, the data is unavailable within Aurora storage,
    • DB Snapshot can be restored or
    • the point-in-time restore operation can be performed to a new instance. The latest restorable time for a point-in-time restore operation can be up to 5 minutes in the past.
  • Restoring a snapshot creates a new Aurora DB instance
  • Deleting the database deletes all the automated backups (with an option to create a final snapshot), but would not remove the manual snapshots.
  • Snapshots (including encrypted ones) can be shared with other AWS accounts

Aurora Parallel Query

  • Aurora Parallel Query refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.
  • Without Parallel Query, a query issued against an Aurora database would be executed wholly within one instance of the database cluster; this would be similar to how most databases operate.
  • Parallel Query is a good fit for analytical workloads requiring fresh data and good query performance, even on large tables.
  • Parallel Query provides the following benefits
    • Faster performance: Parallel Query can speed up analytical queries by up to 2 orders of magnitude.
    • Operational simplicity and data freshness: you can issue a query directly over the current transactional data in your Aurora cluster.
    • Transactional and analytical workloads on the same database: Parallel Query allows Aurora to maintain high transaction throughput alongside concurrent analytical queries.
  • Parallel Query can be enabled and disabled dynamically at both the global and session level using the aurora_pq parameter.
  • Parallel Query is available for the MySQL 5.6-compatible version of Aurora

Aurora Scaling

  • Aurora storage scaling is built-in and will automatically grow, up to 64 TB (soft limit), in 10GB increments with no impact on database performance.
  • There is no need to provision storage in advance
  • Compute Scaling
    • Instance scaling
      • Vertical scaling of the master instance. Memory and CPU resources are modified by changing the DB Instance class.
      • scaling the read replica and promoting it to master using forced failover which provides a minimal downtime
    • Read scaling
      • provides horizontal scaling with up to 15 read replicas
  • Auto Scaling
    • Scaling policies to add read replicas with min and max replica count based on scaling CloudWatch CPU or connections metrics condition

Aurora Backtrack

  • Backtracking “rewinds” the DB cluster to the specified time
  • Backtracking performs in-place restore and does not create a new instance. There is minimal downtime associated with it.
  • Backtracking is available for Aurora with MySQL compatibility
  • Backtracking is not a replacement for backing up the DB cluster so that you can restore it to a point in time.
  • With backtracking, there is a target backtrack window and an actual backtrack window:
    • Target backtrack window is the amount of time you WANT the DB cluster can be backtracked for e.g 24 hours. The limit for a backtrack window is 72 hours.
    • Actual backtrack window is the actual amount of time you CAN backtrack the DB cluster, which can be smaller than the target backtrack window. The actual backtrack window is based on the workload and the storage available for storing information about database changes, called change records
  • DB cluster with backtracking enabled generates change records.
  • Aurora retains change records for the target backtrack window and charges an hourly rate for storing them.
  • Both the target backtrack window and the workload on the DB cluster determine the number of change records stored.
  • Workload is the number of changes made to the DB cluster in a given amount of time. If the workload is heavy, you store more change records in the backtrack window than you do if your workload is light.
  • Backtracking affects the entire DB cluster and can’t selectively backtrack a single table or a single data update.
  • Backtracking provides the following advantages over traditional backup and restore:
    • Undo mistakes – revert destructive action, such as a DELETE without a WHERE clause
    • Backtrack DB cluster quickly – Restoring a DB cluster to a point in time launches a new DB cluster and restores it from backup data or a DB cluster snapshot, which can take hours. Backtracking a DB cluster doesn’t require a new DB cluster and rewinds the DB cluster in minutes.
    • Explore earlier data changes – repeatedly backtrack a DB cluster back and forth in time to help determine when a particular data change occurred

Aurora Serverless

  • Amazon Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
  • An Aurora Serverless DB cluster automatically starts up, shuts down, and scales capacity up or down based on the application’s needs.
  • enables running database in the cloud without managing any database instances.
  • provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
  • use cases include
    • Infrequently-Used Applications
    • New Applications – where the needs and instance size are yet to be determined.
    • Variable and Unpredictable Workloads – scale as per the needs
    • Development and Test Databases
    • Multi-tenant Applications
  • DB cluster does not have a public IP address and can be accessed only from within a VPC based on the VPC service.

Aurora Global Database

  • Aurora global database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
  • Aurora cluster in the primary AWS Region where your data is mastered performs both read and write operations. The clusters in the secondary Regions enable low-latency reads.
  • Aurora replicates data to the secondary AWS Regions with a typical latency of under a second.
  • Secondary clusters can be scaled independently by adding one of more DB instances (Aurora Replicas) to serve read-only workloads.
  • Aurora global database uses dedicated infrastructure to replicate the data, leaving database resources available entirely to serve applications.
  • Applications with a worldwide footprint can use reader instances in the secondary AWS Regions for low-latency reads.
  • In case of a disaster or an outage, one of the cluster in a secondary AWS Regions can be promoted to take full read/write workloads in under a min.

Aurora Clone

  • Aurora cloning feature helps create Aurora cluster duplicates quickly and cost-effectively
  • Creating a clone is faster and more space-efficient than physically copying the data using a different technique such as restoring a snapshot.
  • Aurora cloning uses a copy-on-write protocol.
  • Aurora clone requires only minimal additional space when first created. In the beginning, Aurora maintains a single copy of the data, which is used by both the original and new DB clusters.
  • Aurora allocates new storage only when data changes, either on the source cluster or the cloned cluster.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Company wants to use MySQL compatible relational database with greater performance. Which AWS service can be used?
    1. Aurora
    2. RDS
    3. SimpleDB
    4. DynamoDB
  2. An application requires a highly available relational database with an initial storage capacity of 8 TB. The database will grow by 8 GB every day. To support expected traffic, at least eight read replicas will be required to handle database reads. Which option will meet these requirements?
    1. DynamoDB
    2. Amazon S3
    3. Amazon Aurora
    4. Amazon Redshift
  3. A company is migrating their on-premise 10TB MySQL database to AWS. As a compliance requirement, the company wants to have the data replicated across three availability zones. Which Amazon RDS engine meets the above business requirement?
    1. Use Multi-AZ RDS
    2. Use RDS
    3. Use Aurora
    4. Use DynamoDB

References

AWS_Aurora

AWS Network Firewall vs Gateway Load Balancer

AWS Network Firewall vs Gateway Load Balancer

AWS Network Firewall vs Gateway Load Balancer

AWS Network Firewall vs Gateway Load Balancer

AWS Network Firewall

  • AWS Network Firewall is a stateful, fully managed, network firewall and intrusion detection and prevention service (IDS/IPS) for VPCs.
  • Network Firewall scales automatically with the network traffic, without the need for deploying and managing any infrastructure.
  • AWS Network Firewall cost covers
    • an hourly rate for each firewall endpoint,
    • the amount of traffic and data processing charges, billed by the gigabyte, processed by the firewall endpoint,
    • standard AWS data transfer charges for all data transferred via the AWS Network Firewall.

AWS Gateway Load Balancer

  • Gateway Load Balancer helps deploy, scale, and manage virtual appliances, such as firewalls, intrusion detection and prevention systems (IDS/IPS), and deep packet inspection systems.
  • is architected to handle millions of requests/second, volatile traffic patterns, and introduces extremely low latency.
  • AWS Gateway Load Balancer cost covers
    • charges for each hour or partial hour that a GWLB is running,
    • the number of Gateway Load Balancer Capacity Units (GLCU) used by Gateway Load Balancer per hour.
    • GWLB uses Gateway Load Balancer Endpoint (GWLBE) to simplify how applications can securely exchange traffic with GWLB across VPC boundaries. GWLBE is priced and billed separately.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

AWS Certified Advanced Networking – Specialty ANS-C01 Exam Learning Path

AWS Certified Advanced Networking - Specialty Certificate

AWS Certified Advanced Networking – Specialty ANS-C01 Exam Learning Path

I recently certified/recertified for the AWS Certified Advanced Networking – Specialty (ANS-C01). Frankly, Networking is something that I am still diving deep into and I just about managed to get through. So a word of caution, this exam is inline or tougher than the professional exams, especially for the reason that some of the Networking concepts covered are not something you can get your hands dirty with easily.

AWS Certified Advanced Networking – Specialty ANS-C01 Exam Content

  • AWS Certified Advanced Networking – Specialty (ANS-C01) exam focuses on the AWS Networking concepts. It basically validates
    • Design and develop hybrid and cloud-based networking solutions by using AWS
    • Implement core AWS networking services according to AWS best practices
    • Operate and maintain hybrid and cloud-based network architecture for all AWS services
    • Use tools to deploy and automate hybrid and cloud-based AWS networking tasks
    • Implement secure AWS networks using AWS native networking constructs and services

Refer to AWS Certified Advanced Networking – Specialty Exam Guide AWS Certified Advanced Networking - Specialty ANS-C01 Exam Domains

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Resources

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Summary

  • Specialty exams are tough, lengthy, and tiresome. Most of the questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • ANS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question. 65 questions consists of 50 scored and 15 unscored questions.
  • ANS-C01 exam includes two types of questions, multiple-choice and multiple-response.
  • ANS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Each question mainly touches multiple AWS services.
  • Specialty exams currently cost $ 300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.
  • AWS exams can be taken either remotely or online, I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Topics

  • AWS Certified Networking – Specialty (ANS-C01) exam focuses a lot on Networking concepts involving Hybrid Connectivity with Direct Connect, VPN, Transit Gateway, Direct Connect Gateway, and a bit of VPC, Route 53, ALB, NLB & CloudFront.

Networking & Content Delivery

  • Virtual Private Cloud – VPC
    • Understand VPC, Subnets
    • AWS allows extending the VPC by adding a secondary VPC
    • Understand Security Groups, NACLs
    • VPC Flow Logs
      • help capture information about the IP traffic going to and from network interfaces in the VPC and can help in monitoring the traffic or troubleshooting any connectivity issues
      • NACLs are stateless and how it is reflected in VPC Flow Logs
        • If ACCEPT followed by REJECT, inbound was accepted by Security Groups and ACLs. However, rejected by NACLs outbound
        • If REJECT, inbound was either rejected by Security Groups OR NACLs.
      • Use pkt-dstaddr instead of dstaddr to track the destination address as dstaddr refers to the primary ENI address always and not the secondary addresses.
      • Pattern: VPC Flow Logs -> CloudWatch Logs -> (Subscription) -> Kinesis Data Firehose -> S3/Open Search.
    • DHCP Option Sets esp. how to resolve DNS from both on-premises data center and AWS.
    • VPC Peering
      • helps point-to-point connectivity between 2 VPCs which can be in the same or different regions and accounts.
      • know VPC Peering Limitations esp. it does not allow overlapping CIDRs and transitive routing.
    • Placement Groups determine how the instances are placed on the underlying hardware
    • VRF – Virtual Routing & Forwarding can be used to route traffic to the same customer gateway from multiple VPCs, that can be overlapping.
  • VPC Endpoints
    • VPC Gateway Endpoints for connectivity with S3 & DynamoDB i.e. VPC -> VPC Gateway Endpoints -> S3/DynamoDB.
    • VPC Interface Endpoints or Private Links for other AWS services and custom hosted services i.e. VPC -> VPC Interface Endpoint OR Private Link -> S3/Kinesis/SQS/CloudWatch/Any custom endpoint.
    • S3 gateway endpoints cannot be accessed through VPC Peering, VPN, or Direct Connect. Need HTTP proxy to route traffic.
    • S3 Private Link can be accessed through VPC Peering, VPN, or Direct Connect. Need to use an endpoint-specific DNS name.
    • VPC endpoint policy can be configured to control which S3 buckets can be accessed and the S3 Bucket policy can be used to control which VPC (includes all VPC Endpoints) or VPC Endpoint can access it.
    • Private Link Patterns
  • VPC Network Access Analyzer
    • helps identify unintended network access to the resources on AWS.
  • Transit Gateway
    • helps consolidate the AWS VPC routing configuration for a region with a hub-and-spoke architecture.
    • Appliance Mode ensures that network flows are symmetrically routed to the same AZ and network appliance
    • Transit Gateway Connect attachment can be used to connect SD-WAN to AWS Cloud. This supports GRE.
    • Transit Gateways are regional and Peering can connect Transit Gateways across regions.
    • Transit Gateway Network Manager includes events and metrics to monitor the quality of the global network, both in AWS and on-premises.
  • VPC Routing Priority
  • NAT Gateways
    • for HA, Scalable, Outgoing traffic. Does not support Security Groups or ICMP pings.
    • times out the connection if it is idle for 350 seconds or more. To prevent the connection from being dropped, initiate more traffic over the connection or enable TCP keepalive on the instance with a value of less than 350 seconds.
    • supports Private NAT Gateways for internal communication.
  • Virtual Private Network
    • to establish connectivity between the on-premises data center and AWS VPC
  • Direct Connect
    • to establish connectivity between the on-premises data center and AWS VPC and Public Services
    • Direct Connect connections – Dedicated and Hosted connections
    • Understand how to create a Direct Connect connection
      • LOA-CFA provides the details for partners to connect to the AWS Direct Connect location
    • Virtual interfaces options – Private Virtual Interface for VPC resources and Public Virtual Interface for Public Resources
      • Private VIF is for resources within a VPC
      • Public VIF is for AWS public resources
      • Private VIF has a limit of 100 routes and Public VIF of 1000 routes. Summarize the routes if you need to configure more.
    • Understand setup Private and Public VIF
    • Understand High Availability options based on cost and time i.e. Second Direct Connect connection OR VPN connection
    • Direct Connect Gateway
      • it provides a way to connect to multiple VPCs from an on-premises data center using the same Direct Connect connection.
      • can connect to VGW or TGW.
    • Understand Active/Passive Direct Connect 
    • supports MACsec which delivers native, near line-rate, point-to-point encryption ensuring that data communications between AWS and the data center, office, or colocation facility remain protected.
    • Understand Route Propagation, propagation priority, BGP connectivity
      • BGP prefers the shortest AS PATH to get to the destination. Traffic from the VPC to on-premises uses the primary router. This is because the secondary router advertises a longer AS-PATH.
      • AS PATH prepending doesn’t work when the Direct Connect connections are in different AWS Regions than the VPC.
      • AS PATH works from AWS to on-premises and Local Pref from on-premises to AWS
      • Use Local Preference BGP community tags to configure Active/Passive when the connections are from different regions. The higher tag has a higher preference for 7224:7300 > 7224:7100
      • NO_EXPORT works only for Public VIFs
      • 7224:9100, 7224:9200, and 7224:9300 apply only to public prefixes. Usually used to restrict traffic to regions. Can help control if routes should propagate to the local Region only, all Regions within a continent, or all public Regions.
        • 7224:9100 — Local AWS Region
        • 7224:9200 — All AWS Regions for a continent, North America–wide, Asia Pacific, Europe, the Middle East and Africa
        • 7224:9300 — Global (all public AWS Regions)
      • 7224:8100 — Routes that originate from the same AWS Region in which the AWS Direct Connect point of presence is associated.
      • 7224:8200 — Routes that originate from the same continent with which the AWS Direct Connect point of presence is associated.
      • No-tag — Global (all public AWS Regions).
  • Route 53
    • provides a highly available and scalable DNS web service.
    • Routing Policies and their use cases Focus on Weighted,  Latency, and Failover routing policies.
    • supports Alias resource record sets, which enables routing of queries to a CloudFront distribution, Elastic Beanstalk, ELB, an S3 bucket configured as a static website, or another Route 53 resource record set.
    • CNAME does not support zone apex or root records. 
    • Route 53 DNSSEC
      • secures DNS traffic, and helps protect a domain from DNS spoofing man-in-the-middle attacks. 
      • Requirements
        • Asymmetric Customer Managed Keys
        • us-east-1 with ECC_NIST_P256 spec
    • Route 53 Resolver DNS Firewall
      • protection for outbound DNS requests from the VPCs and can monitor and control the domains that the applications can query.
      • allows you to define allow and deny list.
      • can be used for DNS exfiltration.
      • supports FirewallFailOpen configuration which determines how Route 53 Resolver handles queries during failures.
        • disabled, favors security over availability and blocks queries that it is unable to evaluate properly.
        • enabled, favors availability over security and allows queries to proceed if it is unable to properly evaluate them.
    • Route 53 Resolver (Hybrid DNS)
      • Inbound Endpoint for On-premises -> AWS
      • Outbound Endpoint for AWS -> On-premises
    • Route 53 DNS Query Logging
      • Can be logged to CloudWatch logs, S3, and Kinesis Data Firehose
    • Route 53 Resolver rules take precedence over privately hosted zones.
    • Route 53 Split View DNS helps to have the same DNS to access a site externally and internally
    • Know the Domain Migration process
  • CloudFront
    • provides a fully managed, fast CDN service that speeds up the distribution of static, dynamic web, or streaming content to end-users.
    • supports geo-restriction, WAF & AWS Shield for protection.
    • provides Cloud Functions (Edge location) & Lambda@Edge (Regional location) to execute scripts closer to the user.
    • supports encryption at rest and end-to-end encryption
    • CloudFront Origin Shield
      • helps improve the cache hit ratio and reduce the load on the origin.
      • requests from other regional caches would hit the Origin shield rather than the Origin.
      • should be placed at the regional cache and not in the edge cache
      • should be deployed to the region closer to the origin server
  • Global Accelerator
    • provides 2 static IPs
    • does not support client IP address preservation for NLB and Elastic IP address endpoints.
    • does not support IPv6 address
    • know CloudFront vs Global Accelerator
  • Understand ELB, ALB and NLB
    • Differences between ALB and NLB
    • ALB provides Content, Host, and Path-based Routing while NLB provides the ability to have a static IP address
    • Maintain original Client IP to the backend instances using X-Forwarded-for and Proxy Protocol
    • ALB/NLB do not support TLS renegotiation or mutual TLS authentication (mTLS). For implementing mTLS, use NLB with TCP listener on port 443 and terminate on the instances.
    • NLB
      • also provides local zonal endpoints to keep the traffic within AZ
      • can front Private Link endpoints and provide static IPs.
    • ALB supports Forward Secrecy, through Security Policies, that provide additional safeguards against the eavesdropping of encrypted data, through the use of a unique random session key.
    • Supports sticky session feature (session affinity) to enable the LB to bind a user’s session to a specific target. This ensures that all requests from the user during the session are sent to the same target. Sticky Sessions is configured on the target groups.
  • Gateway Load Balancer – GWLB
    • helps deploy, scale, and manage virtual appliances, such as firewalls, IDS/IPS systems, and deep packet inspection systems.
  • Athena integrates with S3 only and not with CloudWatch logs.
  • Transit VPC
    • helps connect multiple, geographically disperse VPCs and remote networks in order to create a global network transit center.
    • Use Transit Gateway instead now.
  • Know CloudHub and its use case

Security

  • AWS GuardDuty
    • managed threat detection service
    • provides Malware protection
  • AWS Shield
    • managed DDoS protection service
    • AWS Shield Advanced provides 24×7 access to the AWS Shield Response Team (SRT), protection against DDoS-related spike, and DDoS cost protection to safeguard against scaling charges.
  • WAF as Web Traffic Firewall
    • helps protect web applications from attacks by allowing rules configuration that allow, block, or monitor (count) web requests based on defined conditions.
    • integrates with CloudFront, ALB, API Gateway to dynamically detect and prevent attacks
  • Network Firewall
  • AWS Inspector
    • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities

Monitoring & Management Tools

  • Understand AWS CloudFormation esp. in terms of Network creation.
    • Custom resources can be used to handle activities not supported by AWS
    • While configuring VPN connections use depends_on on route tables to define a dependency on other resources as the VPN gateway route propagation depends on a VPC-gateway attachment when you have a VPN gateway.
  • AWS Config
    • fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
    • can be used to monitor resource changes e.g. Security Groups and invoke Systems Manager Automation scripts for remediation.
  • CloudTrail for audit and governance

Integration Tools

Networking Architecture Patterns

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Day

  • Make sure you are relaxed and get some good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you would not be allowed to take the take if you are late by more than 30 minutes.
    • Make sure you have your desk clear, no hand-watches, or external monitors, keep your phones away, and nobody can enter the room.

Finally, All the Best 🙂

AWS Network Architecture Patterns

AWS Network Architecture Patterns

On-premises -> S3 Private Link -> S3 (Without Internet Gateway or S3 Gateway Endpoint)

 Data flow diagram shows access from on-premises and in-VPC apps to S3 using an interface endpoint and AWS PrivateLink.
  • Interface endpoints in the VPC can route both in-VPC applications and on-premises applications to S3 over the Amazon network.
  • On-premises network uses Direct Connect or AWS VPN to connect to VPC.
  • On-premises applications in VPC A use endpoint-specific DNS names to access S3 through the S3 interface endpoint.
  • On-premises applications send data to the interface endpoint in the VPC through AWS Direct Connect (or AWS VPN). AWS PrivateLink moves the data from the interface endpoint to S3 over the AWS network.
  • VPC applications can also send traffic to the interface endpoint. AWS PrivateLink moves the data from the interface endpoint to S3 over the AWS network.

On-premises -> Proxy -> Gateway Endpoint -> S3

On-premises + VPC Endpoint + S3

  • VPC endpoints are only accessible from EC2 instances inside a VPC, a local instance must proxy all remote requests before they can
    utilize a VPC endpoint connection.
  • Proxy farm proxies S3 traffic to the VPC endpoint. Configure an Auto Scaling group to manage the proxy servers and automatically grow or shrink the number of required instances based on proxy server load.

Direct Connect Gateway + Transit Gateway

AWS Direct Connect Gateway + Transit Gateway

  • AWS Direct Connect Gateway does not support transitive routing and has limits on the number of VGWs that can be connected.
  • AWS Direct Connect Gateway can be combined with AWS Transit Gateway using transit VIF attachment which enables your network to connect up to three regional centralized routers over a private dedicated connection
  • DX Gateway + TGW simplifies the management of connections between a VPC and the on-premises networks over a private connection that can reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections.
  • With AWS Transit Gateway connected to VPCs, full or partial mesh connectivity can be achieved between the VPCs.

AWS Direct Connect with VPN as Backup

  • Be sure that you use the same virtual private gateway for both Direct Connect and the VPN connection to the VPC.
  • If you are configuring a Border Gateway Protocol (BGP) VPN, advertise the same prefix for Direct Connect and the VPN.
  • If you are configuring a static VPN, add the same static prefixes to the VPN connection that you are announcing with the Direct Connect virtual interface.
  • If you are advertising the same routes toward the AWS VPC, the Direct Connect path is always preferred, regardless of AS path prepending.

AWS Direct Connect + VPN

AWS Direct Connect + VPN

  • AWS Direct Connect + VPN combines the benefits of the end-to-end secure IPSec connection with low latency and increased bandwidth of the AWS Direct Connect to provide a more consistent network experience than internet-based VPN connections.
  • AWS Direct Connect public VIF establishes a dedicated network connection between the on-premises network to public AWS resources, such as an Amazon virtual private gateway IPsec endpoint.
  • A BGP connection is established between the AWS Direct Connect and your router on the public VIF.
  • Another BGP session or a static router will be established between the virtual private gateway and your router on the IPSec VPN tunnel.

AWS Private Link -> NLB -> ALB

AWS Private Link + NLB + ALB

  • AWS PrivateLink for ALB allows customers to utilize PrivateLink on NLB and route this traffic to a target ALB to utilize the layer 7 benefits.
  • Static NLB IP Addresses for ALB – with one static IP per AZ on NLB allows full control over the IP addresses and enables various use cases as follows:
    • Allow listing of IP addresses for firewall rules.
    • Pointing a DNS Zone apex to an application fronted by an ALB. Utilizing ALB as a target of NLB, a DNS A-record type can be used to resolve your zone apex to the NLB static IP addresses.
    • When legacy clients cannot utilize DNS resulting in a need for hard-coded IP addresses.

Centralized Egress: Transit Gateway + NAT Gateway

Centralized Egress with Transit Gateway & NAT Gateway

  • A separate egress VPC in the network services account can be created to route all egress traffic from the spoke VPCs via a NAT gateway sitting in this VPC using Transit Gateway
  • As the NAT gateway has an hourly charge, deploying a NAT gateway in every spoke VPC can become cost prohibitive and centralizing NAT can provide cost benefits
  • In some edge cases when huge amounts of data is sent through the NAT gateway from a VPC, keeping the NAT local in the VPC to avoid the Transit Gateway data processing charge might be a more cost-effective option.
  • Two NAT gateways (one in each AZ) provide High Availability.

Direct Connect with High Resiliency – 99.9

Direct Connect High Availability

 

  • For critical production workloads that require high resiliency, it is recommended to have one connection at multiple locations.
  • ensures resilience to connectivity failure due to a fiber cut or a device failure as well as a complete location failure. You can use AWS Direct Connect gateway to access any AWS Region (except AWS Regions in China) from any AWS Direct Connect location.

Direct Connect with Maximum Resiliency – 99.99

Direct Connect Maximum Availability

  • Maximum resilience is achieved by separate connections terminating on separate devices in more than one location
  • ensures resilience to device failure, connectivity failure, and complete location failure.

AWS EC2 Spot Instances

Spot Instances

  • EC2 Spot instances allow access to spare EC2 computing capacity for up to 90% off the On-Demand price.
  • EC2 sets up the hourly price referred to as Spot price, which fluctuates depending upon the demand and supply of spot instances.
  • Spot instances enable bidding on unused EC2 instances and are launched whenever the bid price exceeds the current market spot price.
  • Spot Instances can be interrupted by EC2 when EC2 needs the capacity back with a two minutes notification.
  • Spot instances are a cost-effective choice and can bring the EC2 costs down significantly.
  • Spot instances can be used for applications flexible in the timing when they can run and also able to handle interruption by storing the state externally for e.g. they are well-suited for data analysis, batch jobs, background processing, and optional tasks
  • Spot instances differ from the On-Demand instances
    • they are not launched immediately
    • they can be terminated anytime
    • price varies as per the demand and supply of spot instances
  • Usual strategy involves using Spot instances with On-Demand or Reserved Instances, which provide a minimum level of guaranteed compute resources, while spot instances provide an additional computation boost.
  • Spot instances can also be launched with a required duration (also known as Spot blocks), which are not interrupted due to changes in the Spot price.
  • EC2 provides a data feed, sent to an S3 bucket specified during subscription, that describes the Spot instance usage and pricing.
  • T2 and HS1 instance class types are not supported for Spot instances
  • Well Suited for
    • Ideal for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, high-performance computing (HPC), web servers, and other test & development workloads
    • Applications that have flexible start and end times
    • Applications that are only feasible at very low compute prices
    • Users with urgent computing needs for large amounts of additional capacity

Spot Concepts

  • Spot pool – Pool of EC2 instances with the same instance type, availability zone, operating system, and network platform.
  • Spot price – Current market price of a spot instance per hour as set by EC2 based on the last fulfilled bid.
  • Spot bid – maximum bid price the bidder is willing to pay for the spot instance.
  • Spot fleet – set of instances launched based on the criteria of the bidder
  • Spot Instance request
    • Provides the maximum price per hour that you are willing to pay for a Spot Instance. If unspecified, it defaults to the On-Demand price.
    • EC2 fulfils the request when the maximum price per hour for the request exceeds the Spot price and if capacity is available.
    • A Spot Instance request is either one-time or persistent.
    • EC2 automatically resubmits a persistent Spot request after the Spot Instance associated with the request is terminated. The Spot Instance request can optionally specify a duration for the Spot Instances.
  • Spot instance interruption – EC2 terminates the spot instances whenever the bid price is lower than the current market price or the supply has reduced. EC2 provides a Spot Instance interruption notice, which gives the instance a two-minute warning before it is interrupted.
  • EC2 Instance Rebalance Recommendation is a signal that notifies when a Spot Instance is at elevated risk of interruption. The signal provides an opportunity to proactively manage the Spot Instance in advance of the two-minute Spot Instance interruption notice.
  • Bid status – provides the current state of the spot bid.

Spot Instances Requests

  • Spot Instance requests must include
    • the maximum price that you’re willing to pay per hour per instance, which defaults to the On-Demand price.
    • Instance type
    • Availability Zone.
    • Desired number of instances
  • EC2 fulfils the request when the maximum price per hour for the request exceeds the Spot price and if capacity is available.
  • A Spot Instance request is either
    • One-time
      • A one-time request remains active until EC2 launches the Spot Instance, the request expires, or you cancel the request.
    • Persistent
      • EC2 automatically resubmits a persistent Spot request after the Spot Instance associated with the request is terminated.
      • A persistent Spot Instance request remains active until it expires or you cancel it, even if the request is fulfilled.
      • The Spot Instance request can optionally specify a duration for the Spot Instances.
      • Cancelling spot instance requests does not terminate the instances
      • Be sure to delete the spot request before you delete the instances, else they would be launched again.

EC2 Spot Instance Requests

Spot Instances Pricing & How it works

  • EC2 sets up an hourly spot price which fluctuates depending upon the demand and supply.
  • A Spot Instance request is created by you (one-time) or EC2 (persistent) on your behalf.
  • Spot Instance requests must include
    • the maximum price that you’re willing to pay per hour per instance, which defaults to the On-Demand price.
    • other attributes like instance type and Availability Zone.
  • If the bid price exceeds the current market spot price, the request is fulfilled by Amazon till either the spot instance is terminated or the spot price increases beyond the bid price
  • Everyone pays the same market price for the period irrespective of the bid price given the bid price is more than the spot price for e.g. if the spot price is $0.20 and there are 2 bids from Customers with a bid price of $0.25 and $0.30, both customers would still pay $0.20 only
  • If the Spot instance is terminated by Amazon, you are not billed for the partial hour. However, if the spot instance is terminated by you, you will be charged for the partial hour
  • Spot instances with a predefined duration use a fixed hourly price that remains in effect for the Spot instance while it runs.
  • EC2 can interrupt the Spot instance when the Spot price rises above the bid price, when the demand for Spot instances rises, or when the supply of Spot instances decreases.
  • When EC2 marks a Spot instance for termination, it provides a Spot instance termination notice, which gives the instance a two-minute warning before it terminates.
  • Termination notice warning is made available to the applications on the Spot instance using an item in the instance metadata termination-time attribute http://169.254.169.254/latest/meta-data/spot/termination-time and includes the time when the shutdown signal will be sent to the instance’s operating system
  • Relevant applications on Spot Instances should poll for the termination notice at 5-second intervals, giving it almost the entire two minutes to complete any needed processing before the instance is terminated and taken back by AWS
  • EBS-backed instance if it is a Spot instance cannot be stopped and started, but only rebooted or terminated
  • EBS-backed Spot Instance can be stopped – started, rebooted, or terminated

Pricing Example

Spot Instances Pricing Example

  • State 1 – Starting with Amazon EC2 has 5 Spot instances available
    • 6 bids available for Spot instances
    • Amazon EC2 picks up the top five priced bids and allocates a Spot instance to them
    • Spot Price is $0.10
    • Bid with the price of $0.05 is not served
  • State 2 – Supply of Amazon EC2 Spot instances reduce to 3
    • Amazon EC2 terminates the 2 spot instances with $0.10 ( the order in which the instances are terminated is determined at random )
    • Rest of the Spot instances continue
  • State 3 – New bid for Spot Instance is placed with Price $0.15 is placed
    • Spot instance with price $0.15 is fulfilled
    • Amazon EC2 terminates the single spot instances with $0.10
    • Spot Price changed to $0.15
  • State 4 New bid for Spot Instance is placed with Price $2 is placed
    • Spot instance with price $2 is fulfilled
    • Amazon EC2 terminates the single spot instances with $0.15
    • Spot Price changed to $1.00

Spot Fleet

  • Spot Fleet is a collection, or fleet, of Spot Instances, and optionally On-Demand Instances
  • Spot Fleet attempts to launch the number of Spot Instances and On-Demand Instances to meet the specified target capacity
  • Request for Spot Instances is fulfilled if there is available capacity and the maximum price specified in the request exceeds the current Spot price.
  • Spot Fleet also attempts to maintain its target capacity fleet if the Spot Instances are interrupted.
  • Spot Fleet requests type
    • Request
      • Spot Fleet places an asynchronous one-time request for the desired capacity.
      • If capacity is diminished because of Spot interruptions, the fleet does not attempt to replenish Spot Instances, nor does it submit requests in alternative Spot capacity pools if capacity is unavailable.
    • Maintain
      • Spot Fleet places an asynchronous request for the desired capacity and maintains capacity by automatically replenishing any interrupted Spot Instances.
  • Spot Fleet Allocation Strategy
    • lowestPrice
      • default strategy, from the pool with the lowest price
      • cost optimization, short workload
    • diversified
      • distributed across all pools.
      • high availability, long workloads
    • capacityOptimized
      • from the pools with optimal capacity for the number of instances that are launching.
    • InstancePoolsToUseCount
      • distributed across the number of specified Spot pools that you specify.
      • Valid only when used in combination with lowestPrice.

Spot Instances Interruption

  • EC2 Instance Rebalance Recommendations and Spot Instance interruption notices can be used to gracefully handle Spot Instance interruptions.
  • EC2 Instance Rebalance Recommendation
    • is a signal that notifies when a Spot Instance is at elevated risk of interruption.
    • provides an opportunity to proactively manage the Spot Instance in advance of the two-minute Spot Instance interruption notice.
  • Spot Instance Interruption Notice
    • is a warning issued two minutes before EC2 interrupts a Spot Instance.
    • EC2 automatically stops or hibernates the Spot Instances on interruption, and automatically resumes the instances when we have available capacity.

Spot Instances vs On-Demand Instances

Spot Instances vs On-Demand Instances

Spot Instances Best Practices

  • Choose a reasonable bid price
    • which is low enough to suit the budget and high enough for the request to be fulfilled and should not be higher than the On-Demand bid price
  • Be flexible about instance types and Availability Zones
    • A Spot Instance pool is a set of unused EC2 instances with the same instance type (for example, m5.large) and AZ (for example, us-east-1a).
    • Be flexible about requested instance types and AZs you can deploy the workload. This gives Spot a better chance to find and allocate your required amount of compute capacity.
  • Ensure the instances are up and ready as soon as the request is fulfilled,
    • by provisioning an AMI with all the required software and load application data from user data
  • Prepare individual instances for interruptions
    • Make application fault-tolerant. Store important data regularly and externally in a place that won’t be affected by Spot instance termination e.g., use S3, EBS, or DynamoDB.
  • Divide the work into smaller finer tasks
    • so that they can be completed and the state saved more frequently
  • Use Spot termination notice warning
    • to monitor instance status regularly
  • Use Proactive Capacity Rebalancing
    • Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running Spot Instance receives the two-minute Spot Instance interruption notice. When Capacity Rebalancing is enabled, Auto Scaling or Spot Fleet attempts to proactively replace Spot Instances that have received a rebalance recommendation, providing the opportunity to rebalance your workload to new Spot Instances that are not at elevated risk of interruption.
    • Capacity Rebalancing complements the capacity optimized allocation strategy (which is designed to help find the most optimal spare capacity) and the mixed instances policy (which is designed to enhance availability by deploying instances across multiple instance types running in multiple Availability Zones).
  • Test applications
    • using On-Demand instances and terminating them to ensure that it handles unexpected termination gracefully

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have a video transcoding application running on Amazon EC2. Each instance polls a queue to find out which video should be transcoded, and then runs a transcoding process. If this process is interrupted, the video will be transcoded by another instance based on the queuing system. You have a large backlog of videos, which need to be transcoded, and would like to reduce this backlog by adding more instances. You will need these instances only until the backlog is reduced. Which type of Amazon EC2 instances should you use to reduce the backlog in the most cost efficient way?
    1. Reserved instances
    2. Spot instances
    3. Dedicated instances
    4. On-demand instances
  2. You have a distributed application that periodically processes large volumes of data across multiple Amazon EC2 Instances. The application is designed to recover gracefully from Amazon EC2 instance failures. You are required to accomplish this task in the most cost-effective way. Which of the following will meet your requirements?
    1. Spot Instances
    2. Reserved instances
    3. Dedicated instances
    4. On-Demand instances
  3. A company needs to process a large amount of data stored in an Amazon S3 bucket. The total processing time is expected to be
    less than five hours. The workload cannot be interrupted and will be executed only once. Which pricing model will ensure that job
    completes at the lowest cost?

    1. EC2 reserved instances
    2. EC2 spot block
    3. EC2 On-demand Instances
    4. EC2 spot fleet.

References

AWS_EC2_Spot_Instances

VPC Network Access Analyzer

VPC Network Access Analyzer

  • VPC Network Access Analyzer helps identify unintended network access to the resources on AWS.
  • Network Access Analyzer can be used to
    • Understand, verify, and improve the network security posture
      • helps identify unintended network access relative to the security and compliance requirements that enable improving network security.
    • Demonstrate compliance
      • can help demonstrate that the network on AWS meets certain compliance requirements.
  • Network Access Analyzer can help verify the following example requirements:
    • Network segmentation
      • verify that the production environment VPCs and development environment VPCs are isolated from one another or systems that process credit card information are isolated from the rest of the environment.
    • Internet accessibility
      • can help identify resources that can be accessed from internet gateways, and verify that they are limited to only those resources that have a legitimate need to be accessible from the internet.
    • Trusted network paths
      • can help verify that appropriate network controls such as network firewalls and NAT gateways are configured on all network paths between the resources and internet gateways.
    • Trusted network access
      • can help verify that the resources have network access only from a trusted IP address range, over specific ports and protocols.
      • network access requirements can be specified in terms of:
        • Individual resource IDs, such as vpc-01234567
        • All resources of a given type, such as AWS::EC2::InternetGateway
        • All resources with a given tag, using AWS Resource Groups
        • IP address ranges, port ranges, and traffic protocols

Network Access Analyzer Concepts

  • Network Access Scopes
    • Network Access Scopes specifies the network access requirements, which determine the types of findings that the analysis produces.
    • MatchPaths field help to specify the types of network paths to identify.
    • ExcludePaths field help to specify the types of network paths to exclude
  • Findings
    • Findings are potential paths in the network that match any of the MatchPaths entries in the Network Access Scope, but that do not match any of the ExcludePaths entries in the Network Access Scope.

Network Access Analyzer Works

  • Network Access Analyzer uses automated reasoning algorithms to analyze the network paths that a packet can take between resources in an AWS network.
  • performs a static analysis of a network configuration, meaning that no packets are transmitted in the network as part of this analysis.
  • produces findings for paths that match a customer-defined Network Access Scope.
  • only considers the state of the network as described in the network configuration, packet loss that’s due to transient network interruptions or service failures is not considered in this analysis.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

AWS_VPC_Network_Access_Analyzer

AWS VPC NAT – NAT Gateway

NAT Gateway High Availability

AWS NAT

  • AWS NAT – Network Address Translation devices, launched in the public subnet, enables instances in a private subnet to connect to the Internet but prevents the Internet from initiating connections with the instances.
  • Instances in private subnets would need an internet connection for performing software updates or trying to access external services.
  • NAT device performs the function of both address translation and port address translation (PAT)
  • NAT instance prevents instances to be directly exposed to the Internet and having to be launched in a Public subnet and assigning of the Elastic IP address to all, which are limited.
  • NAT device routes the traffic, from the private subnet to the Internet, by replacing the source IP address with its address and it translates the address back to the instances’ private IP addresses for the response traffic.
  • AWS allows NAT configuration in 2 ways
    • NAT Gateway, managed service by AWS
    • NAT Instance

NAT Gateway

  • NAT gateway is an AWS managed NAT service that provides better availability, higher bandwidth, and requires less administrative effort.
  • A NAT gateway supports 5 Gbps of bandwidth and automatically scales up to 100 Gbps. For higher bursts requirements, the workload can be distributed by splitting the resources into multiple subnets and creating a NAT gateway in each subnet.
  • Public NAT gateway is associated with One Elastic IP address which cannot be disassociated after its creation.
  • Each NAT gateway is created in a specific Availability Zone and implemented with redundancy in that zone.
  • A NAT gateway supports the TCP, UDP, and ICMP protocols.
  • NAT gateway cannot be associated with a security group. Security can be configured for the instances in the private subnets to control the traffic.
  • Network ACL can be used to control the traffic to and from the subnet. NACL applies to the NAT gateway’s traffic, which uses ports 1024-65535
  • NAT gateway when created receives an elastic network interface that’s automatically assigned a private IP address from the IP address range of the subnet. Attributes of this network interface cannot be modified.
  • NAT gateway cannot send traffic over VPC endpoints, VPN connections, AWS Direct Connect, or VPC peering connections. The private subnet’s route table should be modified to route the traffic directly to these devices.
  • NAT gateway times out the connection if it is idle for 350 seconds or more. To prevent the connection from being dropped, initiate more traffic over the connection or enable TCP keepalive on the instance with a value of less than 350 seconds.
  • NAT gateways currently do not support the IPsec protocol.
  • A NAT gateway only passes traffic from an instance in a private subnet to the internet.

NAT Gateway High Availability

NAT Gateway vs NAT Instance

NAT Gateway vs NAT Instance

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. After launching an instance that you intend to serve as a NAT (Network Address Translation) device in a public subnet you modify your route tables to have the NAT device be the target of internet bound traffic of your private subnet. When you try and make an outbound connection to the Internet from an instance in the private subnet, you are not successful. Which of the following steps could resolve the issue?
    1. Attaching a second Elastic Network interface (ENI) to the NAT instance, and placing it in the private subnet
    2. Attaching an Elastic IP address to the instance in the private subnet
    3. Attaching a second Elastic Network Interface (ENI) to the instance in the private subnet, and placing it in the public subnet
    4. Disabling the Source/Destination Check attribute on the NAT instance
  2. You manually launch a NAT AMI in a public subnet. The network is properly configured. Security groups and network access control lists are property configured. Instances in a private subnet can access the NAT. The NAT can access the Internet. However, private instances cannot access the Internet. What additional step is required to allow access from the private instances?
    1. Enable Source/Destination Check on the private Instances.
    2. Enable Source/Destination Check on the NAT instance.
    3. Disable Source/Destination Check on the private instances
    4. Disable Source/Destination Check on the NAT instance
  3. A user has created a VPC with public and private subnets. The VPC has CIDR 20.0.0.0/16. The private subnet uses CIDR 20.0.1.0/24 and the public subnet uses CIDR 20.0.0.0/24. The user is planning to host a web server in the public subnet (port 80. and a DB server in the private subnet (port 3306.. The user is configuring a security group of the NAT instance. Which of the below mentioned entries is not required for the NAT security group?
    1. For Inbound allow Source: 20.0.1.0/24 on port 80
    2. For Outbound allow Destination: 0.0.0.0/0 on port 80
    3. For Inbound allow Source: 20.0.0.0/24 on port 80 (Refer NATSG)
    4. For Outbound allow Destination: 0.0.0.0/0 on port 443
  4. A web company is looking to implement an external payment service into their highly available application deployed in a VPC. Their application EC2 instances are behind a public facing ELB. Auto scaling is used to add additional instances as traffic increases. Under normal load the application runs 2 instances in the Auto Scaling group but at peak it can scale 3x in size. The application instances need to communicate with the payment service over the Internet, which requires whitelisting of all public IP addresses used to communicate with it. A maximum of 4 whitelisting IP addresses are allowed at a time and can be added through an API. How should they architect their solution?
    1. Route payment requests through two NAT instances setup for High Availability and whitelist the Elastic IP addresses attached to the NAT instances
    2. Whitelist the VPC Internet Gateway Public IP and route payment requests through the Internet Gateway. (Internet gateway is only to route traffic)
    3. Whitelist the ELB IP addresses and route payment requests from the Application servers through the ELB. (ELB does not have a fixed IP address)
    4. Automatically assign public IP addresses to the application instances in the Auto Scaling group and run a script on boot that adds each instances public IP address to the payment validation whitelist API. (would exceed the allowed 4 IP addresses)