AWS VPC Gateway Endpoints

  • A VPC Gateway Endpoint is a gateway that is a target for a specified route in the route table, used for traffic destined for a supported AWS service.
  • VPC Gateway Endpoints currently support the S3 and DynamoDB services.
  • VPC Gateway Endpoints do not require an Internet gateway or a NAT device for the VPC.
  • Gateway endpoints do not enable AWS PrivateLink.
  • VPC Endpoint policy and Resource-based policies can be used for fine-grained access control.

Gateway Endpoint Configuration

  • Endpoint requires the VPC and the service to be accessed via the endpoint.
  • The endpoint needs to be associated with a route table. The resulting route entry cannot be modified or removed directly; it is deleted only by removing the endpoint's association with the route table.
  • A route is automatically added to the route table, with the service's prefix list as the destination and the endpoint ID as the target. For example, a route with destination pl-68a54001 (com.amazonaws.us-west-2.s3) and target vpce-12345678 (the endpoint's ID) is added to the associated route tables.
  • Access to the resources in other services can be controlled by endpoint policies
  • Security groups need to allow outbound traffic from the VPC to the service specified in the endpoint. Use the ID of the service's prefix list (the prefix list named, e.g., com.amazonaws.us-east-1.s3) as the destination in the outbound rule.
  • Multiple endpoints can be created in a single VPC, e.g., for multiple services.
  • Multiple endpoints can be created for the same service but in different route tables.
  • Multiple endpoints to the same service CAN NOT be specified in a single route table
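
The association rules above can be sketched with a small in-memory model. This is an illustrative toy and not the AWS API: `RouteTable`, `associate_gateway_endpoint`, `disassociate_gateway_endpoint`, and the ID `rtb-example` are hypothetical names; only the prefix list and endpoint ID are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    """Toy route table (hypothetical model, not an AWS SDK type)."""
    table_id: str
    routes: dict = field(default_factory=dict)    # prefix list ID -> endpoint ID
    services: dict = field(default_factory=dict)  # service name -> endpoint ID


def associate_gateway_endpoint(table, service, prefix_list_id, endpoint_id):
    """Associate an endpoint; the route entry is added automatically."""
    if service in table.services:
        # Only one endpoint per service is allowed in a single route table.
        raise ValueError(f"{table.table_id} already has an endpoint for {service}")
    table.services[service] = endpoint_id
    table.routes[prefix_list_id] = endpoint_id


def disassociate_gateway_endpoint(table, service, prefix_list_id):
    """Removing the association is what removes the route entry."""
    del table.services[service]
    del table.routes[prefix_list_id]


rtb = RouteTable("rtb-example")
associate_gateway_endpoint(
    rtb, "com.amazonaws.us-west-2.s3", "pl-68a54001", "vpce-12345678"
)
print(rtb.routes)
```

A second call to `associate_gateway_endpoint` for the same service on the same table raises `ValueError`, which mirrors the last rule in the list.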

Gateway Endpoint Limitations

  • are regional and supported within the same Region only.
  • cannot be created between a VPC and an AWS service in a different region.
  • support IPv4 traffic only.
  • cannot be transferred from one VPC to another, or from one service to another service.
  • connections cannot be extended out of a VPC i.e. resources across the VPN, VPC peering, Direct Connect connection cannot use the endpoint.

VPC Endpoint policy

  • VPC Endpoint policy is an IAM resource policy attached to an endpoint for controlling access from the endpoint to the specified service.
  • Endpoint policy, by default, allows full access: any user or service within the VPC, using credentials from any AWS account, can reach any S3 resource, including S3 resources belonging to an AWS account other than the one with which the VPC is associated.
  • Endpoint policy does not override or replace IAM user policies or service-specific policies (such as S3 bucket policies).
  • Endpoint policy can be used to restrict which specific resources can be accessed using the VPC Endpoint.
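
As a rough illustration of restricting an endpoint to specific resources, the sketch below builds an endpoint policy document that allows object reads and writes on a single bucket. The helper name `build_endpoint_policy`, the statement ID, and the bucket name `example-bucket` are assumptions made for the example; the policy still has to be attached to the endpoint separately.

```python
import json


def build_endpoint_policy(bucket_name):
    """Return a policy document allowing object reads/writes on one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOnlyOneBucket",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


policy = build_endpoint_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Because the endpoint policy does not replace IAM or bucket policies, a request through the endpoint must still be allowed by those policies as well.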

S3 Bucket Policies

  • IAM policy or bucket policy can’t be used to allow access from a VPC IPv4 CIDR range as the VPC CIDR blocks can be overlapping or identical, which might lead to unexpected results.
  • aws:SourceIp condition can’t be used in the IAM policies for requests to S3 through a VPC endpoint.
  • S3 Bucket Policies can be used to restrict access through the VPC endpoint only.
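
One common way to express "access through the VPC endpoint only" is a bucket policy that denies requests whose `aws:SourceVpce` condition key does not match the endpoint ID. The sketch below only builds the policy document; the helper name, statement ID, bucket name, and endpoint ID are placeholders chosen for illustration.

```python
import json


def build_vpce_only_bucket_policy(bucket_name, endpoint_id):
    """Deny all S3 actions on the bucket unless the request arrives via endpoint_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": endpoint_id}},
            }
        ],
    }


bucket_policy = build_vpce_only_bucket_policy("example-bucket", "vpce-12345678")
print(json.dumps(bucket_policy, indent=2))
```

A blanket deny like this also blocks access from anywhere outside the endpoint, including the console, so it is worth testing on a non-critical bucket first.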

VPC Gateway Endpoint Troubleshooting

  • Verify the services are within the same region.
  • DNS resolution must be enabled in the VPC
  • Route table should have a route to S3 using the gateway VPC endpoint.
  • Security groups should allow outbound traffic to the service behind the VPC endpoint (via its prefix list).
  • NACLs should allow inbound and outbound traffic.
  • Gateway Endpoint Policy should define access to the resource
  • Resource-based policies like the S3 bucket policy should allow access to the VPC endpoint or the VPC.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You have an application running on an Amazon EC2 instance that uploads 10 GB video objects to Amazon S3. Video uploads are taking longer than expected, in spite of using multipart upload, because of limited Internet bandwidth, resulting in poor application performance. Which action can help improve the upload performance?
    1. Apply an Amazon S3 bucket policy
    2. Use Amazon EBS provisioned IOPS
    3. Use VPC endpoints for S3
    4. Request a service limit increase
  2. What are the services supported by VPC endpoints, using Gateway endpoint type? Choose 2 answers
    1. Amazon S3
    2. Amazon EFS
    3. Amazon DynamoDB
    4. Amazon Glacier
    5. Amazon SQS
  3. An application running on EC2 instances processes sensitive information stored on Amazon S3. The information is accessed over the Internet. The security team is concerned that the Internet connectivity to Amazon S3 is a security risk. Which solution will resolve the security concern?
    1. Access the data through an Internet Gateway.
    2. Access the data through a VPN connection.
    3. Access the data through a NAT Gateway.
    4. Access the data through a VPC endpoint for Amazon S3.

AWS VPC Endpoints

VPC Endpoints

  • VPC Endpoints enable a private connection between a VPC and supported AWS services, as well as VPC endpoint services powered by PrivateLink, using private IP addresses.
  • Endpoints do not require a public IP address, access over the Internet, NAT device, a VPN connection, or AWS Direct Connect.
  • Traffic between VPC and AWS service does not leave the Amazon network
  • Endpoints are virtual devices. They are horizontally scaled, redundant, and highly available VPC components that allow communication between instances in the VPC and AWS services without imposing availability risks or bandwidth constraints on your network traffic.
  • Endpoints currently do not support cross-region requests, so ensure that the endpoint is created in the same region as the target resource, e.g. the S3 bucket.
  • AWS currently supports the following types of Endpoints

VPC Gateway Endpoints

  • A VPC Gateway Endpoint is a gateway that is a target for a specified route in the route table, used for traffic destined for a supported AWS service.
  • Gateway Endpoints currently support the S3 and DynamoDB services.
  • Gateway Endpoints do not require an Internet gateway or a NAT device for the VPC.
  • Gateway endpoints do not enable AWS PrivateLink.
  • VPC Endpoint policy and Resource-based policies can be used for fine-grained access control.

VPC Interface Endpoints – PrivateLink

  • VPC Interface endpoints enable connectivity to services powered by AWS PrivateLink.
  • Services include AWS services like CloudTrail, CloudWatch, etc., services hosted by other AWS customers and partners in their own VPCs (referred to as endpoint services), and supported AWS Marketplace partner services.
  • Interface Endpoints only allow traffic from VPC resources to the endpoints and not vice versa
  • PrivateLink endpoints can be accessed across both intra- and inter-region VPC peering connections, Direct Connect, and VPN connections.
  • VPC Interface Endpoints, by default, have an address like vpce-svc-01234567890abcdef.us-east-1.vpce.amazonaws.com which needs application changes to point to the service.
  • Private DNS name feature allows consumers to use AWS service public default DNS names which would point to the private VPC endpoint service.
  • Interface Endpoints can be used to create custom applications in VPC and configure them as an AWS PrivateLink-powered service (referred to as an endpoint service) exposed through a Network Load Balancer.
  • Custom applications can be hosted within AWS or on-premises (via Direct Connect or VPN)

S3 VPC Endpoints Strategy

S3 is now accessible with both Gateway Endpoints and Interface Endpoints.

S3 Strategy - VPC Gateway Endpoints vs VPC Interface Endpoints

AWS Certification Exam Practice Questions

  1. You have an application running on an Amazon EC2 instance that uploads 10 GB video objects to Amazon S3. Video uploads are taking longer than expected, in spite of using multipart upload, because of limited Internet bandwidth, resulting in poor application performance. Which action can help improve the upload performance?
    1. Apply an Amazon S3 bucket policy
    2. Use Amazon EBS provisioned IOPS
    3. Use VPC endpoints for S3
    4. Request a service limit increase
  2. What are the services supported by VPC endpoints, using Gateway endpoint type? Choose 2 answers
    1. Amazon S3
    2. Amazon EFS
    3. Amazon DynamoDB
    4. Amazon Glacier
    5. Amazon SQS
  3. What are the different types of endpoint types supported by VPC endpoints? Choose 2 Answers
    1. Gateway
    2. Classic
    3. Interface
    4. Virtual
    5. Network
  4. An application running on EC2 instances processes sensitive information stored on Amazon S3. The information is accessed over the Internet. The security team is concerned that the Internet connectivity to Amazon S3 is a security risk. Which solution will resolve the security concern?
    1. Access the data through an Internet Gateway.
    2. Access the data through a VPN connection.
    3. Access the data through a NAT Gateway.
    4. Access the data through a VPC endpoint for Amazon S3.
  5. You need to design a VPC for a three-tier architecture, a web application consisting of an Elastic Load Balancer (ELB), a fleet of web/application servers, and a backend consisting of an RDS database. The entire Infrastructure must be distributed over 2 availability zones. Which VPC configuration works while assuring the least components are exposed to Internet?
    1. Two public subnets for ELB, two private subnets for the web-servers, two private subnets for RDS and DynamoDB
    2. Two public subnets for ELB and web-servers, two private subnets for RDS and DynamoDB
    3. Two public subnets for ELB, two private subnets for the web-servers, two private subnets for RDS and VPC Endpoints for DynamoDB
    4. Two public subnets for ELB and web-servers, two private subnets for RDS and VPC Endpoints for DynamoDB

AWS VPC Peering

VPC Peering

  • A VPC peering connection is a networking connection between two VPCs that enables routing of traffic between them using private IPv4 addresses or IPv6 addresses.
  • VPC peering connection
    • can be established between your own VPCs, or with a VPC in another AWS account in the same or different region.
    • is a one-to-one relationship between two VPCs.
    • supports intra and inter-region peering connections.
  • With VPC peering,
    • Instances in either VPC can communicate with each other as if they are within the same network 
    • AWS uses the existing infrastructure of a VPC to create a peering connection; it is neither a gateway nor a VPN connection and does not rely on a separate piece of physical hardware. 
    • There is no single point of failure for communication or a bandwidth bottleneck
    • All inter-region traffic is encrypted with no single point of failure, or bandwidth bottleneck. Traffic always stays on the global AWS backbone, and never traverses the public internet, which reduces threats, such as common exploits, and DDoS attacks.
  • VPC peering does not have any separate charges. However, there are data transfer charges.

VPC Peering Connectivity

  • To create a VPC peering connection, the owner of the requester VPC sends a request to the owner of the accepter VPC.
  • Accepter VPC can be owned by the same account or a different AWS account.
  • Once the Accepter VPC accepts the peering connection request, the peering connection is activated.
  • Route tables on both the VPCs should be manually updated to allow traffic
  • Security groups on the instances should allow traffic to and from the peered VPCs.

VPC Peering Limitations & Rules

  1. Does not support Overlapping or matching IPv4 or IPv6 CIDR blocks.
  2. Does not support transitive peering relationships i.e. the VPC does not have access to any other VPCs that the peer VPC may be peered with even if established entirely within your own AWS account
  3. Does not support Edge to Edge Routing Through a Gateway or Private Connection
  4. In a VPC peering connection, the VPC does not have access to any other connection that the peer VPC may have, and vice versa. Connections that the peer VPC may have include
    1. A VPN connection or an AWS Direct Connect connection to a corporate network
    2. An Internet connection through an Internet gateway
    3. An Internet connection in a private subnet through a NAT device
    4. A ClassicLink connection to an EC2-Classic instance
    5. A VPC endpoint to an AWS service; for example, an endpoint to S3.
  5. VPC peering connections are limited on the number of active and pending peering connections that you can have per VPC.
  6. Only one peering connection can be established between the same two VPCs at the same time.
  7. Jumbo frames are supported for peering connections within the same region.
  8. A placement group can span peered VPCs that are in the same region; however, you do not get full-bisection bandwidth between instances in peered VPCs
  9. Inter-region VPC peering connections
    1. The Maximum Transmission Unit (MTU) across an inter-region peering connection is 1500 bytes. Jumbo frames are not supported.
    2. Security group rule that references a peer VPC security group cannot be created.
  10. Any tags created for the peering connection are only applied in the account or region in which they were created
  11. Unicast reverse path forwarding in peering connections is not supported
  12. Since around July 2016, an instance’s public DNS hostname can be resolved to its private IP address across peered VPCs, provided DNS resolution is enabled for the peering connection. Before that, the public DNS hostname did not resolve to the private IP address across peered VPCs.
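
Two of the rules above, no overlapping CIDR blocks and no transitive peering, are easy to check mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function names and the VPC identifiers (`vpc-a`, `vpc-b`, `vpc-c`) are made up for the example.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """True if the two CIDR blocks share any address (peering would be rejected)."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def can_route(peerings, src, dst):
    """Peering is not transitive: traffic flows only over a direct peering."""
    return frozenset((src, dst)) in peerings


# vpc-a <-> vpc-b and vpc-b <-> vpc-c are peered; vpc-a and vpc-c are not.
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/17"))  # True: second block is inside the first
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))    # False
print(can_route(peerings, "vpc-a", "vpc-b"))          # True
print(can_route(peerings, "vpc-a", "vpc-c"))          # False: would need its own peering
```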

VPC Peering Troubleshooting

  • Verify that the VPC peering connection is in the Active state.
  • Be sure to update the route tables for the peering connection. Verify that the correct routes exist for connections to the IP address range of the peered VPCs through the appropriate gateway.
  • Verify that an ALLOW rule exists in the network access control (NACL) table for the required traffic.
  • Verify that the security group rules allow network traffic between the peered VPCs.
  • Verify using VPC flow logs that the required traffic isn’t rejected at the source or destination. This rejection might occur due to the permissions associated with security groups or network ACLs.
  • Be sure that no firewall rules block network traffic between the peered VPCs. Use network utilities such as traceroute (Linux) or tracert (Windows) to check rules for firewalls such as iptables (Linux) or Windows Firewall (Windows).
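
The route table check in the list above amounts to asking which route the most specific match selects for an address in the peer VPC. A small sketch of that lookup, assuming a route table represented as a plain dict of CIDR to target (the peering connection ID and gateway ID are placeholders):

```python
import ipaddress


def lookup_target(routes, dest_ip):
    """Return the target of the longest-prefix matching route, or None if no route matches."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


routes = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "pcx-11112222",  # hypothetical peering connection ID
    "0.0.0.0/0": "igw-example",     # hypothetical internet gateway ID
}

print(lookup_target(routes, "10.1.4.7"))
```

If the lookup for a peer address returns anything other than the peering connection, the route for the peer VPC's CIDR is missing or shadowed by a more specific route.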

VPC Peering Architecture

  • VPC Peering can be applied to create shared services or perform authentication with an on-premises instance
  • This helps create a single point of contact, as well as limiting the VPN connections to a single account or VPC.

VPC Peering vs Transit Gateway

VPC Peering vs Transit VPC vs Transit Gateway

AWS Certification Exam Practice Questions

  1. You currently have 2 development environments hosted in 2 different VPCs in an AWS account in the same region. There is now a need for resources from one VPC to access another. How can this be accomplished?
    1. Establish a Direct Connect connection.
    2. Establish a VPN connection.
    3. Establish VPC Peering.
    4. Establish Subnet Peering.
  2. A company has an AWS account that contains three VPCs (Dev, Test, and Prod) in the same region. Test is peered to both Prod and Dev. All VPCs have non-overlapping CIDR blocks. The company wants to push minor code releases from Dev to Prod to speed up the time to market. Which of the following options helps the company accomplish this?
    1. Create a new peering connection Between Prod and Dev along with appropriate routes.
    2. Create a new entry to Prod in the Dev route table using the peering connection as the target.
    3. Attach a second gateway to Dev. Add a new entry in the Prod route table identifying the gateway as the target.
    4. The VPCs have non-overlapping CIDR blocks in the same account. The route tables contain local routes for all VPCs.
  3. A company has 2 AWS accounts that have individual VPCs. The VPCs are in different AWS regions and need to communicate with each other. The VPCs have non-overlapping CIDR blocks. Which of the following would be a cost-effective connectivity option?
    1. Use VPN connections
    2. Use VPC peering between the 2 VPC’s
    3. Use AWS Direct Connect
    4. Use a NAT gateway

AWS Certified Solutions Architect – Professional (SAP-C02) Exam Learning Path

  • AWS Certified Solutions Architect – Professional (SAP-C02) exam is the updated version of the previous Solutions Architect – Professional SAP-C01 exam and was released in Nov. 2022.
  • SAP-C02 is quite similar to SAP-C01 but has included some new services.

AWS Certified Solutions Architect – Professional (SAP-C02) Exam Content

  • AWS Certified Solutions Architect – Professional (SAP-C02) exam validates the ability to complete tasks within the scope of the AWS Well-Architected Framework
    • Design for organizational complexity
    • Design for new solutions
    • Continuously improve existing solutions
    • Accelerate workload migration and modernization

Refer to AWS Certified Solutions Architect – Professional Exam Guide

AWS Certified Solutions Architect – Professional (SAP-C02) Exam Summary

  • Professional exams are tough, lengthy, and tiresome. Most of the questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • Each solution involves multiple AWS services.
  • AWS Certified Solutions Architect – Professional (SAP-C02) exam has 75 questions (65 scored and 10 unscored) to be solved in 180 minutes.
  • SAP-C02 exam includes two types of questions, multiple-choice and multiple-response.
  • SAP-C02 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Professional exams currently cost $300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.
  • AWS exams can be taken either at a test center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Solutions Architect – Professional (SAP-C02) Exam Topics

AWS Certified Solutions Architect – Professional (SAP-C02) focuses a lot on concepts and services related to Architecture & Design, Scalability, High Availability, Disaster Recovery, Migration, Security, and Cost Control.

Storage

  • Simple Storage Service – S3
    • S3 Permissions & S3 Data Protection
      • S3 bucket policies to control access to VPC Endpoints and provide cross-account access.
    • S3 Storage Classes & Lifecycle policies
      • covers S3 Standard, Infrequent Access, Intelligent-Tiering, and Glacier for archival, plus object transitions & deletions for cost management.
    • S3 Performance
    • S3 Security
      • S3 supports encryption using KMS
      • S3 supports Object Lock and Glacier supports Vault lock to prevent the deletion of objects, especially required for compliance requirements.
      • CORS allows client web applications loaded in one domain to request restricted resources from another domain.
    • S3 supports same-region and cross-region replication for disaster recovery.
    • S3 Access Logs enable tracking access requests to an S3 bucket.
    • supports S3 Select feature to query selective data from a single object.
    • S3 Event Notification enables notifications to be triggered when certain events happen in the bucket and support SNS, SQS, and Lambda as the destination.
  • Elastic Block Store
    • EBS Backup using snapshots for HA and Disaster recovery
    • Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
  • Storage Gateway
    • supports File Gateways and Volume Gateways
    • File Gateways provides a file interface into S3 and allows storing and retrieving of objects in S3 using industry-standard file protocols such as NFS and SMB.
  • Elastic File System – EFS
    • provides fully managed, scalable, serverless, shared, and cost-optimized file storage for use with AWS and on-premises resources.
    • supports cross-region replication for disaster recovery
    • supports storage classes like S3
    • supports only Linux-based AMIs
  • AWS Transfer Family
    • provides a secure transfer service (FTP, SFTP, FTPS) that helps transfer files into and out of AWS storage services.
    • supports transferring data from or to S3 and EFS.
  • FSx for Lustre
    • managed, cost-effective service to launch and run the HPC high-performance Lustre file system.
  • Understand different use cases for S3 vs EBS vs EFS
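
To make the storage class and lifecycle bullet above more concrete, here is a hedged sketch of a lifecycle configuration in the shape the S3 API accepts, plus a small helper that works out which class an object of a given age would be in. The rule ID, prefix, and day counts are arbitrary example values, and `storage_class_at` is a hypothetical helper, not part of any SDK.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",    # example rule name
            "Filter": {"Prefix": "logs/"},  # example prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(rule, age_days):
    """Return the storage class for an object of this age, or None once it has expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_at(rule, age))
```

The helper is only a planning aid: real lifecycle transitions are applied asynchronously by S3, so objects do not move at the exact day boundary.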

Database

  • DynamoDB
    • provides a fully managed NoSQL database service with fast and predictable performance with seamless scalability.
    • supports following capacity modes
      • Provisioned – the maximum amount of capacity in terms of reads/writes per second that an application can consume from a table or index
      • On-demand – serves thousands of requests per second without capacity planning.
    • DynamoDB Auto Scaling can be used to handle peaks or bursts.
    • DynamoDB Streams for tracking changes
    • TTL to expire objects automatically and cost-effectively.
    • Global tables for multi-master, active-active inter-region storage needs.
    • Global tables do not support strong global consistency
    • DynamoDB Accelerator – DAX for seamless caching to reduce the load on DynamoDB for read-heavy requirements.
  • RDS
    • supports cross-region read replicas ideal for disaster recovery with low RTO and RPO.
    • provides RDS Proxy for effective database connection pooling
    • RDS Multi-AZ vs Read Replicas
  • Aurora
    • fully managed, MySQL- and PostgreSQL-compatible, relational database engine
    • Aurora Serverless provides on-demand, autoscaling configuration.
    • Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
  • Understand DynamoDB Global Tables vs Aurora Global Databases
  • DocumentDB as a replacement for MongoDB
  • Keyspaces as a replacement for Cassandra
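
For the provisioned capacity mode mentioned above, a quick sizing calculation can help. The sketch below applies the standard DynamoDB sizing rules, which are not spelled out in the notes: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are illustrative.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed: each read consumes ceil(size / 4 KB) units, halved if eventually consistent."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed: each write consumes ceil(size / 1 KB) units."""
    return math.ceil(item_size_kb) * writes_per_second


print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(1.5, 50))                           # 100
```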

Data Migration & Transfer

  • Cloud Migration Services
    • Cloud Migration (hint: make sure you understand the difference between rehost, replatform, and rearchitect)
    • Server Migration Service helps to migrate servers and applications.
    • Database Migration Service
      • enables quick and secure data migration with minimal to zero downtime
      • supports Full and Change Data Capture – CDC migration to support continuous replication for zero downtime migration.
      • homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations (using SCT) between different database platforms, such as Oracle or Microsoft SQL Server to Aurora.
    • Snow Family
      • Ideal for one-time huge data transfers usually for use cases with limited bandwidth from on-premises to AWS.
    • Understand use cases for data transfer using VPN (quick, slow, uses the Internet), Direct Connect (time to set up, private, recurring transfers), Snow Family (moderate time, private, one-time huge data transfers)
  • Application Discovery Service
    • Agent-based discovery can be used for Hyper-V and physical servers
    • Agentless can be used for VMware but does not track processes
  • AWS Migration Hub provides a central location to collect server and application inventory data for the assessment, planning, and tracking of migrations to AWS and also helps accelerate application modernization following migration.

Networking & Content Delivery

  • VPC – Virtual Private Cloud
    • Security Groups, NACLs
      • NACLs are stateless and need to open ephemeral ports for response traffic.
    • VPC Gateway Endpoints to provide access to S3 and DynamoDB
    • VPC Interface Endpoints or PrivateLink provide access to a variety of services like SQS, Kinesis, or Private APIs exposed through NLB.
    • VPC Peering to enable communication between VPCs within the same or different regions.
    • VPC Peering does not support overlapping CIDRs while PrivateLink does as only the endpoint is exposed.
    • VPC Flow Logs to track network traffic
    • NAT Gateway provides managed NAT service that provides better availability, higher bandwidth, and requires less administrative effort.
  • Route 53
    • Routing Policies
      • focus on Weighted, Latency, and failover routing policies
      • failover routing provides active-passive configuration for disaster recovery while the others are active-active configurations.
    • Route 53 Resolver
      • Outbound endpoint for AWS -> On-premises DNS query resolution
      • Inbound endpoint for On-premises -> AWS DNS query resolution
  • CloudFront
    • fully managed, fast CDN service that speeds up the distribution of static, dynamic web or streaming content to end-users.
    • supports Origin Groups for multiple origins providing failover capability with primary and secondary origins.
    • does not support Auto Scaling as an origin
    • supports Geo-restriction
    • supports Lambda@Edge and CloudFront Functions to execute code closer to the user.
    • Lambda@Edge can be used for quick auth checks, and redirect users based on request data.
    • Security can be enhanced by whitelisting CloudFront IPs or adding a custom header in CloudFront and verifying it in ALB.
  • API Gateway
    • supports throttling, caching and helps define usage plans with API keys to identify clients
    • provides regional and edge-optimized endpoint types
    • supports CORS for cross-domain calls.
    • supports authentication mechanisms, such as AWS IAM policies, Lambda authorizer functions, and Amazon Cognito user pools.
    • provide serverless architecture with Lambda.
  • Load Balancer – ELB, ALB and NLB
  • Global Accelerator
    • optimizes the path to applications to keep packet loss, jitter, and latency consistently low.
    • helps improve the performance of the applications by lowering first-byte latency
    • provides 2 static IP addresses
    • does not preserve the client’s IP address with NLB
  • Transit Gateway or Transit VPC
    • is a network transit hub that can be used to interconnect VPCs and on-premises networks via Direct Connect or VPN.
    • Transit Gateway is regional and Transit Gateway Peering needs to be configured to peer regional Transit gateways.
  • Placement Groups
    • Cluster placement group with Enhanced Networking for HPC
    • Spread placement group for fault tolerance and high availability.
  • Direct Connect & VPN
    • provide on-premises to AWS connectivity
    • Understand Direct Connect vs VPN
    • VPN can provide a cost-effective, quick failover for Direct Connect.
    • VPN over Direct Connect provides a secure dedicated connection and requires a public virtual interface.
    • Direct Connect Gateway is a global network device that helps establish connectivity that spans VPCs spread across multiple AWS Regions with a single Direct Connect connection.
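
The note above that NACLs are stateless and need ephemeral ports opened for response traffic can be illustrated with a toy evaluator. This is a simplified model under stated assumptions: rules are keyed by rule number and hold a port range plus an action, the lowest-numbered matching rule wins, and everything else is denied. The names `nacl_allows` and `request_succeeds` are hypothetical.

```python
def nacl_allows(rules, port):
    """Evaluate rules in ascending rule-number order; first match wins, default is deny."""
    for _, (low, high, action) in sorted(rules.items()):
        if low <= port <= high:
            return action == "allow"
    return False


def request_succeeds(inbound_rules, outbound_rules, server_port, client_port):
    """Stateless check: the request is matched inbound on the server port,
    and the response is matched separately outbound on the client's ephemeral port."""
    return nacl_allows(inbound_rules, server_port) and nacl_allows(outbound_rules, client_port)


inbound = {100: (443, 443, "allow")}
outbound_without_ephemeral = {100: (443, 443, "allow")}
outbound_with_ephemeral = {100: (1024, 65535, "allow")}

# A client connects from ephemeral port 50000 to HTTPS on the server.
print(request_succeeds(inbound, outbound_without_ephemeral, 443, 50000))  # False
print(request_succeeds(inbound, outbound_with_ephemeral, 443, 50000))     # True
```

A security group would not need the second rule, because it is stateful and automatically allows the response to an allowed request.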

Security, Identity & Compliance

  • AWS Identity and Access Management
  • AWS Shield & Shield Advanced
    • for DDoS protection and integrates with Route 53, CloudFront, ALB, and Global Accelerator.
  • AWS WAF
    • protects from common attack techniques like SQL injection and XSS. Conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
    • integrates with CloudFront, ALB, and API Gateway.
    • supports Web ACLs and can block traffic based on IPs, Rate limits, and specific countries as well.
  • ACM – AWS Certificate Manager
    • helps easily provision, manage, and deploy public and private SSL/TLS certificates
    • is regional and you need to request certificates in all regions and associate individually in all regions.
    • does not provide certificates for EC2 instances.
  • AWS KMS – Key Management Service
    • managed encryption service that allows the creation and control of encryption keys to enable data encryption.
    • KMS Multi-region keys
      • are AWS KMS keys in different AWS Regions that can be used interchangeably – as though having the same key in multiple Regions.
      • are not global and each multi-region key needs to be replicated and managed independently.
  • Secrets Manager
    • helps protect secrets needed to access applications, services, and IT resources.
    • Secrets Manager vs SSM Parameter Store.
      • Secrets Manager supports random generation and automatic rotation of secrets, which is not provided by SSM Parameter Store.
      • Costs more than SSM Parameter Store.
  • Amazon Macie is a data security and data privacy service that uses ML and pattern matching to discover and protect sensitive data in S3.
  • AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.

Compute

  • EC2
  • Auto Scaling provides the ability to ensure a correct number of EC2 instances are always running to handle the load of the application
  • Lambda
    • offers Serverless computing 
    • Lambda running in VPC requires NAT Gateway to communicate with external public services
    • Lambda CPU can be increased by increasing memory only.
    • supports reserved concurrency limits, which cap a function's concurrency and reduce its impact on downstream resources
    • Lambda Alias now supports canary deployments
    • Lambda supports docker containers
    • Reserved Concurrency guarantees the maximum number of concurrent instances for the function
    • Provisioned Concurrency provides greater control over the performance of serverless applications and helps keep functions initialized and hyper-ready to respond in double-digit milliseconds.
    • Lambda Best Practices esp. handling the database connection code.
  • Step Functions helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.
  • ECS – Elastic Container Service
    • container management service that supports Docker containers
    • supports two launch types
      • EC2 and
      • Fargate which provides the serverless capability
    • For least privilege, the role should be assigned to the Task.
    • awsvpc network mode gives ECS tasks the same networking properties as EC2 instances.

Disaster Recovery

  • Disaster Recovery whitepaper, although outdated, make sure you understand the differences and implementation for each type esp. pilot light, warm standby w.r.t RTO, and RPO.
  • Compute
    • Make components available in an alternate region,
    • Backup and Restore using either snapshots or AMIs that can be restored.
    • Use minimal low-scale capacity running which can be scaled once the failover happens
    • Use fully running compute in an active-active configuration with health checks.
    • CloudFormation to create, and scale infra as needed
  • Storage
    • S3 and EFS support cross-region replication
    • DynamoDB supports Global tables for multi-master, active-active inter-region storage needs.
    • Aurora Global Database provides cross-region read replicas and failover capabilities.
    • RDS supports cross-region read replicas which can be promoted to master in case of a disaster. This can be done using Route 53, CloudWatch, and lambda functions.
  • Network
    • Route 53 failover routing with health checks to failover across regions.
    • CloudFront Origin Groups support primary and secondary endpoints with failover.

Management & Governance tools

  • AWS Organizations
  • Systems Manager
    • AWS Systems Manager and its various services like parameter store, patch manager
    • Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management. Does not support secrets rotation. Use Secrets Manager instead
    • Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
    • Patch Manager helps automate the process of patching managed instances with both security-related and other types of updates.
  • CloudWatch
  • CloudTrail
    • for audit and governance
    • With Organizations, the trail can be configured to log CloudTrail from all accounts to a central account.
  • CloudFormation
    • Handle disaster Recovery by automating the infra to replicate the environment across regions.
    • Deletion Policy to prevent, retain, or backup RDS, EBS Volumes
    • Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update. Stack Policy only applies for Stack updates and not stack deletion.
    • StackSets helps to create, update, or delete stacks across multiple accounts and Regions with a single operation.
  • Control Tower
    • to setup, govern, and secure a multi-account environment
    • strongly recommended guardrails cover EBS encryption
  • Service Catalog
    • allows organizations to create and manage catalogues of IT services that are approved for use on AWS with minimal permissions.
  • Trusted Advisor
    • helps with cost optimization and service limits in addition to security, performance and fault tolerance.
  • Compute Optimizer recommends optimal AWS resources for the workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics.
  • AWS Budgets to see usage-to-date and current estimated charges from AWS, set limits and provide alerts or notifications.
  • Cost Allocation Tags can be used to organize AWS resources and track AWS costs at a detailed level.
  • Cost Explorer helps visualize, understand, manage and forecast the AWS costs and usage over time.
  • Amazon WorkSpaces provides a virtual workspace for varied worker types, especially hybrid and remote workers.

Integration Tools

  • SQS in terms of loose coupling and scaling.
    • Difference between SQS Standard and FIFO esp. with throughput and order
    • SQS supports dead letter queues
  • CloudWatch integration with SNS and Lambda for notifications.

Analytics

  • Kinesis
  • OpenSearch (Elasticsearch) provides a managed search solution.
  • Amazon Timestream is a fast, scalable, and serverless time-series database service that makes it easier to store and analyze trillions of events per day.
  • Amazon Connect is an omnichannel cloud contact center.
  • Amazon Pinpoint is a flexible, scalable marketing communications service that helps connect with customers over email, SMS, push notifications, or voice
  • Amazon Rekognition offers pre-trained and customizable computer vision capabilities to extract information and insights from images and videos
  • Amazon Transcribe provides voice-to-text conversion

Architecture & Design Flows

On the Exam Day

  • Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you will not be allowed to take the exam if you are late by more than 30 minutes.
    • Make sure you have your desk clear, no hand-watches, or external monitors, keep your phones away, and nobody can enter the room.

Finally, All the Best 🙂

AWS Secrets Manager vs Systems Manager Parameter Store

  • AWS Secrets Manager helps protect secrets needed to access applications, services, and IT resources and can easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.
  • AWS Systems Manager Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management and can store data such as passwords, database strings, etc.

AWS Secrets Manager vs Systems Parameter Store

  • Storage (Limits keep on upgrading)
    • AWS Systems Manager Parameter Store allows us to store up to
      • Standard tier – 10,000 parameters, each of which can be up to 4KB
      • Advanced tier – 100,000 parameters, each of which can be up to 8KB
    • AWS Secrets Manager allows us to store up to 40,000 secrets, each of which can be up to 64 KB.
  • Encryption
    • Encryption is optional for Systems Manager Parameter Store
    • Encryption is mandatory for Secrets Manager and you cannot opt out.
  • Automated Secret Rotation
    • Systems Manager Parameter Store does not support out-of-the-box secrets rotation.
    • AWS Secrets Manager enables database credential rotation on a schedule.
  • Cross-account Access
    • Systems Manager Parameter Store does not support cross-account access
    • AWS Secrets Manager supports resource-based IAM policies that grant cross-account access.
  • Cost (keeps on changing)
    • Secrets Manager is comparatively costlier than Systems Manager Parameter Store.
    • AWS Systems Manager Parameter Store comes with no additional cost for the Standard tier.
    • AWS Secrets Manager costs $0.40 per secret per month, and data retrieval costs $0.05 per 10,000 API calls.
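
As a rough worked example of the Secrets Manager pricing quoted above ($0.40 per secret per month plus $0.05 per 10,000 API calls), the following Python sketch estimates a monthly bill. The function name and the sample workload are hypothetical, and the prices are the ones listed in this post, which may have changed since.

```python
def secrets_manager_monthly_cost(num_secrets, api_calls,
                                 price_per_secret=0.40,
                                 price_per_10k_calls=0.05):
    """Estimate the monthly Secrets Manager cost in USD.

    Default prices are the figures quoted in this post (assumed, may be outdated).
    """
    storage = num_secrets * price_per_secret
    requests = (api_calls / 10_000) * price_per_10k_calls
    return round(storage + requests, 2)


# Hypothetical workload: 25 secrets and 1,000,000 API calls in a month.
cost = secrets_manager_monthly_cost(25, 1_000_000)
print(f"Estimated monthly cost: ${cost:.2f}")
```

With this pricing model the per-secret charge usually dominates unless secrets are fetched very frequently, which is one reason clients often cache retrieved secrets.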

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company uses Amazon RDS for PostgreSQL databases for its data tier. The company must implement password rotation for the databases. Which solution meets this requirement with the LEAST operational overhead?
    1. Store the password in AWS Secrets Manager. Enable automatic rotation on the secret.
    2. Store the password in AWS Systems Manager Parameter Store. Enable automatic rotation on the parameter.
    3. Store the password in AWS Systems Manager Parameter Store. Write an AWS Lambda function that rotates the password.
    4. Store the password in AWS Key Management Service (AWS KMS). Enable automatic rotation on the customer master key (CMK).

AWS EC2 Image Builder

  • EC2 Image Builder is a fully managed AWS service that makes it easier to automate the creation, management, and deployment of customized, secure, and up-to-date server images that are pre-installed and pre-configured with software and settings to meet specific IT standards
  • EC2 Image Builder simplifies the building, testing, and deployment of Virtual Machine and container images for use on AWS or on-premises.
  • Image Builder significantly reduces the effort of keeping images up-to-date and secure by providing a simple graphical interface, built-in automation, and AWS-provided security settings.
  • Image Builder removes any manual steps for updating an image without the need to build your own automation pipeline.
  • Image Builder provides a one-stop-shop to build, secure, and test up-to-date Virtual Machine and container images using common workflows.
  • Image Builder allows image validation for functionality, compatibility, and security compliance with AWS-provided tests and your own tests before using them in production.
  • Image Builder is offered at no cost, other than the cost of the underlying AWS resources used to create, store, and share the images.

AWS Certification Exam Practice Questions

  1. A company is running a website on Amazon EC2 instances that are in an Auto Scaling group. When the website traffic increases, additional instances take several minutes to become available because of a long-running user data script that installs software. An AWS engineer must decrease the time that is required for new instances to become available. Which action should the engineer take to meet this requirement?
    1. Reduce the scaling thresholds so that instances are added before traffic increases.
    2. Purchase Reserved Instances to cover 100% of the maximum capacity of the Auto Scaling group.
    3. Update the Auto Scaling group to launch instances that have a storage optimized instance type.
    4. Use EC2 Image Builder to prepare an Amazon Machine Image (AMI) that has pre-installed software.

AWS RDS Aurora Serverless

Aurora Serverless

  • Amazon Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
  • An Aurora Serverless DB cluster automatically starts up, shuts down, and scales capacity up or down based on the application’s needs.
  • enables running database in the cloud without managing any database instances.
  • provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
  • use Cases include
    • Infrequently-Used Applications
    • New Applications – where the needs and instance size is yet to be determined.
    • Variable and Unpredictable Workloads – scale as per the needs
    • Development and Test Databases
    • Multi-tenant Applications
  • DB cluster does not have a public IP address and can be accessed only from within a VPC based on the VPC service.

Aurora Architecture

Aurora Serverless Architecture

  • Aurora Serverless separates Storage and Compute, so it can scale down to zero processing and you pay only for storage.
  • A database endpoint is created without specifying the DB instance class size.
  • Minimum and maximum capacity is set in terms of Aurora capacity units (ACUs). Each ACU is a combination of processing and memory capacity.
  • Database storage automatically scales from 10 GiB to 64 TiB, the same as storage in a standard Aurora DB cluster.
  • The minimum Aurora capacity unit is the lowest ACU to which the DB cluster can scale down. The maximum Aurora capacity unit is the highest ACU to which the DB cluster can scale up. Based on the settings, Aurora Serverless automatically creates scaling rules for thresholds for CPU utilization, connections, and available memory.
  • Database endpoint connects to a proxy fleet that routes the workload to a fleet of resources that are automatically scaled.
  • Aurora Serverless manages the connections automatically.
  • Proxy fleet enables continuous connections as Aurora Serverless scales the resources automatically based on the minimum and maximum capacity specifications.
  • Database client applications don’t need to change to use the proxy fleet.
  • Scaling is rapid because it uses a pool of “warm” resources that are always ready to service requests.
  • Aurora Serverless supports Automatic Pause where the DB cluster can be paused after a given amount of time with no activity. The default inactivity timeout is five minutes. Pausing the DB cluster can be disabled.
  • Automatic Pause reduces the compute charges to zero and only storage is charged. If database connections are requested when an Aurora Serverless DB cluster is paused, the DB cluster automatically resumes and services the connection requests.
  • When the DB cluster resumes activity, it has the same capacity as it had when Aurora paused the cluster. The number of ACUs depends on how much Aurora scaled the cluster up or down before pausing it.

Aurora Serverless and Failover

  • Aurora Serverless compute layer is placed in a Single AZ
  • separates computation capacity and storage, and the storage volume for the cluster is spread across multiple AZs. The data remains available even if outages affect the DB instance or the associated AZ.
  • supports automatic multi-AZ failover where if the DB instance for a DB cluster becomes unavailable or the Availability Zone (AZ) it is in fails, Aurora recreates the DB instance in a different AZ.
  • failover mechanism takes longer than for an Aurora Provisioned cluster.
  • failover time is currently undefined because it depends on demand and capacity available in other AZs within the given AWS Region

Aurora Serverless Auto Scaling

  • Aurora Serverless automatically scales based on the active database workload (CPU or connections). In some cases, capacity might not scale fast enough to meet a sudden workload change, such as a large number of new transactions.
  • Once a scaling operation is initiated, Aurora Serverless attempts to find a scaling point, which is a point in time at which the database can safely complete scaling.
  • might not be able to find a scaling point and will not scale if there are
    • long-running queries or transactions in progress, or
    • temporary tables or table locks in use.
  • Supports cooldown period
    • After a scale-up, there is a 15-minute cooldown period before a subsequent scale-down
    • After a scale-down, there is a 310-second cooldown period before a subsequent scale-down
  • has no cooldown period for scaling up activities and scales as and when necessary
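
The cooldown rules listed above can be sketched as a small decision function. This is only a toy model of the rules as stated in this post, not Aurora's actual implementation; the function and constant names are hypothetical, and it ignores the separate requirement of finding a scaling point.

```python
SCALE_DOWN_AFTER_SCALE_UP_SECS = 15 * 60   # 15-minute cooldown after a scale-up
SCALE_DOWN_AFTER_SCALE_DOWN_SECS = 310     # 310-second cooldown after a scale-down


def may_scale(direction, last_action, secs_since_last_action):
    """Return True if a scaling step in `direction` ('up' or 'down') is allowed.

    `last_action` is 'up', 'down', or None when no scaling has happened yet.
    """
    if direction == "up":
        return True  # scaling up has no cooldown period
    if last_action == "up":
        return secs_since_last_action >= SCALE_DOWN_AFTER_SCALE_UP_SECS
    if last_action == "down":
        return secs_since_last_action >= SCALE_DOWN_AFTER_SCALE_DOWN_SECS
    return True  # no previous scaling activity to cool down from


# Ten minutes after a scale-up, scaling down is still blocked; scaling up is not.
print(may_scale("down", "up", 600))  # False
print(may_scale("up", "up", 600))    # True
```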

AWS RDS Monitoring & Notification

  • RDS integrates with CloudWatch and provides metrics for monitoring
  • CloudWatch alarms can be created over a single metric that sends an SNS message when the alarm changes state
  • RDS also provides SNS notification whenever any RDS event occurs
  • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
  • RDS Recommendations provides automated recommendations for database resources.

RDS CloudWatch Monitoring

  • RDS DB instance can be monitored using CloudWatch, which collects and processes raw data from RDS into readable, near real-time metrics.
  • Statistics are recorded so that you can access historical information and gain a better perspective on how the service is performing.
  • By default, RDS metric data is automatically sent to CloudWatch in 1-minute periods
  • CloudWatch RDS Metrics
    • BinLogDiskUsage – Amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.
    • CPUUtilization – Percentage of CPU utilization.
    • DatabaseConnections – Number of database connections in use.
    • DiskQueueDepth – The number of outstanding IOs (read/write requests) waiting to access the disk.
    • FreeableMemory – Amount of available random access memory.
    • FreeStorageSpace – Amount of available storage space.
    • ReplicaLag – Amount of time a Read Replica DB instance lags behind the source DB instance.
    • SwapUsage – Amount of swap space used on the DB instance.
    • ReadIOPS – Average number of disk read I/O operations per second.
    • WriteIOPS – Average number of disk write I/O operations per second.
    • ReadLatency – Average amount of time taken per disk read I/O operation.
    • WriteLatency – Average amount of time taken per disk write I/O operation.
    • ReadThroughput – Average number of bytes read from disk per second.
    • WriteThroughput – Average number of bytes written to disk per second.
    • NetworkReceiveThroughput – Incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
    • NetworkTransmitThroughput – Outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
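
As an illustration of alarming on one of the metrics above, the sketch below builds the request parameters for a CloudWatch alarm on FreeStorageSpace. The helper name, the DB instance identifier, the threshold, and the SNS topic ARN are all hypothetical; the dictionary keys follow the CloudWatch PutMetricAlarm API, and the call itself is left as a comment because it needs boto3 and AWS credentials.

```python
def free_storage_alarm_params(db_instance_id, threshold_bytes, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"{db_instance_id}-low-free-storage",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 60,                 # RDS metrics arrive in 1-minute periods
        "EvaluationPeriods": 5,       # assumed: alarm after 5 consecutive breaches
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS message when the alarm fires
    }


# Hypothetical values: alarm when less than 10 GiB of storage remains.
params = free_storage_alarm_params(
    "mydb", 10 * 1024**3, "arn:aws:sns:us-east-1:123456789012:rds-alerts"
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"])
```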

RDS Enhanced Monitoring

  • RDS provides metrics in real-time for the operating system (OS) that the DB instance runs on.
  • By default, Enhanced Monitoring metrics are stored for 30 days in the CloudWatch Logs, which are different from typical CloudWatch metrics.

CloudWatch vs Enhanced Monitoring Metrics

  • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
  • Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
  • There might be differences between the measurements because the hypervisor layer performs a small amount of work. The differences can be greater if the DB instances use smaller instance classes because then there are likely more virtual machines (VMs) that are managed by the hypervisor layer on a single physical instance.

RDS Performance Insights

  • Performance Insights is a database performance tuning and monitoring feature that helps check the database’s performance and helps analyze any issues that affect it.
  • Database load is measured using a metric called Average Active Sessions or AAS which is calculated by sampling memory to determine the state of each active database connection.
  • AAS is the total number of sessions divided by the total number of samples for a specific time period.
  • Performance Insights help visualize the database load and filter the load by waits, SQL statements, hosts, or users.
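
The AAS definition above is simple arithmetic, sketched below. The sample values are made up for illustration; each entry is assumed to be the number of active sessions observed at one sampling instant.

```python
def average_active_sessions(active_sessions_per_sample):
    """AAS = total active sessions observed / number of samples taken."""
    if not active_sessions_per_sample:
        raise ValueError("at least one sample is required")
    return sum(active_sessions_per_sample) / len(active_sessions_per_sample)


# Hypothetical: five samples with 2, 4, 3, 0, and 1 active sessions.
samples = [2, 4, 3, 0, 1]
aas = average_active_sessions(samples)
print(aas)  # 10 sessions over 5 samples
```

A common way to read the result is to compare it with the number of vCPUs on the instance: a sustained AAS well above that number suggests the database is overloaded.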

RDS CloudTrail Logs

  • CloudTrail provides a record of actions taken by a user, role, or an AWS service in RDS.
  • CloudTrail captures all API calls for RDS as events, including calls from the console and from code calls to RDS API operations.
  • CloudTrail can help determine the request that was made to RDS, the IP address from which the request was made, who made the request, when it was made, and additional details.

RDS Recommendations

  • RDS provides automated recommendations for database resources.
  • The recommendations provide best practice guidance by analyzing DB instance configuration, usage, and performance data.

RDS Event Notification

  • RDS uses the SNS to provide notification when an RDS event occurs
  • RDS groups the events into categories, which can be subscribed so that a notification is sent when an event in that category occurs.
  • Event category for a DB instance, DB cluster, DB snapshot, DB cluster snapshot, DB security group, or for a DB parameter group can be subscribed
  • Event notifications are sent to the email addresses provided during subscription creation
  • Subscriptions can be easily turned off without deleting a subscription by setting the Enabled radio button to No in the RDS console or by setting the Enabled parameter to false using the CLI or RDS API.

RDS Trusted Advisor

  • Trusted Advisor inspects the AWS environment and then makes recommendations when opportunities exist to save money, improve system availability and performance, or help close security gaps.
  • Trusted Advisor has the following RDS-related checks:
    • RDS Idle DB Instances
    • RDS Security Group Access Risk
    • RDS Backups
    • RDS Multi-AZ

AWS Certification Exam Practice Questions

  1. You run a web application with the following components Elastic Load Balancer (ELB), 3 Web/Application servers, 1 MySQL RDS database with read replicas, and Amazon Simple Storage Service (Amazon S3) for static content. Average response time for users is increasing slowly. What three CloudWatch RDS metrics will allow you to identify if the database is the bottleneck? Choose 3 answers
    1. The number of outstanding IOs waiting to access the disk
    2. The amount of write latency
    3. The amount of disk space occupied by binary logs on the master.
    4. The amount of time a Read Replica DB Instance lags behind the source DB Instance
    5. The average number of disk I/O operations per second.
  2. Typically, you want your application to check whether a request generated an error before you spend any time processing results. The easiest way to find out if an error occurred is to look for an __________ node in the response from the Amazon RDS API.
    1. Incorrect
    2. Error
    3. FALSE
  3. In the Amazon CloudWatch, which metric should I be checking to ensure that your DB Instance has enough free storage space?
    1. FreeStorage
    2. FreeStorageSpace
    3. FreeStorageVolume
    4. FreeDBStorageSpace
  4. A user is receiving a notification from the RDS DB whenever there is a change in the DB security group. The user does not want to receive these notifications for only a month. Thus, he does not want to delete the notification. How can the user configure this?
    1. Change the Disable button for notification to “Yes” in the RDS console
    2. Set the send mail flag to false in the DB event notification console
    3. The only option is to delete the notification from the console
    4. Change the Enable button for notification to “No” in the RDS console
  5. A sys admin is planning to subscribe to the RDS event notifications. For which of the below mentioned source categories the subscription cannot be configured?
    1. DB security group
    2. DB snapshot
    3. DB options group
    4. DB parameter group
  6. A user is planning to setup notifications on the RDS DB for a snapshot. Which of the below mentioned event categories is not supported by RDS for this snapshot source type?
    1. Backup (Refer link)
    2. Creation
    3. Deletion
    4. Restoration
  7. A system admin is planning to setup event notifications on RDS. Which of the below mentioned services will help the admin setup notifications?
    1. AWS SES
    2. AWS Cloudtrail
    3. AWS CloudWatch
    4. AWS SNS
  8. A user has setup an RDS DB with Oracle. The user wants to get notifications when someone modifies the security group of that DB. How can the user configure that?
    1. It is not possible to get the notifications on a change in the security group
    2. Configure SNS to monitor security group changes
    3. Configure event notification on the DB security group
    4. Configure the CloudWatch alarm on the DB for a change in the security group
  9. It is advised that you watch the Amazon CloudWatch “_____” metric (available via the AWS Management Console or Amazon Cloud Watch APIs) carefully and recreate the Read Replica should it fall behind due to replication errors.
    1. Write Lag
    2. Read Replica
    3. Replica Lag
    4. Single Replica

Kubernetes Overview

  • Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.
  • Kubernetes originates from Greek, meaning helmsman or pilot.
  • Kubernetes provides an orchestration framework to run distributed systems resiliently. It takes care of scaling and failover for the application, provides deployment patterns, and more.

Container Deployment Model

Deployment evolution

  • Containers are similar to VMs, but they have relaxed isolation properties to share the Operating System (OS) among the applications.
  • Containers are lightweight and have their own filesystem, share of CPU, memory, process space, and more.
  • Containers are decoupled from the underlying infrastructure, they are portable across clouds and OS distributions.
  • Containers provide the following benefits
    • Agile application creation and deployment
    • Continuous development, integration, and deployment
    • Dev and Ops separation of concerns
    • Observability
    • Environmental consistency across development, testing, and production
    • Cloud and OS distribution portability
    • Application-centric management
    • Loosely coupled, distributed, elastic, liberated micro-services
    • Resource isolation & utilization

Kubernetes Features

  • Service discovery and load balancing
    • Kubernetes can expose a container using a DNS name or its own IP address.
    • If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable.
  • Storage orchestration
    • Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and more.
  • Automated rollouts and rollbacks
    • Kubernetes can change the actual state of the deployed containers to the desired state at a controlled rate ensuring zero downtime.
  • Automatic bin packing
    • Kubernetes can fit containers onto the available nodes to make the best use of the resources as per the specified container specification.
  • Self-healing & High Availability
    • Kubernetes restarts containers that fail, replaces containers, kills containers that don’t respond to the user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
  • Scalability
    • Kubernetes can help scale the application as per the load.
  • Secret and configuration management
    • Kubernetes helps store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys.
    • Secrets and application configuration can be deployed without rebuilding the container images, and without exposing secrets in the stack configuration.
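
To make the automatic bin packing feature listed above more concrete, here is a toy first-fit-decreasing placement on CPU requests only. It is not the kube-scheduler's algorithm (the real scheduler filters and scores nodes on many criteria); the pod names, node names, and millicore values are hypothetical.

```python
def first_fit_decreasing(pod_cpu_requests, node_cpu_capacity):
    """Place pods (CPU request in millicores) onto nodes, largest pod first.

    Returns {node_name: [pod_name, ...]}; raises if a pod fits on no node.
    """
    free = dict(node_cpu_capacity)                 # remaining CPU per node
    placement = {node: [] for node in free}
    for pod, cpu in sorted(pod_cpu_requests.items(), key=lambda kv: -kv[1]):
        for node in free:                          # first node with enough room
            if free[node] >= cpu:
                free[node] -= cpu
                placement[node].append(pod)
                break
        else:
            raise RuntimeError(f"no node can fit pod {pod!r}")
    return placement


pods = {"api": 500, "worker": 1500, "cache": 1000, "cron": 250}
nodes = {"node-a": 2000, "node-b": 2000}
placement = first_fit_decreasing(pods, nodes)
print(placement)
```

Placing the largest requests first tends to leave fewer unusable gaps on each node, which is the intuition behind packing containers according to their declared resource requests.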

Kubernetes Architecture

Refer to detailed blog post @ Kubernetes Architecture

Master components

  • Master components provide the cluster’s control plane.
  • Master components make global decisions about the cluster (for example, scheduling), and detect and respond to cluster events (for example, starting a new pod when a deployment’s replicas field is unsatisfied).
  • Master components include
    • Kube-API server – Exposes the API.
    • Etcd – key-value store that holds all cluster data. (Can be run on the same server as a master node or on a dedicated cluster.)
    • Kube-scheduler – Schedules new pods on worker nodes.
    • Kube-controller-manager – Runs the controllers.
    • Cloud-controller-manager – Talks to cloud providers.

Node components

  • Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment.
    • Kubelet – Agent that ensures containers in a pod are running.
    • Kube-proxy – Keeps network rules and performs forwarding.
    • Container runtime – Runs containers.

Kubernetes Components

Refer to blog post @ Kubernetes Components

Kubernetes Security

Refer to blog post @ Kubernetes Security

AWS ElastiCache

  • AWS ElastiCache is a managed web service that helps deploy and run Memcached or Redis protocol-compliant cache clusters in the cloud easily.
  • ElastiCache is available in two flavours: Memcached and Redis
  • ElastiCache helps
    • simplify and offload the management, monitoring, and operation of in-memory cache environments, enabling the engineering resources to focus on developing applications.
    • automate common administrative tasks required to operate a distributed cache environment.
    • improve the performance of web applications by allowing retrieval of information from a fast, managed, in-memory caching system, instead of relying entirely on slower disk-based databases.
    • improve load and response times to user actions and queries, and reduce the cost associated with scaling web applications.
    • automatically detect and replace failed cache nodes, providing a resilient system that mitigates the risk of overloaded databases, which can slow website and application load times.
    • provide enhanced visibility into key performance metrics associated with the cache nodes through integration with CloudWatch.
    • keep code, applications, and popular tools that already use Memcached or Redis working seamlessly, as it is protocol-compliant with both engines.
  • ElastiCache provides in-memory caching which can
    • significantly lower latency and improve throughput for many
      • read-heavy application workloads e.g. social networking, gaming, media sharing, and Q&A portals.
      • compute-intensive workloads such as a recommendation engine.
    • improve application performance by storing critical pieces of data in memory for low-latency access.
    • be used to cache the results of I/O-intensive database queries or the results of computationally-intensive calculations.
  • ElastiCache is designed to be accessed from within the AWS network (e.g. from EC2 instances) and is not directly accessible by default from outside networks such as on-premises servers.
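
The point above about caching the results of I/O-intensive database queries is commonly implemented with a cache-aside read. The following is a minimal sketch only: `InMemoryCache` is a hypothetical stand-in for a Memcached or Redis client, and `slow_database_query`, `get_user`, and the 300-second TTL are illustrative assumptions, not part of any ElastiCache API.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a Memcached/Redis client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def slow_database_query(user_id, call_log):
    """Placeholder for an I/O-intensive query; records each call so it can be counted."""
    call_log.append(user_id)
    return {"user_id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id, call_log, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = slow_database_query(user_id, call_log)
    cache.set(key, row, ttl_seconds)  # populate the cache for later readers
    return row


cache = InMemoryCache()
db_calls = []
first = get_user(cache, 42, db_calls)   # miss: goes to the database
second = get_user(cache, 42, db_calls)  # hit: served from the cache
print(len(db_calls))  # 1
```

In a real deployment the stand-in class would be replaced by a Memcached or Redis client pointed at the cluster endpoint; the read path stays the same.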

ElastiCache Redis vs Memcached

Redis

  • Redis is an open source, BSD licensed, advanced key-value cache & store.
  • ElastiCache enables the management, monitoring, and operation of a Redis node; creation, deletion, and modification of the node.
  • ElastiCache for Redis can be used as a primary in-memory key-value data store, providing fast, sub-millisecond data performance, high availability and scalability up to 16 nodes plus up to 5 read replicas, each of up to 3.55 TiB of in-memory data.
  • ElastiCache for Redis supports (similar to RDS features)
    • Redis Master/Slave replication.
    • Multi-AZ operation by creating read replicas in another AZ
    • Backup and Restore feature for persistence using snapshots
  • ElastiCache for Redis can be scaled vertically by selecting a larger node type, or horizontally by adding shards (with cluster mode enabled).
  • A parameter group can be specified for Redis at creation time; it acts as a “container” for Redis configuration values that can be applied to one or more Redis primary clusters.
  • Append Only File – AOF
    • provides persistence and can be enabled for recovery scenarios.
    • if a node restarts or service crashes, Redis will replay the updates from an AOF file, thereby recovering the data lost due to the restart or crash.
    • cannot protect against all failure scenarios, because if the underlying hardware fails, a new server would be provisioned and the AOF file will no longer be available to recover the data.
  • ElastiCache for Redis doesn’t support the AOF feature but you can achieve persistence by snapshotting the Redis data using the Backup and Restore feature.
  • Enabling Redis Multi-AZ is a better approach to fault tolerance, as failing over to a read replica is much faster than rebuilding the primary from an AOF file.

Redis Features

  • High Availability, Fault Tolerance & Auto Recovery
    • Multi-AZ with automatic failover from a failed primary to a read replica, in Redis clusters that support replication.
    • Fault Tolerance – Flexible AZ placement of nodes and clusters
    • High Availability – Primary instance and a synchronous secondary instance to fail over when problems occur. You can also use read replicas to increase read scaling.
    • Auto-Recovery – Automatic detection of and recovery from cache node failures.
    • Backup & Restore – Automated backups or manual snapshots can be performed. Redis restore process works reliably and efficiently.
  • Performance
    • Data Partitioning – Redis (cluster mode enabled) supports partitioning the data across up to 500 shards.
    • Data Tiering – Provides a price-performance option for Redis workloads by utilizing lower-cost solid state drives (SSDs) in each cluster node in addition to storing data in memory. It is ideal for workloads that access up to 20% of their overall dataset regularly, and for applications that can tolerate additional latency when accessing data on SSD.
  • Security
    • Encryption – Supports encryption in transit and encryption at rest, with authentication. This support helps you build HIPAA-compliant applications.
    • Access Control – Control access to the ElastiCache for Redis clusters by using AWS IAM to define users and permissions.
    • Supports Redis AUTH or Managed Role-Based Access Control (RBAC).
  • Administration
    • Low Administration – ElastiCache for Redis manages backups, software patching, automatic failure detection, and recovery.
    • Integration with other AWS services such as EC2, CloudWatch, CloudTrail, and SNS.
    • Global Datastore for Redis feature provides a fully managed, fast, reliable, and secure replication across AWS Regions. Cross-Region read replica clusters for ElastiCache for Redis can be created to enable low-latency reads and disaster recovery across AWS Regions.

Redis Read Replica

  • Read Replicas help with read scaling and with handling failures
  • Read Replicas are kept in sync with the Primary node using Redis’s asynchronous replication technology
  • Redis Read Replicas provide
    • Horizontal scaling beyond the compute or I/O capacity of a single primary node for read-heavy workloads.
    • Serving read traffic while the primary is unavailable, whether due to failure or maintenance
    • Data protection, by promoting a Read Replica to primary in case the primary node or the AZ of the primary node fails.
  • ElastiCache supports initiated or forced failover where it flips the DNS record for the primary node to point at the read replica, which is in turn promoted to become the new primary.
  • Within a single cluster, read replicas cannot span Regions and may only be provisioned in the same or a different AZ of the same Region as the primary node. Cross-Region replicas require the Global Datastore feature.

Redis Multi-AZ

  • ElastiCache for Redis shard consists of a primary and up to 5 read replicas
  • Redis asynchronously replicates the data from the primary node to the read replicas
  • ElastiCache for Redis Multi-AZ mode
    • provides enhanced availability and reduces administration, as node failover is automatic.
    • limits the impact on reads and writes to the primary to the time it takes for automatic failover to complete.
    • removes the need to monitor Redis nodes and manually initiate a recovery in the event of a primary node disruption.
  • During certain types of planned maintenance, or in the unlikely event of ElastiCache node failure or AZ failure, ElastiCache
    • automatically detects the failure,
    • selects the read replica with the smallest asynchronous replication lag to the primary and promotes it to become the new primary node,
    • propagates the DNS changes so that the primary endpoint remains the same.
  • If Multi-AZ is not enabled,
    • ElastiCache monitors the primary node.
    • in case the node becomes unavailable or unresponsive, it will repair the node by acquiring new service resources.
    • it propagates the DNS endpoint changes to redirect the node’s existing DNS name to point to the new service resources.
    • if the primary node cannot be healed, you can choose to promote one of the read replicas to be the new primary.
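
The replica selection rule described above (promote the read replica with the smallest replication lag) can be illustrated with a short sketch. This shows the rule only, not ElastiCache's internal implementation; the `Replica` fields, the node names, and the extra health filter are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # All field names here are hypothetical, chosen for the example.
    node_id: str
    availability_zone: str
    replication_lag_seconds: float
    healthy: bool = True


def choose_new_primary(replicas):
    """Return the healthy replica with the smallest replication lag, or None."""
    candidates = [r for r in replicas if r.healthy]  # assumption: skip unhealthy nodes
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.replication_lag_seconds)


replicas = [
    Replica("replica-a", "us-east-1a", 0.8),
    Replica("replica-b", "us-east-1b", 0.2),
    Replica("replica-c", "us-east-1c", 0.1, healthy=False),
]
promoted = choose_new_primary(replicas)
print(promoted.node_id)  # replica-b
```

Choosing the replica with the least lag keeps the amount of unreplicated data lost during failover as small as possible, since replication is asynchronous.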

Redis Backup & Restore

  • Backup and Restore allow users to create snapshots of the Redis clusters.
  • Snapshots can be used for recovery, restoration, archiving, or to warm-start an ElastiCache for Redis cluster with preloaded data
  • Snapshots can be created on a cluster basis and use Redis’ native mechanism to create and store an RDB file as the snapshot.
  • Taking a snapshot might briefly increase latency at the node, so it is recommended to take snapshots from a Read Replica to minimize the performance impact
  • Snapshots can be created either automatically (if configured) or manually
  • When an ElastiCache for Redis cluster is deleted, its automatic snapshots are removed. However, manual snapshots are retained.

Redis Cluster Mode

ElastiCache Redis provides the ability to create distinct types of Redis clusters

  • A Redis (cluster mode disabled) cluster
    • always has a single shard with up to 5 read replica nodes.
  • A Redis (cluster mode enabled) cluster
    • has up to 500 shards with 1 to 5 read replica nodes in each.

  • Scaling vs Partitioning
    • Redis (cluster mode disabled) supports Horizontal scaling for read capacity by adding or deleting replica nodes, or vertical scaling by scaling up to a larger node type.
    • Redis (cluster mode enabled) supports partitioning the data across up to 500 node groups. The number of shards can be changed dynamically as the demand changes. It also helps spread the load over a greater number of endpoints, which reduces access bottlenecks during peak demand.
  • Node Size vs Number of Nodes
    • Redis (cluster mode disabled) cluster has only one shard and the node type must be large enough to accommodate all the cluster’s data plus necessary overhead.
    • Redis (cluster mode enabled) cluster can have smaller node types as the data can be spread across partitions.
  • Reads vs Writes
    • Redis (cluster mode disabled) cluster can be scaled for reads by adding more read replicas (5 max)
    • Redis (cluster mode enabled) cluster can be scaled for both reads and writes by adding read replicas and multiple shards.
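
To make the partitioning idea concrete: open source Redis Cluster assigns every key to one of 16384 hash slots using CRC16 of the key, and each shard owns a subset of the slots. The sketch below computes the slot with Python's `binascii.crc_hqx` (which implements the same CRC16 variant when seeded with 0). The `shard_for_slot` helper and its equal contiguous slot ranges are an assumption for illustration; the actual slot distribution of a cluster is configurable.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_hash_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM variant) of the key, modulo 16384."""
    data = key.encode("utf-8")
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed,
    # so related keys can be forced into the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Assumed layout: slots split into equal contiguous ranges, one per shard."""
    return slot * shard_count // TOTAL_SLOTS


for key in ("user:1001", "{user:1001}:cart", "{user:1001}:profile"):
    slot = key_hash_slot(key)
    print(key, slot, shard_for_slot(slot, shard_count=3))
```

The two keys sharing the `{user:1001}` hash tag land in the same slot, and therefore on the same shard, which is what allows multi-key operations on them in a cluster mode enabled setup.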

Memcached

  • Memcached is an in-memory key-value store for small chunks of arbitrary data.
  • ElastiCache for Memcached can be used to cache a variety of objects
    • content from persistent data stores such as RDS, DynamoDB, or self-managed databases hosted on EC2
    • dynamically generated web pages e.g. with Nginx
    • transient session data that may not require a persistent backing store
  • ElastiCache for Memcached
    • can be scaled Vertically by increasing the node type size
    • can be scaled Horizontally by adding and removing nodes
    • does not support the persistence of data
  • ElastiCache for Memcached cluster can have
    • nodes that can span across multiple AZs within the same region
    • maximum of 20 nodes per cluster with a maximum of 100 nodes per region (soft limit and can be extended).
  • ElastiCache for Memcached supports auto-discovery, which enables the automatic discovery of cache nodes by clients when they are added to or removed from an ElastiCache cluster.

ElastiCache Mitigating Failures

  • ElastiCache deployments should be planned so that failures have a minimal impact on the application and data.
  • Mitigating Failures when Running Memcached
    • Mitigating Node Failures
      • spread the cached data over more nodes
      • as Memcached does not support replication, a node failure will always result in some data loss from the cluster
      • having more nodes will reduce the proportion of cache data lost
    • Mitigating Availability Zone Failures
      • locate the nodes in as many Availability Zones as possible, so that if an AZ fails only the data cached in that AZ is lost, not the data cached in the other AZs
  • Mitigating Failures when Running Redis
    • Mitigating Cluster Failures
      • Redis Append Only Files (AOF)
        • enable AOF so whenever data is written to the Redis cluster, a corresponding transaction record is written to a Redis AOF.
        • when the Redis process restarts, ElastiCache creates and provisions a replacement cluster and repopulates it with data from the AOF.
        • this is time-consuming.
        • AOF can get big.
        • Using AOF cannot protect you from all failure scenarios.
      • Redis Replication Groups
        • A Redis replication group is comprised of a single primary cluster which the application can both read from and write to, and from 1 to 5 read-only replica clusters.
        • Data written to the primary cluster is also asynchronously updated on the read replica clusters.
        • When a Read Replica fails, ElastiCache detects the failure, replaces the instance in the same AZ, and synchronizes with the Primary Cluster.
        • When Redis Multi-AZ with Automatic Failover is enabled, ElastiCache detects Primary cluster failure and promotes the read replica with the least replication lag to primary.
        • When Multi-AZ with Automatic Failover is disabled, ElastiCache detects Primary cluster failure, creates a new Primary, and syncs it with one of the existing replicas.
    • Mitigating Availability Zone Failures
      • locate the clusters in as many availability zones as possible
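
The Memcached guidance above (more nodes means a smaller share of the cache is lost when one node fails) can be checked with a rough simulation. This is a sketch under simplifying assumptions: keys are placed with a plain hash-modulo scheme, whereas real Memcached clients often use consistent hashing, and `node_for_key`, `fraction_lost`, and the key names are hypothetical.

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Simple modulo placement of a key on one of node_count nodes (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def fraction_lost(keys, node_count: int, failed_node: int = 0) -> float:
    """Share of keys that lived on the failed node and are therefore lost."""
    lost = sum(1 for k in keys if node_for_key(k, node_count) == failed_node)
    return lost / len(keys)


keys = [f"session:{i}" for i in range(10_000)]
for n in (2, 5, 10, 20):
    # Expect roughly 1/n of the keys to be lost when a single node fails.
    print(n, round(fraction_lost(keys, n), 3))
```

With evenly distributed keys, a single node failure loses roughly 1/N of the cached data, so going from 2 to 10 nodes cuts the expected loss from about half of the cache to about a tenth.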

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. What does Amazon ElastiCache provide?
    1. A service by this name doesn’t exist. Perhaps you mean Amazon CloudCache.
    2. A virtual server with a huge amount of memory.
    3. A managed In-memory cache service
    4. An Amazon EC2 instance with the Memcached software already pre-installed.
  2. You are developing a highly available web application using stateless web servers. Which services are suitable for storing session state data? Choose 3 answers.
    1. Elastic Load Balancing
    2. Amazon Relational Database Service (RDS)
    3. Amazon CloudWatch
    4. Amazon ElastiCache
    5. Amazon DynamoDB
    6. AWS Storage Gateway
  3. Which statement best describes ElastiCache?
    1. Reduces the latency by splitting the workload across multiple AZs
    2. A simple web services interface to create and store multiple data sets, query your data easily, and return the results
    3. Offload the read traffic from your database in order to reduce latency caused by read-heavy workload
    4. Managed service that makes it easy to set up, operate and scale a relational database in the cloud
  4. Our company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
    1. Deploy ElastiCache in-memory cache running in each availability zone
    2. Implement sharding to distribute load to multiple RDS MySQL instances
    3. Increase the RDS MySQL Instance size and Implement provisioned IOPS
    4. Add an RDS MySQL read replica in each availability zone
  5. You are using ElastiCache Memcached to store session state and cache database queries in your infrastructure. You notice in CloudWatch that Evictions and Get Misses are both very high. What two actions could you take to rectify this? Choose 2 answers
    1. Increase the number of nodes in your cluster
    2. Tweak the max_item_size parameter
    3. Shrink the number of nodes in your cluster
    4. Increase the size of the nodes in the cluster
  6. You have been tasked with moving an ecommerce web application from a customer’s datacenter into a VPC. The application must be fault tolerant as well as highly scalable. Moreover, the customer is adamant that service interruptions not affect the user experience. As you near launch, you discover that the application currently uses multicast to share session state between web servers. In order to handle session state within the VPC, you choose to:
    1. Store session state in Amazon ElastiCache for Redis (scalable and makes the web applications stateless)
    2. Create a mesh VPN between instances and allow multicast on it
    3. Store session state in Amazon Relational Database Service (RDS solution not highly scalable)
    4. Enable session stickiness via Elastic Load Balancing (affects user experience if the instance goes down)
  7. When you are designing to support a 24-hour flash sale, which one of the following methods best describes a strategy to lower the latency while keeping up with unusually heavy traffic?
    1. Launch enhanced networking instances in a placement group to support the heavy traffic (only improves internal communication)
    2. Apply Service Oriented Architecture (SOA) principles instead of a 3-tier architecture (just simplifies architecture)
    3. Use Elastic Beanstalk to enable blue-green deployment (only minimizes downtime for applications and eases rollback)
    4. Use ElastiCache as in-memory storage on top of DynamoDB to store user sessions (scalable, faster read/writes and in memory storage)
  8. You are configuring your company’s application to use Auto Scaling and need to move user state information. Which of the following AWS services provides a shared data store with durability and low latency?
    1. AWS ElastiCache Memcached (does not provide durability as if the node is gone the data is gone)
    2. Amazon Simple Storage Service
    3. Amazon EC2 instance storage
    4. Amazon DynamoDB
  9. Your application is using an ELB in front of an Auto Scaling group of web/application servers deployed across two AZs and a Multi-AZ RDS Instance for data persistence. The database CPU is often above 80% usage and 90% of I/O operations on the database are reads. To improve performance you recently added a single-node Memcached ElastiCache Cluster to cache frequent DB query results. In the next weeks the overall workload is expected to grow by 30%. Do you need to change anything in the architecture to maintain the high availability for the application with the anticipated additional load and Why?
    1. You should deploy two Memcached ElastiCache Clusters in different AZs because the RDS Instance will not be able to handle the load if the cache node fails.
    2. If the cache node fails the automated ElastiCache node recovery feature will prevent any availability impact. (does not provide high availability, as data is lost if the node is lost)
    3. Yes you should deploy the Memcached ElastiCache Cluster with two nodes in the same AZ as the RDS DB master instance to handle the load if one cache node fails. (Single AZ affects availability as DB is Multi AZ and would be overloaded if the AZ goes down)
    4. No if the cache node fails you can always get the same data from the DB without having any availability impact. (Will overload the database affecting availability)
  10. A read only news reporting site with a combined web and application tier and a database tier that receives large and unpredictable traffic demands must be able to respond to these traffic fluctuations automatically. What AWS services should be used to meet these requirements?
    1. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch and RDS with read replicas.
    2. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch and RDS with read replicas (Stateful instances will not allow for scaling)
    3. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch and multi-AZ RDS (Stateful instances will not allow for scaling & multi-AZ is for high availability and not scaling)
    4. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch and multi-AZ RDS (multi-AZ is for high availability and not scaling)
  11. You have written an application that uses the Elastic Load Balancing service to spread traffic to several web servers. Your users complain that they are sometimes forced to login again in the middle of using your application, after they have already logged in. This is not behavior you have designed. What is a possible solution to prevent this happening?
    1. Use instance memory to save session state.
    2. Use instance storage to save session state.
    3. Use EBS to save session state.
    4. Use ElastiCache to save session state.
    5. Use Glacier to save session state.