AWS Web Application Firewall – WAF

  • AWS WAF – Web Application Firewall protects web applications from attacks by letting you configure rules that allow, block, or monitor (count) web requests based on defined conditions.
  • helps protect against common attack techniques like SQL injection and Cross-Site Scripting (XSS). Conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
  • tightly integrates with CloudFront, API Gateway, AppSync, and the Application Load Balancer (ALB) services used to deliver content for websites and applications.
    • AWS WAF with Amazon CloudFront
      • AWS WAF rules run in all AWS Edge Locations, located around the world close to the end users.
      • Blocked requests are stopped before they reach the web servers.
      • Helps support custom origins outside of AWS.
    • AWS WAF with Application Load Balancer
      • WAF rules run in the region and can be used to protect internet-facing as well as internal load balancers.
    • AWS WAF with API Gateway
      • Can help secure and protect the REST APIs.
  • helps protect applications and can inspect web requests transmitted over HTTP or HTTPS.
  • provides Managed Rules, which are pre-configured rules that protect applications from common threats such as OWASP Top 10 application vulnerabilities, bots, or Common Vulnerabilities and Exposures (CVEs).
  • logs can be sent to a CloudWatch Logs log group, an S3 bucket, or Kinesis Data Firehose.

WAF Benefits

  • Additional protection against web attacks using specified conditions
  • Conditions can be defined by using characteristics of web requests such as the following:
    • IP addresses that the requests originate from
    • Values in request headers
    • Strings that appear in the requests
    • Length of requests
    • Presence of SQL code that is likely to be malicious (this is known as SQL injection)
    • Presence of a script that is likely to be malicious (this is known as cross-site scripting)
  • Managed Rules to get you started quickly
  • Rules that you can reuse for multiple web applications
  • Real-time metrics and sampled web requests
  • Automated administration using the WAF API

How WAF Works

WAF allows controlling the behaviour of web requests by creating conditions, rules, and web access control lists (web ACLs).

Conditions

  • Conditions define basic characteristics to watch for in a web request
    • Malicious script – XSS  (Cross Site Scripting) – Attackers embed scripts that can exploit vulnerabilities in web applications
    • IP addresses or address ranges that requests originate from.
    • Size – Length of specified parts of the request, such as the query string.
    • Malicious SQL – SQL injection – Attackers try to extract data from the database by embedding malicious SQL code in a web request
    • Geographic match – Allow or block requests based on the country from which the requests originate.
    • Strings that appear in the request, e.g., values that appear in the User-Agent header or text strings that appear in the query string.
      Some conditions take multiple values.

Actions

  • Allow all requests except the ones specified – blacklisting, e.g., block requests from the specified IP addresses while allowing all others
  • Block all requests except the ones specified – whitelisting, e.g., allow only requests originating from the specified IP addresses
  • Monitor (Count) the requests that match the specified properties – allows counting of the requests that match the defined properties, which is useful when testing new properties before using them to allow or block requests. After confirming that the configuration does not accidentally block all of the traffic to the website, it can be changed to allow or block the matching requests.
  • CAPTCHA – runs a CAPTCHA check against the request.

Rules

  • An AWS WAF rule defines how to inspect HTTP(S) web requests and the action to take on a request when it matches the inspection criteria.
  • Each rule requires one top-level rule statement, which might contain nested statements at any depth, depending on the rule and statement type.
  • AWS WAF also supports logical statements for AND, OR, and NOT that are used to combine statements in a rule, e.g.:
    • based on recent requests seen from an attacker, you might create a rule that includes the following conditions combined with a logical AND:
      • The requests come from 192.0.2.44.
      • They contain the value BadBot in the User-Agent header.
      • They appear to include malicious SQL code in the query string.
    • All 3 conditions must be satisfied for the rule to match and the associated action to be taken.
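
A minimal sketch of how such an AND rule could be expressed as a WAFv2 rule definition (the kind of dict passed in the Rules list of a Web ACL via boto3); the IP set ARN, rule name, and priority are placeholder assumptions:

```python
# Sketch of a WAFv2 rule combining the three conditions with a logical AND.
# The IP set ARN, rule name, and priority are placeholder assumptions.
rule = {
    "Name": "block-badbot-sqli",
    "Priority": 0,
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "BlockBadBotSqli",
    },
    "Statement": {
        "AndStatement": {
            "Statements": [
                # 1. Requests come from 192.0.2.44 (via a pre-created IP set)
                {"IPSetReferenceStatement": {"ARN": "arn:aws:wafv2:...:ipset/bad-ips/..."}},
                # 2. The User-Agent header contains the value BadBot
                {"ByteMatchStatement": {
                    "SearchString": b"BadBot",
                    "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
                    "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                    "PositionalConstraint": "CONTAINS",
                }},
                # 3. The query string appears to include malicious SQL code
                {"SqliMatchStatement": {
                    "FieldToMatch": {"QueryString": {}},
                    "TextTransformations": [{"Priority": 0, "Type": "URL_DECODE"}],
                }},
            ]
        }
    },
}
```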

Rule Groups

  • A Rule Group is a reusable set of rules that can be added to a Web ACL.
  • Rule groups fall into the following main categories
    • Managed rule groups, which AWS (AWS Managed Rules) and AWS Marketplace sellers create and maintain for you
    • Your own rule groups, which you create and maintain
    • Rule groups that are owned and managed by other services, like AWS Firewall Manager and Shield Advanced.

Web ACLs – Access Control Lists

  • A Web Access Control List – Web ACL provides fine-grained control over all of the HTTP(S) web requests that the protected resource responds to.
  • A Web ACL provides
    • Rule Groups OR a combination of Rules
    • Action – allow, block or count to perform for each rule
      • WAF compares a request with the rules in a web ACL in the order in which they are listed and takes the action that is associated with the first rule that the request matches.
      • For multiple rules in a web ACL, WAF evaluates each request against the rules in the order they are listed in the web ACL.
      • When a web request matches all of the conditions in a rule, WAF immediately takes the action – allow or block – and doesn’t evaluate the request against the remaining rules in the web ACL, if any.
    • Default action
      • determines whether WAF allows or blocks a request that does not match all of the conditions in any of the rules
  • Supports criteria like the following to allow or block requests
    • IP address origin of the request
    • Country of origin of the request
    • String match or regular expression (regex) match in a part of the request
    • Size of a particular part of the request
    • Detection of malicious SQL code or scripting
    • Rate-based rules
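
A minimal sketch of creating a Web ACL with boto3 (names, scope, and metric names are assumptions); rules are evaluated in priority order and the default action applies when no rule matches:

```python
import boto3

# For Scope="CLOUDFRONT" the client must use us-east-1; use "REGIONAL" for ALB/API Gateway.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

response = wafv2.create_web_acl(
    Name="demo-web-acl",                  # placeholder name
    Scope="CLOUDFRONT",
    DefaultAction={"Allow": {}},          # default action when no rule matches
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,                # rules are evaluated in priority order
            "Action": {"Block": {}},
            "Statement": {
                "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
            },
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "RateLimitPerIP",
            },
        },
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "DemoWebACL",
    },
)
print(response["Summary"]["ARN"])         # ARN used to associate the Web ACL with a resource
```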

AWS WAF based Architecture

AWS WAF Blacklist Example
  1. AWS WAF integration with CloudFront and Lambda to dynamically update WAF rules
  2. As CloudFront receives requests on behalf of the web application, it sends access logs, which contain detailed information about the requests, to an S3 bucket.
  3. For every new access log stored in the S3 bucket, a Lambda function is triggered. The Lambda function parses the log files and looks for requests that resulted in error codes 400, 403, 404, and 405.
  4. The Lambda function then counts the number of bad requests and temporarily stores the results in the S3 bucket.
  5. The Lambda function updates AWS WAF rules to block those IP addresses for a period of time that you specify.
  6. After this blocking period has expired, AWS WAF allows those IP addresses to access your application again, but continues to monitor the requests from those IP addresses.
  7. Lambda function publishes execution metrics in CloudWatch, such as the number of requests analyzed and IP addresses blocked.
  8. CloudWatch metrics can be integrated with SNS for notifications
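
A minimal sketch of the rule-update step (step 5) as a Lambda handler using the current WAFv2 API; the IP set name/ID and the way offending IPs are derived are placeholder assumptions:

```python
import boto3

wafv2 = boto3.client("wafv2")

# Placeholder identifiers for a pre-created IP set referenced by a blocking rule.
IP_SET_NAME = "blocked-ips"
IP_SET_ID = "00000000-0000-0000-0000-000000000000"
SCOPE = "REGIONAL"  # or "CLOUDFRONT"

def block_ips(offending_ips):
    """Add offending client IPs (as /32 CIDRs) to the IP set used by a Block rule."""
    current = wafv2.get_ip_set(Name=IP_SET_NAME, Scope=SCOPE, Id=IP_SET_ID)
    addresses = set(current["IPSet"]["Addresses"])
    addresses.update(f"{ip}/32" for ip in offending_ips)
    wafv2.update_ip_set(
        Name=IP_SET_NAME,
        Scope=SCOPE,
        Id=IP_SET_ID,
        Addresses=sorted(addresses),
        LockToken=current["LockToken"],   # optimistic-locking token required for updates
    )

def handler(event, context):
    # In the real flow the IPs would come from parsing the CloudFront access logs
    # delivered to S3; a hard-coded example list is used here.
    block_ips(["192.0.2.44"])
```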

Web Application Firewall Sandwich Architecture

NOTE: This architecture is from the DDoS Resiliency whitepaper – it does not use AWS WAF and is no longer valid.

  • DDoS attacks at the application layer commonly target web applications with lower volumes of traffic compared to infrastructure attacks.
  • WAF can be included as part of the infrastructure to mitigate these types of attacks
  • WAFs act as filters that apply a set of rules to web traffic, which cover exploits like XSS and SQL injection but can also help build resiliency against DDoS by mitigating HTTP GET or POST floods.
  • HTTP works as a request-response protocol between end users and applications where end users request data (GET) and submit data to be processed (POST). GET floods work by requesting the same URL at a high rate or requesting all objects from your application. POST floods work by finding expensive application processes, i.e., logins or database searches, and triggering those processes to overwhelm your application.
  • WAFs have several features that may prevent these types of attacks from affecting the application availability, e.g., HTTP rate limiting, which limits the number of requests per end user within a certain time period. Once the threshold is exceeded, WAFs can block or buffer new requests to ensure other end users have access to the application.
  • WAFs can also inspect HTTP requests and identify those that don’t conform to normal patterns
  • In the “WAF sandwich,” the EC2 instances running the WAF software (not the AWS WAF service) are included in an Auto Scaling group and placed in between two ELB load balancers. The load balancer in the default VPC is the frontend, public-facing load balancer that distributes all incoming traffic to the WAF EC2 instances.
  • With the WAF sandwich pattern, the Auto Scaling group can add additional WAF EC2 instances should the traffic spike to elevated levels.
  • Once the traffic has been inspected and filtered, the WAF EC2 instances forward traffic to the internal, backend load balancer, which then distributes traffic across the application EC2 instances.
  • This configuration allows the WAF EC2 instances to scale and meet capacity demands without affecting the availability of the application EC2 instances.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. The Web Application Development team is worried about malicious activity from 200 random IP addresses. Which action will ensure security and scalability from this type of threat?
    1. Use inbound security group rules to block the IP addresses.
    2. Use inbound network ACL rules to block the IP addresses.
    3. Use AWS WAF to block the IP addresses.
    4. Write iptables rules on the instance to block the IP addresses.
  2. You’ve been hired to enhance the overall security posture for a very large e-commerce site. They have a well architected multi-tier application running in a VPC that uses ELBs in front of both the web and the app tier with static assets served directly from S3. They are using a combination of RDS and DynamoDB for their dynamic data and then archiving nightly into S3 for further processing with EMR. They are concerned because they found questionable log entries and suspect someone is attempting to gain unauthorized access. Which approach provides a cost effective scalable mitigation to this kind of attack? [Old Exam Question]
    1. Recommend that they lease space at a Direct Connect partner location and establish a 1G Direct Connect connection to their VPC. They would then establish Internet connectivity into their space, filter the traffic through a hardware Web Application Firewall (WAF), and then pass the traffic through the Direct Connect connection into their application running in their VPC. (Not cost effective)
    2. Add previously identified hostile source IPs as an explicit INBOUND DENY NACL to the web tier subnet. (does not protect against new source)
    3. Add a WAF tier by creating a new ELB and an AutoScaling group of EC2 Instances running a host-based WAF. They would redirect Route 53 to resolve to the new WAF tier ELB. The WAF tier would then pass the traffic to the current web tier. Web tier Security Groups would be updated to only allow traffic from the WAF tier Security Group
    4. Remove all but TLS 1.2 from the web tier ELB and enable Advanced Protocol Filtering. This will enable the ELB itself to perform WAF functionality. (No advanced protocol filtering in ELB)

References

AWS_WAF

AWS Identity Services Cheat Sheet

AWS Identity and Security Services

IAM – Identity & Access Management

  • securely control access to AWS services and resources
  • helps create and manage user identities and grant permissions for those users to access AWS resources
  • helps create groups for multiple users with similar permissions
  • not appropriate for application authentication
  • is Global and does not need to be migrated to a different region
  • helps define Policies,
    • in JSON format
    • all permissions are implicitly denied by default
    • most restrictive policy wins
  • IAM Role
    • helps grant and delegate access to users and services without the need to create permanent credentials
    • IAM users or AWS services can assume a role to obtain temporary security credentials that can be used to make AWS API calls
    • needs a Trust policy to define who can assume the role and a Permission policy to define what the user or service can access (a minimal sketch follows this list)
    • used with Security Token Service (STS), a lightweight web service that provides temporary, limited privilege credentials for IAM users or for authenticated federated users
    • IAM role scenarios
      • Service access for e.g. EC2 to access S3 or DynamoDB
      • Cross Account access for users
        • with users within the same account
        • with users in another AWS account owned by the same owner
        • with users from a third-party AWS account, with an External ID for enhanced security
      • Identity Providers & Federation
        • AssumeRoleWithWebIdentity – Web Identity Federation, where the user can be authenticated using external authentication identity providers like Amazon, Google, or any OpenID IdP
        • AssumeRoleWithSAML – Identity Provider using SAML 2.0, where the user can be authenticated using on-premises Active Directory, OpenLDAP, or any SAML 2.0 compliant IdP
        • AssumeRole (recommended) or GetFederationToken – For other Identity Providers, use Identity Broker to authenticate and provide temporary Credentials
  • IAM Best Practices
    • Do not use Root account for anything other than billing
    • Create Individual IAM users
    • Use groups to assign permissions to IAM users
    • Grant least privilege
    • Use IAM roles for applications on EC2
    • Delegate using roles instead of sharing credentials
    • Rotate credentials regularly
    • Use Policy conditions for increased granularity
    • Use CloudTrail to keep a history of activity
    • Enforce a strong IAM password policy for IAM users
    • Remove all unused users and credentials
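
A minimal cross-account sketch with boto3 showing the Trust policy (who can assume the role), the Permission policy (what it can access), and STS AssumeRole returning temporary credentials; the account ID, names, and External ID are placeholder assumptions:

```python
import json
import boto3

iam = boto3.client("iam")
sts = boto3.client("sts")

# Trust policy: defines WHO can assume the role (a third-party account, with an External ID).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},   # placeholder account
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}},
    }],
}

role = iam.create_role(
    RoleName="cross-account-s3-readonly",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Permission policy: defines WHAT the role can access once assumed.
iam.put_role_policy(
    RoleName="cross-account-s3-readonly",
    PolicyName="s3-read",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Action": ["s3:GetObject", "s3:ListBucket"],
                       "Resource": "*"}],
    }),
)

# The trusted (third-party) account then obtains temporary credentials via STS.
creds = sts.assume_role(
    RoleArn=role["Role"]["Arn"],
    RoleSessionName="third-party-session",
    ExternalId="example-external-id",
)["Credentials"]   # contains AccessKeyId, SecretAccessKey, SessionToken, Expiration
```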

AWS Directory Services

  • gives applications in AWS access to Active Directory services
  • different from SAML + AD, where the access is granted to AWS services through Temporary Credentials
  • Simple AD
    • least expensive but does not support Microsoft AD advanced features
    • provides a Samba 4 Microsoft Active Directory compatible standalone directory service on AWS
    • No single point of Authentication or Authorization, as a separate copy is maintained
    • trust relationships cannot be setup between Simple AD and other Active Directory domains
    • Don’t use it, if the requirement is to leverage access and control through centralized authentication service
  • AD Connector
    • acts as a hosted proxy service for instances in AWS to connect to the on-premises Active Directory
    • enables consistent enforcement of existing security policies, such as password expiration, password history, and account lockouts, whether users are accessing resources on-premises or in the AWS cloud
    • needs VPN connectivity (or Direct Connect)
    • integrates with existing RADIUS-based MFA solutions to enable multi-factor authentication
    • does not cache data which might lead to latency
  • Read-only Domain Controllers (RODCs)
    • works out as a Read-only Active Directory
    • holds a copy of the Active Directory Domain Services (AD DS) database and responds to authentication requests
    • they cannot be written to and are typically deployed in locations where physical security cannot be guaranteed
    • helps maintain a single point of authentication & authorization control; however, it needs to be synced
  • Writable Domain Controllers
    • are expensive to setup
    • operate in a multi-master model; changes can be made on any writable server in the forest, and those changes are replicated to servers throughout the entire forest

AWS Single Sign-On SSO

  • is a cloud-based single sign-on (SSO) service that makes it easy to centrally manage SSO access to all of the AWS accounts and cloud applications.
  • helps manage access and permissions to commonly used third-party software as a service (SaaS) applications, AWS SSO-integrated applications as well as custom applications that support SAML 2.0.
  • includes a user portal where the end-users can find and access all their assigned AWS accounts, cloud applications, and custom applications in one place.

AWS Security Services Cheat Sheet

AWS Identity and Security Services

Key Management Service – KMS

  • is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
  • provides a highly available key storage, management, and auditing solution to encrypt the data across AWS services & within applications.
  • uses hardware security modules (HSMs) validated under the FIPS 140-2 Cryptographic Module Validation Program to protect the KMS keys.
  • seamlessly integrates with several AWS services to make encrypting data in those services easy.
  • supports multi-region keys, which are AWS KMS keys in different AWS Regions. Multi-Region keys are not global and each multi-region key needs to be replicated and managed independently.
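
A minimal sketch of direct encrypt/decrypt with a KMS key using boto3 (the key alias is a placeholder assumption; direct KMS encryption is limited to small payloads of up to 4 KB):

```python
import boto3

kms = boto3.client("kms")

# Placeholder key alias; in practice reference an existing KMS key in the account.
key_id = "alias/my-app-key"

ciphertext = kms.encrypt(KeyId=key_id, Plaintext=b"db-password")["CiphertextBlob"]

# Decrypt does not need the key id; KMS derives it from the ciphertext metadata.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
assert plaintext == b"db-password"
```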

CloudHSM

  • provides secure cryptographic key storage to customers by making hardware security modules (HSMs) available in the AWS cloud
  • helps manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs.
  • single tenant, dedicated physical device to securely generate, store, and manage cryptographic keys used for data encryption
  • are inside the VPC (not EC2-classic) & isolated from the rest of the network
  • can use VPC peering to connect to CloudHSM from multiple VPCs
  • integrated with Amazon Redshift and Amazon RDS for Oracle
  • EBS volume encryption, S3 object encryption and key management can be done with CloudHSM but requires custom application scripting
  • a single HSM is NOT fault-tolerant, so build a cluster; if the only HSM fails, all the keys are lost
  • enables quick scaling by adding and removing HSM capacity on-demand, with no up-front costs.
  • automatically load balances requests and securely duplicates keys stored in any HSM to all of the other HSMs in the cluster.
  • expensive; prefer AWS Key Management Service (KMS) if cost is a criterion.

AWS WAF

  • is a web application firewall that helps monitor the HTTP/HTTPS traffic and allows controlling access to the content.
  • helps protect web applications from attacks by letting you configure rules that allow, block, or monitor (count) web requests based on defined conditions. These conditions include IP addresses, HTTP headers, HTTP body, URI strings, SQL injection, and cross-site scripting.
  • helps define Web ACLs, which are combinations of Rules, where each Rule is a combination of Conditions and an Action to block or allow
  • integrated with CloudFront, Application Load Balancer (ALB), API Gateway services commonly used to deliver content and applications
  • supports custom origins outside of AWS, when integrated with CloudFront
  • Third Party WAF
    • act as filters that apply a set of rules to web traffic to cover exploits like XSS and SQL injection and also help build resiliency against DDoS by mitigating HTTP GET or POST floods
    • third-party WAFs provide a lot of features like OWASP Top 10 protection, HTTP rate limiting, whitelisting or blacklisting, inspecting and identifying requests with abnormal patterns, CAPTCHA, etc.
    • a WAF sandwich pattern can be implemented where an auto-scaled WAF tier sits between the Internet-facing and the internal load balancer

AWS Secrets Manager

  • helps protect secrets needed to access applications, services, and IT resources.
  • enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.
  • secure secrets by encrypting them with encryption keys managed using AWS KMS.
  • offers native secret rotation with built-in integration for RDS, Redshift, and DocumentDB.
  • supports Lambda functions to extend secret rotation to other types of secrets, including API keys and OAuth tokens.
  • supports IAM and resource-based policies for fine-grained access control to secrets and centralized secret rotation audit for resources in the AWS Cloud, third-party services, and on-premises.
  • enables secret replication in multiple AWS regions to support multi-region applications and disaster recovery scenarios.
  • supports private access using VPC Interface endpoints
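
A minimal sketch of retrieving a database credential at runtime with boto3 (the secret name and JSON keys are placeholder assumptions):

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

# Placeholder secret name; RDS-managed secrets store credentials as a JSON string.
secret = secrets.get_secret_value(SecretId="prod/myapp/db")
creds = json.loads(secret["SecretString"])

username, password = creds["username"], creds["password"]
```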

AWS Shield

  • is a managed service that provides protection against Distributed Denial of Service (DDoS) attacks for applications running on AWS
  • provides protection for all AWS customers against common and most frequently occurring infrastructure (layer 3 and 4) attacks like SYN/UDP floods, reflection attacks, and others to support high availability of applications on AWS.
  • provides AWS Shield Advanced with additional protections against more sophisticated and larger attacks for applications running on EC2, ELB, CloudFront, AWS Global Accelerator, and Route 53.

AWS GuardDuty

  • offers threat detection that enables continuous monitoring and protects the AWS accounts and workloads.
  • is a Regional service
  • analyzes continuous streams of meta-data generated from AWS accounts and network activity found in AWS CloudTrail Events, EKS audit logs, VPC Flow Logs, and DNS Logs.
  • integrated threat intelligence
  • combines machine learning, anomaly detection, network monitoring, and malicious file discovery, utilizing both AWS-developed and industry-leading third-party sources to help protect workloads and data on AWS
  • supports suppression rules, trusted IP lists, and threat lists.
  • provides Malware Protection to detect malicious files on EBS volumes
  • operates completely independently from the resources so there is no risk of performance or availability impacts on the workloads.

Amazon Inspector

  • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities
  • automatically discovers and scans EC2 instances and container images residing in Elastic Container Registry (ECR) for software vulnerabilities and unintended network exposure.
  • creates a finding, when a software vulnerability or network issue is discovered, that describes the vulnerability, rates its severity, identifies the affected resource,  and provides remediation guidance.
  • is a Regional service.
  • requires Systems Manager (SSM) agent to be installed and enabled.

Amazon Detective

  • helps analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities.
  • automatically collects log data from the AWS resources and uses machine learning, statistical analysis, and graph theory to build a linked set of data to easily conduct faster and more efficient security investigations.
  • enables customers to view summaries and analytical data associated with CloudTrail logs, EKS audit logs, VPC Flow Logs.
  • provides detailed summaries, analysis, and visualizations of the behaviors and interactions amongst your AWS accounts, EC2 instances, AWS users, roles, and IP addresses.
  • maintains up to a year of aggregated data
  • is a Regional service and needs to be enabled on a region-by-region basis.
  • is a multi-account service that aggregates data from monitored member accounts under a single administrative account within the same region.
  • has no impact on the performance or availability of the AWS infrastructure since it retrieves the log data and findings directly from the AWS services.

AWS Security Hub

  • a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.
  • collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues.
  • is Regional but supports cross-region aggregation of findings.
  • automatically runs continuous, account-level configuration and security checks based on AWS best practices and industry standards which include CIS Foundations, PCI DSS.
  • consolidates the security findings across accounts and provider products and displays results on the Security Hub console.
  • supports integration with Amazon EventBridge. Custom actions can be defined when a finding is received.
  • has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
  • works with AWS Config to perform most of its security checks for controls

AWS Artifact

  • is a self-service audit artifact retrieval portal that provides customers with on-demand access to AWS’ compliance documentation and agreements
  • can use AWS Artifact Reports to download AWS security and compliance documents, such as AWS ISO certifications, Payment Card Industry (PCI), and System and Organization Control (SOC) reports.

References

AWS_Security_Products

AWS Security Hub

  • AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.
  • collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues.
  • is Regional and only receives and processes findings from the Region where the Security Hub is enabled. But, it supports cross-region aggregation of findings via designation of an aggregator region. Security Hub in each region must be enabled to view findings in that region.
  • automatically runs continuous, account-level configuration and security checks based on AWS best practices and industry standards which include
    • CIS AWS Foundations
    • Payment Card Industry Data Security Standard (PCI DSS)
    • AWS Foundational Security Best Practices
  • can consume, aggregate, organize and prioritize findings from
    • AWS services like Amazon GuardDuty, Amazon Inspector, and Amazon Macie
    • other supported third-party partner products.
  • consolidates the security findings across accounts and provider products and displays results on the Security Hub console.
  • supports integration with Amazon EventBridge. Custom actions can be defined when a finding is received.
  • only detects and consolidates findings that are generated after the Security Hub is enabled.
  • has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
  • uses service-linked AWS Config rules to perform most of its security checks for controls. AWS Config must be enabled on all accounts – both the administrator account and member accounts – in each Region where Security Hub is enabled.
  • works with a service-linked role named AWSServiceRoleForSecurityHub which includes the permissions and trust policy to do the following:
    • Detect and aggregate findings from Amazon GuardDuty, Amazon Inspector, and Amazon Macie
    • Configure the requisite AWS Config infrastructure to run security checks for the supported standards

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A security engineer has been asked to continuously monitor the company’s AWS account using automated compliance checks based on AWS best practices and Center for Internet Security (CIS) AWS Foundations Benchmarks. How can the security engineer accomplish this using AWS services?
    1. AWS Config + AWS Security Hub
    2. Amazon Inspector + AWS GuardDuty
    3. Amazon Inspector + AWS Shield
    4. AWS Config + Amazon Inspector

References

AWS_Security_Hub

Amazon Detective

  • Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities.
  • automatically collects log data from the AWS resources and uses machine learning, statistical analysis, and graph theory to build a linked set of data to easily conduct faster and more efficient security investigations.
  • enables customers to view summaries and analytical data associated with CloudTrail logs, EKS audit logs, VPC Flow Logs.
  • provides detailed summaries, analysis, and visualizations of the behaviors and interactions amongst your AWS accounts, EC2 instances, AWS users, roles, and IP addresses.
  • maintains up to a year of aggregated data and makes it easily available through a set of visualizations that shows changes in the type and volume of activity over a selected time window, and links those changes to security findings.
  • is a Regional service and needs to be enabled on a region-by-region basis. This ensures all data analyzed is regionally based and doesn’t cross AWS regional boundaries.
  • requires Amazon GuardDuty to be enabled on the accounts for at least 48 hours before you enable Detective on those accounts.
  • is a multi-account service that aggregates data from monitored member accounts under a single administrative account within the same region.
  • Multi-account monitoring deployments can be configured in the same way it is configured for administrative and member accounts in Amazon GuardDuty and AWS Security Hub.
  • has no impact on the performance or availability of the AWS infrastructure since it retrieves the log data and findings directly from the AWS services.

Amazon Detective vs GuardDuty

  • Amazon GuardDuty is a threat detection service that continuously monitors malicious activity and unauthorized behavior to protect AWS accounts and workloads.
  • Amazon Detective simplifies the process of investigating security findings and identifying the root cause. It automatically creates a graph model that provides you with a unified, interactive view of your resources, users, and the interactions between them over time.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

 

References

Amazon_Detective

AWS Certified Solutions Architect – Associate SAA-C03 Exam Learning Path

  • AWS Solutions Architect – Associate SAA-C03 exam is the latest AWS exam released on 30th August 2022 and has replaced the previous AWS Solutions Architect – SAA-C02 certification exam.
  • It basically validates the ability to effectively demonstrate knowledge of how to design, architect and deploy secure, cost-effective, and robust applications on AWS technologies
  • The exam also validates a candidate’s ability to complete the following tasks:
    • Design solutions that incorporate AWS services to meet current business requirements and future projected needs
    • Design architectures that are secure, resilient, high-performing, and cost-optimized
    • Review existing solutions and determine improvements

Refer AWS Solutions Architect – Associate SAA-C03 Exam Guide 

AWS Solutions Architect – Associate SAA-C03 Exam Summary

  • SAA-C03 exam consists of 65 questions in 170 minutes, and the time is more than sufficient if you are well prepared.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams, but is helpful for Professional and Specialty ones.
  • SAA-C03 Exam covers the design and architecture aspects in depth, so you must be able to visualize the architecture, even draw it out or prepare a mental picture, just to understand how it would work and how different services relate.
  • AWS SAA-C03 exam concepts cover solutions that fall within AWS Well-Architected framework to cover scalable, highly available, cost-effective, performant, and resilient pillars.
  • If you had been preparing for the SAA-C02, SAA-C03 is pretty much similar to SAA-C02 except for the addition of some new services like Aurora Serverless, AWS Global Accelerator, FSx for Windows, and FSx for Lustre.
  • AWS exams are available online, and I took the online one. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Solutions Architect – Associate SAA-C03 Exam Resources

AWS Solutions Architect – Associate SAA-C03 Exam Topics

Networking

  • Virtual Private Cloud – VPC
    • Create a VPC from scratch with public, private, and dedicated subnets with proper route tables, security groups, and NACLs.
    • Understand what a CIDR is and address patterns.
    • Subnets are public or private depending on whether they can route traffic directly through an Internet gateway
    • Understand how communication happens between the Internet, Public subnets, Private subnets, NAT, Bastion, etc.
    • Bastion (also referred to as a Jump server) can be used to securely access instances in the private subnets.
    • Create two-tier architecture with application in public and database in private subnets
    • Create three-tier architecture with web servers in public, application, and database servers in private. (hint: focus on security group configuration with least privilege)
  • Security Groups and NACLs
    • Security Groups are Stateful vs NACLs are stateless.
    • Also, only NACLs provide the ability to deny or block IPs
  • NAT Gateway or Instances
    • help enable instances in a private subnet to connect to the Internet.
    • Understand the difference between NAT Gateway & NAT Instance. 
    • NAT Gateway is AWS-managed and is scalable and highly available.
  • VPC endpoints
    • enable the creation of a private connection between the VPC and supported AWS services and VPC endpoint services powered by PrivateLink, using private IP addresses without needing an Internet or NAT Gateway.
    • VPC Gateway Endpoints support S3 and DynamoDB.
    • VPC Interface Endpoints OR PrivateLink support other services
  • VPN and Direct Connect for on-premises to AWS connectivity
    • VPN provides a quick, cost-effective, secure channel, however, routes through the internet and does not provide consistent throughput
    • Direct Connect provides consistent, dedicated throughput without Internet, however, requires time to set up and is not cost-effective.
  • Understand Data Migration techniques at a high level
    • VPN and Direct Connect for continuous, frequent data transfers.
    • Snow Family is ideal for one-time, cost-effective huge data transfer.
    • Choose a technique depending on the available bandwidth, data transfer needed, time available, encryption, one-time or continuous.
  • CloudFront
    • fully managed, fast CDN service that speeds up the distribution of static, dynamic web, or streaming content to end-users
    • S3 fronted by CloudFront provides a low latency, performant experience for global users.
    • provides static and dynamic caching for both AWS and on-premises origin.
  • Global Accelerator
    • optimizes the path to applications to keep packet loss, jitter, and latency consistently low.
    • helps improve the performance by lowering first-byte latency
    • provides 2 static IP addresses
  • Know CloudFront vs Global Accelerator
  • Route 53
    • highly available and scalable DNS web service.
    • Health checks and failover routing help provide resilient and active-passive solutions
    • Route 53 Routing Policies and their use cases (hint: focus on weighted, latency, geolocation, failover routing)
  • Elastic Load Balancer
    • Focus on ALB and NLB
    • Differences between ALB vs NLB
      • ALB is layer 7 vs NLB is layer 4
      • ALB provides content-based, host-based, path-based routing
      • ALB provides dynamic port mapping, which allows multiple instances of the same task to be hosted on the same ECS node
      • NLB provides low latency, the ability to scale rapidly, and a static IP address
      • ALB works with WAF while NLB does not.
    • Gateway Load Balancer – GWLB
      • helps deploy, scale, and manage virtual appliances, such as firewalls, IDS/IPS, and deep packet inspection systems.

Security

  • Identity Access Management – IAM
    • IAM role
      • provides permissions that are not associated with a particular user, group, or service and are intended to be assumable by anyone who needs it.
      • can be used for EC2 application access and Cross-account access
    • IAM identity providers and federation and their use cases – although I did not see much of this in SAA-C03
  • Key Management Services – KMS encryption service
  • AWS WAF
    • integrates with CloudFront, and ALB to provide protection against Cross-site scripting (XSS), and SQL injection attacks.
    • provides IP blocking and geo-protection, rate limiting, etc.
  • AWS Shield
    • managed DDoS protection service
    • integrates with CloudFront, ALB, and Route 53
    • Advanced provides additional detection and mitigation against large and sophisticated DDoS attacks, near real-time visibility into attacks
  • AWS GuardDuty
    • managed threat detection service and provides Malware protection
  • AWS Inspector
    • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities
  • AWS Secrets Manager
    • helps protect secrets needed to access applications, services, and IT resources.
    • supports rotation of secrets, which Systems Manager Parameter Store does not support.
  • Disaster Recovery whitepaper
    • Be sure you know the different recovery types with impact on RTO/RPO.

Storage

  • Understand various storage options S3, EBS, Instance store, EFS, Glacier, FSx, and what are the use cases and anti-patterns for each
  • Instance Store
    •  is physically attached  to the EC2 instance and provides the lowest latency and highest IOPS
  • Elastic Block Storage – EBS
    • EBS volume types and their use cases in terms of IOPS and throughput. SSD for IOPS and HDD for throughput
    • EBS Snapshots
      • Backups are automated, snapshots are manual
      • Can be used to encrypt an unencrypted EBS volume
    • Multi-Attach EBS feature allows attaching an EBS volume to multiple instances within the same AZ only.
    • EBS fast snapshot restore feature helps ensure that the EBS volumes created from a snapshot are fully-initialized at creation and instantly deliver all of their provisioned performance.
  • Simple Storage Service – S3
    • S3 storage classes with lifecycle policies
      • Understand the difference between S3 Standard vs S3 Standard-IA vs S3 One Zone-IA in terms of cost and durability
    • S3 Data Protection
      • S3 Client-side encryption encrypts data before storing it in S3
    • S3 features including
      • S3 provides cost-effective static website hosting. Can be integrated with CloudFront.
      • S3 versioning provides protection against accidental overwrites and deletions. Used with MFA Delete feature.
      • S3 Pre-Signed URLs for both upload and download provide access without needing AWS credentials
      • S3 CORS allows cross-domain calls
      • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
      • S3 Event Notifications to trigger events on various S3 events like objects added or deleted. Supports SQS, SNS, and Lambda functions.
      • Integrates with Amazon Macie to detect PII data
      • Replication supports both same-region and cross-region replication and requires versioning to be enabled.
      • Integrates with Athena to analyze data in S3 using standard SQL.
  • Glacier
    • as archival storage with various retrieval patterns
    • Glacier Instant Retrieval allows retrieval in milliseconds. 
    • Glacier Expedited retrieval allows object retrieval within minutes.
  • Storage gateway and its different types.
    • Cached Volume Gateway provides access to frequently accessed data while using AWS as the actual storage
    • Stored Volume gateway uses AWS as a backup, while the data is being stored on-premises as well
    • File Gateway supports SMB protocol
  • FSx is easy and cost-effective to launch and run popular file systems.
    • FSx provides two file systems to choose from:
    • Amazon FSx for Windows File Server
      • works with both Linux and Windows
      • provides Windows File System features including integration with Active Directory.
    • Amazon FSx for Lustre
      • for high-performance workloads
      • works with only Linux
  • Elastic File System – EFS
    • simple, fully managed, scalable, serverless, and cost-optimized file storage for use with AWS Cloud and on-premises resources.
    • provides shared volume across multiple EC2 instances, while EBS can be attached to a single instance within the same AZ or EBS Multi-Attach can be attached to multiple instances within the same AZ
    • supports the NFS protocol, and is compatible with Linux-based AMIs
    • supports cross-region replication, storage classes for cost.
  • AWS Transfer Family
    • secure transfer service that helps transfer files into and out of AWS storage services using FTP, SFTP and FTPS protocol.
  • Difference between EBS vs S3 vs EFS
  • Difference between EBS vs Instance Store
  • Would recommend referring to the Storage Options whitepaper; although a bit dated, 90% of it still holds true

Compute

  • Elastic Cloud Compute – EC2
  • Auto Scaling and ELB
    • Auto Scaling provides the ability to ensure a correct number of EC2 instances are always running to handle the load of the application
    • Elastic Load Balancer allows the incoming traffic to be distributed automatically across multiple healthy EC2 instances
  • Autoscaling & ELB
    • work together to provide High Availability and Scalability.
    • Span both ELB and Auto Scaling across Multi-AZs to provide High Availability
    • Do not span across regions. Use Route 53 or Global Accelerator to route traffic across regions.
  • EC2 Instance Purchase Types – Reserved, Scheduled Reserved, On-demand, and Spot and their use cases
    • Reserved instances provide cost benefits for long terms requirements over On-demand instances for continuous persistent load
    • Scheduled Reserved Instances for loads with a fixed schedule and time interval
    • Spot instances provide cost benefits for temporary, fault-tolerant, spiky load
  • EC2 Placement Groups
    • Cluster placement groups provide low latency and high throughput communication
    • Spread placement group provides high availability
  • Lambda and serverless architecture, its features, and use cases.
    • Lambda integrated with API Gateway to provide a serverless, highly scalable, cost-effective architecture
  • Elastic Container Service – ECS with its ability to deploy containers and microservices architecture.
    • ECS role for tasks can be provided through taskRoleArn
    • ALB provides dynamic port mapping to allow multiple instances of the same task on the same node.
  • Elastic Kubernetes Service – EKS
    • managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers
    • ideal for migration of an existing workload on Kubernetes
  • Elastic Beanstalk at a high level, what it provides, and its ability to get an application running quickly.

Databases

  • Understand relational and NoSQL data storage options which include RDS, DynamoDB, and Aurora with their use cases
  • Relational Database Service – RDS
    • Read Replicas vs Multi-AZ
      • Read Replicas for scalability, Multi-AZ for High Availability
      • Multi-AZ deployments are within a single region only
      • Read Replicas can span across regions and can be used for disaster recovery
    • Understand Automated Backups, underlying volume types (which are the same as EBS volume types)
  • Aurora
    • provides multiple read replicas and replicates 6 copies of data across AZs
    • Aurora Serverless
      • provides a highly scalable cost-effective database solution
      • automatically starts up, shuts down, and scales capacity up or down based on the application’s needs.
      • supports only MySQL and PostgreSQL
  • DynamoDB
    • provides low latency performance, a key-value store
    • is not a relational database
    • DynamoDB DAX provides caching for DynamoDB
    • DynamoDB TTL helps expire data in DynamoDB without any cost or consuming any write throughput.
  • ElastiCache use cases, mainly for caching performance

Integration Tools

  • Simple Queue Service
    • as message queuing service and SNS as pub/sub notification service
    • as a decoupling service and provide resiliency
    • SQS features like visibility, and long poll vs short poll
    • provide scaling for the Auto Scaling group based on the SQS size.
    • SQS Standard vs SQS FIFO difference
      • FIFO provides exactly-once processing but lower throughput
  • Simple Notification Service – SNS
    • is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients
    • Fanout pattern can be used to push messages to multiple subscribers

Analytics

  • Redshift as a business intelligence tool
  • Kinesis
    • for real-time data capture and analytics.
    • Integrates with Lambda functions to perform transformations
  • AWS Glue
    • fully-managed ETL service that automates the time-consuming steps of data preparation for analytics

Management Tools

  • CloudWatch
    • monitoring to provide operational transparency
    • is extendable with custom metrics
    • CloudWatch -> (Subscription filter) -> Kinesis Data Firehose -> S3
  • CloudTrail
    • helps enable governance, compliance, and operational and risk auditing of the AWS account.
    • helps to get a history of AWS API calls and related events for the AWS account.
  • CloudFormation
    • easy way to create and manage a collection of related AWS resources, and provision and update them in an orderly and predictable fashion.
  • AWS Config
    • fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.

AWS Whitepapers & Cheat sheets

Finally, All the Best 🙂

AWS S3 Glacier

  • S3 Glacier is a storage service optimized for archival, infrequently used data, or “cold data.”
  • S3 Glacier is an extremely secure, durable, and low-cost storage service for data archiving and long-term backup.
  • S3 Glacier is designed to provide average annual durability of 99.999999999% for an archive.
  • S3 Glacier redundantly stores data in multiple facilities and on multiple devices within each facility.
  • To increase durability, Glacier synchronously stores the data across multiple facilities before returning SUCCESS on uploading archives.
  • Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.
  • Glacier enables customers to offload the administrative burdens of operating and scaling storage to AWS, without having to worry about capacity planning, hardware provisioning, data replication, hardware failure detection, and recovery, or time-consuming hardware migrations.
  • Glacier is a great storage choice when low storage cost is paramount, with data rarely retrieved, and retrieval latency of several hours is acceptable.
  • Glacier now offers a range of data retrieval options where the retrieval time varies from 1-5 minutes to several hours
  • S3 should be used if applications require fast, frequent real-time access to the data
  • S3 Glacier can store virtually any kind of data in any format.
  • Glacier allows interaction through the AWS Management Console, Command Line Interface (CLI), SDKs, or REST-based APIs.
    • the management console can only be used to create and delete vaults.
    • the rest of the operations to upload and download data and to create retrieval jobs need the CLI, SDKs, or REST-based APIs
  • Use cases include
    • Digital media archives
    • Data that must be retained for regulatory compliance
    • Financial and healthcare records
    • Raw genomic sequence data
    • Long-term database backups

S3 Glacier Data Model

  • The Glacier data model's core concepts include vaults and archives, and it also includes job and notification configuration resources
    • Vault
      • A vault is a container for storing archives
      • Each vault resource has a unique address, which comprises the region the vault was created in and the unique vault name within the region and account, e.g. https://glacier.us-west-2.amazonaws.com/111122223333/vaults/examplevault
      • Vault allows storage of an unlimited number of archives
      • Glacier supports various vault operations which are region-specific
      • An AWS account can create up to 1,000 vaults per region.
    • Archive
      • An archive can be any data such as a photo, video, or document and is a base unit of storage in Glacier.
      • Each archive has a unique ID and an optional description, which can only be specified during the upload of an archive.
      • Glacier assigns the archive an ID, which is unique in the AWS region in which it is stored.
      • An archive can be uploaded in a single request, while for large archives, Glacier provides a multipart upload API that enables uploading an archive in parts.
      • An Archive can be up to 40TB.
    • Jobs
      • A Job is required to retrieve an Archive and vault inventory list
      • Data retrieval requests are asynchronous operations, are queued and most jobs take about four hours to complete.
      • A job is first initiated and then the output of the job is downloaded after the job completes
      • Vault inventory jobs need the vault name
      • Data retrieval jobs need both the vault name and the archive id, with an optional description
      • A vault can have multiple jobs in progress at any point in time; each can be identified by its Job ID, assigned when the job is created, for tracking
      • Glacier maintains job information such as job type, description, creation date, completion date, and job status and can be queried
      • After the job completes, the job output can be downloaded in full or partially by specifying a byte range.
    • Notification Configuration
      • As the jobs are asynchronous, Glacier supports a notification mechanism to an SNS topic when the job completes
      • SNS topic for notification can either be specified with each individual job request or with the vault
      • Glacier stores the notification configuration as a JSON document
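
A minimal sketch of creating a vault and attaching a notification configuration with boto3 (vault name and SNS topic ARN are placeholder assumptions):

```python
import boto3

glacier = boto3.client("glacier")

# accountId "-" means the account that owns the credentials making the request.
glacier.create_vault(accountId="-", vaultName="examplevault")

# Notify an SNS topic when archive-retrieval and inventory-retrieval jobs complete.
glacier.set_vault_notifications(
    accountId="-",
    vaultName="examplevault",
    vaultNotificationConfig={
        "SNSTopic": "arn:aws:sns:us-west-2:111122223333:glacier-jobs",  # placeholder ARN
        "Events": ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"],
    },
)
```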

S3 Glacier Data Retrievals Options

Glacier provides three options for retrieving data with varying access times and costs: Expedited, Standard, and Bulk retrievals.

Expedited Retrievals

  • Expedited retrievals allow quick access to the data when occasional urgent requests for a subset of archives are required.
  • For all but the largest archives (250MB+), data accessed using Expedited retrievals are typically made available within 1-5 minutes.
  • There are two types of Expedited retrievals: On-Demand and Provisioned.
    • On-Demand requests are like EC2 On-Demand instances and are available the vast majority of the time.
    • Provisioned requests are guaranteed to be available when needed

Standard Retrievals

  • Standard retrievals allow access to any of the archives within several hours.
  • Standard retrievals typically complete within 3-5 hours.

Bulk Retrievals

  • Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling retrieval of large amounts, even petabytes, of data inexpensively in a day.
  • Bulk retrievals typically complete within 5-12 hours.

Glacier Supported Operations

Vault Operations

  • Glacier provides operations to create and delete vaults.
  • A vault can be deleted only if there are no archives in the vault as of the last computed inventory and there have been no writes to the vault since the last inventory (as the inventory is prepared periodically)
  • Vault Inventory
    • Vault inventory helps retrieve a list of archives in a vault with information such as archive ID, creation date, and size for each archive
    • Inventory for each vault is prepared periodically, every 24 hours
    • Vault inventory is updated approximately once a day, starting on the day the first archive is uploaded to the vault.
    • When a vault inventory job is run, Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.
  • Vault Metadata or Description can also be obtained for a specific vault or for all vaults in a region, which provides information such as
    • creation date,
    • number of archives in the vault,
    • total size in bytes used by all the archives in the vault,
    • and the date the vault inventory was generated
  • S3 Glacier also provides operations to set, retrieve, and delete a notification configuration on the vault. Notifications can be used to identify vault events.

Archive Operations

  • S3 Glacier provides operations to upload, download and delete archives.
  • All archive operations must be done using the AWS CLI or SDKs; they cannot be done using the AWS Management Console.
  • An existing archive cannot be updated; it has to be deleted and re-uploaded.

Uploading an Archive

  • An archive can be uploaded in a single operation (1 byte up to 4 GB in size) or in parts, referred to as multipart upload (up to 40 TB)
  • Multipart Upload helps to
    • improve the upload experience for larger archives.
    • upload archives in parts, independently, in parallel, and in any order
    • recover faster by re-uploading only the part that failed and not the entire archive.
    • upload archives without even knowing the size
    • upload archives from 1 byte to about 40,000 GB (10,000 parts * 4 GB) in size
  • To upload existing data to Glacier, consider using the AWS Import/Export Snowball service, which accelerates moving large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network, bypassing the Internet.
  • Glacier returns a response that includes an archive ID which is unique in the region in which the archive is stored
  • Glacier does not support any additional metadata information apart from an optional description. Any additional metadata information required should be maintained at the client side.
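
A minimal sketch of a single-operation upload with boto3 (vault name and file are placeholder assumptions; multipart upload is the better choice for large archives):

```python
import boto3

glacier = boto3.client("glacier")

# Single-operation upload (suitable up to 4 GB; use multipart upload for larger archives).
with open("backup.tar.gz", "rb") as body:
    response = glacier.upload_archive(
        vaultName="examplevault",
        archiveDescription="nightly backup",   # optional description, set only at upload
        body=body,
    )

archive_id = response["archiveId"]   # unique ID to retain client-side for later retrieval
```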

Downloading an Archive

  • Downloading an archive is an asynchronous operation and is a 2-step process
    • Initiate an archive retrieval job
      • When a Job is initiated, a job ID is returned as a part of the response.
      • Job is executed asynchronously and the output can be downloaded after the job completes.
      • A job can be initiated to download the entire archive or a portion of the archive.
    • After the job completes, download the bytes
      • An archive can be downloaded in full or as specific byte ranges to retrieve only a portion of the output
      • Downloading the archive in chunks helps in the event of a download failure, as only the failed chunk needs to be downloaded again
      • Job completion status can be checked by
        • Check status explicitly (Not Recommended)
          • periodically poll the describe job operation request to obtain job information
        • Completion notification
          • An SNS topic can be specified, when the job is initiated or with the vault, to be used to notify job completion
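
The two-step retrieval might look roughly as follows with boto3 (archive ID, vault name, and SNS topic are placeholders; polling is shown only for brevity, SNS notification is the recommended approach):

```python
import time
import boto3

glacier = boto3.client("glacier")
vault = "my-archive-vault"

# Step 1: initiate an archive-retrieval job
job = glacier.initiate_job(
    accountId="-",
    vaultName=vault,
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "SNSTopic": "arn:aws:sns:us-east-1:123456789012:glacier-jobs",
    },
)

# Step 2: wait for completion, then download the job output
while not glacier.describe_job(accountId="-", vaultName=vault, jobId=job["jobId"])["Completed"]:
    time.sleep(900)   # polling shown for brevity; SNS notification is recommended

output = glacier.get_job_output(accountId="-", vaultName=vault, jobId=job["jobId"])
with open("restored-archive.tar.gz", "wb") as f:
    f.write(output["body"].read())
```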

About Range Retrievals

  • S3 Glacier allows retrieving an archive either in whole (default) or in a range, i.e. a portion of the archive.
  • Range retrievals require the specified range to be megabyte aligned.
  • Glacier returns a checksum in the response, which can be used to detect download errors by comparing it against a checksum computed on the client side.
  • Specifying a range of bytes can be helpful when:
    • Control bandwidth costs
      • Glacier allows retrieval of up to 5 percent of the average monthly storage (pro-rated daily) for free each month
      • Scheduling range retrievals can help in two ways.
        • meet the monthly free allowance of 5 percent by spreading out the data requested
        • if the amount of data retrieved doesn’t meet the free allowance percentage, scheduling range retrievals enables reduction of peak retrieval rate, which determines the retrieval fees.
    • Manage your data downloads
      • Glacier allows retrieved data to be downloaded for 24 hours after the retrieval request completes
      • Only portions of the archive can be retrieved so that the schedule of downloads can be managed within the given download window.
    • Retrieve a targeted part of a large archive
      • Retrieving an archive in the range can be useful if an archive is uploaded as an aggregate of multiple individual files, and only a few files need to be retrieved
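
A sketch of a range retrieval with boto3, assuming the same placeholder vault and archive ID; the job range must be megabyte aligned, and the job output can itself be downloaded in smaller chunks:

```python
import boto3

glacier = boto3.client("glacier")
vault = "my-archive-vault"

# Retrieve only the first 256 MB of the archive; the range must be megabyte
# aligned, i.e. start % 1 MB == 0 and (end + 1) % 1 MB == 0
job = glacier.initiate_job(
    accountId="-",
    vaultName=vault,
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "RetrievalByteRange": f"0-{256 * 1024 * 1024 - 1}",
    },
)

# After the job completes, the job output can also be downloaded in chunks
chunk = glacier.get_job_output(
    accountId="-",
    vaultName=vault,
    jobId=job["jobId"],
    range="bytes=0-1048575",   # first 1 MB of the retrieved range
)
```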

Deleting an Archive

  • Archives can be deleted from a vault only one at a time
  • This operation is idempotent. Deleting an already-deleted archive does not result in an error
  • AWS applies a pro-rated charge for items that are deleted prior to 90 days, as it is meant for long term storage

Updating an Archive

  • An existing archive cannot be updated and must be deleted and re-uploaded, and the new upload is assigned a new archive ID

S3 Glacier Vault Lock

  • S3 Glacier Vault Lock helps deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.
  • Controls such as “write once read many” (WORM) can be enforced using a vault lock policy, and the policy can be locked to prevent future edits.
  • Once locked, the policy can no longer be changed.
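
A sketch of the two-step vault lock workflow with boto3 (vault name, account ID, and the sample policy are illustrative); the policy below denies archive deletion for 365 days from archive creation, a WORM-style control:

```python
import json
import boto3

glacier = boto3.client("glacier")
vault = "compliance-vault"

# Deny archive deletion until an archive is at least 365 days old
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "deny-early-delete",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/compliance-vault",
        "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}},
    }],
}

# Step 1: attach the policy in the InProgress state (24 hours to test or abort)
lock = glacier.initiate_vault_lock(
    accountId="-", vaultName=vault, policy={"Policy": json.dumps(policy)}
)

# Step 2: complete the lock; from this point the policy can no longer be changed
glacier.complete_vault_lock(accountId="-", vaultName=vault, lockId=lock["lockId"])
```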

S3 Glacier Security

  • S3 Glacier supports data in transit encryption using Secure Sockets Layer (SSL) or client-side encryption.
  • All data is encrypted on the server side with Glacier handling key management and key protection. It uses AES-256, one of the strongest block ciphers available
  • Security and compliance of S3 Glacier is assessed by third-party auditors as part of multiple AWS compliance programs including SOC, HIPAA, PCI DSS, FedRAMP etc.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What is Amazon Glacier?
    1. You mean Amazon “Iceberg”: it’s a low-cost storage service.
    2. A security tool that allows to “freeze” an EBS volume and perform computer forensics on it.
    3. A low-cost storage service that provides secure and durable storage for data archiving and backup
    4. It’s a security tool that allows to “freeze” an EC2 instance and perform computer forensics on it.
  2. Amazon Glacier is designed for: (Choose 2 answers)
    1. Active database storage
    2. Infrequently accessed data
    3. Data archives
    4. Frequently accessed data
    5. Cached session data
  3. An organization is generating digital policy files which are required by the admins for verification. Once the files are verified they may not be required in the future unless there is some compliance issue. If the organization wants to save them in a cost effective way, which is the best possible solution?
    1. AWS RRS
    2. AWS S3
    3. AWS RDS
    4. AWS Glacier
  4. A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
    1. The archive will be available as an object for the duration specified by the user during the restoration request
    2. The restored object’s storage class will be RRS (After the object is restored the storage class still remains GLACIER. Read more)
    3. The user can modify the restoration period only by issuing a new restore request with the updated period
    4. The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
  5. To meet regulatory requirements, a pharmaceuticals company needs to archive data after a drug trial test is concluded. Each drug trial test may generate up to several thousands of files, with compressed file sizes ranging from 1 byte to 100MB. Once archived, data rarely needs to be restored, and on the rare occasion when restoration is needed, the company has 24 hours to restore specific files that match certain metadata. Searches must be possible by numeric file ID, drug name, participant names, date ranges, and other metadata. Which is the most cost-effective architectural approach that can meet the requirements?
    1. Store individual files in Amazon Glacier, using the file ID as the archive name. When restoring data, query the Amazon Glacier vault for files matching the search criteria. (Individual files are expensive and does not allow searching by participant names etc)
    2. Store individual files in Amazon S3, and store search metadata in an Amazon Relational Database Service (RDS) multi-AZ database. Create a lifecycle rule to move the data to Amazon Glacier after a certain number of days. When restoring data, query the Amazon RDS database for files matching the search criteria, and move the files matching the search criteria back to S3 Standard class. (As the data is not needed can be stored to Glacier directly and the data need not be moved back to S3 standard)
    3. Store individual files in Amazon Glacier, and store the search metadata in an Amazon RDS multi-AZ database. When restoring data, query the Amazon RDS database for files matching the search criteria, and retrieve the archive name that matches the file ID returned from the database query. (Individual files and Multi-AZ is expensive)
    4. First, compress and then concatenate all files for a completed drug trial test into a single Amazon Glacier archive. Store the associated byte ranges for the compressed files along with other search metadata in an Amazon RDS database with regular snapshotting. When restoring data, query the database for files that match the search criteria, and create restored files from the retrieved byte ranges.
    5. Store individual compressed files and search metadata in Amazon Simple Storage Service (S3). Create a lifecycle rule to move the data to Amazon Glacier, after a certain number of days. When restoring data, query the Amazon S3 bucket for files matching the search criteria, and retrieve the file to S3 reduced redundancy in order to move it back to S3 Standard class. (Once the data is moved from S3 to Glacier the metadata is lost, as Glacier does not have metadata and must be maintained externally)
  6. A user is uploading archives to Glacier. The user is trying to understand key Glacier resources. Which of the below mentioned options is not a Glacier resource?
    1. Notification configuration
    2. Archive ID
    3. Job
    4. Archive

References

AWS Simple Email Service – SES

AWS Simple Email Service – SES

  • SES is a fully managed service that provides an email platform with an easy, cost-effective way to send and receive email using your own email addresses and domains.
  • can be used to send both transactional and promotional emails securely, and globally at scale.
  • acts as an outbound email server and eliminates the need to build and maintain your own software or applications to do the heavy lifting of email transport.
  • acts as an inbound email server to receive emails that can help develop software solutions such as email autoresponders, email unsubscribe systems, and applications that generate customer support tickets from incoming emails.
  • existing email server can also be configured to send outgoing emails through SES with no change in any settings in the email clients
  • Maximum message size including attachments is 10 MB per message (after base64 encoding).
  • integrated with CloudWatch and CloudTrail

SES Characteristics

  • Compatible with SMTP
  • Applications can send email using a single API call in many supported languages: Java, .NET, PHP, Perl, Ruby, via HTTPS, etc.
  • Optimized for the highest levels of uptime, availability, and scales as per the demand
  • Provides sandbox environment for testing
  • provides Reputation dashboard, performance insights, anti-spam feedback
  • provides statistics on email deliveries, bounces, feedback loop results, emails opened, etc.
  • supports DomainKeys Identified Mail (DKIM) and Sender Policy Framework (SPF)
  • supports flexible deployment: shared, dedicated, and customer-owned IPs
  • supports attachments with many popular content formats, including documents, images, audio, and video, and scans every attachment for viruses and malware.
  • integrates with KMS to provide the ability to encrypt the mail that it writes to the S3 bucket.
  • uses client-side encryption to encrypt the mail before writing it to S3.
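
For example, a single boto3 API call is enough to send a message (addresses and Region below are placeholders; the sender address or domain must be a verified SES identity):

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")

ses.send_email(
    Source="no-reply@example.com",
    Destination={"ToAddresses": ["customer@example.com"]},
    Message={
        "Subject": {"Data": "Order confirmation"},
        "Body": {
            "Text": {"Data": "Thanks for your order!"},
            "Html": {"Data": "<p>Thanks for your order!</p>"},
        },
    },
)
```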

Sending Limits

  • Production SES has a set of sending limits which include
    • Sending Quota – max number of emails in a 24-hour period
    • Maximum Send Rate – max number of emails per second
  • SES automatically adjusts the limits upward as long as emails are of high quality and they are sent in a controlled manner, as any spike in the email sent might be considered to be spam.
  • Limits can also be raised by submitting a Quota increase request

SES Best Practices

  • Send high-quality and real production content that the recipients want
  • Only send to those who have signed up for the mail
  • Unsubscribe recipients who have not interacted with the business recently
  • Keep bounce and complaint rates low, and remove bounced or complained addresses, using SNS to monitor bounces and complaints and treating them as opt-outs
  • Monitor the sending activity

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon SES stand for?
    1. Simple Elastic Server
    2. Simple Email Service
    3. Software Email Solution
    4. Software Enabled Server
  2. Your startup wants to implement an order fulfillment process for selling a personalized gadget that needs an average of 3-4 days to produce, with some orders taking up to 6 months. You expect 10 orders per day on your first day, 1,000 orders per day after 6 months, and 10,000 orders after 12 months. Orders coming in are checked for consistency, then dispatched to your manufacturing plant for production, quality control, packaging, shipment, and payment processing. If the product does not meet the quality standards at any stage of the process, employees may force the process to repeat a step. Customers are notified via email about order status and any critical issues with their orders, such as payment failure. Your base architecture includes AWS Elastic Beanstalk for your website with an RDS MySQL instance for customer data and orders. How can you implement the order fulfillment process while making sure that the emails are delivered reliably? [PROFESSIONAL]
    1. Add a business process management application to your Elastic Beanstalk app servers and re-use the RDS database for tracking order status. Use one of the Elastic Beanstalk instances to send emails to customers.
    2. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 Use the decider instance to send emails to customers.
    3. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 use SES to send emails to customers.
    4. Use an SQS queue to manage all process tasks Use an Auto Scaling group of EC2 Instances that poll the tasks and execute them. Use SES to send emails to customers.

References

AWS Aurora Global Database vs DynamoDB Global Tables

AWS Aurora Global Database vs DynamoDB Global Tables

Aurora Global Database

  • Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
  • Aurora cluster in the primary AWS Region where the data is mastered performs both read and write operations. The clusters in the secondary Regions enable low-latency reads.
  • Aurora replicates data to the secondary AWS Regions with a typical latency of under a second.
  • Secondary clusters can be scaled independently by adding one or more DB instances (Aurora Replicas) to serve read-only workloads.
  • Aurora global database uses dedicated infrastructure to replicate the data, leaving database resources available entirely to serve applications.
  • Applications with a worldwide footprint can use reader instances in the secondary AWS Regions for low-latency reads.
  • Typical cross-region replication takes less than 1 second.
  • In case of a disaster or an outage, one of the clusters in a secondary AWS Region can be promoted to take full read/write workloads in under a minute.
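
A rough boto3 sketch of setting up a global database from an existing Aurora cluster (cluster identifiers, ARN, and Regions are placeholders):

```python
import boto3

rds_primary = boto3.client("rds", region_name="us-east-1")
rds_secondary = boto3.client("rds", region_name="eu-west-1")

# Wrap an existing primary Aurora cluster in a global database
rds_primary.create_global_cluster(
    GlobalClusterIdentifier="orders-global",
    SourceDBClusterIdentifier="arn:aws:rds:us-east-1:123456789012:cluster:orders",
)

# Add a read-only secondary cluster in another Region; Aurora replicates to it
# over dedicated infrastructure with typical lag under a second
rds_secondary.create_db_cluster(
    DBClusterIdentifier="orders-eu",
    Engine="aurora-mysql",
    GlobalClusterIdentifier="orders-global",
)
```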

DynamoDB Global Tables

  • DynamoDB Global tables provide a fully managed, multi-Region, and multi-active database that delivers fast, local, read and write performance for massively scaled, global applications.
  • Global tables replicate the DynamoDB tables automatically across the choice of AWS Regions and enable reads and writes on all instances.
  • DynamoDB global table consists of multiple replica tables (one per AWS Region). Every replica has the same table name and the same primary key schema. When an application writes data to a replica table in one Region, DynamoDB propagates the write to the other replica tables in the other AWS Regions automatically.
  • Global tables enable the read and write of data locally providing single-digit-millisecond latency for the globally distributed application at any scale.
  • Global tables enable the applications to stay highly available even in the unlikely event of isolation or degradation of an entire Region.
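
A rough boto3 sketch of creating a global table using the current global tables version (table name and Regions are placeholders; DynamoDB Streams with new-and-old images are required before a replica can be added):

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

ddb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)
ddb.get_waiter("table_exists").wait(TableName="orders")

# Adding a replica Region turns the table into a global table; writes in either
# Region are propagated to the other automatically
ddb.update_table(
    TableName="orders",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```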

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

AWS RDS Aurora

AWS RDS Aurora

  • AWS Aurora is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
  • is a fully managed, MySQL- and PostgreSQL-compatible, relational database engine i.e. applications developed with MySQL can switch to Aurora with little or no changes
  • delivers up to 5x performance of MySQL and up to 3x performance of PostgreSQL without requiring any changes to most MySQL applications
  • is fully managed as RDS manages the databases, handling time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair.
  • can scale storage automatically, based on the database usage, from 10GB to 128TiB in 10GB increments with no impact on database performance

Aurora DB Clusters

AWS Aurora Architecture

  • Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances.
  • A cluster volume is a virtual database storage volume that spans multiple AZs, with each AZ having a copy of the DB cluster data
  • Two types of DB instances make up an Aurora DB cluster:
    • Primary DB instance
      • Supports read and write operations, and performs all data modifications to the cluster volume.
      • Each DB cluster has one primary DB instance.
    • Aurora Replica
      • Connects to the same storage volume as the primary DB instance and supports only read operations.
      • Each DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance.
      • Provides high availability by locating Replicas in separate AZs
      • Aurora automatically fails over to a Replica in case the primary DB instance becomes unavailable.
      • Failover priority for Replicas can be specified.
      • Replicas can also offload read workloads from the primary DB instance
  • For Aurora multi-master clusters
    • all DB instances have read/write capability, with no difference between primary and replica.

Aurora Connection Endpoints

  • Aurora involves a cluster of DB instances instead of a single instance
  • An endpoint is an intermediate handler, represented by a hostname and port, that is used to connect to the cluster
  • Aurora uses the endpoint mechanism to abstract these connections

Cluster endpoint

  • Cluster endpoint (or writer endpoint) for a DB cluster connects to the current primary DB instance for that DB cluster.
  • The cluster endpoint is the only endpoint that can perform write operations, such as DDL statements, in addition to read operations
  • Each DB cluster has one cluster endpoint and one primary DB instance
  • Cluster endpoint provides failover support for read/write connections to the DB cluster. If the current primary DB instance of a DB cluster fails, Aurora automatically fails over to a new primary DB instance. During a failover, the DB cluster continues to serve connection requests to the cluster endpoint from the new primary DB instance, with minimal interruption of service.

Reader endpoint

  • Reader endpoint for a DB cluster provides load-balancing support for read-only connections to the DB cluster.
  • Use the reader endpoint for read operations, such as queries.
  • Reader endpoint reduces the overhead on the primary instance by processing the statements on the read-only Replicas.
  • Each DB cluster has one reader endpoint.
  • If the cluster contains one or more Replicas, the reader endpoint load balances each connection request among the Replicas.
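
An illustrative pattern (hostnames and credentials are placeholders, and PyMySQL is assumed as the client library): send writes to the cluster (writer) endpoint and reads to the reader endpoint, which load balances across the Aurora Replicas:

```python
import pymysql   # assumed client library

# Placeholder endpoints of the same Aurora cluster
WRITER = "orders.cluster-c123example.us-east-1.rds.amazonaws.com"
READER = "orders.cluster-ro-c123example.us-east-1.rds.amazonaws.com"

def connect(host):
    return pymysql.connect(host=host, port=3306, user="app",
                           password="example-password", database="orders")

# Writes (and DDL) go to the cluster endpoint, which always points at the primary
conn = connect(WRITER)
with conn.cursor() as cur:
    cur.execute("INSERT INTO orders (id, status) VALUES (%s, %s)", ("o-1", "NEW"))
conn.commit()
conn.close()

# Reads go to the reader endpoint, which load balances across the Aurora Replicas
conn = connect(READER)
with conn.cursor() as cur:
    cur.execute("SELECT status FROM orders WHERE id = %s", ("o-1",))
    print(cur.fetchone())
conn.close()
```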

Custom endpoint

  • Custom endpoint for a DB cluster represents a set of DB instances that you choose.
  • Aurora performs load balancing and chooses one of the instances in the group to handle the connection.
  • An Aurora DB cluster has no custom endpoints until one is created and up to five custom endpoints can be created for each provisioned cluster.
  • Aurora Serverless clusters do not support custom endpoints.
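
A boto3 sketch of creating a custom endpoint that groups two specific replicas, e.g. to dedicate them to reporting queries (all identifiers are placeholders):

```python
import boto3

rds = boto3.client("rds")

rds.create_db_cluster_endpoint(
    DBClusterIdentifier="orders",
    DBClusterEndpointIdentifier="orders-reporting",
    EndpointType="READER",                               # READER or ANY
    StaticMembers=["orders-replica-2", "orders-replica-3"],
)
```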

Instance endpoint

  • An instance endpoint connects to a specific DB instance within a cluster and provides direct control over connections to the DB cluster.
  • Each DB instance in a DB cluster has its own unique instance endpoint. So there is one instance endpoint for the current primary DB instance of the DB cluster, and there is one instance endpoint for each of the Replicas in the DB cluster.

High Availability and Replication

  • Aurora is designed to offer greater than 99.99% availability
  • provides data durability and reliability
    • by replicating the database volume six ways across three Availability Zones in a single region
    • backing up the data continuously to  S3.
  • transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.
  • automatically fails over to a new primary DB instance, if the primary DB instance fails, by either promoting an existing Replica to a new primary DB instance or creating a new primary DB instance
  • automatically divides the database volume into 10GB segments spread across many disks. Each 10GB chunk of the database volume is replicated six ways, across three Availability Zones
  • RDS databases for e.g. MySQL, Oracle, etc. have the data in a single AZ
  • is designed to transparently handle
    • the loss of up to two copies of data without affecting database write availability and
    • up to three copies without affecting read availability.
  • provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.
  • Replicas share the same underlying volume as the primary instance. Updates made by the primary are visible to all Replicas.
  • As Replicas share the same data volume as the primary instance, there is virtually no replication lag.
  • Any Replica can be promoted to become primary without any data loss and therefore can be used for enhancing fault tolerance in the event of a primary DB Instance failure.
  • To increase database availability, 1 to 15 replicas can be created in any of 3 AZs, and RDS will automatically include them in failover primary selection in the event of a database outage.

Aurora Failovers

  • Aurora automatically fails over, if the primary instance in a DB cluster fails, in the following order:
    • If Aurora Read Replicas are available, promote an existing Read Replica to the new primary instance.
    • If no Read Replicas are available, then create a new primary instance.
  • If there are multiple Aurora Read Replicas, the criteria for promotion is based on the priority that is defined for the Read Replicas.
    • Priority numbers can vary from 0 to 15 and can be modified at any time.
    • Aurora promotes the Read Replica with the highest priority to the new primary instance.
    • For Read Replicas with the same priority, Aurora promotes the replica that is largest in size, or otherwise picks one arbitrarily.
  • During the failover, AWS modifies the cluster endpoint to point to the newly created/promoted DB instance.
  • Applications experience a minimal interruption of service if they connect using the cluster endpoint and implement connection retry logic.
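
A boto3 sketch of influencing and testing failover (identifiers are placeholders): set a replica's promotion tier (lower numbers are promoted first) and trigger a manual failover to it:

```python
import boto3

rds = boto3.client("rds")

# Lower tier numbers are promoted first when the primary fails
rds.modify_db_instance(
    DBInstanceIdentifier="orders-replica-1",
    PromotionTier=0,
    ApplyImmediately=True,
)

# Force a failover, e.g. to verify that the application reconnects through the
# cluster endpoint using its retry logic
rds.failover_db_cluster(
    DBClusterIdentifier="orders",
    TargetDBInstanceIdentifier="orders-replica-1",
)
```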

Security

  • Aurora uses SSL (AES-256) to secure the connection between the database instance and the application
  • allows database encryption using keys managed through AWS Key Management Service (KMS).
  • Encryption and decryption are handled seamlessly.
  • With encryption, data stored at rest in the underlying storage is encrypted, as are its automated backups, snapshots, and replicas in the same cluster.
  • Encryption of existing unencrypted Aurora instances is not supported. Create a new encrypted Aurora instance and migrate the data

Backup and Restore

  • Automated backups are always enabled on Aurora DB Instances.
  • Backups do not impact database performance.
  • Aurora also allows the creation of manual snapshots
  • Aurora automatically maintains 6 copies of the data across 3 AZs and will automatically attempt to recover the database in a healthy AZ with no data loss
  • If in any case, the data is unavailable within Aurora storage,
    • DB Snapshot can be restored or
    • the point-in-time restore operation can be performed to a new instance. The latest restorable time for a point-in-time restore operation can be up to 5 minutes in the past.
  • Restoring a snapshot creates a new Aurora DB instance
  • Deleting a database deletes all the automated backups (with an option to create a final snapshot), but does not remove the manual snapshots.
  • Snapshots (including encrypted ones) can be shared with other AWS accounts

Aurora Parallel Query

  • Aurora Parallel Query refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.
  • Without Parallel Query, a query issued against an Aurora database would be executed wholly within one instance of the database cluster; this would be similar to how most databases operate.
  • Parallel Query is a good fit for analytical workloads requiring fresh data and good query performance, even on large tables.
  • Parallel Query provides the following benefits
    • Faster performance: Parallel Query can speed up analytical queries by up to 2 orders of magnitude.
    • Operational simplicity and data freshness: you can issue a query directly over the current transactional data in your Aurora cluster.
    • Transactional and analytical workloads on the same database: Parallel Query allows Aurora to maintain high transaction throughput alongside concurrent analytical queries.
  • Parallel Query can be enabled and disabled dynamically at both the global and session level using the aurora_pq parameter.
  • Parallel Query is available for the MySQL 5.6-compatible version of Aurora

Aurora Scaling

  • Aurora storage scaling is built-in and will automatically grow, up to 128 TiB, in 10GB increments with no impact on database performance.
  • There is no need to provision storage in advance
  • Compute Scaling
    • Instance scaling
      • Vertical scaling of the master instance: memory and CPU resources are modified by changing the DB instance class.
      • Scale a read replica and promote it to master using forced failover, which provides minimal downtime
    • Read scaling
      • provides horizontal scaling with up to 15 read replicas
  • Auto Scaling
    • Scaling policies add or remove read replicas, within a minimum and maximum replica count, based on CloudWatch CPU utilization or connection count metrics (see the sketch below)
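
A sketch of Aurora replica Auto Scaling through the Application Auto Scaling API (cluster name, capacity limits, and target value are placeholders):

```python
import boto3

aas = boto3.client("application-autoscaling")

# Register the replica count of the cluster as a scalable target
aas.register_scalable_target(
    ServiceNamespace="rds",
    ResourceId="cluster:orders",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    MinCapacity=1,
    MaxCapacity=15,
)

# Target-tracking policy on average reader CPU utilization
aas.put_scaling_policy(
    PolicyName="orders-replica-cpu",
    ServiceNamespace="rds",
    ResourceId="cluster:orders",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "RDSReaderAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)
```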

Aurora Backtrack

  • Backtracking “rewinds” the DB cluster to the specified time
  • Backtracking performs in-place restore and does not create a new instance. There is minimal downtime associated with it.
  • Backtracking is available for Aurora with MySQL compatibility
  • Backtracking is not a replacement for backing up the DB cluster so that you can restore it to a point in time.
  • With backtracking, there is a target backtrack window and an actual backtrack window:
    • Target backtrack window is the amount of time you want to be able to backtrack the DB cluster, e.g. 24 hours. The limit for a backtrack window is 72 hours.
    • Actual backtrack window is the actual amount of time you CAN backtrack the DB cluster, which can be smaller than the target backtrack window. The actual backtrack window is based on the workload and the storage available for storing information about database changes, called change records
  • DB cluster with backtracking enabled generates change records.
  • Aurora retains change records for the target backtrack window and charges an hourly rate for storing them.
  • Both the target backtrack window and the workload on the DB cluster determine the number of change records stored.
  • Workload is the number of changes made to the DB cluster in a given amount of time. If the workload is heavy, you store more change records in the backtrack window than you do if your workload is light.
  • Backtracking affects the entire DB cluster and can’t selectively backtrack a single table or a single data update.
  • Backtracking provides the following advantages over traditional backup and restore:
    • Undo mistakes – revert destructive action, such as a DELETE without a WHERE clause
    • Backtrack DB cluster quickly – Restoring a DB cluster to a point in time launches a new DB cluster and restores it from backup data or a DB cluster snapshot, which can take hours. Backtracking a DB cluster doesn’t require a new DB cluster and rewinds the DB cluster in minutes.
    • Explore earlier data changes – repeatedly backtrack a DB cluster back and forth in time to help determine when a particular data change occurred
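
A boto3 sketch of backtracking a cluster to just before a destructive change (cluster identifier and timestamp are placeholders; the cluster must have been created or modified with a backtrack window):

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds")

# Rewind the cluster in place; no new cluster is created
rds.backtrack_db_cluster(
    DBClusterIdentifier="orders",
    BacktrackTo=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    UseEarliestTimeOnPointInTimeUnavailable=True,  # snap to the earliest time in the window if needed
)
```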

Aurora Serverless

Refer blog post @ Aurora Serverless

Aurora Global Database

  • Aurora global database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
  • Aurora cluster in the primary AWS Region where your data is mastered performs both read and write operations. The clusters in the secondary Regions enable low-latency reads.
  • Aurora replicates data to the secondary AWS Regions with a typical latency of under a second.
  • Secondary clusters can be scaled independently by adding one or more DB instances (Aurora Replicas) to serve read-only workloads.
  • Aurora global database uses dedicated infrastructure to replicate the data, leaving database resources available entirely to serve applications.
  • Applications with a worldwide footprint can use reader instances in the secondary AWS Regions for low-latency reads.
  • In case of a disaster or an outage, one of the clusters in a secondary AWS Region can be promoted to take full read/write workloads in under a minute.

Aurora Clone

  • Aurora cloning feature helps create Aurora cluster duplicates quickly and cost-effectively
  • Creating a clone is faster and more space-efficient than physically copying the data using a different technique such as restoring a snapshot.
  • Aurora cloning uses a copy-on-write protocol.
  • Aurora clone requires only minimal additional space when first created. In the beginning, Aurora maintains a single copy of the data, which is used by both the original and new DB clusters.
  • Aurora allocates new storage only when data changes, either on the source cluster or the cloned cluster.
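
Cloning is exposed through the point-in-time restore API with a copy-on-write restore type; a boto3 sketch (identifiers and instance class are placeholders), noting that a DB instance still has to be added to the new cluster before it can serve connections:

```python
import boto3

rds = boto3.client("rds")

# Create a copy-on-write clone of an existing cluster, e.g. for testing
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="orders",
    DBClusterIdentifier="orders-clone",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# Add a DB instance to the cloned cluster so it can accept connections
rds.create_db_instance(
    DBInstanceIdentifier="orders-clone-instance-1",
    DBClusterIdentifier="orders-clone",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
)
```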

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Company wants to use MySQL compatible relational database with greater performance. Which AWS service can be used?
    1. Aurora
    2. RDS
    3. SimpleDB
    4. DynamoDB
  2. An application requires a highly available relational database with an initial storage capacity of 8 TB. The database will grow by 8 GB every day. To support expected traffic, at least eight read replicas will be required to handle database reads. Which option will meet these requirements?
    1. DynamoDB
    2. Amazon S3
    3. Amazon Aurora
    4. Amazon Redshift
  3. A company is migrating their on-premise 10TB MySQL database to AWS. As a compliance requirement, the company wants to have the data replicated across three availability zones. Which Amazon RDS engine meets the above business requirement?
    1. Use Multi-AZ RDS
    2. Use RDS
    3. Use Aurora
    4. Use DynamoDB

References

AWS_Aurora