AWS WAF

AWS WAF Overview

  • AWS WAF is a web application firewall that helps monitor the HTTP and HTTPS requests forwarded to CloudFront and allows controlling access to the content.
  • WAF allows defining conditions, e.g. the IP addresses that requests originate from or the values of query strings, based on which CloudFront responds to requests either with the requested content or with an HTTP 403 (access denied) response.
  • CloudFront can be configured to return a custom error page when a request is blocked.
  • AWS WAF allows the following behaviors:
    • Allow all requests except the ones specified – Useful when CloudFront serves content for a public website but requests from attackers need to be blocked.
    • Block all requests except the ones specified – Useful when CloudFront serves content for a restricted website whose users can be readily identified by properties in web requests, e.g. the IP addresses that requests originate from.
    • Count the requests that match the specified properties – Useful when configuring and testing allow or block behavior with new properties. After confirming that the configuration does not accidentally block all of the traffic to the website, the behavior can be changed to allow or block requests.

WAF Benefits

  • Additional protection against web attacks using specified conditions
  • Conditions can be defined by using characteristics of web requests such as the following:
    • IP addresses that the requests originate from
    • Values in request headers
    • Strings that appear in the requests
    • Length of requests
    • Presence of SQL code that is likely to be malicious (this is known as SQL injection)
    • Presence of a script that is likely to be malicious (this is known as cross-site scripting)
  • Rules that you can reuse for multiple web applications
  • Real-time metrics and sampled web requests
  • Automated administration using the WAF API

How WAF Works

WAF allows controlling how web requests are handled by creating conditions, rules, and web access control lists (web ACLs).

Conditions

  • Conditions define basic characteristics to watch for in a web request
    • Malicious script – XSS  (Cross Site Scripting) – Attackers embed scripts that can exploit vulnerabilities in web applications
    • IP addresses or address ranges that requests originate from.
    • Length of specified parts of the request, such as the query string.
    • Malicious SQL – SQL injection – Attackers try to extract data from the database by embedding malicious SQL code in a web request
    • Strings that appear in the request, e.g. values that appear in the User-Agent header or text strings that appear in the query string.
  • Some conditions take multiple values.

Rules

  • Rules combine conditions to precisely target the requests to be allowed or blocked.
    • For example, based on recent requests seen from an attacker, a rule might include the following conditions:
      • The requests come from 192.0.2.44.
      • They contain the value BadBot in the User-Agent header.
      • They appear to include malicious SQL code in the query string.
    • When a rule includes all three conditions, WAF looks for requests that match all three conditions; it ANDs the conditions together.
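
The AND behavior of a rule can be illustrated with a minimal Python sketch. This is a local model only, not the WAF API: the Request class, the condition helpers, and the crude SQL injection check are all hypothetical names invented for illustration.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical, simplified view of a web request."""
    source_ip: str
    headers: dict = field(default_factory=dict)
    query_string: str = ""


def from_ip(cidr):
    """Condition: the request originates from the given address or range."""
    network = ipaddress.ip_network(cidr)
    return lambda req: ipaddress.ip_address(req.source_ip) in network


def header_contains(name, value):
    """Condition: the named header contains the given string."""
    return lambda req: value in req.headers.get(name, "")


def looks_like_sqli(req):
    """Condition: crude stand-in for SQL injection detection (illustration only)."""
    needles = ("' or ", "union select", "--")
    return any(n in req.query_string.lower() for n in needles)


def rule_matches(conditions, req):
    """A rule matches only if every one of its conditions matches (logical AND)."""
    return all(cond(req) for cond in conditions)


bad_bot_rule = [
    from_ip("192.0.2.44/32"),
    header_contains("User-Agent", "BadBot"),
    looks_like_sqli,
]

attack = Request("192.0.2.44", {"User-Agent": "BadBot/1.0"},
                 "id=1 UNION SELECT password FROM users")
benign = Request("192.0.2.44", {"User-Agent": "BadBot/1.0"}, "id=1")

print(rule_matches(bad_bot_rule, attack))  # True: all three conditions match
print(rule_matches(bad_bot_rule, benign))  # False: no SQL-like text in the query
```

The second request shares the IP address and User-Agent with the first but still does not match, because one of the three conditions fails.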

Web ACLs

  • A web ACL provides
    • a combination of rules
    • an action (allow, block, or count) to perform for each rule
    • a default action, which determines whether WAF allows or blocks a request that does not match all of the conditions in any of the rules
  • When a web request matches all of the conditions in a rule, WAF can either allow the request to be forwarded to CloudFront or block it.
  • WAF compares a request with the rules in a web ACL in the order in which they are listed and takes the action associated with the first rule that the request matches.
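
The ordered evaluation described above can be sketched as a small Python model. The Rule class and evaluate function are hypothetical names, not part of any AWS SDK. The sketch also assumes that a count rule records the match and lets evaluation continue, which goes slightly beyond what the notes state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    conditions: List[Callable[[dict], bool]]
    action: str  # "allow", "block", or "count"

    def matches(self, request: dict) -> bool:
        # A rule matches only when all of its conditions match.
        return all(cond(request) for cond in self.conditions)


def evaluate(rules, default_action, request):
    """Walk the rules in listed order and return (action, counts).

    Assumption: "count" rules only record the match and do not stop evaluation.
    """
    counts = {}
    for rule in rules:
        if not rule.matches(request):
            continue
        if rule.action == "count":
            counts[rule.name] = counts.get(rule.name, 0) + 1
            continue
        return rule.action, counts  # first allow/block match wins
    return default_action, counts   # no rule matched: fall back to the default


rules = [
    Rule("watch-curl", [lambda r: "curl" in r["user_agent"]], "count"),
    Rule("block-bad-ip", [lambda r: r["ip"] == "192.0.2.44"], "block"),
]

print(evaluate(rules, "allow", {"ip": "192.0.2.44", "user_agent": "curl/8.0"}))
# ('block', {'watch-curl': 1})
print(evaluate(rules, "allow", {"ip": "198.51.100.7", "user_agent": "Mozilla/5.0"}))
# ('allow', {})
```

With a default action of "allow", this models the "allow all requests except the ones specified" behavior; swapping the default to "block" and the rule actions to "allow" models the opposite.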

Sample Exam Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You’ve been hired to enhance the overall security posture for a very large e-commerce site. They have a well architected multi-tier application running in a VPC that uses ELBs in front of both the web and the app tier with static assets served directly from S3. They are using a combination of RDS and DynamoDB for their dynamic data and then archiving nightly into S3 for further processing with EMR. They are concerned because they found questionable log entries and suspect someone is attempting to gain unauthorized access. Which approach provides a cost effective scalable mitigation to this kind of attack?
    1. Recommend that they lease space at a DirectConnect partner location and establish a 1G DirectConnect connection to their VPC. They would then establish Internet connectivity into their space, filter the traffic in a hardware Web Application Firewall (WAF), and then pass the traffic through the DirectConnect connection into their application running in their VPC. (Not cost effective)
    2. Add previously identified hostile source IPs as an explicit INBOUND DENY NACL to the web tier subnet. (does not protect against new source)
    3. Add a WAF tier by creating a new ELB and an AutoScaling group of EC2 Instances running a host-based WAF. They would redirect Route 53 to resolve to the new WAF tier ELB. The WAF tier would then pass the traffic to the current web tier. Web tier Security Groups would be updated to only allow traffic from the WAF tier Security Group
    4. Remove all but TLS 1.2 from the web tier ELB and enable Advanced Protocol Filtering. This will enable the ELB itself to perform WAF functionality. (No advanced protocol filtering in ELB)

AWS Route 53 Overview

Route 53 Overview

  • Amazon Route 53 provides three main functions:
    • Domain registration
      • allows you to register domain names
    • Domain Name System (DNS) service
      • translates friendly domain names like www.example.com into IP addresses like 192.0.2.1
      • responds to DNS queries using a global network of authoritative DNS servers, which reduces latency
      • can route Internet traffic to CloudFront, Elastic Beanstalk, ELB, or S3. There’s no charge for DNS queries to these resources
    • Health checking
      • can monitor the health of the resources such as web servers and email servers.
      • sends automated requests over the Internet to the application to verify that it’s reachable, available, and functional
      • CloudWatch alarms can be configured for the health checks, so that you receive notification when a resource becomes unavailable.
      • can be configured to route Internet traffic away from resources that are unavailable
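
The idea of routing traffic away from unavailable resources can be sketched in a few lines of Python. This is a local model, not Route 53 code: answer_query and the last_probe table are hypothetical, and falling back to every endpoint when none is healthy is a design choice of this sketch.

```python
from typing import Callable, List


def answer_query(endpoints: List[str], is_healthy: Callable[[str], bool]) -> List[str]:
    """Return only the endpoints whose latest health check passed.

    Sketch-level choice: if nothing is healthy, return everything rather than
    an empty answer.
    """
    healthy = [e for e in endpoints if is_healthy(e)]
    return healthy if healthy else list(endpoints)


# Hypothetical results of the most recent health checks.
last_probe = {"192.0.2.10": True, "192.0.2.11": False, "192.0.2.12": True}

print(answer_query(list(last_probe), last_probe.get))
# ['192.0.2.10', '192.0.2.12']
```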

Supported DNS Resource Record Types

  • A (Address) Format
    • is an IPv4 address in dotted decimal notation for e.g. 192.0.2.1
  • AAAA Format
    • is an IPv6 address in colon-separated hexadecimal format
  • CNAME Format
    • is the same format as a domain name
    • The DNS protocol does not allow creation of a CNAME record for the top node of a DNS namespace, also known as the zone apex. For example, if the DNS name example.com is registered, the zone apex is example.com; a CNAME record cannot be created for example.com, but CNAME records can be created for www.example.com, newproduct.example.com, etc.
    • If a CNAME record is created for a subdomain, no other resource record sets can be created for that subdomain. For example, if a CNAME is created for www.example.com, no other resource record sets for which the value of the Name field is www.example.com can be created.
    • Alias resource record sets
      • Route 53 supports alias resource record sets, which enable routing of queries to a CloudFront distribution, an Elastic Beanstalk environment, an ELB, an S3 bucket configured as a static website, or another Amazon Route 53 resource record set
      • Aliases are similar in some ways to the CNAME resource record type; however, you can create an alias for the zone apex
  • MX (Mail Exchange) Format
    • contains a decimal number that represents the priority of the MX record, and the domain name of an email server
  • NS (Name Server) Format
    • An NS record identifies the name servers for the hosted zone. The value for an NS record is the domain name of a name server.
  • PTR Format
    • A PTR record Value element is the same format as a domain name.
  • SOA (Start of Authority) Format
    • SOA record provides information about a domain and the corresponding Amazon Route 53 hosted zone
  • SPF (Sender Policy Framework) Format
    • SPF records were formerly used to verify the identity of the sender of email messages; however, they are no longer recommended
    • Instead of an SPF record, a TXT record that contains the applicable value is recommended
  • SRV Format
    • An SRV record Value element consists of four space-separated values. The first three values are decimal numbers representing priority, weight, and port. The fourth value is a domain name, e.g. 10 5 80 hostname.example.com
  • TXT (Text) Format
    • A TXT record contains a space-separated list of double-quoted strings. A single string can include a maximum of 255 characters. In addition to the characters that are permitted unescaped in domain names, space is allowed in TXT strings
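
The value formats listed above can be checked with a short Python sketch. This is a rough local validator, not something Route 53 provides: validate_record and its messages are hypothetical, and the checks cover only the rules mentioned in these notes (domain names themselves are not validated).

```python
import ipaddress
import re


def validate_record(record_type, name, value, zone_apex):
    """Return a list of problems; an empty list means the value looks well-formed."""
    problems = []
    parts = value.split()
    if record_type == "A":
        try:
            ipaddress.IPv4Address(value)
        except ValueError:
            problems.append("A value must be an IPv4 address in dotted decimal notation")
    elif record_type == "AAAA":
        try:
            ipaddress.IPv6Address(value)
        except ValueError:
            problems.append("AAAA value must be an IPv6 address in colon-separated hexadecimal")
    elif record_type == "CNAME":
        # A CNAME is not allowed at the zone apex (an alias record would be used instead).
        if name.rstrip(".").lower() == zone_apex.rstrip(".").lower():
            problems.append("a CNAME record cannot be created at the zone apex")
    elif record_type == "MX":
        if len(parts) != 2 or not parts[0].isdigit():
            problems.append("MX value must be '<priority> <mail server domain name>'")
    elif record_type == "SRV":
        if len(parts) != 4 or not all(p.isdigit() for p in parts[:3]):
            problems.append("SRV value must be '<priority> <weight> <port> <domain name>'")
    elif record_type == "TXT":
        strings = re.findall(r'"([^"]*)"', value)
        if not strings:
            problems.append("TXT value must be a list of double-quoted strings")
        if any(len(s) > 255 for s in strings):
            problems.append("each TXT string may hold at most 255 characters")
    else:
        problems.append(f"record type {record_type!r} is not covered by this sketch")
    return problems


print(validate_record("A", "www.example.com", "192.0.2.1", "example.com"))
# []
print(validate_record("CNAME", "example.com", "www.example.net", "example.com"))
# ['a CNAME record cannot be created at the zone apex']
print(validate_record("SRV", "_sip._tcp.example.com", "10 5 80 hostname.example.com", "example.com"))
# []
```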

Sample Exam Questions

  1. What does Amazon Route53 provide?
    1. A global Content Delivery Network.
    2. None of these.
    3. A scalable Domain Name System
    4. An SSH endpoint for Amazon EC2.
  2. Does Amazon Route 53 support NS Records?
    1. Yes, it supports Name Service records.
    2. No
    3. It supports only MX records.
    4. Yes, it supports Name Server records. 
  3. Does Route 53 support MX Records?
    1. It supports CNAME records, but not MX records.
    2. No
    3. Only Primary MX records. Secondary MX records are not supported.
  4. Which of the following statements are true about Amazon Route 53 resource records? Choose 2 answers
    1. An Alias record can map one DNS name to another Amazon Route 53 DNS name.
    2. A CNAME record can be created for your zone apex.
    3. An Amazon Route 53 CNAME record can point to any DNS record hosted anywhere.
    4. TTL can be set for an Alias record in Amazon Route 53.
    5. An Amazon Route 53 Alias record can point to any DNS record hosted anywhere.
  5. A customer is hosting their company website on a cluster of web servers that are behind a public-facing load balancer. The customer also uses Amazon Route 53 to manage their public DNS. How should the customer configure the DNS zone apex record to point to the load balancer?
    1. Create an A record pointing to the IP address of the load balancer
    2. Create a CNAME record pointing to the load balancer DNS name.
    3. Create a CNAME record aliased to the load balancer DNS name.
    4. Create an A record aliased to the load balancer DNS name
  6. A user has configured ELB with three instances. The user wants to achieve High Availability as well as redundancy with ELB. Which of the below mentioned AWS services helps the user achieve this for ELB?
    1. Route 53
    2. AWS Mechanical Turk
    3. Auto Scaling
    4. AWS EMR
  7. How can the domain’s zone apex, for example “myzoneapexdomain.com”, be pointed towards an Elastic Load Balancer?
    1. By using an AAAA record
    2. By using an A record
    3. By using an Amazon Route 53 CNAME record
    4. By using an Amazon Route 53 Alias record

AWS EC2 Monitoring

EC2 Monitoring

Status Checks

  • Status monitoring helps quickly determine whether EC2 has detected any problems that might prevent instances from running applications.
  • EC2 performs automated checks on every running EC2 instance to identify hardware and software issues.
  • Status checks are performed every minute and each returns a pass or a fail status.
  • If all checks pass, the overall status of the instance is OK.
  • If one or more checks fail, the overall status is Impaired.
  • Status checks are built into EC2, so they cannot be disabled or deleted.
  • Status checks data augments the information that EC2 already provides about the intended state of each instance (such as pending, running, stopping) as well as the utilization metrics that Amazon CloudWatch monitors (CPU utilization, network traffic, and disk activity).
  • Alarms that are triggered based on the result of the status checks can be created or deleted. For example, an alarm can be created to warn if status checks fail on a specific instance.

System Status Checks

  • monitor the AWS systems required to use your instance to ensure they are working properly.
  • detect problems with the instance that require AWS involvement to repair.
  • When a system status check fails, one can either
    • wait for AWS to fix the issue
    • or resolve it by stopping and restarting, or terminating and replacing, the instance
  • System status check failures might be caused by
    • Loss of network connectivity
    • Loss of system power
    • Software issues on the physical host
    • Hardware issues on the physical host

Instance Status Checks

  • monitor the software and network configuration of the individual instance
  • detect problems that require your involvement to repair.
  • When an instance status check fails, it can be resolved by either rebooting the instance or by making modifications in the operating system
  • Instance status check failures might be caused by
    • Failed system status checks
    • Misconfigured networking or startup configuration
    • Exhausted memory
    • Corrupted file system
    • Incompatible kernel
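
How the two kinds of checks combine, and which remediation the notes suggest for each, can be summarized in a small Python sketch. The function names and return strings are hypothetical; the sketch only restates the rules above and does not call the EC2 API.

```python
def overall_status(system_check_passed: bool, instance_check_passed: bool) -> str:
    """The overall status is 'ok' only when every check passes, else 'impaired'."""
    return "ok" if system_check_passed and instance_check_passed else "impaired"


def suggested_action(system_check_passed: bool, instance_check_passed: bool) -> str:
    """Map a failed check to the remediation described in the notes."""
    if not system_check_passed:
        # Problems on the AWS side of the instance.
        return "wait for AWS, or stop and restart / terminate and replace the instance"
    if not instance_check_passed:
        # Problems inside the instance that the owner has to fix.
        return "reboot the instance or fix the operating system configuration"
    return "no action needed"


print(overall_status(True, False))    # impaired
print(suggested_action(True, False))  # reboot the instance or fix the operating system configuration
```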

CloudWatch Monitoring

  • CloudWatch helps monitor EC2 instances by collecting and processing raw data from EC2 into readable, near real-time metrics.
  • Statistics are recorded for a period of two weeks, so that historical information can be accessed and used to gain a better perspective on how the application or service is performing.
  • By default Basic monitoring is enabled and EC2 metric data is sent to CloudWatch in 5-minute periods automatically
  • Detailed monitoring can be enabled on EC2 instance, which sends data to CloudWatch in 1-minute periods.
  • Aggregating Statistics Across Instances/ASG/AMI ID
    • Aggregate statistics are available for the instances that have detailed monitoring (at an additional charge) enabled, which provides data in 1-minute periods
    • Instances that use basic monitoring are not included in the aggregates.
    • CloudWatch does not aggregate data across Regions. Therefore, metrics are completely separate between Regions.
    • CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace, if no dimension is specified
    • The technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces published to CloudWatch.
    • Statistics include Sum, Average, Minimum, Maximum, Data Samples
    • With custom namespaces, the complete set of dimensions associated with a given data point must be specified to retrieve statistics that include that data point
  • CloudWatch alarms
    • can be created to monitor any one of the EC2 instance’s metrics.
    • can be configured to automatically send you a notification when the metric reaches a specified threshold.
    • can automatically stop, terminate, reboot, or recover EC2 instances
    • can automatically recover an EC2 instance when the instance becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair
    • can automatically stop or terminate the instances to save costs (EC2 instances that use an EBS volume as the root device can be stopped or terminated, whereas instances that use the instance store as the root device can only be terminated)
    • can use EC2ActionsAccess IAM role, which enables AWS to perform stop, terminate, or reboot actions on EC2 instances
    • If you have read/write permissions for CloudWatch but not for EC2, alarms can still be created but the stop or terminate actions won’t be performed on the EC2 instance
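
The aggregation across instances described earlier in this section can be sketched in plain Python. The aggregate function and the sample data are hypothetical; the sketch computes the five listed statistics (with "Data Samples" shown as SampleCount) and leaves out instances on basic monitoring, as the notes describe.

```python
def aggregate(datapoints_by_instance, detailed_monitoring):
    """Compute aggregate statistics over instances with detailed monitoring enabled."""
    values = [
        v
        for instance_id, points in datapoints_by_instance.items()
        if detailed_monitoring.get(instance_id, False)  # basic-monitoring instances are skipped
        for v in points
    ]
    if not values:
        return None
    return {
        "Sum": sum(values),
        "Average": sum(values) / len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "SampleCount": len(values),
    }


# Hypothetical CPUUtilization samples per instance.
cpu = {"i-aaa": [10.0, 30.0], "i-bbb": [20.0, 40.0], "i-ccc": [90.0]}
detailed = {"i-aaa": True, "i-bbb": True, "i-ccc": False}

print(aggregate(cpu, detailed))
# {'Sum': 100.0, 'Average': 25.0, 'Minimum': 10.0, 'Maximum': 40.0, 'SampleCount': 4}
```

The 90.0 sample from i-ccc does not affect the result because that instance uses basic monitoring.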

EC2 Metrics

  • CPUCreditUsage
    • (Only valid for T2 instances) The number of CPU credits consumed during the specified period.
    • This metric identifies the amount of time during which physical CPUs were used for processing instructions by virtual CPUs allocated to the instance.
    • CPU Credit metrics are available at a 5 minute frequency.
  • CPUCreditBalance
    • (Only valid for T2 instances) The number of CPU credits that an instance has accumulated.
    • This metric is used to determine how long an instance can burst beyond its baseline performance level at a given rate.
    • CPU Credit metrics are available at a 5 minute frequency.
  • CPUUtilization
    • % of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance.
  • DiskReadOps
    • Completed read operations from all instance store volumes available to the instance in a specified period of time.
  • DiskWriteOps
    • Completed write operations to all instance store volumes available to the instance in a specified period of time.
  • DiskReadBytes
    • Bytes read from all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application reads from the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • DiskWriteBytes
    • Bytes written to all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application writes onto the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • NetworkIn
    • The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to an application on a single instance.
  • NetworkOut
    • The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic to an application on a single instance.
  • NetworkPacketsIn
    • The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only
  • NetworkPacketsOut
    • The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only.
  • StatusCheckFailed
    • Reports whether either of the status checks, StatusCheckFailed_Instance or StatusCheckFailed_System, has failed.
    • Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.
    • Status check metrics are available at 1 minute frequency
  • StatusCheckFailed_Instance
    • Reports whether the instance has passed the Amazon EC2 instance status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.
    • Status check metrics are available at 1 minute frequency
  • StatusCheckFailed_System
    • Reports whether the instance has passed the EC2 system status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.
    • Status check metrics are available at a 1 minute frequency
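
The CPUCreditBalance entry above mentions estimating how long an instance can burst at a given rate. A back-of-the-envelope Python sketch follows, under assumptions that are not stated in these notes: one CPU credit is taken to equal one vCPU running at 100% for one minute, and credits are assumed to keep accruing at the baseline rate while bursting. The function name and the numbers are hypothetical.

```python
def burst_minutes(credit_balance, vcpus, baseline, utilization):
    """Estimate how many minutes a burst can last before the balance is exhausted.

    baseline and utilization are fractions per vCPU (0.10 means 10%).
    Assumption: 1 credit = 1 vCPU at 100% for 1 minute, and credits are
    earned at the baseline rate while they are being spent.
    """
    if utilization <= baseline:
        return float("inf")  # at or below baseline the balance does not shrink
    net_spend_per_minute = vcpus * (utilization - baseline)
    return credit_balance / net_spend_per_minute


# Hypothetical instance: 144 credits banked, 1 vCPU, 10% baseline, bursting to 100%.
print(round(burst_minutes(144, 1, 0.10, 1.0)))  # 160
```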

Sample Exam Questions

  1. In the basic monitoring package for EC2, Amazon CloudWatch provides the following metrics:
    1. Web server visible metrics such as number failed transaction requests
    2. Operating system visible metrics such as memory utilization
    3. Database visible metrics such as number of connections
    4. Hypervisor visible metrics such as CPU utilization
  2. Which of the following requires a custom CloudWatch metric to monitor?
    1. Memory Utilization of an EC2 instance
    2. CPU Utilization of an EC2 instance
    3. Disk usage activity of an EC2 instance
    4. Data transfer of an EC2 instance
  3. A user has configured CloudWatch monitoring on an EBS backed EC2 instance. If the user has not attached any additional device, which of the below mentioned metrics will always show a 0 value?
    1. DiskReadBytes
    2. NetworkIn
    3. NetworkOut
    4. CPUUtilization
  4. A user is running a batch process on EBS backed EC2 instances. The batch process starts a few instances to process Hadoop Map reduce jobs, which can run between 50 – 600 minutes or sometimes for more time. The user wants to configure that the instance gets terminated only when the process is completed. How can the user configure this with CloudWatch?
    1. Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
    2. Setup the CloudWatch with Auto Scaling to terminate all the instances
    3. Setup a job which terminates all instances after 600 minutes
    4. It is not possible to terminate instances automatically
  5. An AWS account owner has setup multiple IAM users. One IAM user only has CloudWatch access. He has setup the alarm action, which stops the EC2 instances when the CPU utilization is below the threshold limit. What will happen in this case?
    1. It is not possible to stop the instance using the CloudWatch alarm
    2. CloudWatch will stop the instance when the action is executed
    3. The user cannot set an alarm on EC2 since he does not have the permission
    4. The user can setup the action but it will not be executed if the user does not have EC2 rights
  6. A user has launched 10 instances from the same AMI ID using Auto Scaling. The user is trying to see the average CPU utilization across all instances of the last 2 weeks under the CloudWatch console. How can the user achieve this?
    1. View the Auto Scaling CPU metrics
    2. Aggregate the data over the instance AMI ID
    3. The user has to use the CloudWatch analyser to find the average data across instances
    4. It is not possible to see the average CPU utilization of the same AMI ID since the instance ID is different

AWS Relational Database Service – RDS

RDS Overview

  • Amazon Relational Database Service (RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.
  • RDS provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks.
  • RDS features & benefits
    • CPU, memory, storage, and IOPS can be scaled independently.
    • manages backups, software patching, automatic failure detection, and recovery.
    • backups can be automated, or manual backups can be triggered as needed. Backups can be used to restore a database, and the Amazon RDS restore process works reliably and efficiently.
    • provides high availability with a primary instance and a synchronous secondary instance that can be failed over to when a problem occurs.
    • provides elasticity by enabling MySQL, MariaDB, or PostgreSQL Read Replicas to increase read scaling.
    • supports MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, and the new, MySQL-compatible Amazon Aurora DB engine
    • in addition to the security in the database package, IAM users and permissions can help to control who has access to the RDS databases
    • databases can be further protected by putting them in a VPC
    • However, as it is a managed service, shell (root ssh) access to DB instances is not provided, and this restricts access to certain system procedures and tables that require advanced privileges.

RDS Components

  • DB Instance
    • is a basic building block of RDS
    • is an isolated database environment in the cloud
    • each DB instance runs a DB engine. AWS currently supports MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server & Aurora DB engines
    • can be accessed from AWS command line tools, Amazon RDS APIs, or the AWS Management RDS Console.
    • computation and memory capacity of a DB instance is determined by its DB instance class, which can be selected as per the needs
    • for each DB instance, 5 GB to 6 TB of associated storage capacity can be selected
    • storage comes in three types: Magnetic, General Purpose (SSD), and Provisioned IOPS (SSD), which differ in performance characteristics and price
    • each DB instance has a DB instance identifier, which is a customer-supplied name and must be unique for that customer in an AWS region. It uniquely identifies the DB instance when interacting with the Amazon RDS API and AWS CLI commands.
    • each DB instance can host multiple databases, or a single Oracle database with multiple schemas.
    • can be hosted in an AWS VPC environment for better control
  • Regions and Availability Zones
    • AWS resources are housed in highly available data center facilities in different areas of the world. These locations are called regions, and each region contains multiple distinct locations called Availability Zones
    • Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other Availability Zones in the same region
    • DB instances can be hosted in several Availability Zones, an option called a Multi-AZ deployment. Amazon automatically provisions and maintains a synchronous standby replica of the DB instance in a different Availability Zone. Primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support, eliminate I/O freezes, and minimize latency spikes during system backups.
  • Security Groups
    • security group controls the access to a DB instance, by allowing access to the specified IP address ranges or Amazon EC2 instances
  • DB Parameter Groups
    • A DB parameter group contains engine configuration values that can be applied to one or more DB instances of the same instance type
  • DB Option Groups
    • Some DB engines offer tools that simplify managing the databases and making the best use of data.
    • Amazon RDS makes such tools available through option groups for e.g. Oracle Application Express (APEX), SQL Server Transparent Data Encryption, and MySQL memcached support.

RDS Interfaces

  • RDS can be accessed through multiple interfaces
    • AWS RDS Management console
    • Command Line Interface
    • Programmatic Interfaces which include SDKs, libraries in different languages, and RDS API

RDS Pricing

  • Instance class
    • Pricing is based on the class (e.g., micro, small, large, xlarge) of the DB instance consumed.
  • Running time
    • Billed by the instance-hour, which is equivalent to a single instance running for an hour. For example, a single instance running for two hours and two instances running for one hour both consume 2 instance-hours.
    • If a DB instance runs for only part of an hour, a full instance-hour is charged
  • Storage
    • Storage capacity provisioned for the DB instance is billed per GB per month.
    • If the provisioned storage capacity is scaled within the month, the bill will be pro-rated.
  • I/O requests per month
    • Total number of storage I/O requests made in a billing cycle.
  • Backup storage
    • Automated backups & any active database snapshots consume storage
    • Increasing backup retention period or taking additional database snapshots increases the backup storage consumed by the database.
    • RDS provides backup storage up to 100% of the provisioned database storage at no additional charge for e.g., if you have 10 GB-months of provisioned database storage, RDS provides up to 10 GB-months of backup storage at no additional charge.
    • Most databases require less raw storage for a backup than for the primary dataset, so if multiple backups are not maintained, you will never pay for backup storage.
    • Backup storage is free only for active DB instances.
  • Data transfer
    • Internet data transfer in and out of your DB instance.
  • Reserved Instance
    • In addition to regular RDS pricing, reserved DB instances can be purchased
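
Two of the pricing rules above, instance-hour rounding and the free backup storage allowance, lend themselves to a small worked example in Python. The function names are hypothetical and no actual prices are involved; the sketch only computes billable quantities as described in these notes.

```python
import math


def billable_instance_hours(run_minutes_per_instance):
    """Sum instance-hours, charging a full hour for any partial hour."""
    return sum(math.ceil(minutes / 60) for minutes in run_minutes_per_instance)


def billable_backup_gb_months(provisioned_gb_months, backup_gb_months):
    """Backup storage is free up to 100% of the provisioned database storage."""
    return max(0, backup_gb_months - provisioned_gb_months)


print(billable_instance_hours([120]))     # 2: one instance for two hours
print(billable_instance_hours([60, 60]))  # 2: two instances for one hour each
print(billable_instance_hours([61]))      # 2: the partial second hour is charged in full
print(billable_backup_gb_months(10, 8))   # 0: within the free allowance
print(billable_backup_gb_months(10, 25))  # 15: only the excess is billable
```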

Sample Exam Questions

  1. What does Amazon RDS stand for?
    1. Regional Data Server.
    2. Relational Database Service
    3. Regional Database Service.
  2. How many relational database engines does RDS currently support?
    1. MySQL, Postgres, MariaDB, Oracle and Microsoft SQL Server
    2. Just two: MySQL and Oracle.
    3. Five: MySQL, PostgreSQL, MongoDB, Cassandra and SQLite.
    4. Just one: MySQL.
  3. If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot the instance for the changes to take effect?
    1. No
    2. Yes
  4. What is the name of the licensing model in which I can use my existing Oracle Database licenses to run Oracle deployments on Amazon RDS?
    1. Bring Your Own License
    2. Role Based License
    3. Enterprise License
    4. License Included
  5. Will I be charged if the DB instance is idle?
    1. No
    2. Yes
    3. Only if running in GovCloud
    4. Only if running in VPC
  6. What is the minimum charge for the data transferred between Amazon RDS and Amazon EC2 Instances in the same Availability Zone?
    1. USD 0.10 per GB
    2. No charge. It is free.
    3. USD 0.02 per GB
    4. USD 0.01 per GB
  7. Does Amazon RDS allow direct host access via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection?
    1. Yes
    2. No
    3. Depends on if it is in VPC or not
  8. What are the two types of licensing options available for using Amazon RDS for Oracle?
    1. BYOL and Enterprise License
    2. BYOL and License Included
    3. Enterprise License and License Included
    4. Role based License and License Included

AWS CloudWatch Overview

AWS CloudWatch

  • AWS CloudWatch monitors AWS resources and applications in real-time.
  • CloudWatch can be used to collect and track metrics, which are the variables to be measured for resources and applications.
  • CloudWatch alarms can be configured
    • to send notifications or
    • to automatically make changes to the resources based on defined rules
  • In addition to monitoring the built-in metrics that come with AWS, custom metrics can also be monitored
  • CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health.

CloudWatch Architecture

  • CloudWatch collects various metrics from various resources
  • These metrics, as statistics, are available to the user through Console, CLI
  • CloudWatch allows creation of alarms with defined rules to perform Auto Scaling actions or to stop, start, or terminate instances
  • CloudWatch allows creation of alarms to send notifications using SNS actions on your behalf

CloudWatch Concepts

Metrics

  • Metric is the fundamental concept in CloudWatch.
  • A metric is uniquely defined by a name, a namespace, and one or more dimensions.
  • A metric represents a time-ordered set of data points published to CloudWatch.
  • Each data point has a time stamp, and (optionally) a unit of measure
  • These data points can be either custom metrics or metrics from other services in AWS.
  • Statistics can be retrieved about those data points as an ordered set of time-series data that occur within a specified time window.
  • When the statistics are requested, the returned data stream is identified by namespace, metric name, dimension, and (optionally) the unit.
  • Metrics exist only in the region in which they are created
  • CloudWatch stores the metric data for two weeks
  • Metrics cannot be deleted, but they automatically expire in 14 days if no new data is published to them.

Namespaces

  • CloudWatch namespaces are containers for metrics.
  • Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
  • AWS namespaces all follow the convention AWS/<service>, for e.g. AWS/EC2 and AWS/ELB
  • Namespace names must be fewer than 256 characters in length.
  • There is no default namespace. Each data element put into CloudWatch must specify a namespace

Dimensions

  • A dimension is a name/value pair that uniquely identifies a metric.
  • Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics.
  • Dimensions help design a structure for the statistics plan.
  • Dimensions are part of the unique identifier for a metric; whenever a unique name/value pair is added to one of the metrics, a new metric is created
  • Dimensions can be used to filter the result sets that a CloudWatch query returns
  • A metric can be assigned up to ten dimensions.
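
The identity rules from the Namespaces and Dimensions notes above can be sketched as a small local model. This is only an illustration and does not call CloudWatch; the names metric_key, put, and store, and the sample namespaces, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical type: a metric's identity is (namespace, name, sorted dimension pairs).
MetricKey = Tuple[str, str, Tuple[Tuple[str, str], ...]]


def metric_key(namespace: str, name: str, dimensions: Dict[str, str]) -> MetricKey:
    """Build a hashable identity from namespace, metric name, and dimensions."""
    if not namespace:
        raise ValueError("there is no default namespace; one must be supplied")
    if len(namespace) >= 256:
        raise ValueError("namespace must be fewer than 256 characters")
    if len(dimensions) > 10:
        raise ValueError("a metric can be assigned at most ten dimensions")
    # Sort so the order in which dimensions are supplied does not matter.
    return (namespace, name, tuple(sorted(dimensions.items())))


# Local stand-in for the metric store (assumption: a plain dict of lists).
store: Dict[MetricKey, list] = {}


def put(namespace: str, name: str, dimensions: Dict[str, str], value: float) -> None:
    """Append a value to the metric identified by the full key."""
    store.setdefault(metric_key(namespace, name, dimensions), []).append(value)


put("MyApp", "Latency", {"Server": "web-1"}, 120.0)
put("MyApp", "Latency", {"Server": "web-1"}, 80.0)
# Adding a new name/value pair yields a different metric.
put("MyApp", "Latency", {"Server": "web-1", "Stage": "prod"}, 95.0)
# Same name and dimensions in another namespace stay isolated.
put("OtherApp", "Latency", {"Server": "web-1"}, 30.0)

print(len(store))
```

Only the first two calls land in the same metric; the other two each create a separate one.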

Time Stamps

  • Each metric data point must be marked with a time stamp to identify the data point on a time series
  • Time stamp can be up to two weeks in the past and up to two hours into the future.
  • If no time stamp is provided, CloudWatch creates a time stamp based on the time the data element was received.
  • All times reflect the UTC time zone when statistics are retrieved
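
The time stamp window described above can be checked with a short local sketch. The function names and the sample date are hypothetical; the two limits are taken from the notes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # "up to two weeks in the past"
MAX_FUTURE = timedelta(hours=2)  # "up to two hours into the future"


def timestamp_accepted(ts: datetime, received_at: datetime) -> bool:
    """Return True if ts falls inside the accepted window around received_at."""
    return received_at - MAX_PAST <= ts <= received_at + MAX_FUTURE


def resolve_timestamp(ts: Optional[datetime], received_at: datetime) -> datetime:
    """Pick the time stamp to store, normalised to UTC."""
    if ts is None:
        # No time stamp supplied: fall back to the time of receipt.
        ts = received_at
    if not timestamp_accepted(ts, received_at):
        raise ValueError("time stamp outside the accepted window")
    return ts.astimezone(timezone.utc)


received = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
print(timestamp_accepted(received + timedelta(minutes=90), received))  # 90 minutes ahead
print(timestamp_accepted(received + timedelta(hours=3), received))     # 3 hours ahead
print(resolve_timestamp(None, received).isoformat())
```

A data point stamped 90 minutes ahead is inside the window, while one stamped three hours ahead is not.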

Units

  • Units represent the statistic’s unit of measure for e.g. count, bytes, % etc

Statistics

  • Statistics are metric data aggregations over specified periods of time
  • Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period

Periods

  • Period is the length of time associated with a specific statistic.
  • Each statistic represents an aggregation of the metrics data collected for a specified period of time.
  • Although periods are expressed in seconds, the minimum granularity for a period is one minute.

Aggregation

  • CloudWatch aggregates statistics according to the period length specified in calls to GetMetricStatistics.
  • Multiple data points can be published with the same or similar time stamps. CloudWatch aggregates them by period length when the statistics about those data points are requested.
  • Aggregated statistics are only available when using detailed monitoring.
  • Instances that use basic monitoring are not included in the aggregates
  • CloudWatch does not aggregate data across regions.
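
A rough sketch of how data points roll up into periods, assuming periods must be a multiple of 60 seconds (the notes only state the one-minute minimum). The aggregate function and the sample points are hypothetical, and nothing here calls GetMetricStatistics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate(points: Iterable[Tuple[int, float]], period: int = 60) -> Dict[int, dict]:
    """Group (epoch_seconds, value) points by period start and compute statistics."""
    if period < 60 or period % 60 != 0:
        # Assumption: periods are whole minutes, with one minute as the minimum.
        raise ValueError("period must be a multiple of 60 seconds")
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)  # align to the period boundary
    return {
        start: {
            "SampleCount": len(vals),
            "Sum": sum(vals),
            "Average": sum(vals) / len(vals),
            "Minimum": min(vals),
            "Maximum": max(vals),
        }
        for start, vals in sorted(buckets.items())
    }


points = [(0, 10.0), (15, 20.0), (59, 30.0), (60, 40.0), (130, 50.0)]
per_minute = aggregate(points, period=60)
per_five = aggregate(points, period=300)
print(per_minute[0]["Average"], per_five[0]["Average"])
```

The same five points produce three one-minute buckets or a single five-minute bucket, depending only on the period requested.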

Alarms

  • Alarms can automatically initiate actions on behalf of the user, based on specified parameters
  • Alarm watches a single metric over a specified time period, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods.
  • Alarms invoke actions for sustained state changes only i.e. the state must have changed and been maintained for a specified number of periods
  • Action can be a notification sent to an SNS topic or an Auto Scaling policy
  • After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action associated with the alarm.
    • For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.
    • For SNS notifications, no additional actions are invoked.
  • An alarm has three possible states:
    • OK—The metric is within the defined threshold
    • ALARM—The metric is outside of the defined threshold
    • INSUFFICIENT_DATA—Alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
  • Alarms exist only in the region in which they are created.
  • Alarm actions must reside in the same region as the alarm
  • Alarm history is available for the last 14 days.
  • Alarm can be tested by setting it to any state using the SetAlarmState API (mon-set-alarm-state command). This temporary state change lasts only until the next alarm comparison occurs.
  • Alarms can be disabled and enabled using the DisableAlarmActions and EnableAlarmActions APIs (mon-disable-alarm-actions and mon-enable-alarm-actions commands).
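
The alarm rules above can be modelled locally as follows. This is a simplified sketch, not CloudWatch's actual evaluation logic: it assumes a "greater than threshold" comparison, treats a missing data point as None, and uses hypothetical function names.

```python
from typing import List, Optional


def alarm_state(datapoints: List[Optional[float]], threshold: float, evaluation_periods: int) -> str:
    """Evaluate the most recent periods of a 'greater than threshold' alarm."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    # The breach must be sustained across every evaluated period.
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


def invocations(states: List[str], action_type: str) -> int:
    """Count action invocations for a sequence of per-period alarm states."""
    count = 0
    previous = "OK"
    for state in states:
        if state == "ALARM":
            # Auto Scaling actions repeat every period; SNS fires only on the change.
            if action_type == "autoscaling" or previous != "ALARM":
                count += 1
        previous = state
    return count


print(alarm_state([80.0, 90.0, 85.0], threshold=75.0, evaluation_periods=3))
history = ["OK", "ALARM", "ALARM", "ALARM", "OK"]
print(invocations(history, "sns"), invocations(history, "autoscaling"))
```

With three consecutive periods in ALARM, the SNS action is counted once and the Auto Scaling action three times.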

Regions

  • CloudWatch does not aggregate data across regions. Therefore, metrics are completely separate between regions.

Custom Metrics

  • CloudWatch allows publishing custom metrics with put-metric-data CLI command (or its Query API equivalent PutMetricData)
  • CloudWatch creates a new metric if put-metric-data is called with a new metric name,  else it associates the data with the specified existing metric
  • put-metric-data command can only publish one data point per call
  • CloudWatch stores data about a metric as a series of data points and each data point has an associated time stamp
  • After a new metric is created using the put-metric-data command, it can take up to two minutes before statistics can be retrieved on the new metric using the get-metric-statistics command, and up to fifteen minutes before the new metric appears in the list of metrics retrieved using the list-metrics command.
  • CloudWatch allows publishing
    • Single data point
      • Data points can be published with time stamps as granular as one-thousandth of a second, but CloudWatch aggregates the data to a minimum granularity of one minute
      • CloudWatch records the average (sum of all items divided by number of items) of the values received for every 1-minute period, as well as number of samples, maximum value, and minimum value for the same time period
      • CloudWatch uses one-minute boundaries when aggregating data points
    • Aggregated set of data points called a statistics set
      • Data can also be aggregated before being published to CloudWatch
      • Aggregating data minimizes the number of calls reducing it to a single call per minute with the statistic set of data
      • Statistics include Sum, Average, Minimum, Maximum, SampleCount
  • If the application produces data that is more sporadic and has periods with no associated data, either the value zero (0) or no value at all can be published
  • However, it can be helpful to publish zero instead of no value
    • to monitor the health of your application for e.g. an alarm can be configured to notify if no metrics are published for 5 minutes
    • to track the total number of data points
    • to have statistics such as minimum and average to include data points with the value 0.
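
To make the statistic set and the zero-versus-no-value choice concrete, here is a small local sketch. The helper names and readings are hypothetical; it only shows how the pre-aggregated numbers differ.

```python
from typing import Dict, List


def statistic_set(values: List[float]) -> Dict[str, float]:
    """Pre-aggregate raw values into the fields of a statistic set."""
    if not values:
        raise ValueError("a statistic set needs at least one value")
    return {
        "SampleCount": len(values),
        "Sum": sum(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }


def average(stats: Dict[str, float]) -> float:
    """Average is derived from Sum and SampleCount."""
    return stats["Sum"] / stats["SampleCount"]


# Four intervals, two of them idle. Option 1 publishes zeros for the idle ones.
with_zeros = statistic_set([4.0, 0.0, 0.0, 8.0])
# Option 2 publishes nothing for the idle intervals.
without_zeros = statistic_set([4.0, 8.0])

print(average(with_zeros), with_zeros["Minimum"], with_zeros["SampleCount"])
print(average(without_zeros), without_zeros["Minimum"], without_zeros["SampleCount"])
```

Publishing zeros keeps the sample count equal to the number of intervals and pulls the minimum and average down to reflect the idle time.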

Supported Services

For the list of supported services, refer to the CloudWatch Supported Services documentation.

Accessing CloudWatch

  • CloudWatch can be accessed using
    • AWS CloudWatch console
    • CloudWatch CLI
    • AWS CLI
    • CloudWatch API
    • AWS SDKs

Sample Exam Questions

  1. A company needs to monitor the read and write IOPs metrics for their AWS MySQL RDS instance and send real-time alerts to their operations team. Which AWS services can accomplish this? Choose 2 answers
    1. Amazon Simple Email Service (Cannot be integrated with CloudWatch directly)
    2. Amazon CloudWatch
    3. Amazon Simple Queue Service
    4. Amazon Route 53
    5. Amazon Simple Notification Service
  2. A customer needs to capture all client connection information from their load balancer every five minutes. The company wants to use this data for analyzing traffic patterns and troubleshooting their applications. Which of the following options meets the customer requirements?
    1. Enable AWS CloudTrail for the load balancer.
    2. Enable access logs on the load balancer.
    3. Install the Amazon CloudWatch Logs agent on the load balancer.
    4. Enable Amazon CloudWatch metrics on the load balancer
  3. A user is running a batch process on EBS backed EC2 instances. The batch process starts a few instances to process Hadoop Map reduce jobs, which can run between 50 – 600 minutes or sometimes for more time. The user wants to configure that the instance gets terminated only when the process is completed. How can the user configure this with CloudWatch?
    1. Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
    2. Setup the CloudWatch with Auto Scaling to terminate all the instances
    3. Setup a job which terminates all instances after 600 minutes
    4. It is not possible to terminate instances automatically
  4. A user has two EC2 instances running in two separate regions. The user is running an internal memory management tool, which captures the data and sends it to CloudWatch in US East, using a CLI with the same namespace and metric. Which of the below mentioned options is true with respect to the above statement?
    1. The setup will not work as CloudWatch cannot receive data across regions
    2. CloudWatch will receive and aggregate the data based on the namespace and metric
    3. CloudWatch will give an error since the data will conflict due to two sources
    4. CloudWatch will take the data of the server, which sends the data first
  5. A user is sending the data to CloudWatch using the CloudWatch API. The user is sending data 90 minutes in the future. What will CloudWatch do in this case?
    1. CloudWatch will accept the data
    2. It is not possible to send data of the future
    3. It is not possible to send the data manually to CloudWatch
    4. The user cannot send data for more than 60 minutes in the future
  6. A user is having data generated randomly based on a certain event. The user wants to upload that data to CloudWatch. It may happen that event may not have data generated for some period due to randomness. Which of the below mentioned options is a recommended option for this case?
    1. For the period when there is no data, the user should not send the data at all
    2. For the period when there is no data the user should send a blank value
    3. For the period when there is no data the user should send the value as 0 (Refer User Guide)
    4. The user must upload the data to CloudWatch as having no data for some period will cause an error at CloudWatch monitoring
  7. A user has a weighing plant. The user measures the weight of some goods every 5 minutes and sends data to AWS CloudWatch for monitoring and tracking. Which of the below mentioned parameters is mandatory for the user to include in the request list?
    1. Value
    2. Namespace
    3. Metric Name
    4. Timezone
  8. A user has a refrigerator plant. The user is measuring the temperature of the plant every 15 minutes. If the user wants to send the data to CloudWatch to view the data visually, which of the below mentioned statements is true with respect to the information given above?
    1. The user needs to use AWS CLI or API to upload the data
    2. The user can use the AWS Import Export facility to import data to CloudWatch
    3. The user will upload data from the AWS console
    4. The user cannot upload data to CloudWatch since it is not an AWS service metric
  9. A user has launched an EC2 instance. The user is planning to setup the CloudWatch alarm. Which of the below mentioned actions is not supported by the CloudWatch alarm?
    1. Notify the Auto Scaling launch config to scale up
    2. Send an SMS using SNS
    3. Notify the Auto Scaling group to scale down
    4. Stop the EC2 instance
  10. A user is trying to aggregate all the CloudWatch metric data of the last 1 week. Which of the below mentioned statistics is not available for the user as a part of data aggregation?
    1. Aggregate
    2. Sum
    3. Sample data
    4. Average
  11. A user has setup a CloudWatch alarm on an EC2 action when the CPU utilization is above 75%. The alarm sends a notification to SNS on the alarm state. If the user wants to simulate the alarm action how can he achieve this?
    1. Run activities on the CPU such that its utilization reaches above 75%
    2. From the AWS console change the state to ‘Alarm’
    3. The user can set the alarm state to ‘Alarm’ using CLI
    4. Run the SNS action manually
  12. A user is publishing custom metrics to CloudWatch. Which of the below mentioned statements will help the user understand the functionality better?
    1. The user can use the CloudWatch Import tool
    2. The user should be able to see the data in the console after around 15 minutes
    3. If the user is uploading the custom data, the user must supply the namespace, timezone, and metric name as part of the command
    4. The user can view as well as upload data using the console, CLI and APIs

AWS RDS DB Snapshot, Backup & Restore

RDS Back Up, Restore and Snapshots

DB Instance Backups

  • Amazon RDS creates a storage volume snapshot of the DB instance, backing up the entire DB instance and not just individual databases.
  • Amazon RDS provides two different methods Automated and Manual for backing up your DB instances:

Automated backups

  • Automated backup is an Amazon RDS feature that automatically creates a backup of the DB instance.
  • Automated backups are enabled by default for a new DB instance.
  • Automated backup occurs during a daily user-configurable period of time known as the preferred backup window.
  • Backups created during the backup window are retained for a user-configurable number of days (the backup retention period).
  • If the backup retention period is not set, RDS defaults the backup retention period to one day
  • Backup retention period can be modified; valid values are 0 (for no backup retention) to a maximum of 35 days.
  • Manual snapshot limits (50 per region) do not apply to automated backups
  • If the backup requires more time than allotted to the backup window, the backup will continue to completion.
  • An immediate outage will occur if the backup retention period is changed from
    • 0 to a non-zero value as the first backup occurs immediately or
    • non-zero value to 0 as it turns off automatic backups, and deletes all existing automated backups for the instance.
  • RDS uses the periodic data backups in conjunction with the transaction logs to enable restoration of the DB Instance to any second during the retention period, up to the LatestRestorableTime (typically up to the last few minutes).
  • During the backup window, storage I/O may be briefly suspended while the backup process initializes (typically under a few seconds) and a brief period of elevated latency might be experienced.
  • There is no I/O suspension for Multi-AZ DB deployments, since the backup is taken from the standby
  • If a preferred backup window is not specified when a DB instance is created, RDS assigns a default 30-minute backup window which is selected at random from an 8-hour block of time per region.
  • Changes to the backup window take effect immediately.
  • Backup window cannot overlap with the weekly maintenance window for the DB instance.
  • Automated DB snapshots are deleted when
    • the retention period expires
    • automated DB snapshots for the DB instance are disabled
    • the DB instance is deleted
  • When a DB instance is deleted,
    • a final DB snapshot can be created upon deletion; which can be used to restore the deleted DB instance at a later date.
    • RDS retains the final user-created DB snapshot along with all other manually created DB snapshots
    • all automated backups are deleted and cannot be recovered
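
The retention period rules above (valid range 0 to 35 days, and an outage when switching between zero and non-zero) can be expressed as a small check. This is a local sketch with hypothetical function names, not an RDS API call.

```python
MIN_RETENTION_DAYS = 0   # 0 turns automated backups off
MAX_RETENTION_DAYS = 35


def validate_retention(days: int) -> int:
    """Return days unchanged if it is a valid backup retention period."""
    if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
        raise ValueError("backup retention period must be between 0 and 35 days")
    return days


def automated_backups_enabled(days: int) -> bool:
    """Automated backups are on for any non-zero retention period."""
    return validate_retention(days) > 0


def change_causes_outage(old_days: int, new_days: int) -> bool:
    """An outage occurs only when backups are switched on or switched off."""
    return automated_backups_enabled(old_days) != automated_backups_enabled(new_days)


print(change_causes_outage(0, 7))   # switching backups on
print(change_causes_outage(7, 0))   # switching backups off
print(change_causes_outage(7, 14))  # only extending retention
```

Changing between two non-zero values is not flagged, which matches the two outage cases listed in the notes.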

Manual (customer-initiated) DB snapshots

  • Manual DB snapshots are user-initiated backups that enable backing up a DB instance to a known state, and restoring to that specific state at any time.
  • RDS keeps all manual DB snapshots until explicitly deleted

Point-In-Time Recovery

  • In addition to the daily automated backup, RDS archives database change logs. This enables recovery of the database to any point in time during the backup retention period, up to the last five minutes of database usage.
  • Disabling automated backups is discouraged because it also disables point-in-time recovery
  • RDS stores multiple copies of your data, but for Single-AZ DB instances these copies are stored in a single availability zone.
  • If for any reason a Single-AZ DB instance becomes unusable, point-in-time recovery can be used to launch a new DB instance with the latest restorable data

DB Snapshots

DB Snapshots Creation

  • DB snapshot is a user-initiated storage volume snapshot of DB instance, backing up the entire DB instance and not just individual databases.
  • DB snapshots enable backing up of the DB instance in a known state as needed, and can then be restored to that specific state at any time.
  • DB snapshots are kept until explicitly deleted
  • Creating DB snapshot on a Single-AZ DB instance results in a brief I/O suspension that typically lasts no more than a few minutes.
  • Multi-AZ DB instances are not affected by this I/O suspension since the backup is taken on the standby

DB Snapshot Restore

  • DB instance can be restored to any specific time during the backup retention period, creating a new DB instance.
  • New DB instance with a different endpoint is created by restoring from a DB snapshot
  • RDS uses the periodic data backups in conjunction with the transaction logs to enable restoration of the DB Instance to any second during the retention period, up to the LatestRestorableTime (typically up to the last few minutes).
  • Option group associated with the DB snapshot is associated with the restored DB instance once it is created. However, an option group is tied to a VPC, so it applies only when the instance is restored in the same VPC as the DB snapshot
  • The default DB parameter and security groups are associated with the restored instance. After the restoration is complete, any custom DB parameter or security groups used by the source instance should be associated explicitly
  • A DB instance can be restored with a different storage type than the source DB snapshot. In this case the restoration process will be slower because of the additional work required to migrate the data to the new storage type for e.g. from GP2 to Provisioned IOPS
  • A DB instance can be restored with a different edition of the DB engine only if the DB snapshot has the required storage allocated for the new edition for e.g., to change from SQL Server Web Edition to SQL Server Standard Edition, the DB snapshot must have been created from a SQL Server DB instance that had at least 200 GB of allocated storage, which is the minimum allocated storage for SQL Server Standard edition

DB Snapshot Copy

  • Amazon RDS supports two types of DB snapshot copying.
    • Copy an automated DB snapshot to create a manual DB snapshot in the same AWS region. Manual DB snapshot are not deleted automatically and can be kept indefinitely.
    • Copy either an automated or manual DB snapshot from one region to another region. By copying the DB snapshot to another region, a manual DB snapshot is created that is retained in that region.
  • Manual DB snapshots can be shared with other AWS accounts, and DB snapshots shared by other AWS accounts can be copied
  • Snapshot Copy Encryption
    • DB snapshot that has been encrypted using an AWS Key Management Service (AWS KMS) encryption key can be copied
    • Copying an encrypted DB snapshot results in an encrypted copy of the DB snapshot
    • When copying, the DB snapshot copy can be encrypted either with the same KMS encryption key as the original DB snapshot or with a different KMS encryption key.
    • An unencrypted DB snapshot can be copied to an encrypted snapshot, a quick way to add encryption to a previously unencrypted DB instance.
    • Encrypted snapshot can be restored only to an encrypted DB instance
    • If a KMS encryption key is specified when restoring from an unencrypted DB cluster snapshot, the restored DB cluster is encrypted using the specified KMS encryption key
    • Copying an encrypted snapshot shared from another AWS account, requires access to the KMS encryption key that was used to encrypt the DB snapshot.
    • Because KMS encryption keys are specific to the region that they are created in, encrypted snapshot cannot be copied to another region
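
The copy rules in the list above can be summarised in a few lines. This sketch encodes them exactly as these notes state them, including the cross-region restriction; it is a local model with hypothetical function names and key labels, not the RDS API.

```python
from typing import Optional


def copy_allowed(source_encrypted: bool, source_region: str, target_region: str) -> bool:
    """Per the notes, an encrypted snapshot cannot be copied to another region."""
    return not (source_encrypted and source_region != target_region)


def key_of_copy(source_key: Optional[str], target_key: Optional[str] = None) -> Optional[str]:
    """Return the KMS key protecting the copy, or None if the copy is unencrypted.

    source_key is None for an unencrypted source snapshot.
    """
    # A key given at copy time wins; otherwise the copy keeps the source key.
    return target_key or source_key


print(key_of_copy("key-a"))           # encrypted source, same key
print(key_of_copy("key-a", "key-b"))  # encrypted source, re-encrypted with another key
print(key_of_copy(None, "key-b"))     # unencrypted source gains encryption
print(key_of_copy(None))              # unencrypted source stays unencrypted
print(copy_allowed(True, "us-east-1", "eu-west-1"))
```

An encrypted source never yields an unencrypted copy, because the copy falls back to the source key when no other key is given.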

DB Snapshot Sharing

  • Manual DB snapshot or DB cluster snapshot can be shared with up to 20 other AWS accounts.
  • AWS accounts that a manual snapshot is shared with can copy the snapshot, or restore a DB instance or DB cluster from that snapshot.
  • Manual snapshot can also be shared as public, which makes the snapshot available to all AWS accounts. Care should be taken when sharing a snapshot as public so that none of your private information is included
  • Shared snapshot can be copied to another region.
  • However, following limitations apply when sharing manual snapshots with other AWS accounts:
    • When a DB instance or DB cluster is restored from a shared snapshot using the AWS CLI or RDS API, the Amazon Resource Name (ARN) of the shared snapshot must be specified as the snapshot identifier
    • DB snapshot that uses an option group with permanent or persistent options cannot be shared
    • A permanent option cannot be removed from an option group. Option groups with persistent options cannot be removed from a DB instance once the option group has been assigned to the DB instance.
  • DB snapshots that have been encrypted “at rest” using the AES-256 encryption algorithm can be shared
  • Users can only copy encrypted DB snapshots if they have access to the AWS Key Management Service (AWS KMS) encryption key that was used to encrypt the DB snapshot.
  • AWS KMS encryption keys can be shared with another AWS account by adding the other account to the KMS key policy.
  • However, before sharing an encrypted DB snapshot, the KMS key policy must first be updated to add the accounts the snapshot will be shared with

Sample Exam Questions

  1. Amazon RDS automated backups and DB Snapshots are currently supported for only the __________ storage engine
    1. InnoDB
    2. MyISAM
  2. Automated backups are enabled by default for a new DB Instance.
    1. TRUE
    2. FALSE
  3. Amazon RDS DB snapshots and automated backups are stored in
    1. Amazon S3
    2. Amazon EBS Volume
    3. Amazon RDS
    4. Amazon EMR
  4. You receive a frantic call from a new DBA who accidentally dropped a table containing all your customers. Which Amazon RDS feature will allow you to reliably restore your database to within 5 minutes of when the mistake was made?
    1. Multi-AZ RDS
    2. RDS snapshots
    3. RDS read replicas
    4. RDS automated backup
  5. Disabling automated backups ______ disable the point-in-time recovery.
    1. if configured to can
    2. will never
    3. will
  6. Changes to the backup window take effect ______.
    1. from the next billing cycle
    2. after 30 minutes
    3. immediately
    4. after 24 hours
  7. You can modify the backup retention period; valid values are 0 (for no backup retention) to a maximum of ___________ days.
    1. 45
    2. 35
    3. 15
    4. 5
  8. What happens to the I/O operations while you take a database snapshot?
    1. I/O operations to the database are suspended for a few minutes while the backup is in progress.
    2. I/O operations to the database are sent to a Replica (if available) for a few minutes while the backup is in progress.
    3. I/O operations will be functioning normally
    4. I/O operations to the database are suspended for an hour while the backup is in progress
  9. True or False: When you perform a restore operation to a point in time or from a DB Snapshot, a new DB Instance is created with a new endpoint.
    1. FALSE
    2. TRUE
  10. True or False: Manually created DB Snapshots are deleted after the DB Instance is deleted.
    1. TRUE
    2. FALSE

AWS RDS DB Maintenance & Upgrades

RDS DB Instance Maintenance and Upgrades

Changes to a DB instance can occur when a DB instance is manually modified for e.g. DB engine version is upgraded, or when Amazon RDS performs maintenance on an instance

Amazon RDS Maintenance

  • Periodically, Amazon RDS performs maintenance on Amazon RDS resources, such as DB instances; this most often involves updates to the DB instance’s operating system (OS).
  • Maintenance items can either
    • be applied manually on a DB instance at one’s convenience
    • or wait for the automatic maintenance process initiated by Amazon RDS during the defined weekly maintenance window.
  • Maintenance window only determines when pending operations start, but does not limit the total execution time of these operations. Maintenance operations are not guaranteed to finish before the maintenance window ends, and can continue beyond the specified end time.
  • Maintenance update availability can be checked both on the RDS console and by using the RDS API. And if an update is available, one can
    • Defer the maintenance items.
    • Apply the maintenance items immediately.
    • Schedule them to start during the next defined maintenance window
  • Maintenance items marked as
    • Required cannot be deferred indefinitely; if deferred, AWS sends a notification of the time when the update will be performed next
    • Available can be deferred indefinitely, and the update will not be applied to the DB instance.
  • Required patching is automatically scheduled only for patches that are related to security and instance reliability. Such patching occurs infrequently (typically once every few months) and seldom requires more than a fraction of your maintenance window.
  • Some maintenance items require that RDS take your DB instance offline for a short time. Maintenance that requires the DB instance to be offline includes scale compute operations, which generally take only a few minutes from start to finish, and required operating system or database patching.
  • Multi-AZ deployment for the DB instance reduces the impact of a maintenance event by following these steps:
    • Perform maintenance on the standby.
    • Promote the standby to primary.
    • Perform maintenance on the old primary, which becomes the new standby.
  • When database engine for the DB instance is modified in a Multi-AZ deployment, RDS upgrades both the primary and secondary DB instances at the same time. In this case, the database engine for the entire Multi-AZ deployment is shut down during the upgrade.
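
The three Multi-AZ maintenance steps listed above amount to a role swap, which the following sketch traces. The instance names and the function are hypothetical, and the engine upgrade case in the last bullet is not modelled.

```python
from typing import List, Tuple


def multi_az_maintenance(primary: str, standby: str) -> Tuple[str, str, List[str]]:
    """Walk through the Multi-AZ maintenance steps and return the final roles."""
    log = []
    # Step 1: patch the standby while the primary keeps serving traffic.
    log.append(f"perform maintenance on standby {standby}")
    # Step 2: fail over, so the patched standby becomes the primary.
    primary, standby = standby, primary
    log.append(f"promote {primary} to primary")
    # Step 3: patch the old primary, which is now the standby.
    log.append(f"perform maintenance on new standby {standby}")
    return primary, standby, log


new_primary, new_standby, steps = multi_az_maintenance("db-a", "db-b")
for step in steps:
    print(step)
print(new_primary, new_standby)
```

There is no fourth step that promotes the original primary back, so the roles stay swapped after maintenance.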

Operating System Updates

  • Upgrades to the operating system are most often for security issues and should be done as soon as possible.
  • OS updates on a DB instance can be applied at one’s convenience or can wait for the maintenance process initiated by RDS to apply the update during the defined maintenance window
  • DB instance is not automatically backed up when an OS update is applied, and should be backed up before the update is applied

Database Engine Version Upgrade

  • DB instance engine version can be upgraded when a new DB engine version is supported by RDS.
  • Database version upgrades consist of major and minor version upgrades.
    • Major database version upgrades
      • can contain changes that are not backward-compatible
      • RDS doesn’t apply major version upgrades automatically
      • DB instance should be manually modified and thoroughly tested before applying it to the production instances.
    • Minor version upgrades
      • Each DB engine handles minor version upgrades slightly differently, for e.g. RDS automatically applies minor version upgrades to a DB instance running PostgreSQL, but they must be manually applied to a DB instance running Oracle.
  • Amazon posts an announcement to the forums announcement page and sends a customer e-mail notification before upgrading a DB instance
  • Amazon schedules the upgrades at specific times through the year, to help plan around them, because downtime is required to upgrade a DB engine version, even for Multi-AZ instances.
  • RDS takes two DB snapshots during the upgrade process.
    • First DB snapshot is of the DB instance before any upgrade changes have been made. If the upgrade fails, it can be restored from the snapshot to create a DB instance running the old version.
    • Second DB snapshot is taken when the upgrade completes. After the upgrade is complete, database engine can’t be reverted to the previous version. For returning to the previous version, restore the first DB snapshot taken to create a new DB instance.
  • If the DB instance is using read replication, all of the Read Replicas must be upgraded before upgrading the source instance.
  • If the DB instance is in a Multi-AZ deployment, both the primary and standby replicas are upgraded at the same time and would result in an outage. The time for the outage varies based on your database engine, version, and the size of your DB instance.

RDS Maintenance Window

  • Every DB instance has a weekly maintenance window defined during which any system changes are applied.
  • Maintenance window is an opportunity to control when DB instance modifications and software patching occur, in the event either is requested or required.
  • If a maintenance event is scheduled for a given week, it will be initiated during the 30-minute maintenance window as defined
  • Maintenance events mostly complete during the 30-minute maintenance window, although larger maintenance events may take more time
  • 30-minute maintenance window is selected at random from an 8-hour block of time per region. If you don’t specify a preferred maintenance window when you create the DB instance, Amazon RDS assigns a 30-minute maintenance window on a randomly selected day of the week.
  • RDS will consume some of the resources on the DB instance while maintenance is being applied, minimally affecting performance.
  • For some maintenance events, a Multi-AZ failover may be required for a maintenance update to complete.
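A preferred maintenance window is commonly written as `ddd:hh24:mi-ddd:hh24:mi` in UTC, for e.g. `mon:03:00-mon:03:30`. The following is a minimal sketch, assuming that format and the 30-minute minimum described above. `window_minutes` is a hypothetical helper for checking a window string before submitting it, not part of any RDS API.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
WEEK_MINUTES = 7 * 24 * 60


def _to_minutes(point: str) -> int:
    """Convert 'ddd:hh:mi' to minutes elapsed since Monday 00:00."""
    day, hour, minute = point.strip().lower().split(":")
    if day not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    h, m = int(hour), int(minute)
    if not (0 <= h < 24 and 0 <= m < 60):
        raise ValueError(f"invalid time: {point!r}")
    return DAYS.index(day) * 24 * 60 + h * 60 + m


def window_minutes(window: str) -> int:
    """Return the window length in minutes, rejecting windows under 30 minutes."""
    start, end = window.split("-")
    # The modulo handles windows that wrap past Sunday midnight.
    length = (_to_minutes(end) - _to_minutes(start)) % WEEK_MINUTES
    if length < 30:
        raise ValueError("maintenance window must be at least 30 minutes")
    return length
```

For example, `window_minutes("sun:23:45-mon:00:15")` treats the window as wrapping over the end of the week rather than as a negative span.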

Sample Exam Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A user has launched an RDS MySQL DB with the Multi AZ feature. The user has scheduled the scaling of instance storage during maintenance window. What is the correct order of events during maintenance window? 1. Perform maintenance on standby 2. Promote standby to primary 3. Perform maintenance on original primary 4. Promote original master back as primary
    1. 1, 2, 3, 4
    2. 1, 2, 3
    3. 2, 3, 4, 1
  2. Can I control if and when MySQL based RDS Instance is upgraded to new supported versions?
    1. No
    2. Only in VPC
    3. Yes
  3. A user has scheduled the maintenance window of an RDS DB on Monday at 3 AM. Which of the below mentioned events may force to take the DB instance offline during the maintenance window?
    1. Enabling Read Replica
    2. Making the DB Multi AZ
    3. DB password change
    4. Security patching
  4. A user has launched an RDS postgreSQL DB with AWS. The user did not specify the maintenance window during creation. The user has configured RDS to update the DB instance type from micro to large. If the user wants to have it during the maintenance window, what will AWS do?
    1. AWS will not allow to update the DB until the maintenance window is configured
    2. AWS will select the default maintenance window if the user has not provided it
    3. AWS will ask the user to specify the maintenance window during the update
    4. It is not possible to change the DB size from micro to large with RDS
  5. Can I test my DB Instance against a new version before upgrading?
    1. No
    2. Yes
    3. Only in VPC

AWS RDS Storage

  • RDS storage uses Elastic Block Store (EBS) volumes for database and log storage.
  • RDS automatically stripes across multiple EBS volumes to enhance IOPS performance, depending on the amount of storage requested

RDS Storage Types

  • RDS storage provides three storage types: Magnetic, General Purpose (SSD), and Provisioned IOPS (input/output operations per second).
  • These storage types differ in performance characteristics and price, which allows tailoring of storage performance and cost to the database needs
  • MySQL, MariaDB, PostgreSQL, and Oracle RDS DB instances can be created with up to 6TB of storage and SQL Server RDS DB instances with up to 4TB of storage when using the Provisioned IOPS and General Purpose (SSD) storage types.
  • Existing MySQL, PostgreSQL, and Oracle RDS database instances can be scaled to these new database storage limits without any downtime.

Magnetic (Standard)

  • Magnetic storage, also called standard storage, offers cost-effective storage that is ideal for applications with light or burst I/O requirements.
  • Magnetic volumes deliver approximately 100 IOPS on average, with the ability to burst to hundreds of IOPS, and range in size from 5 GB to 3 TB, depending on the DB instance engine.
  • Magnetic storage is not reserved for a single DB instance, so performance can vary greatly depending on the demands placed on shared resources by other customers.

General Purpose (SSD)

  • General purpose, SSD-backed storage, also called gp2, can provide faster access than disk-based storage.
  • Gp2 volumes can deliver single-digit millisecond latencies, with a base performance of 3 IOPS per gigabyte (GB), the ability to burst to 3,000 IOPS for extended periods of time, and a maximum of 10,000 IOPS.
  • Gp2 volumes can range in size from 5 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB instances, and from 20 GB to 4 TB for SQL Server DB instances.
  • Gp2 is excellent for small to medium-sized databases.
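The gp2 figures above can be turned into a quick sizing check. This is a minimal sketch that uses only the numbers quoted in this section (3 IOPS per GB, 3,000 IOPS burst, 10,000 IOPS maximum); the function names are hypothetical, and current AWS limits may differ.

```python
GP2_IOPS_PER_GB = 3      # base performance quoted above
GP2_BURST_IOPS = 3_000   # burst ceiling quoted above
GP2_MAX_IOPS = 10_000    # maximum quoted above


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size, capped at the maximum."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return min(size_gb * GP2_IOPS_PER_GB, GP2_MAX_IOPS)


def gp2_benefits_from_burst(size_gb: int) -> bool:
    """Bursting only matters while the baseline is below the burst ceiling."""
    return gp2_baseline_iops(size_gb) < GP2_BURST_IOPS
```

Under these assumptions a 100 GB volume has a 300 IOPS baseline and relies on bursting for peaks, while a volume of 1,000 GB or more already has a baseline at or above the burst ceiling.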

Provisioned IOPS

  • Provisioned IOPS storage is designed to meet the needs of I/O-intensive workloads, particularly database workloads, that are sensitive to storage performance and consistency in random access I/O throughput.
  • Provisioned IOPS storage is a storage type that delivers fast, predictable, and consistent throughput performance.
  • For any production application that requires fast and consistent I/O performance, Amazon recommends Provisioned IOPS (input/output operations per second) storage.
  • Provisioned IOPS storage is optimized for I/O intensive, online transaction processing (OLTP) workloads that have consistent performance requirements.
  • Provisioned IOPS helps performance tuning.
  • Provisioned IOPS volumes can range in size from 100 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB engines. SQL Server Express and Web editions can range in size from 100 GB to 4 TB, while SQL Server Standard and Enterprise editions can range in size from 200 GB to 4 TB.
  • Dedicated IOPS rate and storage space allocation are specified when a DB instance is created. RDS provisions that IOPS rate and storage for the lifetime of the DB instance or until it is changed.
  • RDS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

For a detailed explanation, refer to the post on EBS volume types.

Adding Storage and Changing Storage Type

  • DB instance can be modified to use additional storage and converted to a different storage type.
  • However, storage allocated for a DB instance cannot be decreased
  • MySQL, MariaDB, PostgreSQL, and Oracle DB instances can be scaled up for storage, which helps improve I/O capacity.
  • Neither the storage capacity nor the storage type of a SQL Server DB instance can be changed, due to extensibility limitations of striped storage attached to a Windows Server environment.
  • During the scaling process, the DB instance will be available for reads and writes, but may experience performance degradation
  • Adding storage may take several hours; the duration of the process depends on several factors such as load, storage size, storage type, amount of IOPS provisioned (if any), and number of prior scale storage operations.
  • While storage is being added, nightly backups are suspended and no other Amazon RDS operations can take place, including modify, reboot, delete, create Read Replica, and create DB Snapshot

Performance Metrics

  • Amazon RDS provides several metrics that can be used to determine how the DB instance is performing.
    • IOPS
      • the number of I/O operations completed per second.
      • it is reported as the average IOPS for a given time interval.
      • RDS reports read and write IOPS separately at one-minute intervals.
      • Total IOPS is the sum of the read and write IOPS.
      • Typical values for IOPS range from zero to tens of thousands per second.
    • Latency
      • the elapsed time between the submission of an I/O request and its completion
      • it is reported as the average latency for a given time interval.
      • RDS reports read and write latency separately at one-minute intervals in units of seconds.
      • Typical values for latency are in the millisecond (ms) range
    • Throughput
      • the number of bytes per second transferred to or from disk
      • it is reported as the average throughput for a given time interval.
      • RDS reports read and write throughput separately at one-minute intervals using units of megabytes per second (MB/s).
      • Typical values for throughput range from zero to the I/O channel’s maximum bandwidth.
    • Queue Depth
      • the number of I/O requests in the queue waiting to be serviced.
      • these are I/O requests that have been submitted by the application but have not been sent to the device because the device is busy servicing other I/O requests.
      • it is reported as the average queue depth for a given time interval.
      • RDS reports queue depth at one-minute intervals. Typical values for queue depth range from zero to several hundred.
      • Time spent waiting in the queue is a component of Latency and Service Time (not available as a metric).

Amazon RDS Storage Facts

  • The first time a DB instance accesses an area of disk after it is started, the access can take longer than all subsequent accesses to the same disk area. This is known as the “first touch penalty”. Once an area of disk has incurred the first touch penalty, that area of disk does not incur the penalty again for the life of the instance, even if the DB instance is rebooted, restarted, or the DB instance class changes. Note that a DB instance created from a snapshot, a point-in-time restore, or a read replica is a new instance and does incur this first touch penalty.
  • RDS manages the DB instance and it reserves overhead space on the instance. While the amount of reserved storage varies by DB instance class and other factors, this reserved space can be as much as one or two percent of the total storage
  • Provisioned IOPS provides a way to reserve I/O capacity by specifying IOPS. Like any other system capacity attribute, maximum throughput under load will be constrained by the resource that is consumed first, which could be IOPS, channel bandwidth, CPU, memory, or database internal resources.
  • Current maximum channel bandwidth available is 4000 megabits per second (Mbps) full duplex. In terms of the read and write throughput metrics, this equates to about 210 megabytes per second (MB/s) in each direction. A perfectly balanced workload of 50% reads and 50% writes may attain a maximum combined throughput of 420 MB/s, which includes protocol overhead, so the actual data throughput may be less.
  • Provisioned IOPS works with an I/O request size of 32 KB. Provisioned IOPS consumption is a linear function of I/O request size above 32 KB. An I/O request smaller than 32 KB is handled as one I/O; for e.g. 1000 16 KB I/O requests are treated the same as 1000 32 KB requests. I/O requests larger than 32 KB consume more than one I/O request; for e.g. a 48 KB I/O request consumes 1.5 I/O requests of storage capacity, and a 64 KB I/O request consumes 2 I/O requests
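The 32 KB accounting rule can be written as a one-line formula. This is a minimal sketch based only on the figures above; `io_requests_consumed` is a hypothetical name used for illustration.

```python
IO_UNIT_KB = 32  # I/O request size that Provisioned IOPS works with, per the text above


def io_requests_consumed(request_kb: float) -> float:
    """Provisioned I/O requests consumed by a single request of the given size."""
    if request_kb <= 0:
        raise ValueError("request_kb must be positive")
    # Requests up to 32 KB count as one I/O; larger requests scale linearly.
    return max(1.0, request_kb / IO_UNIT_KB)
```

With this rule, a workload issuing 1000 requests of 64 KB each second would need 2000 provisioned IOPS, while the same number of 16 KB requests would need 1000.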

Factors That Impact Storage Performance

  • Several factors can affect the performance of a DB instance, such as instance configuration, I/O characteristics, and workload demand.
  • System related activities also consume I/O capacity and may reduce database instance performance while in progress:
    • DB snapshot creation
    • Nightly backups
    • Multi-AZ peer creation
    • Read replica creation
    • Scaling storage
  • System resources can constrain the throughput of a DB instance, but there can be other reasons for a bottleneck. The database itself could be the issue if:
    • Channel throughput limit is not reached
    • Queue depths are consistently low
    • CPU utilization is under 80%
    • Free memory available
    • No swap activity
    • Plenty of free disk space
    • Application has dozens of threads all submitting transactions as fast as the database will take them, but there is clearly unused I/O capacity
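The checklist above can be expressed as a simple predicate over observed metrics. This is only a sketch: the field names are hypothetical, the 80% CPU figure comes from the list, and the thresholds for "low" queue depth and "plenty" of free disk space are assumptions that should be tuned per workload.

```python
from dataclasses import dataclass


@dataclass
class InstanceSnapshot:
    """Hypothetical summary of metrics observed over a busy period."""
    channel_limit_reached: bool
    avg_queue_depth: float
    cpu_utilization_pct: float
    free_memory_mb: float
    swap_in_use: bool
    free_disk_pct: float


def database_is_likely_bottleneck(
    s: InstanceSnapshot,
    low_queue_depth: float = 1.0,      # assumption: what counts as "consistently low"
    min_free_disk_pct: float = 20.0,   # assumption: what counts as "plenty" of disk
) -> bool:
    """True when no system resource looks constrained, pointing at the database itself."""
    return (
        not s.channel_limit_reached
        and s.avg_queue_depth <= low_queue_depth
        and s.cpu_utilization_pct < 80
        and s.free_memory_mb > 0
        and not s.swap_in_use
        and s.free_disk_pct >= min_free_disk_pct
    )
```

If the predicate returns True while the application is still submitting transactions as fast as it can, the next step is usually to look at database internals such as locking or query plans rather than at storage.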

Sample Exam Questions

  1. When should I choose Provisioned IOPS over Standard RDS storage?
    1. If you have batch-oriented workloads
    2. If you use production online transaction processing (OLTP) workloads
    3. If you have workloads that are not sensitive to consistent performance
  2. Is decreasing the storage size of a DB Instance permitted?
    1. Depends on the RDBMS used
    2. Yes
    3. No
  3. Because of the extensibility limitations of striped storage attached to Windows Server, Amazon RDS does not currently support increasing storage on a _____ DB Instance.
    1. SQL Server
    2. MySQL
    3. Oracle
  4. If I want to run a database in an Amazon instance, which is the most recommended Amazon storage option?
    1. Amazon Instance Storage
    2. Amazon EBS
    3. You can’t run a database inside an Amazon instance.
    4. Amazon S3
  5. For each DB Instance class, what is the maximum size of associated storage capacity?
    1. 1TB
    2. 2TB
    3. 500GB
    4. 6TB (Except SQL Server which is currently 4TB)

AWS EC2 Troubleshooting

EC2 Troubleshooting Instances

EC2 Troubleshooting An Instance Immediately Terminates

  • EBS volume limit was reached. It is a soft limit and can be increased by submitting a support request.
  • EBS snapshot is corrupt.
  • Instance store-backed AMI used to launch the instance is missing a required part

EC2 Troubleshooting Connecting to Your Instance

  • Error connecting to your instance: Connection timed out
    • Route table, for the subnet, does not have a route that sends all traffic destined outside the VPC to the Internet gateway for the VPC.
    • Security group does not allow inbound traffic from the public IP address on the proper port
    • Network ACL does not allow inbound traffic from and outbound traffic to the public IP address on the proper port
    • Private key used to connect does not match the key pair selected for the instance during the launch
    • Appropriate user name for the AMI is not used, for e.g. the user name for Amazon Linux AMI is ec2-user, Ubuntu AMI is ubuntu, RHEL5 AMI & SUSE Linux can be either root or ec2-user, Fedora AMI can be fedora or ec2-user
    • If connecting from a corporate network, the internal firewall does not allow inbound and outbound traffic on port 22 (for Linux instances) or port 3389 (for Windows instances).
    • Instance does not have the same public IP address (the public IP address changes when the instance is stopped and started). Associate an Elastic IP address with the instance
    • CPU load on the instance is high; the server may be overloaded.
  • User key not recognized by server
    • private key file used to connect has not been converted to the format as required by the server
  • Host key not found, Permission denied (publickey), or Authentication failed, permission denied
    • appropriate user name for the AMI is not used for connecting
    • proper private key file for the instance is not used
  • Unprotected Private Key File
    • private key file is not protected from read and write operations from any other users.
  • Server refused our key or No supported authentication methods available
    • appropriate user name for the AMI is not used for connecting
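The "Unprotected Private Key File" case can be checked locally before attempting to connect. This is a minimal sketch, assuming a POSIX file system and assuming the SSH client rejects keys that grant any permission to group or other users; `private_key_is_protected` is a hypothetical helper.

```python
import os
import stat


def private_key_is_protected(path: str) -> bool:
    """True when group and other users have no permissions on the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A key file with mode 0644 fails this check, while one restricted to the owner (for e.g. with `chmod 400`) passes.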

EC2 Troubleshooting Instances with Failed Status Checks

  • System Status Check – Checks Physical Hosts
    • Possible reasons
      • Loss of network connectivity
      • Loss of system power
      • Software issues on physical host
      • Hardware issues on physical host
    • Resolution
      • For Amazon EBS-backed AMI instance, stop and restart the instance
      • For Instance-store backed AMI, terminate the instance and launch a replacement.
  • Instance Status Check – Checks Instance or VM
    • Possible reasons
      • Misconfigured networking or startup configuration
      • Exhausted memory
      • Corrupted file system
      • Failed Amazon EBS volume or Physical drive
      • Incompatible kernel
    • Resolution
      • Reboot the instance or make modifications to the operating system or volumes
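The resolutions above depend on which status check failed and, for system checks, on the root device type. A minimal sketch of that decision as a lookup follows; the function name and the string labels are hypothetical and chosen only for illustration.

```python
def suggested_action(failed_check: str, root_device: str) -> str:
    """Map a failed status check to the resolution listed in this section.

    failed_check: "system" or "instance"
    root_device:  "ebs" or "instance-store"
    """
    if failed_check == "system":
        # The problem is on the physical host, so the instance has to move hosts.
        if root_device == "ebs":
            return "stop and start the instance"
        if root_device == "instance-store":
            return "terminate the instance and launch a replacement"
        raise ValueError(f"unknown root device type: {root_device!r}")
    if failed_check == "instance":
        # The problem is inside the instance, so the host can stay the same.
        return "reboot the instance or fix the operating system or volumes"
    raise ValueError(f"unknown status check: {failed_check!r}")
```

The distinction matters because a plain reboot keeps the instance on the same physical host, which does not help when the system status check is the one failing.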

EC2 Troubleshooting Instance Capacity

  • InsufficientInstanceCapacity
    • AWS does not currently have enough available capacity to service your request
  • InstanceLimitExceeded
    • Concurrent running instance limit, default is 20, has been reached.

Sample Exam Questions

  1. A user has launched an EC2 instance. The instance got terminated as soon as it was launched. Which of the below mentioned options is not a possible reason for this?
    1. The user account has reached the maximum EC2 instance limit
    2. The snapshot is corrupt
    3. The AMI is missing a required part
    4. The user account has reached the maximum volume limit
  2. If you’re unable to connect via SSH to your EC2 instance, which of the following should you check and possibly correct to restore connectivity?
    1. Adjust Security Group to permit egress traffic over TCP port 443 from your IP.
    2. Configure the IAM role to permit changes to security group settings.
    3. Modify the instance security group to allow ingress of ICMP packets from your IP.
    4. Adjust the instance’s Security Group to permit ingress traffic over port 22 from your IP
    5. Apply the most recently released Operating System security patches.
  3. You try to connect via SSH to a newly created Amazon EC2 instance and get one of the following error messages: “Network error: Connection timed out” or “Error connecting to [instance], reason: -> Connection timed out: connect,” You have confirmed that the network and security group rules are configured correctly and the instance is passing status checks. What steps should you take to identify the source of the behavior? Choose 2 answers
    1. Verify that the private key file corresponds to the Amazon EC2 key pair assigned at launch.
    2. Verify that your IAM user policy has permission to launch Amazon EC2 instances.
    3. Verify that you are connecting with the appropriate user name for your AMI.
    4. Verify that the Amazon EC2 Instance was launched with the proper IAM role.
    5. Verify that your federation trust to AWS has been established.
  4. A user has launched an EBS backed EC2 instance in the us-east-1a region. The user stopped the instance and started it back after 20 days. AWS throws up an ‘Insufficient Instance Capacity’ error. What can be the possible reason for this?
    1. AWS does not have sufficient capacity in that availability zone
    2. AWS zone mapping is changed for that user account
    3. There is some issue with the host capacity on which the instance is launched
    4. The user account has reached the maximum EC2 instance limit
  5. A user is trying to connect to a running EC2 instance using SSH. However, the user gets an Unprotected Private Key File error. Which of the below mentioned options can be a possible reason for rejection?
    1. The private key file has the wrong file permission
    2. The ppk file used for SSH is read only
    3. The public key file has the wrong permission
    4. The user has provided the wrong user name for the OS login
  6. A user has launched an EC2 instance. However, due to some reason the instance was terminated. If the user wants to find out the reason for termination, where can he find the details?
    1. It is not possible to find the details after the instance is terminated
    2. The user can get information from the AWS console, by checking the Instance description under the State transition reason label
    3. The user can get information from the AWS console, by checking the Instance description under the Instance Status Change reason label
    4. The user can get information from the AWS console, by checking the Instance description under the Instance Termination reason label
  7. You have a Linux EC2 web server instance running inside a VPC. The instance is in a public subnet and has an EIP associated with it so you can connect to it over the Internet via HTTP or SSH. The instance was also fully accessible when you last logged in via SSH and was also serving web requests on port 80. Now you are not able to SSH into the host nor does it respond to web requests on port 80, that were working fine last time you checked. You have double-checked that all networking configuration parameters (security groups route tables, IGW, EIP. NACLs etc.) are properly configured and you haven’t made any changes to those anyway since you were last able to reach the Instance). You look at the EC2 console and notice that system status check shows “impaired.” Which should be your next step in troubleshooting and attempting to get the instance back to a healthy state so that you can log in again?
    1. Stop and start the instance so that it will be able to be redeployed on a healthy host system that most likely will fix the “impaired” system status (for system status check impaired status you need Stop Start for EBS and terminate and relaunch for Instance store)
    2. Reboot your instance so that the operating system will have a chance to boot in a clean healthy state that most likely will fix the “impaired” system status
    3. Add another dynamic private IP address to the instance and try to connect via that new path, since the networking stack of the OS may be locked up causing the “impaired” system status.
    4. Add another Elastic Network Interface to the instance and try to connect via that new path since the networking stack of the OS may be locked up causing the “impaired” system status
    5. un-map and then re-map the EIP to the instance, since the IGW/NAT gateway may not be working properly, causing the “impaired” system status
  8. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a connection time out error. Which of the below mentioned options is not a possible reason for rejection?
    1. The access key to connect to the instance is wrong
    2. The security group is not configured properly
    3. The private key used to launch the instance is not correct
    4. The instance CPU is heavily loaded
  9. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a Host key not found error. Which of the below mentioned options is a possible reason for rejection?
    1. The user has provided the wrong user name for the OS login
    2. The instance CPU is heavily loaded
    3. The security group is not configured properly
    4. The access key to connect to the instance is wrong

AWS RDS Replication – Multi-AZ & Read Replica

RDS Replication Overview

  • DB instance replicas can be created in two ways:
  • Multi-AZ deployment
    • Amazon RDS automatically provisions and manages a synchronous standby replica in a different AZ (independent infrastructure in a physically separate location).
    • Multi-AZ deployment provides high availability and failover support
    • In the event of planned database maintenance, DB instance failure, or an Availability Zone failure, Amazon RDS will automatically failover to the standby so that database operations can resume quickly without administrative intervention.
  • Read Replica
    • Amazon RDS also uses the PostgreSQL, MySQL, and MariaDB DB engines’ built-in replication functionality to create a special type of DB instance called a Read Replica from a source DB instance.
    • Load on the source DB instance can be reduced by routing read queries from applications to the Read Replica.
    • Read Replicas allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads

Multi-AZ deployment

  • Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments.
  • Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring.
  • In a Multi-AZ deployment, RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone.
  • Primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.
  • Running a DB instance with high availability can enhance availability during planned system maintenance, and helps protect the databases against DB instance failure and Availability Zone disruption.
  • High-availability feature is not a scaling solution for read-only scenarios; standby replica can’t be used to serve read traffic.
  • A Multi-AZ deployment can be used to improve the durability and availability of a critical system, but the Multi-AZ secondary cannot be used to serve read-only queries. To service read-only traffic, use a Read Replica.
  • Multi-AZ deployments store copies of your data in different Availability Zones for greater levels of data durability.
  • When using the BYOL licensing model, a license for both the primary instance and the standby replica is required
  • DB instances using Multi-AZ deployments may have increased write and commit latency compared to a Single-AZ deployment, due to the synchronous data replication that occurs.
  • There might be a change in latency if the deployment fails over to the standby replica, although AWS is engineered with low-latency network connectivity between Availability Zones.
  • For production workloads, it is recommended to use Provisioned IOPS and DB instance classes (m1.large and larger), optimized for Provisioned IOPS for fast, consistent performance.
  • When Single-AZ deployment is modified to a Multi-AZ deployment (for engines other than SQL Server or Amazon Aurora)
    • RDS takes a snapshot of the primary DB instance from the deployment and restores the snapshot into another Availability Zone.
    • RDS then sets up synchronous replication between the primary DB instance and the new instance.
    • This avoids downtime when conversion from Single-AZ to Multi-AZ happens

RDS Multi-AZ Failover Process

  • In the event of a planned or unplanned outage of the DB instance,
    • RDS automatically switches to a standby replica in another AZ, if enabled for Multi-AZ.
    • Time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable.
    • Failover times are typically 60-120 secs. However, large transactions or a lengthy recovery process can increase failover time.
    • Failover mechanism automatically changes the DNS record of the DB instance to point to the standby DB instance.
    • The endpoint URLs used by the applications do not change, but the applications need to re-establish any existing connections to the DB instance.
  • RDS handles failovers automatically so that database operations can be resumed as quickly as possible without administrative intervention.
  • Primary DB instance switches over automatically to the standby replica if any of the following conditions occur:
    • An Availability Zone outage
    • Primary DB instance fails
    • DB instance’s server type is changed
    • Operating system of the DB instance is undergoing software patching
    • A manual failover of the DB instance was initiated using Reboot with failover (also referred to as forced failover)
  • Whether a Multi-AZ DB instance has failed over can be determined as follows:
    • DB event subscriptions can be set up to notify you via email or SMS that a failover has been initiated.
    • DB events can be viewed via the Amazon RDS console or APIs.
    • Current state of your Multi-AZ deployment can be viewed via the Amazon RDS console and APIs.
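Because the endpoint stays the same but existing connections are dropped during a failover, applications typically wrap database calls in reconnect logic. The following is a minimal, driver-agnostic sketch: `connect` and `operation` are hypothetical callables supplied by the application, and it assumes the driver surfaces a dropped connection as `ConnectionError` and resolves the endpoint hostname on every new connection.

```python
import time


def run_with_reconnect(connect, operation, retries=5, delay_seconds=5.0):
    """Run operation(conn), opening a fresh connection after each connection failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            # A new connection gives the client a chance to pick up the updated DNS record.
            conn = connect()
            return operation(conn)
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay_seconds)
    raise last_error
```

Given the typical failover times quoted above, the total retry budget (`retries` times `delay_seconds`) would usually be chosen to cover a couple of minutes.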

Read Replica

  • Amazon RDS uses the MySQL, MariaDB, and PostgreSQL (version 9.3.5 and later) DB engines’ built-in replication functionality to create a Read Replica from a source DB instance.
  • Updates made to the source DB instance are asynchronously copied to the Read Replica.
  • Load on the source DB instance can be reduced by routing read queries from the applications to the Read Replica.
  • Read Replicas allow the DB to scale out elastically beyond the capacity constraints of a single DB instance for read-heavy database workloads.
  • Read Replica operates as a DB instance that allows only read-only connections; applications can connect to a Read Replica the same way they would to any DB instance.
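Routing read queries to Read Replicas is done by the application, since each replica has its own endpoint. This is a minimal sketch of one way to do it, assuming round-robin selection and a naive "starts with SELECT" test; the class name and endpoint strings are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send read queries to Read Replicas in turn and everything else to the source."""

    def __init__(self, source_endpoint, replica_endpoints):
        self.source = source_endpoint
        # Fall back to the source when no replicas are configured.
        self._readers = itertools.cycle(list(replica_endpoints) or [source_endpoint])

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.source
```

Since updates are copied to the replicas asynchronously, a read routed this way may briefly return stale data; queries that must see the latest write should go to the source.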

Read Replica creation

  • Up to five Read Replicas can be created from one source DB instance.
  • Creation process
    • Automatic backups must be enabled on the source DB instance by setting the backup retention period to a value other than 0
    • Existing DB instance needs to be specified as the source.
    • RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot.
    • RDS then uses the asynchronous replication method for the DB engine to update the Read Replica whenever there is a change to the source DB instance.
  • RDS replicates all databases in the source DB instance.
  • RDS sets up a secure communications channel between the source DB instance and a Read Replica if that Read Replica is in a different AWS region from the DB instance.
  • RDS establishes any AWS security configurations, such as adding security group entries, needed to enable the secure channel.
  • During the Read Replica creation, a brief I/O suspension on the source DB instance can be experienced as the DB snapshot occurs.
  • I/O suspension typically lasts about one minute and can be avoided if the source DB instance is a Multi-AZ deployment (in the case of Multi-AZ deployments, DB snapshots are taken from the standby).
  • Before creating a Read Replica, wait for long-running transactions to complete, as an active, long-running transaction can slow the process of Read Replica creation
  • For multiple Read Replicas created in parallel from the same source DB instance, only one snapshot is taken at the start of the first create action.
  • Promoting a Read Replica creates a new standalone DB instance and stops replication to it from the source DB instance. However, replication continues for the other replicas that use the same source DB instance

Read Replica Deletion & DB Failover

  • Read Replicas must be explicitly deleted, using the same mechanisms for deleting a DB instance.
  • If the source DB instance is deleted without deleting the replicas, each replica is promoted to a stand-alone, single-AZ DB instance.
  • If the source instance of a Multi-AZ deployment fails over to the secondary, any associated Read Replicas are switched to use the secondary as their replication source.
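
As a rough illustration of the deletion behaviour described above, the following toy model (not an AWS API; `DBInstance` and `delete_source` are hypothetical names) shows replicas becoming stand-alone, single-AZ instances when their source is deleted without them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DBInstance:
    name: str
    multi_az: bool = False
    source: Optional[str] = None  # replication source name, None if stand-alone


def delete_source(instances: Dict[str, DBInstance], source_name: str) -> None:
    """Delete a source; each of its replicas is promoted to a stand-alone, single-AZ instance."""
    del instances[source_name]
    for inst in instances.values():
        if inst.source == source_name:
            inst.source = None
            inst.multi_az = False


fleet = {
    "mydb": DBInstance("mydb", multi_az=True),
    "rr1": DBInstance("rr1", source="mydb"),
    "rr2": DBInstance("rr2", source="mydb"),
}
delete_source(fleet, "mydb")
```

After the call, `fleet` holds only `rr1` and `rr2`, both without a replication source. They still have to be deleted explicitly if they are no longer needed.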

Read Replica Storage & Compute requirements

  • A Read Replica, by default, is created with the same storage type as the source DB instance.
  • For replication to operate effectively, each Read Replica should have the same amount of compute & storage resources as the source DB instance.
  • If the source DB instance is scaled, the Read Replicas should be scaled accordingly.

Read Replica Features & Limitations

  • RDS does not support circular replication.
  • A DB instance cannot be configured to serve as a replication source for an existing DB instance; only a new Read Replica can be created from an existing DB instance. For example, if MyDBInstance replicates to ReadReplica1, ReadReplica1 can’t be configured to replicate back to MyDBInstance. From ReadReplica1, only a new Read Replica can be created, such as ReadRep2.
  • Cross-Region Replication
    • A MySQL or MariaDB Read Replica can be created in a different region than the source DB instance to improve disaster recovery capabilities, scale read operations into a region closer to end users, or ease migration from a data center in one region to another region.
    • PostgreSQL Read Replicas can be created only in the same region as the source DB instance (Update June 2016 – PostgreSQL also supports Cross Region replication)
  • A Read Replica running any version of MariaDB or MySQL 5.6 can itself be specified as the source DB instance for another Read Replica. However, replica lag is higher for these instances, and no more than four instances can be involved in a replication chain.
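
The two topology rules above (no circular replication, at most four instances in a chain) can be sketched as a validation over a simple mapping of instance name to replication source. This is a hypothetical model for illustration only; RDS enforces these rules itself, and `chain_to_root` and `can_add_replica` are names invented here.

```python
from typing import Dict, List, Optional

MAX_CHAIN_INSTANCES = 4  # chain limit quoted above for MySQL/MariaDB


def chain_to_root(sources: Dict[str, Optional[str]], instance: str) -> List[str]:
    """Walk from an instance up to its root source, raising on a cycle."""
    chain = [instance]
    while sources.get(chain[-1]) is not None:
        parent = sources[chain[-1]]
        if parent in chain:
            raise ValueError(f"circular replication involving {parent}")
        chain.append(parent)
    return chain


def can_add_replica(sources: Dict[str, Optional[str]], new_replica: str, source: str) -> bool:
    """Return True if creating new_replica from source keeps the topology valid."""
    if new_replica in sources:
        return False  # an existing instance cannot be re-pointed at a source
    return len(chain_to_root(sources, source)) + 1 <= MAX_CHAIN_INSTANCES


# MyDBInstance -> ReadReplica1, using the names from the example above
topology = {"MyDBInstance": None, "ReadReplica1": "MyDBInstance"}
```

With this topology, `can_add_replica(topology, "ReadRep2", "ReadReplica1")` returns `True`, while `can_add_replica(topology, "MyDBInstance", "ReadReplica1")` returns `False` because MyDBInstance already exists and would close a loop.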

Read Replica Use cases

  • Read Replicas can be used in a variety of use cases, including:
    • Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy database workloads, directing excess read traffic to Read Replica(s)
    • Serving read traffic while the source DB instance is unavailable. For example, if the source DB instance cannot take I/O requests because of the I/O suspension during backups or scheduled maintenance, read traffic can be directed to the Read Replica(s). However, the data might be stale.
    • Business reporting or data warehousing scenarios where business reporting queries can be executed against a Read Replica, rather than the primary, production DB instance.
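
To make the read-scaling use case concrete, here is a minimal application-side routing sketch. It assumes the application decides per statement where to send a query; the `ReadWriteRouter` class and the endpoint names are hypothetical, and the `SELECT` prefix check is a deliberate simplification.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to the primary endpoint and spread reads over the replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            # Round-robin over replicas; results may be slightly stale
            # because replication is asynchronous.
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Reporting queries would then go through `router.endpoint_for("SELECT ...")`, while any statement that modifies data is always sent to the primary.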

Sample Exam Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You are running a successful multi-tier web application on AWS and your marketing department has asked you to add a reporting tier to the application. The reporting tier will aggregate and publish status reports every 30 minutes from user-generated information that is being stored in your web application's database. You are currently running a Multi-AZ RDS MySQL instance for the database tier. You also have implemented ElastiCache as a database caching layer between the application tier and database tier. Please select the answer that will allow you to successfully implement the reporting tier with as little impact as possible to your database.
    1. Continually send transaction logs from your master database to an S3 bucket and generate the reports off the S3 bucket using S3 byte range requests.
    2. Generate the reports by querying the synchronously replicated standby RDS MySQL instance maintained through Multi-AZ
    3. Launch an RDS Read Replica connected to your Multi-AZ master database and generate reports by querying the Read Replica.
    4. Generate the reports by querying the ElastiCache database caching tier.
  2. A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
    1. MySQL Installed on two Amazon EC2 Instances in a single Availability Zone
    2. Amazon RDS for MySQL with Multi-AZ
    3. Amazon ElastiCache
    4. Amazon DynamoDB
  3. Your company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
    1. Deploy ElastiCache in-memory cache running in each availability zone
    2. Implement sharding to distribute load to multiple RDS MySQL instances
    3. Increase the RDS MySQL Instance size and Implement provisioned IOPS
    4. Add an RDS MySQL read replica in each availability zone
  4. Your company has HQ in Tokyo and branch offices all over the world and is using logistics software with a multi-regional deployment on AWS in Japan, Europe and US. The logistics software has a 3-tier architecture and currently uses MySQL 5.6 for data persistence. Each region has deployed its own database. In the HQ region you run an hourly batch process reading data from every region to compute cross-regional reports that are sent by email to all offices. This batch process must be completed as fast as possible to quickly optimize logistics. How do you build the database architecture in order to meet the requirements?
    1. For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
    2. For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
    3. For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
    4. For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
    5. Use Direct Connect to connect all regional MySQL deployments to the HQ region and reduce network latency for the batch process
  5. What would happen to an RDS (Relational Database Service) multi-Availability Zone deployment if the primary DB instance fails?
    1. The IP of the primary DB Instance is switched to the standby DB Instance.
    2. A new DB instance is created in the standby availability zone.
    3. The canonical name record (CNAME) is changed from primary to standby.
    4. The RDS (Relational Database Service) DB instance reboots.
  6. Your business is building a new application that will store its entire customer database on an RDS MySQL database, and will have various applications and users that will query that data for different purposes. Large analytics jobs on the database are likely to prevent other applications from getting the query results they need before timing out. Also, as your data grows, these analytics jobs will start to take more time, increasing the negative effect on the other applications. How do you solve the contention issues between these different workloads on the same data?
    1. Enable Multi-AZ mode on the RDS instance
    2. Use ElastiCache to offload the analytics job data
    3. Create RDS Read-Replicas for the analytics work
    4. Run the RDS instance on the largest size possible
  7. Will my standby RDS instance be in the same Availability Zone as my primary?
    1. Only for Oracle RDS types
    2. Yes
    3. Only if configured at launch
    4. No
  8. Is creating a Read Replica of another Read Replica supported?
    1. Only in certain regions
    2. Only with MySQL based RDS
    3. Only for Oracle RDS types
    4. No
  9. A user is planning to set up the Multi AZ feature of RDS. Which of the below mentioned conditions won’t take advantage of the Multi AZ feature?
    1. Availability zone outage
    2. A manual failover of the DB instance using Reboot with failover option
    3. Region outage
    4. When the user changes the DB instance’s server type
  10. When you run a DB Instance as a Multi-AZ deployment, the “_____” serves database writes and reads
    1. secondary
    2. backup
    3. stand by
    4. primary
  11. When running my DB Instance as a Multi-AZ deployment, can I use the standby for read or write operations?
    1. Yes
    2. Only with MSSQL based RDS
    3. Only for Oracle RDS instances
    4. No
  12. Read Replicas require a transactional storage engine and are only supported for the _________ storage engine
    1. OracleISAM
    2. MSSQLDB
    3. InnoDB
    4. MyISAM
  13. A user is configuring the Multi AZ feature of an RDS DB. The user came to know that this RDS DB does not use the AWS technology, but uses server mirroring to achieve replication. Which DB is the user using right now?
    1. MySQL
    2. Oracle
    3. MS SQL
    4. PostgreSQL
  14. If I have multiple Read Replicas for my master DB Instance and I promote one of them, what happens to the rest of the Read Replicas?
    1. The remaining Read Replicas will still replicate from the older master DB Instance
    2. The remaining Read Replicas will be deleted
    3. The remaining Read Replicas will be combined to one read replica
  15. If you have chosen Multi-AZ deployment, in the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the ______ record of the main DB Instance to point to the standby DB Instance.
    1. DNAME
    2. CNAME
    3. TXT
    4. MX
  16. When automatic failover occurs, Amazon RDS will emit a DB Instance event to inform you that automatic failover occurred. You can use the _____ to return information about events related to your DB Instance
    1. FetchFailure
    2. DescribeFailure
    3. DescribeEvents
    4. FetchEvents
  17. The new DB Instance that is created when you promote a Read Replica retains the backup window period.
    1. TRUE
    2. FALSE
  18. Will I be alerted when automatic failover occurs?
    1. Only if SNS configured
    2. No
    3. Yes
    4. Only if CloudWatch configured
  19. Can I initiate a “forced failover” for my MySQL Multi-AZ DB Instance deployment?
    1. Only in certain regions
    2. Only in VPC
    3. Yes
    4. No
  20. A user is accessing RDS from an application. The user has enabled the Multi AZ feature with the MS SQL RDS DB. During a planned outage how will AWS ensure that a switch from DB to a standby replica will not affect access to the application?
    1. RDS will have an internal IP which will redirect all requests to the new DB
    2. RDS uses DNS to switch over to standby replica for seamless transition
    3. The switch over changes Hardware so RDS does not need to worry about access
    4. RDS will have both the DBs running independently and the user has to manually switch over
  21. Which of the following is part of the failover process for a Multi-Availability Zone Amazon Relational Database Service (RDS) instance?
    1. The failed RDS DB instance reboots.
    2. The IP of the primary DB instance is switched to the standby DB instance.
    3. The DNS record for the RDS endpoint is changed from primary to standby.
    4. A new DB instance is created in the standby availability zone.