AWS DynamoDB Secondary Indexes

AWS DynamoDB Secondary Indexes

  • DynamoDB provides fast access to items in a table by specifying primary key values
  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • A DynamoDB Secondary index
    • is a data structure that contains a subset of attributes from a table.
    • is associated with exactly one table, from which it obtains its data.
    • requires an alternate key for the index partition key and sort key.
    • additionally can define projected attributes that are copied from the base table into the index along with the primary key attributes.
    • is automatically maintained by DynamoDB; any addition, modification, or deletion of items in the base table is also reflected in the indexes on that table.
    • helps reduce the size of the data as compared to the main table, depending upon the projected attributes, and hence helps improve provisioned throughput performance
    • is automatically maintained as a sparse object; items appear in an index only if the index key attributes are present in the item, making queries against an index very efficient
  • DynamoDB Secondary indexes support two types
    • Global secondary index – an index with a partition key and a sort key that can be different from those on the base table.
    • Local secondary index – an index that has the same partition key as the base table, but a different sort key.
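
A minimal boto3 sketch of a table defined with both index types (table, attribute, and index names are hypothetical); note that an LSI can only be defined at table creation time, while a GSI carries its own provisioned throughput:

  import boto3

  dynamodb = boto3.client("dynamodb")

  # Hypothetical GameScores table: partition key UserId, sort key GameTitle.
  dynamodb.create_table(
      TableName="GameScores",
      AttributeDefinitions=[
          {"AttributeName": "UserId", "AttributeType": "S"},
          {"AttributeName": "GameTitle", "AttributeType": "S"},
          {"AttributeName": "TopScore", "AttributeType": "N"},
      ],
      KeySchema=[
          {"AttributeName": "UserId", "KeyType": "HASH"},
          {"AttributeName": "GameTitle", "KeyType": "RANGE"},
      ],
      # LSI: same partition key as the table, different sort key;
      # must be defined when the table is created.
      LocalSecondaryIndexes=[{
          "IndexName": "UserTopScoreIndex",
          "KeySchema": [
              {"AttributeName": "UserId", "KeyType": "HASH"},
              {"AttributeName": "TopScore", "KeyType": "RANGE"},
          ],
          "Projection": {"ProjectionType": "KEYS_ONLY"},
      }],
      # GSI: partition and sort key can differ from the base table and the
      # index has its own provisioned throughput.
      GlobalSecondaryIndexes=[{
          "IndexName": "GameTopScoreIndex",
          "KeySchema": [
              {"AttributeName": "GameTitle", "KeyType": "HASH"},
              {"AttributeName": "TopScore", "KeyType": "RANGE"},
          ],
          "Projection": {"ProjectionType": "ALL"},
          "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
      }],
      ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
  )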

Global Secondary Indexes – GSI

  • DynamoDB creates and maintains indexes for the primary key attributes for efficient access to data in the table, which allows applications to quickly retrieve data by specifying primary key values.
  • Global Secondary Indexes – GSI are indexes that contain partition or composite partition-and-sort keys that can be different from the keys in the table on which the index is based.
  • Global secondary index is considered “global” because queries on the index can span all items in a table, across all partitions.
  • Multiple secondary indexes can be created on a table, and queries issued against these indexes.
  • Applications benefit from having one or more secondary keys available to allow efficient access to data with attributes other than the primary key.
  • GSIs support non-unique attributes, which increases query flexibility by enabling queries against any non-key attribute in the table
  • GSIs support only eventual consistency. DynamoDB asynchronously handles item additions, updates, and deletes in a GSI when corresponding changes are made to the table.
  • Data in a secondary index consists of GSI alternate key, primary key and attributes that are projected, or copied, from the table into the index.
  • Attributes that are part of an item in a table, but not part of the GSI key, the primary key of the table, or projected attributes are not returned on querying the GSI index.
  • GSIs manage throughput independently of the table they are based on and the provisioned throughput for the table and each associated GSI needs to be specified at the creation time.
    • Read provisioned throughput
      • one Read Capacity Unit provides two eventually consistent reads per second for items < 4KB in size.
    • Write provisioned throughput
      • one Write Capacity Unit provides one write per second for items < 1KB in size.
      • consumes 1 write capacity unit if
        • a new item is inserted into the table
        • an existing item is deleted from the table
        • an existing item is updated for projected attributes
      • consumes 2 write capacity units if
        • an existing item is updated for key attributes, which results in deletion and addition of the new item into the index
  • Throttling on a GSI affects the base table depending on whether the throttling is for read or write activity:
    • When a GSI has insufficient read capacity, the base table isn’t affected.
    • When a GSI has insufficient write capacity, write operations won’t succeed on the base table or any of its GSIs.
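
A GSI is queried by name through the Query API; because a GSI supports only eventual consistency, ConsistentRead cannot be used. A minimal boto3 sketch against the hypothetical GameTopScoreIndex defined earlier:

  import boto3

  dynamodb = boto3.client("dynamodb")

  # Query the GSI by IndexName; only attributes projected into the index are returned.
  resp = dynamodb.query(
      TableName="GameScores",
      IndexName="GameTopScoreIndex",
      KeyConditionExpression="GameTitle = :t",
      ExpressionAttributeValues={":t": {"S": "Galaxy Invaders"}},
      ScanIndexForward=False,  # highest TopScore first
      # ConsistentRead=True is not supported on a GSI (eventual consistency only)
  )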

Local Secondary Indexes (LSI)

  • Local secondary indexes are indexes that have the same partition key as the table, but a different sort key.
  • Local secondary index is “local” because every partition of a local secondary index is scoped to a base table partition that has the same partition key value.
  • LSIs allow searching using an alternate sort key in place of the table’s sort key, thus expanding the number of attributes that can be used for efficient queries.
  • LSI is updated automatically when the primary index is updated and reads support strong, eventual, and transactional consistency options.
  • LSIs can only be queried via the Query API
  • LSIs cannot be added to existing tables at this time
  • LSIs cannot be modified once created at this time
  • LSIs cannot be removed from a table once created at this time
  • LSI consumes provisioned throughput capacity as part of the table with which it is associated
    • Read Provisioned throughput
      • if the data read includes only index keys and projected attributes
        • provides one Read Capacity Unit with one strongly consistent read (or two eventually consistent reads) per second for items < 4KB
        • data size includes the index keys and projected attributes only
      • if the data read includes attributes that are not projected into the index
        • consumes double the read capacity: one read from the index and one read from the table, which fetches the entire item and not just the non-projected attributes
    • Write provisioned throughput
      • consumes 1 write capacity unit if
        • a new item is inserted into the table
        • an existing item is deleted from the table
        • an existing item is updated for projected attributes
      • consumes 2 write capacity units if
        • existing item is updated for key attributes, which results in deletion and addition of the new item into the index
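
Unlike a GSI, an LSI shares the table’s partition key and can serve strongly consistent reads. A minimal boto3 sketch against the hypothetical UserTopScoreIndex defined earlier:

  import boto3

  dynamodb = boto3.client("dynamodb")

  resp = dynamodb.query(
      TableName="GameScores",
      IndexName="UserTopScoreIndex",
      KeyConditionExpression="UserId = :u AND TopScore > :s",
      ExpressionAttributeValues={":u": {"S": "user-101"}, ":s": {"N": "100"}},
      ConsistentRead=True,  # supported because the index is local to the table partition
  )
  # Requesting attributes that are not projected into the index would make
  # DynamoDB fetch them from the base table, consuming additional read capacity.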

Global Secondary Index vs Local Secondary Index

(Image: DynamoDB Secondary Indexes – GSI vs LSI comparison)

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. In DynamoDB, a secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support ____ operations.
    1. None of the above
    2. Both
    3. Query
    4. Scan
  2. In regard to DynamoDB, what is the Global secondary index?
    1. An index with a partition and sort key that can be different from those on the table
    2. An index that has the same sort key as the table, but a different partition key
    3. An index that has the same partition key and sort key as the table
    4. An index that has the same partition key as the table, but a different sort key
  3. In regard to DynamoDB, can I modify the index once it is created?
    1. Yes, if it is a primary hash key index
    2. Yes, if it is a Global secondary index (AWS now allows you to modify global secondary indexes after creation)
    3. No
    4. Yes, if it is a local secondary index
  4. When thinking of DynamoDB, what is true of Global Secondary Key properties?
    1. Both the partition key and sort key can be different from the table.
    2. Only the partition key can be different from the table.
    3. Either the partition key or the sort key can be different from the table, but not both.
    4. Only the sort key can be different from the table.


AWS IAM Access Management


IAM Access Management

  • IAM Access Management is all about Permissions and Policies.
  • Permissions help define who has access & what actions they can perform.
  • IAM Policy helps to fine-tune the permissions granted to the policy owner
  • IAM Policy is a document that formally states one or more permissions.
  • Most restrictive Policy always wins (an explicit deny overrides any allow)
  • IAM Policy is defined in the JSON (JavaScript Object Notation) format

IAM policy basically states “Principal A is allowed or denied (Effect) to perform Action B on Resource C given Conditions D are satisfied”
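
A minimal boto3 sketch of such a statement created as a customer managed policy (the policy name, bucket, and condition values are hypothetical):

  import json
  import boto3

  iam = boto3.client("iam")

  # "Principal A is allowed (Effect) to perform Action B on Resource C given Conditions D."
  # For an identity-based policy the principal is whoever the policy gets attached to.
  policy_document = {
      "Version": "2012-10-17",
      "Statement": [{
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": "arn:aws:s3:::example-bucket/*",
          "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
      }],
  }

  iam.create_policy(
      PolicyName="ExampleS3ReadPolicy",
      PolicyDocument=json.dumps(policy_document),
  )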

IAM Access Policies

  • An Entity can be associated with Multiple Policies and a Policy can have multiple statements where each statement in a policy refers to a single permission.
  • If the policy includes multiple statements, a logical OR is applied across the statements at evaluation time. Similarly, if multiple policies are applicable to a request, a logical OR is applied across the policies at evaluation time.
  • The Principal is specified within the policy for resource-based policies, while for identity-based policies the principal is the user, group, or role to which the policy is attached.

Identity-Based vs Resource-Based Permissions

Identity-based, or IAM permissions

  • Identity-based or IAM permissions are attached to an IAM user, group, or role and specify what the user, group, or role can do.
  • User, group, or the role itself acts as a Principal.
  • IAM permissions can be applied to almost all the AWS services.
  • IAM Policies can either be inline or managed (AWS or Customer).
  • IAM Policy’s current version is 2012-10-17.

Resource-based permissions

  • Resource-based permissions are attached to a resource, for e.g. an S3 bucket or an SNS topic
  • Resource-based permissions specify both who has access to the resource (Principal) and what actions they can perform on it (Actions)
  • Resource-based policies are inline only, not managed.
  • Resource-based permissions are supported only by some AWS services
  • Resource-based policies can be defined with version 2012-10-17 or 2008-10-17

Managed Policies and Inline Policies

  • Managed policies
    • Managed policies are Standalone policies that can be attached to multiple users, groups, and roles in an AWS account.
    • Managed policies apply only to identities (users, groups, and roles) but not to resources.
    • Managed policies allow reusability
    • Managed policy changes are implemented as versions (limited to 5); a new change to an existing policy creates a new version, which is useful to compare the changes and revert, if needed
    • Managed policies have their own ARN
    • Two types of managed policies:
      • AWS managed policies
        • Managed policies that are created and managed by AWS.
        • AWS maintains and can upgrade these policies; for e.g. if a new service is introduced, the changes automatically affect all the existing principals attached to the policy
        • AWS takes care not to break the policies, for e.g. by adding a restriction or removing a permission
        • AWS managed policies cannot be modified by the customer
      • Customer managed policies
        • Customer managed policies are standalone, custom policies created and administered by you.
        • Customer managed policies allow more precise control over the policies than when using AWS managed policies.
  • Inline policies
    • Inline policies are created and managed by you, and are embedded directly into a single user, group, or role.
    • Deletion of the entity (user, group, or role) or resource deletes the inline policy as well
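
A minimal boto3 sketch contrasting the two styles (user names, policy names, and ARNs are hypothetical): a managed policy is attached by its ARN, while an inline policy is embedded directly into the user:

  import json
  import boto3

  iam = boto3.client("iam")

  # Managed policy: standalone, reusable, has its own ARN, attached by reference.
  iam.attach_user_policy(
      UserName="alice",
      PolicyArn="arn:aws:iam::123456789012:policy/ExampleS3ReadPolicy",
  )

  # Inline policy: embedded in the user; it disappears when the user is deleted.
  iam.put_user_policy(
      UserName="alice",
      PolicyName="AliceInlinePolicy",
      PolicyDocument=json.dumps({
          "Version": "2012-10-17",
          "Statement": [{
              "Effect": "Allow",
              "Action": "sqs:SendMessage",
              "Resource": "arn:aws:sqs:us-east-1:123456789012:example-queue",
          }],
      }),
  )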

ABAC – Attribute-Based Access Control

  • ABAC – Attribute-based access control is an authorization strategy that defines permissions based on attributes called tags.
  • ABAC policies can be designed to allow operations when the principal’s tag matches the resource tag.
  • ABAC is helpful in environments that are growing rapidly and help with situations where policy management becomes cumbersome.
  • ABAC policies are easier to manage as different policies for different job functions need not be created.
  • ABAC complements RBAC for granular permissions: RBAC allows access only to specific resources, while ABAC can allow actions on all resources, but only if the resource tag matches the principal’s tag.
  • ABAC can help use employee attributes from the corporate directory with federation where attributes are applied to their resulting principal.
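
A minimal sketch of an ABAC-style statement, assuming a hypothetical project tag on both the principal and a Secrets Manager secret; access is allowed only when the two tag values match:

  # ABAC-style identity policy statement (the "project" tag key is hypothetical):
  # the principal may read any secret, but only if the secret's "project" tag
  # value matches the principal's own "project" tag value.
  abac_statement = {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*",
      "Condition": {
          "StringEquals": {
              "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
          }
      },
  }

With this single statement, adding a new project only requires tagging the new secrets and principals consistently; no new policy has to be written.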

IAM Permissions Boundaries

  • Permissions boundary allows using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity.
  • Permissions boundary allows an entity to perform only the actions that are allowed by both its identity-based policies and its permissions boundary.
  • Permissions boundary supports both the AWS-managed policy and the customer-managed policy to set the boundary for an IAM entity.
  • Permissions boundary can be applied to an IAM entity (user or role) but is not supported for IAM groups.
  • Permissions boundary does not grant permissions on its own.
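
A small boto3 sketch of setting a boundary on a user (the user name and boundary policy ARN are hypothetical); the effective permissions become the intersection of the boundary and the user's identity-based policies:

  import boto3

  iam = boto3.client("iam")

  # The boundary caps what the user's identity-based policies can ever grant;
  # it does not grant any permission by itself.
  iam.put_user_permissions_boundary(
      UserName="alice",
      PermissionsBoundary="arn:aws:iam::123456789012:policy/DeveloperBoundary",
  )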

IAM Policy Simulator

  • IAM Policy Simulator helps test and troubleshoot IAM and resource-based policies
  • IAM Policy Simulator can help test in the following ways:
    • Test IAM based policies. If multiple policies are attached, you can test all the policies or select individual policies to test. You can test which actions are allowed or denied by the selected policies for specific resources.
    • Test Resource based policies. However, Resource-based policies cannot be tested standalone and have to be attached to the Resource
    • Test new IAM policies that are not yet attached to a user, group, or role by typing or copying them into the simulator. These are used only in the simulation and are not saved.
    • Test the policies with selected services, actions, and resources
    • Simulate real-world scenarios by providing context keys, such as an IP address or date, that are included in Condition elements in the policies being tested.
    • Identify which specific statement in a policy results in allowing or denying access to a particular resource or action.
  • IAM Policy Simulator does not make an actual AWS service request and hence does not make unwanted changes to the AWS live environment
  • IAM Policy Simulator just reports the result Allowed or Denied
  • IAM Policy Simulator allows you to modify the policy and test it. These changes are not propagated to the actual policies attached to the entities
  • Introductory Video for Policy Simulator
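
A minimal boto3 sketch of the same kind of check done programmatically through the simulator API (the principal ARN, action, resource, and context values are hypothetical); no real AWS request is made against the resources:

  import boto3

  iam = boto3.client("iam")

  # Simulate the policies attached to a principal; nothing is executed for real.
  result = iam.simulate_principal_policy(
      PolicySourceArn="arn:aws:iam::123456789012:user/alice",
      ActionNames=["s3:GetObject"],
      ResourceArns=["arn:aws:s3:::example-bucket/reports/q1.csv"],
      ContextEntries=[{  # context keys feed Condition elements, e.g. source IP
          "ContextKeyName": "aws:SourceIp",
          "ContextKeyValues": ["203.0.113.10"],
          "ContextKeyType": "ip",
      }],
  )
  for r in result["EvaluationResults"]:
      print(r["EvalActionName"], r["EvalDecision"])  # allowed / explicitDeny / implicitDeny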

IAM Policy Evaluation

When determining if permission is allowed, the following hierarchy is followed:


  1. Decision always starts with Deny.
  2. IAM combines and evaluates all the policies.
  3. Explicit Deny
    • First IAM checks for an explicit denial policy.
    • Explicit Deny overrides everything and if something is explicitly denied it can never be allowed.
  4. Explicit Allow
    • If one does not exist, it then checks for an explicit allow policy.
    • For granting the User any permission, the permission must be explicitly allowed
  5. Implicit Deny
    • If neither an explicit deny nor explicit allow policy exists, it reverts to the default: implicit deny.
    • All permissions are implicitly denied by default
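
The hierarchy can be summarized with a small Python sketch (simplified; it ignores permissions boundaries, SCPs, and session policies):

  def evaluate(statements):
      """Simplified IAM decision: explicit deny > explicit allow > implicit deny."""
      decision = "implicitDeny"           # 1. start from the default deny
      for effect in statements:           # 2. combine and evaluate all applicable policies
          if effect == "Deny":
              return "explicitDeny"       # 3. an explicit deny always wins
          if effect == "Allow":
              decision = "allowed"        # 4. an explicit allow is required to grant access
      return decision                     # 5. otherwise the implicit deny stands

  print(evaluate(["Allow", "Deny"]))  # explicitDeny
  print(evaluate(["Allow"]))          # allowed
  print(evaluate([]))                 # implicitDeny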

IAM Policy Variables

  • Policy variables provide a feature to specify placeholders in a policy.
  • When the policy is evaluated, the policy variables are replaced with values that come from the request itself
  • Policy variables allow a single policy to be applied to a group of users to control access, for e.g. all users having access only to the S3 bucket folder with their own name
  • A policy variable is marked using a $ prefix followed by a pair of curly braces ({ }); inside the ${ } characters, include the name of the value from the request that you want to use in the policy
  • Policy variables work only with policies defined with Version 2012-10-17
  • Policy variables can only be used in the Resource element and in string comparisons in the Condition element
  • Policy variables are case sensitive and include variables like aws:username, aws:userid, aws:SourceIp, aws:CurrentTime etc.
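
A minimal sketch of the classic use case mentioned above, assuming a hypothetical shared bucket: one statement reused for a whole group gives each user access only to their own folder:

  # ${aws:username} is replaced with the requesting user's name at evaluation time,
  # so a single group policy scopes every user to their own prefix.
  policy_variable_document = {
      "Version": "2012-10-17",  # policy variables require this version
      "Statement": [{
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": "arn:aws:s3:::example-shared-bucket/home/${aws:username}/*",
      }],
  }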

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. IAM’s Policy Evaluation Logic always starts with a default ____________ for every request, except for those that use the AWS account’s root security credentials
    1. Permit
    2. Deny
    3. Cancel
  2. An organization has created 10 IAM users. The organization wants each of the IAM users to have access to a separate DynamoDB table. All the users are added to the same group and the organization wants to setup a group level policy for this. How can the organization achieve this?
    1. Define the group policy and add a condition which allows the access based on the IAM name
    2. Create a DynamoDB table with the same name as the IAM user name and define the policy rule which grants access based on the DynamoDB ARN using a variable
    3. Create a separate DynamoDB database for each user and configure a policy in the group based on the DB variable
    4. It is not possible to have a group level policy which allows different IAM users to different DynamoDB Tables
  3. An organization has setup multiple IAM users. The organization wants that each IAM user accesses the IAM console only within the organization and not from outside. How can it achieve this?
    1. Create an IAM policy with the security group and use that security group for AWS console login
    2. Create an IAM policy with a condition which denies access when the IP address range is not from the organization
    3. Configure the EC2 instance security group which allows traffic only from the organization’s IP range
    4. Create an IAM policy with VPC and allow a secure gateway between the organization and AWS Console
  4. Can I attach more than one policy to a particular entity?
    1. Yes always
    2. Only if within GovCloud
    3. No
    4. Only if within VPC
  5. A __________ is a document that provides a formal statement of one or more permissions.
    1. policy
    2. permission
    3. Role
    4. resource
  6. A __________ is the concept of allowing (or disallowing) an entity such as a user, group, or role some type of access to one or more resources.
    1. user
    2. AWS Account
    3. resource
    4. permission
  7. True or False: When using IAM to control access to your RDS resources, the key names that can be used are case sensitive. For example, aws:CurrentTime is NOT equivalent to AWS:currenttime.
    1. TRUE
    2. FALSE (Refer link)
  8. A user has set an IAM policy where it allows all requests if a request from IP 10.10.10.1/32. Another policy allows all the requests between 5 PM to 7 PM. What will happen when a user is requesting access from IP 10.10.10.1/32 at 6 PM?
    1. IAM will throw an error for policy conflict
    2. It is not possible to set a policy based on the time or IP
    3. It will deny access
    4. It will allow access
  9. Which of the following are correct statements with policy evaluation logic in AWS Identity and Access Management? Choose 2 answers.
    1. By default, all requests are denied
    2. An explicit allow overrides an explicit deny
    3. An explicit allow overrides default deny
    4. An explicit deny does not override an explicit allow
    5. By default, all requests are allowed
  10. A web design company currently runs several FTP servers that their 250 customers use to upload and download large graphic files. They wish to move this system to AWS to make it more scalable, but they wish to maintain customer privacy and keep costs to a minimum. What AWS architecture would you recommend? [PROFESSIONAL]
    1. Ask their customers to use an S3 client instead of an FTP client. Create a single S3 bucket. Create an IAM user for each customer. Put the IAM Users in a Group that has an IAM policy that permits access to subdirectories within the bucket via use of the ‘username’ Policy variable.
    2. Create a single S3 bucket with Reduced Redundancy Storage turned on and ask their customers to use an S3 client instead of an FTP client. Create a bucket for each customer with a Bucket Policy that permits access only to that one customer. (Creating a bucket for each user is not a scalable model; also, there was earlier a default limit of 100 buckets per account without requesting an extension, which has since changed)
    3. Create an auto-scaling group of FTP servers with a scaling policy to automatically scale-in when minimum network traffic on the auto-scaling group is below a given threshold. Load a central list of ftp users from S3 as part of the user Data startup script on each Instance (Expensive)
    4. Create a single S3 bucket with Requester Pays turned on and ask their customers to use an S3 client instead of an FTP client. Create a bucket for each customer with a Bucket Policy that permits access only to that one customer. (Creating a bucket for each user is not a scalable model; also, there was earlier a default limit of 100 buckets per account without requesting an extension, which has since changed)

AWS DynamoDB Advanced Features

AWS DynamoDB Advanced Features

  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed.
  • DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
  • DynamoDB Global Tables is a new multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from ms to µs – even at millions of requests per second.
  • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.

DynamoDB Secondary Indexes

  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • Global secondary index – an index with a partition key and a sort key that can be different from those on the base table.
  • Local secondary index – an index that has the same partition key as the base table, but a different sort key.

DynamoDB TTL

  • DynamoDB Time to Live (TTL) enables a per-item timestamp to determine when an item is no longer needed.
  • After the date and time of the specified timestamp, DynamoDB deletes the item from the table without consuming any write throughput.
  • DynamoDB TTL is provided at no extra cost and can help reduce data storage by retaining only required data.
  • Items that are deleted from the table are also removed from any local secondary index and global secondary index in the same way as a DeleteItem operation.
  • Expired items get removed from the table and indexes within about 48 hours.
  • DynamoDB Stream tracks the delete operation as a system delete and not a regular delete.
  • TTL is useful if the stored items lose relevance after a specific time, for e.g.
    • Remove user or sensor data after a year of inactivity in an application
    • Archive expired items to an S3 data lake via DynamoDB Streams and AWS Lambda.
    • Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
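
A minimal boto3 sketch of enabling TTL and writing an item with an expiry timestamp (table and attribute names are hypothetical):

  import time
  import boto3

  dynamodb = boto3.client("dynamodb")

  # Enable TTL on the table, naming the attribute that holds the expiry epoch time.
  dynamodb.update_time_to_live(
      TableName="SensorData",
      TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
  )

  # Item expires roughly 30 days from now; DynamoDB removes it after that time
  # (typically within ~48 hours) without consuming any write throughput.
  dynamodb.put_item(
      TableName="SensorData",
      Item={
          "SensorId": {"S": "sensor-42"},
          "Reading": {"N": "21.5"},
          "expires_at": {"N": str(int(time.time()) + 30 * 24 * 3600)},
      },
  )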

DynamoDB Cross-region Replication

  • DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
  • Writes to the table will be automatically propagated to all replicas.
  • Cross-region replication currently supports a single master mode. A single master has one master table and one or more replica tables.
  • Read replicas are updated asynchronously as DynamoDB acknowledges a write operation as successful once it has been accepted by the master table. The write will then be propagated to each replica with a slight delay.
  • Cross-region replication can be helpful in scenarios
    • Efficient disaster recovery, in case a data center failure occurs.
    • Faster reads, for customers in multiple regions by delivering data faster by reading a DynamoDB table from the closest AWS data center.
    • Easier traffic management, to distribute the read workload across tables and thereby consume less read capacity in the master table.
    • Easy regional migration, by promoting a read replica to master
    • Live data migration, to replicate data and when the tables are in sync, switch the application to write to the destination region
  • Cross-region replication costing depends on
    • Provisioned throughput (Writes and Reads)
    • Storage for the replica tables.
    • Data Transfer across regions
    • Reading data from DynamoDB Streams to keep the tables in sync.
    • Cost of EC2 instances provisioned, depending upon the instance types and region, to host the replication process.
  • NOTE: Before DynamoDB Streams and the out-of-the-box cross-region replication support, cross-region replication on DynamoDB was performed by defining an AWS Data Pipeline job which used EMR internally to transfer the data.

DynamoDB Global Tables

  • DynamoDB Global Tables is a multi-master, active-active, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  • Applications can now perform reads and writes to DynamoDB in AWS regions around the world, with changes in any region propagated to every region where a table is replicated.
  • Global Tables help in building applications to advantage of data locality to reduce overall latency.
  • Global Tables supports eventual consistency & strong consistency for same region reads, but only eventual consistency for cross-region reads.
  • Global Tables replicates data among regions within a single AWS account and currently does not support cross-account access.
  • Global Tables uses the Last Write Wins approach for conflict resolution.
  • Global Tables requires DynamoDB streams enabled with New and Old image settings.

DynamoDB Streams

  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Streams stores the data for the last 24 hours, after which they are erased.
  • DynamoDB Streams maintains an ordered sequence of the events per item however, sequence across items is not maintained.
  • Example
    • For e.g., suppose that you have a DynamoDB table tracking high scores for a game and that each item in the table represents an individual player. If you make the following three updates in this order:
      • Update 1: Change Player 1’s high score to 100 points
      • Update 2: Change Player 2’s high score to 50 points
      • Update 3: Change Player 1’s high score to 125 points
    • DynamoDB Streams will maintain the order for the Player 1 score events. However, it does not maintain order across players, so the position of the Player 2 event relative to the two Player 1 events is not guaranteed.
  • DynamoDB Streams APIs help developers consume updates and receive the item-level data before and after items are changed.
  • DynamoDB Streams allow reads at up to twice the rate of the provisioned write capacity of the DynamoDB table.
  • DynamoDB Streams have to be enabled on a per-table basis.
  • DynamoDB streams support Encryption at rest to encrypt the data.
  • DynamoDB Streams is designed for No Duplicates so that every update made to the table will be represented exactly once in the stream.
  • DynamoDB Streams writes stream records in near-real time so that applications can consume these streams and take action based on the contents.
  • DynamoDB streams can be used for multi-region replication to keep other data stores up-to-date with the latest changes to DynamoDB or to take actions based on the changes made to the table
  • DynamoDB stream records can be processed using Kinesis Data Streams, Lambda, or a KCL application.

DynamoDB Triggers

  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • DynamoDB triggers can be used in scenarios like sending notifications, updating an aggregate table, and connecting DynamoDB tables to other data sources.
  • DynamoDB Trigger flow
    • Custom logic for a DynamoDB trigger is stored in an AWS Lambda function as code.
    • A trigger for a given table can be created by associating an AWS Lambda function to the stream (via DynamoDB Streams) on a table.
    • When the table is updated, the updates are published to DynamoDB Streams.
    • In turn, AWS Lambda reads the updates from the associated stream and executes the code in the function.
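
A minimal boto3 sketch of wiring such a trigger, assuming an existing table and an existing Lambda function named process-gamescore-changes (both hypothetical):

  import boto3

  dynamodb = boto3.client("dynamodb")
  lambda_client = boto3.client("lambda")

  # 1. Enable a stream capturing both the old and the new item images.
  resp = dynamodb.update_table(
      TableName="GameScores",
      StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
  )
  stream_arn = resp["TableDescription"]["LatestStreamArn"]

  # 2. Point the Lambda function at the stream; Lambda polls the stream and
  #    invokes the function with batches of change records. The function's
  #    execution role needs permission to read from the stream.
  lambda_client.create_event_source_mapping(
      EventSourceArn=stream_arn,
      FunctionName="process-gamescore-changes",
      StartingPosition="TRIM_HORIZON",
      BatchSize=100,
  )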

DynamoDB Backup and Restore

  • DynamoDB on-demand backup helps create full backups of the tables for long-term retention, and archiving for regulatory compliance needs.
  • Backup and restore actions run with no impact on table performance or availability.
  • Backups are preserved regardless of table deletion and retained until they are explicitly deleted.
  • On-demand backups are cataloged, and discoverable.
  • On-demand backups can be created using
    • DynamoDB
      • DynamoDB on-demand backups cannot be copied to a different account or Region.
    • AWS Backup (Recommended)
      • is a fully managed data protection service that makes it easy to centralize and automate backups across AWS services, in the cloud, and on-premises
      • provides enhanced backup features
      • can configure backup schedules and policies and monitor activity for AWS resources and on-premises workloads in one place.
      • can copy the on-demand backups across AWS accounts and Regions.
      • supports encryption using an AWS KMS key that is independent of the DynamoDB table encryption key.
      • can apply a write-once-read-many (WORM) setting for the backups using the AWS Backup Vault Lock policy.
      • can add cost allocation tags to on-demand backups.
      • can transition on-demand backups to cold storage for lower costs.
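
A minimal boto3 sketch of an on-demand backup taken and restored directly through DynamoDB (table and backup names are hypothetical):

  import boto3

  dynamodb = boto3.client("dynamodb")

  # Full backup with no impact on table performance or availability.
  backup = dynamodb.create_backup(TableName="GameScores", BackupName="gamescores-2024-01-01")

  # Restore the backup into a new table; the backup itself is retained until explicitly deleted.
  dynamodb.restore_table_from_backup(
      TargetTableName="GameScores-restored",
      BackupArn=backup["BackupDetails"]["BackupArn"],
  )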

DynamoDB PITR – Point-In-Time Recovery

  • DynamoDB point-in-time recovery – PITR enables automatic, continuous, incremental backup of the table with per-second granularity.
  • PITR-enabled tables that were deleted within the preceding 35 days can be recovered and restored to their state just before they were deleted.
  • PITR helps protect against accidental writes and deletes.
  • PITR can back up tables with hundreds of terabytes of data with no impact on the performance or availability of the production applications.
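
A minimal boto3 sketch of enabling PITR and restoring to a specific second (table names and the timestamp are hypothetical):

  from datetime import datetime, timezone
  import boto3

  dynamodb = boto3.client("dynamodb")

  # Turn on continuous, incremental backups for the table.
  dynamodb.update_continuous_backups(
      TableName="GameScores",
      PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
  )

  # Restore the table's state at a chosen second into a new table.
  dynamodb.restore_table_to_point_in_time(
      SourceTableName="GameScores",
      TargetTableName="GameScores-before-incident",
      RestoreDateTime=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
  )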

DynamoDB Accelerator – DAX

  • DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
  • DAX is intended for high-performance read applications. As a write-through cache, DAX writes through to DynamoDB so that the writes are immediately reflected in the item cache.
  • DAX as a managed service handles cache invalidation, data population, and cluster management.
  • DAX provides an API compatible with DynamoDB. Therefore, it requires only minimal functional changes to use with an existing application.
  • DAX saves costs by reducing the read load (RCU) on DynamoDB.
  • DAX helps prevent hot partitions.
  • DAX only supports eventual consistency, and strong consistency requests are passed-through to DynamoDB.
  • DAX is fault-tolerant and scalable.
  • DAX cluster has a primary node and zero or more read-replica nodes. Upon a failure for a primary node, DAX will automatically failover and elect a new primary. For scaling, add or remove read replicas.
  • DAX supports server-side encryption.
  • DAX also supports encryption in transit, ensuring that all requests and responses between the application and the cluster are encrypted by TLS, and connections to the cluster can be authenticated by verification of a cluster x509 certificate


VPC Endpoints

  • VPC endpoints for DynamoDB improve privacy and security, especially those dealing with sensitive workloads with compliance and audit requirements, by enabling private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.
  • VPC endpoints for DynamoDB support IAM policies to simplify DynamoDB access control, where access can be restricted to a specific VPC endpoint.
  • VPC endpoints can be created only for Amazon DynamoDB tables in the same AWS Region as the VPC
  • DynamoDB Streams cannot be accessed using VPC endpoints for DynamoDB.
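
A minimal boto3 sketch of creating the gateway endpoint (the VPC ID, route table ID, and Region are hypothetical); routes to DynamoDB are added to the listed route tables so traffic stays on the AWS network:

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Gateway endpoint for DynamoDB in the same Region as the VPC.
  ec2.create_vpc_endpoint(
      VpcEndpointType="Gateway",
      VpcId="vpc-0123456789abcdef0",
      ServiceName="com.amazonaws.us-east-1.dynamodb",
      RouteTableIds=["rtb-0123456789abcdef0"],
      # An endpoint policy could further restrict which tables and actions are reachable.
  )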


AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What are the services supported by VPC endpoints, using Gateway endpoint type? Choose 2 answers
    1. Amazon S3
    2. Amazon EFS
    3. Amazon DynamoDB
    4. Amazon Glacier
    5. Amazon SQS
  2. A company has setup an application in AWS that interacts with DynamoDB. DynamoDB is currently responding in milliseconds, but the application response guidelines require it to respond within microseconds. How can the performance of DynamoDB be further improved? [SAA-C01]
    1. Use ElastiCache in front of DynamoDB
    2. Use DynamoDB inbuilt caching
    3. Use DynamoDB Accelerator
    4. Use RDS with ElastiCache instead


AWS RDS DB Maintenance & Upgrades

RDS DB Maintenance and Upgrades

  • Changes to a DB instance can occur when a DB instance is manually modified, for e.g. when the DB engine version is upgraded, or when RDS performs maintenance on an instance

RDS Maintenance

  • RDS performs periodic maintenance on RDS resources, such as DB instances, and most often involves updates to the DB instance’s operating system (OS).
  • Maintenance items can either
    • be applied manually on a DB instance at one’s convenience
    • or wait for the automatic maintenance process initiated by RDS during the defined weekly maintenance window.
  • Maintenance window only determines when pending operations start but does not limit the total execution time of these operations.
  • Maintenance operations are not guaranteed to finish before the maintenance window ends and can continue beyond the specified end time.
  • Maintenance update availability can be checked both on the RDS console and by using the RDS API. And if an update is available, one can
    • Defer the maintenance items.
    • Apply the maintenance items immediately.
    • Schedule them to start during the next defined maintenance window
  • Maintenance items marked as
    • Required cannot be deferred indefinitely; if deferred, AWS will send a notification of the time when the update will be performed next.
    • Available can be deferred indefinitely, and the update will not be applied to the DB instance.
  • Required patching is automatically scheduled only for patches that are related to security and instance reliability. Such patching occurs infrequently (typically once every few months) and seldom requires more than a fraction of your maintenance window.
  • Maintenance items require that RDS take the DB instance offline for a short time. Maintenance that requires DB instances to be offline includes scale compute operations, which generally take only a few minutes from start to finish, and required operating system or database patching.
  • Multi-AZ deployment for the DB instance reduces the impact of a maintenance event by following these steps:
    • Perform maintenance on standby.
    • Promote the standby to primary.
    • Perform maintenance on the old primary, which becomes the new standby.
  • When the database engine for the DB instance is modified in a Multi-AZ deployment, RDS upgrades both the primary and secondary DB instances at the same time. In this case, the database engine for the entire Multi-AZ deployment is shut down during the upgrade.
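
A minimal boto3 sketch of checking for pending maintenance and opting in immediately instead of waiting for the window (the resource identifier is hypothetical):

  import boto3

  rds = boto3.client("rds")

  # List maintenance actions (e.g. system-update, db-upgrade) pending on resources.
  pending = rds.describe_pending_maintenance_actions()
  for res in pending["PendingMaintenanceActions"]:
      print(res["ResourceIdentifier"],
            [a["Action"] for a in res["PendingMaintenanceActionDetails"]])

  # Apply a specific action now rather than during the next maintenance window.
  rds.apply_pending_maintenance_action(
      ResourceIdentifier="arn:aws:rds:us-east-1:123456789012:db:example-db",
      ApplyAction="system-update",
      OptInType="immediate",   # or "next-maintenance" / "undo-opt-in"
  )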

Operating System Updates

  • Upgrades to the operating system are most often for security issues and should be done as soon as possible.
  • OS updates on a DB instance can be applied at one’s convenience or can wait for the maintenance process initiated by RDS to apply the update during the defined maintenance window
  • DB instance is not automatically backed up when an OS update is applied and should be backed up before the update is applied

Database Engine Version Upgrade

  • DB instance engine version can be upgraded when a new DB engine version is supported by RDS.
  • Database version upgrades consist of major and minor version upgrades.
    • Major database version upgrades
      • can contain changes that are not backward-compatible
      • RDS doesn’t apply major version upgrades automatically
      • DB instance should be manually modified and thoroughly tested before applying it to the production instances.
    • Minor version upgrades
      • Each DB engine handles minor version upgrades slightly differently
        for e.g. RDS automatically applies minor version upgrades to a DB instance running PostgreSQL, but they must be manually applied to a DB instance running Oracle.
  • Amazon posts an announcement to the forums announcement page and sends a customer e-mail notification before upgrading a DB instance
  • Amazon schedules the upgrades at specific times throughout the year to help plan around them, because downtime is required to upgrade a DB engine version, even for Multi-AZ instances.
  • RDS takes two DB snapshots during the upgrade process.
    • First DB snapshot is of the DB instance before any upgrade changes have been made. If the upgrade fails, it can be restored from the snapshot to create a DB instance running the old version.
    • Second DB snapshot is taken when the upgrade completes. After the upgrade is complete, database engine can’t be reverted to the previous version. For returning to the previous version, restore the first DB snapshot taken to create a new DB instance.
  • If the DB instance is using read replication, all of the Read Replicas must be upgraded before upgrading the source instance.
  • If the DB instance is in a Multi-AZ deployment, both the primary and standby replicas are upgraded at the same time and would result in an outage. The time for the outage varies based on your database engine, version, and the size of your DB instance.
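
A minimal boto3 sketch of a manual engine version upgrade (the identifier and target version are hypothetical); major version upgrades must be explicitly allowed, and ApplyImmediately=False defers the change to the next maintenance window:

  import boto3

  rds = boto3.client("rds")

  rds.modify_db_instance(
      DBInstanceIdentifier="example-db",
      EngineVersion="14.7",              # target engine version (hypothetical)
      AllowMajorVersionUpgrade=True,     # required for major (not backward-compatible) upgrades
      ApplyImmediately=False,            # wait for the next maintenance window
  )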

RDS Maintenance Window

  • Every DB instance has a weekly maintenance window defined during which any system changes are applied.
  • Maintenance window is an opportunity to control when DB instance modifications and software patching occur, in the event either are requested or required.
  • If a maintenance event is scheduled for a given week, it will be initiated during the 30-minute maintenance window as defined
  • Maintenance events mostly complete during the 30-minute maintenance window, although larger maintenance events may take more time
  • 30-minute maintenance window is selected at random from an 8-hour block of time per region. If you don’t specify a preferred maintenance window when you create the DB instance, Amazon RDS assigns a 30-minute maintenance window on a randomly selected day of the week.
  • RDS will consume some of the resources on the DB instance while maintenance is being applied, minimally affecting performance.
  • For some maintenance events, a Multi-AZ failover may be required for a maintenance update to be complete.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A user has launched an RDS MySQL DB with the Multi AZ feature. The user has scheduled the scaling of instance storage during maintenance window. What is the correct order of events during maintenance window? 1. Perform maintenance on standby 2. Promote standby to primary 3. Perform maintenance on original primary 4. Promote original master back as primary
    1. 1, 2, 3, 4
    2. 1, 2, 3
    3. 2, 3, 4, 1
  2. Can I control if and when MySQL based RDS Instance is upgraded to new supported versions?
    1. No
    2. Only in VPC
    3. Yes
  3. A user has scheduled the maintenance window of an RDS DB on Monday at 3 AM. Which of the below mentioned events may force to take the DB instance offline during the maintenance window?
    1. Enabling Read Replica
    2. Making the DB Multi AZ
    3. DB password change
    4. Security patching
  4. A user has launched an RDS postgreSQL DB with AWS. The user did not specify the maintenance window during creation. The user has configured RDS to update the DB instance type from micro to large. If the user wants to have it during the maintenance window, what will AWS do?
    1. AWS will not allow to update the DB until the maintenance window is configured
    2. AWS will select the default maintenance window if the user has not provided it
    3. AWS will ask the user to specify the maintenance window during the update
    4. It is not possible to change the DB size from micro to large with RDS
  5. Can I test my DB Instance against a new version before upgrading?
    1. No
    2. Yes
    3. Only in VPC


AWS RDS Storage

AWS RDS Storage

  • RDS storage uses Elastic Block Store – EBS volumes for database and log storage.
  • RDS automatically stripes across multiple EBS volumes to enhance IOPS performance, depending on the amount of storage requested

RDS Storage Types

  • RDS storage provides three storage types: General Purpose (SSD), Provisioned IOPS (input/output operations per second), and Magnetic.
  • These storage types differ in performance characteristics and price, which allows tailoring of storage performance and cost to the database needs
  • MySQL, MariaDB, PostgreSQL, and Oracle RDS DB instances can be created with up to 64TB of storage, and SQL Server RDS DB instances with up to 16TB of storage when using the Provisioned IOPS and General Purpose (SSD) storage types.
  • Existing MySQL, PostgreSQL, and Oracle RDS database instances can be scaled to these new database storage limits without any downtime.

Magnetic (Standard)

  • Magnetic storage, also called standard storage, offers cost-effective storage that is ideal for applications with light or burst I/O requirements.
  • They deliver approximately 100 IOPS on average, with burst capability of up to hundreds of IOPS, and they can range in size from 5 GB to 3 TB, depending on the DB instance engine.
  • Magnetic storage is not reserved for a single DB instance, so performance can vary greatly depending on the demands placed on shared resources by other customers.

General Purpose (SSD)

  • General purpose, SSD-backed storage, also called gp2, can provide faster access than disk-based storage.
  • They can deliver single-digit millisecond latencies, with a base performance of 3 IOPS per Gigabyte (GB) and the ability to burst to 3,000 IOPS for extended periods of time up to a maximum of 10,000 PIOPS.
  • General Purpose volumes can range in size from 5 GB to 6 TB for MySQL, MariaDB, PostgreSQL, and Oracle DB instances, and from 20 GB to 4 TB for SQL Server DB instances.
  • General Purpose is excellent for small to medium-sized databases.

Provisioned IOPS

  • Provisioned IOPS storage is designed to meet the needs of I/O-intensive workloads, particularly database workloads, that are sensitive to storage performance and consistency in random access I/O throughput.
  • Provisioned IOPS storage is a storage type that delivers fast, predictable, and consistent throughput performance.
  • For any production application that requires fast and consistent I/O performance, Amazon recommends Provisioned IOPS (input/output operations per second) storage.
  • Provisioned IOPS storage is optimized for I/O intensive, online transaction processing (OLTP) workloads that have consistent performance requirements.
  • Provisioned IOPS helps with performance tuning.
  • Dedicated IOPS rate and storage space allocation is specified, when a DB instance is created. RDS provisions that IOPS rate and storage for the lifetime of the DB instance or until it is changed.
  • RDS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

Adding Storage and Changing Storage Type

  • DB instance can be modified to use additional storage and converted to a different storage type.
  • However, storage allocated for a DB instance cannot be decreased
  • MySQL, MariaDB, PostgreSQL, and Oracle DB instances can be scaled up for storage, which helps improve I/O capacity.
  • Neither the storage capacity nor the type of storage for a SQL Server DB instance can be changed, due to the extensibility limitations of striped storage attached to a Windows Server environment.
  • During the scaling process, the DB instance will be available for reads and writes, but may experience performance degradation
  • Adding storage may take several hours; the duration of the process depends on several factors such as load, storage size, storage type, amount of IOPS provisioned (if any), and number of prior scale storage operations.
  • While storage is being added, nightly backups are suspended and no other RDS operations can take place, including modify, reboot, delete, create Read Replica, and create DB Snapshot
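
A minimal boto3 sketch of scaling storage up and switching the storage type in one modification (the identifier, size, and IOPS values are hypothetical); remember that allocated storage can only be increased:

  import boto3

  rds = boto3.client("rds")

  rds.modify_db_instance(
      DBInstanceIdentifier="example-db",
      AllocatedStorage=500,      # new size in GiB; must be >= the current size
      StorageType="io1",         # switch to Provisioned IOPS storage
      Iops=5000,                 # required when StorageType is "io1"
      ApplyImmediately=True,     # instance stays available, performance may degrade
  )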

Performance Metrics

  • Amazon RDS provides several metrics that can be used to determine how the DB instance is performing.
    • IOPS
      • the number of I/O operations completed per second.
      • it is reported as the average IOPS for a given time interval.
      • RDS reports read and write IOPS separately on one minute intervals.
      • Total IOPS is the sum of the read and write IOPS.
      • Typical values for IOPS range from zero to tens of thousands per second.
    • Latency
      • the elapsed time between the submission of an I/O request and its completion
      • it is reported as the average latency for a given time interval.
      • RDS reports read and write latency separately on one minute intervals in units of seconds.
      • Typical values for latency are in the millisecond (ms)
    • Throughput
      • the number of bytes per second transferred to or from disk
      • it is reported as the average throughput for a given time interval.
      • RDS reports read and write throughput separately on one minute intervals using units of megabytes per second (MB/s).
      • Typical values for throughput range from zero to the I/O channel’s maximum bandwidth.
    • Queue Depth
      • the number of I/O requests in the queue waiting to be serviced.
      • these are I/O requests that have been submitted by the application but have not been sent to the device because the device is busy servicing other I/O requests.
      • it is reported as the average queue depth for a given time interval.
      • RDS reports queue depth in one minute intervals. Typical values for queue depth range from zero to several hundred.
      • Time spent waiting in the queue is a component of Latency and Service Time (not available as a metric).
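
These metrics are published to CloudWatch under the AWS/RDS namespace; a minimal boto3 sketch of pulling one-minute ReadIOPS for an instance (the identifier and time range are hypothetical):

  from datetime import datetime, timedelta, timezone
  import boto3

  cloudwatch = boto3.client("cloudwatch")

  now = datetime.now(timezone.utc)
  stats = cloudwatch.get_metric_statistics(
      Namespace="AWS/RDS",
      MetricName="ReadIOPS",            # also: WriteIOPS, ReadLatency, ReadThroughput, DiskQueueDepth
      Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "example-db"}],
      StartTime=now - timedelta(hours=1),
      EndTime=now,
      Period=60,                        # RDS reports these metrics at one-minute intervals
      Statistics=["Average"],
  )
  for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
      print(point["Timestamp"], point["Average"])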

RDS Storage Facts

  • First time a DB instance is started and accesses an area of disk for the first time, the process can take longer than all subsequent accesses to the same disk area. This is known as the “first touch penalty”. Once an area of disk has incurred the first touch penalty, that area of disk does not incur the penalty again for the life of the instance, even if the DB instance is rebooted, restarted, or the DB instance class changes. Note that a DB instance created from a snapshot, a point-in-time restore, or a read replica is a new instance and does incur this first touch penalty.
  • RDS manages the DB instance and it reserves overhead space on the instance. While the amount of reserved storage varies by DB instance class and other factors, this reserved space can be as much as one or two percent of the total storage
  • Provisioned IOPS provides a way to reserve I/O capacity by specifying IOPS. Like any other system capacity attribute, maximum throughput under load will be constrained by the resource that is consumed first, which could be IOPS, channel bandwidth, CPU, memory, or database internal resources.
  • Current maximum channel bandwidth available is 4000 megabits per second (Mbps) full duplex. In terms of the read and write throughput metrics, this equates to about 210 megabytes per second (MB/s) in each direction. A perfectly balanced workload of 50% reads and 50% writes may attain a maximum combined throughput of 420 MB/s, which includes protocol overhead, so the actual data throughput may be less.
  • Provisioned IOPS works with an I/O request size of 32 KB. Provisioned IOPS consumption is a linear function of I/O request size above 32 KB. An I/O request smaller than 32 KB is handled as one I/O; for e.g. 1000 16 KB I/O requests are treated the same as 1000 32 KB requests. I/O requests larger than 32 KB consume more than one I/O request; for e.g. a 48 KB I/O request consumes 1.5 I/O requests of storage capacity, and a 64 KB I/O request consumes 2 I/O requests

Factors That Impact RDS Storage Performance

  • Several factors can affect the performance of a DB instance, such as instance configuration, I/O characteristics, and workload demand.
  • System related activities also consume I/O capacity and may reduce database instance performance while in progress:
    • DB snapshot creation
    • Nightly backups
    • Multi-AZ peer creation
    • Read replica creation
    • Scaling storage
  • System resources can constrain the throughput of a DB instance, but there can be other reasons for a bottleneck. The database could be the issue if:
    • Channel throughput limit is not reached
    • Queue depths are consistently low
    • CPU utilization is under 80%
    • Free memory available
    • No swap activity
    • Plenty of free disk space
    • Application has dozens of threads all submitting transactions as fast as the database will take them, but there is clearly unused I/O capacity

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. When should I choose Provisioned IOPS over Standard RDS storage?
    1. If you have batch-oriented workloads
    2. If you use production online transaction processing (OLTP) workloads
    3. If you have workloads that are not sensitive to consistent performance
  2. Is decreasing the storage size of a DB Instance permitted?
    1. Depends on the RDMS used
    2. Yes
    3. No
  3. Because of the extensibility limitations of striped storage attached to Windows Server, Amazon RDS does not currently support increasing storage on a _____ DB Instance.
    1. SQL Server
    2. MySQL
    3. Oracle
  4. If I want to run a database in an Amazon instance, which is the most recommended Amazon storage option?
    1. Amazon Instance Storage
    2. Amazon EBS
    3. You can’t run a database inside an Amazon instance.
    4. Amazon S3
  5. For each DB Instance class, what is the maximum size of associated storage capacity?
    1. 1TiB
    2. 2TiB
    3. 8TiB
    4. 16TiB (The limit keeps on changing so please check the latest always)


AWS Relational Database Service – RDS

Relational Database Service – RDS

  • Relational Database Service – RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.
  • provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks such as hardware provisioning, database setup, patching, and backups.
  • features & benefits
    • CPU, memory, storage, and IOPS can be scaled independently.
    • manages backups, software patching, automatic failure detection, and recovery.
    • automated backups can be performed as needed, or manual backups can be triggered as well. Backups can be used to restore a database, and the restore process works reliably and efficiently.
    • provides Multi-AZ high availability with a primary instance and a synchronous standby secondary instance that can failover seamlessly when a problem occurs.
    • provides elasticity & scalability by enabling Read Replicas to increase read scaling.
    • supports MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server, and the new, MySQL-compatible Aurora DB engine
    • supports IAM users and permissions to control who has access to the RDS database service
    • databases can be further protected by putting them in a VPC, using SSL for data in transit and encryption for data at rest
    • However, as it is a managed service, shell (root ssh) access to DB instances is not provided, and this restricts access to certain system procedures and tables that require advanced privileges.

RDS Components

  • DB Instance
    • is a basic building block of RDS
    • is an isolated database environment in the cloud
    • each DB instance runs a DB engine. AWS currently supports MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server & Aurora DB engines
    • can be accessed from AWS command-line tools, RDS APIs, or the AWS Management RDS Console.
    • computation and memory capacity of a DB instance is determined by its DB instance class, which can be selected as per the needs
    • supports three storage types: Magnetic, General Purpose (SSD), and Provisioned IOPS (SSD), which differ in performance and price
    • each DB instance has a DB instance identifier, which is a customer-supplied name and must be unique for that customer in an AWS region. It uniquely identifies the DB instance when interacting with the RDS API and AWS CLI commands.
    • each DB instance can host multiple user-created databases or a single Oracle database with multiple schemas.
    • can be hosted in an AWS VPC environment for better control
  • Regions and Availability Zones
    • AWS resources are housed in highly available data center facilities in different areas of the world, these data centers are called regions which further contain multiple distinct locations called Availability Zones
    • Each AZ is engineered to be isolated from failures in other AZs and to provide inexpensive, low-latency network connectivity to other AZs in the same region
    • DB instances can be hosted in different AZs, an option called a Multi-AZ deployment.
      • RDS automatically provisions and maintains a synchronous standby replica of the DB instance in a different AZ.
      • Primary DB instance is synchronously replicated across AZs to the standby replica
      • Provides data redundancy, failover support, eliminates I/O freezes, and minimizes latency spikes during system backups.
  • Security Groups
    • security group controls the access to a DB instance, by allowing access to the specified IP address ranges or EC2 instances
  • DB Parameter Groups
    • A DB parameter group contains engine configuration values that can be applied to one or more DB instances of the same DB engine and version (the parameter group family)
    • helps define configuration values specific to the selected DB Engine for e.g. max_connections, force_ssl, autocommit
    • supports a default parameter group, which cannot be edited.
    • supports custom parameter groups, to override values
    • parameters can be static or dynamic (see the sketch after this list)
      • changes to dynamic parameters are applied immediately (irrespective of the apply-immediately setting)
      • changes to static parameters are NOT applied immediately and require a manual reboot.
  • DB Option Groups
    • Some DB engines offer tools or optional features that simplify managing the databases and making the best use of data.
    • RDS makes such tools available through option groups for e.g. Oracle Application Express (APEX), SQL Server Transparent Data Encryption, and MySQL Memcached support.
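
As a rough illustration of the custom parameter group and static vs. dynamic parameter behaviour described above, a minimal boto3 sketch might look like the following (the group name, family, and parameter values are hypothetical):

```python
import boto3

rds = boto3.client("rds")

# Hypothetical custom parameter group for a MySQL 8.0 instance
rds.create_db_parameter_group(
    DBParameterGroupName="custom-mysql80",
    DBParameterGroupFamily="mysql8.0",
    Description="Custom values overriding the default parameter group",
)

rds.modify_db_parameter_group(
    DBParameterGroupName="custom-mysql80",
    Parameters=[
        # max_connections is a dynamic parameter, so it is applied immediately
        {"ParameterName": "max_connections", "ParameterValue": "500",
         "ApplyMethod": "immediate"},
        # innodb_log_file_size is a static parameter; it can only use
        # pending-reboot and takes effect after a manual reboot
        {"ParameterName": "innodb_log_file_size", "ParameterValue": "134217728",
         "ApplyMethod": "pending-reboot"},
    ],
)
```

The custom group is then attached to a DB instance in place of the non-editable default group.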

RDS Interfaces

  • RDS can be interacted with through multiple interfaces
    • AWS RDS Management console
    • Command Line Interface
    • Programmatic Interfaces which include SDKs, libraries in different languages, and RDS API

RDS Multi-AZ & Read Replicas

  • Multi-AZ deployment
    • provides high availability, durability, and automatic failover support
    • helps improve the durability and availability of a critical system, enhancing availability during planned system maintenance, DB instance failure, and Availability Zone disruption.
    • automatically provisions and manages a synchronous standby instance in a different AZ.
    • automatically fails over in case of any issues with the primary instance
    • A Multi-AZ DB instance deployment has one standby DB instance that provides failover support but doesn’t serve read traffic.
    • A Multi-AZ DB cluster deployment has two standby DB instances that provide failover support and can also serve read traffic.
  • Read replicas
    • enable increased scalability and database availability in the case of an AZ failure.
    • allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads
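
A rough boto3 sketch of adding a read replica and converting an existing instance to Multi-AZ, with hypothetical instance identifiers:

```python
import boto3

rds = boto3.client("rds")

# Create a read replica of an existing source instance to offload
# read-heavy traffic
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-replica-1",
    SourceDBInstanceIdentifier="mydb",
)

# Convert a Single-AZ instance to Multi-AZ; RDS provisions and maintains
# the synchronous standby in a different AZ
rds.modify_db_instance(
    DBInstanceIdentifier="mydb",
    MultiAZ=True,
    ApplyImmediately=True,
)
```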

RDS Security

  • DB instance can be hosted in a VPC for the greatest possible network access control.
  • IAM policies can be used to assign permissions that determine who is allowed to manage RDS resources.
  • Security groups allow control of what IP addresses or EC2 instances can connect to the databases on a DB instance.
  • RDS supports encryption in transit using SSL connections
  • RDS supports encryption at rest to secure instances and snapshots at rest.
  • supports network encryption and transparent data encryption (TDE) with Oracle DB instances
  • Authentication can be implemented using Password, Kerberos, and IAM database authentication.
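
A rough boto3 sketch of IAM database authentication, generating a short-lived token that is used in place of a password over an SSL connection (endpoint and user name are hypothetical):

```python
import boto3

rds = boto3.client("rds")

# Generate a short-lived authentication token (valid for 15 minutes)
# for a database user mapped to IAM database authentication
token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.us-east-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="iam_db_user",
)

# The token is then passed as the password by the database driver
# (for example, a MySQL client library) over an SSL connection.
print(token[:40], "...")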

RDS Backups & Snapshots

  • Automated backups
    • are enabled by default for a new DB instance.
    • enable recovery of the database, using the backups and database change logs, to any point in time within the backup retention period, up to the last five minutes of database usage.
  • DB snapshots are manual, user-initiated backups that enable backup of the DB instance to a known state, and restore to that specific state at any time.

RDS Monitoring & Notification

  • RDS integrates with CloudWatch and provides metrics for monitoring
  • CloudWatch alarms can be created over a single metric that sends an SNS message when the alarm changes state
  • RDS also provides SNS notification whenever any RDS event occurs
  • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and helps analyze any issues that affect it
  • RDS Recommendations provides automated recommendations for database resources.

RDS Pricing

  • Instance class
    • Pricing is based on the class (e.g., micro) of the DB instance consumed.
  • Running time
    • Usage is billed in one-second increments, with a minimum of 10 mins.
  • Storage
    • Storage capacity provisioned for the DB instance is billed per GB per month
    • If the provisioned storage capacity is scaled within the month, the bill will be pro-rated.
  • I/O requests per month
    • Total number of storage I/O requests made in a billing cycle.
  • Provisioned IOPS (per IOPS per month)
    • Provisioned IOPS rate, regardless of IOPS consumed, for RDS Provisioned IOPS (SSD) storage only.
    • Provisioned storage for EBS volumes is billed in one-second increments, with a minimum of 10 minutes.
  • Backup storage
    • Automated backups & any active database snapshots consume storage
    • Increasing backup retention period or taking additional database snapshots increases the backup storage consumed by the database.
    • RDS provides backup storage up to 100% of the provisioned database storage at no additional charge for e.g., if you have 10 GB-months of provisioned database storage, RDS provides up to 10 GB-months of backup storage at no additional charge.
    • Most databases require less raw storage for a backup than for the primary dataset, so if multiple backups are not maintained, you will never pay for backup storage.
    • Backup storage is free only for active DB instances.
  • Data transfer
    • Internet data transfer out of the DB instance.
  • Reserved Instances
    • In addition to regular RDS pricing, reserved DB instances can be purchased

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. What does Amazon RDS stand for?
    1. Regional Data Server.
    2. Relational Database Service
    3. Regional Database Service.
  2. How many relational database engines does RDS currently support?
    1. MySQL, Postgres, MariaDB, Oracle, and Microsoft SQL Server
    2. Just two: MySQL and Oracle.
    3. Five: MySQL, PostgreSQL, MongoDB, Cassandra and SQLite.
    4. Just one: MySQL.
  3. If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot the instance for the changes to take effect?
    1. No
    2. Yes
  4. What is the name of the licensing model in which you can use your existing Oracle Database licenses to run Oracle deployments on Amazon RDS?
    1. Bring Your Own License
    2. Role Bases License
    3. Enterprise License
    4. License Included
  5. Will I be charged if the DB instance is idle?
    1. No
    2. Yes
    3. Only if running in GovCloud
    4. Only if running in VPC
  6. What is the minimum charge for the data transferred between Amazon RDS and Amazon EC2 Instances in the same Availability Zone?
    1. USD 0.10 per GB
    2. No charge. It is free.
    3. USD 0.02 per GB
    4. USD 0.01 per GB
  7. Does Amazon RDS allow direct host access via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection?
    1. Yes
    2. No
    3. Depends on if it is in VPC or not
  8. What are the two types of licensing options available for using Amazon RDS for Oracle?
    1. BYOL and Enterprise License
    2. BYOL and License Included
    3. Enterprise License and License Included
    4. Role based License and License Included
  9. A user plans to use RDS as a managed DB platform. Which of the below mentioned features is not supported by RDS?
    1. Automated backup
    2. Automated scaling to manage a higher load
    3. Automated failure detection and recovery
    4. Automated software patching
  10. A user is launching an AWS RDS with MySQL. Which of the below mentioned options allows the user to configure the InnoDB engine parameters?
    1. Options group
    2. Engine parameters
    3. Parameter groups
    4. DB parameters
  11. A user is planning to use the AWS RDS with MySQL. Which of the below mentioned services the user is not going to pay?
    1. Data transfer
    2. RDS CloudWatch metrics
    3. Data storage
    4. I/O requests per month

References

AWS_Relational_Database_Service_RDS

AWS RDS Best Practices

AWS RDS Best Practices

AWS recommends RDS best practices in terms of monitoring, performance, and security

RDS Basic Operational Guidelines

  • Monitoring
    • Memory, CPU, and storage usage should be monitored.
    • CloudWatch can be set up for notifications when usage patterns change or when the capacity of deployment is approached, so that system performance and availability can be maintained
  • Scaling
    • Scale up the DB instance when approaching storage capacity limits.
    • There should be some buffer in storage and memory to accommodate unforeseen increases in demand from the applications.
  • Backups
    • Enable Automatic Backups and set the backup window to occur during the daily low in WriteIOPS.
    • Use Multi-AZ to reduce the impact of backups on the primary DB instance.
  • On a MySQL DB instance,
    • Do not create more than 10,000 tables using Provisioned IOPS or 1000 tables using standard storage. Large numbers of tables will significantly increase database recovery time after a failover or database crash. If you need to create more tables than recommended, set the innodb_file_per_table parameter to 0.
    • Avoid tables in the database growing too large. Provisioned storage limits restrict the maximum size of a MySQL table file to 6 TB. Instead, partition the large tables so that file sizes are well under the 6 TB limit. This can also improve performance and recovery time.
  • Performance
    • If the database workload requires more I/O than provisioned, recovery after a failover or database failure will be slow.
    • To increase the I/O capacity of a DB instance,
      • Migrate to a DB instance class with High I/O capacity.
      • Convert from standard storage to Provisioned IOPS storage, and use a DB instance class that is optimized for Provisioned IOPS.
      • if using Provisioned IOPS storage, provision additional throughput capacity.
  • Multi-AZ & Failover
    • Deploy applications in all Availability Zones, so if an AZ goes down, applications in other AZs will still be available.
    • Use RDS DB events to monitor failovers.
    • Set a TTL of less than 30 seconds, if the client application is caching the DNS data of the DB instances. As the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if the application tries to connect to an IP address that no longer is in service.
    • Multi-AZ requires the transaction logging feature to be enabled. Do not use features like Simple recovery mode, offline mode, or Read-only mode, which turn off transaction logging.
    • To shorten failover time
      • Ensure that sufficient Provisioned IOPS are allocated for your workload. Inadequate I/O can lengthen failover times. Database recovery requires I/O.
      • Use smaller transactions. Database recovery relies on transactions, so break up large transactions into multiple smaller transactions to shorten failover time
    • Test failover for your DB instance to understand how long the process takes for your use case and to ensure that the application that accesses your DB instance can automatically connect to the new DB instance after failover.

DB Instance RAM Recommendations

  • An RDS performance best practice is to allocate enough RAM so that the working set resides almost completely in memory.
  • Value of ReadIOPS should be small and stable.
  • ReadIOPS metric can be checked, using AWS CloudWatch while the DB instance is under load, to tell if the working set is almost all in memory
  • If scaling up to a DB instance class with more RAM results in a dramatic drop in ReadIOPS, the working set was not almost completely in memory.
  • Continue to scale up until ReadIOPS no longer drops dramatically after a scaling operation, or ReadIOPS is reduced to a very small amount.
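
One way to check this is to pull the ReadIOPS metric from CloudWatch while the instance is under load; a rough boto3 sketch with a hypothetical DB instance identifier:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Average ReadIOPS over the last hour while the DB instance is under load;
# a large or erratic value suggests the working set does not fit in memory.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReadIOPS",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```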

RDS Security Best Practices

  • Do not use AWS root credentials to manage RDS resources; IAM users should be created for everyone instead.
  • Grant each user the minimum set of permissions required to perform his or her duties.
  • Use IAM groups to effectively manage permissions for multiple users.
  • Rotate your IAM credentials regularly.

Using Enhanced Monitoring to Identify Operating System Issues

  • RDS provides metrics in real time for the operating system (OS) that your DB instance runs on.
  • Enhanced monitoring is available for all DB instance classes except for db.t1.micro and db.m1.small.

Using Metrics to Identify Performance Issues

  • To identify performance issues caused by insufficient resources and other common bottlenecks, you can monitor the metrics available for your Amazon RDS DB instance
  • Performance metrics should be monitored on a regular basis to benchmark the average, maximum, and minimum values for a variety of time ranges to help identify performance degradation.
  • CloudWatch alarms can be set for particular metric thresholds to be alerted when they are reached or breached
  • A DB instance has a number of different categories of metrics, including CPU, memory, disk space, IOPS, DB connections, and network traffic; how to determine acceptable values depends on the metric.
  • One of the best ways to improve DB instance performance is to tune the most commonly used and most resource-intensive queries to make them less expensive to run.

Recovery

  • MySQL
    • InnoDB is the recommended and supported storage engine for MySQL DB instances on Amazon RDS.
    • However, MyISAM performs better than InnoDB if you require intense, full-text search capability.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MySQL require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
    • Although MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • MyISAM storage engine does not support reliable crash recovery and might prevent a Point-In-Time Restore or snapshot restore from working as intended which might result in lost or corrupt data when MySQL is restarted after a crash.
  • MariaDB
    • XtraDB is the recommended and supported storage engine for MariaDB DB instances on Amazon RDS.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MariaDB require a crash-recoverable storage engine and are supported for the XtraDB storage engine only.
    • Although MariaDB supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • For e.g., although Aria is a crash-safe replacement for MyISAM, it might still prevent a Point-In-Time Restore or snapshot restore from working as intended. This might result in lost or corrupt data when MariaDB is restarted after a crash.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  2. Amazon RDS automated backups and DB Snapshots are currently supported for only the __________ storage engine
    1. InnoDB
    2. MyISAM

References

AWS_RDS_Best_Practices

AWS Lambda

AWS Lambda

  • AWS Lambda offers Serverless computing that allows applications and services to be built and run without thinking about servers.
  • With serverless computing, the application still runs on servers, but all the server management is done by AWS.
  • helps run code without provisioning or managing servers, where you pay only for the compute time when the code is running.
  • is priced on a pay-per-use basis and there are no charges when the code is not running.
  • allows the running of code for any type of application or backend service with zero administration.
  • performs all the operational and administrative activities on your behalf, including capacity provisioning, monitoring fleet health, applying security patches to the underlying compute resources, deploying code, running a web service front end, and monitoring and logging the code.
  • does not provide access to the underlying compute infrastructure.
  • handles scalability and availability as it
    • provides easy scaling and high availability to the code without additional effort on your part.
    • is designed to process events within milliseconds.
    • is designed to run many instances of the functions in parallel.
    • is designed to use replication and redundancy to provide high availability for both the service and the functions it operates.
    • has no maintenance windows or scheduled downtimes for either.
    • has a default safety throttle for the number of concurrent executions per account per region.
    • has a higher latency immediately after a function is created, or updated, or if it has not been used recently.
    • for any function updates, there is a brief window of time, less than a minute, when requests would be served by both versions
  • Security
    • stores code in S3 and encrypts it at rest and performs additional integrity checks while the code is in use.
    • each function runs in its own isolated environment, with its own resources and file system view
    • supports Code Signing using AWS Signer, which offers trust and integrity controls that enable you to verify that only unaltered code from approved developers is deployed in the functions. 
  • Functions must complete execution within 900 seconds. The default timeout is 3 seconds, and the timeout can be set to any value between 1 and 900 seconds (see the sketch after this list).
  • AWS Step Functions can help coordinate a series of Lambda functions in a specific order. Multiple functions can be invoked sequentially, passing the output of one to the other, and/or in parallel, while the state is being maintained by Step Functions.
  • AWS X-Ray helps to trace functions, which provides insights such as service overhead, function init time, and function execution time.
  • Lambda Provisioned Concurrency provides greater control over the performance of serverless applications.
  • Lambda@Edge allows you to run code across AWS locations globally without provisioning or managing servers, responding to end-users at the lowest network latency.
  • Lambda Extensions allow integration of Lambda with other third-party tools for monitoring, observability, security, and governance.
  • Compute Savings Plan can help save money for Lambda executions.
  • CodePipeline and CodeDeploy can be used to automate the serverless application release process.
  • RDS Proxy provides a highly available database proxy that manages thousands of concurrent connections to relational databases.
  • Supports Elastic File System (EFS), to provide a shared, external, persistent, scalable volume using a fully managed elastic NFS file system without the need for provisioning or capacity management.
  • supports Function URLs, a built-in HTTPS endpoint that can be invoked using the browser, curl, and any HTTP client. 
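
As a rough illustration of the timeout and memory settings mentioned above, a minimal boto3 sketch for creating a function (function name, role ARN, and deployment package are hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# function.zip is assumed to contain app.py with a handler(event, context) function
with open("function.zip", "rb") as f:
    lambda_client.create_function(
        FunctionName="my-function",
        Runtime="python3.12",
        Role="arn:aws:iam::123456789012:role/my-lambda-execution-role",
        Handler="app.handler",
        Code={"ZipFile": f.read()},
        MemorySize=256,
        Timeout=30,   # seconds; default is 3, maximum is 900
    )
```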

Functions & Event Sources

  • Core components of Lambda are functions and event sources.
    • Event source – an AWS service or custom application that publishes events.
    • Function – a custom code that processes the events.

Lambda Functions

  • Each function has associated configuration information, such as its name, description, runtime, entry point, and resource requirements
  • Lambda functions should be designed as stateless
    • to allow launching of as many copies of the function as needed as per the demand.
    • Local file system access, child processes, and similar artifacts may not extend beyond the lifetime of the request
    • The state can be maintained externally in DynamoDB or S3 (see the sketch after this list)
  • Lambda Execution role can be assigned to the function to grant permission to access other resources.
  • Functions have the following restrictions
    • Inbound network connections are blocked
    • For outbound connections, only TCP/IP sockets are supported
    • ptrace (debugging) system calls are blocked
    • TCP port 25 traffic is also blocked as an anti-spam measure.
  • Lambda may choose to retain an instance of the function and reuse it to serve a subsequent request, rather than creating a new copy.
  • Lambda Layers provide a convenient way to package libraries and other dependencies that you can use with your Lambda functions.
  • Function versions can be used to manage the deployment of the functions.
  • Function Alias supports creating aliases, which are mutable, for each function version.
  • Functions are automatically monitored, and real-time metrics are reported through CloudWatch, including total requests, latency, error rates, etc.
  • Lambda automatically integrates with CloudWatch logs, creating a log group for each function and providing basic application lifecycle event log entries, including logging the resources consumed for each use of that function.
  • Functions support code written in
    • Node.js (JavaScript)
    • Python
    • Ruby
    • Java (Java 8 compatible)
    • C# (.NET Core)
    • Go
    • Custom runtime
  • Container images are also supported.
  • Failure Handling
    • For S3 bucket notifications and custom events, Lambda will attempt execution of the function three times in the event of an error condition in the code or if a service or resource limit is exceeded.
    • For ordered event sources that Lambda polls, e.g. DynamoDB Streams and Kinesis streams, it will continue attempting execution in the event of a developer code error until the data expires.
    • Kinesis and DynamoDB Streams retain data for a minimum of 24 hours
    • Dead Letter Queues (SNS or SQS) can be configured for events to be placed, once the retry policy for asynchronous invocations is exceeded
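
As a rough sketch of the stateless pattern described above, durable state is written to DynamoDB rather than kept in the execution environment (the table name and the SQS-style event shape are assumptions):

```python
import json
import boto3

# Clients and resources created outside the handler are reused across
# warm invocations; no request-specific state is kept at module scope.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")   # hypothetical table name

def handler(event, context):
    # Assumes an SQS-style event; all durable state goes to DynamoDB so
    # any copy of the function can process any request.
    records = event.get("Records", [])
    for record in records:
        table.put_item(Item={"id": record["messageId"], "body": record["body"]})
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```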

Read in-depth  @ Lambda Functions

Lambda Event Sources

  • Event Source is an AWS service or developer-created application that produces events that trigger an AWS Lambda function to run
  • Event source mapping refers to the configuration which maps an event source to a Lambda function.
  • Event sources can be both push and pull sources
    • Services like S3, and SNS publish events to Lambda by invoking the cloud function directly.
    • Lambda can also poll resources in services like Kafka, and Kinesis streams that do not publish events to Lambda.

Read in-depth @ Event Sources

Lambda Execution Environment

  • Lambda invokes the function in an execution environment, which provides a secure and isolated runtime environment.
  • Execution Context is a temporary runtime environment that initializes any external dependencies of the Lambda function code, e.g. database connections or HTTP endpoints.
  • When a function is invoked, the Execution environment is launched based on the provided configuration settings i.e. memory and execution time.
  • After a Lambda function is executed, Lambda maintains the execution environment for some time in anticipation of another function invocation, which allows it to reuse the /tmp directory and objects declared outside of the function’s handler method e.g. database connection (see the sketch after this list).
  • When a Lambda function is invoked for the first time or after it has been updated, there is bootstrapping latency; Lambda then tries to reuse the Execution Context for subsequent invocations of the Lambda function
  • Subsequent invocations perform better as there is no need to “cold-start” or initialize those external dependencies
  • Execution environment
    • takes care of provisioning and managing the resources needed to run the function.
    • provides lifecycle support for the function’s runtime and any external extensions associated with the function.
  • Function’s runtime communicates with Lambda using the Runtime API.
  • Extensions communicate with Lambda using the Extensions API.
  • Extensions can also receive log messages from the function by subscribing to logs using the Logs API.
  • Lambda manages Execution Environment creations and deletion, there is no AWS Lambda API to manage Execution Environment.
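
A rough sketch of how module scope and /tmp survive across warm invocations, as described above (the cached value is hypothetical):

```python
import os
import time

# Module scope runs once per execution environment ("cold start") and is
# then reused across warm invocations.
_initialized_at = time.time()
_invocation_count = 0

def handler(event, context):
    global _invocation_count
    _invocation_count += 1

    # /tmp is preserved while the same execution environment is reused,
    # so it can act as a small local cache between invocations.
    cache_file = "/tmp/cache.txt"
    if not os.path.exists(cache_file):
        with open(cache_file, "w") as f:
            f.write("expensive-to-compute value")   # hypothetical cached data

    with open(cache_file) as f:
        cached = f.read()

    return {
        "environment_started_at": _initialized_at,
        "invocations_in_this_environment": _invocation_count,
        "cached_value": cached,
    }
```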

Lambda in VPC

  • A Lambda function always runs inside a VPC owned by the Lambda service, which isn’t connected to your account’s default VPC
  • Lambda applies network access and security rules to this VPC and maintains and monitors the VPC automatically.
  • A function can be configured to be launched in private subnets in a VPC in your AWS account.
  • A function connected to a VPC can access private resources such as databases, cache instances, or internal services during execution.
  • To enable the function to access resources inside the private VPC, additional VPC-specific configuration information that includes private subnet IDs and security group IDs must be provided (see the sketch after this list).
  • Lambda uses this information to set up ENIs that enable the function to connect securely to other resources within the private VPC.
  • Functions connected to VPC can’t access the Internet and need a NAT Gateway to access any external resources outside of AWS.
  • Functions cannot connect directly to a VPC with dedicated instance tenancy, instead, peer it to a second VPC with default tenancy.
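
A rough boto3 sketch of attaching an existing function to private subnets, with hypothetical function, subnet, and security group identifiers:

```python
import boto3

lambda_client = boto3.client("lambda")

# Connect the function to private subnets in the account's VPC; the
# execution role must allow Lambda to create and manage the ENIs.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    VpcConfig={
        "SubnetIds": ["subnet-0abc1234", "subnet-0def5678"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
```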

Lambda Security

  • All data stored in ephemeral storage is encrypted at rest with a key managed by AWS.
  • Lambda functions provide access only to a single VPC. If multiple subnets are specified, they must all be in the same VPC. Other VPCs can be connected using VPC Peering.
  • Supports Code Signing using AWS Signer, which offers trust and integrity controls that enable you to verify that only unaltered code from approved developers is deployed in the functions. 
  • AWS Lambda can perform the following signature checks at deployment:
    • Corrupt signature – This occurs if the code artifact has been altered since signing.
    • Mismatched signature – This occurs if the code artifact is signed by a signing profile that is not approved.
    • Expired signature – This occurs if the signature is past the configured expiry date.
    • Revoked signature – This occurs if the signing profile owner revokes the signing jobs.
  • For sensitive information, for e.g. passwords, AWS recommends using client-side encryption using AWS Key Management Service – KMS and storing the resulting values as ciphertext in the environment variables.
  • Function code should include the logic to decrypt these values.
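
A rough sketch of that decrypt-at-startup pattern, assuming a hypothetical DB_PASSWORD environment variable holding base64-encoded KMS ciphertext and an execution role with kms:Decrypt permission:

```python
import base64
import os
import boto3

kms = boto3.client("kms")

# DB_PASSWORD is assumed to contain base64-encoded KMS ciphertext that was
# encrypted client-side before being stored as an environment variable.
_encrypted = os.environ["DB_PASSWORD"]
_db_password = kms.decrypt(
    CiphertextBlob=base64.b64decode(_encrypted)
)["Plaintext"].decode("utf-8")

def handler(event, context):
    # use _db_password to connect to the database ...
    return {"status": "ok"}
```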

Lambda Permissions

  • IAM – Use IAM to manage access to the Lambda API and resources like functions and layers.
  • Execution Role – A Lambda function can be provided with an Execution Role, that grants it permission to access AWS services and resources e.g. send logs to CloudWatch and upload trace data to AWS X-Ray.
  • Function Policy – Resource-based Policies
    • Use resource-based policies to give other accounts and AWS services permission to use the Lambda resources.
    • Resource-based permissions policies are supported for functions and layers.
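
A rough boto3 sketch of adding a resource-based policy statement to a function, with hypothetical names and ARNs:

```python
import boto3

lambda_client = boto3.client("lambda")

# Allow an S3 bucket in the same account to invoke the function
lambda_client.add_permission(
    FunctionName="my-function",
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::my-upload-bucket",
    SourceAccount="123456789012",
)
```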

Invoking Lambda Functions

  • Lambda functions can be invoked
    • directly using the Lambda console or API, a function URL HTTP(S) endpoint, an AWS SDK, the AWS CLI, and AWS toolkits.
    • by other AWS services like S3 and SNS, which invoke the function.
    • by Lambda itself, which reads from a stream or queue and invokes the function.
  • Functions can be invoked
    • Synchronously
      • You wait for the function to process the event and return a response.
      • Error handling and retries need to be handled by the Client.
      • Invocation includes API, and SDK for calls from API Gateway.
    • Asynchronously
      • queues the event for processing and returns a response immediately.
      • handles retries and can send invocation records to a destination for successful and failed events.
      • Invocation includes S3, SNS, and CloudWatch Events
      • can define a DLQ for handling failed events. AWS recommends using destinations instead of DLQs.
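
A rough boto3 sketch of both invocation types, using a hypothetical function name and payload:

```python
import json
import boto3

lambda_client = boto3.client("lambda")
payload = json.dumps({"orderId": "123"}).encode("utf-8")

# Synchronous invocation - the call blocks until the function returns
resp = lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="RequestResponse",
    Payload=payload,
)
print(resp["Payload"].read())

# Asynchronous invocation - Lambda queues the event and returns immediately
lambda_client.invoke(
    FunctionName="my-function",
    InvocationType="Event",
    Payload=payload,
)
```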

Lambda Provisioned Concurrency

  • Lambda Provisioned Concurrency provides greater control over the performance of serverless applications.
  • When enabled, Provisioned Concurrency keeps functions initialized and hyper-ready to respond in double-digit milliseconds.
  • Provisioned Concurrency is ideal for building latency-sensitive applications, such as web or mobile backends, synchronously invoked APIs, and interactive microservices.
  • The amount of provisioned concurrency can be increased during times of high demand and lowered or turned off completely when demand decreases.
  • If the concurrency of a function reaches the configured level, subsequent invocations of the function have the latency and scale characteristics of regular functions.
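
A rough boto3 sketch, assuming a hypothetical alias that points to a published version (provisioned concurrency cannot be configured on $LATEST):

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 50 execution environments initialized for the "live" alias
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",
    ProvisionedConcurrentExecutions=50,
)
```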

Lambda@Edge

Read in-depth @ Lambda@Edge

Lambda Extensions

  • Lambda Extensions allow integration of Lambda with other third-party tools for monitoring, observability, security, and governance.

Lambda Best Practices

  • Lambda function code should be stateless and ensure there is no affinity between the code and the underlying compute infrastructure.
  • Instantiate AWS clients outside the scope of the handler to take advantage of connection re-use.
  • Make sure you have set +rx permissions on your files in the uploaded ZIP to ensure Lambda can execute code on your behalf.
  • Lower costs and improve performance by minimizing the use of startup code not directly related to processing the current event.
  • Use the built-in CloudWatch monitoring of the Lambda functions to view and optimize request latencies.
  • Delete old Lambda functions that you are no longer using.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Your serverless architecture using AWS API Gateway, AWS Lambda, and AWS DynamoDB experienced a large increase in traffic to a sustained 400 requests per second, and dramatically increased in failure rates. Your requests, during normal operation, last 500 milliseconds on average. Your DynamoDB table did not exceed 50% of provisioned throughput, and Table primary keys are designed correctly. What is the most likely issue?
    1. Your API Gateway deployment is throttling your requests.
    2. Your AWS API Gateway Deployment is bottlenecking on request (de)serialization.
    3. You did not request a limit increase on concurrent Lambda function executions. (Refer link – AWS API Gateway by default throttles at 500 requests per second steady-state, and 1000 requests per second at spike. Lambda, by default, throttles at 100 concurrent requests for safety. At 500 milliseconds (half of a second) per request, you can expect to support 200 requests per second at 100 concurrency. This is less than the 400 requests per second your system now requires. Make a limit increase request via the AWS Support Console.)
    4. You used Consistent Read requests on DynamoDB and are experiencing semaphore lock.

AWS Lambda Event Source

AWS Lambda Event Source

  • Lambda Event Source is an AWS service or developer-created application that produces events that trigger an AWS Lambda function to run.
  • Event sources can be either AWS Services or Custom applications.
  • Event sources can be both push and pull sources
    • Services like S3, and SNS publish events to Lambda by invoking the cloud function directly.
    • Lambda can also poll resources in services like Kafka, and Kinesis streams that do not publish events to Lambda.
  • Events are passed to a Lambda function as an event input parameter. For batch event sources, such as Kinesis Streams, the event parameter may contain multiple events in a single call, based on the requested batch size

Lambda Event Source Mapping

  • Lambda Event source mapping refers to the configuration which maps an event source to a Lambda function.
  • Event source mapping
    • enables automatic invocation of the Lambda function when events occur.
    • identifies the type of events to publish and the Lambda function to invoke when events occur.

Lambda Event Sources Type

Push-based

  • also referred to as the Push model
  • includes services like S3, SNS, SES, etc.
  • Event source mapping maintained on the event source side
  • as the event sources invoke the Lambda function, a resource-based policy should be used to grant the event source the necessary permissions.

Pull-based

  • also referred to as the Pull model
  • covers mostly the Stream-based event sources like DynamoDB, Kinesis streams, MQ, SQS, Kafka
  • Event source mapping maintained on the Lambda side

Lambda Event Sources Invocation Model

Synchronously

  • You wait for the function to process the event and return a response.
  • Error handling and retries need to be handled by the Client.
  • Invocation includes API, and SDK for calls from API Gateway.

Asynchronously

  • queues the event for processing and returns a response immediately.
  • handles retries and can send invocation records to a destination for successful and failed events.
  • Invocation includes S3, SNS, and CloudWatch Events

Lambda Supported Event Sources

Multiple AWS services can be configured as event sources for AWS Lambda

Service – Method of invocation

  • Amazon Alexa – Event-driven; synchronous invocation
  • Amazon MSK (Managed Streaming for Apache Kafka) – Lambda polling
  • Self-managed Apache Kafka – Lambda polling
  • Amazon API Gateway – Event-driven; synchronous invocation
  • AWS CloudFormation – Event-driven; asynchronous invocation
  • Amazon CloudFront (Lambda@Edge) – Event-driven; synchronous invocation
  • Amazon EventBridge (CloudWatch Events) – Event-driven; asynchronous invocation
  • Amazon CloudWatch Logs – Event-driven; asynchronous invocation
  • AWS CodeCommit – Event-driven; asynchronous invocation
  • AWS CodePipeline – Event-driven; asynchronous invocation
  • Amazon Cognito – Event-driven; synchronous invocation
  • AWS Config – Event-driven; asynchronous invocation
  • Amazon Connect – Event-driven; synchronous invocation
  • Amazon DynamoDB – Lambda polling
  • Amazon Elastic File System – Special integration
  • Elastic Load Balancing (Application Load Balancer) – Event-driven; synchronous invocation
  • AWS IoT – Event-driven; asynchronous invocation
  • AWS IoT Events – Event-driven; asynchronous invocation
  • Amazon Kinesis – Lambda polling
  • Amazon Kinesis Data Firehose – Event-driven; synchronous invocation
  • Amazon Lex – Event-driven; synchronous invocation
  • Amazon MQ – Lambda polling
  • Amazon Simple Email Service – Event-driven; asynchronous invocation
  • Amazon Simple Notification Service – Event-driven; asynchronous invocation
  • Amazon Simple Queue Service – Lambda polling
  • Amazon S3 – Event-driven; asynchronous invocation
  • Amazon Simple Storage Service Batch – Event-driven; synchronous invocation
  • Secrets Manager – Event-driven; synchronous invocation
  • AWS X-Ray – Special integration

Amazon S3

  • S3 bucket events, such as the object-created or object-deleted events can be processed using Lambda functions for e.g., the Lambda function can be invoked when a user uploads a photo to a bucket to read the image and create a thumbnail.
  • S3 bucket notification configuration feature can be configured for the event source mapping, to identify the S3 bucket events and the Lambda function to invoke.
  • Error handling for an event source depends on how Lambda is invoked
  • S3 invokes your Lambda function asynchronously.
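
A rough boto3 sketch of the bucket notification configuration described above, with hypothetical bucket and function names (the function must already allow s3.amazonaws.com to invoke it via its resource-based policy):

```python
import boto3

s3 = boto3.client("s3")

# Invoke the thumbnail function whenever an object is created in the bucket
s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```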

DynamoDB

  • Lambda functions can be used as triggers for the DynamoDB table to take custom actions in response to updates made to the DynamoDB table.
  • Trigger can be created by
    • Enabling DynamoDB Streams for the table.
    • Lambda polls the stream and processes any updates published to the stream
  • DynamoDB is a stream-based event source and with stream-based service, the event source mapping is created in Lambda, identifying the stream to poll and which Lambda function to invoke.
  • Error handling for an event source depends on how Lambda is invoked

Kinesis Streams

  • AWS Lambda can be configured to automatically poll the Kinesis stream periodically (once per second) for new records.
  • Lambda can process any new records such as social media feeds, IT logs, website click streams, financial transactions, and location-tracking events
  • Kinesis Streams is a stream-based event source and with stream-based service, the event source mapping is created in Lambda, identifying the stream to poll and which Lambda function to invoke.
  • Error handling for an event source depends on how Lambda is invoked
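
A rough boto3 sketch of creating the event source mapping for a stream-based source, with a hypothetical stream ARN and function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Lambda polls the Kinesis stream and invokes the function with batches
# of records, starting from the newest records in the stream.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
    FunctionName="process-clicks",
    StartingPosition="LATEST",
    BatchSize=100,
)
```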

Simple Notification Service – SNS

  • SNS notifications can be processed using Lambda
  • When a message is published to an SNS topic, the service can invoke the Lambda function, passing the message payload as a parameter, which can then process the event
  • Lambda function can be triggered in response to CloudWatch alarms and other AWS services that use SNS.
  • SNS via topic subscription configuration feature can be used for the event source mapping, to identify the SNS topic and the Lambda function to invoke
  • Error handling for an event source depends on how Lambda is invoked
  • SNS invokes your Lambda function asynchronously.

Simple Email Service – SES

  • SES can be used to receive messages and can be configured to invoke Lambda function when messages arrive, by passing in the incoming email event as parameter
  • SES using the rule configuration feature can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • SES invokes your Lambda function asynchronously.

Amazon Cognito

  • Cognito Events feature enables Lambda function to run in response to events in Cognito for e.g. Lambda function can be invoked for the Sync Trigger events, that is published each time a dataset is synchronized.
  • Cognito event subscription configuration feature can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • Cognito is configured to invoke a Lambda function synchronously

CloudFormation

  • Lambda function can be specified as a custom resource to execute any custom commands as a part of deploying CloudFormation stacks and can be invoked whenever the stacks are created, updated, or deleted.
  • CloudFormation using stack definition can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • CloudFormation invokes the Lambda function asynchronously

CloudWatch Logs

  • Lambda functions can be used to perform custom analysis on CloudWatch Logs using CloudWatch Logs subscriptions.
  • CloudWatch Logs subscriptions provide access to a real-time feed of log events from CloudWatch Logs and deliver it to the AWS Lambda function for custom processing, analysis, or loading to other systems.
  • CloudWatch Logs using the log subscription configuration can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • CloudWatch Logs invokes the Lambda function asynchronously

CloudWatch Events

  • CloudWatch Events help respond to state changes in the AWS resources. When the resources change state, they automatically send events into an event stream.
  • Rules that match selected events in the stream can be created to route them to the Lambda function to take action for e.g., the Lambda function can be invoked to log the state of an EC2 instance or AutoScaling Group.
  • CloudWatch Events by using a rule target definition can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • CloudWatch Events invokes the Lambda function asynchronously

CodeCommit

  • Trigger can be created for a CodeCommit repository so that events in the repository will invoke a Lambda function for e.g., Lambda function can be invoked when a branch or tag is created or when a push is made to an existing branch.
  • CodeCommit by using a repository trigger can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • CodeCommit invokes the Lambda function asynchronously

Scheduled Events (powered by CloudWatch Events)

  • AWS Lambda can be invoked regularly on a scheduled basis using the schedule event capability in CloudWatch Events.
  • CloudWatch Events by using a rule target definition can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • CloudWatch Events invokes the Lambda function asynchronously

AWS Config

  • Lambda functions can be used to evaluate whether the AWS resource configurations comply with custom Config rules.
  • As resources are created, deleted, or changed, AWS Config records these changes and sends the information to the Lambda functions, which can then evaluate the changes and report results to AWS Config. AWS Config can be used to assess overall resource compliance
  • AWS Config by using a rule target definition can be used for the event source mapping
  • Error handling for an event source depends on how Lambda is invoked
  • AWS Config invokes the Lambda function asynchronously

Amazon API Gateway

  • Lambda function can be invoked over HTTPS by defining a custom REST API and endpoint using Amazon API Gateway.
  • Individual API operations, such as GET and PUT, can be mapped to specific Lambda functions.
  • When an HTTPS request to the API endpoint is received, the API Gateway service invokes the corresponding Lambda function.
  • Error handling for an event source depends on how Lambda is invoked.
  • API Gateway is configured to invoke a Lambda function synchronously.

Other Event Sources: Invoking a Lambda Function On Demand

  • Lambda functions can be invoked on-demand without the need to preconfigure any event source mapping in this case.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.

References

AWS_Lambda_Developer_Guide

AWS Elastic Beanstalk Deployment Strategies

Elastic Beanstalk Deployment Methods

AWS Elastic Beanstalk Deployment Strategies

  • Elastic Beanstalk supports environments such as
    • Single Instance environments, with a single EC2 instance and Auto Scaling configured to maintain a minimum and maximum of 1 instance
    • Load Balanced environments, with load balancing and Auto Scaling
  • Elastic Beanstalk allows multiple deployment options or strategies that can be selected depending upon the requirements for deployment time, downtime, DNS change, and rollback process.

Elastic Beanstalk Deployment Methods

All at Once Deployments

  • An Elastic Beanstalk environment uses all-at-once deployments by default if it is created with a client other than the console or EB CLI (API, SDK, or AWS CLI).
  • All at Once deployments perform an in-place deployment on all instances at the same time.
  • All at Once deployments are simple and fast, however, it would lead to downtime and the rollback would take time in case of any issues.

Rolling Deployments

  • An Elastic Beanstalk environment uses rolling deployments by default if it is created with the console or EB CLI.
  • Elastic Beanstalk splits the environment’s EC2 instances into batches and deploys the new version of the application on the existing instances one batch at a time, leaving the rest of the instances in the environment running the old version.
  • During a rolling deployment, part of the instances serves requests with the old version of the application, while instances in completed batches serve other requests with the new version.
  • Elastic Beanstalk performs the rolling deployments as
    • When processing a batch, detaches all instances in the batch from the load balancer, deploys the new application version, and then reattaches the instances.
    • To avoid any connection issues when the instances are detached, connection draining can be enabled on the load balancer
    • After reattaching the instances in a batch to the load balancer, ELB waits until they pass a minimum number of health checks (the Healthy check count threshold value), and then starts routing traffic to them.
    • Elastic Beanstalk waits until all instances in a batch are healthy before moving on to the next batch.
    • When all instances in the batch pass enough health checks to be considered healthy by ELB, the batch is complete.
    • If a batch of instances does not become healthy within the command timeout, the deployment fails.
    • If a deployment fails after one or more batches are completed successfully, the completed batches run the new version of the application while any pending batches continue to run the old version.
    • If the instances are terminated from the failed deployment, Elastic Beanstalk replaces them with instances running the application version from the most recent successful deployment.

Rolling with Additional Batch Deployments

  • Rolling with Additional Batch deployments is helpful when you need to maintain full capacity during deployments.
  • This deployment is similar to Rolling deployments, except they do not do an in-place deployment but a disposable one, launching a new batch of instances prior to taking any instances out of service
  • When the deployment completes, Elastic Beanstalk terminates the additional batch of instances.
  • Rolling with additional batch deployment does not impact the capacity and ensures full capacity during the deployment process.

Immutable Deployments

  • All at Once and Rolling deployment method updates existing instances.
  • If you need to ensure the application source is always deployed to new instances, instead of updating existing instances, the environment can be configured to use immutable updates for deployments.
  • Immutable updates are performed by launching a second Auto Scaling group in the environment, and the new version serves traffic alongside the old version until the new instances pass health checks.
  • Immutable deployments can prevent issues caused by partially completed rolling deployments. If the new instances don’t pass health checks, Elastic Beanstalk terminates them, leaving the original instances untouched.

Blue Green Deployments

  • Elastic Beanstalk performs an in-place update when application versions are updated, which may result in the application becoming unavailable to users for a short period of time.
  • Blue Green approach is suitable for deployments that depend on incompatible resource configuration changes or a new version that can’t run alongside the old version.
  • Elastic Beanstalk enables the Blue Green deployment through the Swap Environment URLs feature.
  • Blue Green deployment provides an almost zero downtime solution, where a new version is deployed to a separate environment, and then CNAMEs of the two environments are swapped to redirect traffic to the new version.
  • Blue/green deployments require that the environment runs independently of the production database, i.e. the database is not maintained by Elastic Beanstalk, if the application uses one. If the environment has an RDS DB instance attached to it, the data will not transfer over to the second environment and will be lost if the original environment is terminated.
  • Blue Green deployment entails a DNS change; hence, do not terminate the old environment until the DNS changes have been propagated and the old DNS records expire.
  • DNS servers do not necessarily clear old records from their cache based on the time to live (TTL) you set on the DNS records.
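
A Blue/Green cutover with the Swap Environment URLs feature can also be scripted; a minimal boto3 sketch, assuming hypothetical blue and green environment names, might look like:

```python
import boto3

eb = boto3.client("elasticbeanstalk")

# Swap the CNAMEs of the old (blue) and new (green) environments so that
# traffic is redirected to the environment running the new version.
eb.swap_environment_cnames(
    SourceEnvironmentName="myapp-blue",
    DestinationEnvironmentName="myapp-green",
)
```

The old environment should be kept running until the DNS change has fully propagated, as noted above.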

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. When thinking of AWS Elastic Beanstalk, the ‘Swap Environment URLs’ feature most directly aids in what? [CDOP]
    1. Immutable Rolling Deployments
    2. Mutable Rolling Deployments
    3. Canary Deployments
    4. Blue-Green Deployments (Simply upload the new version of your application and let your deployment service (AWS Elastic Beanstalk, AWS CloudFormation, or AWS OpsWorks) deploy a new version (green). To cut over to the new version, you simply replace the ELB URLs in your DNS records. Elastic Beanstalk has a Swap Environment URLs feature to facilitate a simpler cutover process.)
  2. You need to deploy a new version of your application. You’d prefer to use all new instances if possible, but you cannot have any downtime. You also don’t want to swap any environment URLs. You’re running t2.large instances and you normally need 15 instances to meet capacity. Which deployment method should you use? Choose the correct answer:
    1. Rolling Updates
    2. Blue/Green
    3. Immutable
    4. All at Once
  3. Your team is responsible for an AWS Elastic Beanstalk application. The business requires that you move to a continuous deployment model, releasing updates to the application multiple times per day with zero downtime. What should you do to enable this and still be able to roll back almost immediately in an emergency to the previous version? [CDOP]
    1. Enable rolling updates in the Elastic Beanstalk environment, setting an appropriate pause time for application startup.
    2. Create a second Elastic Beanstalk environment running the new application version, and swap the environment CNAMEs.
    3. Develop the application to poll for a new application version in your code repository; download and install to each running Elastic Beanstalk instance.
    4. Create a second Elastic Beanstalk environment with the new application version, and configure the old environment to redirect clients, using the HTTP 301 response code, to the new environment.

References

AWS Elastic Beanstalk Deployment Options