AWS Database Services Cheat Sheet

AWS Database Services Cheat Sheet

AWS Database Services

Relational Database Service – RDS

  • provides Relational Database service
  • supports MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, and the new, MySQL-compatible Amazon Aurora DB engine
  • as it is a managed service, shell (root ssh) access is not provided
  • manages backups, software patching, automatic failure detection, and recovery
  • supports use initiated manual backups and snapshots
  • daily automated backups with database transaction logs enables Point in Time recovery up to the last five minutes of database usage
  • snapshots are user-initiated storage volume snapshot of DB instance, backing up the entire DB instance and not just individual databases that can be restored as a independent RDS instance
  • RDS Security
    • support encryption at rest using KMS as well as encryption in transit using SSL endpoints
    • supports IAM database authentication, which prevents the need to store static user credentials in the database, because authentication is managed externally using IAM.
    • supports Encryption only during creation of an RDS DB instance
    • existing unencrypted DB cannot be encrypted and you need to create a  snapshot, created a encrypted copy of the snapshot and restore as encrypted DB
    • supports Secret Manager for storing and rotating secrets
    • for encrypted database
      • logs, snapshots, backups, read replicas are all encrypted as well
      • cross region replicas and snapshots does not work across region (Note – this is possible now with latest AWS enhancement)
  • Multi-AZ deployment
    • provides high availability and automatic failover support and is NOT a scaling solution
    • maintains a synchronous standby replica in a different AZ
    • transaction success is returned only if the commit is successful both on the primary and the standby DB
    • Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring
    • snapshots and backups are taken from standby & eliminate I/O freezes
    • during automatic failover, its seamless and RDS switches to the standby instance and updates the DNS record to point to standby
    • failover can be forced with the Reboot with failover option
  • Read Replicas
    • uses the PostgreSQL, MySQL, and MariaDB DB engines’ built-in replication functionality to create a separate Read Only instance
    • updates are asynchronously copied to the Read Replica, and data might be stale
    • can help scale applications and reduce read only load
    • requires automatic backups enabled
    • replicates all databases in the source DB instance
    • for disaster recovery, can be promoted to a full fledged database
    • can be created in a different region for disaster recovery, migration and low latency across regions
    • can’t create encrypted read replicas from unencrypted DB or read replica
  • RDS does not support all the features of underlying databases, and if required the database instance can be launched on an EC2 instance
  • RDS Components
    • DB parameter groups contains engine configuration values that can be applied to one or more DB instances of the same instance type for e.g. SSL, max connections etc.
    • Default DB parameter group cannot be modified, create a custom one and attach to the DB
    • Supports static and dynamic parameters
      • changes to dynamic parameters are applied immediately (irrespective of apply immediately setting)
      • changes to static parameters are NOT applied immediately and require a manual reboot.
  • RDS Monitoring & Notification
    • integrates with CloudWatch and CloudTrail
    • CloudWatch provides metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance
    • Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
    • supports RDS Event Notification which uses the SNS to provide notification when an RDS event like creation, deletion or snapshot creation etc occurs

Aurora

  • is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases
  • is a managed services and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair
  • is a proprietary technology from AWS (not open sourced)
  • provides PostgreSQL and MySQL compatibility
  • is “AWS cloud optimized” and claims 5x performance improvement
    over MySQL on RDS, over 3x the performance of PostgreSQL on RDS
  • scales storage automatically in increments of 10GB, up to 64 TB with no impact to database performance. Storage is striped across 100s of volumes.
  • no need to provision storage in advance.
  • provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.
  • provides instantaneous failover
  • replicates each chunk of my the database volume six ways across three Availability Zones i.e. 6 copies of the data across 3 AZ
    • requires 4 copies out of 6 needed for writes
    • requires 3 copies out of 6 need for reads
  • costs more than RDS (20% more) – but is more efficient
  • Read Replicas
    • can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
    • share the same data volume as the primary instance in the same AWS Region, there is virtually no replication lag
    • supports Automated failover for master in less than 30 seconds
    • supports Cross Region Replication using either physical or logical replication.
  • Security
    • supports Encryption at rest using KMS
    • supports Encryption in flight using SSL (same process as MySQL or Postgres)
    • Automated backups, snapshots and replicas are also encrypted
    • Possibility to authenticate using IAM token (same method as RDS)
    • supports protecting the instance with security groups
    • does not support SSH access to the underlying servers
  • Aurora Serverless
    • provides automated database Client  instantiation and on-demand  autoscaling based on actual usage
    • provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
    • automatically starts up, shuts down, and scales capacity up or down based on the application’s needs. No capacity planning needed
    • Pay per second, can be more cost-effective
  • Aurora Global Database
    • allows a single Aurora database to span multiple AWS regions.
    • provides Physical replication, which uses dedicated infrastructure that leaves the databases entirely available to serve the application
    • supports 1 Primary Region (read / write)
    • replicates across up to 5 secondary (read-only) regions, replication lag is less than 1 second
    • supports up to 16 Read Replicas per secondary region
    • recommended for low-latency global reads and disaster recovery with an RTO of < 1 minute
    • failover is not automated and if the primary region becomes unavailable, a secondary region can be manually removed from an Aurora Global Database and promote it to take full reads and writes. Application needs to be updated to point to the newly promoted region.
  • Aurora Backtrack
    • Backtracking “rewinds” the DB cluster to the specified time
    • Backtracking performs in place restore and does not create a new instance. There is a minimal downtime associated with it.
  • Aurora Clone feature allows quick and cost-effective creation of Aurora Cluster duplicates
  • supports parallel or distributed query using Aurora Parallel Query, which refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.

DynamoDB

  • fully managed NoSQL database service
  • synchronously replicates data across three facilities in an AWS Region, giving high availability and data durability
  • runs exclusively on SSDs to provide high I/O performance
  • provides provisioned table reads and writes
  • automatically partitions, reallocates, and re-partitions the data and provisions additional server capacity as data or throughput changes
  • creates and maintains indexes for the primary key attributes for efficient access to data in the table
  • DynamoDB Table classes currently support
    • DynamoDB Standard table class is the default and is recommended for the vast majority of workloads.
    • DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA) table class which is optimized for tables where storage is the dominant cost.
  • supports Secondary Indexes
    • allows querying attributes other than the primary key attributes without impacting performance.
    • are automatically maintained as sparse objects
  • Local secondary index vs Global secondary index
    • shares partition key + different sort key vs different partition + sort key
    • search limited to partition vs across all partition
    • unique attributes vs non-unique attributes
    • linked to the base table vs independent separate index
    • only created during the base table creation vs can be created later
    • cannot be deleted after creation vs can be deleted
    • consumes provisioned throughput capacity of the base table vs independent throughput
    • returns all attributes for item vs only projected attributes
    • Eventually or Strongly vs Only Eventually consistent reads
    • size limited to 10Gb per partition vs unlimited
  • DynamoDB Consistency
    • provides Eventually consistent (by default) or Strongly Consistent option to be specified during a read operation
    • supports Strongly consistent reads for a few operations like Query, GetItem, and BatchGetItem using the ConsistentRead parameter
  • DynamoDB Throughput Capacity
    • supports On-demand and Provisioned read/write capacity modes
    • Provisioned mode requires the number of reads and writes per second as required by the application to be specified
    • On-demand mode provides flexible billing option capable of serving thousands of requests per second without capacity planning
  • DynamoDB Auto Scaling helps dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
  • DynamoDB Adaptive capacity is a feature that enables DynamoDB to run imbalanced workloads indefinitely.
  • DynamoDB Global Tables provide multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table
  • DynamoDB Time to Live (TTL)
    • enables a per-item timestamp to determine when an item expiry
    • expired items are deleted from the table without consuming any write throughput.
  • DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
  • DynamoDB cross-region replication
    • allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
    • using DynamoDB streams which leverages Kinesis and provides time-ordered sequence of item-level changes and can help for lower RPO, lower RTO disaster recovery
  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.

ElastiCache

  • managed web service that provides in-memory caching to deploy and run Memcached or Redis protocol-compliant cache clusters
  • ElastiCache with Redis,
    • like RDS, supports Multi-AZ, Read Replicas and Snapshots
    • Read Replicas are created across AZ within same region using Redis’s asynchronous replication technology
    • Multi-AZ differs from RDS as there is no standby, but if the primary goes down a Read Replica is promoted as primary
    • Read Replicas cannot span across regions, as RDS supports
    • cannot be scaled out and if scaled up cannot be scaled down
    • allows snapshots for backup and restore
    • AOF can be enabled for recovery scenarios, to recover the data in case the node fails or service crashes. But it does not help in case the underlying hardware fails
    • Enabling Redis Multi-AZ as a Better Approach to Fault Tolerance
  • ElastiCache with Memcached
    • can be scaled up by increasing size and scaled out by adding nodes
    • nodes can span across multiple AZs within the same region
    • cached data is spread across the nodes, and a node failure will always result in some data loss from the cluster
    • supports auto discovery
    • every node should be homogenous and of same instance type
  • ElastiCache Redis vs Memcached
    • complex data objects vs simple key value storage
    • persistent vs non persistent, pure caching
    • automatic failover with Multi-AZ vs Multi-AZ not supported
    • scaling using Read Replicas vs using multiple nodes
    • backup & restore supported vs not supported
  • can be used state management to keep the web application stateless

Redshift

  • fully managed, fast and powerful, petabyte scale data warehouse service
  • uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from node and component failures
  • provides Massive Parallel Processing (MPP) by distributing & parallelizing queries across multiple physical resources
  • columnar data storage improving query performance and allowing advance compression techniques
  • only supports Single-AZ deployments and the nodes are available within the same AZ, if the AZ supports Redshift clusters
  • spot instances are NOT an option

AWS IAM Roles vs Resource Based Policies

AWS IAM Roles vs Resource-Based Policies

AWS allows granting cross-account access to AWS resources, which can be done using IAM Roles or Resource-Based Policies.

IAM Roles

  • Roles can be created to act as a proxy to allow users or services to access resources.
  • Roles support
    • trust policy which helps determine who can access the resources and
    • permission policy which helps to determine what they can access.
  • Users who assume a role temporarily give up their own permissions and instead take on the permissions of the role. The original user permissions are restored when the user exits or stops using the role.
  • Roles can be used to provide access to almost all the AWS resources.
  • Permissions provided to the User through the Role can be further restricted per user by passing an optional policy to the STS request. This policy cannot be used to elevate privileges beyond what the assumed role is allowed to access

Resource-based Policies

  • Resource-based policy allows you to attach a policy directly to the resource you want to share, instead of using a role as a proxy.
  • Resource-based policy specifies the Principal, in the form of a list of AWS account ID numbers, can access that resource and what they can access.
  • Using cross-account access with a resource-based policy, the User still works in the trusted account and does not have to give up their permissions in place of the role permissions.
  • Users can work on the resources from both accounts at the same time and this can be useful for scenarios e.g. copying objects from one bucket to the other bucket in a different AWS account.
  • Resources that you want to share are limited to resources that support resource-based policies
  • Resource-based policies need the trusted account to create users with permissions to be able to access the resources from the trusted account.
  • Only permissions equivalent to, or less than, the permissions granted to your account by the resource owning account can be delegated.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What are the two permission types used by AWS?
    1. Resource-based and Product-based
    2. Product-based and Service-based
    3. Service-based
    4. User-based and Resource-based
  2. What’s the policy used for cross-account access? (Choose 2)
    1. Trust policy
    2. Permissions Policy
    3. Key policy

References

AWS Simple Notification Service – SNS

SNS Delivery Protocols

Simple Notification Service – SNS

  • Simple Notification Service – SNS is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients.
  • SNS provides the ability to create a Topic which is a logical access point and communication channel.
  • Each topic has a unique name that identifies the SNS endpoint for publishers to post messages and subscribers to register for notifications.
  • Producers and Consumers communicate asynchronously with subscribers by producing and sending a message on a topic.
  • Producers push messages to the topic, they created or have access to, and SNS matches the topic to a list of subscribers who have subscribed to that topic and delivers the message to each of those subscribers.
  • Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.
  • Subscribers (i.e., web servers, email addresses, SQS queues, AWS Lambda functions) consume or receive the message or notification over one of the supported protocols (i.e., SQS, HTTP/S, email, SMS, Lambda) when they are subscribed to the topic.

SNS Delivery Protocols

Accessing SNS

  • Amazon Management console
    • Amazon Management console is the web-based user interface that can be used to manage SNS
  • AWS Command-line Interface (CLI)
    • Provides commands for a broad set of AWS products, and is supported on Windows, Mac, and Linux.
  • AWS Tools for Windows Powershell
    • Provides commands for a broad set of AWS products for those who script in the PowerShell environment
  • AWS SNS Query API
    • Query API allows for requests are HTTP or HTTPS requests that use the HTTP verbs GET or POST and a Query parameter named Action
  • AWS SDK libraries
    • AWS provides libraries in various languages which provide basic functions that automate tasks such as cryptographically signing your requests, retrying requests, and handling error responses

SNS Supported Transport Protocols

  • HTTP, HTTPS – Subscribers specify a URL as part of the subscription registration; notifications will be delivered through an HTTP POST to the specified URL.
  • Email, Email-JSON – Messages are sent to registered addresses as email. Email-JSON sends notifications as a JSON object, while Email sends text-based email.
  • SQS – Users can specify an SQS queue as the endpoint; SNS will enqueue a notification message to the specified queue (which subscribers can then process using SQS APIs such as ReceiveMessage, DeleteMessage, etc.)
  • SMS – Messages are sent to registered phone numbers as SMS text messages

SNS Supported Endpoints

  • Email Notifications
    • SNS provides the ability to send Email notifications
  • Mobile Push Notifications
    • SNS provides an ability to send push notification messages directly to apps on mobile devices. Push notification messages sent to a mobile endpoint can appear in the mobile app as message alerts, badge updates, or even sound alerts
    • Supported push notification services
      • Amazon Device Messaging (ADM)
      • Apple Push Notification Service (APNS)
      • Google Cloud Messaging (GCM)
      • Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
      • Microsoft Push Notification Service (MPNS) for Windows Phone 7+
      • Baidu Cloud Push for Android devices in China
  • SQS Queues
    • SNS with SQS provides the ability for messages to be delivered to applications that require immediate notification of an event, and also persist in an SQS queue for other applications to process at a later time
    • SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to periodically check or “poll” for updates.
    • SQS can be used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components, without requiring each component to be concurrently available.
  • SMS Notifications
    • SNS provides the ability to send and receive Short Message Service (SMS) notifications to SMS-enabled mobile phones and smart phones
  • HTTP/HTTPS Endpoints
    • SNS provides the ability to send notification messages to one or more HTTP or HTTPS endpoints.When you subscribe an endpoint to a topic, you can publish a notification to the topic and Amazon SNS sends an HTTP POST request delivering the contents of the notification to the subscribed endpoint
  • Lambda
    • SNS and Lambda are integrated so Lambda functions can be invoked with SNS notifications.
    • When a message is published to an SNS topic that has a Lambda function subscribed to it, the Lambda function is invoked with the payload of the published message
  • Kinesis Data Firehose
    • Deliver events to delivery streams for archiving and analysis purposes.
    • Through delivery streams, events can be delivered to AWS destinations like S3, Redshift, and OpenSearch Service, or to third-party destinations such as Datadog, New Relic, MongoDB, and Splunk.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which of the following notification endpoints or clients does Amazon Simple Notification Service support? Choose 2 answers
    1. Email
    2. CloudFront distribution
    3. File Transfer Protocol
    4. Short Message Service
    5. Simple Network Management Protocol
  2. What happens when you create a topic on Amazon SNS?
    1. The topic is created, and it has the name you specified for it.
    2. An ARN (Amazon Resource Name) is created
    3. You can create a topic on Amazon SQS, not on Amazon SNS.
    4. This question doesn’t make sense.
  3. A user has deployed an application on his private cloud. The user is using his own monitoring tool. He wants to configure that whenever there is an error, the monitoring tool should notify him via SMS. Which of the below mentioned AWS services will help in this scenario?
    1. None because the user infrastructure is in the private cloud/
    2. AWS SNS
    3. AWS SES
    4. AWS SMS
  4. A user wants to make so that whenever the CPU utilization of the AWS EC2 instance is above 90%, the redlight of his bedroom turns on. Which of the below mentioned AWS services is helpful for this purpose?
    1. AWS CloudWatch + AWS SES
    2. AWS CloudWatch + AWS SNS
    3. It is not possible to configure the light with the AWS infrastructure services
    4. AWS CloudWatch and a dedicated software turning on the light
  5. A user is trying to understand AWS SNS. To which of the below mentioned end points is SNS unable to send a notification?
    1. Email JSON
    2. HTTP
    3. AWS SQS
    4. AWS SES
  6. A user is running a webserver on EC2. The user wants to receive the SMS when the EC2 instance utilization is above the threshold limit. Which AWS services should the user configure in this case?
    1. AWS CloudWatch + AWS SES
    2. AWS CloudWatch + AWS SNS
    3. AWS CloudWatch + AWS SQS
    4. AWS EC2 + AWS CloudWatch
  7. A user is planning to host a mobile game on EC2 which sends notifications to active users on either high score or the addition of new features. The user should get this notification when he is online on his mobile device. Which of the below mentioned AWS services can help achieve this functionality?
    1. AWS Simple Notification Service
    2. AWS Simple Queue Service
    3. AWS Mobile Communication Service
    4. AWS Simple Email Service
  8. You are providing AWS consulting service for a company developing a new mobile application that will be leveraging amazon SNS push for push notifications. In order to send direct notification messages to individual devices each device registration identifier or token needs to be registered with SNS, however the developers are not sure of the best way to do this. You advise them to: –
    1. Bulk upload the device tokens contained in a CSV file via the AWS Management Console
    2. Let the push notification service (e.g. Amazon Device messaging) handle the registration
    3. Implement a token vending service to handle the registration
    4. Call the CreatePlatformEndpoint API function to register multiple device tokens. (Refer documentation)
  9. A company is running a batch analysis every hour on their main transactional DB running on an RDS MySQL instance to populate their central Data Warehouse running on Redshift. During the execution of the batch their transactional applications are very slow. When the batch completes they need to update the top management dashboard with the new data. The dashboard is produced by another system running on-premises that is currently started when a manually-sent email notifies that an update is required The on-premises system cannot be modified because is managed by another team. How would you optimize this scenario to solve performance issues and automate the process as much as possible?
    1. Replace RDS with Redshift for the batch analysis and SNS to notify the on-premises system to update the dashboard
    2. Replace RDS with Redshift for the batch analysis and SQS to send a message to the on-premises system to update the dashboard
    3. Create an RDS Read Replica for the batch analysis and SNS to notify me on-premises system to update the dashboard
    4. Create an RDS Read Replica for the batch analysis and SQS to send a message to the on-premises system to update the dashboard.
  10. Which of the following are valid SNS delivery transports? Choose 2 answers.
    1. HTTP
    2. UDP
    3. SMS
    4. DynamoDB
    5. Named Pipes
  11. What is the format of structured notification messages sent by Amazon SNS?
    1. An XML object containing MessageId, UnsubscribeURL, Subject, Message and other values
    2. An JSON object containing MessageId, DuplicateFlag, Message and other values
    3. An XML object containing MessageId, DuplicateFlag, Message and other values
    4. An JSON object containing MessageId, unsubscribeURL, Subject, Message and other values
  12. Which of the following are valid arguments for an SNS Publish request? Choose 3 answers.
    1. TopicAm
    2. Subject
    3. Destination
    4. Format
    5. Message
    6. Language

References

AWS EBS Performance

AWS EBS Performance Tips

  • EBS Performance depends on several factors including I/O characteristics,  instances and volumes configuration and can be improved using PIOPS, EBS-Optimized instances, Pre-Warming, and RAIDed configuration.

EBS-Optimized or 10 Gigabit Network Instances

  • An EBS-Optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for EBS I/O.
  • Optimization provides the best performance for the EBS volumes by minimizing contention between EBS I/O and other traffic from an instance
  • EBS-Optimized instances deliver dedicated throughput to EBS depending on the instance type used.
  • Not all instance types support EBS-Optimization
  • Some Instance types enable EBS-Optimization by default, while it can be enabled for some.
  • EBS optimization enabled for an instance, that is not EBS-Optimized by default, an additional low, hourly fee for the dedicated capacity is charged
  • When attached to an EBS–optimized instance,
    • General Purpose (SSD) volumes are designed to deliver within 10% of their baseline and burst performance 99.9% of the time in a given year
    • Provisioned IOPS (SSD) volumes are designed to deliver within 10% of their provisioned performance 99.9 percent of the time in a given year.

EBS Volume Initialization – Pre-warming

  • Empty EBS volumes receive their maximum performance the moment that they are available and DO NOT require initialization (pre-warming).
  • EBS volumes needed a pre-warming, previously, before being used to get maximum performance to start with. Pre-warming of the volume was possible by writing to the entire volume with 0 for new volumes or reading the entire volume for volumes from snapshots.
  • Storage blocks on volumes that were restored from snapshots must be initialized (pulled down from S3 and written to the volume) before the block can be accessed.
  • This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed.
  • To avoid this initial performance hit in a production environment, the following options can be used
    • Force the immediate initialization of the entire volume by using the dd or fio utilities to read from all of the blocks on a volume.
    • Enable fast snapshot restore – FSR on a snapshot to ensure that the EBS volumes created from it are fully-initialized at creation and instantly deliver all of their provisioned performance.

RAID Configuration

  • EBS volumes can be striped, if a single EBS volume does not meet the performance and more is required.
  • Striping volumes allows pushing tens of thousands of IOPS.
  • EBS volumes are already replicated across multiple servers in an AZ for availability and durability, so AWS generally recommend striping for performance rather than durability.
  • For greater I/O performance than can be achieved with a single volume, RAID 0 can stripe multiple volumes together; for on-instance redundancy, RAID 1 can mirror two volumes together.
  • RAID 0 allows I/O distribution across all volumes in a stripe, allowing straight gains with each addition.
  • RAID 1 can be used for durability to mirror volumes, but in this case, it requires more EC2 to EBS bandwidth as the data is written to multiple volumes simultaneously and should be used with EBS–optimization.
  • EBS volume data is replicated across multiple servers in an AZ to prevent the loss of data from the failure of any single component
  • AWS doesn’t recommend RAID 5 and 6 because the parity write operations of these modes consume the IOPS available for the volumes and can result in 20-30% fewer usable IOPS than RAID 0.
  • A 2-volume RAID 0 config can outperform a 4-volume RAID 6 that costs twice as much.

RAID Configuration

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A user is trying to pre-warm a blank EBS volume attached to a Linux instance. Which of the below mentioned steps should be performed by the user?
    1. There is no need to pre-warm an EBS volume (with latest update no pre-warming is needed)
    2. Contact AWS support to pre-warm (This used to be the case before, but pre warming is not necessary now)
    3. Unmount the volume before pre-warming
    4. Format the device
  2. A user has created an EBS volume of 10 GB and attached it to a running instance. The user is trying to access EBS for first time. Which of the below mentioned options is the correct statement with respect to a first time EBS access?
    1. The volume will show a size of 8 GB
    2. The volume will show a loss of the IOPS performance the first time (the volume needed to be wiped cleaned before for new volumes, however pre warming is not needed any more)
    3. The volume will be blank
    4. If the EBS is mounted it will ask the user to create a file system
  3. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence At times throughout the day, you are seeing large variance in the response times of the database queries Looking into the instance with the isolate command you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  4. You have launched an EC2 instance with four (4) 500 GB EBS Provisioned IOPS volumes attached. The EC2 Instance is EBS-Optimized and supports 500 Mbps throughput between EC2 and EBS. The two EBS volumes are configured as a single RAID 0 device, and each Provisioned IOPS volume is provisioned with 4,000 IOPS (4000 16KB reads or writes) for a total of 16,000 random IOPS on the instance. The EC2 Instance initially delivers the expected 16,000 IOPS random read and write performance. Sometime later in order to increase the total random I/O performance of the instance, you add an additional two 500 GB EBS Provisioned IOPS volumes to the RAID. Each volume is provisioned to 4,000 IOPS like the original four for a total of 24,000 IOPS on the EC2 instance Monitoring shows that the EC2 instance CPU utilization increased from 50% to 70%, but the total random IOPS measured at the instance level does not increase at all. What is the problem and a valid solution?
    1. Larger storage volumes support higher Provisioned IOPS rates: increase the provisioned volume storage of each of the 6 EBS volumes to 1TB.
    2. EBS-Optimized throughput limits the total IOPS that can be utilized use an EBS-Optimized instance that provides larger throughput. (EC2 Instance types have limit on max throughput and would require larger instance types to provide 24000 IOPS)
    3. Small block sizes cause performance degradation, limiting the I’O throughput, configure the instance device driver and file system to use 64KB blocks to increase throughput.
    4. RAID 0 only scales linearly to about 4 devices, use RAID 0 with 4 EBS Provisioned IOPS volumes but increase each Provisioned IOPS EBS volume to 6.000 IOPS.
    5. The standard EBS instance root volume limits the total IOPS rate, change the instant root volume to also be a 500GB 4,000 Provisioned IOPS volume
  5. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS provisioned with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS

AWS Identity Services Cheat Sheet

AWS Identity Services Cheat Sheet

AWS Identity and Security Services

IAM – Identity & Access Management

  • securely control access to AWS services and resources
  • helps create and manage user identities and grant permissions for those users to access AWS resources
  • helps create groups for multiple users with similar permissions
  • not appropriate for application authentication
  • is Global and does not need to be migrated to a different region
  • helps define Policies,
    • in JSON format
    • all permissions are implicitly denied by default
    • most restrictive policy wins
  • IAM Role
    • helps grants and delegate access to users and services without the need of creating permanent credentials
    • IAM users or AWS services can assume a role to obtain temporary security credentials that can be used to make AWS API calls
    • needs Trust policy to define who and Permission policy to define what the user or service can access
    • used with Security Token Service (STS), a lightweight web service that provides temporary, limited privilege credentials for IAM users or for authenticated federated users
    • IAM role scenarios
      • Service access for e.g. EC2 to access S3 or DynamoDB
      • Cross Account access for users
        • with user within the same account
        • with user within an AWS account owned the same owner
        • with user from a Third Party AWS account with External ID for enhanced security
      • Identity Providers & Federation
        • AssumeRoleWithWebIdentity – Web Identity Federation, where the user can be authenticated using external authentication Identity providers like Amazon, Google or any OpenId IdP
        • AssumeRoleWithSAML – Identity Provider using SAML 2.0, where the user can be authenticated using on premises Active Directory, Open Ldap or any SAML 2.0 compliant IdP
        • AssumeRole (recommended) or GetFederationToken – For other Identity Providers, use Identity Broker to authenticate and provide temporary Credentials
  • IAM Best Practices
    • Do not use Root account for anything other than billing
    • Create Individual IAM users
    • Use groups to assign permissions to IAM users
    • Grant least privilege
    • Use IAM roles for applications on EC2
    • Delegate using roles instead of sharing credentials
    • Rotate credentials regularly
    • Use Policy conditions for increased granularity
    • Use CloudTrail to keep a history of activity
    • Enforce a strong IAM password policy for IAM users
    • Remove all unused users and credentials

AWS Organizations

  • is an account management service that enables consolidating multiple AWS accounts into an organization that can be centrally managed.
  • include consolidated billing and account management capabilities that enable one to better meet the budgetary, security, and compliance needs of your business.
  • As an administrator of an organization, new accounts can be created in an organization and invite existing accounts to join the organization.
  • enables you to
    • Automate AWS account creation and management, and provision resources with AWS CloudFormation Stacksets.
    • Maintain a secure environment with policies and management of AWS security services
    • Govern access to AWS services, resources, and regions
    • Centrally manage policies across multiple AWS accounts
    • Audit your environment for compliance 
    • View and manage costs with consolidated billing 
    • Configure AWS services across multiple accounts 
  • supports Service Control Policies – SCPs
  • offer central control over the maximum available permissions for all of the accounts in your organization, ensuring member accounts stay within the organization’s access control guidelines.
  • are one type of policy that help manage the organization.
  • are available only in an organization that has all features enabled, and aren’t available if the organization has enabled only the consolidated billing features.
  • are NOT sufficient for granting access to the accounts in the organization.
  • defines a guardrail for what actions accounts within the organization root or OU can do, but IAM policies need to be attached to the users and roles in the organization’s accounts to grant permissions to them.
  • Effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
  • with an SCP attached to member accounts, identity-based and resource-based policies grant permissions to entities only if those policies and the SCP allow the action
  • don’t affect users or roles in the management account. They affect only the member accounts in your organization.

AWS Directory Services

  • gives applications in AWS access to Active Directory services
  • different from SAML + AD, where the access is granted to AWS services through Temporary Credentials
  • Simple AD
    • least expensive but does not support Microsoft AD advanced features
    • provides a Samba 4 Microsoft Active Directory compatible standalone directory service on AWS
    • No single point of Authentication or Authorization, as a separate copy is maintained
    • trust relationships cannot be setup between Simple AD and other Active Directory domains
    • Don’t use it, if the requirement is to leverage access and control through centralized authentication service
  • AD Connector
    • acts just as an hosted proxy service for instances in AWS to connect to on-premises Active Directory
    • enables consistent enforcement of existing security policies, such as password expiration, password history, and account lockouts, whether users are accessing resources on-premises or in the AWS cloud
    • needs VPN connectivity (or Direct Connect)
    • integrates with existing RADIUS-based MFA solutions to enabled multi-factor authentication
    • does not cache data which might lead to latency
  • Read-only Domain Controllers (RODCs)
    • works out as a Read-only Active Directory
    • holds a copy of the Active Directory Domain Service (AD DS) database and respond to authentication requests
    • they cannot be written to and are typically deployed in locations where physical security cannot be guaranteed
    • helps maintain a single point to authentication & authorization controls, however needs to be synced
  • Writable Domain Controllers
    • are expensive to setup
    • operate in a multi-master model; changes can be made on any writable server in the forest, and those changes are replicated to servers throughout the entire forest

AWS Single Sign-On SSO

  • is a cloud-based single sign-on (SSO) service that makes it easy to centrally manage SSO access to all of the AWS accounts and cloud applications.
  • helps manage access and permissions to commonly used third-party software as a service (SaaS) applications, AWS SSO-integrated applications as well as custom applications that support SAML 2.0.
  • includes a user portal where the end-users can find and access all their assigned AWS accounts, cloud applications, and custom applications in one place.

Amazon Cognito

  • Amazon Cognito provides authentication, authorization, and user management for the web and mobile apps.
  • Users can sign in directly with a username and password, or through a third party such as Facebook, Amazon, Google, or Apple.
  • Cognito has two main components.
    • User pools are user directories that provide sign-up and sign-in options for the app users.
    • Identity pools enable you to grant the users access to other AWS services.
  • Cognito Sync helps synchronize data across a user’s devices so that their app experience remains consistent when they switch between devices or upgrade to a new device.

AWS Security Services Cheat Sheet

AWS Identity and Security Services

AWS Security Services Cheat Sheet

AWS Identity and Security Services

Key Management Service – KMS

  • is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
  • provides a highly available key storage, management, and auditing solution to encrypt the data across AWS services & within applications.
  • uses hardware security modules (HSMs) to protect and validate the KMS keys by the FIPS 140-2 Cryptographic Module Validation Program.
  • seamlessly integrates with several AWS services to make encrypting data in those services easy.
  • supports multi-region keys, which are AWS KMS keys in different AWS Regions. Multi-Region keys are not global and each multi-region key needs to be replicated and managed independently.

CloudHSM

  • provides secure cryptographic key storage to customers by making hardware security modules (HSMs) available in the AWS cloud
  • helps manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs.
  • single tenant, dedicated physical device to securely generate, store, and manage cryptographic keys used for data encryption
  • are inside the VPC (not EC2-classic) & isolated from the rest of the network
  • can use VPC peering to connect to CloudHSM from multiple VPCs
  • integrated with Amazon Redshift and Amazon RDS for Oracle
  • EBS volume encryption, S3 object encryption and key management can be done with CloudHSM but requires custom application scripting
  • is NOT fault-tolerant and would need to build a cluster as if one fails all the keys are lost
  • enables quick scaling by adding and removing HSM capacity on-demand, with no up-front costs.
  • automatically load balance requests and securely duplicates keys stored in any HSM to all of the other HSMs in the cluster.
  • expensive, prefer AWS Key Management Service (KMS) if cost is a criteria.

AWS WAF

  • is a web application firewall that helps monitor the HTTP/HTTPS traffic and allows controlling access to the content.
  • helps protect web applications from attacks by allowing rules configuration that allow, block, or monitor (count) web requests based on defined conditions. These conditions include IP addresses, HTTP headers, HTTP body, URI strings, SQL injection and cross-site scripting.
  • helps define Web ACLs, which is a combination of Rules that is a combinations of Conditions and Action to block or allow
  • integrated with CloudFront, Application Load Balancer (ALB), API Gateway services commonly used to deliver content and applications
  • supports custom origins outside of AWS, when integrated with CloudFront

AWS Secrets Manager

  • helps protect secrets needed to access applications, services, and IT resources.
  • enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.
  • secure secrets by encrypting them with encryption keys managed using AWS KMS.
  • offers native secret rotation with built-in integration for RDS, Redshift, and DocumentDB.
  • supports Lambda functions to extend secret rotation to other types of secrets, including API keys and OAuth tokens.
  • supports IAM and resource-based policies for fine-grained access control to secrets and centralized secret rotation audit for resources in the AWS Cloud, third-party services, and on-premises.
  • enables secret replication in multiple AWS regions to support multi-region applications and disaster recovery scenarios.
  • supports private access using VPC Interface endpoints

AWS Shield

  • is a managed service that provides protection against Distributed Denial of Service (DDoS) attacks for applications running on AWS
  • provides protection for all AWS customers against common and most frequently occurring infrastructure (layer 3 and 4) attacks like SYN/UDP floods, reflection attacks, and others to support high availability of applications on AWS.
  • provides AWS Shield Advanced with additional protections against more sophisticated and larger attacks for applications running on EC2, ELB, CloudFront, AWS Global Accelerator, and Route 53.

AWS GuardDuty

  • offers threat detection that enables continuous monitoring and protects the AWS accounts and workloads.
  • is a Regional service
  • analyzes continuous streams of meta-data generated from AWS accounts and network activity found in AWS CloudTrail Events, EKS audit logs, VPC Flow Logs, and DNS Logs.
  • integrated threat intelligence
  • combines machine learning, anomaly detection, network monitoring, and malicious file discovery, utilizing both AWS-developed and industry-leading third-party sources to help protect workloads and data on AWS
  • supports suppression rules, trusted IP lists, and thread lists.
  • provides Malware Protection to detect malicious files on EBS volumes
  • operates completely independently from the resources so there is no risk of performance or availability impacts on the workloads.

Amazon Inspector

  • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities
  • automatically discovers and scans EC2 instances and container images residing in Elastic Container Registry (ECR) for software vulnerabilities and unintended network exposure.
  • creates a finding, when a software vulnerability or network issue is discovered, that describes the vulnerability, rates its severity, identifies the affected resource,  and provides remediation guidance.
  • is a Regional service.
  • requires Systems Manager (SSM) agent to be installed and enabled.

Amazon Detective

  • helps analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities.
  • automatically collects log data from the AWS resources and uses machine learning, statistical analysis, and graph theory to build a linked set of data to easily conduct faster and more efficient security investigations.
  • enables customers to view summaries and analytical data associated with CloudTrail logs, EKS audit logs, VPC Flow Logs.
  • provides detailed summaries, analysis, and visualizations of the behaviors and interactions amongst your AWS accounts, EC2 instances, AWS users, roles, and IP addresses.
  • maintains up to a year of aggregated data
  • is a Regional service and needs to be enabled on a region-by-region basis.
  • is a multi-account service that aggregates data from monitored member accounts under a single administrative account within the same region.
  • has no impact on the performance or availability of the AWS infrastructure since it retrieves the log data and findings directly from the AWS services.

AWS Security Hub

  • a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.
  • collects security data from across AWS accounts, services, and supported third-party partner products and helps you analyze your security trends and identify the highest priority security issues.
  • is Regional abut supports cross-region aggregation of findings.
  • automatically runs continuous, account-level configuration and security checks based on AWS best practices and industry standards which include CIS Foundations, PCI DSS.
  • consolidates the security findings across accounts and provider products and displays results on the Security Hub console.
  • supports integration with Amazon EventBridge. Custom actions can be defined when a finding is received.
  • has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
  • works with AWS Config to perform most of its security checks for controls

AWS Macie

  • Macie is a data security service that discovers sensitive data by using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks.
  • provides an inventory of the S3 buckets and automatically evaluates and monitors the buckets for security and access control.
  • automates the discovery, classification, and reporting of sensitive data.
  • generates a finding for you to review and remediate as necessary if it detects a potential issue with the security or privacy of the data, such as a bucket that becomes publicly accessible.
  • provides multi-account support using AWS Organizations to enable Macie across all of the accounts.
  • is a regional service and must be enabled on a region-by-region basis and helps view findings across all the accounts within each Region.
  • supports VPC Interface Endpoints to access Macie privately from a VPC without an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

AWS Artifact

  • is a self-service audit artifact retrieval portal that provides customers with on-demand access to AWS’ compliance documentation and agreements
  • can use AWS Artifact Reports to download AWS security and compliance documents, such as AWS ISO certifications, Payment Card Industry (PCI), and System and Organization Control (SOC) reports.

References

AWS_Security_Products

AWS Simple Email Service – SES

AWS Simple Email Service – SES

  • SES is a fully managed service that provides an email platform with an easy, cost-effective way to send and receive email using your own email addresses and domains.
  • can be used to send both transactional and promotional emails securely, and globally at scale.
  • acts as an outbound email server and eliminates the need to support its own software or applications to do the heavy lifting of email transport.
  • acts as an inbound email server to receive emails that can help develop software solutions such as email autoresponders, email unsubscribe systems, and applications that generate customer support tickets from incoming emails.
  • existing email server can also be configured to send outgoing emails through SES with no change in any settings in the email clients
  • Maximum message size including attachments is 10 MB per message (after base64 encoding).
  • integrated with CloudWatch and CloudTrail

SES Characteristics

  • Compatible with SMTP
  • Applications can send email using a single API call in many supported languages Java, .Net, PHP, Perl, Ruby, HTTPS, etc
  • Optimized for the highest levels of uptime, availability, and scales as per the demand
  • Provides sandbox environment for testing
  • provides Reputation dashboard, performance insights, anti-spam feedback
  • provides statistics on email deliveries, bounces, feedback loop results, emails opened, etc.
  • supports DomainKeys Identified Mail (DKIM) and Sender Policy Framework (SPF)
  • supports flexible deployment: shared, dedicated, and customer-owned IPs
  • supports attachments with many popular content formats, including documents, images, audio, and video, and scans every attachment for viruses and malware.
  • integrates with KMS to provide the ability to encrypt the mail that it writes to the S3 bucket.
  • uses client-side encryption to encrypt the mail before it sends the email to  S3.

Sending Limits

  • Production SES has a set of sending limits which include
    • Sending Quota – max number of emails in a 24-hour period
    • Maximum Send Rate – max number of emails per second
  • SES automatically adjusts the limits upward as long as emails are of high quality and they are sent in a controlled manner, as any spike in the email sent might be considered to be spam.
  • Limits can also be raised by submitting a Quota increase request

SES Best Practices

  • Send high-quality and real production content that the recipients want
  • Only send to those who have signed up for the mail
  • Unsubscribe recipients who have not interacted with the business recently
  • Have low bounce and compliant rates and remove bounced or complained addresses, using SNS to monitor bounces and complaints, treating them as an opt-out
  • Monitor the sending activity

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon SES stand for?
    1. Simple Elastic Server
    2. Simple Email Service
    3. Software Email Solution
    4. Software Enabled Server
  2. Your startup wants to implement an order fulfillment process for selling a personalized gadget that needs an average of 3-4 days to produce with some orders taking up to 6 months you expect 10 orders per day on your first day. 1000 orders per day after 6 months and 10,000 orders after 12 months. Orders coming in are checked for consistency men dispatched to your manufacturing plant for production quality control packaging shipment and payment processing If the product does not meet the quality standards at any stage of the process employees may force the process to repeat a step Customers are notified via email about order status and any critical issues with their orders such as payment failure. Your case architecture includes AWS Elastic Beanstalk for your website with an RDS MySQL instance for customer data and orders. How can you implement the order fulfillment process while making sure that the emails are delivered reliably? [PROFESSIONAL]
    1. Add a business process management application to your Elastic Beanstalk app servers and re-use the ROS database for tracking order status use one of the Elastic Beanstalk instances to send emails to customers.
    2. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 Use the decider instance to send emails to customers.
    3. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1 use SES to send emails to customers.
    4. Use an SQS queue to manage all process tasks Use an Auto Scaling group of EC2 Instances that poll the tasks and execute them. Use SES to send emails to customers.

References

AWS SageMaker Built-in Algorithms Summary

SageMaker Built-in Algorithms

BlazingText algorithm

    • provides highly optimized implementations of the Word2vec and text classification algorithms.
    • Word2vec algorithm
      • useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc.
      • maps words to high-quality distributed vectors, whose representation is called word embeddings
      • word embeddings capture the semantic relationships between words.
    • Text classification
      • is an important task for applications performing web searches, information retrieval, ranking, and document classification
    • provides the Skip-gram and continuous bag-of-words (CBOW) training architectures

DeepAR forecasting algorithm

    • is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN).
    • use the trained model to generate forecasts for new time series that are similar to the ones it has been trained on.

Factorization machine

    • is a general-purpose supervised learning algorithm used for both classification and regression tasks.
    • extension of a linear model designed to capture interactions between features within high dimensional sparse datasets economically

Image classification algorithm

    • a supervised learning algorithm that supports multi-label classification
    • takes an image as input and outputs one or more labels
    • uses a convolutional neural network (ResNet) that can be trained from scratch or trained using transfer learning when a large number of training images are not available.
    • recommended input format is Apache MXNet RecordIO. Also supports raw images in .jpg or .png format.

IP Insights

    • is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses.
    • designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers

K-means algorithm

    • is an unsupervised learning algorithm for clustering
    • attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups

K-nearest neighbors (k-NN) algorithm

    • is an index-based algorithm.
    • uses a non-parametric method for classification or regression.
    • For classification problems, the algorithm queries the k points that are closest to the sample point and returns the most frequently used label of their class as the predicted label.
    • For regression problems, the algorithm queries the k closest points to the sample point and returns the average of their feature values as the predicted value.

Latent Dirichlet Allocation (LDA) algorithm

    • is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories.
    • used to discover a user-specified number of topics shared by documents within a text corpus.

Linear Learner

    • are supervised learning algorithms used for solving either classification or regression problems

Neural Topic Model (NTM) Algorithm

    • is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
    • Topic modeling can be used to classify or summarize documents based on the topics detected or to retrieve information or recommend content based on topic similarities.

Object2Vec algorithm

    • is a general-purpose neural embedding algorithm that is highly customizable
    • can learn low-dimensional dense embeddings of high-dimensional objects.

Object Detection algorithm

    • detects and classifies objects in images using a single deep neural network.
    • is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.

Principal Component Analysis

    • is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible

Random Cut Forest (RCF)

    • is an unsupervised algorithm for detecting anomalous data points within a data set.

Semantic segmentation algorithm

    • provides a fine-grained, pixel-level approach to developing computer vision applications

SageMaker Sequence to Sequence (seq2seq)

    • is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens.
    • key uses cases are machine translation (input a sentence from one language and predict what that sentence would be in another language), text summarization (input a longer string of words and predict a shorter string of words that is a summary), speech-to-text (audio clips converted into output sentences in tokens)

XGBoost (eXtreme Gradient Boosting)

    • is a popular and efficient open-source implementation of the gradient boosted trees algorithm.
    • Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models

AWS Certified Big Data -Speciality (BDS-C00) Exam Learning Path

Clearing the AWS Certified Big Data – Speciality (BDS-C00) was a great feeling. This was my third Speciality certification and in terms of the difficulty level (compared to Network and Security Speciality exams), I would rate it between Network (being the toughest) Security (being the simpler one).

Big Data in itself is a very vast topic and with AWS services, there is lots to cover and know for the exam. If you have worked on Big Data technologies including a bit of Visualization and Machine learning, it would be a great asset to pass this exam.

AWS Certified Big Data – Speciality (BDS-C00) exam basically validates

  • Implement core AWS Big Data services according to basic architectural best practices
  • Design and maintain Big Data
  • Leverage tools to automate Data Analysis

Refer AWS Certified Big Data – Speciality Exam Guide for details

                              AWS Certified Big Data – Speciality Domains

AWS Certified Big Data – Speciality (BDS-C00) Exam Summary

  • AWS Certified Big Data – Speciality exam, as its name suggests, covers a lot of Big Data concepts right from data transfer and collection techniques, storage, pre and post processing, analytics, visualization with the added concepts for data security at each layer.
  • One of the key tactic I followed when solving any AWS Certification exam is to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Whitepapers and articles
    • Analytics
      • Make sure you know and cover all the services in depth, as 80% of the exam is focused on these topics
      • Elastic Map Reduce
        • Understand EMR in depth
        • Understand EMRFS (hint: Use Consistent view to make sure S3 objects referred by different applications are in sync)
        • Know EMR Best Practices (hint: start with many small nodes instead on few large nodes)
        • Know Hive can be externally hosted using RDS, Aurora and AWS Glue Data Catalog
        • Know also different technologies
          • Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
          • D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS
          • Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
          • Zeppelin/Jupyter as a notebook for interactive data exploration and provides open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
          • Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
      • Kinesis
        • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
        • Know Kinesis Data Streams vs Kinesis Firehose
          • Know Kinesis Data Streams is open ended on both producer and consumer. It supports KCL and works with Spark.
          • Know Kineses Firehose is open ended for producer only. Data is stored in S3, Redshift and ElasticSearch.
          • Kinesis Firehose works in batches with minimum 60secs interval.
        • Understand Kinesis Encryption (hint: use server side encryption or encrypt in producer for data streams)
        • Know difference between KPL vs SDK (hint: PutRecords are synchronously, while KPL supports batching)
        • Kinesis Best Practices (hint: increase performance increasing the shards)
      • Know ElasticSearch is a search service which supports indexing, full text search, faceting etc.
      • Redshift
        • Understand Redshift in depth
        • Understand Redshift Advance topics like Workload Management, Distribution Style, Sort key
        • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, COPY command which allows parallelism
        • Know Redshift views to control access to data.
      • Amazon Machine Learning
      • Know Data Pipeline for data transfer
      • QuickSight
      • Know Glue as the ETL tool
    • Security, Identity & Compliance
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events more real time alerts as compared to CloudTrail
    • Storage
    • Compute
      • Know EC2 access to services using IAM Role and Lambda using Execution role.
      • Lambda esp. how to improve performance batching, breaking functions etc.

AWS Certified Big Data – Speciality (BDS-C00) Exam Resources