AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Learning Path

AWS Certified Alexa Skill Builder - Specialty Certificate

Finally All Down for AWS (for now) …

Continuing on my AWS journey with the last AWS certification, I took another step by clearing the AWS Certified Alexa Skill Builder – Specialty (AXS-C01) certification. It is amazing to know and learn how Voice first experiences are making an impact and changing how we think about technology and use cases.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) exam basically validates your ability to build, test, publish and certify Alexa skills.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Summary

  • AWS Certified Alexa Skill Builder – Specialty exam focuses only on Alexa and how to build skills.
  • AWS Certified Alexa Skill Builder – Specialty exam has 65 questions with a time limit of 170 minutes
  • Compared to the other professional and specialty exams, the question and answers are not long and similar to associate exams. So if you are prepared well, it should not need the 170 minutes.
  • As the exam was online from home, there was no access to paper and pen but the trick remains the same, read the question and draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Topic Summary

Refer AWS Alexa Cheat Sheet

Domain 1: Voice-First Design Practices and Capabilities

1.1 Describe how users interact with skills

1.2 Map features and capabilities to use cases

  • Alexa supports display cards to display text (Simple card) and text with image (Standard card)
  • Alexa Alexa Skill Kits supports APIs
    • Alexa Settings APIs allow developers to retrieve customer preferences for the settings like time zone, distance measuring unit, and temperature measurement unit 
    • Device services – a skill can request the customer’s permission to their address information, which is a static data filled by customer and includes the country/region, postal code and full address
    • Customer Profile services – a skill can request the customer’s permission to their contact information, which includes name, email address and phone number
    • With Location services, a skill can ask a user’s permission to obtain the real-time location of their Alexa-enabled device, specifically at the time of the user’s request to Alexa, so that the skill can provide enhanced services.
  • Alexa Skill Kit APIs need apiAccessToken and deviceId to access the ASK APIs
  • Progressive Response API allows you to keep the user engaged while the skill prepares a full response to the user’s request.
  • Personalization can be provided using userId and state persistence

Domain 2: Skill Design

2.1 Design and develop an interaction model

  • Alexa interaction model includes skill, Invocation name, utterances, slots, Intents
  • A skill is ‘an app for Alexa’, however they are not downloadable but just need to be enabled.
  • Wakeword – Amazon offers a choice of wakewords like ‘Alexa’, ‘Amazon’, ‘Echo’, ‘skill’, ‘app’ or ‘Computer’, with the default being ‘Alexa’.
  • Launch phrases include “run,” “start,” “play,” “resume,” “use,” “launch,” “ask,” “open,” “tell,” “load,” “begin,” and “enable.”
  • Connecting words include “to,” “from,” “in,” “using,” “with,” “about,” “for,” “that,” “by,” “if,” “and,” “whether.”
  • Invocation name
    • is the word or phrase used to trigger the skill for custom skills and the invocation name should adhere to the requirements
    • must not infringe upon the intellectual property rights of an entity or person
    • must be compound of two or more works.
    • One-word invocation names are allowed only for brand/intellectual property.
    • must not include names of people or places
    • if two-word invocation names, one of the words cannot be a definite article (“the”), indefinite article (“a”, “an”) or preposition (“for”, “to”, “of,” “about,” “up,” “by,” “at,” “off,” “with”).
    • must not contain any of the Alexa skill launch phrases, connecting words and wake words
    • must contain only lower-case alphabetic characters, spaces between words, and possessive apostrophes
    • must spell characters like numbers for e.g., twenty one
    • can have periods in the invocation names containing acronyms or abbreviations that are pronounced as a series of individual letters, for e.g. NASA as n. a. s. a.
    • cannot spell out phonemes for e.g., a skill titled “AWS Facts” would need “AWS” represented as “a. w. s. ” and NOT “ay double u ess.”
    • must not create confusion with existing Alexa features.
    • must be written in each supported language
  • An intent is what a user is trying to accomplish.
    • Amazon provides standard built-in intents which can be extended
    • Intents need to have a unique utterance
  • Utterances are the specific phrases that people will use when making a request to Alexa.
  • A slot is a variable that relates to an intent allowing Alexa to understand information about the request
    • Amazon provides standard built-in slots which can be extended
  • Entity resolution improves the way Alexa matches possible slot values in a user’s utterance with the slots defined in your interaction model

2.2 Design a multi-turn conversation

  • Alexa Dialog management model identifies the prompts and utterances to collect, validate, and confirm the slot values and intents.
  • Alexa supports
    • Auto Delegation where Alexa completes all of the dialog steps based on the dialog model.
    • Manual delegation using Dialog.Delegate where Alexa sends the skill an IntentRequest for each turn of the conversation and provides more flexibility.
  • AMAZON.FallbackIntent will not be triggered in the middle of a dialog

2.3 Use built-in intents and slots

  • Standard built-in intents cannot include any slots. If slots are needed, create a custom intent and write your own sample utterances.
  • Alexa recommends using and extending standard built-in intents like Alexa.HelpIntent, Alexa.YesIntent with additional utterances as per the skill requirements
  • Alexa provides Alexa.FallbackIntent for handling any unmatched utterances and can be used to improve the interaction model accuracy.
  • Standard built-in intents cannot include any slots. If slots are needed, create a custom intent and write your own sample utterances.
  • Alexa provides slot which helps capture variables and can be either be a Amazon predefined slot such as dates, numbers, durations, time, etc. or a custom one specific to the skill
  • Predefined slots can be extended to add additional values

2.4 Handle unexpected conversational requests or responses

  • Alexa provides Alexa.FallbackIntent for handling any unmatched utterances and can be used to improve the interaction model accuracy.
  • Alexa also provides Intent History  which provides  a consolidate view with aggregated, anonymized frequent utterances and the resolved intents. These can be used to map the utterances to correct intents

2.5 Design multi-modal skills using one or more service interfaces (for example, audio, video, and gadgets)

  • Alexa enabled devices with a screen handles Page and Scroll intents. Do not handle Next and Previous.
  • Alexa skill with AudioPlayer interface
    • must handle AMAZON.ResumeIntent and AMAZON.PauseIntent
    • PlaybackController events to track AudioPlayer status changes initiated from the device buttons

Domain 3: Skill Architecture

3.1 Identify AWS services for extending Alexa skill functionality (Amazon CloudFront, Amazon S3, Amazon CloudWatch, and Amazon DynamoDB)

  • Focus on standard skill architecture using Lambda for backend, DynamoDB for persistence, S3 for severing static assets, and CloudWatch for monitoring and logs.
  • Lambda provide serverless handling for the Alexa requests, but remember the following limits
    • default concurrency soft limit of 1000 can be increased by raising a support request
    • default timeout of 3 secs, and should be increased to atleast 7 secs to be inline with Alexa timeout of 8 secs
    • default memory of 128mb, increase to improve performance
  • S3 performance can be improved by exposing it through CloudFront esp. for images, audio and video files

3.2 Use AWS Lambda to build Alexa skills

  • Lambda integrates with CloudWatch to provide logs and should be the first thing to check in case of any issues or errors.
  • Alexa allows any http endpoint to act as a backend, but needs to meet following requirements
    • must be accessible over the internet.
    • must accept HTTP requests on port 443.
    • must support HTTP over SSL/TLS, using an Amazon-trusted certificate.

3.3 Follow AWS and Alexa security and privacy best practices

  • Alexa requires the backend to verify that incoming requests come from Alexa using Skill ID verification
  • Child-directed skills cannot use personal and location information
  • Skills cannot be used to capture health information
  • Alexa Skills Kit uses the OAuth 2.0 authentication framework for Account linking, which defines a means by which the service can allow Alexa, with the user’s permission, to access information from the account that the user has set up with you.
  • Alexa smart home skills must have OAuth authorization code grant implementation while custom skills can have authorization code grant or impact grant implementation.

Domain 4: Skill Development

4.1 Implement in-skill purchasing and Amazon Pay for Alexa Skills

  • In-skill purchasing enables selling premium content such as game features and interactive stories in skills with a custom interaction model.
  • In-skill purchasing is handled by Alexa when the skill sends a Upsell directive. As the skill session ends when a Upsell directive is sent, be sure to save any relevant user data in a persistent data store so that the skill can continue where the user left off after the purchase flow is completed and the endpoint is back in control of the user experience.
  • Skill can handle the Connections.Response request that indicates the result of a purchase flow and resume the skill

4.2 Use Speech Synthesis Markup Language (SSML) for expression and MP3 audio

  • SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech.
  • Alexa supports a subset of SSML tags including
    • say-as to interpret text as telephone, date, time etc.
    • phonemeprovides a phonemic/phonetic pronunciation
    • prosody modifies the volume, pitch, and rate of the tagged speech.
    • audioallows playing MP3 player while rendering a response
      • must be in valid MP3 file (MPEG version 2) format
      • must be hosted at an Internet-accessible HTTPS endpoint.
      • For speech response, the audio file cannot be longer than 240 seconds.
        • combined total time for all audio files in the outputSpeech property of the response cannot be more than 240 seconds.
        • combined total time for all audio files in the reprompt property of the response cannot be more than 90 seconds.
      • bit rate must be 48 kbps.
      • sample rate must be 22050Hz, 24000Hz, or 16000Hz.

4.3 Implement state management

  • Alexa Skill state persistence can be handled using session attributes during the session and externally using services like DynamoDB, RDS across sessions.

4.4 Implement Alexa service interfaces (audio player, video player, and screens)

4.5 Parse Alexa JSON requests and provide responses

  • All requests include the session (optional), context, and request objects at the top level.
    •  session object provides additional context associated with the request.
      • session attributes can be used to store data
      • user containing userId to uniquely define an user and accessToken to access other services.
      • system object provides apiAccessToken and device object provides deviceId to access ASK APIs
      • application provide applicationId
      • device object provides supportedInterfaces to list each interface that the device supports
      • user containing userId to uniquely define an user and accessToken to access other services.
    • request object that provides the details of the user’s request.
  • Response includes
    • outputSpeech contains the speech to render to the user.
    • reprompt contains the outputSpeech to use if a re-prompt is necessary.
    • shouldEndSession provides a boolean value that indicates what should happen after Alexa speaks the response.

Domain 5: Test, Validate, and Troubleshoot

5.1 Debug and troubleshoot using Amazon CloudWatch or other tools

  • Lambda integrates with CloudWatch for metric and logs and can be check for any errors and metrics.

5.2 Use the Alexa developer testing tools

  • Utterance profiles – test utterances to know what intent they resolve to 
  • Alexa Skill simulator
    • provides an ability to Interact with Alexa with either your voice or text, without an actual device.
    • maintains the skill session, so the interaction model and dialog flow can be tested.
    • supports multiple languages testing by selecting locale
    • has limitations in testing audio, video, Alexa settings and Device API
  • Manual Json
    • enter a JSON request directly and see the skill returned JSON response
    • does not maintain the skill session and is similar to testing a JSON request in the Lambda console.
  • Voice & Tone – enter plain text or SSML and hear how Alexa speaks the text in a selected language
  • Alexa device – test with an Alexa-enabled device.
  • Alexa app – test the skill with the Alexa app for Android/iOS
  • Lambda Test console – to test Lambda functions

5.3 Perform beta testing

  • Skill beta testing tool can be used to test the Alexa skill in beta before releasing it to production
  • Beat testing allows testing changes to an existing skill, while still keeping the currently live version of the skill available for the general public.
  • Members can be invited using their Alexa email address. Alexa device used by the beta tester must be associated with the email address in the tester’s invitation.

5.4 Troubleshoot errors in the interaction model

Domain 6: Publishing, Operations, and Lifecycle Management

6.1 Describe the skill publishing process

  • Alexa skill needs to go through certification process before the Skill is live and made available to the users
  • Alexa creates an in development version of the skill, once the skill becomes live
  • Alexa Skill live version cannot be edited, and it is recommended to edit the in development skill, test and then re-certify for publishing.
  • Backend changes like changes in Lambda functions or response output from the function, however, can be made on live version and do not require re-certification. However, it is recommended to use Lambda versioning or alias to do such changes.
  • Alexa for Business allows skill to be made private and available to select users within the company

6.2 Add and remove users in the developer console

  • Alexa Skill Developer console access can be shared across multiple users for collaboration
  • Administrator and Analyst roles will also have access to the Earnings and Payments sections.
  • Administrator and Marketer roles will also have access to edit the content associated with apps (i.e. Descriptions, Images & Multimedia) and IAPs
  • Administrator and Developer roles will have access to create, modify and delete Alexa skills using ASK CLI and SMAPI.
  • Administrator, Analyst and Marketer roles have access to sales report

6.3 Perform analysis of skill analytics in the developer console

  • Intent History – View aggregated, anonymized frequent utterances and the resolved intents. You cannot track the user intent history as they are anonymized.
  • Actions – Unique customers per action, total actions, and total utterances per action.
  • Customers – Total number of unique customers who accessed the skill.
  • Intents – Unique customers per intent, total utterances per intent, total intents, and failed intents.
  • Interaction Path – Paths users take when interacting with the skill.
  • Plays Total number of times that a user played the skill content.
  • Retention (live skills only) Usage of the skill over time by groups of customers or cohorts. View the number or percentage of customers who returned to your skill over a 12-week period.
  • Sessions Total sessions, successful session types (sessions that didn’t end due to an error), average sessions per customer. Includes a breakdown of successful, failed, and no-response sessions as a percentage of total sessions. Custom
  • Utterances Metrics for utterances depend on the skill category.

6.4 Differentiate among the statuses/versions of skills (for example, In Development, In Certification, and Live)

  • In Development – skill available for development, testing
  • In Review – A certification review is in progress and the skill cannot be edited
  • Certified – Skill passed certification review, and is not yet available to users
  • Live – skill has been published and is available to users. You cannot edit the configuration for live skills
  • Hidden – skill was previously published, but has since been hidden. Existing users can access the skill. New users cannot discover the skill.
  • Removed – skill was previously published, but has since been removed. Users cannot enable or use the skill.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Resources

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

AWS Certified Database Specialty Certificate

11 Down !!! Continuing on my AWS journey which has lasted for over 3 years now, validating and re-validating the certs multiple times, I took another step and have passed the AWS Certified Database – Specialty (DBS-C01) certification

AWS Certified Database – Specialty (DBS-C01) exam basically validates

  • Understand and differentiate the key features of AWS database services.
  • Analyze needs and requirements to design and recommend appropriate database solutions using AWS services

AWS Certified Database – Specialty (DBS-C01) Exam Summary

  • AWS Certified Database – Specialty exam focuses on Data services from relational, non-relational, graph, caching and data warehousing. It also focuses on data migration.
  • AWS Certified Database – Specialty exam has 65 questions with a time limit of 170 minutes
  • Questions and answer options are pretty long, so you need time to read through each of them to make sense of the requirements and filter out the answers
  • As the exam was online from home, there was no access to paper and pen but the trick remains the same, read the question and draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Whitepapers and articles
    • Database
      • Make sure you know and cover all the services in depth, as 80% of the exam is focused on topics like Aurora, RDS, DynamoDB
      • Aurora
        • Understand Aurora in depth
        • Know Aurora DR & HA using Read Replicas an
          • Aurora promotes read replicas as per the priority tier (tier 0 is the highest), largest size if the tier matches
        • Know  Aurora Global Database
          • Aurora provides Global Database with cross region read replicas for low latency reads. remember it is not multi-master and would not provide low latency writes.
        • Know Aurora Connection endpoints
          • cluster for primary read/write
          • reader for read replicas
          • custom for specific group of instances
          • Instance for specific single instance – Not recommended
        • Know Aurora Fast Failover techniques
          • set TCP keepalives low
          • set Java DNS caching timeouts low
          • Set the timeout variables used in the JDBC connection string as low
          • Use the provided read and write Aurora endpoints
          • Use cluster cache management for Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there’s a failover.
        • Know Aurora Serverless
        • Know Aurora Backtrack feature which rewinds the DB cluster to the specified time. It is not a replacement for backups.
        • Supports Server Auditing Events for different activities that covers log in, DML, permission changes DCL, schema changes DDL etc.
        • Know Aurora Cluster Cache management feature which helps fast failover
        • Know Aurora Clone feature which allows you to create quick and cost-effective clones
        • Aurora supports fault injection queries to simulate various failover like node down, primary failover etc.
        • RDS PostgreSQL and MySQL can be migrated to Aurora, by creating an Aurora Read Replica from the instance. Once the replica lag is zero, switch the DNS with no data loss
        • Supports Database Activity Streams to stream audit logs to external services like Kinesis
        • Supports stored procedures calling lambda functions
      • DynamoDB
      • RDS
        • Know Relational Database Service (RDS) in depth
        • Understand RDS Snapshots, Backups and Restore
          • restoring a DB from snapshot does not retain the parameter group and security group
          • automated snapshots cannot be shared.
        • Understand RDS Read Replicas
        • Understand RDS Multi-AZ
        • Understand RDS Multi-AZ vs Read Replicas (hint: cross region replication and availability of data)
          • Multi-AZ failover can be simulated using Reboot with Failure option
          • Read Replicas require automated backups enabled
        • Understand DB components esp. DB parameter group, DB options groups
          • Dynamic parameters are applied immediately
          • Static parameters need manual reboot.
          • Default parameter group cannot be modified. Need to create custom parameter group and associate to RDS
          • Know max connections also depends on DB instance size
        • Understand RDS Security
          • RDS supports security groups to control who can access RDS instances
          • RDS supports data at rest encryption and SSL for data in transit encryption
          • RDS also support IAM database authentication
          • Existing RDS instance cannot be encrypted, create a snapshot -> encrypt it -> restore as encrypted DB
          • RDS PosgreSQL requires rds.force_ssl=1 and sslmode=ca/verify-full to enable SSL encryption
          • Know RDS Encrypted Database limitations
        • Understand RDS Monitoring and Notification
          • Know RDS supports notification events through SNS for events like database creation, deletion, snapshot creation etc.
          • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
          • Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
          • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
        • RDS instance cannot be stopped if with read replicas
      • Neptune
        • provides Neptune loader to quick import data from S3
        • supports VPC endpoints
      • Redshift
        • Understand Redshift at high level. Exam does not cover Redshift if depth.
        • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
          • COPY command which allows parallelism, and performs better than multiple COPY commands
          • COPY command can use manifest files to load data
          • COPY command handles encrypted data
        • Know Redshift cross region encrypted snapshot copy
          • Create a new key in destination region
          • Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region.
          • In the source region, enable cross-region replication and specify the name of the copy grant created.
        • Know Redshift supports Audit logging which covers authentication attempts, connections and disconnections usually for compliance reasons.
      • Data Migration Service (DMS)
        • Understand Data Migration Service in depth for migration homogeneous and heterogeneous database
        • DMS with Full load plus CDC migration capability can be used to migration databases with zero downtime and no data loss.
        • DMS with SCT (Schema Conversion Tool) can be used to migration heterogeneous database.
        • DMS support validation after the migration to ensure data was migrated correctly
        • DMS supports LOB migration as a 2-step process. It can do a full or limited LOB migration
          • In full LOB mode AWS DMS migrates all LOBs from source to target regardless of size. Full LOB mode can be quite slow.
          • In limited LOB mode, a maximum LOB size can be set that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is issued to the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
          • Recommended to use limited LOB mode whenever possible.
    • Security, Identity & Compliance
      • Data security is a key concept controlled in the Database – Specialty exam
      • Identity and Access Management (IAM)
      • Trusted Advisor provides RDS Idle instances
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics.
        • CloudWatch Events more real time alerts as compared to CloudTrail
        • CloudWatch can be used used to store RDS logs with custom retention period, which is indefinite by default.
        • CloudWatch Application Insights support .Net and SQL Server monitoring
      • Know CloudFormation for provisioning, in terms of
        • Stack drifts – to understand difference between current state and on actual environment with any manual changes
        • Change Set – allows you to verify the changes before being propagated
        • parameters – allows you to configure variables or environment specific values
        • Stack policy defines the update actions that can be performed on designated resources.
        • Deletion policy for RDS allows you to configure if the resources is retained, snapshot or deleted once destroy is initiated
        • Supports secrets manager for DB credentials generation, storage and easy rotation
        • System parameter store for environments specific parameters

AWS Certified Database – Specialty (DBS-C01) Exam Resources

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

  • Recently validated myself with the AWS Certified Data Analytics – Specialty (DAS-C01).
  • Data Analytics – Specialty (DAS-C01) has replaced the previous Big Data – Specialty (DAS-C01).
  • Big Data in itself is a very vast topic and with AWS services, there is lots to cover and know for the exam.
  • If you have worked on Big Data technologies including a bit of Visualization, it would be a great asset to pass this exam.

AWS Certified Data Analytics – Specialty (DAS-C01) exam basically validates

  • Define AWS data analytics services and understand how they integrate with each other.
  • Explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization.

Refer AWS Certified Data Analytics – Specialty Exam Guide for details

AWS Certified Data Analytics - Specialty DAS-C01 Domains

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Summary

  • AWS Certified Data Analytics – Specialty exam, as its name suggests, covers a lot of Big Data concepts right from data transfer and collection techniques, storage, pre and post processing, analytics, visualization with the added concepts for data security at each layer.
  • AWS Certified Data Analytics – Specialty exam has 65 questions to be solved within a time limit of 170 minutes
  • Questions and answer options are pretty long, so need time to read through them to make sense of the requirements and filter out the answers
  • As the exam was online from home, there was no access to paper and pen but the trick remains the same, read the question and draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Whitepapers and articles
    • Analytics
      • Make sure you know and cover all the services in depth, as 80% of the exam is focused on topics like Glue, Kinesis and Redshift.
      • Glue
        • DAS-C01 covers Glue in detail. This is one of the newly added service as compared to Big Data -Specialty exam
        • Understand Glue as a fully-managed, extract, transform, and load (ETL) service
        • Glue natively supports RDS, Redshift, S3 and databases on EC2 instances.
        • Glue provides Glue crawlers to crawl data and helps discover and create schema in Glue Data Catalog
        • Glue supports Job Bookmark that helps track data that has already been processed during a previous run of an ETL job by persisting state information from the job run. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data or duplicate records.
      • Elastic Map Reduce
        • Understand EMR in depth
        • Understand EMRFS (hint: Use Consistent view to make sure S3 objects referred by different applications are in sync)
        • Know EMR Best Practices (hint: start with many small nodes instead on few large nodes)
        • Know EMR Encryption options
          • supports SSE-S3, SS3-KMS, CSE-KMS and CSE-Custom encryption for EMRFS
          • doesn’t support SSE-C  encryption
          • supports LUKS encryption for local disks
          • supports TLS for data in transit encryption
        • Know Hive can be externally hosted using RDS, Aurora and AWS Glue Data Catalog
        • Know also different technologies
          • Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
          • Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
          • Zeppelin/Jupyter as a notebook for interactive data exploration and provides open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
          • Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
      • Kinesis
        • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
        • Know Kinesis Data Streams vs Kinesis Firehose
          • Know Kinesis Data Streams is open ended on both producer and consumer. It supports KCL and works with Spark.
          • Know Kinesis Firehose is open ended for producer only. Data is stored in S3, Redshift and ElasticSearch.
          • Kinesis Firehose works in batches with minimum 60secs interval and is near-real time.
          • Kinesis Firehose supports transformation and  custom transformation using Lambda
        • Understand Kinesis Encryption (hint: use server side encryption or encrypt in producer for data streams)
        • Know difference between KPL vs SDK (hint: PutRecords are synchronously, while KPL supports batching)
        • Kinesis Best Practices (hint: increase performance increasing the shards)
      • Elasticsearch
        • Know ElasticSearch is a search service which supports indexing, full text search, faceting etc.
        • Elasticsearch can be used to analysis and supports visualization using Kibana which can be real time.
      • Redshift
        • Understand Redshift in depth
        • Understand Redshift Advanced topics like Workload Management, Distribution Style, Sort key
        • Understand Redshift Spectrum which allows querying data in S3 without loading existing Redshift cluster. It also helps querying S3 data with Redshift data.
        • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
          • COPY command which allows parallelism, and performs better than multiple COPY commands
          • COPY command can use manifest files to load data
          • COPY command handles encrypted data
        • Understand Redshift Resizing cluster options (elastic resize did not support node type changes before, but does now)
        • Know Redshift views to control access to data.
      • Athena
        • serverless, interactive query service to analyze data in S3 using standard SQL
      • QuickSight
        • Understand QuickSight
        • Know Visual Types (hint: esp. plotting line, bar and story based visualizations)
        • Know Supported Data Sources (hint: supports files)
        • QuickSight provides direct integration with Microsoft AD
        • QuickSight supports Row level security using dataset rules
        • QuickSight supports ML insights as well
      • Know Data Pipeline for data transfer
    • Security, Identity & Compliance
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events more real time alerts as compared to CloudTrail

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Resources

AWS Certified Solutions Architect – Associate SAA-C02 Exam Learning Path

SAA-C02 Certification

AWS Certified Solutions Architect – Associate SAA-C02 Exam Learning Path

AWS Solutions Architect – Associate SAA-C02 exam is the latest AWS exam that has replaced the previous SAA-C01 certification exam. It basically validates the ability to effectively demonstrate knowledge of how to architect and deploy secure and robust applications on AWS technologies

  • Define a solution using architectural design principles based on customer requirements.
  • Provide implementation guidance based on best practices to the organization throughout the life cycle of the project.

Refer AWS_Solution_Architect_-_Associate_SAA-C02_Exam_Blue_Print

AWS Solutions Architect – Associate SAA-C02 Exam Summary

  • SAA-C02 exam consists of 65 questions in 130 minutes, and the time is more than sufficient if you are well prepared.
  • SAA-C02 Exam covers the architecture aspects in deep, so you must be able to visualize the architecture, even draw them out in the exam just to understand how it would work and how different services relate.
  • AWS has updated the exam concepts from the focus being on individual services to more building of scalable, highly available, cost-effective, performant, resilient.
  • If you had been preparing for the SAA-C01 –
    • SAA-C02 is pretty much similar to SAA-C01 except the operational effective architecture domain has been dropped
    • Although, most of the services and concepts covered by the SAA-C01 are the same. There are few new additions like Aurora Serverless, AWS Global Accelerator, FSx for Windows, FSx for Lustre
  • AWS exams are available online, and I took the online one. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join atleast 30 minutes before the actual time.

AWS Solutions Architect – Associate SAA-C02 Exam Topics

Make sure you go through all the topics and focus on hints in italics

Networking

  • Be sure to create VPC from scratch. This is mandatory.
    • Create VPC and understand whats an CIDR and addressing patterns
    • Create public and private subnets, configure proper routes, security groups, NACLs. (hint: Subnets are public or private depending on whether they can route traffic directly through Internet gateway)
    • Create Bastion for communication with instances
    • Create NAT Gateway or Instances for instances in private subnets to interact with internet
    • Create two tier architecture with application in public and database in private subnets
    • Create three tier architecture with web servers in public, application and database servers in private. (hint: focus on security group configuration with least privilege)
    • Make sure to understand how the communication happens between Internet, Public subnets, Private subnets, NAT, Bastion etc.
  • Understand difference between Security Groups and NACLs (hint: Security Groups are Stateful vs NACLs are stateless. Also only NACLs provide an ability to deny or block IPs)
  • Understand VPC endpoints and what services it can help interact (hint: VPC Endpoints routes traffic internally without Internet)
    • VPC Gateway Endpoints supports S3 and DynamoDB.
    • VPC Interface Endpoints OR Private Links supports others
  • Understand difference between NAT Gateway and NAT Instance (hint: NAT Gateway is AWS managed and is scalable and highly available)
  • Understand how NAT high availability can be achieved (hint: provision NAT in each AZ and route traffic from subnets within that AZ through that NAT Gateway)
  • Understand VPN and Direct Connect for on-premises to AWS connectivity
    • VPN provides quick connectivity, cost-effective, secure channel, however routes through internet and does not provide consistent throughput
    • Direct Connect provides consistent dedicated throughput without Internet, however requires time to setup and is not cost-effective
  • Understand Data Migration techniques
    • Choose Snowball vs Snowmobile vs Direct Connect vs VPN depending on the bandwidth available, data transfer needed, time available, encryption requirement, one-time or continuous requirement
    • Snowball, SnowMobile are for one-time data, cost-effective, quick and ideal for huge data transfer
    • Direct Connect, VPN are ideal for continuous or frequent data transfers
  • Understand CloudFront as CDN and the static and dynamic caching it provides, what can be its origin (hint: CloudFront can point to on-premises sources and its usecases with S3 to reduce load and cost)
  • Understand Route 53 for routing
    • Understand Route 53 health checks and failover routing
    • Understand  Route 53 Routing Policies it provides and their use cases mainly for high availability (hint: focus on weighted, latency, geolocation, failover routing)
  • Be sure to cover ELB concepts in deep.
    • SAA-C02 focuses on ALB and NLB and does not cover CLB
    • Understand differences between  CLB vs ALB vs NLB
      • ALB is layer 7 while NLB is layer 4
      • ALB provides content based, host based, path based routing
      • ALB provides dynamic port mapping which allows same tasks to be hosted on ECS node
      • NLB provides low latency and ability to scale
      • NLB provides static IP address

Security

  • Understand IAM as a whole
    • Focus on IAM role (hint: can be used for EC2 application access and Cross-account access)
    • Understand IAM identity providers and federation and use cases
    • Understand MFA and how would implement two factor authentication for an application
    • Understand IAM Policies (hint: except couple of questions with policies defined and you need to select correct statements)
  • Understand encryption services
  • AWS WAF integrates with CloudFront to provide protection against Cross-site scripting (XSS) attacks. It also provide IP blocking and geo-protection.
  • AWS Shield integrates with CloudFront to provide protection against DDoS.
  • Refer Disaster Recovery whitepaper, be sure you know the different recovery types with impact on RTO/RPO.

Storage

  • Understand various storage options S3, EBS, Instance store, EFS, Glacier, FSx and what are the use cases and anti patterns for each
  • Instance Store
    • Understand Instance Store (hint: it is physically attached  to the EC2 instance and provides the lowest latency and highest IOPS)
  • Elastic Block Storage – EBS
    • Understand various EBS volume types and their use cases in terms of IOPS and throughput. SSD for IOPS and HDD for throughput
    • Understand Burst performance and I/O credits to handle occasional peaks
    • Understand EBS Snapshots (hint: backups are automated, snapshots are manual
  • Simple Storage Service – S3
    • Cover S3 in depth
    • Understand S3 storage classes with lifecycle policies
      • Understand the difference between SA Standard vs SA IA vs SA IA One Zone in terms of cost and durability
    • Understand S3 Data Protection (hint: S3 Client side encryption encrypts data before storing it in S3)
    • Understand S3 features including
      • S3 provides a cost effective static website hosting
      • S3 versioning provides protection against accidental overwrites and deletions
      • S3 Pre-Signed URLs for both upload and download provides access without needing AWS credentials
      • S3 CORS allows cross domain calls
      • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
    • Understand Glacier as an archival storage with various retrieval patterns
    • Glacier Expedited retrieval now allows object retrieval within mins
  • Understand Storage gateway and its different types.
    • Cached Volume Gateway provides access to frequently accessed data, while using AWS as the actual storage
    • Stored Volume gateway uses AWS as a backup, while the data is being stored on-premises as well
    • File Gateway supports SMB protocol
  • Understand FSx easy and cost effective to launch and run popular file systems.
  • Understand the difference between EBS vs S3 vs EFS
    • EFS provides shared volume across multiple EC2 instances, while EBS can be attached to a single volume within the same AZ.
  • Understand the difference between EBS vs Instance Store
  • Would recommend referring Storage Options whitepaper, although a bit dated 90% still holds right

Compute

  • Understand Elastic Cloud Compute – EC2
  • Understand Auto Scaling and ELB, how they work together to provide High Available and Scalable solution. (hint: Span both ELB and Auto Scaling across Multi-AZs to provide High Availability)
  • Understand EC2 Instance Purchase Types – Reserved, Scheduled Reserved, On-demand and Spot and their use cases
    • Choose Reserved Instances for continuous persistent load
    • Choose Scheduled Reserved Instances for load with fixed scheduled and time interval
    • Choose Spot instances for fault tolerant and Spiky loads
    • Reserved instances provides cost benefits for long terms requirements over On-demand instances
    • Spot instances provides cost benefits for temporary fault tolerant spiky load
  • Understand EC2 Placement Groups (hint: Cluster placement groups provide low latency and high throughput communication, while Spread placement group provides high availability)
  • Understand Lambda and serverless architecture, its features and use cases. (hint: Lambda integrated with API Gateway to provide a serverless, highly scalable, cost-effective architecture)
  • Understand ECS with its ability to deploy containers and micro services architecture.
    • ECS role for tasks can be provided through taskRoleArn
    • ALB provides dynamic port mapping to allow multiple same tasks on the same node
  • Know Elastic Beanstalk at a high level, what it provides and its ability to get an application running quickly.

Databases

  • Understand relational and NoSQLs data storage options which include RDS, DynamoDB, Aurora and their use cases
  • RDS
    • Understand RDS features – Read Replicas vs Multi-AZ
      • Read Replicas for scalability, Multi-AZ for High Availability
      • Multi-AZ are regional only
      • Read Replicas can span across regions and can be used for disaster recovery
    • Understand Automated Backups, underlying volume types
  • Aurora
    • Understand Aurora
      • provides multiple read replicas and replicates 6 copies of data across AZs
    • Understand Aurora Serverless provides a highly scalable cost-effective database solution
  • DynamoDB
    • Understand DynamoDB with its low latency performance, key-value store (hint: DynamoDB is not a relational database)
    • DynamoDB DAX provides caching for DynamoDB
    • Understand DynamoDB provisioned throughput for Read/Writes (It is more cover in Developer exam though.)
  • Know ElastiCache use cases, mainly for caching performance

Integration Tools

  • Understand SQS as message queuing service and SNS as pub/sub notification service
  • Understand SQS features like visibility, long poll vs short poll
  • Focus on SQS as a decoupling service
  • Understand SQS Standard vs SQS FIFO difference (hint: FIFO provides exactly once delivery both low throughput)

Analytics

  • Know Redshift as a business intelligence tool
  • Know Kinesis for real time data capture and analytics
  • Atleast know what AWS Glue does, so you can eliminate the answer

Management Tools

  • Understand CloudWatch monitoring to provide operational transparency
  • Know which EC2 metrics it can track. Remember, it cannot track memory and disk space/swap utilization
  • Understand CloudWatch is extendable with custom metrics
  • Understand CloudTrail for Audit
  • Have a basic understanding of CloudFormation, OpsWorks

AWS Solutions Architect – Associate SAA-C02 Exam Resources

AWS Whitepapers & Cheat sheets

AWS Solutions Architect – Associate Exam Domains

Domain 1: Design Resilient Architectures

  1. Design a multi-tier architecture solution
  2. Design highly available and/or fault-tolerant architectures
  3. Design decoupling mechanisms using AWS services
  4. Choose appropriate resilient storage

Domain 2: Define High-Performing Architectures

  1. Identify elastic and scalable compute solutions for a workload
  2. Select high-performing and scalable storage solutions for a workload
  3. Select high-performing networking solutions for a workload
  4. Choose high-performing database solutions for a workload

Domain 3: Specify Secure Applications and Architectures

  1. Design secure access to AWS resources
  2. Design secure application tiers
  3. Select appropriate data security options

Domain 4: Design Cost-Optimized Architectures

  1. Determine how to design cost-optimized storage.
  2. Determine how to design cost-optimized compute.

Certified Kubernetes Administrator CKA Learning Path

Certified Kubernetes Administrator CKA Learning Path

The journey in the container world continues with the Certified Kubernetes Administrator CKA certification, which I cleared recently with 90%. After knowing how to use Kubernetes, it was really interesting and intriguing to know Kubernetes internals and how the overall system works.

  • CKA focuses on the skills required to be a successful Kubernetes Administrator 
  • CKA is an open book test, where you have access to the official Kubernetes documentation exam, but it focuses more on hands-on experience.
  • Unlike AWS and GCP certifications, you are required to provision, solve, debug actual problems and provision resources on a Kubernetes cluster
  • Even though it is an open book test, you need to know where the information is.

CKA Exam Pattern and Tips

    • CKA requires you to solve 24 questions in 3 hours.
    • CKA exam curriculum includes these general domains and their weights on the exam:
      • Application Lifecycle Management – 8%
      • Installation, Configuration & Validation – 12%
      • Core Concepts – 19%
      • Networking – 11%
      • Scheduling – 5%
      • Security – 12%
      • Cluster Maintenance – 11%
      • Logging / Monitoring – 5%
      • Storage – 7%
      • Troubleshooting – 10%
    • I was not stretched for time for CKA, as compared to CKAD, and was through with the my first attempt in 90 minutes, I took next 30 minutes to review and was done with the exam in 2 hours. However,  I skipped a question with 8%, but that shouldn’t have had a huge impact.
    • Exam questions can be attempted in any order and doesn’t have to be sequential.
    • Each exam question carries a weight so be sure you attempt the exams with higher weights before focusing on the lower ones. So target the ones with higher weights and quicker solutions like debugging ones.
    • 6-8 different K8s clusters are provisioned. Each question refers to a different kubernetes cluster, and the context needs to be switched. Be sure to execute the kubectl use context command, which is available with every question and you just need to copy paste it.
    • Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace>
    • You would be performing most of the interaction from base node. However, pay attention to check for the node you need to execute the exams and make sure you return back to the base node.
    • SSH to nodes and gaining root access is allowed, if needed. Commands are provided. Make sure you use the sudo -i command for running docker commands.
    • Read carefully the Information provided within the questions with the mark. They would provide very useful hints in addressing the question and save time. for e.g. namespaces to look into. for a failed pod, what has already been created like configmap, secrets, network policies so that you do not create the same.
    • CKA was already upgraded to use k8s 1.18 version and kubectl run commands did not work for me. Use kubectl create commands to create deployments.
    • Make sure you know the imperative commands to create resources, as you won’t have time to time to create and edit yaml files.
    • If you need to edit further use --dry-run -o yaml to get a headstart with spec yaml file and edit the same.
    • I personally use alias kk=kubectl to avoid typing kubectl

System Admin Training

CKA Learning Path

CKA Key Topics

Application Lifecycle Management

Installation, Configuration & Validation

Core Concepts

Networking

Scheduling

Security

Cluster Maintenance

Logging / Monitoring

  • Understand and know how to monitor all cluster components, applications, cluster and application logs
  • Know resource usage monitoring as you would be needed to check resource usage using the kubectl top command
  • Know how to Debug running pods using the kubectl logs command

Storage

Troubleshooting

  • Practice Debug application for troubleshooting application failures
  • Practice Debug cluster for troubleshooting control plane failure and worker node failure.
    • Understand the control plane architecture.
    • Focus on kube-apiserver, static pod config which causes the control panel pods to be referred and deployed.
    • Check pods in kube-system if they are all running. Use docker ps -a command on the node to inspect the reason for exiting containers.
    • Check kubelet service if the worker node is shown not ready
  • Troubleshoot networking

General information and practices

  • You can book the exam from CNCF CKA Certification @ $300. Avail limited time 30% discount coupon.
  • Exam can be taken online from anywhere.
  • Make sure you have prepared your workspace well before the exams.
  • Make sure you have a valid government issued ID card as it would be checked.
  • You are not allowed to have anything around you and no one should enter the room.
  • Exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.
  • I did not have any warnings with the Proctor, except for a request to have camera focused.
  • You would need to install a Google Chrome plugin and the exam provides a web based shell to work on which worked quite well without any glitches. Copy + Paste works fine.
  • You will have an online notepad on the right corner to note down. I hardly used it, but it can be good type and modify text instead of using VI editor.

All the Best ..

Certified Kubernetes Application Developer CKAD Learning Path

CKAD Certification

Certified Kubernetes Application Developer CKAD Learning Path

Finally moving a bit away from the Clouds (AWS and GCP) and as my involvement grew more with Kubernetes, I decided to challenge myself for the Kubernetes certification. I started with the Certified Kubernetes Application Developer and am happy to share that I cleared it in the first attempt with 84%.

  • CKAD is more of an open book test, where you have access to the official Kubernetes documentation exam, but it focuses more on hands-on experience.
  • CKAD focuses on “Using a Kubernetes cluster once already provisioned
  • Unlike AWS and GCP certifications, you would be required to solve, debug actual problems and provision resources on a live Kubernetes cluster.
  • It is surely one of the most challenging exam, I have appeared for in the recent times.
  • Even though it is an open book test, you need to know where the information is.
  • Trust me, if you are not prepared this time is not going to be sufficient.

CKAD Exam Pattern and Tips

    • CKAD requires you to solve 19 questions in 2 hours.
    • CKAD exam curriculum includes these general domains and their weights on the exam:
      • 13% – Core Concepts
      • 18% – Configuration
      • 10% – Multi-Container Pods
      • 18% – Observability
      • 20% – Pod Design
      • 13% – Services & Networking
      • 8% – State Persistence
    • Exam questions can be attempted in any order and doesn’t have to be sequential.
    • Each exam question carries a weight so be sure you attempt the exams with higher weights before focusing on the lower ones. So target the ones with higher weights and quicker solutions like debugging ones.
    • 4 different K8s clusters are provisioned. Each question refers to a different kubernetes cluster, and the context needs to be switched. So be sure to execute the kubectl use context command. This command is available with every question and you just need to copy paste it.
    • Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace>
    • You would be performing most of the interaction from base node. However, pay attention to check for the node you need to execute the exams and make sure you return back to the base node.
    • SSH to nodes and gaining root access is allowed, if needed. Commands are provided.
    • Read carefully the Information provided within the questions with the mark. They would provide very useful hints in addressing the question and save time. for e.g. namespaces to look into. for a failed pod, what has already been created like configmap, secrets, network policies so that you do not create the same.
    • Make sure you know the imperative commands to create resources, as you won’t have time to time to create and edit yaml files. kubectl run with restart flag is is your saviour.

kubectl commands

    • If you need to edit further use –dry-run -o yaml to get a headstart yaml file and edit the same.

System Admin Training

CKAD Learning Path

CKAD Key Topics

General information and practices

  • You can book the exam from CNCF CKAD Certification @ $300. Usually you would get discounts coupons for 15-20% on the exam.
  • Exam can be taken online from anywhere.
  • Make sure you have prepared your workspace well before the exams.
  • Make sure you have a valid government issued ID card as it would be checked.
  • You are not allowed to have anything around you and no one should enter the room.
  • Exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.
  • I did not have any warnings with the Proctor, except for a request to have camera focused.
  • You would need to install a Google Chrome plugin and the exam provides a web based shell to work on which worked quite well without any glitches. Copy + Paste works fine.
  • You will have a online notepad on the right corner to note down. I hardly used it, but it can be good type and modify text instead of using VI editor.

 

AWS Certified Machine Learning -Specialty (MLS-C01) Exam Learning Path

AWS Certified Machine Learning Specialty Certification

Finally, cleared the AWS Certified Machine Learning – Specialty (MLS-C01). It took me around four months to prepare for the exam. This was my fourth Specialty certification and in terms of the difficulty level of all of them this is the toughest, partly because I am not a machine learning expert and learned everything from basics for this certification. Machine Learning is a vast specialization in itself and with AWS services, there is lots to cover and know for the exam. This is the only exam, where the majority of the focus is on the concepts outside of AWS i.e. pure machine learning. It also includes AWS Machine Learning and Big Data services.

AWS Certified Machine Learning – Specialty (MLS-C01) exam basically validates

  •  Select and justify the appropriate ML approach for a given business problem.
  • Identify appropriate AWS services to implement ML solutions.
  • Design and implement scalable, cost-optimized, reliable, and secure ML solutions.

Refer AWS Certified Machine Learning – Specialty Exam Guide for details

                              AWS Certified Machine Learning – Specialty Domains

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Summary

  • AWS Certified Machine Learning – Specialty exam, as its name suggests, covers a lot of Machine Learning concepts right. It really digs deep into Machine learning concepts, most of which are not related to AWS.
  • AWS Certified Machine Learning – Speciality exam covers the E2E Machine Learning lifecycle, right from data collection, transformation, making it usable and efficient for Machine Learning, pre-processing data for Machine Learning, training and validation and implementation.
  • As always, one of the key tactic I followed when solving any AWS Certification exam is to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.

Preparation Summary

  • Machine Learning
    • Make sure you know and cover all the services in depth, as 60% of the exam is focused on generic Machine learning concepts not related to AWS services.
    • Know about complete generic Machine Learning lifecycle
    • Exploratory Data Analysis
      • Feature selection and Engineering
        • remove features which are not related to training
        • remove features which has same values, very low correlation, very little variance or lot of missing values
        • Apply techniques like Principal Component Analysis (PCA) for dimensionality reduction i.e reduce the number of features.
        • Apply techniques such as One-hot encoding and label encoding to help convert strings to numeric values, which are easier to process.
        • Apply Normalization i.e. values between 0 and 1 to handle data with large variance.
        • Apply feature engineering for feature reduction for e.g. using single height/weight feature instead of both the features
      • Handle Missing data
        • remove the feature or rows with missing data
        • impute using Mean/Median values – valid only for Numeric values and not categorical features also does not factor correlation between features
        • impute using k-NN, Multivariate Imputation by Chained Equation (MICE), Deep Learning – more accurate, factores correlation between features
      • Handle unbalanced data
        • Source more data
        • Oversample minority or Undersample majority
        • Data augmentation using techniques like SMOTE
    • Modeling
      • Know about Algorithms – Supervised, Unsupervised and Reinforcement and which algorithm is best suitable based on the available data either labelled or unlabelled.
        • Supervised learning trains on labelled data for e.g. Linear regression. Logistic regression, Decision trees, Random Forests
        • Unsupervised learning trains on unlabelled data for e.g. PCA, SVD, K-means
        • Reinforcement learning trained based on actions and rewards for e.g. Q-Learning
      • Hyperparameters
        • are parameters exposed by machine learning algorithms that control how the underlying algorithm operates and their values affect the quality of the trained models
        • some of the common hyperparameters are learning rate, batch, epoch (hint:  If the learning rate is too large, the minimum slope might be missed and the graph would oscillate If the learning rate is too small, it requires too many steps which would take the process longer and is less efficient
    • Evaluation
      • Know difference in evaluating model accuracy
        • Use Area Under the (Receiver Operating Characteristic) Curve (AUC) for Binary classification
        • Use root mean square error (RMSE) metric for regression
      • Understand Confusion matrix
        • A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class.
        • false positive is an outcome where the model incorrectly predicts the positive class. And a false negative is an outcome where the model incorrectly predicts the negative class.
        • Recall or Sensitivity or TPR (True Positive Rate): Number of items correctly identified as positive out of total true positives- TP/(TP+FN)  (hint: use this for cases like fraud detection,  cost of marking non fraud as frauds is lower than marking fraud as non-frauds)
        • Specificity or TNR (True Negative Rate): Number of items correctly identified as negative out of total negatives- TN/(TN+FP)  (hint: use this for cases like videos for kids, the cost of  dropping few valid videos is lower than showing few bad ones)
      • Handle Overfitting problems
        • Simplify the model, by reducing number of layers
        • Early Stopping – form of regularization while training a model with an iterative method, such as gradient descent
        • Data Augmentation
        • Regularization – technique to reduce the complexity of the model
        • Dropout is a regularization technique that prevents overfitting
        • Never train on test data
  • AWS Machine Learning
    • SageMaker
      • Know SageMaker in depth
      • supports both File mode and Pipe mode
        • File mode loads all of the data from S3 to the training instance volumes VS Pipe mode streams data directly from S3
        • File mode needs disk space to store both the final model artifacts and the full training dataset. VS Pipe mode which helps reduce the required size for EBS volumes
      • Using RecordIO format allows algorithms to take advantage of Pipe mode when training the algorithms that support it. 
      • supports Model tracking capability to manage up to thousands of machine learning model experiments
      • supports Canary deployment using ProductionVariant and deploying multiple variants of a model to the same SageMaker HTTPS endpoint.
      • supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload
      • provides pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training & inference
      • SageMaker Automatic Model Tuning
        • is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
        • Best practices
          • limit the search to a smaller number as difficulty of a hyperparameter tuning job depends primarily on the number of hyperparameters that Amazon SageMaker has to search
          • DO NOT specify a very large range to cover every possible value for a hyperparameter as it affects the success of hyperparameter optimization.
          • log-scaled hyperparameter can be converted to improve hyperparameter optimization.
          • running one training job at a time achieves the best results with the least amount of compute time.
          • Design distributed training jobs so that you get they report the objective metric that you want.
        • SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge.
      • know how to take advantage of multiple GPUs (hint: increase learning rate and batch size w.r.t to the increase in GPUs)
      • Algorithms –
        • Blazing text provides Word2vec and text classification algorithms
        • DeepAR provides supervised learning algorithm for forecasting scalar (one-dimensional) time series (hint: train for new products based on existing products sales data)
        • Factorization machines provides supervised classification and regression tasks, helps capture interactions between features within high dimensional sparse datasets economically
        • Image classification algorithm is a supervised learning algorithm that supports multi-label classification
        • IP Insights is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses
        • K-means is an unsupervised learning algorithm for clustering as it attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.
        • k-nearest neighbors (k-NN) algorithm is an index-based algorithm. It uses a non-parametric method for classification or regression
        • Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. Used to identify number of topics shared by documents within a text corpus
        • Linear models are supervised learning algorithms used for solving either classification or regression problems. 
          • For regression (predictor_type=’regressor’), the score is the prediction produced by the model.
          • For classification (predictor_type=’binary_classifier’ or predictor_type=’multiclass_classifier’)
        • Neural Topic Model (NTM) Algorithm is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
        • Object Detection algorithm detects and classifies objects in images using a single deep neural network
        • Principal Component Analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) (hint: dimensionality reduction)
        • Random Cut Forest (RCF) is an unsupervised algorithm for detecting anomalous data point (hint: anomaly detection)
        • Sequence to Sequence is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens. (hint: text summarization is the key use case)
    • SageMaker Ground Truth 
      • provides automated data labeling using machine learning
      • helps build highly accurate training datasets for machine learning quickly using Amazon Mechanical Turk
      • provides annotation consolidation to help improve the accuracy of the data object’s labels. It combines the results of multiple worker’s annotation tasks into one high-fidelity label.
      • automated data labeling uses machine learning to label portions of the data automatically without having to send them to human workers
    • Comprehend
      • natural language processing (NLP) service to find insights and relationships in text.
      • identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
    • Lex
      • provides conversational interfaces using voice and text helpful in building voice and text chatbots
    • Polly
      • text into speech
      • supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch or volume.
      • supports pronunciation lexicons to customize the pronunciation of words
    • Rekognition
      • analyze image and video
      • helps identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content.
    • Translate – provides natural and fluent language translation
    • Transcribe – provides speech-to-text capability
    • Elastic Interface helps attach low-cost GPU-powered acceleration to EC2 and SageMaker instances or ECS tasks to reduce the cost of running deep learning inference by up to 75%.
  • Analytics
    • Make sure you know and understand data engineering concepts mainly in terms of data capture, data migration, data transformation and data storage
    • Kinesis
      • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
      • Kinesis Data Analytics can process and analyze streaming data using standard SQL and integrates with Data Streams and Firehose
      • Know Kinesis Data Streams vs Kinesis Firehose
        • Know Kinesis Data Streams is open ended on both producer and consumer. It supports KCL and works with Spark.
        • Know Kinesis Firehose is open ended for producer only. Data is stored in S3, Redshift and ElasticSearch.
        • Kinesis Firehose works in batches with minimum 60secs interval.
        • Kinesis Data Firehose supports data transformation and record format conversion using Lambda function (hint: can be used for transforming csv or JSON into parquet)
    • Know ElasticSearch is a search service which supports indexing, full text search, faceting etc.
    • Know Data Pipeline for data transfer
    • Know Glue as fully managed ETL service
      • helps setup, orchestrate, and monitor complex data flows.
      • AWS Glue Data Catalog
        • is a central repository to store structural and operational metadata for all the data assets.
      • AWS Glue crawler
        • connects to a data store, progresses through a prioritized list of classifiers to extract the schema of the data and other statistics, and then populates the Glue Data Catalog with this metadata
  • Security, Identity & Compliance
    • Security is covered very lightly. (hint : SageMaker can read data from KMS encrypted S3. Make sure, the KMS key policies include the role attached with SageMaker)
  • Management & Governance Tools
    • Understand AWS CloudWatch for Logs and Metrics. (hint: SageMaker is integrated with Cloudwatch and logs and metrics are all stored in it)
  • Storage
    • Understand Data Storage Options – Know patterns for S3 vs RDS vs DynamoDB vs Redshift. (hint: S3 is, by default, the data storage option or Big Data storage and look for it in the answer.)

Whitepapers and articles

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Resources

AWS Certified Big Data -Speciality (BDS-C00) Exam Learning Path

Clearing the AWS Certified Big Data – Speciality (BDS-C00) was a great feeling. This was my third Speciality certification and in terms of the difficulty level (compared to Network and Security Speciality exams), I would rate it between Network (being the toughest) Security (being the simpler one).

Big Data in itself is a very vast topic and with AWS services, there is lots to cover and know for the exam. If you have worked on Big Data technologies including a bit of Visualization and Machine learning, it would be a great asset to pass this exam.

AWS Certified Big Data – Speciality (BDS-C00) exam basically validates

  • Implement core AWS Big Data services according to basic architectural best practices
  • Design and maintain Big Data
  • Leverage tools to automate Data Analysis

Refer AWS Certified Big Data – Speciality Exam Guide for details

                              AWS Certified Big Data – Speciality Domains

AWS Certified Big Data – Speciality (BDS-C00) Exam Summary

  • AWS Certified Big Data – Speciality exam, as its name suggests, covers a lot of Big Data concepts right from data transfer and collection techniques, storage, pre and post processing, analytics, visualization with the added concepts for data security at each layer.
  • One of the key tactic I followed when solving any AWS Certification exam is to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Whitepapers and articles
    • Analytics
      • Make sure you know and cover all the services in depth, as 80% of the exam is focused on these topics
      • Elastic Map Reduce
        • Understand EMR in depth
        • Understand EMRFS (hint: Use Consistent view to make sure S3 objects referred by different applications are in sync)
        • Know EMR Best Practices (hint: start with many small nodes instead on few large nodes)
        • Know Hive can be externally hosted using RDS, Aurora and AWS Glue Data Catalog
        • Know also different technologies
          • Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
          • D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS
          • Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
          • Zeppelin/Jupyter as a notebook for interactive data exploration and provides open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
          • Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
      • Kinesis
        • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
        • Know Kinesis Data Streams vs Kinesis Firehose
          • Know Kinesis Data Streams is open ended on both producer and consumer. It supports KCL and works with Spark.
          • Know Kineses Firehose is open ended for producer only. Data is stored in S3, Redshift and ElasticSearch.
          • Kinesis Firehose works in batches with minimum 60secs interval.
        • Understand Kinesis Encryption (hint: use server side encryption or encrypt in producer for data streams)
        • Know difference between KPL vs SDK (hint: PutRecords are synchronously, while KPL supports batching)
        • Kinesis Best Practices (hint: increase performance increasing the shards)
      • Know ElasticSearch is a search service which supports indexing, full text search, faceting etc.
      • Redshift
        • Understand Redshift in depth
        • Understand Redshift Advance topics like Workload Management, Distribution Style, Sort key
        • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, COPY command which allows parallelism
        • Know Redshift views to control access to data.
      • Amazon Machine Learning
      • Know Data Pipeline for data transfer
      • QuickSight
      • Know Glue as the ETL tool
    • Security, Identity & Compliance
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events more real time alerts as compared to CloudTrail
    • Storage
    • Compute
      • Know EC2 access to services using IAM Role and Lambda using Execution role.
      • Lambda esp. how to improve performance batching, breaking functions etc.

AWS Certified Big Data – Speciality (BDS-C00) Exam Resources

AWS Certified Solution Architect – Professional (SAP-C01) Exam Learning Path

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Learning Path

AWS Certified Solutions Architect – Professional (SAP-C01) exam is the upgraded pattern of the previous Solution Architect – Professional exam which was released last year (2018) and upgraded this year. I recently passed the latest pattern and difference is quite a lot between the previous pattern and the latest pattern. The amount of overlap between the associates and professional exams and even the Solutions Architect and DevOps has drastically reduced.

AWS Certified Solutions Architect – Professional (SAP-C01) exam basically validates

  • Design and deploy dynamically scalable, highly available, fault-tolerant, and reliable applications on AWS
  • Select appropriate AWS services to design and deploy an application based on given requirements
  • Migrate complex, multi-tier applications on AWS
  • Design and deploy enterprise-wide scalable operations on AWS
  • Implement cost-control strategies

Refer to AWS Certified Solutions Architect – Professional Exam Guide

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Summary

  • AWS Certified Solutions Architect – Professional (SAP-C01) exam was for a total of 170 minutes but it had 75 questions. The questions and answers options are quite long and there is a lot of reading that needs to be done, so be sure you are prepared and manage your time well. As always, mark the questions for review and move on and come back to them after you are done with all.
  • One of the key tactic I followed when solving any question was to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • AWS Certified Solutions Architect – Professional (SAP-C01) focuses a lot on concepts and services related to Scalability, High Availability, Disaster Recovery, Migration, Security and Cost Control.
  • Be sure to cover the following topics
    • Analytics
      • Understand Kinesis
        • Understand the difference between Kinesis Data Streams and Kinesis Firehose
      • Know Amazon Elasticsearch provides a managed solution
    • Integration Tools
      • Understand SQS in terms of loose coupling and scaling.
      • Know how CloudWatch integration with SNS and Lambda can help in notification

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Resources

AWS Certified Security – Speciality (SCS-C01) Exam Learning Path

I recently cleared the AWS Certified Security – Speciality (SCS-C01) with a score of 939/1000. If compared with the Advanced Networking – Speciality exam, the Security – Speciality was not as tough mainly cause it covers features and services which you would have used in your day to day working on AWS or services which have a clear demarcation of their purpose.

AWS Certified Security – Speciality (SCS-C01) exam is the focusing on the AWS Security and Compliance concepts. It basically validates

  • An understanding of specialized data classifications and AWS data protection mechanisms.
  • An understanding of data-encryption methods and AWS mechanisms to implement them.
  • An understanding of secure Internet protocols and AWS mechanisms to implement them.
  • A working knowledge of AWS security services and features of services to provide a secure production environment.
  • Competency gained from two or more years of production deployment experience using AWS security services and features.
  • The ability to make tradeoff decisions with regard to cost, security, and deployment complexity given a set of application requirements. An understanding of security operations and risks

Refer to AWS Certified Security – Speciality Exam Guide

AWS Certified Security – Speciality (SCS-C01) Exam Summary

  • AWS Certified Security – Speciality exam, as its name suggests, covers a lot of Security and compliance concepts for VPC, EBS, S3, IAM, KMS services
  • One of the key tactic I followed when solving any AWS exam is to read the question and use paper and pencil to draw a rough architecture and focus on the areas that you need to improve. Trust me, you will be able eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach to the right answer or atleast have a 50% chance of getting it right.
  • Be sure to cover the following topics
    • Security, Identity & Compliance
      • Make sure you know all the services and deep dive into IAM, KMS.
      • Identity and Access Management (IAM)
      • Deep dive into Key Management Service (KMS). There would be quite a few questions on this.
      • Understand AWS Cognito esp. User Pools
      • Know AWS GuardDuty as managed threat detection service
      • Know AWS Inspector as automated security assessment service that helps improve the security and compliance of applications deployed on AWS
      • Know Amazon Macie as a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS
      • Know AWS Artifact as a central resource for compliance-related information that provides on-demand access to AWS’ security and compliance reports and select online agreements
      • Know AWS Certificate Manager (ACM) for certificate management. (hint : To use an ACM Certificate with Amazon CloudFront, you must request or import the certificate in the US East (N. Virginia) region)
      • Know Cloud HSM as a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud
      • Know AWS Secrets Manager to protect secrets needed to access your applications, services, and IT resources. The service enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle
      • Know AWS Shield esp. the Shield Advanced option and the features it provides
      • Know WAF as Web Traffic Firewall – (Hint – WAF can be attached to your CloudFront, Application Load Balancer, API Gateway to dynamically detect and prevent attacks)
    • Networking & Content Delivery
      • Understand VPC
        • Understand VPC Endpoints esp. services supported by Gateway and Interface Endpoints. Interface Endpoints are also called Private Links. (hint: application endpoints can be exposed using private links)
        • Understand VPC Flow Logs to capture information about the IP traffic going to and from network interfaces in the VPC (hint: can help in port scans but not in packet inspection)
      • Know Virtual Private Network & Direct Connect to establish connectivity a secured, low latency access between on-premises data center and AWS VPC
      • Understand CloudFront esp. with S3 (hint: Origin Access Identity to restrict direct access to S3 content)
      • Know Elastic Load Balancer at high level esp. End to End encryption.
    • Management & Governance Tools
      • Understand AWS CloudWatch for Logs and Metrics. Also, CloudWatch Events more real time alerts as compared to CloudTrail
      • Understand CloudTrail for audit and governance (hint: CloudTrail can be enabled for all regions at one go and supports log file integrity validation)
      • Understand AWS Config and its use cases (hint: AWS Config rules can be used to alert for any changes and Config can be used to check the history of changes. AWS Config can also help check approved AMIs compliance)
      • Understand CloudTrail provides the WHO and Config provides the WHAT.
      • Understand Systems Manager
        • Systems Manager provide parameter store which can used to manage secrets (hint: using Systems Manager is cheaper than Secrets manager for storage if limited usage)
        • Systems Manager provides agent based and agentless mode. (hint: agentless does not track process)
        • Systems Manager Patch Manager helps select and deploy operating system and software patches automatically across large groups of EC2 or on-premises instances
        • Systems Manager Run Command provides safe, secure remote management of your instances at scale without logging into the servers, replacing the need for bastion hosts, SSH, or remote PowerShell
      • Understand AWS Organizations to control what member account can do. (hint: can also control the root accounts)
      • Know AWS Trusted Advisor
    • Storage
    • Compute
      • Know EC2 access to services using IAM Role and Lambda using Execution role.
    • Integration Tools
      • Know how CloudWatch integration with SNS and Lambda can help in notification (Topics are not required to be in detail)
    • Whitepapers and articles

AWS Certified Security – Speciality (SCS-C01) Exam Resources