AWS Certified Advanced Networking – Specialty ANS-C01 Exam Learning Path

I recently certified/recertified for the AWS Certified Advanced Networking – Specialty (ANS-C01). Frankly, Networking is something that I am still diving deep into, and I just about managed to get through. So a word of caution: this exam is in line with, or tougher than, the professional exams, especially because some of the networking concepts covered are not something you can easily get your hands dirty with.

AWS Certified Advanced Networking – Specialty ANS-C01 Exam Content

  • AWS Certified Advanced Networking – Specialty (ANS-C01) exam focuses on the AWS Networking concepts. It basically validates
    • Design and develop hybrid and cloud-based networking solutions by using AWS
    • Implement core AWS networking services according to AWS best practices
    • Operate and maintain hybrid and cloud-based network architecture for all AWS services
    • Use tools to deploy and automate hybrid and cloud-based AWS networking tasks
    • Implement secure AWS networks using AWS native networking constructs and services

Refer to the AWS Certified Advanced Networking – Specialty Exam Guide for the ANS-C01 exam domains.

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Resources

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Summary

  • Specialty exams are tough, lengthy, and tiresome. Most of the questions and answer options have a lot of prose and require a lot of reading, so be sure you are prepared and manage your time well.
  • ANS-C01 exam has 65 questions to be solved in 170 minutes, which gives you roughly 2.5 minutes per question. The 65 questions consist of 50 scored and 15 unscored questions.
  • ANS-C01 exam includes two types of questions, multiple-choice and multiple-response.
  • ANS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Each question mainly touches multiple AWS services.
  • Specialty exams currently cost $ 300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps you focus on the areas you need to improve. Trust me, you will be able to eliminate two answers for sure, and then only need to focus on the other two. Read those two closely to check where they differ; that helps you reach the right answer, or at least gives you a 50% chance of getting it right.
  • AWS exams can be taken either at a test center or online; I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Topics

  • AWS Certified Advanced Networking – Specialty (ANS-C01) exam focuses a lot on networking concepts involving Hybrid Connectivity with Direct Connect, VPN, Transit Gateway, and Direct Connect Gateway, plus a bit of VPC, Route 53, ALB, NLB & CloudFront.

Networking & Content Delivery

  • Virtual Private Cloud – VPC
    • Understand VPC, Subnets
    • AWS allows extending the VPC address space by adding secondary CIDR blocks
    • Understand Security Groups, NACLs
    • VPC Flow Logs
      • help capture information about the IP traffic going to and from network interfaces in the VPC and can help in monitoring the traffic or troubleshooting any connectivity issues
      • NACLs are stateless and how it is reflected in VPC Flow Logs
        • An ACCEPT record followed by a REJECT record means the inbound traffic was accepted by the Security Group and NACL, but the response was rejected by the NACL outbound.
        • A single REJECT record means the inbound traffic was rejected by either the Security Group or the NACL.
      • Use pkt-dstaddr instead of dstaddr to track the destination address as dstaddr refers to the primary ENI address always and not the secondary addresses.
      • Pattern: VPC Flow Logs -> CloudWatch Logs -> (Subscription) -> Kinesis Data Firehose -> S3/Open Search.
    • DHCP Option Sets esp. how to resolve DNS from both on-premises data center and AWS.
    • VPC Peering
      • provides point-to-point connectivity between 2 VPCs, which can be in the same or different regions and accounts.
      • know VPC Peering limitations, esp. that it does not support overlapping CIDRs or transitive routing.
    • Placement Groups determine how the instances are placed on the underlying hardware
    • VRF – Virtual Routing & Forwarding can be used to route traffic from multiple VPCs, even with overlapping CIDRs, to the same customer gateway.
  • VPC Endpoints
    • VPC Gateway Endpoints for connectivity with S3 & DynamoDB i.e. VPC -> VPC Gateway Endpoints -> S3/DynamoDB.
    • VPC Interface Endpoints or Private Links for other AWS services and custom hosted services i.e. VPC -> VPC Interface Endpoint OR Private Link -> S3/Kinesis/SQS/CloudWatch/Any custom endpoint.
    • S3 gateway endpoints cannot be accessed through VPC Peering, VPN, or Direct Connect. Need HTTP proxy to route traffic.
    • S3 Private Link can be accessed through VPC Peering, VPN, or Direct Connect. Need to use an endpoint-specific DNS name.
    • VPC endpoint policies can be configured to control which S3 buckets can be accessed, and the S3 bucket policy can be used to control which VPC (covering all its VPC endpoints) or which specific VPC endpoint can access it (see the bucket-policy sketch after this list).
    • Private Link Patterns
  • VPC Network Access Analyzer
    • helps identify unintended network access to the resources on AWS.
  • Transit Gateway
    • helps consolidate the AWS VPC routing configuration for a region with a hub-and-spoke architecture.
    • Appliance Mode ensures that network flows are symmetrically routed to the same AZ and network appliance
    • Transit Gateway Connect attachment can be used to connect SD-WAN to AWS Cloud. This supports GRE.
    • Transit Gateways are regional and Peering can connect Transit Gateways across regions.
    • Transit Gateway Network Manager includes events and metrics to monitor the quality of the global network, both in AWS and on-premises.
  • VPC Routing Priority
  • NAT Gateways
    • for HA, Scalable, Outgoing traffic. Does not support Security Groups or ICMP pings.
    • times out the connection if it is idle for 350 seconds or more. To prevent the connection from being dropped, initiate more traffic over the connection or enable TCP keepalive on the instance with a value of less than 350 seconds.
    • supports Private NAT Gateways for internal communication.
  • Virtual Private Network
    • to establish connectivity between the on-premises data center and AWS VPC
  • Direct Connect
    • to establish connectivity between the on-premises data center and AWS VPC and Public Services
    • Direct Connect connections – Dedicated and Hosted connections
    • Understand how to create a Direct Connect connection
      • LOA-CFA provides the details for partners to connect to the AWS Direct Connect location
    • Virtual interfaces options – Private Virtual Interface for VPC resources and Public Virtual Interface for Public Resources
      • Private VIF is for resources within a VPC
      • Public VIF is for AWS public resources
      • Private VIF has a limit of 100 routes and Public VIF of 1000 routes. Summarize the routes if you need to configure more.
    • Understand setup Private and Public VIF
    • Understand High Availability options based on cost and time i.e. Second Direct Connect connection OR VPN connection
    • Direct Connect Gateway
      • it provides a way to connect to multiple VPCs from an on-premises data center using the same Direct Connect connection.
      • can connect to VGW or TGW.
    • Understand Active/Passive Direct Connect 
    • supports MACsec which delivers native, near line-rate, point-to-point encryption ensuring that data communications between AWS and the data center, office, or colocation facility remain protected.
    • Understand Route Propagation, propagation priority, BGP connectivity
      • BGP prefers the shortest AS PATH to get to the destination. Traffic from the VPC to on-premises uses the primary router. This is because the secondary router advertises a longer AS-PATH.
      • AS PATH prepending doesn’t work when the Direct Connect connections are in different AWS Regions than the VPC.
      • AS PATH prepending influences traffic from AWS to on-premises, while Local Preference (configured on the customer routers) influences traffic from on-premises to AWS.
      • Use Local Preference BGP community tags to configure Active/Passive when the connections are from different regions. A higher community has a higher preference: 7224:7300 > 7224:7200 > 7224:7100.
      • NO_EXPORT works only for Public VIFs
      • 7224:9100, 7224:9200, and 7224:9300 apply only to public prefixes. Usually used to restrict traffic to regions. Can help control if routes should propagate to the local Region only, all Regions within a continent, or all public Regions.
        • 7224:9100 — Local AWS Region
        • 7224:9200 — All AWS Regions for a continent (e.g., North America-wide; Asia Pacific; Europe, the Middle East and Africa)
        • 7224:9300 — Global (all public AWS Regions)
      • 7224:8100 — Routes that originate from the same AWS Region in which the AWS Direct Connect point of presence is associated.
      • 7224:8200 — Routes that originate from the same continent with which the AWS Direct Connect point of presence is associated.
      • No-tag — Global (all public AWS Regions).
  • Route 53
    • provides a highly available and scalable DNS web service.
    • Routing Policies and their use cases; focus on Weighted, Latency, and Failover routing policies.
    • supports Alias resource record sets, which enables routing of queries to a CloudFront distribution, Elastic Beanstalk, ELB, an S3 bucket configured as a static website, or another Route 53 resource record set.
    • CNAME does not support zone apex or root records. 
    • Route 53 DNSSEC
      • secures DNS traffic and helps protect a domain from DNS spoofing and man-in-the-middle attacks.
      • Requirements
        • Asymmetric Customer Managed Keys
        • us-east-1 with ECC_NIST_P256 spec
    • Route 53 Resolver DNS Firewall
      • protection for outbound DNS requests from the VPCs and can monitor and control the domains that the applications can query.
      • allows you to define allow and deny lists.
      • can be used to help detect and block DNS exfiltration.
      • supports FirewallFailOpen configuration which determines how Route 53 Resolver handles queries during failures.
        • disabled, favors security over availability and blocks queries that it is unable to evaluate properly.
        • enabled, favors availability over security and allows queries to proceed if it is unable to properly evaluate them.
    • Route 53 Resolver (Hybrid DNS)
      • Inbound Endpoint for On-premises -> AWS
      • Outbound Endpoint for AWS -> On-premises
    • Route 53 DNS Query Logging
      • Can be logged to CloudWatch logs, S3, and Kinesis Data Firehose
    • Route 53 Resolver rules take precedence over private hosted zones.
    • Route 53 Split View DNS helps to have the same DNS to access a site externally and internally
    • Know the Domain Migration process
  • CloudFront
    • provides a fully managed, fast CDN service that speeds up the distribution of static, dynamic web, or streaming content to end-users.
    • supports geo-restriction, WAF & AWS Shield for protection.
    • provides CloudFront Functions (run at edge locations) & Lambda@Edge (run at regional edge caches) to execute scripts closer to the user.
    • supports encryption at rest and end-to-end encryption
    • CloudFront Origin Shield
      • helps improve the cache hit ratio and reduce the load on the origin.
      • requests from other regional caches would hit the Origin shield rather than the Origin.
      • should be placed at the regional cache and not in the edge cache
      • should be deployed to the region closer to the origin server
  • Global Accelerator
    • provides 2 static IPs
    • does not support client IP address preservation for NLB and Elastic IP address endpoints.
    • does not support IPv6 address
    • know CloudFront vs Global Accelerator
  • Understand ELB, ALB and NLB
    • Differences between ALB and NLB
    • ALB provides Content, Host, and Path-based Routing while NLB provides the ability to have a static IP address
    • Maintain the original client IP to the backend instances using X-Forwarded-For (ALB) or Proxy Protocol (NLB/CLB)
    • ALB/NLB do not support TLS renegotiation or mutual TLS authentication (mTLS). For implementing mTLS, use NLB with a TCP listener on port 443 and terminate TLS on the instances.
    • NLB
      • also provides local zonal endpoints to keep the traffic within AZ
      • can front Private Link endpoints and provide static IPs.
    • ALB supports Forward Secrecy, through Security Policies, that provide additional safeguards against the eavesdropping of encrypted data, through the use of a unique random session key.
    • Supports sticky session feature (session affinity) to enable the LB to bind a user’s session to a specific target. This ensures that all requests from the user during the session are sent to the same target. Sticky Sessions is configured on the target groups.
  • Gateway Load Balancer – GWLB
    • helps deploy, scale, and manage virtual appliances, such as firewalls, IDS/IPS systems, and deep packet inspection systems.
  • Athena integrates with S3 only and not with CloudWatch logs.
  • Transit VPC
    • helps connect multiple, geographically dispersed VPCs and remote networks to create a global network transit center.
    • Use Transit Gateway instead now.
  • Know CloudHub and its use case
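
To make the S3 bucket policy + VPC endpoint combination concrete, here is a minimal boto3 sketch (the bucket name and endpoint ID are hypothetical) that denies all access to a bucket unless the request arrives through a specific VPC Gateway Endpoint:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and VPC endpoint IDs; replace with your own.
BUCKET = "example-private-bucket"
VPCE_ID = "vpce-0123456789abcdef0"

# Deny any access to the bucket unless the request arrives through the
# specified VPC Gateway Endpoint (aws:sourceVpce condition key).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessUnlessFromVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": VPCE_ID}},
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```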

Security

  • AWS GuardDuty
    • managed threat detection service
    • provides Malware protection
  • AWS Shield
    • managed DDoS protection service
    • AWS Shield Advanced provides 24×7 access to the AWS Shield Response Team (SRT), protection against DDoS-related spikes, and DDoS cost protection to safeguard against scaling charges.
  • WAF as Web Traffic Firewall
    • helps protect web applications from attacks by allowing configuration of rules that allow, block, or monitor (count) web requests based on defined conditions (see the rate-limiting sketch after this list).
    • integrates with CloudFront, ALB, API Gateway to dynamically detect and prevent attacks
  • Network Firewall
  • AWS Inspector
    • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities
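
As a concrete illustration of WAF rule configuration, here is a minimal boto3 sketch (the names and the rate limit are hypothetical) that creates a Web ACL with a rate-based rule blocking any source IP that exceeds 2,000 requests in a 5-minute window:

```python
import boto3

# WAFv2 client; Scope="REGIONAL" targets ALB/API Gateway in the chosen region,
# Scope="CLOUDFRONT" requires us-east-1.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

wafv2.create_web_acl(
    Name="example-web-acl",                      # hypothetical name
    Scope="REGIONAL",
    DefaultAction={"Allow": {}},                 # allow by default ...
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,
            # ... but block (count is also possible) any source IP that exceeds
            # 2000 requests within a 5-minute window.
            "Statement": {
                "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
            },
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "rate-limit-per-ip",
            },
        }
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "example-web-acl",
    },
)
```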

Monitoring & Management Tools

  • Understand AWS CloudFormation esp. in terms of Network creation.
    • Custom resources can be used to handle activities not natively supported by CloudFormation.
    • While configuring VPN connections, use DependsOn so that route propagation (or routes pointing at the VPN gateway) depends on the VPC-gateway attachment, as the propagation only works once the VPN gateway is attached to the VPC (see the sketch after this list).
  • AWS Config
    • fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
    • can be used to monitor resource changes e.g. Security Groups and invoke Systems Manager Automation scripts for remediation.
  • CloudTrail for audit and governance
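
A minimal CloudFormation fragment for the VPN route-propagation dependency mentioned above might look like the following (shown as a Python string for reference; the Vpc, VpnGateway, and PrivateRouteTable resources are assumed to be defined elsewhere in the same template):

```python
# Minimal CloudFormation fragment showing the explicit DependsOn needed so that
# VPN gateway route propagation waits for the VPC-gateway attachment.
VPN_ROUTING_FRAGMENT = """
Resources:
  VpnGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref Vpc
      VpnGatewayId: !Ref VpnGateway

  VpnRoutePropagation:
    Type: AWS::EC2::VPNGatewayRoutePropagation
    DependsOn: VpnGatewayAttachment   # propagation only works once the VGW is attached
    Properties:
      RouteTableIds:
        - !Ref PrivateRouteTable
      VpnGatewayId: !Ref VpnGateway
"""

# The fragment would be merged into the full template and deployed with, for example,
# boto3.client("cloudformation").create_stack(StackName=..., TemplateBody=...).
```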

Integration Tools

Networking Architecture Patterns

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Day

  • Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you would not be allowed to take the exam if you are late by more than 30 minutes.
    • Make sure you have your desk clear, no watches or external monitors, keep your phones away, and ensure nobody can enter the room.

Finally, All the Best 🙂

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Learning Path

NOTE – Refer to SAP-C02 Learning Path

  • AWS Certified Solutions Architect – Professional (SAP-C01) exam is the upgraded pattern of the previous Solutions Architect – Professional exam, which was released in 2018 and is due to be upgraded again this year (Nov. 2022).
  • I recently recertified on the existing pattern, and the difference between the previous and the latest pattern is quite large. The amount of overlap between the Associate and Professional exams, and even between the Solutions Architect and DevOps exams, has drastically reduced.

AWS Certified Solutions Architect – Professional (SAP-C01) exam basically validates

  • Design and deploy dynamically scalable, highly available, fault-tolerant, and reliable applications on AWS
  • Select appropriate AWS services to design and deploy an application based on given requirements
  • Migrate complex, multi-tier applications on AWS
  • Design and deploy enterprise-wide scalable operations on AWS
  • Implement cost-control strategies

Refer to AWS Certified Solutions Architect – Professional Exam Guide


AWS Certified Solutions Architect – Professional (SAP-C01) Exam Resources

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Summary

  • AWS Certified Solutions Architect – Professional (SAP-C01) exam was for a total of 170 minutes and it had 75 questions.
  • AWS Certified Solutions Architect – Professional (SAP-C01) focuses a lot on concepts and services related to Architecture & Design, Scalability, High Availability, Disaster Recovery, Migration, Security and Cost Control.
  • Each question mainly touches multiple AWS services.
  • Questions and answer options have a lot of prose and require a lot of reading, so be sure you are prepared and manage your time well.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps you focus on the areas you need to improve. Trust me, you will be able to eliminate two answers for sure, and then only need to focus on the other two. Read those two closely to check where they differ; that helps you reach the right answer, or at least gives you a 50% chance of getting it right.

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Topics

Storage

  • S3
    • S3 Permissions & S3 Data Protection
      • S3 bucket policies to control access to VPC Endpoints
    • S3 Storage Classes & Lifecycle policies
      • cover S3 Standard, Infrequent Access, Intelligent-Tiering, and Glacier for archival, with object transitions & expirations for cost management (see the lifecycle sketch after this list).
    • S3 Transfer Acceleration can be used for fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
    • supports same-region and cross-region replication for disaster recovery.
    • integrates with CloudFront for caching to improve performance
    • S3 supports Object Lock and Glacier supports Vault lock to prevent the deletion of objects, especially required for compliance requirements.
    • supports S3 Select feature to query selective data from a single object.
  • Elastic Block Store
    • EBS Backup using snapshots for HA and Disaster recovery
    • Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
  • Storage Gateway
  • Elastic File System
    • provides a fully managed, scalable, serverless, shared and cost-optimized file storage for use with AWS and on-premises resources.
    • supports cross-region replication for disaster recovery
    • supports storage classes like S3
  • AWS Transfer Family
    • provides a secure transfer service (FTP, SFTP, FTPS) that helps transfer files into and out of AWS storage services.
    • supports transferring data from or to S3 and EFS.
  • FSx for Lustre
    • managed, cost-effective service to launch and run the high-performance Lustre file system for HPC workloads.
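
As a concrete example of lifecycle-based cost management, here is a minimal boto3 sketch (the bucket name, prefix, and transition days are hypothetical) that transitions objects to Infrequent Access, then Glacier, and finally expires them:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical lifecycle policy: move objects to Infrequent Access after 30 days,
# archive to Glacier after 90 days, and expire them after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",            # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```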

Database

  • DynamoDB
    • DynamoDB Auto Scaling
    • DynamoDB Streams for tracking changes
    • TTL to expire items automatically and cost-effectively (see the sketch after this list).
    • Global tables for multi-master, active-active inter-region storage needs.
    • Global tables do not support strong global consistency
    • DynamoDB Accelerator – DAX for seamless caching to reduce the load on DynamoDB for read-heavy requirements.
  • RDS
    • supports cross-region read replicas ideal for disaster recovery with low RTO and RPO.
    • provides RDS Proxy for efficient database connection pooling
    • RDS Multi-AZ vs Read Replicas
  • Aurora
    • fully managed, MySQL- and PostgreSQL-compatible, relational database engine
    • supports Aurora Serverless for an on-demand, auto-scaling configuration
    • Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions. It is not a multi-master setup, but a secondary region can be promoted for disaster recovery.
  • DocumentDB as a replacement for MongoDB
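
A minimal boto3 sketch of DynamoDB TTL (the table and attribute names are hypothetical): enable TTL on the table and write an item carrying an epoch expiry timestamp that DynamoDB uses to delete it automatically:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "example-sessions"        # hypothetical table name

# Turn on TTL so DynamoDB deletes items automatically (at no extra cost)
# once the epoch timestamp in the "expires_at" attribute has passed.
dynamodb.update_time_to_live(
    TableName=TABLE,
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write an item that expires roughly 24 hours from now.
dynamodb.put_item(
    TableName=TABLE,
    Item={
        "session_id": {"S": "abc-123"},
        "expires_at": {"N": str(int(time.time()) + 24 * 60 * 60)},
    },
)
```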

Data Migration & Transfer

  • Cloud Migration Services
    • Cloud Migration (hint: make sure you understand the difference between rehost, replatform, and rearchitect)
    • Server Migration Service helps to migrate servers and applications.
    • Database Migration Service
      • enables quick and secure data migration with minimal to zero downtime
      • supports Full and Change Data Capture – CDC migration to support continuous replication for zero downtime migration.
      • homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations (using SCT) between different database platforms, such as Oracle or Microsoft SQL Server to Aurora.
      • Hint: Elasticsearch is not supported as a target by DMS
    • Snow Family
      • Ideal for one-time big data transfers usually for use cases with limited bandwidth from on-premises to AWS.
  • Application Discovery Service
    • Agent-based discovery can be used for Hyper-V and physical servers.
    • Agentless discovery can be used for VMware but does not track running processes.
  • Disaster Recovery
    • Disaster Recovery whitepaper – although outdated, make sure you understand the difference between each type, esp. pilot light and warm standby, w.r.t. RTO and RPO.
    • Compute
      • Make compute components available in an alternate region, either as:
        • AMIs that can be restored,
        • CloudFormation templates to create the infra as needed,
        • partially running compute that can be scaled up once failover happens, or
        • fully running compute in an active-active configuration with health checks.
    • Storage
      • S3 and EFS support cross-region replication
      • DynamoDB supports Global tables for multi-master, active-active inter-region storage needs.
      • Aurora Global Database is not a multi-master setup, but a secondary region can be promoted for disaster recovery.
      • RDS supports cross-region read replicas which can be promoted to master in case of a disaster. The failover can be automated using Route 53, CloudWatch, and Lambda functions (see the sketch after this list).
    • Network
      • Route 53 failover routing with health checks to failover across regions.
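
For the cross-region read replica failover mentioned above, a minimal boto3 sketch (the region and instance identifiers are hypothetical) could promote the replica in the DR region and wait for it to become available before repointing the application:

```python
import boto3

# In a regional outage, promote the cross-region read replica in the DR region
# to a standalone, writable instance.
rds_dr = boto3.client("rds", region_name="us-west-2")

rds_dr.promote_read_replica(
    DBInstanceIdentifier="myapp-db-replica",
    BackupRetentionPeriod=7,          # enable automated backups on the promoted instance
)

# Wait until the promoted instance is available, then repoint the application,
# e.g. by updating the DNS record the app uses for its DB endpoint.
rds_dr.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="myapp-db-replica"
)
```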

Networking & Content Delivery

  • VPC – Virtual Private Cloud
    • Understand Security Groups, NACLs (Hint: know NACLs are stateless and need to open ephemeral ports for response traffic )
    • Understand VPC Gateway Endpoints to provide access to S3 and DynamoDB (hint: know how to restrict access on S3 to specific VPC Endpoint)
    • Understand VPC Interface Endpoints or PrivateLink to provide access to a variety of services like SQS, Kinesis or Private APIs exposed through NLB.
    • Understand VPC Flow Logs
    • Understand VPC Peering to enable communication between VPCs within the same or different regions. (hint: VPC peering does not support transitive routing)
  • Route 53
    • Routing Policies
      • focus on Weighted, Latency, and Failover routing policies
      • Failover routing provides an active-passive configuration for disaster recovery, while the others are active-active configurations (see the sketch after this list).
    • Route 53 Resolver
      • Outbound endpoint for AWS -> On-premises DNS query resolution
      • Inbound endpoint for On-premises -> AWS DNS query resolution
  • CloudFront
    • fully managed, fast CDN service that speeds up the distribution of static, dynamic web or streaming content to end-users.
    • supports multiple origins including S3, ALB etc.
    • does not support Auto Scaling as an origin
    • supports Geo-restriction
    • supports Lambda@Edge and CloudFront Functions to execute code closer to the user.
    • Lambda@Edge can be used for quick auth checks, and redirect users based on request data.
    • Security can be enhanced by whitelisting CloudFront IPs or by adding a custom header in CloudFront and verifying it at the ALB.
  • API Gateway
    • supports throttling, caching and helps define usage plans with API keys to identify clients
    • provides regional and edge-optimized endpoint types
    • supports authentication mechanisms, such as AWS IAM policies, Lambda authorizer functions, and Amazon Cognito user pools.
  • Load Balancer – ELB, ALB and NLB 
  • Global Accelerator
    • optimizes the path to applications to keep packet loss, jitter, and latency consistently low.
    • helps improve the performance of the applications by lowering first-byte latency
    • provides 2 static IP address
    • does not preserve the client’s IP address with NLB
  • Transit Gateway or Transit VPC
    • is a network transit hub that can be used to interconnect VPCs and on-premises networks via Direct Connect or VPN.
    • Transit Gateway is regional and Transit Gateway Peering needs to be configured to peer regional Transit gateways.
  • Placement Groups
    • Cluster placement group with Enhanced Networking for HPC
    • Spread placement group for fault tolerance and high availability.
  • Direct Connect & VPN
    • provide on-premises to AWS connectivity
    • know Direct Connect vs VPN
    • VPN can provide a cost-effective, quick failover for Direct Connect.
    • VPN over Direct Connect provides a secure dedicated connection and requires a public virtual interface.
    • Direct Connect Gateway is a global network device that helps establish connectivity that spans VPCs spread across multiple AWS Regions with a single Direct Connect connection.
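
A minimal boto3 sketch of Route 53 failover routing (the hosted zone ID, health check ID, record name, and endpoints are hypothetical), creating PRIMARY and SECONDARY records so traffic fails over when the primary health check goes unhealthy:

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical hosted zone, health check, and load balancer DNS names.
HOSTED_ZONE_ID = "Z0123456789ABCDEFGHIJ"
PRIMARY_HEALTH_CHECK_ID = "11111111-2222-3333-4444-555555555555"

def failover_record(failover, value, health_check_id=None):
    record = {
        "Name": "app.example.com.",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": f"app-{failover.lower()}",
        "Failover": failover,                      # PRIMARY or SECONDARY
        "ResourceRecords": [{"Value": value}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id  # primary fails over when unhealthy
    return {"Action": "UPSERT", "ResourceRecordSet": record}

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            failover_record("PRIMARY", "primary-alb.us-east-1.elb.amazonaws.com",
                            PRIMARY_HEALTH_CHECK_ID),
            failover_record("SECONDARY", "dr-alb.us-west-2.elb.amazonaws.com"),
        ]
    },
)
```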

Security, Identity & Compliance

  • AWS Identity and Access Management
  • AWS Shield & Shield Advanced
    • for DDoS protection and integrates with Route 53, CloudFront, ALB and Global Accelerator.
  • AWS WAF
    • protects from common attack techniques like SQL injection and Cross-Site Scripting (XSS); conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
    • integrates with CloudFront, ALB, and API Gateway.
    • supports Web ACLs and can block traffic based on IPs, Rate limits, and specific countries as well.
  • ACM – AWS Certificate Manager
    • helps easily provision, manage, and deploy public and private SSL/TLS certificates
    • is regional, so you need to request certificates in each region and associate them individually.
    • does not provide certificates for EC2 instances.
  • AWS KMS – Key Management Service
    • managed encryption service that allows the creation and control of encryption keys to enable data encryption.
    • KMS Multi-region keys
      • are AWS KMS keys in different AWS Regions that can be used interchangeably – as though having the same key in multiple Regions.
      • are not global and each multi-region key needs to be replicated and managed independently.
  • Secrets Manager
    • helps protect secrets needed to access applications, services, and IT resources.
    • Secrets Manager vs SSM Parameter Store.
      • Supports automatic rotation of secrets, which is not provided by SSM Parameter Store (see the sketch after this list).
      • Costs more than SSM Parameter Store.
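
As a rough sketch of Secrets Manager rotation (the secret name, value, and rotation Lambda ARN are placeholders), create a secret and enable automatic rotation every 30 days:

```python
import boto3

secrets = boto3.client("secretsmanager")

# Create a secret (name and value are hypothetical).
secrets.create_secret(
    Name="prod/myapp/db-credentials",
    SecretString='{"username": "app", "password": "change-me"}',
)

# Enable automatic rotation every 30 days using a rotation Lambda function
# (the Lambda ARN is a placeholder; Secrets Manager provides rotation
# function templates for RDS and other databases).
secrets.rotate_secret(
    SecretId="prod/myapp/db-credentials",
    RotationLambdaARN="arn:aws:lambda:us-east-1:123456789012:function:rotate-db-secret",
    RotationRules={"AutomaticallyAfterDays": 30},
)
```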

Compute

  • EC2
  • Auto Scaling
  • Elastic Beanstalk supports Blue/Green deployment using swap URLs.
  • Lambda
    • Lambda running in a VPC requires a NAT Gateway to communicate with external public services
    • Lambda CPU can be increased only by increasing memory
    • a reserved concurrency limit can be defined for a function to cap its concurrency and reduce the impact on downstream resources
    • Lambda Aliases support canary deployments through weighted routing across versions (see the sketch after this list)
  • ECS – Elastic Container Service
    • container management service that supports Docker containers
    • supports two launch types – EC2 and Fargate which provides the serverless capability
    • For least privilege, the role should be assigned to the Task.
    • awsvpc network mode gives ECS tasks the same networking properties as EC2 instances.
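
A minimal boto3 sketch of a Lambda alias canary (the function name, alias, and version numbers are hypothetical): publish a new version and shift 10% of the alias traffic to it:

```python
import boto3

lambda_client = boto3.client("lambda")
FUNCTION = "my-api-handler"        # hypothetical function name

# Publish the code currently at $LATEST as a new immutable version.
new_version = lambda_client.publish_version(FunctionName=FUNCTION)["Version"]

# Point the "live" alias mostly at the current stable version (here "3"),
# but send 10% of invocations to the newly published version as a canary.
# (Assumes the "live" alias already exists.)
lambda_client.update_alias(
    FunctionName=FUNCTION,
    Name="live",
    FunctionVersion="3",
    RoutingConfig={"AdditionalVersionWeights": {new_version: 0.10}},
)
```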

Management & Governance tools

  • AWS Organizations
  • Systems Manager
    • AWS Systems Manager and its various services like parameter store, patch manager
    • Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secrets management. It does not support automatic secrets rotation; use Secrets Manager for that (see the sketch after this list).
    • Session Manager helps manage EC2 instances through an interactive one-click browser-based shell or through the AWS CLI without opening ports or creating bastion hosts.
    • Patch Manager helps automate the process of patching managed instances with both security-related and other types of updates.
  • CloudWatch
  • CloudTrail
    • for audit and governance
    • With Organizations, the trail can be configured to log CloudTrail from all accounts to a central account.
  • CloudFormation
    • Handle disaster Recovery by automating the infra to replicate the environment across regions.
    • DeletionPolicy to retain, snapshot (back up), or delete resources like RDS instances and EBS volumes when the stack is deleted
    • Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update. Stack Policy only applies for Stack updates and not stack deletion.
    • StackSets helps to create, update, or delete stacks across multiple accounts and Regions with a single operation.
  • Control Tower
    • to setup, govern, and secure a multi-account environment
    • strongly recommended guardrails cover EBS encryption
  • Service Catalog
    • allows organizations to create and manage catalogues of IT services that are approved for use on AWS with minimal permissions.
  • Trusted Advisor
    • helps with cost optimization and service limits in addition to security, performance and fault tolerance.
  • Compute Optimizer recommends optimal AWS resources for the workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics.
  • AWS Budgets to see usage-to-date and current estimated charges from AWS, set limits and provide alerts or notifications.
  • Cost Allocation Tags can be used to organize AWS resources, and cost allocation tags to track the AWS costs on a detailed level.
  • Cost Explorer helps visualize, understand, manage and forecast the AWS costs and usage over time.
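
A minimal boto3 sketch of Parameter Store usage (the parameter name and value are hypothetical): store a SecureString parameter and read it back with decryption:

```python
import boto3

ssm = boto3.client("ssm")

# Store a configuration value encrypted with the account's default KMS key.
ssm.put_parameter(
    Name="/myapp/prod/db_password",
    Value="change-me",
    Type="SecureString",
    Overwrite=True,
)

# Read it back, decrypting the SecureString transparently.
value = ssm.get_parameter(
    Name="/myapp/prod/db_password", WithDecryption=True
)["Parameter"]["Value"]
```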

Analytics

Integration Tools

  • SQS in terms of loose coupling and scaling.
    • Difference between SQS Standard and FIFO esp. with throughput and order
    • SQS supports dead letter queues (see the sketch after this list)
  • CloudWatch integration with SNS and Lambda for notifications.
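
A minimal boto3 sketch of an SQS dead letter queue setup (the queue names and limits are hypothetical): create the DLQ and attach it to the main queue via a redrive policy:

```python
import json
import boto3

sqs = boto3.client("sqs")

# Dead-letter queue that receives messages that repeatedly fail processing.
dlq_url = sqs.create_queue(QueueName="orders-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives, SQS moves the message to the DLQ.
sqs.create_queue(
    QueueName="orders",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
        "VisibilityTimeout": "60",
    },
)
```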

Architecture & Design Flows

Google Cloud – Professional Cloud DevOps Engineer Certification learning path

Continuing on the Google Cloud Journey, glad to have passed the 8th certification with the Professional Cloud DevOps Engineer certification. Google Cloud – Professional Cloud DevOps Engineer certification exam focuses on almost all of the Google Cloud DevOps services with Cloud Developer tools, Operations Suite, and SRE concepts.

Google Cloud – Professional Cloud DevOps Engineer Certification Summary

  • Had 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on DevOps toolset including Cloud Developer tools, Operations Suite with a focus on monitoring and logging, and SRE concepts.
  • The exam has been updated to use
    • Cloud Operations, Cloud Monitoring & Logging and does not refer to Stackdriver in any of the questions.
    • Artifact Registry instead of Container Registry.
  • There are no case studies for the exam.
  • As mentioned for all the exams, hands-on is a MUST; if you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands.
  • I did Coursera and A Cloud Guru, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud DevOps Engineer Certification Resources

Google Cloud – Professional Cloud DevOps Engineer Certification Topics

Developer Tools

  • Google Cloud Build
    • Cloud Build integrates with Cloud Source Repositories, GitHub, and GitLab and can be used for Continuous Integration and Deployments.
    • Cloud Build can import source code, execute build to the specifications, and produce artifacts such as Docker containers or Java archives
    • Cloud Build can trigger builds on source commits in Cloud Source Repositories or other git repositories.
    • Cloud Build build config file specifies the instructions to perform, with steps defined for each task like test, build, and deploy (see the sketch after this list).
    • Cloud Build step specifies an action to be performed and is run in a Docker container.
    • Cloud Build supports custom images as well for the steps
    • Cloud Build integrates with Pub/Sub to publish messages on build’s state changes.
    • Cloud Build can trigger the Spinnaker pipeline through Cloud Pub/Sub notifications.
    • Cloud Build should use a Service Account with a Container Developer role to perform deployments on GKE
    • Cloud Build uses a directory named /workspace as a working directory and the assets produced by one step can be passed to the next one via the persistence of the /workspace directory.
  • Binary Authorization and Vulnerability Scanning
    • Binary Authorization provides software supply-chain security for container-based applications. It enables you to configure a policy that the service enforces when an attempt is made to deploy a container image on one of the supported container-based platforms.
    • Binary Authorization uses attestations to verify that an image was built by a specific build system or continuous integration (CI) pipeline.
    • Vulnerability scanning helps scan images for vulnerabilities by Container Analysis.
    • Hint: For Security and compliance reasons if the image deployed needs to be trusted, use Binary Authorization
  • Google Source Repositories
    • Cloud Source Repositories are fully-featured, private Git repositories hosted on Google Cloud.
    • Cloud Source Repositories can be used for collaborative, version-controlled development of any app or service
    • Hint: If the code needs to be versioned controlled and needs collaboration with multiple members, choose Git related options
  • Google Container Registry/Artifact Registry
    • Google Artifact Registry supports all types of artifacts as compared to Container Registry which was limited to container images
    • Container Registry is not referred to in the exam
    • Artifact Registry supports both regional and multi-regional repositories
  • Google Cloud Code
    • Cloud Code helps write, debug, and deploy the cloud-based applications for IntelliJ, VS Code, or in the browser.
  • Google Cloud Client Libraries
    • Google Cloud Client Libraries provide client libraries and SDKs in various languages for calling Google Cloud APIs.
    • If the language is not supported, Cloud Rest APIs can be used.
  • Deployment Techniques
    • Recreate deployment – fully scale down the existing application version before you scale up the new application version.
    • Rolling update – update a subset of running application instances instead of simultaneously updating every application instance
    • Blue/Green deployment – (also known as a red/black deployment), you perform two identical deployments of your application
    • GKE supports Rolling and Recreate deployments.
      • Rolling deployments support maxSurge (new pods would be created) and maxUnavailable (existing pods would be deleted)
    • Managed Instance Groups support rolling deployments using the maxSurge (new instances would be created) and maxUnavailable (existing instances would be taken out of service) configurations
  • Testing Strategies
    • Canary testing – partially roll out a change and then evaluate its performance against a baseline deployment
    • A/B testing – test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.
  • Spinnaker
    • Spinnaker supports Blue/Green rollouts by dynamically enabling and disabling traffic to a particular Kubernetes resource.
    • Spinnaker recommends comparing canary against an equivalent baseline, deployed at the same time instead of production deployment.
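
To make the Cloud Build config file structure concrete, here is a minimal cloudbuild.yaml sketch (shown as a Python string for reference; the image names, repository, and cluster details are hypothetical) with test, build, and GKE deploy steps that all share the /workspace directory:

```python
# Each step runs in its own container image but shares /workspace, so artifacts
# produced by one step are visible to the next.
CLOUDBUILD_YAML = """
steps:
  # 1. Run unit tests
  - name: 'python:3.11'
    entrypoint: 'bash'
    args: ['-c', 'pip install -r requirements.txt && pytest']
  # 2. Build the container image from the source checked out in /workspace
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA', '.']
  # 3. Deploy to GKE (the Cloud Build service account needs the Kubernetes Engine /
  #    Container Developer role)
  - name: 'gcr.io/cloud-builders/kubectl'
    args: ['set', 'image', 'deployment/my-app',
           'my-app=us-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA']
    env:
      - 'CLOUDSDK_COMPUTE_ZONE=us-central1-a'
      - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'
images:
  - 'us-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA'
"""
```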

Cloud Operations Suite

  • Cloud Operations Suite provides monitoring, alerting, error reporting, metrics, diagnostics, debugging, and tracing.
  • Google Cloud Monitoring or Stackdriver Monitoring
    • Cloud Monitoring helps gain visibility into the performance, availability, and health of your applications and infrastructure.
    • Cloud Monitoring Agent/Ops Agent helps capture additional metrics like Memory utilization, Disk IOPS, etc.
    • Cloud Monitoring supports log exports where the logs can be sunk to Cloud Storage, Pub/Sub, BigQuery, or an external destination like Splunk.
    • Cloud Monitoring API supports pushing or exporting custom metrics (see the sketch after this list)
    • Uptime checks help check if the resource responds. It can check the availability of any public service on VM, App Engine, URL, GKE, or AWS Load Balancer.
    • Process health checks can be used to check if any process is healthy
  • Google Cloud Logging or Stackdriver logging
    • Cloud Logging provides real-time log management and analysis
    • Cloud Logging allows ingestion of custom log data from any source
    • Logs can be exported by configuring log sinks to BigQuery, Cloud Storage, or Pub/Sub.
    • Cloud Logging Agent can be installed for logging and capturing application logs.
    • Cloud Logging Agent uses fluentd and fluentd filter can be applied to filter, modify logs before being pushed to Cloud Logging.
    • VPC Flow Logs helps record network flows sent from and received by VM instances.
    • Cloud Logging Log-based metrics can be used to create alerts on logs.
    • Hint: If the logs from VM do not appear on Cloud Logging, check if the agent is installed and running and it has proper permissions to write the logs to Cloud Logging.
  • Cloud Error Reporting
    • counts, analyzes and aggregates the crashes in the running cloud services
  • Cloud Profiler
    • Cloud Profiler allows for monitoring of system resources like CPU and memory on both GCP and on-premises resources.
  • Cloud Trace
    • is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Debugger
    • is a feature of Google Cloud that lets you inspect the state of a running application in real-time, without stopping or slowing it down
    • Debug Logpoints allow logging injection into running services without restarting or interfering with the normal function of the service
    • Debug Snapshots help capture local variables and the call stack at a specific line location in your app’s source code
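
A minimal sketch of pushing a custom metric through the Cloud Monitoring API with the Python client (the project ID, metric type, and resource labels are hypothetical):

```python
import time
from google.cloud import monitoring_v3

project_id = "my-project"                         # hypothetical project
client = monitoring_v3.MetricServiceClient()

# Describe the custom metric and the monitored resource it is written against.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/checkout/queue_depth"
series.resource.type = "gce_instance"
series.resource.labels["instance_id"] = "1234567890123456789"
series.resource.labels["zone"] = "us-central1-a"

# Attach a single data point stamped with the current time.
now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
series.points = [point]

client.create_time_series(name=f"projects/{project_id}", time_series=[series])
```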

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are covered lightly, mostly from the security aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for computing and provides fine-grained control
    • Preemptible VMs and their use cases. HINT – use for short term needs
    • Committed Use Discounts – CUD help provide cost benefits for long-term, stable, and predictable usage.
    • Managed Instance Group can help scale VMs as per the demand. It also helps provide auto-healing and high availability with health checks, in case an application fails.
  • Google Kubernetes Engine
    • GKE can be scaled using
      • Cluster AutoScaler to scale the cluster
      • Vertical Pod Autoscaler to scale the pods with increasing resource needs
      • Horizontal Pod Autoscaler helps scale Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload’s CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
    • Kubernetes Secrets can be used to store secrets (although they are just base64 encoded values)
    • Kubernetes supports rolling and recreate deployment strategies.

Security

  • Cloud Key Management Service – KMS
    • Cloud KMS can be used to store keys to encrypt data in Cloud Storage and other integrated storage
  • Cloud Secret Manager
    • Cloud Secret Manager can be used to store secrets as well

Site Reliability Engineering – SRE

  • SRE is a DevOps implementation and focuses on increasing reliability and observability, collaboration, and reducing toil using automation.
  • SLOs help specify a target level for the reliability of your service using SLIs which provide actual measurements.
  •  SLI Types
    • Availability
    • Freshness
    • Latency
    • Quality
  • SLOs – Choosing the measurement method
    • Synthetic clients to measure user experience
    • Client-side instrumentation
    • Application and Infrastructure metrics
    • Logs processing
  • SLOs help define the Error Budget and Error Budget Policy, which need to be aligned with all the stakeholders and help plan releases balancing features vs. reliability (see the worked example after this list).
  • SRE focuses on Reducing Toil – Identifying repetitive tasks and automating them.
  • Production Readiness Review – PRR
    • Applications should be performance tested for volumes before being deployed to production
    • SLOs should not be modified/adjusted to facilitate production deployments. Teams should work to make the applications SLO compliant before they are deployed to production.
  • SRE Practices include
    • Incident Management and Response
      • Priority should be to mitigate the issue, and then investigate and find the root cause. Mitigating would include
        • Rolling back the release that caused the issue
        • Routing traffic to a working site to restore the user experience
      • Incident Live State Document helps track the events and decision making which can be useful for postmortem.
      • involves the following roles
        • Incident Commander/Manager
          • Setup a communication channel for all to collaborate
          • Assign and delegate roles. IC would assume any role, if not delegated.
          • Responsible for Incident Live State Document
        • Communications Lead
          • Provide periodic updates to all the stakeholders and customers
        • Operations Lead
          • Responds to the incident and should be the only group modifying the system during an incident.
    • Postmortem
      • should contain the root cause
      • should be Blameless
      • should be shared with all for collaboration and feedback
      • should be shared with all the stakeholders
      • should have proper action items to prevent recurrence with an owner and collaborators, if required.
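
A quick worked example of an error budget, assuming a 99.9% availability SLO over a 30-day window:

```python
# Error budget for a 99.9% availability SLO over a 30-day window.
slo = 0.999
window_minutes = 30 * 24 * 60            # 43,200 minutes in 30 days

error_budget_minutes = (1 - slo) * window_minutes
print(error_budget_minutes)              # 43.2 minutes of "allowed" downtime

# If incidents have already consumed 30 minutes this window, only 13.2 minutes
# remain; the error budget policy would then favor reliability work over features.
print(error_budget_minutes - 30)
```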

All the Best !!

Google Cloud Certified – Cloud Digital Leader Learning Path


Continuing on the Google Cloud journey, glad to have passed the seventh certification with the Cloud Digital Leader certification. Google Cloud was missing an initial entry-level certification similar to the AWS Cloud Practitioner certification, which was introduced as the Cloud Digital Leader certification. Cloud Digital Leader focuses on general cloud knowledge and Google Cloud knowledge with its products and services.

Google Cloud – Cloud Digital Leader Certification Summary

  • Had 59 questions (somewhat odd !!) to be answered in 90 minutes.
  • Covers a wide range of General Cloud and Google Cloud services and products knowledge.
  • This exam does not require much Hands-on and theoretical knowledge is good enough to clear the exam.

Google Cloud – Cloud Digital Leader Certification Resources

Google Cloud – Cloud Digital Leader Certification Topics

General cloud knowledge

  1. Define basic cloud technologies. Considerations include:
    1. Differentiate between traditional infrastructure, public cloud, and private cloud
      1. Traditional infrastructure includes on-premises data centers
      2. Public cloud include Google Cloud, AWS, and Azure
      3. Private Cloud includes services like AWS Outpost
    2. Define cloud infrastructure ownership
    3. Shared Responsibility Model
      1. Security of the Cloud is Google Cloud’s responsibility
      2. Security in the Cloud depends on the services used and is shared between Google Cloud and the Customer
    4. Essential characteristics of cloud computing
      1. On-demand computing
      2. Pay-as-you-use
      3. Scalability and Elasticity
      4. High Availability and Resiliency
      5. Security
  2. Differentiate cloud service models. Considerations include:
    1. Infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
      1. IaaS – everything is done by you – more flexibility more management
      2. PaaS – most of the things are done by Cloud with few things done by you – moderate flexibility and management
      3. SaaS – everything is taken care of by the Cloud, you just use it – no flexibility and management
    2. Describe the trade-offs between level of management versus flexibility when comparing cloud services
    3. Define the trade-offs between costs versus responsibility
    4. Appropriate implementation and alignment with given budget and resources
  3. Identify common cloud procurement financial concepts. Considerations include:
    1. Operating expenses (OpEx), capital expenditures (CapEx), and total cost of operations (TCO)
      1. On-premises has more CapEx and less OpEx
      2. Cloud has little to no CapEx and more OpEx
    2. Recognize the relationship between OpEx and CapEx related to networking and compute infrastructure
    3. Summarize the key cost differentiators between cloud and on-premises environments

General Google Cloud knowledge

  1. Recognize how Google Cloud meets common compliance requirements. Considerations include:
    1. Locating current Google Cloud compliance requirements
    2. Familiarity with Compliance Reports Manager
  2. Recognize the main elements of Google Cloud resource hierarchy. Considerations include:
    1. Describe the relationship between organization, folders, projects, and resources i.e. Organization -> Folder -> Folder or Projects -> Resources
  3. Describe controlling and optimizing Google Cloud costs. Considerations include:
    1. Google Cloud billing models and applicability to different service classes
    2. Define a consumption-based use model
    3. Application of discounts (e.g., flat-rate, committed-use discounts [CUD], sustained-use discounts [SUD])
      1. Sustained-use discounts [SUD] are automatic discounts for running specific resources for a significant portion of the billing month
      2. Committed use discounts [CUD] help with committed use contracts in return for deeply discounted prices for VM usage
  4. Describe Google Cloud’s geographical segmentation strategy. Considerations include:
    1. Regions are collections of zones. Zones have high-bandwidth, low-latency network connections to other zones in the same region. Regions help design fault-tolerant and highly available solutions.
    2. Zones are deployment areas within a region and provide the lowest latency usually less than 10ms
    3. Regional resources are accessible by any resources within the same region
    4. Zonal resources are hosted within a zone and are also called per-zone resources.
    5. Multiregional resources or Global resources are accessible by any resource in any zone within the same project.
  5. Define Google Cloud support options. Considerations include:
    1. Distinguish between billing support, technical support, role-based support, and enterprise support
      1. Role-Based Support provides more predictable rates and a flexible configuration. Although they are legacy, the exam does cover these.
      2. Enterprise Support provides the fastest case response times and a dedicated Technical Account Management (TAM) contact who helps you execute a Google Cloud strategy.
    2. Recognize a variety of Service Level Agreement (SLA) applications

Google Cloud products and services

  1. Describe the benefits of Google Cloud virtual machine (VM)-based compute options. Considerations include:
    1. Compute Engine provides virtual machines (VM) hosted on Google’s infrastructure.
    2. Google Cloud VMware Engine helps easy lift and shift VMware-based applications to Google Cloud without changes to the apps, tools, or processes
    3. Bare Metal lets businesses run specialized workloads such as Oracle databases close to Google Cloud while lowering overall costs and reducing risks associated with migration
    4. Custom versus standard sizing
    5. Free, premium, and custom service options
    6. Attached storage/disk options
    7. A Preemptible VM is an instance that can be created and run at a much lower price than normal instances.
  2. Identify and evaluate container-based compute options. Considerations include:
    1. Define the function of a container registry
      1. Container Registry is a single place to manage Docker images, perform vulnerability analysis, and decide who can access what with fine-grained access control.
    2. Distinguish between VMs, containers, and Google Kubernetes Engine
  3. Identify and evaluate serverless compute options. Considerations include:
    1. Define the function and use of App Engine, Cloud Functions, and Cloud Run
    2. Define rationale for versioning with serverless compute options
    3. Cost and performance tradeoffs of scale to zero
      1. Scale to zero helps provide cost efficiency by scaling down to zero when there is no load, but comes with the issue of cold starts
      2. Serverless technologies like Cloud Functions, Cloud Run, and App Engine Standard provide these capabilities
  4. Identify and evaluate multiple data management offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s relational and non-relational database offerings
      1. Cloud SQL provides fully managed, relational SQL databases and offers MySQL, PostgreSQL, MSSQL databases as a service
      2. Cloud Spanner provides fully managed, relational SQL databases with joins and secondary indexes
      3. Cloud Bigtable provides a scalable, fully managed, non-relational NoSQL wide-column analytical big data database service suitable for low-latency single-point lookups and precalculated analytics
      4. BigQuery provides fully managed, no-ops, OLAP, enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
    2. Describe Google Cloud’s database offerings and how they compare to commercial offerings
  5. Distinguish between ML/AI offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s hardware accelerators (e.g., Vision API, AI Platform, TPUs)
    2. Identify when to train your own model, use a Google Cloud pre-trained model, or build on an existing model
      1. Vision API provides out-of-the-box pre-trained models to extract data from images
      2. AutoML provides the ability to train models
      3. BigQuery Machine Learning provides support for limited models and SQL interface
  6. Differentiate between data movement and data pipelines. Considerations include:
    1. Describe Google Cloud’s data pipeline offerings
      1. Cloud Pub/Sub provides reliable, many-to-many, asynchronous messaging between applications. By decoupling senders and receivers, Google Cloud Pub/Sub allows developers to communicate between independently written applications.
      2. Cloud Dataflow is a fully managed service for strongly consistent, parallel data-processing pipelines
      3. Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building & managing data pipelines
      4. BigQuery Service is a fully managed, highly scalable data analysis service that enables businesses to analyze Big Data.
      5. Looker provides an enterprise platform for business intelligence, data applications, and embedded analytics.
    2. Define data ingestion options
  7. Apply use cases to a high-level Google Cloud architecture. Considerations include:
    1. Define Google Cloud’s offerings around the Software Development Life Cycle (SDLC)
    2. Describe Google Cloud’s platform visibility and alerting offerings covers Cloud Monitoring and Cloud Logging
  8. Describe solutions for migrating workloads to Google Cloud. Considerations include:
    1. Identify data migration options
    2. Differentiate when to use Migrate for Compute Engine versus Migrate for Anthos
      1. Migrate for Compute Engine provides fast, flexible, and safe migration to Google Cloud
      2. Migrate for Anthos and GKE makes it fast and easy to modernize traditional applications away from virtual machines and into native containers. This significantly reduces the cost and labor that would be required for a manual application modernization project.
    3. Distinguish between lift and shift versus application modernization
      1. Lift and shift involves migration with zero to minimal changes and is usually performed under time constraints
      2. Application modernization requires a redesign of infra and applications and takes time. It can include moving legacy monolithic architecture to microservices architecture, building CI/CD pipelines for automated builds and deployments, frequent releases with zero downtime, etc.
  9. Describe networking to on-premises locations. Considerations include:
    1. Define Software-Defined WAN (SD-WAN) – did not have any questions regarding the same.
    2. Determine the best connectivity option based on networking and security requirements – covers Cloud VPN, Interconnect, and Peering.
    3. Private Google Access provides access from VM instances to Google-provided services like Cloud Storage or third-party provided services
  10. Define identity and access features. Considerations include:
    1. Cloud Identity & Access Management (Cloud IAM) provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    2. Google Cloud Directory Sync enables administrators to synchronize users, groups, and other data from an Active Directory/LDAP service to their Google Cloud domain directory.

Google Cloud – Professional Cloud Developer Certification learning path

Continuing on the Google Cloud Journey, glad to have passed the sixth certification with the Professional Cloud Developer certification.

Google Cloud – Professional Cloud Developer Certification Summary

  • Had 60 questions to be answered in 2 hours, compared to 50 questions in the same 2 hours for the other exams.
  • Covers a wide range of Google Cloud services mainly focusing on application and deployment services
  • Make sure you cover the case studies beforehand. I got ~5-6 questions from them, and they can really be a savior in the exam.
  • As mentioned for all the exams, hands-on is a MUST; if you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands.
  • I did Coursera and A Cloud Guru, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Developer Certification Resources

Google Cloud – Professional Cloud Developer Certification Topics

Case Studies

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered, more from the security aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine-grained control
    • Compute Engine is recommended to be used with Service Account with the least privilege to provide access to Google services and the information can be queried from instance metadata.
    • Compute Engine Persistent disks can be attached to multiple VMs in read-only mode.
    • Compute Engine launch issues reasons
      • Boot disk is full.
      • Boot disk is corrupted
      • Boot Disk has an invalid master boot record (MBR).
      • Quota Errors
      • Can be debugged using Serial console
    • Preemptible VMs and their use cases. HINT –  shutdown script to perform cleanup actions
  • Google Kubernetes Engine
    • Google Kubernetes Engine enables running containers on Google Cloud
    • Understand GKE containers, Pods, Deployments, Service, DaemonSet, StatefulSets
      • Pods are the smallest, most basic deployable objects in Kubernetes. A Pod represents a single instance of a running process in the cluster and can contain single or multiple containers
      • Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of the application and automatically replaces any instances that fail or become unresponsive.
      • StatefulSets represent a set of Pods with unique, persistent identities and stable hostnames that GKE maintains regardless of where they are scheduled
      • DaemonSets manages groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes
      • Service is to group a set of Pod endpoints into a single resource. GKE Services can be exposed as ClusterIP, NodePort, and Load Balancer
      • Ingress object defines rules for routing HTTP(S) traffic to applications running in a cluster. An Ingress object is associated with one or more Service objects, each of which is associated with a set of Pods
    • GKE supports Horizontal Pod Autoscaler (HPA) to autoscale deployments based on CPU and Memory
    • GKE supports health checks using liveness and readiness probes (see the sample health endpoints after this list)
      • Readiness probes are designed to let Kubernetes know when the app is ready to serve traffic.
      • Liveness probes let Kubernetes know if the app is alive or dead.
    • Understand Workload Identity for security, which is a recommended way to provide Pods running on the cluster access to Google resources.
    • GKE integrates with Istio to provide MTLS feature
  • Google App Engine
  • Cloud Tasks
    • is a fully managed service that allows you to manage the execution, dispatch, and delivery of a large number of distributed tasks.
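
A minimal sketch of the application side of the liveness and readiness probes mentioned above, assuming a Flask app with hypothetical /healthz and /ready endpoints that the GKE probes would be pointed at:

    # Hypothetical health endpoints for GKE liveness/readiness probes (illustrative only)
    from flask import Flask

    app = Flask(__name__)
    ready = False  # flipped to True once caches/connections are warmed up

    @app.route("/healthz")
    def liveness():
        # Liveness probe target: return 200 as long as the process is alive
        return "ok", 200

    @app.route("/ready")
    def readiness():
        # Readiness probe target: return 200 only when the app can serve traffic
        return ("ready", 200) if ready else ("warming up", 503)

    if __name__ == "__main__":
        ready = True  # assume warm-up is done for this sketch
        app.run(host="0.0.0.0", port=8080)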

Security Services

  • Cloud Identity-Aware Proxy
    • Identity-Aware Proxy IAP allows managing access to HTTP-based apps both on Google Cloud and outside of Google Cloud.
    • IAP uses Google identities and IAM and can leverage external identity providers as well like OAuth with Facebook, Microsoft, SAML, etc.
    • Signed headers using JWT provide secondary security in case someone bypasses IAP.
  • Cloud Data Loss Prevention – DLP
    • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
    • provides two key features
      • Classification is the process to inspect the data and know what data we have, how sensitive it is, and the likelihood.
      • De-identification is the process of removing, masking, redacting, or replacing sensitive information in data.
  • Web Security Scanner
    • Web Security Scanner identifies security vulnerabilities in the App Engine, GKE, and Compute Engine web applications.
    • scans provide information about application vulnerability findings, like cross-site scripting (XSS), Flash injection, outdated libraries, clear-text passwords, or use of mixed content, in line with OWASP categories

Networking Services

  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface.
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.

Identity Services

  • Resource Manager
    • Understand the Resource Manager hierarchy: Organization -> Folders -> Projects -> Resources
    • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
    • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
    • Understand IAM Best Practices
      • Use groups for users requiring the same responsibilities
      • Use service accounts for server-to-server interactions.
      • Use Organization Policy Service to get centralized and programmatic control over the organization’s cloud resources.
    • Domain-wide delegation of authority grants third-party and internal applications access to users’ data, for e.g., Google Drive

Storage Services

  • Cloud Storage
    • Cloud Storage is cost-effective object storage for unstructured data and provides an option for long term data retention
    • Understand Signed URLs to give temporary access to users who do not need to be GCP users. HINT: a Signed URL works for direct uploads to GCS without routing the traffic through App Engine or Compute Engine (see the sketch after this list)
    • Understand Google Cloud Storage Classes and Object Lifecycle Management to transition objects
    • Retention Policies help define the retention period for the bucket, before which the objects in the bucket cannot be deleted.
    • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained. The feature also allows locking the data retention policy, permanently preventing the policy from being reduced or removed
    • Know Cloud Storage best practices, esp. that GCS auto-scaling performs well if requests ramp up gradually rather than spiking suddenly; also, retry using an exponential backoff strategy
    • Cloud Storage can be used to host static websites
    • Cloud CDN can be used with Cloud Storage to improve performance and enable caching
  • DataStore/FireStore
    • Cloud Datastore/Firestore provides a managed NoSQL document database built for automatic scaling, high performance, and ease of application development.
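
A minimal sketch of the Signed URL usage mentioned above, using the google-cloud-storage client library; the bucket and object names are hypothetical and the credentials in use must be able to sign:

    from datetime import timedelta
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-app-assets").blob("uploads/report.pdf")  # hypothetical names

    # Time-limited URL that lets a non-GCP user upload directly to GCS,
    # without routing the file through App Engine or Compute Engine.
    url = blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",
        content_type="application/pdf",
    )
    print(url)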

Developer Tools

  • Google Cloud Build
    • Cloud Build integrates with Cloud Source Repositories, GitHub, and GitLab and can be used for Continuous Integration and Deployments.
    • Cloud Build can import source code, execute build to the specifications, and produce artifacts such as Docker containers or Java archives
    • Cloud Build build config file specifies the instructions to perform, with steps defined for each task like test, build, and deploy.
    • Cloud Build supports custom images as well for the steps
    • Cloud Build uses a directory named /workspace as a working directory and the assets produced by one step can be passed to the next one via the persistence of the /workspace directory.
  • Google Cloud Code
    • Cloud Code helps write, debug, and deploy the cloud-based applications for IntelliJ, VS Code, or in the browser.
  • Google Cloud Client Libraries
    • Google Cloud Client Libraries provide client libraries and SDKs in various languages for calling Google Cloud APIs.
    • If the language is not supported, Cloud Rest APIs can be used.
  • Deployment Techniques
    • Recreate deployment – fully scale down the existing application version before you scale up the new application version.
    • Rolling update – update a subset of running application instances instead of simultaneously updating every application instance
    • Blue/Green deployment (also known as a red/black deployment) – perform two identical deployments of the application and switch traffic from the old version to the new one
    • GKE supports Rolling and Recreate deployments.
      • Rolling deployments support maxSurge (new pods would be created) and maxUnavailable (existing pods would be deleted)
    • Managed Instance Groups support Rolling deployments using the maxSurge (new instances would be created) and maxUnavailable (existing instances would be removed) configurations
  • Testing Strategies
    • Canary testing – partially roll out a change and then evaluate its performance against a baseline deployment
    • A/B testing – test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.

Data Services

  • Bigtable
  • Cloud Pub/Sub
    • Understand Cloud Pub/Sub as an asynchronous messaging service
    • Know patterns for One to Many, Many to One, and Many to Many
    • roles/pubsub.publisher and roles/pubsub.subscriber provide applications with the ability to publish and consume messages (see the publish sketch after this list).
  • Cloud SQL
    • Cloud SQL is a fully managed service that provides MySQL, PostgreSQL, and Microsoft SQL Server.
    • HA configuration provides data redundancy and failover capability with minimal downtime when a zone or instance becomes unavailable due to a zonal outage, or an instance corruption
    • Read replicas help scale horizontally the use of data in a database without degrading performance
  • Cloud Spanner
    • is a fully managed relational database with unlimited scale, strong consistency, and up to 99.999% availability.
    • can read and write up-to-date strongly consistent data globally
    • Multi-region instances give higher availability guarantees (99.999% availability) and global scale.
    • Cloud Spanner’s table interleaving is a good choice for many parent-child relationships where the child table’s primary key includes the parent table’s primary key columns.
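
A minimal Pub/Sub publish sketch using the google-cloud-pubsub client library, tying back to the Cloud Pub/Sub notes above; the project and topic IDs are hypothetical and the caller is assumed to hold roles/pubsub.publisher:

    from google.cloud import pubsub_v1

    project_id = "my-project"  # hypothetical
    topic_id = "orders"        # hypothetical

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    # Message data must be a bytestring; attributes are optional string key/value pairs.
    future = publisher.publish(topic_path, b"order created", order_id="12345")
    print("Published message ID:", future.result())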

Monitoring

  • Google Cloud Monitoring or Stackdriver
    • provides everything from monitoring, alerting, error reporting, metrics, and diagnostics to debugging and tracing.
    • Cloud Monitoring helps gain visibility into the performance, availability, and health of your applications and infrastructure.
  • Google Cloud Logging or Stackdriver logging
    • Cloud Logging provides real-time log management and analysis
    • Cloud Logging allows ingestion of custom log data from any source (see the client library sketch after this list)
    • Logs can be exported by configuring log sinks to BigQuery, Cloud Storage, or Pub/Sub.
    • Cloud Logging Agent can be installed for logging and capturing application logs.
  • Cloud Error Reporting
    • counts, analyzes, and aggregates the crashes in the running cloud services
  • Cloud Trace
    • is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Debugger
    • is a feature of Google Cloud that lets you inspect the state of a running application in real-time, without stopping or slowing it down
    • Debug Logpoints allow logging injection into running services without restarting or interfering with the normal function of the service
    • Debug Snapshots help capture local variables and the call stack at a specific line location in your app’s source code
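
A minimal sketch of ingesting custom log data with the google-cloud-logging client library, as referenced in the Cloud Logging notes above; the log name and payloads are hypothetical:

    from google.cloud import logging

    client = logging.Client()
    logger = client.logger("my-app")  # hypothetical log name

    # Text and structured entries; structured payloads are easier to filter and export via sinks.
    logger.log_text("User signup completed", severity="INFO")
    logger.log_struct({"event": "signup", "user_id": "abc123"}, severity="DEBUG")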

All the Best !!

Google Cloud – Professional Cloud Security Engineer Certification learning path

GCP - Professional Cloud Security Engineer Certificate

Google Cloud – Professional Cloud Security Engineer Certification learning path

Continuing on the Google Cloud Journey, I have just cleared the Professional Cloud Security Engineer certification. The Google Cloud – Professional Cloud Security Engineer certification exam focuses on almost all of the Google Cloud security services, with storage, compute, and networking services covered only from their security aspects.

Google Cloud – Professional Cloud Security Engineer Certification Summary

  • Has 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on security and network services
  • As mentioned for all the exams, hands-on is a MUST; if you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands.
  • I did Coursera and ACloud Guru, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Security Engineer Certification Resources

Google Cloud – Professional Cloud Security Engineer Certification Topics

Security Services

  • Google Cloud – Security Services Cheat Sheet
  • Cloud Key Management Service – KMS
    • Cloud KMS provides a centralized, scalable, fast cloud key management service to manage encryption keys
    • KMS Key is a named object containing one or more key versions, along with metadata for the key.
    • A KMS KeyRing groups keys with related permissions, which allows you to grant, revoke, or modify permissions to those keys at the key ring level without needing to act on each key individually (see the encryption sketch after this list).
  • Cloud Armor
    • Cloud Armor protects the applications from multiple types of threats, including DDoS attacks and application attacks like XSS and SQLi
    • works with the external HTTP(S) load balancer to automatically block network protocol and volumetric DDoS attacks such as protocol floods (SYN, TCP, HTTP, and ICMP) and amplification attacks (NTP, UDP, DNS)
    • with GKE needs to be configured with GKE Ingress
    • can be used to blacklist IPs
    • supports preview mode to understand patterns without blocking the users
  • Cloud Identity-Aware Proxy
    • Identity-Aware Proxy IAP allows managing access to HTTP-based apps both on Google Cloud and outside of Google Cloud.
    • IAP uses Google identities and IAM and can leverage external identity providers as well like OAuth with Facebook, Microsoft, SAML, etc.
    • Signed headers using JWT provide secondary security in case someone bypasses IAP.
  • Cloud Data Loss Prevention – DLP
    • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
    • provides two key features
      • Classification is the process to inspect the data and know what data we have, how sensitive it is, and the likelihood.
      • De-identification is the process of removing, masking, redacting, or replacing sensitive information in data.
    • supports text, image, and storage classification with scans on data stored in Cloud Storage, Datastore, and BigQuery
    • supports scanning of binary, text, image, Microsoft Word, PDF, and Apache Avro files
  • Web Security Scanner
    • Web Security Scanner identifies security vulnerabilities in the App Engine, GKE, and Compute Engine web applications.
    • scans provide information about application vulnerability findings, like cross-site scripting (XSS), Flash injection, outdated libraries, clear-text passwords, or use of mixed content, in line with OWASP categories
  • Security Command Center – SCC
    • is a Security and risk management platform that helps generate curated insights and provides a unique view of incoming threats and attacks to the assets
    • displays possible security risks, called findings, that are associated with each asset.
  • Forseti Security
    • Forseti Security is an open-source security toolkit for GCP that can also feed third-party security information and event management (SIEM) applications
    • keeps track of the environment with inventory snapshots of GCP resources on a recurring cadence
  • Access Context Manager
    • Access Context Manager allows organization administrators to define fine-grained, attribute-based access control for projects and resources
    • Access Context Manager helps reduce the size of the privileged network and move to a model where endpoints do not carry ambient authority based on the network.
    • Access Context Manager helps prevent data exfiltration with proper access levels and security perimeter rules
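
A minimal Cloud KMS encryption sketch using the google-cloud-kms client library, as referenced in the Cloud KMS notes at the top of this list; the project, location, key ring, and key names are hypothetical, and the caller is assumed to hold an encrypter role on the key:

    from google.cloud import kms

    client = kms.KeyManagementServiceClient()
    key_name = client.crypto_key_path("my-project", "global", "my-keyring", "my-key")  # hypothetical

    # Typical envelope-encryption usage: KMS encrypts small payloads such as data encryption keys.
    response = client.encrypt(request={"name": key_name, "plaintext": b"super-secret"})
    print(response.ciphertext)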

Compliance

  • FIPS 140-2 Validated
    • FIPS 140-2 Validated certification was established to aid in the protection of digitally stored unclassified, yet sensitive, information.
    • Google Cloud uses a FIPS 140-2 validated encryption module called BoringCrypto in the production environment. This means that both data in transit to the customer and between data centers, and data at rest are encrypted using FIPS 140-2 validated encryption.
    • BoringCrypto module that achieved FIPS 140-2 validation is part of the BoringSSL library.
    • BoringSSL library as a whole is not FIPS 140-2 validated
  • PCI/DSS Compliance
    • PCI/DSS compliance is a shared responsibility model
    • Egress rules cannot be controlled for App Engine, Cloud Functions, and Cloud Storage. Google recommends using Compute Engine and GKE to ensure that all egress traffic is authorized.
    • Antivirus software and File Integrity monitoring must be used on all systems commonly affected by malware to protect systems from current and evolving malicious software threats including containers
    • For payment processing, security can be improved and compliance demonstrated by isolating each of these environments into its own VPC network and reducing the scope of systems subject to PCI audit standards

Networking Services

  • Refer Google Cloud Security Services Cheat Sheet
  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • Firewall rules control the Traffic to and from instances. HINT: rules with lower integers indicate higher priorities. Firewall rules can be applied to specific tags.
    • Know implied firewall rules which deny all ingress and allow all egress
    • Understand the difference between using Service Account vs Network Tags for filtering in Firewall rules. HINT: Use SA over tags as it provides access control while tags can be easily inferred.
    • VPC Peering allows internal or private IP address connectivity across two VPC networks regardless of whether they belong to the same project or the same organization. HINT: VPC Peering uses private IPs and does not support transitive peering
    • Shared VPC allows an organization to connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface.
    • VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
    • Firewall Rules Logging enables auditing, verifying, and analyzing the effects of the firewall rules
  • Hybrid Connectivity
    • Understand Hybrid Connectivity options in terms of security.
    • Cloud VPN provides secure connectivity from the on-premises data center to the GCP network through the public internet. Cloud VPN does not provide internal or private IP connectivity
    • Cloud Interconnect provides direct connectivity from the on-premises data center to the GCP network
  • Cloud NAT
    • Cloud NAT allows VM instances without external IP addresses and private GKE clusters to send outbound packets to the internet and receive any corresponding established inbound response packets.
    • Requests would not be routed through Cloud NAT if they have an external IP address
  • Cloud DNS
    • Understand Cloud DNS and its features 
    • supports DNSSEC, a feature of DNS, that authenticates responses to domain name lookups and protects the domains from spoofing and cache poisoning attacks
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.
    • Understand Google Load Balancing options and their use cases, esp. which are global vs. regional, external vs. internal, and whether they support SSL offloading
      • Network Load Balancer – regional, external, pass through and supports TCP/UDP
      • Internal TCP/UDP Load Balancer – regional, internal, pass through and supports TCP/UDP
      • HTTP/S Load Balancer – regional/global, external, proxy, and supports HTTP/S
      • Internal HTTP/S Load Balancer – regional/global, internal, proxy, and supports HTTP/S
      • SSL Proxy Load Balancer – regional/global, external, proxy, supports SSL with SSL offload capability
      • TCP Proxy Load Balancer – regional/global, external, proxy, supports TCP without SSL offload capability

Identity Services

  • Resource Manager
    • Understand the Resource Manager hierarchy: Organization -> Folders -> Projects -> Resources
    • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
    • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
    • Service Account, if accidentally deleted, can be recovered if the time gap is less than 30 days and a service account by the same name wasn’t created
    • Understand IAM Best Practices
      • Use groups for users requiring the same responsibilities
      • Use service accounts for server-to-server interactions.
      • Use Organization Policy Service to get centralized and programmatic control over the organization’s cloud resources.
    • Domain-wide delegation of authority grants third-party and internal applications access to users’ data, for e.g., Google Drive
  • Cloud Identity
    • Cloud Identity provides IDaaS (Identity as a Service) with single sign-on functionality and federation with external identity providers like Active Directory.
    • Cloud Identity supports federating with Active Directory using GCDS to implement the synchronization

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered, more from the security aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine-grained control
    • Managing access using OS Login or project and instance metadata
    • Compute Engine is recommended to be used with Service Account with the least privilege to provide access to Google services and the information can be queried from instance metadata.
  • Google Kubernetes Engine
    • Google Kubernetes Engine enables running containers on Google Cloud
    • Understand Best Practices for Building Containers
      • Package a single app per container
      • Properly handle PID 1, signal handling, and zombie processes
      • Optimize for the Docker build cache
      • Remove unnecessary tools
      • Build the smallest image possible
      • Scan images for vulnerabilities
      • Restrict the use of public images
      • Use managed base images

Storage Services

  • Cloud Storage
    • Cloud Storage is cost-effective object storage for unstructured data and provides an option for long term data retention
    • Understand Cloud Storage Security features
      • Understand various Data Encryption techniques including Envelope Encryption, CMEK, and CSEK. HINT: CSEK works with Cloud Storage and Persistent Disks only. CSEK manages KEK and not DEK.
      • Cloud Storage default encryption uses AES256
      • Understand Signed URL to give temporary access and the users do not need to be GCP users
      • Understand access control and permissions – IAM (Uniform) vs ACLs (fine-grained control)
      • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained. The feature also allows locking the data retention policy, permanently preventing the policy from being reduced or removed

Monitoring

  • Google Cloud Monitoring or Stackdriver
    • provides everything from monitoring, alerting, error reporting, metrics, and diagnostics to debugging and tracing.
  • Google Cloud Logging or Stackdriver logging
    • Audit logs are provided through Cloud logging using Admin Activity and Data Access Audit logs
    • VPC Flow logs and Firewall Rules logs help monitor traffic to and from Compute Engine instances.
    • log sinks can export data to external providers via Cloud Pub/Sub

All the Best !!

Google Cloud – Professional Cloud Network Engineer Certification learning path

Google Cloud - Professional Cloud Network Engineer Certification

Google Cloud – Professional Cloud Network Engineer Certification learning path

Google Cloud – Professional Cloud Network Engineer certification exam focuses on almost all of the Google Cloud network services.

Google Cloud – Professional Cloud Network Engineer Certification Summary

  • Has 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on network services
  • Hands-on is a MUST; if you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless for some of the questions and commands.
  • I did Coursera and ACloud Guru, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Network Engineer Certification Resources

Google Cloud – Professional Cloud Network Engineer Certification Topics

Network Services

  • Refer Google Cloud Networking Services Cheat Sheet
  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • VPC Routes determine the next hop for the traffic. HINT: It can be defined for specific tags as well. More specific takes priority.
    • Firewall rules control the Traffic to and from instances. HINT: rules with lower integers indicate higher priorities. Firewall rules can be applied to specific tags.
    • VPC Peering allows internal or private IP address connectivity across two VPC networks regardless of whether they belong to the same project or the same organization. HINT: VPC Peering uses private IPs and does not support transitive peering
    • Shared VPC allows an organization to connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network HINT: VLAN attachments and Cloud Routers for Interconnect must be created in the host project
    • Understand the concept of internal and external IPs and the difference between static and ephemeral IPs
    • VPC Subnets support primary and secondary (alias) IP range
    • Primary IP range of an existing subnet can be expanded by modifying its subnet mask, setting the prefix length to a smaller number.
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface. HINT: Private Google Access is enabled on the subnet and not on the VPC level
    • VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
    • Firewall Rules Logging enables auditing, verifying, and analyzing the effects of the firewall rules HINT: Default implicit ingress deny rule is not captured by firewall rules logging. Add an explicit deny rule
    • Resources within a VPC network can communicate with one another by using internal IPv4 addresses
  • Hybrid Connectivity
  • Cloud VPN
    • Cloud VPN provides secure connectivity from the on-premises data center to the GCP network through the public internet. Cloud VPN does not provide internal or private IP connectivity
    • Understand the requirements to set up Cloud VPN.
    • Cloud VPN is quick to set up for testing hybrid connectivity
    • Understand the limitations of Cloud VPN, esp. the 3 Gbps per-tunnel limit and how throughput can be improved with multiple tunnels.
    • Cloud VPN requires non-overlapping primary and secondary IP ranges between the on-premises and GCP VPC networks
    • Cloud VPN HA provides a highly available and secure connection between the on-premises and the VPC network through an IPsec VPN connection in a single region
  • Cloud Interconnect
    • Cloud Interconnect provides direct connectivity from the on-premises data center to GCP network
    • Dedicated Interconnect provides a direct physical connection between the on-premises network and Google’s network. Supports > 10Gbps
    • Partner Interconnect provides connectivity between the on-premises and VPC networks through a supported service provider. Supports 50Mbps to 10 Gbps
    • Understand Dedicated Interconnect vs Partner Interconnect and when to choose which
    • Know Interconnect as the reliable high speed, low latency, and dedicated bandwidth option.
    • Cloud Monitoring monitors interconnect links. Circuit Operational Status metric threshold tracks the circuits while Interconnect Operational Status metric tracks all the links
  • Cloud Router
    • Cloud Router provides dynamic routing using BGP with HA VPN and Cloud Interconnect
    • Cloud Router Global routing mode provides visibility to resources in all regions
    • Cloud Router uses Multi-exit Discriminator (MED) value to route traffic. The same MED value results in Active/Active connection and different MED results in Active/Passive connection
  • Cloud NAT
    • Cloud NAT allows VM instances without external IP addresses and private GKE clusters to send outbound packets to the internet and receive any corresponding established inbound response packets.
    • Requests would not be routed through Cloud NAT if they have an external IP address
  • Cloud Peering
    • Google Cloud Peering provides Direct Peering and Carrier Peering
    • Peering provides a direct path from the on-premises network to Google services, including Google Cloud products that can be exposed through one or more public IP addresses. It does not provide a private dedicated connection.
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.
    • Understand Google Load Balancing options and their use cases esp. which is global and internal and what protocols they support.
      • Network Load Balancer – regional, external, pass through and supports TCP/UDP
      • Internal TCP/UDP Load Balancer – regional, internal, pass through and supports TCP/UDP
      • HTTP/S Load Balancer – regional/global, external, proxy, and supports HTTP/S
      • Internal HTTP/S Load Balancer – regional/global, internal, proxy, and supports HTTP/S
      • SSL Proxy Load Balancer – regional/global, external, proxy, supports SSL with SSL offload capability
      • TCP Proxy Load Balancer – regional/global, external, proxy, supports TCP without SSL offload capability
    • Cloud Load Balancing supports health checks with managed instance groups
  • Cloud CDN
    • Understand Cloud CDN as the global content delivery network
    • Know that Cloud CDN works only with the global external HTTP(S) Load Balancer
    • Cache is not removed if the underlying origin data is removed. Cache has to be invalidated explicitly, or is removed once expired.
    • Cloud CDN does not compress but serves the response from the origin as-is. HINT: As the LB adds a Via header, some web servers do not compress responses and must be configured to ignore the Via header
  • Cloud DNS
    • Understand Cloud DNS and its features
    • supports migration or importing of records from on-premises using JSON/YAML format
    • supports DNSSEC, a feature of DNS, that authenticates responses to domain name lookups and protects the domains from spoofing and cache poisoning attacks

Identity Services

  • Cloud Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • The Compute Network Admin role does not provide access to SSL certificates and firewall rules; the Security Admin role needs to be assigned for those.

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered, more from the networking aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine grained control
    • Difference between managed vs unmanaged instance groups and auto-healing feature
    • Regional Managed Instance group helps spread load across instances in multiple zones within the same region providing scalability and HA
    • Managed Instance group helps perform canary and rolling updates
    • Managed Instance group autoscaling can be configured on CPU or load balancer metrics or custom metrics.
    • Managing access using OS Login or project and instance metadata
  • Google Kubernetes Engine

Security Services

  • Cloud Armor
    • Cloud Armor protects the applications from multiple types of threats, including DDoS attacks and application attacks like XSS and SQLi
    • with GKE needs to be configured with GKE Ingress
    • can be used to blacklist IP
    • supports preview mode to understand patterns without blocking the users

All the Best !!

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Learning Path

AWS Certified Alexa Skill Builder - Specialty Certificate

Finally All Down for AWS (for now) …

Continuing on my AWS journey with the last AWS certification, I took another step by clearing the AWS Certified Alexa Skill Builder – Specialty (AXS-C01) certification. It is amazing to know and learn how Voice first experiences are making an impact and changing how we think about technology and use cases.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) exam basically validates your ability to build, test, publish and certify Alexa skills.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Summary

  • AWS Certified Alexa Skill Builder – Specialty exam focuses only on Alexa and how to build skills.
  • AWS Certified Alexa Skill Builder – Specialty exam has 65 questions with a time limit of 170 minutes
  • Compared to the other professional and specialty exams, the questions and answers are not long and are similar to the associate exams. So if you are prepared well, you should not need the full 170 minutes.
  • As the exam was online from home, there was no access to paper and pen, but the trick remains the same: read the question, draw a rough architecture, and focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.


AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Topic Summary

Refer AWS Alexa Cheat Sheet

Domain 1: Voice-First Design Practices and Capabilities

1.1 Describe how users interact with skills

1.2 Map features and capabilities to use cases

  • Alexa supports display cards to display text (Simple card) and text with image (Standard card)
  • The Alexa Skills Kit supports APIs
    • Alexa Settings APIs allow developers to retrieve customer preferences for the settings like time zone, distance measuring unit, and temperature measurement unit 
    • Device services – a skill can request the customer’s permission to access their address information, which is static data filled in by the customer and includes the country/region, postal code, and full address
    • Customer Profile services – a skill can request the customer’s permission to their contact information, which includes name, email address and phone number
    • With Location services, a skill can ask a user’s permission to obtain the real-time location of their Alexa-enabled device, specifically at the time of the user’s request to Alexa, so that the skill can provide enhanced services.
  • Alexa Skill Kit APIs need apiAccessToken and deviceId to access the ASK APIs
  • Progressive Response API allows you to keep the user engaged while the skill prepares a full response to the user’s request.
  • Personalization can be provided using userId and state persistence

Domain 2: Skill Design

2.1 Design and develop an interaction model

  • The Alexa interaction model includes the skill, invocation name, utterances, slots, and intents (see the sample model after this list)
  • A skill is ‘an app for Alexa’, however they are not downloadable but just need to be enabled.
  • Wakeword – Amazon offers a choice of wakewords like ‘Alexa’, ‘Amazon’, ‘Echo’, ‘skill’, ‘app’ or ‘Computer’, with the default being ‘Alexa’.
  • Launch phrases include “run,” “start,” “play,” “resume,” “use,” “launch,” “ask,” “open,” “tell,” “load,” “begin,” and “enable.”
  • Connecting words include “to,” “from,” “in,” “using,” “with,” “about,” “for,” “that,” “by,” “if,” “and,” “whether.”
  • Invocation name
    • is the word or phrase used to trigger the skill for custom skills and the invocation name should adhere to the requirements
    • must not infringe upon the intellectual property rights of an entity or person
    • must be a compound of two or more words.
    • One-word invocation names are allowed only for brand/intellectual property.
    • must not include names of people or places
    • if two-word invocation names, one of the words cannot be a definite article (“the”), indefinite article (“a”, “an”) or preposition (“for”, “to”, “of,” “about,” “up,” “by,” “at,” “off,” “with”).
    • must not contain any of the Alexa skill launch phrases, connecting words and wake words
    • must contain only lower-case alphabetic characters, spaces between words, and possessive apostrophes
    • must spell out characters like numbers, for e.g., twenty one instead of 21
    • can have periods in the invocation names containing acronyms or abbreviations that are pronounced as a series of individual letters, for e.g. NASA as n. a. s. a.
    • cannot spell out phonemes for e.g., a skill titled “AWS Facts” would need “AWS” represented as “a. w. s. ” and NOT “ay double u ess.”
    • must not create confusion with existing Alexa features.
    • must be written in each supported language
  • An intent is what a user is trying to accomplish.
    • Amazon provides standard built-in intents which can be extended
    • Intents need to have a unique utterance
  • Utterances are the specific phrases that people will use when making a request to Alexa.
  • A slot is a variable that relates to an intent allowing Alexa to understand information about the request
    • Amazon provides standard built-in slots which can be extended
  • Entity resolution improves the way Alexa matches possible slot values in a user’s utterance with the slots defined in your interaction model
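
A minimal sketch of a custom skill interaction model, expressed here as a Python dict mirroring the JSON the developer console expects; the invocation name, intent, slot, and utterances are hypothetical:

    interaction_model = {
        "interactionModel": {
            "languageModel": {
                "invocationName": "space facts",  # two words, lower case
                "intents": [
                    {"name": "AMAZON.HelpIntent", "samples": []},  # built-in intent
                    {
                        "name": "GetFactIntent",  # custom intent
                        "slots": [{"name": "topic", "type": "FACT_TOPIC"}],
                        "samples": [
                            "tell me a fact about {topic}",
                            "give me a {topic} fact",
                        ],
                    },
                ],
                "types": [
                    {
                        "name": "FACT_TOPIC",  # custom slot type
                        "values": [
                            {"name": {"value": "mars"}},
                            {"name": {"value": "saturn"}},
                        ],
                    }
                ],
            }
        }
    }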

2.2 Design a multi-turn conversation

  • Alexa Dialog management model identifies the prompts and utterances to collect, validate, and confirm the slot values and intents.
  • Alexa supports
    • Auto Delegation where Alexa completes all of the dialog steps based on the dialog model.
    • Manual delegation using Dialog.Delegate where Alexa sends the skill an IntentRequest for each turn of the conversation and provides more flexibility.
  • AMAZON.FallbackIntent will not be triggered in the middle of a dialog

2.3 Use built-in intents and slots

  • Standard built-in intents cannot include any slots. If slots are needed, create a custom intent and write your own sample utterances.
  • Alexa recommends using and extending standard built-in intents like AMAZON.HelpIntent and AMAZON.YesIntent with additional utterances as per the skill requirements
  • Alexa provides AMAZON.FallbackIntent for handling any unmatched utterances, which can be used to improve the interaction model accuracy.
  • Alexa provides slots which help capture variables and can either be Amazon predefined slots such as dates, numbers, durations, times, etc. or custom ones specific to the skill
  • Predefined slots can be extended to add additional values

2.4 Handle unexpected conversational requests or responses

  • Alexa provides AMAZON.FallbackIntent for handling any unmatched utterances, which can be used to improve the interaction model accuracy.
  • Alexa also provides Intent History, which provides a consolidated view with aggregated, anonymized frequent utterances and the resolved intents. These can be used to map the utterances to the correct intents.

2.5 Design multi-modal skills using one or more service interfaces (for example, audio, video, and gadgets)

  • Alexa-enabled devices with a screen handle Page and Scroll intents; do not handle Next and Previous.
  • Alexa skill with AudioPlayer interface
    • must handle AMAZON.ResumeIntent and AMAZON.PauseIntent
    • PlaybackController events to track AudioPlayer status changes initiated from the device buttons

Domain 3: Skill Architecture

3.1 Identify AWS services for extending Alexa skill functionality (Amazon CloudFront, Amazon S3, Amazon CloudWatch, and Amazon DynamoDB)

  • Focus on the standard skill architecture using Lambda for the backend, DynamoDB for persistence, S3 for serving static assets, and CloudWatch for monitoring and logs.
  • Lambda provides serverless handling for the Alexa requests, but remember the following limits
    • default concurrency soft limit of 1000, which can be increased by raising a support request
    • default timeout of 3 secs, which should be increased to at least 7 secs to be in line with the Alexa timeout of 8 secs
    • default memory of 128 MB, increase to improve performance
  • S3 performance can be improved by exposing it through CloudFront esp. for images, audio and video files

3.2 Use AWS Lambda to build Alexa skills

  • Lambda integrates with CloudWatch to provide logs and should be the first thing to check in case of any issues or errors (a minimal handler sketch follows this list).
  • Alexa allows any http endpoint to act as a backend, but needs to meet following requirements
    • must be accessible over the internet.
    • must accept HTTP requests on port 443.
    • must support HTTP over SSL/TLS, using an Amazon-trusted certificate.
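
A minimal sketch of a Lambda-backed skill handler, assuming the ASK SDK for Python; the skill ID is a hypothetical placeholder, and setting it enables the skill ID verification mentioned in the next section:

    from ask_sdk_core.skill_builder import SkillBuilder
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.utils import is_request_type

    class LaunchRequestHandler(AbstractRequestHandler):
        def can_handle(self, handler_input):
            return is_request_type("LaunchRequest")(handler_input)

        def handle(self, handler_input):
            return (handler_input.response_builder
                    .speak("Welcome to the demo skill.")
                    .ask("What would you like to do?")
                    .response)

    sb = SkillBuilder()
    sb.skill_id = "amzn1.ask.skill.REPLACE-ME"  # hypothetical; verifies requests come from this skill
    sb.add_request_handler(LaunchRequestHandler())

    # Configure this as the Lambda function handler
    lambda_handler = sb.lambda_handler()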

3.3 Follow AWS and Alexa security and privacy best practices

  • Alexa requires the backend to verify that incoming requests come from Alexa using Skill ID verification
  • Child-directed skills cannot use personal and location information
  • Skills cannot be used to capture health information
  • Alexa Skills Kit uses the OAuth 2.0 authentication framework for Account linking, which defines a means by which the service can allow Alexa, with the user’s permission, to access information from the account that the user has set up with you.
  • Alexa smart home skills must have an OAuth authorization code grant implementation, while custom skills can have an authorization code grant or implicit grant implementation.

Domain 4: Skill Development

4.1 Implement in-skill purchasing and Amazon Pay for Alexa Skills

  • In-skill purchasing enables selling premium content such as game features and interactive stories in skills with a custom interaction model.
  • In-skill purchasing is handled by Alexa when the skill sends a Upsell directive. As the skill session ends when a Upsell directive is sent, be sure to save any relevant user data in a persistent data store so that the skill can continue where the user left off after the purchase flow is completed and the endpoint is back in control of the user experience.
  • Skill can handle the Connections.Response request that indicates the result of a purchase flow and resume the skill

4.2 Use Speech Synthesis Markup Language (SSML) for expression and MP3 audio

  • SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech.
  • Alexa supports a subset of SSML tags (see the response sketch after this list), including
    • say-as to interpret text as telephone, date, time, etc.
    • phoneme provides a phonemic/phonetic pronunciation
    • prosody modifies the volume, pitch, and rate of the tagged speech.
    • audio allows playing an MP3 file while rendering a response
      • must be in valid MP3 file (MPEG version 2) format
      • must be hosted at an Internet-accessible HTTPS endpoint.
      • For speech response, the audio file cannot be longer than 240 seconds.
        • combined total time for all audio files in the outputSpeech property of the response cannot be more than 240 seconds.
        • combined total time for all audio files in the reprompt property of the response cannot be more than 90 seconds.
      • bit rate must be 48 kbps.
      • sample rate must be 22050Hz, 24000Hz, or 16000Hz.
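
A minimal sketch of an SSML response, shown as the raw response JSON (a Python dict) a skill backend could return; the speech text and MP3 URL are hypothetical, and the audio file must meet the format limits listed above:

    ssml = (
        "<speak>"
        "Your total is <say-as interpret-as='cardinal'>42</say-as> dollars. "
        "<prosody rate='slow' volume='loud'>Thank you for shopping!</prosody> "
        "<audio src='https://example.com/chime.mp3'/>"  # must be an HTTPS-hosted MP3
        "</speak>"
    )

    response = {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }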

4.3 Implement state management

  • Alexa Skill state persistence can be handled using session attributes during the session and externally, across sessions, using services like DynamoDB or RDS (see the sketch below).
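
A minimal sketch of cross-session persistence, assuming the ASK SDK for Python with its DynamoDB persistence adapter; the table name is hypothetical:

    from ask_sdk_core.skill_builder import CustomSkillBuilder
    from ask_sdk_dynamodb.adapter import DynamoDbAdapter

    # Persist user state in DynamoDB between sessions
    adapter = DynamoDbAdapter(table_name="skill-user-state", create_table=True)  # hypothetical table
    sb = CustomSkillBuilder(persistence_adapter=adapter)

    # Inside a request handler, persistent attributes survive across sessions:
    # attrs = handler_input.attributes_manager.persistent_attributes
    # attrs["last_level"] = 5
    # handler_input.attributes_manager.save_persistent_attributes()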

4.4 Implement Alexa service interfaces (audio player, video player, and screens)

4.5 Parse Alexa JSON requests and provide responses

  • All requests include the session (optional), context, and request objects at the top level (see the parsing sketch after this list).
    • session object provides additional context associated with the request.
      • session attributes can be used to store data
      • user contains userId to uniquely identify a user and accessToken to access other services.
    • context object provides the current state of the Alexa service and the device.
      • System object provides apiAccessToken and the device object provides deviceId to access the ASK APIs
      • application provides the applicationId
      • device object provides supportedInterfaces to list each interface that the device supports
    • request object provides the details of the user’s request.
  • Response includes
    • outputSpeech contains the speech to render to the user.
    • reprompt contains the outputSpeech to use if a re-prompt is necessary.
    • shouldEndSession provides a boolean value that indicates what should happen after Alexa speaks the response.
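
A minimal sketch of parsing the request envelope in a plain Lambda handler (no SDK), pulling out the fields called out above; the event shape follows the Alexa request format, while the handling logic itself is illustrative only:

    def lambda_handler(event, context):
        # Request details
        request_type = event["request"]["type"]  # e.g. LaunchRequest, IntentRequest

        # Session: user identity and per-session attributes
        session = event.get("session", {})
        user_id = session.get("user", {}).get("userId")
        attributes = session.get("attributes", {})

        # Context/System: tokens and device info needed for calling the ASK APIs
        system = event["context"]["System"]
        api_access_token = system["apiAccessToken"]
        device_id = system["device"]["deviceId"]

        return {
            "version": "1.0",
            "sessionAttributes": attributes,
            "response": {
                "outputSpeech": {"type": "PlainText", "text": "Hello from " + request_type},
                "shouldEndSession": True,
            },
        }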

Domain 5: Test, Validate, and Troubleshoot

5.1 Debug and troubleshoot using Amazon CloudWatch or other tools

  • Lambda integrates with CloudWatch for metrics and logs, which can be checked for any errors and metrics.

5.2 Use the Alexa developer testing tools

  • Utterance profiles – test utterances to know what intent they resolve to 
  • Alexa Skill simulator
    • provides an ability to Interact with Alexa with either your voice or text, without an actual device.
    • maintains the skill session, so the interaction model and dialog flow can be tested.
    • supports multiple languages testing by selecting locale
    • has limitations in testing audio, video, Alexa settings and Device API
  • Manual Json
    • enter a JSON request directly and see the skill returned JSON response
    • does not maintain the skill session and is similar to testing a JSON request in the Lambda console.
  • Voice & Tone – enter plain text or SSML and hear how Alexa speaks the text in a selected language
  • Alexa device – test with an Alexa-enabled device.
  • Alexa app – test the skill with the Alexa app for Android/iOS
  • Lambda Test console – to test Lambda functions

5.3 Perform beta testing

  • Skill beta testing tool can be used to test the Alexa skill in beta before releasing it to production
  • Beta testing allows testing changes to an existing skill, while still keeping the currently live version of the skill available for the general public.
  • Members can be invited using their Alexa email address. Alexa device used by the beta tester must be associated with the email address in the tester’s invitation.

5.4 Troubleshoot errors in the interaction model

Domain 6: Publishing, Operations, and Lifecycle Management

6.1 Describe the skill publishing process

  • Alexa skill needs to go through certification process before the Skill is live and made available to the users
  • Once the skill becomes live, Alexa creates an in development version of the skill
  • Alexa Skill live version cannot be edited, and it is recommended to edit the in development skill, test and then re-certify for publishing.
  • Backend changes like changes in Lambda functions or the response output from the function, however, can be made on the live version and do not require re-certification. It is recommended to use Lambda versioning or aliases for such changes.
  • Alexa for Business allows skill to be made private and available to select users within the company

6.2 Add and remove users in the developer console

  • Alexa Skill Developer console access can be shared across multiple users for collaboration
  • Administrator and Analyst roles will also have access to the Earnings and Payments sections.
  • Administrator and Marketer roles will also have access to edit the content associated with apps (i.e. Descriptions, Images & Multimedia) and IAPs
  • Administrator and Developer roles will have access to create, modify and delete Alexa skills using ASK CLI and SMAPI.
  • Administrator, Analyst and Marketer roles have access to sales report

6.3 Perform analysis of skill analytics in the developer console

  • Intent History – View aggregated, anonymized frequent utterances and the resolved intents. You cannot track the user intent history as they are anonymized.
  • Actions – Unique customers per action, total actions, and total utterances per action.
  • Customers – Total number of unique customers who accessed the skill.
  • Intents – Unique customers per intent, total utterances per intent, total intents, and failed intents.
  • Interaction Path – Paths users take when interacting with the skill.
  • Plays – Total number of times that a user played the skill content.
  • Retention (live skills only) – Usage of the skill over time by groups of customers or cohorts. View the number or percentage of customers who returned to your skill over a 12-week period.
  • Sessions – Total sessions, successful session types (sessions that didn’t end due to an error), and average sessions per customer. Includes a breakdown of successful, failed, and no-response sessions as a percentage of total sessions.
  • Utterances – Metrics for utterances depend on the skill category.

6.4 Differentiate among the statuses/versions of skills (for example, In Development, In Certification, and Live)

  • In Development – skill available for development, testing
  • In Review – A certification review is in progress and the skill cannot be edited
  • Certified – Skill passed certification review, and is not yet available to users
  • Live – skill has been published and is available to users. You cannot edit the configuration for live skills
  • Hidden – skill was previously published, but has since been hidden. Existing users can access the skill. New users cannot discover the skill.
  • Removed – skill was previously published, but has since been removed. Users cannot enable or use the skill.

AWS Certified Alexa Skill Builder – Specialty (AXS-C01) Exam Resources

AWS Certified Solutions Architect – Associate SAA-C02 Exam Learning Path

SAA-C02 Certification

AWS Certified Solutions Architect – Associate SAA-C02 Exam Learning Path

AWS Solutions Architect – Associate SAA-C02 exam is the latest AWS exam that has replaced the previous SAA-C01 certification exam. It basically validates the ability to effectively demonstrate knowledge of how to architect and deploy secure and robust applications on AWS technologies

  • Define a solution using architectural design principles based on customer requirements.
  • Provide implementation guidance based on best practices to the organization throughout the life cycle of the project.

Refer AWS_Solution_Architect_-_Associate_SAA-C02_Exam_Blue_Print

AWS Solutions Architect – Associate SAA-C02 Exam Summary

  • SAA-C02 exam consists of 65 questions in 130 minutes, and the time is more than sufficient if you are well prepared.
  • SAA-C02 Exam covers the architecture aspects in depth, so you must be able to visualize the architecture, even draw it out in the exam, just to understand how it would work and how different services relate.
  • AWS has updated the exam focus from individual services to building scalable, highly available, cost-effective, performant, and resilient architectures.
  • If you had been preparing for the SAA-C01 –
    • SAA-C02 is pretty much similar to SAA-C01, except that the operationally excellent architectures domain has been dropped
    • Most of the services and concepts covered by SAA-C01 remain the same, with a few new additions like Aurora Serverless, AWS Global Accelerator, FSx for Windows, and FSx for Lustre
  • AWS exams are available online, and I took the online one. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time.

AWS Solutions Architect – Associate SAA-C02 Exam Resources

AWS Solutions Architect – Associate SAA-C02 Exam Topics

Make sure you go through all the topics and focus on hints in italics

Networking

  • Be sure to create VPC from scratch. This is mandatory.
    • Create a VPC and understand what CIDR is and the addressing patterns (see the boto3 sketch after this list)
    • Create public and private subnets, configure proper routes, security groups, NACLs. (hint: Subnets are public or private depending on whether they can route traffic directly through Internet gateway)
    • Create Bastion for communication with instances
    • Create NAT Gateway or Instances for instances in private subnets to interact with internet
    • Create two tier architecture with application in public and database in private subnets
    • Create three tier architecture with web servers in public, application and database servers in private. (hint: focus on security group configuration with least privilege)
    • Make sure to understand how the communication happens between Internet, Public subnets, Private subnets, NAT, Bastion etc.
  • Understand difference between Security Groups and NACLs (hint: Security Groups are Stateful vs NACLs are stateless. Also only NACLs provide an ability to deny or block IPs)
  • Understand VPC endpoints and what services they can help interact with (hint: VPC Endpoints route traffic internally without the Internet)
    • VPC Gateway Endpoints supports S3 and DynamoDB.
    • VPC Interface Endpoints OR Private Links supports others
  • Understand difference between NAT Gateway and NAT Instance (hint: NAT Gateway is AWS managed and is scalable and highly available)
  • Understand how NAT high availability can be achieved (hint: provision NAT in each AZ and route traffic from subnets within that AZ through that NAT Gateway)
  • Understand VPN and Direct Connect for on-premises to AWS connectivity
    • VPN provides quick connectivity, cost-effective, secure channel, however routes through internet and does not provide consistent throughput
    • Direct Connect provides consistent dedicated throughput without Internet, however requires time to setup and is not cost-effective
  • Understand Data Migration techniques
    • Choose Snowball vs Snowmobile vs Direct Connect vs VPN depending on the bandwidth available, data transfer needed, time available, encryption requirement, one-time or continuous requirement
    • Snowball and Snowmobile are for one-time transfers; they are cost-effective, quick, and ideal for huge data transfers
    • Direct Connect, VPN are ideal for continuous or frequent data transfers
  • Understand CloudFront as a CDN, the static and dynamic caching it provides, and what can be its origin (hint: CloudFront can point to on-premises origins; know its use cases with S3 to reduce load and cost)
  • Understand Route 53 for routing
    • Understand Route 53 health checks and failover routing
    • Understand the Route 53 routing policies and their use cases, mainly for high availability (hint: focus on weighted, latency, geolocation, and failover routing)
  • Be sure to cover ELB concepts in depth.
    • SAA-C02 focuses on ALB and NLB and does not cover CLB
    • Understand the differences between CLB vs ALB vs NLB
      • ALB is layer 7 while NLB is layer 4
      • ALB provides content-based, host-based, and path-based routing
      • ALB provides dynamic port mapping, which allows multiple instances of the same task to be hosted on the same ECS node
      • NLB provides low latency and ability to scale
      • NLB provides static IP address
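
To make the public vs private subnet distinction concrete, here is a minimal boto3 sketch that creates a VPC, two subnets, and attaches an Internet Gateway route only to the public one. The region, CIDR ranges, and AZ are illustrative assumptions, not a recommended layout.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create the VPC with a /16 CIDR block
    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

    # One subnet intended to be public, one to stay private
    public_subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24",
                                      AvailabilityZone="us-east-1a")["Subnet"]
    private_subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.2.0/24",
                                       AvailabilityZone="us-east-1a")["Subnet"]

    # An Internet Gateway plus a 0.0.0.0/0 route is what actually makes a subnet "public"
    igw = ec2.create_internet_gateway()["InternetGateway"]
    ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc["VpcId"])

    public_rt = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
    ec2.create_route(RouteTableId=public_rt["RouteTableId"],
                     DestinationCidrBlock="0.0.0.0/0",
                     GatewayId=igw["InternetGatewayId"])
    ec2.associate_route_table(RouteTableId=public_rt["RouteTableId"],
                              SubnetId=public_subnet["SubnetId"])

    # The private subnet keeps the main route table (local routes only), so it has no
    # direct path to the Internet Gateway; its instances would need a NAT Gateway in
    # the public subnet to reach the internet.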

Security

  • Understand IAM as a whole
    • Focus on IAM role (hint: can be used for EC2 application access and Cross-account access)
    • Understand IAM identity providers and federation and use cases
    • Understand MFA and how you would implement two-factor authentication for an application
    • Understand IAM Policies (hint: expect a couple of questions with policies defined where you need to select the correct statements; a minimal policy sketch follows this list)
  • Understand encryption services
  • AWS WAF integrates with CloudFront to provide protection against Cross-site scripting (XSS) attacks. It also provides IP blocking and geo-protection.
  • AWS Shield integrates with CloudFront to provide protection against DDoS.
  • Refer to the Disaster Recovery whitepaper; be sure you know the different recovery types and their impact on RTO/RPO.
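
For the IAM policy questions, it helps to be comfortable reading policy documents. Below is a minimal sketch of a least-privilege, read-only S3 policy created via boto3; the bucket name and policy name are hypothetical.

    import json
    import boto3

    iam = boto3.client("iam")

    # Least privilege: read-only access to a single bucket
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::example-app-bucket",    # bucket-level action (ListBucket)
                    "arn:aws:s3:::example-app-bucket/*",  # object-level action (GetObject)
                ],
            }
        ],
    }

    iam.create_policy(
        PolicyName="AppReadOnlyS3Access",
        PolicyDocument=json.dumps(policy_document),
    )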

Storage

  • Understand the various storage options – S3, EBS, Instance Store, EFS, Glacier, FSx – and the use cases and anti-patterns for each
  • Instance Store
    • Understand Instance Store (hint: it is physically attached  to the EC2 instance and provides the lowest latency and highest IOPS)
  • Elastic Block Storage – EBS
    • Understand various EBS volume types and their use cases in terms of IOPS and throughput. SSD for IOPS and HDD for throughput
    • Understand Burst performance and I/O credits to handle occasional peaks
    • Understand EBS Snapshots (hint: backups are automated, snapshots are manual)
  • Simple Storage Service – S3
    • Cover S3 in depth
    • Understand S3 storage classes with lifecycle policies
      • Understand the difference between S3 Standard vs S3 Standard-IA vs S3 One Zone-IA in terms of cost and resilience (One Zone-IA stores data in a single AZ)
    • Understand S3 Data Protection (hint: S3 Client side encryption encrypts data before storing it in S3)
    • Understand S3 features including
      • S3 provides a cost effective static website hosting
      • S3 versioning provides protection against accidental overwrites and deletions
      • S3 Pre-Signed URLs, for both upload and download, provide access without needing AWS credentials (a short boto3 sketch follows this list)
      • S3 CORS allows cross domain calls
      • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
    • Understand Glacier as an archival storage with various retrieval patterns
    • Glacier Expedited retrieval allows object retrieval within minutes
  • Understand Storage gateway and its different types.
    • Cached Volume Gateway provides access to frequently accessed data, while using AWS as the actual storage
    • Stored Volume gateway uses AWS as a backup, while the data is being stored on-premises as well
    • File Gateway supports the NFS and SMB protocols
  • Understand FSx, which makes it easy and cost-effective to launch and run popular file systems.
  • Understand the difference between EBS vs S3 vs EFS
    • EFS provides a shared volume across multiple EC2 instances, while an EBS volume can be attached to a single instance within the same AZ.
  • Understand the difference between EBS vs Instance Store
  • I would recommend referring to the Storage Options whitepaper; although a bit dated, 90% of it still holds true
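
A quick sketch of S3 pre-signed URLs with boto3, since they come up for both uploads and downloads without sharing AWS credentials; the bucket and object keys are hypothetical.

    import boto3

    s3 = boto3.client("s3")

    # Pre-signed GET URL: anyone holding the URL can download the object for 1 hour
    download_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-app-bucket", "Key": "reports/monthly.pdf"},
        ExpiresIn=3600,
    )

    # Pre-signed PUT URL: lets a client upload without having AWS credentials
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "example-app-bucket", "Key": "uploads/photo.jpg"},
        ExpiresIn=900,
    )

    print(download_url)
    print(upload_url)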

Compute

  • Understand Elastic Compute Cloud – EC2
  • Understand Auto Scaling and ELB, and how they work together to provide a Highly Available and Scalable solution. (hint: span both ELB and Auto Scaling across multiple AZs to provide High Availability)
  • Understand EC2 Instance Purchase Types – Reserved, Scheduled Reserved, On-demand, and Spot – and their use cases
    • Choose Reserved Instances for continuous, persistent load
    • Choose Scheduled Reserved Instances for load with a fixed schedule and time interval
    • Choose Spot Instances for fault-tolerant and spiky loads
    • Reserved Instances provide cost benefits over On-demand instances for long-term requirements
    • Spot Instances provide cost benefits for temporary, fault-tolerant, spiky loads
  • Understand EC2 Placement Groups (hint: Cluster placement groups provide low latency and high throughput communication, while Spread placement group provides high availability)
  • Understand Lambda and serverless architectures, their features and use cases. (hint: Lambda integrates with API Gateway to provide a serverless, highly scalable, cost-effective architecture; a minimal handler sketch follows this list)
  • Understand ECS with its ability to deploy containers and microservices architectures.
    • ECS role for tasks can be provided through taskRoleArn
    • ALB provides dynamic port mapping to allow multiple same tasks on the same node
  • Know Elastic Beanstalk at a high level, what it provides and its ability to get an application running quickly.
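
A minimal Lambda handler sketch for an API Gateway proxy integration, just to illustrate the serverless request/response shape; the query parameter name is an assumption.

    import json

    def lambda_handler(event, context):
        """Handle an API Gateway (proxy integration) request."""
        # Query string parameters may be absent, so default defensively
        name = (event.get("queryStringParameters") or {}).get("name", "world")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": f"hello {name}"}),
        }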

Databases

  • Understand the relational and NoSQL data storage options, which include RDS, DynamoDB, and Aurora, and their use cases
  • RDS
    • Understand RDS features – Read Replicas vs Multi-AZ
      • Read Replicas for scalability, Multi-AZ for High Availability
      • Multi-AZ deployments are within a single region only
      • Read Replicas can span across regions and can be used for disaster recovery
    • Understand Automated Backups, underlying volume types
  • Aurora
    • Understand Aurora
      • provides multiple read replicas and replicates 6 copies of data across 3 AZs
    • Understand that Aurora Serverless provides a highly scalable, cost-effective database solution
  • DynamoDB
    • Understand DynamoDB with its low-latency performance as a key-value store (hint: DynamoDB is not a relational database; a short boto3 sketch follows this list)
    • DynamoDB DAX provides caching for DynamoDB
    • Understand DynamoDB provisioned throughput for reads/writes (this is covered more in the Developer exam, though)
  • Know ElastiCache use cases, mainly for caching performance
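
A minimal DynamoDB sketch with boto3 showing the key-value access pattern; the table name and attributes are hypothetical, and the table is assumed to already exist with session_id as its partition key.

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Sessions")

    # Write an item; apart from the key attributes there is no fixed schema
    table.put_item(Item={"session_id": "abc123", "user": "alice", "ttl": 1700000000})

    # Low-latency point read by partition key
    response = table.get_item(Key={"session_id": "abc123"})
    print(response.get("Item"))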

Integration Tools

  • Understand SQS as message queuing service and SNS as pub/sub notification service
  • Understand SQS features like visibility timeout and long polling vs short polling (a short boto3 sketch follows this list)
  • Focus on SQS as a decoupling service
  • Understand the difference between SQS Standard and SQS FIFO (hint: FIFO provides exactly-once processing but lower throughput)
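
A short boto3 sketch contrasting long polling with short polling on SQS; the queue name and message body are illustrative.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.create_queue(QueueName="orders-queue")["QueueUrl"]

    sqs.send_message(QueueUrl=queue_url, MessageBody="order-42")

    # Long polling: WaitTimeSeconds > 0 waits for messages to arrive, reducing
    # empty responses and API calls (0 would be a short poll)
    messages = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
        VisibilityTimeout=30,  # message stays hidden from other consumers while processed
    ).get("Messages", [])

    for msg in messages:
        print(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])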

Analytics

  • Know Redshift as a data warehousing solution used for business intelligence and analytics
  • Know Kinesis for real-time data capture and analytics (a short producer sketch follows this list)
  • At least know what AWS Glue does, so you can eliminate it as an answer option where it does not fit
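
A minimal Kinesis Data Streams producer sketch with boto3; the stream name and payload are hypothetical, and the partition key controls which shard a record lands on (preserving ordering per key).

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    # Records with the same partition key go to the same shard
    kinesis.put_record(
        StreamName="clickstream",
        Data=json.dumps({"user": "alice", "page": "/home"}).encode("utf-8"),
        PartitionKey="alice",
    )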

Management Tools

  • Understand CloudWatch monitoring to provide operational transparency
  • Know which EC2 metrics it can track. Remember, it cannot track memory and disk space/swap utilization out of the box
  • Understand that CloudWatch is extendable with custom metrics (a short boto3 sketch follows this list)
  • Understand CloudTrail for auditing
  • Have a basic understanding of CloudFormation, OpsWorks
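
Since memory and disk metrics are not collected for EC2 out of the box, they are typically published as custom metrics. A minimal boto3 sketch follows; the namespace, metric name, and instance id are illustrative.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Publish memory utilization as a custom metric
    cloudwatch.put_metric_data(
        Namespace="MyApp/EC2",
        MetricData=[{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            "Value": 72.5,
            "Unit": "Percent",
        }],
    )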

AWS Whitepapers & Cheat sheets

AWS Solutions Architect – Associate Exam Domains

Domain 1: Design Resilient Architectures

  1. Design a multi-tier architecture solution
  2. Design highly available and/or fault-tolerant architectures
  3. Design decoupling mechanisms using AWS services
  4. Choose appropriate resilient storage

Domain 2: Design High-Performing Architectures

  1. Identify elastic and scalable compute solutions for a workload
  2. Select high-performing and scalable storage solutions for a workload
  3. Select high-performing networking solutions for a workload
  4. Choose high-performing database solutions for a workload

Domain 3: Design Secure Applications and Architectures

  1. Design secure access to AWS resources
  2. Design secure application tiers
  3. Select appropriate data security options

Domain 4: Design Cost-Optimized Architectures

  1. Determine how to design cost-optimized storage.
  2. Determine how to design cost-optimized compute.

AWS Certified Machine Learning -Specialty (MLS-C01) Exam Learning Path

AWS Certified Machine Learning Specialty Certification

AWS Certified Machine Learning -Specialty (MLS-C01) Exam Learning Path

Finally, I cleared the AWS Certified Machine Learning – Specialty (MLS-C01). It took me around four months to prepare for the exam. This was my fourth Specialty certification, and of all of them, it is the toughest, partly because I am not a machine learning expert and learned everything from the basics for this certification. Machine Learning is a vast specialization in itself, and with the AWS services there is a lot to cover and know for the exam. This is the only exam where the majority of the focus is on concepts outside of AWS, i.e. pure machine learning. It also includes the AWS Machine Learning and Big Data services.

AWS Certified Machine Learning – Specialty (MLS-C01) exam basically validates

  •  Select and justify the appropriate ML approach for a given business problem.
  • Identify appropriate AWS services to implement ML solutions.
  • Design and implement scalable, cost-optimized, reliable, and secure ML solutions.

Refer to the AWS Certified Machine Learning – Specialty Exam Guide for details

AWS Certified Machine Learning – Specialty Domains

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Summary

  • The AWS Certified Machine Learning – Specialty exam, as its name suggests, covers a lot of Machine Learning concepts. It really digs deep into Machine Learning concepts, most of which are not related to AWS.
  • The AWS Certified Machine Learning – Specialty exam covers the end-to-end Machine Learning lifecycle, right from data collection and transformation, making it usable and efficient for Machine Learning, pre-processing data, training and validation, and implementation.
  • As always, one of the key tactics I follow when solving any AWS Certification exam is to read the question and use paper and pencil to draw a rough architecture, and then focus on the areas that need attention. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the differing area, and that would help you reach the right answer or at least have a 50% chance of getting it right.

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Resources

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Topics

  • Machine Learning
    • Make sure you know and cover all the concepts in depth, as 60% of the exam is focused on generic Machine Learning concepts not related to AWS services.
    • Know the complete generic Machine Learning lifecycle
    • Exploratory Data Analysis
      • Feature selection and Engineering
        • remove features that are not relevant to training
        • remove features that have the same values, very low correlation, very little variance, or a lot of missing values
        • Apply techniques like Principal Component Analysis (PCA) for dimensionality reduction, i.e. reducing the number of features.
        • Apply techniques such as one-hot encoding and label encoding to help convert strings to numeric values, which are easier to process.
        • Apply normalization, i.e. scaling values between 0 and 1, to handle features with large variance (a short pandas/scikit-learn sketch appears after this topic list).
        • Apply feature engineering for feature reduction, e.g. using a single combined height/weight feature instead of both features
      • Handle Missing data
        • remove the feature or rows with missing data
        • impute using mean/median values – valid only for numeric features and not categorical ones; it also does not factor in correlation between features
        • impute using k-NN, Multivariate Imputation by Chained Equation (MICE), or Deep Learning – more accurate and factors in correlation between features
      • Handle unbalanced data
        • Source more data
        • Oversample minority or Undersample majority
        • Data augmentation using techniques like SMOTE
    • Modeling
      • Know about Algorithms – Supervised, Unsupervised and Reinforcement and which algorithm is best suitable based on the available data either labelled or unlabelled.
        • Supervised learning trains on labelled data, e.g. Linear Regression, Logistic Regression, Decision Trees, Random Forests
        • Unsupervised learning trains on unlabelled data for e.g. PCA, SVD, K-means
        • Reinforcement learning trained based on actions and rewards for e.g. Q-Learning
      • Hyperparameters
        • are parameters exposed by machine learning algorithms that control how the underlying algorithm operates; their values affect the quality of the trained models
        • some of the common hyperparameters are learning rate, batch size, and epochs (hint: if the learning rate is too large, the minimum might be overshot and the loss would oscillate; if the learning rate is too small, it requires too many steps, which takes longer and is less efficient)
    • Evaluation
      • Know difference in evaluating model accuracy
        • Use Area Under the (Receiver Operating Characteristic) Curve (AUC) for Binary classification
        • Use root mean square error (RMSE) metric for regression
      • Understand Confusion matrix
        • A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class.
        • A false positive is an outcome where the model incorrectly predicts the positive class, and a false negative is an outcome where the model incorrectly predicts the negative class.
        • Recall or Sensitivity or TPR (True Positive Rate): number of items correctly identified as positive out of total actual positives – TP/(TP+FN) (hint: use this for cases like fraud detection, where the cost of marking non-frauds as frauds is lower than marking frauds as non-frauds)
        • Specificity or TNR (True Negative Rate): number of items correctly identified as negative out of total actual negatives – TN/(TN+FP) (hint: use this for cases like videos for kids, where the cost of dropping a few valid videos is lower than showing a few bad ones; a short confusion-matrix sketch appears after this topic list)
      • Handle Overfitting problems
        • Simplify the model, e.g. by reducing the number of layers
        • Early Stopping – form of regularization while training a model with an iterative method, such as gradient descent
        • Data Augmentation
        • Regularization – technique to reduce the complexity of the model
        • Dropout is a regularization technique that prevents overfitting
        • Never train on test data
  • AWS Machine Learning
    • SageMaker
      • Know SageMaker in depth
      • supports both File mode and Pipe mode
        • File mode loads all of the data from S3 to the training instance volumes, while Pipe mode streams data directly from S3
        • File mode needs disk space to store both the final model artifacts and the full training dataset, while Pipe mode helps reduce the required size of EBS volumes
      • Using the RecordIO format allows algorithms that support it to take advantage of Pipe mode during training.
      • supports Model tracking capability to manage up to thousands of machine learning model experiments
      • supports Canary deployment using ProductionVariant and deploying multiple variants of a model to the same SageMaker HTTPS endpoint.
      • supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload
      • provides pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training & inference
      • SageMaker Automatic Model Tuning
        • is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
        • Best practices
          • limit the search to a smaller number of hyperparameters, as the difficulty of a hyperparameter tuning job depends primarily on the number of hyperparameters that Amazon SageMaker has to search
          • DO NOT specify a very large range to cover every possible value for a hyperparameter, as it affects the success of hyperparameter optimization.
          • use a logarithmic scale for hyperparameters whose range spans several orders of magnitude to improve hyperparameter optimization.
          • running one training job at a time achieves the best results with the least amount of compute time.
          • design distributed training jobs so that they report the objective metric that you want.
      • SageMaker Neo enables machine learning models to be trained once and run anywhere, in the cloud and at the edge.
      • know how to take advantage of multiple GPUs (hint: increase the learning rate and batch size w.r.t. the increase in GPUs)
      • Algorithms –
        • BlazingText provides Word2vec and text classification algorithms
        • DeepAR provides a supervised learning algorithm for forecasting scalar (one-dimensional) time series (hint: train for new products based on existing products' sales data)
        • Factorization Machines provide supervised classification and regression, and help capture interactions between features within high-dimensional sparse datasets economically
        • Image classification algorithm is a supervised learning algorithm that supports multi-label classification
        • IP Insights is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses
        • K-means is an unsupervised learning algorithm for clustering as it attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.
        • k-nearest neighbors (k-NN) algorithm is an index-based algorithm. It uses a non-parametric method for classification or regression
        • Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. Used to identify number of topics shared by documents within a text corpus
        • Linear models are supervised learning algorithms used for solving either classification or regression problems.
          • For regression (predictor_type='regressor'), the score is the prediction produced by the model.
          • For classification (predictor_type='binary_classifier' or predictor_type='multiclass_classifier'), the model returns a score along with a predicted label.
        • Neural Topic Model (NTM) Algorithm is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
        • Object Detection algorithm detects and classifies objects in images using a single deep neural network
        • Principal Component Analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) (hint: dimensionality reduction)
        • Random Cut Forest (RCF) is an unsupervised algorithm for detecting anomalous data points (hint: anomaly detection)
        • Sequence to Sequence is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens. (hint: text summarization is the key use case)
    • SageMaker Ground Truth 
      • provides automated data labeling using machine learning
      • helps build highly accurate training datasets for machine learning quickly using Amazon Mechanical Turk
      • provides annotation consolidation to help improve the accuracy of the data objects' labels. It combines the results of multiple workers' annotation tasks into one high-fidelity label.
      • automated data labeling uses machine learning to label portions of the data automatically without having to send them to human workers
    • Comprehend
      • natural language processing (NLP) service to find insights and relationships in text.
      • identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
    • Lex
      • provides conversational interfaces using voice and text helpful in building voice and text chatbots
    • Polly
      • converts text into speech
      • supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch or volume.
      • supports pronunciation lexicons to customize the pronunciation of words
    • Rekognition
      • analyzes images and videos
      • helps identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content.
    • Translate – provides natural and fluent language translation
    • Transcribe – provides speech-to-text capability
    • Elastic Inference helps attach low-cost GPU-powered acceleration to EC2 and SageMaker instances or ECS tasks to reduce the cost of running deep learning inference by up to 75%.
  • Analytics
    • Make sure you know and understand data engineering concepts mainly in terms of data capture, data migration, data transformation and data storage
    • Kinesis
      • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
      • Kinesis Data Analytics can process and analyze streaming data using standard SQL and integrates with Data Streams and Firehose
      • Know Kinesis Data Streams vs Kinesis Firehose
        • Know that Kinesis Data Streams is open-ended on both the producer and consumer sides. It supports KCL and works with Spark.
        • Know that Kinesis Data Firehose is open-ended on the producer side only. Data is delivered to S3, Redshift, or Elasticsearch.
        • Kinesis Data Firehose works in batches with a minimum 60-second buffering interval.
        • Kinesis Data Firehose supports data transformation and record format conversion using a Lambda function (hint: can be used for transforming CSV or JSON into Parquet)
    • Know that Elasticsearch is a search service which supports indexing, full-text search, faceting, etc.
    • Know Data Pipeline for data transfer
    • Know Glue as fully managed ETL service
      • helps setup, orchestrate, and monitor complex data flows.
      • AWS Glue Data Catalog
        • is a central repository to store structural and operational metadata for all the data assets.
      • AWS Glue crawler
        • connects to a data store, progresses through a prioritized list of classifiers to extract the schema of the data and other statistics, and then populates the Glue Data Catalog with this metadata
  • Security, Identity & Compliance
    • Security is covered very lightly. (hint: SageMaker can read data from KMS-encrypted S3. Make sure the KMS key policy includes the role attached to SageMaker)
  • Management & Governance Tools
    • Understand AWS CloudWatch for logs and metrics. (hint: SageMaker is integrated with CloudWatch, and its logs and metrics are stored there)
  • Storage
    • Understand the Data Storage Options – know the patterns for S3 vs RDS vs DynamoDB vs Redshift. (hint: S3 is, by default, the storage option for Big Data; look for it in the answers)
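
As referenced under Feature Selection and Engineering above, here is a short pandas/scikit-learn sketch of one-hot encoding and 0-1 normalization; the dataset and column names are made up purely for illustration.

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    # Toy training data with one categorical and two numeric features
    df = pd.DataFrame({
        "color": ["red", "blue", "red", "green"],
        "height_cm": [150, 180, 165, 172],
        "weight_kg": [55, 90, 70, 80],
    })

    # One-hot encode the categorical feature so the algorithm sees numeric columns
    df = pd.get_dummies(df, columns=["color"])

    # Normalize the numeric features to the 0-1 range to handle differing scales
    scaler = MinMaxScaler()
    df[["height_cm", "weight_kg"]] = scaler.fit_transform(df[["height_cm", "weight_kg"]])

    print(df)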
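
And the confusion-matrix sketch referenced under Evaluation: computing recall, specificity, and precision for a binary classifier with scikit-learn; the labels are toy data.

    from sklearn.metrics import confusion_matrix

    # Toy ground truth and predictions for a binary classifier
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    recall = tp / (tp + fn)       # sensitivity / TPR: TP out of all actual positives
    specificity = tn / (tn + fp)  # TNR: TN out of all actual negatives
    precision = tp / (tp + fp)    # TP out of all predicted positives

    print(f"recall={recall:.2f} specificity={specificity:.2f} precision={precision:.2f}")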

Whitepapers and articles
