AWS Certified Solutions Architect – Associate SAA-C03 Exam Learning Path

  • The AWS Solutions Architect – Associate SAA-C03 exam was released on 30th August 2022 and replaced the previous SAA-C02 certification exam.
  • It validates the ability to design, architect, and deploy secure, cost-effective, and robust applications on AWS technologies.
  • The exam also validates a candidate’s ability to complete the following tasks:
    • Design solutions that incorporate AWS services to meet current business requirements and future projected needs
    • Design architectures that are secure, resilient, high-performing, and cost-optimized
    • Review existing solutions and determine improvements

Refer to the AWS Solutions Architect – Associate SAA-C03 Exam Guide

AWS Solutions Architect – Associate SAA-C03 Exam Summary

  • The SAA-C03 exam consists of 65 questions in 130 minutes, and the time is more than sufficient if you are well prepared.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams, but is helpful for Professional and Specialty ones.
  • The SAA-C03 exam covers design and architecture in depth, so you must be able to visualize an architecture, or even draw it out, to understand how it would work and how the different services relate.
  • SAA-C03 exam concepts cover solutions that fall within the AWS Well-Architected Framework, with a focus on scalable, highly available, cost-effective, performant, and resilient designs.
  • If you had been preparing for SAA-C02, SAA-C03 is very similar, except for the addition of some newer services such as Aurora Serverless, AWS Global Accelerator, FSx for Windows, and FSx for Lustre.
  • AWS exams are available online, and I took the online one. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS online exam for the first time, try to join at least 30 minutes before the scheduled time, as I have had long wait times with both PSI and Pearson.

AWS Solutions Architect – Associate SAA-C03 Exam Resources

AWS Solutions Architect – Associate SAA-C03 Exam Topics

Networking

  • Virtual Private Cloud – VPC
    • Create a VPC from scratch with public, private, and dedicated subnets with proper route tables, security groups, and NACLs.
    • Understand what a CIDR is and address patterns.
    • Subnets are public or private depending on whether they can route traffic directly through an Internet gateway
    • Understand how communication happens between the Internet, Public subnets, Private subnets, NAT, Bastion, etc.
    • Bastion (also referred to as a Jump server) can be used to securely access instances in the private subnets.
    • Create two-tier architecture with application in public and database in private subnets
    • Create three-tier architecture with web servers in public, application, and database servers in private. (hint: focus on security group configuration with least privilege)
  • Security Groups and NACLs
    • Security Groups are Stateful vs NACLs are stateless.
    • Also, only NACLs provide the ability to deny or block IPs
  • NAT Gateway or Instances
    • help instances in a private subnet connect to the Internet.
    • Understand the difference between NAT Gateway & NAT Instance. 
    • NAT Gateway is AWS-managed and is scalable and highly available.
  • VPC endpoints
    • enable a private connection between a VPC and supported AWS services and VPC endpoint services powered by PrivateLink, using private IP addresses and without needing an Internet or NAT Gateway.
    • VPC Gateway Endpoints support S3 and DynamoDB.
    • VPC Interface Endpoints (PrivateLink) support other services.
  • VPN and Direct Connect for on-premises to AWS connectivity
    • VPN provides a quick, cost-effective, secure channel; however, it routes through the Internet and does not provide consistent throughput.
    • Direct Connect provides consistent, dedicated throughput without traversing the Internet; however, it takes time to set up and is more expensive.
  • Understand Data Migration techniques at a high level
    • VPN and Direct Connect for continuous, frequent data transfers.
    • Snow Family is ideal for one-time, cost-effective huge data transfer.
    • Choose a technique depending on the available bandwidth, data transfer needed, time available, encryption, one-time or continuous.
  • CloudFront
    • fully managed, fast CDN service that speeds up the distribution of static, dynamic web, or streaming content to end-users
    • S3 fronted by CloudFront provides a low-latency, performant experience for global users.
    • provides static and dynamic caching for both AWS and on-premises origin.
  • Global Accelerator
    • optimizes the path to applications to keep packet loss, jitter, and latency consistently low.
    • helps improve the performance by lowering first-byte latency
    • provides 2 static IP addresses
  • Know CloudFront vs Global Accelerator
  • Route 53
    • highly available and scalable DNS web service.
    • Health checks and failover routing help provide resilient, active-passive solutions
    • Route 53 Routing Policies and their use cases (hint: focus on weighted, latency, geolocation, failover routing)
  • Elastic Load Balancer
    • Focus on ALB and NLB
    • Differences between ALB vs NLB
      • ALB is layer 7 vs NLB is layer 4
      • ALB provides content-based, host-based, path-based routing
      • ALB provides dynamic port mapping, which allows multiple copies of the same task to be hosted on the same ECS node
      • NLB provides low latency, the ability to scale rapidly, and a static IP address
      • ALB works with WAF while NLB does not.
    • Gateway Load Balancer – GWLB
      • helps deploy, scale, and manage virtual appliances, such as firewalls, IDS/IPS, and deep packet inspection systems.
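
For the CIDR and subnet planning item above, a minimal sketch using Python's standard `ipaddress` module may help. The VPC range, the /24 subnet size, and the subnet names are illustrative assumptions, not values from this guide. The sketch subtracts the 5 addresses AWS reserves in every subnet.

```python
import ipaddress
from itertools import islice

# AWS reserves 5 addresses in every subnet (network address, VPC router,
# DNS, one for future use, and the broadcast address).
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int):
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return list(islice(vpc.subnets(new_prefix=new_prefix), count))


def usable_hosts(subnet) -> int:
    """Addresses left for your own resources after the AWS reservation."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    # Hypothetical layout: two public and two private /24 subnets in a /16 VPC.
    names = ["public-a", "public-b", "private-a", "private-b"]
    for name, subnet in zip(names, plan_subnets("10.0.0.0/16", 24, 4)):
        print(f"{name}: {subnet} ({usable_hosts(subnet)} usable addresses)")
```

Whether a subnet is actually public or private is decided by its route table (a route to an Internet gateway), not by the CIDR itself.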

Security

  • Identity Access Management – IAM
    • IAM role
      • provides permissions that are not associated with a particular user, group, or service and are intended to be assumable by anyone who needs it.
      • can be used for EC2 application access and Cross-account access
    • IAM identity providers and federation and use cases – Although did not see much in SAA-C03
  • Key Management Services – KMS encryption service
  • AWS WAF
    • integrates with CloudFront, and ALB to provide protection against Cross-site scripting (XSS), and SQL injection attacks.
    • provides IP blocking and geo-protection, rate limiting, etc.
  • AWS Shield
    • managed DDoS protection service
    • integrates with CloudFront, ALB, and Route 53
    • Advanced provides additional detection and mitigation against large and sophisticated DDoS attacks, near real-time visibility into attacks
  • AWS GuardDuty
    • managed threat detection service and provides Malware protection
  • AWS Inspector
    • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities
  • AWS Secrets Manager
    • helps protect secrets needed to access applications, services, and IT resources.
    • supports automatic rotation of secrets, which Systems Manager Parameter Store does not provide natively.
  • Disaster Recovery whitepaper
    • Be sure you know the different recovery types with impact on RTO/RPO.
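
For the cross-account IAM role use case mentioned above, the piece that makes a role assumable is its trust policy. The following is a small sketch that builds such a policy document as a Python dictionary; the account ID is a placeholder and the helper name is hypothetical.

```python
import json


def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in another account assume the role.

    The trusted account still has to grant its own users or roles
    permission to call sts:AssumeRole on this role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID used only for illustration.
    print(json.dumps(cross_account_trust_policy("111122223333"), indent=2))
```

For EC2 application access, the same structure applies with the principal changed to the EC2 service (`{"Service": "ec2.amazonaws.com"}`).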

Storage

  • Understand various storage options S3, EBS, Instance store, EFS, Glacier, FSx, and what are the use cases and anti-patterns for each
  • Instance Store
    • is physically attached to the EC2 instance and provides the lowest latency and highest IOPS
  • Elastic Block Storage – EBS
    • EBS volume types and their use cases in terms of IOPS and throughput. SSD for IOPS and HDD for throughput
    • EBS Snapshots
      • Backups are automated, snapshots are manual
      • Can be used to encrypt an unencrypted EBS volume
    • Multi-Attach EBS feature allows attaching an EBS volume to multiple instances within the same AZ only.
    • EBS fast snapshot restore feature helps ensure that the EBS volumes created from a snapshot are fully-initialized at creation and instantly deliver all of their provisioned performance.
  • Simple Storage Service – S3
    • S3 storage classes with lifecycle policies
      • Understand the difference between S3 Standard vs S3 Standard-IA vs S3 One Zone-IA in terms of cost, availability, and durability
    • S3 Data Protection
      • S3 Client-side encryption encrypts data before storing it in S3
    • S3 features including
      • S3 provides cost-effective static website hosting. Can be integrated with CloudFront.
      • S3 versioning provides protection against accidental overwrites and deletions. Used with MFA Delete feature.
      • S3 Pre-Signed URLs for both upload and download provide access without needing AWS credentials
      • S3 CORS allows cross-domain calls
      • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
      • S3 Event Notifications to trigger events on various S3 events like objects added or deleted. Supports SQS, SNS, and Lambda functions.
      • Integrates with Amazon Macie to detect PII data
      • Replication supports both same-region and cross-region replication and requires versioning to be enabled.
      • Integrates with Athena to analyze data in S3 using standard SQL.
  • Glacier
    • as archival storage with various retrieval patterns
    • Glacier Instant Retrieval allows retrieval in milliseconds. 
    • Glacier Expedited retrieval allows object retrieval within minutes.
  • Storage gateway and its different types.
    • Cached Volume Gateway provides access to frequently accessed data while using AWS as the actual storage
    • Stored Volume gateway uses AWS as a backup, while the data is being stored on-premises as well
    • File Gateway supports SMB protocol
  • FSx makes it easy and cost-effective to launch and run popular file systems.
    • FSx provides two file systems to choose from:
    • Amazon FSx for Windows File Server
      • works with both Linux and Windows
      • provides Windows File System features including integration with Active Directory.
    • Amazon FSx for Lustre
      • for high-performance workloads
      • works with only Linux
  • Elastic File System – EFS
    • simple, fully managed, scalable, serverless, and cost-optimized file storage for use with AWS Cloud and on-premises resources.
    • provides shared volume across multiple EC2 instances, while EBS can be attached to a single instance within the same AZ or EBS Multi-Attach can be attached to multiple instances within the same AZ
    • supports the NFS protocol, and is compatible with Linux-based AMIs
    • supports cross-region replication, storage classes for cost.
  • AWS Transfer Family
    • secure transfer service that helps transfer files into and out of AWS storage services using FTP, SFTP and FTPS protocol.
  • Difference between EBS vs S3 vs EFS
  • Difference between EBS vs Instance Store
  • I would recommend referring to the Storage Options whitepaper; although it is a bit dated, 90% of it still holds true
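
For the storage classes and lifecycle policies item above, here is a hedged sketch of a single lifecycle rule that moves objects to Standard-IA, then to Glacier, and finally expires them. The dictionary follows the shape of an S3 lifecycle configuration rule; the prefix, day counts, and helper name are assumptions chosen for illustration.

```python
import json

# S3 only transitions objects to Standard-IA once they are at least 30 days old.
MIN_DAYS_BEFORE_IA = 30


def lifecycle_rule(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    if ia_days < MIN_DAYS_BEFORE_IA:
        raise ValueError("objects must be at least 30 days old before moving to Standard-IA")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": "tier-then-expire",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


if __name__ == "__main__":
    # Hypothetical policy for objects under the "logs/" prefix.
    config = {"Rules": [lifecycle_rule("logs/", 30, 90, 365)]}
    print(json.dumps(config, indent=2))
```

A configuration like this would typically be applied to a bucket through the console, CLI, or an SDK call; that step is left out here so the sketch runs without AWS credentials.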

Compute

  • Elastic Compute Cloud – EC2
  • Auto Scaling and ELB
    • Auto Scaling ensures that the correct number of EC2 instances is always running to handle the load of the application
    • Elastic Load Balancer distributes incoming traffic automatically across multiple healthy EC2 instances
    • work together to provide High Availability and Scalability.
    • Span both ELB and Auto Scaling across multiple AZs to provide High Availability
    • Neither spans regions. Use Route 53 or Global Accelerator to route traffic across regions.
  • EC2 Instance Purchase Types – Reserved, Scheduled Reserved, On-demand, and Spot and their use cases
    • Reserved instances provide cost benefits over On-demand instances for long-term requirements with a continuous, persistent load
    • Scheduled Reserved Instances for load with a fixed schedule and time interval
    • Spot instances provide cost benefits for temporary, fault-tolerant, spiky load
  • EC2 Placement Groups
    • Cluster placement groups provide low latency and high throughput communication
    • Spread placement group provides high availability
  • Lambda and serverless architecture, its features, and use cases.
    • Lambda integrated with API Gateway to provide a serverless, highly scalable, cost-effective architecture
  • Elastic Container Service – ECS with its ability to deploy containers and microservices architecture.
    • ECS role for tasks can be provided through taskRoleArn
    • ALB provides dynamic port mapping to allow multiple same tasks on the same node.
  • Elastic Kubernetes Service – EKS
    • managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers
    • ideal for migration of an existing workload on Kubernetes
  • Elastic Beanstalk at a high level, what it provides, and its ability to get an application running quickly.

Databases

  • Understand relational and NoSQL data storage options which include RDS, DynamoDB, and Aurora with their use cases
  • Relational Database Service – RDS
    • Read Replicas vs Multi-AZ
      • Read Replicas for scalability, Multi-AZ for High Availability
      • Multi-AZ deployments stay within a single region
      • Read Replicas can span across regions and can be used for disaster recovery
    • Understand Automated Backups, underlying volume types (which are the same as EBS volume types)
  • Aurora
    • provides multiple read replicas and replicates 6 copies of data across AZs
    • Aurora Serverless
      • provides a highly scalable cost-effective database solution
      • automatically starts up, shuts down, and scales capacity up or down based on the application’s needs.
      • supports only MySQL and PostgreSQL
  • DynamoDB
    • provides low latency performance, a key-value store
    • is not a relational database
    • DynamoDB DAX provides caching for DynamoDB
    • DynamoDB TTL helps expire data in DynamoDB at no extra cost and without consuming write throughput.
  • ElastiCache use cases, mainly for caching performance
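
For the DynamoDB TTL item above: TTL works off a numeric attribute holding an expiry time in Unix epoch seconds. A minimal sketch follows, using plain dictionaries rather than the DynamoDB wire format; the attribute name `expires_at` and both helper names are assumptions for illustration.

```python
import time
from typing import Optional


def with_ttl(item: dict, ttl_seconds: int, now: Optional[float] = None,
             attribute: str = "expires_at") -> dict:
    """Return a copy of item with an epoch-seconds expiry attribute added."""
    current = time.time() if now is None else now
    return {**item, attribute: int(current) + ttl_seconds}


def is_expired(item: dict, now: float, attribute: str = "expires_at") -> bool:
    """True once the expiry time has passed. DynamoDB deletes such items in the
    background, so readers may still see them briefly and should filter them."""
    return item[attribute] <= int(now)


if __name__ == "__main__":
    session = with_ttl({"pk": "session#42"}, ttl_seconds=3600)
    print(session)
```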

Integration Tools

  • Simple Queue Service – SQS
    • is a message queuing service, while SNS is a pub/sub notification service
    • acts as a decoupling service and provides resiliency
    • SQS features like visibility timeout, and long polling vs short polling
    • can drive scaling of an Auto Scaling group based on the SQS queue depth.
    • SQS Standard vs SQS FIFO difference
      • FIFO provides exactly-once processing and ordering, but lower throughput
  • Simple Notification Service – SNS
    • is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients
    • Fanout pattern can be used to push messages to multiple subscribers
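
For scaling an Auto Scaling group on SQS queue size, a common approach is a "backlog per instance" target: how many queued messages one instance can hold while still meeting a latency goal. The sketch below works through that arithmetic. The function names, the min/max bounds, and the sample numbers are illustrative assumptions.

```python
import math


def acceptable_backlog_per_instance(max_latency_ms: int, avg_processing_ms: int) -> float:
    """Messages one instance can have queued and still meet the latency goal."""
    return max_latency_ms / avg_processing_ms


def instances_needed(queue_depth: int, max_latency_ms: int, avg_processing_ms: int,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Desired capacity for a given queue depth, clamped to the group's bounds."""
    target = acceptable_backlog_per_instance(max_latency_ms, avg_processing_ms)
    needed = math.ceil(queue_depth / target)
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # 1500 visible messages, 100 ms per message, 10 s acceptable latency.
    print(instances_needed(1500, 10_000, 100, max_size=20))
```

In a real setup the queue depth would come from the queue's ApproximateNumberOfMessages metric, and the scaling itself would be done by a scaling policy rather than by application code.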

Analytics

  • Redshift as a data warehouse for business intelligence
  • Kinesis
    • for real-time data capture and analytics.
    • Integrates with Lambda functions to perform transformations
  • AWS Glue
    • fully-managed, ETL service that automates the time-consuming steps of data preparation for analytics

Management Tools

  • CloudWatch
    • monitoring to provide operational transparency
    • is extendable with custom metrics
    • CloudWatch Logs -> (Subscription filter) -> Kinesis Data Firehose -> S3
  • CloudTrail
    • helps enable governance, compliance, and operational and risk auditing of the AWS account.
    • helps to get a history of AWS API calls and related events for the AWS account.
  • CloudFormation
    • easy way to create and manage a collection of related AWS resources, and provision and update them in an orderly and predictable fashion.
  • AWS Config
    • fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.

AWS Whitepapers & Cheat sheets

Finally, All the Best 🙂

AWS Certified Advanced Networking – Specialty ANS-C01 Exam Learning Path

I recently certified/recertified for the AWS Certified Advanced Networking – Specialty (ANS-C01). Frankly, Networking is something that I am still diving deep into and I just about managed to get through. So a word of caution: this exam is in line with, or tougher than, the professional exams, especially because some of the Networking concepts covered are not something you can easily get hands-on experience with.

The AWS Certified Advanced Networking – Specialty (ANS-C01) exam focuses on AWS Networking concepts. It validates the ability to:

  • Design and develop hybrid and cloud-based networking solutions by using AWS
  • Implement core AWS networking services according to AWS best practices
  • Operate and maintain hybrid and cloud-based network architecture for all AWS services
  • Use tools to deploy and automate hybrid and cloud-based AWS networking tasks
  • Implement secure AWS networks using AWS native networking constructs and services

Refer to the AWS Certified Advanced Networking – Specialty Exam Guide

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Resources

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Summary

  • The AWS Certified Advanced Networking – Specialty (ANS-C01) exam has 65 questions to be solved in 170 minutes, and I made sure I used the complete time.
  • The exam focuses a lot on Networking concepts involving Hybrid Connectivity with Direct Connect, VPN, Transit Gateway, Direct Connect Gateway, and a bit of VPC, Route 53, ALB, NLB & CloudFront.
  • Most questions touch multiple AWS services.
  • Questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps you focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Compare those 2 answers to find where they differ, and that will help you reach the right answer or at least have a 50% chance of getting it right.

AWS Certified Advanced Networking – Specialty (ANS-C01) Exam Topics

Networking & Content Delivery

  • Virtual Private Cloud – VPC
    • Understand VPC, Subnets
    • AWS allows extending the VPC by adding a secondary CIDR block
    • Understand Security Groups, NACLs
    • VPC Flow Logs
      • help capture information about the IP traffic going to and from network interfaces in the VPC and can help in monitoring the traffic or troubleshooting any connectivity issues
      • NACLs are stateless, and this is reflected in VPC Flow Logs
        • If ACCEPT is followed by REJECT, the inbound traffic was accepted by both Security Groups and NACLs, but the response was rejected by the outbound NACL rules
        • If REJECT, the inbound traffic was rejected by either Security Groups or NACLs.
      • Use pkt-dstaddr instead of dstaddr to track the destination address as dstaddr refers to the primary ENI address always and not the secondary addresses.
      • Pattern: VPC Flow Logs -> CloudWatch Logs -> (Subscription) -> Kinesis Data Firehose -> S3/OpenSearch.
    • DHCP Option Sets esp. how to resolve DNS from both on-premises data center and AWS.
    • VPC Peering
      • helps point-to-point connectivity between 2 VPCs which can be in the same or different regions and accounts.
      • know VPC Peering Limitations esp. it does not allow overlapping CIDRs and transitive routing.
    • Placement Groups determine how the instances are placed on the underlying hardware
    • VRF – Virtual Routing & Forwarding can be used to route traffic to the same customer gateway from multiple VPCs, that can be overlapping.
  • VPC Endpoints
    • VPC Gateway Endpoints for connectivity with S3 & DynamoDB i.e. VPC -> VPC Gateway Endpoints -> S3/DynamoDB.
    • VPC Interface Endpoints or Private Links for other AWS services and custom hosted services i.e. VPC -> VPC Interface Endpoint OR Private Link -> S3/Kinesis/SQS/CloudWatch/Any custom endpoint.
    • S3 gateway endpoints cannot be accessed through VPC Peering, VPN, or Direct Connect. Need HTTP proxy to route traffic.
    • S3 Private Link can be accessed through VPC Peering, VPN, or Direct Connect. Need to use an endpoint-specific DNS name.
    • VPC endpoint policy can be configured to control which S3 buckets can be accessed and the S3 Bucket policy can be used to control which VPC (includes all VPC Endpoints) or VPC Endpoint can access it.
    • Private Link Patterns
  • VPC Network Access Analyzer
    • helps identify unintended network access to the resources on AWS.
  • Transit Gateway
    • helps consolidate the AWS VPC routing configuration for a region with a hub-and-spoke architecture.
    • Appliance Mode ensures that network flows are symmetrically routed to the same AZ and network appliance
    • Transit Gateway Connect attachment can be used to connect SD-WAN to AWS Cloud. This supports GRE.
    • Transit Gateways are regional and Peering can connect Transit Gateways across regions.
    • Transit Gateway Network Manager includes events and metrics to monitor the quality of the global network, both in AWS and on-premises.
  • VPC Routing Priority
  • NAT Gateways
    • for HA, Scalable, Outgoing traffic. Does not support Security Groups or ICMP pings.
    • times out the connection if it is idle for 350 seconds or more. To prevent the connection from being dropped, initiate more traffic over the connection or enable TCP keepalive on the instance with a value of less than 350 seconds.
    • supports Private NAT Gateways for internal communication.
  • Virtual Private Network
    • to establish connectivity between the on-premises data center and AWS VPC
  • Direct Connect
    • to establish connectivity between the on-premises data center and AWS VPC and Public Services
    • Direct Connect connections – Dedicated and Hosted connections
    • Understand how to create a Direct Connect connection
      • LOA-CFA provides the details for partners to connect to the AWS Direct Connect location
    • Virtual interfaces options – Private Virtual Interface for VPC resources and Public Virtual Interface for Public Resources
      • Private VIF is for resources within a VPC
      • Public VIF is for AWS public resources
      • Private VIF has a limit of 100 routes and Public VIF of 1000 routes. Summarize the routes if you need to configure more.
    • Understand how to set up Private and Public VIFs
    • Understand High Availability options based on cost and time i.e. Second Direct Connect connection OR VPN connection
    • Direct Connect Gateway
      • it provides a way to connect to multiple VPCs from an on-premises data center using the same Direct Connect connection.
      • can connect to VGW or TGW.
    • Understand Active/Passive Direct Connect 
    • supports MACsec which delivers native, near line-rate, point-to-point encryption ensuring that data communications between AWS and the data center, office, or colocation facility remain protected.
    • Understand Route Propagation, propagation priority, BGP connectivity
      • BGP prefers the shortest AS PATH to get to the destination. Traffic from the VPC to on-premises uses the primary router. This is because the secondary router advertises a longer AS-PATH.
      • AS PATH prepending doesn’t work when the Direct Connect connections are in different AWS Regions than the VPC.
      • AS PATH works from AWS to on-premises and Local Pref from on-premises to AWS
      • Use Local Preference BGP community tags to configure Active/Passive when the connections are in different regions. A higher tag value has a higher preference, i.e. 7224:7300 > 7224:7100
      • NO_EXPORT works only for Public VIFs
      • 7224:9100, 7224:9200, and 7224:9300 apply only to public prefixes. Usually used to restrict traffic to regions. Can help control if routes should propagate to the local Region only, all Regions within a continent, or all public Regions.
        • 7224:9100 — Local AWS Region
        • 7224:9200 — All AWS Regions for a continent, North America–wide, Asia Pacific, Europe, the Middle East and Africa
        • 7224:9300 — Global (all public AWS Regions)
      • 7224:8100 — Routes that originate from the same AWS Region in which the AWS Direct Connect point of presence is associated.
      • 7224:8200 — Routes that originate from the same continent with which the AWS Direct Connect point of presence is associated.
      • No-tag — Global (all public AWS Regions).
  • Route 53
    • provides a highly available and scalable DNS web service.
    • Routing Policies and their use cases. Focus on Weighted, Latency, and Failover routing policies.
    • supports Alias resource record sets, which enables routing of queries to a CloudFront distribution, Elastic Beanstalk, ELB, an S3 bucket configured as a static website, or another Route 53 resource record set.
    • CNAME does not support zone apex or root records. 
    • Route 53 DNSSEC
      • secures DNS traffic, and helps protect a domain from DNS spoofing and man-in-the-middle attacks.
      • Requirements
        • Asymmetric Customer Managed Keys
        • us-east-1 with ECC_NIST_P256 spec
    • Route 53 Resolver DNS Firewall
      • protection for outbound DNS requests from the VPCs and can monitor and control the domains that the applications can query.
      • allows you to define allow and deny list.
      • can be used to prevent DNS exfiltration.
      • supports FirewallFailOpen configuration which determines how Route 53 Resolver handles queries during failures.
        • disabled, favors security over availability and blocks queries that it is unable to evaluate properly.
        • enabled, favors availability over security and allows queries to proceed if it is unable to properly evaluate them.
    • Route 53 Resolver (Hybrid DNS)
      • Inbound Endpoint for On-premises -> AWS
      • Outbound Endpoint for AWS -> On-premises
    • Route 53 DNS Query Logging
      • Can be logged to CloudWatch logs, S3, and Kinesis Data Firehose
    • Route 53 Resolver rules take precedence over private hosted zones.
    • Route 53 Split View DNS helps to have the same DNS to access a site externally and internally
    • Know the Domain Migration process
  • CloudFront
    • provides a fully managed, fast CDN service that speeds up the distribution of static, dynamic web, or streaming content to end-users.
    • supports geo-restriction, WAF & AWS Shield for protection.
    • provides CloudFront Functions (edge locations) & Lambda@Edge (regional edge caches) to execute code closer to the user.
    • supports encryption at rest and end-to-end encryption
    • CloudFront Origin Shield
      • helps improve the cache hit ratio and reduce the load on the origin.
      • requests from other regional caches would hit the Origin shield rather than the Origin.
      • should be placed at the regional cache and not in the edge cache
      • should be deployed to the region closer to the origin server
  • Global Accelerator
    • provides 2 static IPs
    • does not support client IP address preservation for NLB and Elastic IP address endpoints.
    • does not support IPv6 address
    • know CloudFront vs Global Accelerator
  • Understand ELB, ALB and NLB
    • Differences between ALB and NLB
    • ALB provides Content, Host, and Path-based Routing while NLB provides the ability to have a static IP address
    • Maintain the original Client IP to the backend instances using the X-Forwarded-For header (ALB) and Proxy Protocol (NLB)
    • ALB/NLB do not support TLS renegotiation or mutual TLS authentication (mTLS). For implementing mTLS, use NLB with TCP listener on port 443 and terminate on the instances.
    • NLB
      • also provides local zonal endpoints to keep the traffic within AZ
      • can front Private Link endpoints and provide static IPs.
    • ALB supports Forward Secrecy, through Security Policies, that provide additional safeguards against the eavesdropping of encrypted data, through the use of a unique random session key.
    • Supports sticky session feature (session affinity) to enable the LB to bind a user’s session to a specific target. This ensures that all requests from the user during the session are sent to the same target. Sticky Sessions is configured on the target groups.
  • Gateway Load Balancer – GWLB
    • helps deploy, scale, and manage virtual appliances, such as firewalls, IDS/IPS systems, and deep packet inspection systems.
  • Athena integrates with S3 only and not with CloudWatch logs.
  • Transit VPC
    • helps connect multiple, geographically dispersed VPCs and remote networks in order to create a global network transit center.
    • Use Transit Gateway instead now.
  • Know CloudHub and its use case
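
The VPC Flow Logs reasoning above (ACCEPT followed by REJECT pointing at the stateless NACL) can be sketched as a small parser over records in the default flow log format. This is a simplified illustration: the sample records and the `diagnose` helper are hypothetical, and a real analysis would also match ports and time windows.

```python
# Field order of the default (version 2) VPC flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_record(line: str) -> dict:
    """Split one space-separated flow log line into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def diagnose(records: list, instance_ip: str) -> str:
    """Rough interpretation of ACCEPT/REJECT pairs for traffic sent to instance_ip."""
    inbound = [r["action"] for r in records if r["dstaddr"] == instance_ip]
    outbound = [r["action"] for r in records if r["srcaddr"] == instance_ip]
    if "REJECT" in inbound:
        return "inbound rejected by security group or NACL"
    if "ACCEPT" in inbound and "REJECT" in outbound:
        # Security groups are stateful and would allow the reply, so the
        # rejected response points at the stateless outbound NACL rules.
        return "inbound accepted; response rejected by outbound NACL"
    return "no rejection seen"


if __name__ == "__main__":
    # Illustrative ICMP ping: request accepted, reply rejected.
    lines = [
        "2 123456789010 eni-1235b8ca123456789 203.0.113.12 172.31.16.139 0 0 1 4 336 1432917027 1432917142 ACCEPT OK",
        "2 123456789010 eni-1235b8ca123456789 172.31.16.139 203.0.113.12 0 0 1 4 336 1432917094 1432917142 REJECT OK",
    ]
    print(diagnose([parse_record(l) for l in lines], "172.31.16.139"))
```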

Security

  • AWS GuardDuty
    • managed threat detection service
    • provides Malware protection
  • AWS Shield
    • managed DDoS protection service
    • AWS Shield Advanced provides 24×7 access to the AWS Shield Response Team (SRT), protection against DDoS-related spike, and DDoS cost protection to safeguard against scaling charges.
  • WAF as Web Application Firewall
    • helps protect web applications from attacks by allowing rules configuration that allow, block, or monitor (count) web requests based on defined conditions.
    • integrates with CloudFront, ALB, API Gateway to dynamically detect and prevent attacks
  • Network Firewall
  • AWS Inspector
    • is a vulnerability management service that continuously scans the AWS workloads for vulnerabilities

Monitoring & Management Tools

  • Understand AWS CloudFormation esp. in terms of Network creation.
    • Custom resources can be used to handle activities not supported by AWS
    • While configuring VPN connections, use the DependsOn attribute on the VPN gateway route propagation resource, because route propagation depends on the VPC-gateway attachment when you have a VPN gateway.
  • AWS Config
    • fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
    • can be used to monitor resource changes e.g. Security Groups and invoke Systems Manager Automation scripts for remediation.
  • CloudTrail for audit and governance
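
For the CloudFormation dependency between VPN gateway route propagation and the VPC-gateway attachment noted above, the following sketch builds a small template as a Python dictionary and prints it as JSON. The logical IDs and parameter names are hypothetical; only the resource types and the `DependsOn` attribute are CloudFormation's own.

```python
import json


def vpn_propagation_template() -> dict:
    """Template fragment where route propagation waits for the gateway attachment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "RouteTableId": {"Type": "String"},
        },
        "Resources": {
            "VpnGateway": {
                "Type": "AWS::EC2::VPNGateway",
                "Properties": {"Type": "ipsec.1"},
            },
            "GatewayAttachment": {
                "Type": "AWS::EC2::VPCGatewayAttachment",
                "Properties": {
                    "VpcId": {"Ref": "VpcId"},
                    "VpnGatewayId": {"Ref": "VpnGateway"},
                },
            },
            "RoutePropagation": {
                "Type": "AWS::EC2::VPNGatewayRoutePropagation",
                # Explicit dependency: nothing in Properties references the
                # attachment, so CloudFormation cannot infer the order itself.
                "DependsOn": "GatewayAttachment",
                "Properties": {
                    "RouteTableIds": [{"Ref": "RouteTableId"}],
                    "VpnGatewayId": {"Ref": "VpnGateway"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(vpn_propagation_template(), indent=2))
```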

Integration Tools

Networking Architecture Patterns

Finally, All the Best 🙂

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Learning Path

  • The AWS Certified Solutions Architect – Professional (SAP-C01) exam is the updated version of the previous Solutions Architect – Professional exam. It was released in 2018 and is due to be updated again this year (Nov. 2022).
  • I recently recertified the existing pattern and the difference is quite a lot between the previous pattern and the latest pattern. The amount of overlap between the associates and professional exams and even the Solutions Architect and DevOps has drastically reduced.

The AWS Certified Solutions Architect – Professional (SAP-C01) exam validates the ability to:

  • Design and deploy dynamically scalable, highly available, fault-tolerant, and reliable applications on AWS
  • Select appropriate AWS services to design and deploy an application based on given requirements
  • Migrate complex, multi-tier applications on AWS
  • Design and deploy enterprise-wide scalable operations on AWS
  • Implement cost-control strategies

Refer to AWS Certified Solutions Architect – Professional Exam Guide

AWS Certified Solutions Architect - Professional Exam Domains

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Resources

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Summary

  • AWS Certified Solutions Architect – Professional (SAP-C01) exam was for a total of 170 minutes and it had 75 questions.
  • AWS Certified Solutions Architect – Professional (SAP-C01) focuses a lot on concepts and services related to Architecture & Design, Scalability, High Availability, Disaster Recovery, Migration, Security and Cost Control.
  • Each question mainly touches multiple AWS services.
  • Questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.

AWS Certified Solutions Architect – Professional (SAP-C01) Exam Topics

Storage

  • S3
    • S3 Permissions & S3 Data Protection
      • S3 bucket policies to control access to VPC Endpoints
    • S3 Storage Classes & Lifecycle policies
      • covers S3 Standard, Infrequent Access, Intelligent-Tiering and Glacier for archival, and object transitions & deletions for cost management.
    • S3 Transfer Acceleration can be used for fast, easy, and secure transfers of files over long distances between the client and an S3 bucket.
    • supports same-region and cross-region replication for disaster recovery.
    • integrates with CloudFront for caching to improve performance
    • S3 supports Object Lock and Glacier supports Vault lock to prevent the deletion of objects, especially required for compliance requirements.
    • supports S3 Select feature to query selective data from a single object.
  • Elastic Block Store
    • EBS Backup using snapshots for HA and Disaster recovery
    • Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
  • Storage Gateway
  • Elastic File System
    • provides a fully managed, scalable, serverless, shared and cost-optimized file storage for use with AWS and on-premises resources.
    • supports cross-region replication for disaster recovery
    • supports storage classes like S3
  • AWS Transfer Family
    • provides a secure transfer service (FTP, SFTP, FTPS) that helps transfer files into and out of AWS storage services.
    • supports transferring data from or to S3 and EFS.
  • FSx for Lustre
    • managed, cost-effective service to launch and run the HPC high-performance Lustre file system.
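
The S3 hint above about using bucket policies to control access through VPC Endpoints can be sketched as follows. This is a minimal illustration that only builds the policy document; the bucket name and endpoint ID are hypothetical placeholders, and the function name vpce_only_policy is made up for this example.

```python
import json


def vpce_only_policy(bucket: str, vpce_id: str) -> dict:
    """Build a bucket policy that denies requests not arriving via one VPC endpoint."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpce",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Hypothetical bucket name and endpoint ID, for illustration only.
policy = vpce_only_policy("example-bucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```

A blanket Deny like this also blocks console and administrative access from outside the endpoint, so real policies usually carve out exceptions.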

Database

  • DynamoDB
    • DynamoDB Auto Scaling
    • DynamoDB Streams for tracking changes
    • TTL to expire items automatically and cost-effectively.
    • Global tables for multi-master, active-active inter-region storage needs.
    • Global tables do not support strong global consistency
    • DynamoDB Accelerator – DAX for seamlessly caching to reduce the load on DynamoDB for read-heavy requirements.
  • RDS
    • supports cross-region read replicas ideal for disaster recovery with low RTO and RPO.
    • provides RDS Proxy for effective database connection pooling
    • RDS Multi-AZ vs Read Replicas
  • Aurora
    • fully managed, MySQL- and PostgreSQL-compatible, relational database engine
    • supports Aurora Serverless, an on-demand, autoscaling configuration
    • Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions. It is not a multi-master setup but can be used for disaster recovery.
  • DocumentDB as a replacement for MongoDB
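
As a small illustration of the DynamoDB TTL point above: the TTL attribute is stored as a Number holding the expiry time in Unix epoch seconds. The sketch below only computes that value; the attribute name expires_at, the key format, and the helper name ttl_epoch are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(created_at: datetime, days: int) -> int:
    """Return the expiry time as Unix epoch seconds, the format DynamoDB TTL expects."""
    return int((created_at + timedelta(days=days)).timestamp())


created = datetime(2022, 11, 1, tzinfo=timezone.utc)

# "expires_at" is a hypothetical attribute name; it must match the TTL
# attribute configured on the table.
item = {"pk": "session#123", "expires_at": ttl_epoch(created, 30)}
print(item)
```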

Data Migration & Transfer

  • Cloud Migration Services
    • Cloud Migration (hint: make sure you understand the difference between rehost, replatform, and rearchitect)
    • Server Migration Service helps to migrate servers and applications.
    • Database Migration Service
      • enables quick and secure data migration with minimal to zero downtime
      • supports Full and Change Data Capture – CDC migration to support continuous replication for zero downtime migration.
      • homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations (using SCT) between different database platforms, such as Oracle or Microsoft SQL Server to Aurora.
      • Hint: Elasticsearch (Amazon OpenSearch Service) is supported by DMS only as a target, not as a source
    • Snow Family
      • Ideal for one-time big data transfers usually for use cases with limited bandwidth from on-premises to AWS.
  • Application Discovery Service
    • Agent-based discovery can be used for Hyper-V and physical servers
    • Agentless can be used for VMware but does not track processes.
  • Disaster Recovery
    • The Disaster Recovery whitepaper is outdated, but make sure you understand the difference between each strategy, esp. pilot light and warm standby, w.r.t. RTO and RPO.
    • Compute
      • Make components available in an alternate region, either as
      • AMIs that can be restored
      • CloudFormation templates to create the infrastructure as needed
      • a partially running environment that can be scaled up once the failover happens
      • or fully running compute in an active-active configuration with health checks.
    • Storage
      • S3 and EFS support cross-region replication
      • DynamoDB supports Global tables for multi-master, active-active inter-region storage needs.
      • Aurora Global Database is not a multi-master setup but can be used for disaster recovery.
      • RDS supports cross-region read replicas which can be promoted to master in case of a disaster. This can be done using Route 53, CloudWatch and lambda functions.
    • Network
      • Route 53 failover routing with health checks to failover across regions.
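
The Route 53 failover routing mentioned above can be sketched as a pair of record sets, one PRIMARY and one SECONDARY, in the shape used by the ChangeResourceRecordSets API. The domain name, targets, and health check ID below are hypothetical, and failover_record is a helper invented for this example; nothing is sent to AWS.

```python
import json


def failover_record(name, role, target, health_check_id=None):
    """Build one failover record change; role is "PRIMARY" or "SECONDARY"."""
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": f"{role.lower()}-region",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        # Route 53 fails over when this health check reports unhealthy.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Hypothetical names and IDs, for illustration only.
change_batch = {
    "Changes": [
        failover_record("app.example.com", "PRIMARY",
                        "primary.example.com", "hc-primary-id"),
        failover_record("app.example.com", "SECONDARY",
                        "standby.example.com"),
    ]
}
print(json.dumps(change_batch, indent=2))
```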

Networking & Content Delivery

  • VPC – Virtual Private Cloud
    • Understand Security Groups, NACLs (Hint: know NACLs are stateless and need to open ephemeral ports for response traffic )
    • Understand VPC Gateway Endpoints to provide access to S3 and DynamoDB (hint: know how to restrict access on S3 to specific VPC Endpoint)
    • Understand VPC Interface Endpoints or PrivateLink to provide access to a variety of services like SQS, Kinesis or Private APIs exposed through NLB.
    • Understand VPC Flow Logs
    • Understand VPC Peering to enable communication between VPCs within the same or different regions. (hint: VPC peering does not support transitive routing)
  • Route 53
    • Routing Policies
      • focus on Weighted, Latency and failover routing policies
      • failover routing provides active-passive configuration for disaster recovery while the others are active-active configuration.
    • Route 53 Resolver
      • Outbound endpoint for AWS -> On-premises DNS query resolution
      • Inbound endpoint for On-premises -> AWS DNS query resolution
  • CloudFront
    • fully managed, fast CDN service that speeds up the distribution of static, dynamic web or streaming content to end-users.
    • supports multiple origins including S3, ALB etc.
    • does not support Auto Scaling as an origin
    • supports Geo-restriction
    • supports Lambda@Edge and CloudFront Functions to execute code closer to the user.
    • Lambda@Edge can be used for quick auth checks, and to redirect users based on request data.
    • Security can be enhanced by whitelisting CloudFront IPs or adding a custom header in CloudFront and verifying it in ALB.
  • API Gateway
    • supports throttling, caching and helps define usage plans with API keys to identify clients
    • provides regional and edge-optimized endpoint types
    • supports authentication mechanisms, such as AWS IAM policies, Lambda authorizer functions, and Amazon Cognito user pools.
  • Load Balancer – ELB, ALB and NLB 
  • Global Accelerator
    • optimizes the path to applications to keep packet loss, jitter, and latency consistently low.
    • helps improve the performance of the applications by lowering first-byte latency
    • provides 2 static IP addresses
    • does not preserve the client’s IP address with NLB
  • Transit Gateway or Transit VPC
    • is a network transit hub that can be used to interconnect VPCs and on-premises networks via Direct Connect or VPN.
    • Transit Gateway is regional and Transit Gateway Peering needs to be configured to peer regional Transit gateways.
  • Placement Groups
    • Cluster placement group with Enhanced Networking for HPC
    • Spread placement group for fault tolerance and high availability.
  • Direct Connect & VPN
    • provide on-premises to AWS connectivity
    • know Direct Connect vs VPN
    • VPN can provide a cost-effective, quick failover for Direct Connect.
    • VPN over Direct Connect provides a secure dedicated connection and requires a public virtual interface.
    • Direct Connect Gateway is a global network device that helps establish connectivity that spans VPCs spread across multiple AWS Regions with a single Direct Connect connection.
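
The VPC peering hint above (no transitive routing) can be illustrated with a toy model. This is not an AWS API, just a sketch with hypothetical VPC names: traffic is allowed only when two VPCs share a direct peering connection, so A-B and B-C peerings do not let A reach C.

```python
# Each peering connection is an unordered pair of VPCs (hypothetical names).
peerings = {
    frozenset({"vpc-a", "vpc-b"}),
    frozenset({"vpc-b", "vpc-c"}),
}


def can_route(src: str, dst: str) -> bool:
    """True only for traffic within one VPC or across a direct peering."""
    return src == dst or frozenset({src, dst}) in peerings


print(can_route("vpc-a", "vpc-b"))  # direct peering exists
print(can_route("vpc-a", "vpc-c"))  # would need to transit vpc-b
```

Connecting vpc-a and vpc-c needs either a third peering connection or a hub such as Transit Gateway.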

Security, Identity & Compliance

  • AWS Identity and Access Management
  • AWS Shield & Shield Advanced
    • for DDoS protection and integrates with Route 53, CloudFront, ALB and Global Accelerator.
  • AWS WAF
    • protects from common attack techniques like SQL injection and Cross-Site Scripting (XSS). Conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
    • integrates with CloudFront, ALB, and API Gateway.
    • supports Web ACLs and can block traffic based on IPs, Rate limits, and specific countries as well.
  • ACM – AWS Certificate Manager
    • helps easily provision, manage, and deploy public and private SSL/TLS certificates
    • is regional and you need to request certificates in all regions and associate individually in all regions.
    • does not provide certificates for EC2 instances.
  • AWS KMS – Key Management Service
    • managed encryption service that allows the creation and control of encryption keys to enable data encryption.
    • KMS Multi-region keys
      • are AWS KMS keys in different AWS Regions that can be used interchangeably – as though having the same key in multiple Regions.
      • are not global and each multi-region key needs to be replicated and managed independently.
  • Secrets Manager
    • helps protect secrets needed to access applications, services, and IT resources.
    • Secrets Manager vs SSM Parameter Store.
      • Supports automatic rotation of secrets, which is not provided by SSM Parameter Store.
      • Costs more than SSM Parameter Store.

Compute

  • EC2
  • Auto Scaling
  • Elastic Beanstalk supports Blue/Green deployment using swap URLs.
  • Lambda
    • Lambda running in VPC requires NAT Gateway to communicate with external public services
    • Lambda CPU can be increased by increasing memory only.
    • helps define a reserved concurrency limit to cap a function's concurrency and reduce the impact on other functions and downstream resources
    • Lambda Alias now supports canary deployments
  • ECS – Elastic Container Service
    • container management service that supports Docker containers
    • supports two launch types – EC2 and Fargate which provides the serverless capability
    • For least privilege, the role should be assigned to the Task.
    • awsvpc network mode gives ECS tasks the same networking properties as EC2 instances.

Management & Governance tools

  • AWS Organizations
  • Systems Manager
    • AWS Systems Manager and its various services like parameter store, patch manager
    • Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management. Does not support secrets rotation. Use Secrets Manager.
    • Session Manager helps manage EC2 instances through an interactive one-click browser-based shell or through the AWS CLI without opening ports or creating bastion hosts.
    • Patch Manager helps automate the process of patching managed instances with both security-related and other types of updates.
  • CloudWatch
  • CloudTrail
    • for audit and governance
    • With Organizations, the trail can be configured to log CloudTrail from all accounts to a central account.
  • CloudFormation
    • Handle disaster Recovery by automating the infra to replicate the environment across regions.
    • Deletion Policy to prevent, retain or backup RDS, EBS Volumes
    • Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update. Stack Policy only applies for Stack updates and not stack deletion.
    • StackSets helps to create, update, or delete stacks across multiple accounts and Regions with a single operation.
  • Control Tower
    • to setup, govern, and secure a multi-account environment
    • strongly recommended guardrails cover EBS encryption
  • Service Catalog
    • allows organizations to create and manage catalogues of IT services that are approved for use on AWS with minimal permissions.
  • Trusted Advisor
    • helps with cost optimization and service limits in addition to security, performance and fault tolerance.
  • Compute Optimizer recommends optimal AWS resources for the workloads to reduce costs and improve performance by using machine learning to analyze historical utilization metrics.
  • AWS Budgets to see usage-to-date and current estimated charges from AWS, set limits and provide alerts or notifications.
  • Tags can be used to organize AWS resources, and cost allocation tags to track the AWS costs on a detailed level.
  • Cost Explorer helps visualize, understand, manage and forecast the AWS costs and usage over time.
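
The CloudFormation stack policy described above can be sketched as a JSON document. The logical resource ID ProductionDatabase is hypothetical; the policy allows updates in general but denies replacing or deleting that one resource during a stack update.

```python
import json

stack_policy = {
    "Statement": [
        {
            # Allow updates to every resource by default.
            "Effect": "Allow",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
        },
        {
            # An explicit Deny overrides the Allow for the protected resource.
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": "LogicalResourceId/ProductionDatabase",
        },
    ]
}

print(json.dumps(stack_policy, indent=2))
```

As the notes say, this only applies to stack updates; it does not stop someone from deleting the whole stack.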

Analytics

Integration Tools

  • SQS in terms of loose coupling and scaling.
    • Difference between SQS Standard and FIFO esp. with throughput and order
    • SQS supports dead letter queues
  • CloudWatch integration with SNS and Lambda for notifications.
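
The dead letter queue behaviour mentioned above can be illustrated with a small in-memory simulation. This is a simplified sketch and not the SQS API: the function name process and the sample handler are invented here, and a message that has failed max_receive_count times is moved to the dead letter list straight away rather than on its next receive.

```python
from collections import deque


def process(messages, handler, max_receive_count=3):
    """Retry failing messages, then park them in a dead letter list."""
    queue = deque((body, 0) for body in messages)
    done, dead_letter = [], []
    while queue:
        body, receives = queue.popleft()
        receives += 1
        try:
            handler(body)
            done.append(body)
        except Exception:
            if receives >= max_receive_count:
                dead_letter.append(body)  # stop retrying a poison message
            else:
                queue.append((body, receives))  # becomes visible again
    return done, dead_letter


def handler(body):
    # Hypothetical consumer that can never process "bad".
    if body == "bad":
        raise ValueError("cannot process message")


done, dead_letter = process(["ok1", "bad", "ok2"], handler)
print(done, dead_letter)
```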

Architecture & Design Flows

Google Cloud – Professional Cloud DevOps Engineer Certification learning path


Continuing on the Google Cloud Journey, glad to have passed the 8th certification with the Professional Cloud DevOps Engineer certification. Google Cloud – Professional Cloud DevOps Engineer certification exam focuses on almost all of the Google Cloud DevOps services with Cloud Developer tools, Operations Suite, and SRE concepts.

Google Cloud -Professional Cloud DevOps Engineer Certification Summary

  • Had 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on DevOps toolset including Cloud Developer tools, Operations Suite with a focus on monitoring and logging, and SRE concepts.
  • The exam has been updated to use
    • Cloud Operations, Cloud Monitoring & Logging and does not refer to Stackdriver in any of the questions.
    • Artifact Registry instead of Container Registry.
  • There are no case studies for the exam.
  • As mentioned for all the exams, Hands-on is a MUST, if you have not worked on GCP before make sure you do lots of labs else you would be absolutely clueless about some of the questions and commands
  • I did the Coursera and A Cloud Guru courses, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud DevOps Engineer Certification Resources

Google Cloud – Professional Cloud DevOps Engineer Certification Topics

Developer Tools

  • Google Cloud Build
    • Cloud Build integrates with Cloud Source Repositories, GitHub, and GitLab and can be used for Continuous Integration and Deployments.
    • Cloud Build can import source code, execute build to the specifications, and produce artifacts such as Docker containers or Java archives
    • Cloud Build can trigger builds on source commits in Cloud Source Repositories or other git repositories.
    • Cloud Build build config file specifies the instructions to perform, with steps defined to each task like the test, build and deploy.
    • Cloud Build step specifies an action to be performed and is run in a Docker container.
    • Cloud Build supports custom images as well for the steps
    • Cloud Build integrates with Pub/Sub to publish messages on build’s state changes.
    • Cloud Build can trigger the Spinnaker pipeline through Cloud Pub/Sub notifications.
    • Cloud Build should use a Service Account with the Kubernetes Engine Developer (roles/container.developer) role to perform deployments on GKE
    • Cloud Build uses a directory named /workspace as a working directory and the assets produced by one step can be passed to the next one via the persistence of the /workspace directory.
  • Binary Authorization and Vulnerability Scanning
    • Binary Authorization provides software supply-chain security for container-based applications. It enables you to configure a policy that the service enforces when an attempt is made to deploy a container image on one of the supported container-based platforms.
    • Binary Authorization uses attestations to verify that an image was built by a specific build system or continuous integration (CI) pipeline.
    • Vulnerability scanning helps scan images for vulnerabilities by Container Analysis.
    • Hint: For Security and compliance reasons if the image deployed needs to be trusted, use Binary Authorization
  • Google Source Repositories
    • Cloud Source Repositories are fully-featured, private Git repositories hosted on Google Cloud.
    • Cloud Source Repositories can be used for collaborative, version-controlled development of any app or service
    • Hint: If the code needs to be version controlled and needs collaboration with multiple members, choose Git related options
  • Google Container Registry/Artifact Registry
    • Google Artifact Registry supports all types of artifacts as compared to Container Registry which was limited to container images
    • Container Registry is not referred to in the exam
    • Artifact Registry supports both regional and multi-regional repositories
  • Google Cloud Code
    • Cloud Code helps write, debug, and deploy the cloud-based applications for IntelliJ, VS Code, or in the browser.
  • Google Cloud Client Libraries
    • Google Cloud Client Libraries provide client libraries and SDKs in various languages for calling Google Cloud APIs.
    • If the language is not supported, Cloud Rest APIs can be used.
  • Deployment Techniques
    • Recreate deployment – fully scale down the existing application version before you scale up the new application version.
    • Rolling update – update a subset of running application instances instead of simultaneously updating every application instance
    • Blue/Green deployment – (also known as a red/black deployment), you perform two identical deployments of your application
    • GKE supports Rolling and Recreate deployments.
      • Rolling deployments support maxSurge (new pods would be created) and maxUnavailable (existing pods would be deleted)
    • Managed Instance Groups support Rolling deployments using the maxSurge (new instances would be created) and maxUnavailable (existing instances would be deleted) configurations
  • Testing Strategies
    • Canary testing – partially roll out a change and then evaluate its performance against a baseline deployment
    • A/B testing – test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.
  • Spinnaker
    • Spinnaker supports Blue/Green rollouts by dynamically enabling and disabling traffic to a particular Kubernetes resource.
    • Spinnaker recommends comparing canary against an equivalent baseline, deployed at the same time instead of production deployment.
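
The Cloud Build notes above (steps run in containers, and /workspace persists between steps) can be sketched as a build config. Cloud Build accepts the config as YAML or JSON; it is built as a Python dict here only to keep the example self-contained. The repository path in IMAGE is a hypothetical Artifact Registry location, and $SHORT_SHA is assumed to be supplied by a trigger.

```python
import json

# Hypothetical Artifact Registry path; $PROJECT_ID and $SHORT_SHA are
# substitutions resolved by Cloud Build, not by Python.
IMAGE = "us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA"

build_config = {
    "steps": [
        {
            # Step 1 writes a file into the shared /workspace directory.
            "name": "ubuntu",
            "entrypoint": "bash",
            "args": ["-c", "echo $SHORT_SHA > /workspace/version.txt"],
        },
        {
            # Step 2 runs in a different container but sees the same
            # /workspace, so the Docker build can pick up version.txt.
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", IMAGE, "."],
        },
    ],
    # Images listed here are pushed once all steps succeed.
    "images": [IMAGE],
}

print(json.dumps(build_config, indent=2))
```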

Cloud Operations Suite

  • Cloud Operations Suite provides everything from monitoring, alert, error reporting, metrics, diagnostics, debugging, trace.
  • Google Cloud Monitoring or Stackdriver Monitoring
    • Cloud Monitoring helps gain visibility into the performance, availability, and health of your applications and infrastructure.
    • Cloud Monitoring Agent/Ops Agent helps capture additional metrics like Memory utilization, Disk IOPS, etc.
    • Cloud Monitoring supports log exports where the logs can be sunk to Cloud Storage, Pub/Sub, BigQuery, or an external destination like Splunk.
    • Cloud Monitoring API supports pushing or exporting custom metrics
    • Uptime checks help check if the resource responds. It can check the availability of any public service on VM, App Engine, URL, GKE, or AWS Load Balancer.
    • Process health checks can be used to check if any process is healthy
  • Google Cloud Logging or Stackdriver logging
    • Cloud Logging provides real-time log management and analysis
    • Cloud Logging allows ingestion of custom log data from any source
    • Logs can be exported by configuring log sinks to BigQuery, Cloud Storage, or Pub/Sub.
    • Cloud Logging Agent can be installed for logging and capturing application logs.
    • Cloud Logging Agent uses fluentd and fluentd filter can be applied to filter, modify logs before being pushed to Cloud Logging.
    • VPC Flow Logs helps record network flows sent from and received by VM instances.
    • Cloud Logging Log-based metrics can be used to create alerts on logs.
    • Hint: If the logs from VM do not appear on Cloud Logging, check if the agent is installed and running and it has proper permissions to write the logs to Cloud Logging.
  • Cloud Error Reporting
    • counts, analyzes and aggregates the crashes in the running cloud services
  • Cloud Profiler
    • Cloud Profiler allows for monitoring of system resources like CPU and memory on both GCP and on-premises resources.
  • Cloud Trace
    • is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Debugger
    • is a feature of Google Cloud that lets you inspect the state of a running application in real-time, without stopping or slowing it down
    • Debug Logpoints allow logging injection into running services without restarting or interfering with the normal function of the service
    • Debug Snapshots help capture local variables and the call stack at a specific line location in your app’s source code

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered more from the security aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for computing and provides fine-grained control
    • Preemptible VMs and their use cases. HINT – use for short term needs
    • Committed Usage Discounts – CUD help provide cost benefits for long-term stable and predictable usage.
    • Managed Instance Group can help scale VMs as per the demand. It also helps provide auto-healing and high availability with health checks, in case an application fails.
  • Google Kubernetes Engine
    • GKE can be scaled using
      • Cluster AutoScaler to scale the cluster
      • Vertical Pod Autoscaler to scale the pods with increasing resource needs
      • Horizontal Pod Autoscaler helps scale Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload’s CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
    • Kubernetes Secrets can be used to store secrets (although they are just base64 encoded values)
    • Kubernetes supports rolling and recreate deployment strategies.
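
For the rolling deployment strategy noted above, the effect of percentage-based maxSurge and maxUnavailable can be worked out with a little arithmetic. This is a minimal sketch, assuming the Kubernetes behaviour of rounding the surge percentage up and the unavailable percentage down; the function name rolling_bounds is made up for the example.

```python
import math


def rolling_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = math.ceil(replicas * max_surge_pct / 100)  # rounded up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounded down
    return replicas + surge, replicas - unavailable


max_pods, min_available = rolling_bounds(10, 25, 25)
print(max_pods, min_available)  # 13 8
```

With 10 replicas and the default 25%/25% settings, up to 13 pods can exist at once and at least 8 stay available throughout the rollout.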

Security

  • Cloud Key Management Service – KMS
    • Cloud KMS can be used to store keys to encrypt data in Cloud Storage and other integrated storage
  • Cloud Secret Manager
    • Cloud Secret Manager can be used to store secrets as well

Site Reliability Engineering – SRE

  • SRE is a DevOps implementation and focuses on increasing reliability and observability, collaboration, and reducing toil using automation.
  • SLOs help specify a target level for the reliability of your service using SLIs which provide actual measurements.
  •  SLI Types
    • Availability
    • Freshness
    • Latency
    • Quality
  • SLOs – Choosing the measurement method
    • Synthetic clients to measure user experience
    • Client-side instrumentation
    • Application and Infrastructure metrics
    • Logs processing
  • SLOs help define the Error Budget and Error Budget Policy, which need to be aligned with all the stakeholders and help plan releases to focus on features vs reliability.
  • SRE focuses on Reducing Toil – Identifying repetitive tasks and automating them.
  • Production Readiness Review – PRR
    • Applications should be performance tested for volumes before being deployed to production
    • SLOs should not be modified/adjusted to facilitate production deployments. Teams should work to make the applications SLO compliant before they are deployed to production.
  • SRE Practices include
    • Incident Management and Response
      • Priority should be to mitigate the issue, and then investigate and find the root cause. Mitigating would include
        • Rolling back the release that caused the issue
        • Routing traffic to working site to restore user experience
      • Incident Live State Document helps track the events and decision making which can be useful for postmortem.
      • involves the following roles
        • Incident Commander/Manager
          • Setup a communication channel for all to collaborate
          • Assign and delegate roles. IC would assume any role, if not delegated.
          • Responsible for Incident Live State Document
        • Communications Lead
          • Provide periodic updates to all the stakeholders and customers
        • Operations Lead
          • Responds to the incident and should be the only group modifying the system during an incident.
    • Postmortem
      • should contain the root cause
      • should be Blameless
      • should be shared with all for collaboration and feedback
      • should be shared with all the stakeholders
      • should have proper action items to prevent recurrence with an owner and collaborators, if required.
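
The relationship between an SLO and its error budget described above can be made concrete with a short calculation. This is a minimal sketch assuming a time-based availability SLO measured over a rolling window; the function names and the 30-day window are choices made for this example.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of unreliability the SLO tolerates within the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))
print(round(budget_remaining(99.9, downtime_minutes=21.6), 2))
```

When the remaining budget approaches zero, the error budget policy would typically shift the team's focus from feature releases to reliability work.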

All the Best !!

Certified Kubernetes Security Specialist CKS Learning Path


With Certified Kubernetes Security Specialist CKS certification, I have completed the triad of Kubernetes certification. After knowing how to use and administer Kubernetes, the last piece was to understand the security intricacies and CKS preparation does provide you a deep dive into it.

  • CKS focuses on securing container-based applications and Kubernetes platforms during build, deployment, and runtime
  • CKS focuses more on hands-on experience and is an open book test, where you have access to the official Kubernetes documentation as well as some of the products documentation.
  • Unlike AWS and GCP certifications, you are required to provision, solve, debug actual problems, and provision resources on a Kubernetes cluster
  • Even though it is an open book test, you need to know where the information is and what to use.

CKS Exam Pattern

  • CKS exam curriculum includes these general domains and their weights on the exam:
    • Cluster Setup – 10%
    • Cluster Hardening – 15%
    • System Hardening – 15%
    • Minimize Microservice Vulnerabilities – 20%
    • Supply Chain Security – 20%
    • Monitoring, Logging and Runtime Security – 20%
  • CKS requires you to solve 15 questions in 2 hours.
  • CKS was already upgraded to use the k8s 1.22 version.
  • You are allowed to open another browser tab which can be from kubernetes.io or other products documentation like Falco. Do not open any other windows.
  • Exam questions can be attempted in any order and don’t have to be sequential. So be sure to move ahead and come back later.

CKS Exam Preparation and Tips

  • I used the courses from KodeKloud for practicing, and they were good enough to cover what is required for the exam.
  • When you book your exam, there are 2 exam simulator sessions provided by killer.sh. These mock exams are VERY tough as compared to the actual exams, as they mention, but do provide a great learning experience. Do not get demotivated if you flunk badly on time on this one :).
  • Time was surely a constraint on the actual exam and I was able to complete the 15 questions only with 15 mins left. There was not much time to review and could only get through half of them.
  • Each exam question carries a weight, so be sure you attempt the questions with higher weights before focusing on the lower ones. Target the ones with higher weights and quicker solutions, like the debugging ones.
  • The exam is provided by killer.sh with 6-8 different preconfigured K8s clusters. Each question refers to a different Kubernetes cluster, and the context needs to be switched. Be sure to execute the kubectl config use-context command, which is available with every question and you just need to copy-paste it.
  • Check for the namespace mentioned in the question, to find resources and create resources. Use the -n <namespace> option.
  • You would be performing most of the interaction from the client node. However, pay attention to the node (master or worker) on which you need to execute the commands and make sure you return to the base node.
  • With CKS, it is important to move to the master node for any changes to the cluster kube-apiserver.
  • SSH to nodes and gaining root access is allowed if needed.
  • Carefully read the information provided within the questions with the mark. It provides very useful hints for addressing the question and saves time, e.g. the namespaces to look into for a failed pod, or what has already been created (ConfigMaps, Secrets, Network Policies) so that you do not create the same again.
  • Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.
  • If you need to edit further, use --dry-run=client -o yaml to get a headstart with the YAML spec file and edit the same.
  • I personally use alias kk=kubectl to avoid typing kubectl

CKS Resources


CKS Key Topics

Cluster Setup – 10%

Cluster Hardening – 15%

System Hardening – 15%

  • Practice CKS Exercises – System Hardening
  • Minimize host OS footprint (reduce attack surface)
    • Control access using SSH, disable root and password-based logins
    • Remove unwanted packages and ports
  • Minimize IAM roles
    • IAM roles are usually with Cloud providers and relate to the least privilege access principle.
  • Minimize external access to the network
    • External access can be controlled using Network Policies through egress policies.
  • Appropriately use kernel hardening tools such as AppArmor, seccomp
    • Runtime classes provided by gvisor and kata containers can help provide further isolation of the containers
    • Secure Computing – Seccomp tool helps control syscalls made by containers
    • AppArmor can be configured for any application to reduce its potential host attack surface and provide greater defense in depth.
    • PodSecurityPolicies – PSP enables fine-grained authorization of pod creation and updates.
      • Apply host updates
      • Install minimal required OS footprint
      • Identify and address open ports
      • Remove unnecessary packages
      • Protect access to data with permissions
    • Exam tip: Know how to load AppArmor profiles, and enable them for the pods. AppArmor is in beta and needs to be enabled using container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>
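
The AppArmor exam tip above can be sketched as a pod manifest. This is a minimal illustration assuming the beta annotation named in the tip and a profile already loaded on the node; the pod, container, image, and profile names are hypothetical. The manifest is built as a Python dict and printed as JSON, which kubectl apply -f also accepts.

```python
import json


def apparmor_pod(pod_name: str, container_name: str, image: str, profile: str) -> dict:
    """Build a Pod manifest that runs one container under a local AppArmor profile."""
    annotation = f"container.apparmor.security.beta.kubernetes.io/{container_name}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            # localhost/<profile> refers to a profile loaded on the node.
            "annotations": {annotation: f"localhost/{profile}"},
        },
        "spec": {"containers": [{"name": container_name, "image": image}]},
    }


# Hypothetical names, for illustration only.
pod = apparmor_pod("secure-pod", "app", "nginx", "k8s-deny-write")
print(json.dumps(pod, indent=2))
```

The container name in the annotation key has to match a container in the pod spec, which is why the helper derives both from the same argument.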

Minimize Microservice Vulnerabilities – 20%

  • Practice CKS Exercises – Minimize Microservice Vulnerabilities
  • Setup appropriate OS-level security domains e.g. using PSP, OPA, security contexts.
    • Pod Security Contexts help define security for pods and containers at the pod or at the container level. Capabilities can be added at the container level only.
    • Pod Security Policies enable fine-grained authorization of pod creation and updates and are implemented as an optional admission controller.
    • Open Policy Agent helps enforce custom policies on Kubernetes objects without recompiling or reconfiguring the Kubernetes API server.
    • Admission controllers
      • can be used for validating configurations as well as mutating the configurations.
      • Mutating controllers are triggered before validating controllers.
      • Allows extension by adding custom controllers using MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
    • Exam tip: Know how to configure Pod Security Context, Pod Security Policies
  • Manage Kubernetes secrets
    • Exam Tip: Know how to read secret values, create secrets and mount the same on the pods.
  • Use container runtime sandboxes in multi-tenant environments (e.g. gvisor, kata containers)
    • Exam tip: Know how to create a Runtime and associate it with a pod using runtimeClassName
  • Implement pod to pod encryption by use of mTLS
    • Practice managing TLS certificates in a cluster
    • A service mesh like Istio can be used to establish mTLS for pod-to-pod communication.
    • Istio automatically configures workload sidecars to use mutual TLS when calling other workloads. By default, Istio configures the destination workloads using PERMISSIVE mode. When PERMISSIVE mode is enabled, a service can accept both plain text and mutual TLS traffic. In order to only allow mutual TLS traffic, the configuration needs to be changed to STRICT mode.
    • Exam tip: No questions related to mTLS appeared in the exam
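
For the secrets tip above, it helps to remember that the values under data in a Secret are only base64-encoded, not encrypted. A minimal sketch of the round trip follows; the key names and values are made up, and the helper functions are hypothetical, not part of any Kubernetes client library.

```python
import base64


def encode_secret_data(plain: dict) -> dict:
    """Encode plain-text values the way they appear under `data` in a Secret."""
    return {k: base64.b64encode(v.encode()).decode() for k, v in plain.items()}


def decode_secret_data(data: dict) -> dict:
    """Decode the `data` map of a Secret, e.g. from `kubectl get secret -o json`."""
    return {k: base64.b64decode(v).decode() for k, v in data.items()}


# Made-up credentials for illustration only.
encoded = encode_secret_data({"username": "admin", "password": "s3cr3t"})
print(encoded)
print(decode_secret_data(encoded))
```

On the exam the same decoding is usually done on the command line with base64 -d; the point is that anyone who can read the Secret object can read its values.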

Supply Chain Security – 20%

  • Practice CKS Exercises – Supply Chain Security
  • Minimize base image footprint
    • Remove unnecessary tools. Remove shells, package manager & vi tools.
    • Use slim/minimal images with required packages only. Do not include unnecessary software like build tools and utilities, troubleshooting, and debug binaries.
    • Build the smallest image possible – To reduce the size of the image, install only what is strictly needed
    • Use distroless, Alpine, or relevant base images for the app.
    • Use official images from verified sources only.
  • Secure your supply chain: whitelist allowed registries, sign and validate images
  • Use static analysis of user workloads (e.g. Kubernetes resources, Dockerfiles)
    • Tools like Kubesec can be used to perform a static security risk analysis of the configuration files.
  • Scan images for known vulnerabilities
    • Aqua Security Trivy & Anchore can be used for scanning vulnerabilities in the container images.
    • Exam Tip: Know how to use the Trivy tool to scan images for vulnerabilities. Also, remember the --severity flag, e.g. --severity=CRITICAL, for filtering on a specific severity.
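
When a scan has been saved as a JSON report instead of being filtered with --severity, the same filtering can be sketched in a few lines. The report below is a trimmed, made-up sample; the field names (Results, Vulnerabilities, VulnerabilityID, PkgName, Severity) are assumed to follow Trivy's JSON report format, and the CVE IDs and package names are placeholders.

```python
import json

# Trimmed, made-up sample of a Trivy JSON report (placeholder CVE IDs).
SAMPLE_REPORT = json.loads("""
{"Results": [
  {"Target": "nginx:1.19 (debian 10.9)", "Vulnerabilities": [
    {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
    {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "LOW"}
  ]},
  {"Target": "app/requirements.txt"}
]}
""")


def findings(report: dict, severities=("CRITICAL",)) -> list:
    """Return (target, vulnerability id, package) for the wanted severities."""
    wanted = set(severities)
    hits = []
    for result in report.get("Results") or []:
        # A target without findings may have no "Vulnerabilities" key at all.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in wanted:
                hits.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln.get("PkgName"))
                )
    return hits


for target, vuln_id, pkg in findings(SAMPLE_REPORT):
    print(f"{target}: {vuln_id} in {pkg}")
```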

Monitoring, Logging and Runtime Security – 20%

  • Practice CKS Exercises – Monitoring, Logging, and Runtime Security
  • Perform behavioral analytics of syscall process and file activities at the host and container level to detect malicious activities
  • Detect threats within a physical infrastructure, apps, networks, data, users, and workloads
  • Detect all phases of attack regardless of where it occurs and how it spreads
  • Perform deep analytical investigation and identification of bad actors within the environment
    • Tools like strace and Aqua Security Tracee can be used to check the syscalls. However, with a number of processes, it would be tough to track and monitor all and they do not provide alerting.
    • Tools like Falco & Sysdig provide deep, process-level visibility into dynamic, distributed production environments and can be used to define rules to track, monitor, and alert on activities when a certain rule is violated.
    • Exam Tip: Know how to use Falco, define new rules, enable logging. Make use of the falco_rules.local.yaml file for overrides. (I did not get questions for Falco in my exam).
  • Ensure immutability of containers at runtime
    • Immutability prevents any changes from being made to the container or to the underlying host through the container.
    • It is recommended to create new images and perform a rolling deployment instead of modifying the existing running containers.
    • Launch the container in read-only mode using the --read-only flag with docker run or by using the readOnlyRootFilesystem option in Kubernetes.
    • PodSecurityContext and PodSecurityPolicy can be used to define and enforce container immutability
      • ReadOnlyRootFilesystem – Requires that containers must run with a read-only root filesystem (i.e. no writable layer).
      • Privileged – determines if any container in a pod can enable privileged mode. This allows the container nearly all the same access as processes running on the host.
    • Task @ Configure Pod Container Security Context
    • Exam Tip: Know how to define a PodSecurityPolicy to enforce rules. Remember, Cluster Roles and Role Bindings need to be configured to grant access to the PSP for it to work.
  • Use Audit Logs to monitor access
    • Kubernetes auditing is handled by the kube-apiserver which requires defining an audit policy file.
    • Auditing captures the stages RequestReceived -> (Authn and Authz) -> ResponseStarted (only for long-running requests, e.g. watch) -> ResponseComplete, or Panic if a panic occurs
    • Exam Tip: Know how to configure audit policies and enable audit on the kube-apiserver. Make sure the kube-apiserver is up and running.
    • Task @ Kubernetes Auditing
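
Once auditing is enabled, the log backend writes one JSON event per line, which makes the log easy to query. A minimal sketch, assuming a made-up three-line log and the standard event fields (stage, verb, user.username, objectRef), that lists who accessed Secrets:

```python
import json

# Made-up audit log lines; real events carry many more fields.
SAMPLE_AUDIT_LOG = """\
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","stage":"ResponseComplete","verb":"get","user":{"username":"system:serviceaccount:dev:builder"},"objectRef":{"resource":"secrets","namespace":"dev","name":"db-creds"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","stage":"RequestReceived","verb":"get","user":{"username":"jane"},"objectRef":{"resource":"secrets","namespace":"dev","name":"db-creds"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","stage":"ResponseComplete","verb":"list","user":{"username":"jane"},"objectRef":{"resource":"pods","namespace":"dev"}}
"""


def secret_access(log_text: str) -> list:
    """Return (user, verb, namespace, name) for completed requests on Secrets."""
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        # Count each request once by looking only at the ResponseComplete stage.
        if event.get("stage") == "ResponseComplete" and ref.get("resource") == "secrets":
            hits.append(
                (event["user"]["username"], event["verb"], ref.get("namespace"), ref.get("name"))
            )
    return hits


print(secret_access(SAMPLE_AUDIT_LOG))
```

Filtering on a single stage avoids counting the same request several times, which is also why audit policies often omit the RequestReceived stage.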

CKS Articles

CKS General information and practices

  • The exam can be taken online from anywhere.
  • Make sure you have prepared your workspace well before the exams.
  • Make sure you have a valid government-issued ID card as it would be checked.
  • You are not allowed to have anything around you and no one should enter the room.
  • The exam proctor will be watching you always, so refrain from doing any other activities. Your screen is also always shared.
  • Copy + Paste works fine.
  • You will have an online notepad on the right corner to note down. I hardly used it, but it can be useful to type and modify text instead of using VI editor.

All the Best …

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Learning Path

AWS SysOps Administrator - Associate SOA-C02 Certification

I recently recertified for the AWS Certified SysOps Administrator – Associate (SOA-C02) exam. SOA-C02 is the updated version of the SOA-C01 AWS exam and includes hands-on labs, a first for AWS.

SOA-C02 validates the ability to

  • Deploy, manage, and operate workloads on AWS
  • Support and maintain AWS workloads according to the AWS Well-Architected Framework
  • Perform operations by using the AWS Management Console and the AWS CLI
  • Implement security controls to meet compliance requirements
  • Monitor, log, and troubleshoot systems
  • Apply networking concepts (for example, DNS, TCP/IP, firewalls)
  • Implement architectural requirements (for example, high availability, performance, capacity)
  • Perform business continuity and disaster recovery procedures
  • Identify, classify, and remediate incidents

Refer AWS Certified SysOps – Associate (SOA-C02) Exam Guide Sep 18

SOA-C02 Exam Domains

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Summary

  • SOA-C02 is the first AWS exam that includes 2 sections
    • Objective questions
    • Hands-on labs
  • SOA-C02 Exam is for 190 minutes with 51 (somewhat odd !!) objective-type questions and 3 Hands-on labs.
  • Labs are performed in a separate instance. Copy-paste works, so make sure you copy the exact names on resource creation.
  • Labs are pretty easy if you have worked on AWS.
  • NOTE: Once you complete a section and click next, you cannot go back to that section. The same applies to the labs. Once a lab is completed, you cannot return to it.
  • Practice the Sample Lab provided when you book the exam, which would give you a feel of how the hands-on exam would actually be.

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Resources

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Topics

Practice Labs

  • Create IAM users, IAM roles with specific limited policies.
  • Create a private S3 bucket
    • enable versioning
    • enable default encryption
    • enable lifecycle policies to transition and expire the objects
    • enable same region replication
  • Create a public S3 bucket with static website hosting
  • Set up a VPC with public and private subnets with Routes, SGs, NACLs.
  • Set up a VPC with public and private subnets and enable communication from private subnets to the Internet using NAT gateway
  • Create EC2 instance, create a Snapshot and restore it as a new instance.
  • Set up Security Groups for ALB and Target Groups, and create ALB, Launch Template, Auto Scaling Group, and target groups with sample applications. Test the flow.
  • Create a Multi-AZ RDS instance and force a failover.
  • Set up an SNS topic. Use CloudWatch metrics to create a CloudWatch alarm on specific thresholds and send notifications to the SNS topic.
  • Set up an SNS topic. Use CloudWatch Logs to create a CloudWatch alarm on log patterns and send notifications to the SNS topic.
  • Update a CloudFormation template and re-run the stack and check the impact.
  • Use AWS Data Lifecycle Manager to define snapshot lifecycle.
  • Use AWS Backup to define EFS backup with hourly and daily backup rules.

Management & Governance Tools

  • CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it.
    • EC2 metrics can track (disk, network, CPU, status checks) but do not capture metrics like memory, disk swap, disk storage, etc.
    • CloudWatch unified agent can be used to gather custom metrics like memory, disk swap, disk storage, etc.
    • CloudWatch Alarm actions can be configured to perform actions based on various metrics for e.g. CPU below 5%
    • CloudWatch alarm can monitor StatusCheckFailed_System status on an EC2 instance and automatically recover the instance if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair
    • Know ELB monitoring
      • Load Balancer metrics SurgeQueueLength and SpilloverCount
      • HealthyHostCount, UnHealthyHostCount determines the number of healthy and unhealthy instances registered with the load balancer.
      • Reasons for 4XX and 5XX errors
  • Understand CloudTrail for audit and governance
    • CloudTrail log file integrity validation can be used to check whether a log file was modified, deleted, or unchanged after being delivered.
  • Understand AWS CloudFormation as an Infrastructure as a Code service
    • Know templates, stacks, nested stacks.
    • DependsOn attribute can specify the resource creation order, ensuring that the creation of a specific resource follows another.
    • Deletion Policies help control deletion behavior (delete, retain, snapshot) for the resources.
    • Nested stacks separate out reusable, common components into dedicated templates, which can be mixed and matched and still combined into a single, unified stack
    • Change Sets present a summary or preview of the proposed changes that CloudFormation will make when a stack is updated
    • Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.
    • Termination protection helps prevent a stack from being accidentally deleted.
    • Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update.
    • StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and Regions with a single operation.
    • Know how to wait for resources set up to be completed before proceeding esp. cfn-signal
  • AWS Config helps to assess, audit, and evaluate the configurations of the AWS resources
    • AWS Config can monitor and detect deviations from desired configurations, and it can also be used together with other services, such as AWS Systems Manager, to automatically remediate such deviations when they are detected
  • AWS Systems Manager is the operations hub
    • Patch Manager automates the process of patching managed instances with both security-related and other types of updates.
    • Session Manager helps manage EC2 instances, on-premises instances, and VMs through a browser-based shell or through the AWS CLI without requiring ssh keys, ports to be opened, or bastion hosts.
  • AWS Trusted Advisor provides recommendations that help follow AWS best practices. Trusted Advisor evaluates your account by using checks.
  • AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet.
  • Personal Health Dashboard provides alerts and guidance for AWS events that might affect your environment & the Service Health Dashboard shows the general status of AWS services.
  • Data Lifecycle Manager automates the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs.
  • AWS DataSync automates moving data between on-premises storage and S3 or Elastic File System (EFS).
  • AWS Control Tower provides the easiest way to set up and govern a secure, multi-account AWS environment, called a landing zone
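
The CloudFormation DependsOn and DeletionPolicy attributes covered above can be sketched with a tiny template. The resource names are hypothetical, and the ordering function is a deliberate simplification: it only follows explicit DependsOn entries, whereas CloudFormation also derives dependencies from references between resources.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical two-resource template, expressed as a dict and dumped as JSON.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # Retain keeps the bucket when the stack is deleted.
        "LogBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"},
        # DependsOn makes the topic wait until the bucket has been created.
        "AlarmTopic": {"Type": "AWS::SNS::Topic", "DependsOn": "LogBucket"},
    },
}


def creation_order(tpl: dict) -> list:
    """Order resources so that every DependsOn target comes first."""
    graph = {}
    for name, resource in tpl["Resources"].items():
        deps = resource.get("DependsOn", [])
        # DependsOn may be a single logical ID or a list of them.
        graph[name] = {deps} if isinstance(deps, str) else set(deps)
    return list(TopologicalSorter(graph).static_order())


print(json.dumps(template, indent=2))
print(creation_order(template))
```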

Networking & Content Delivery

  • VPC – Virtual Private Cloud is a virtual network in AWS
    • Understand Public Subnet (has access to the Internet) vs Private Subnet (no access to the Internet)
    • Route table defines rules, termed as routes, which determine where network traffic from the subnet would be routed
    • Internet Gateway enables access to the internet
    • Bastion host – allow access to instances in the private subnet without directly exposing them to the internet.
    • NAT helps route traffic from private subnets to the internet
    • NAT instance vs NAT Gateway
    • Virtual Private Gateway – Connectivity between on-premises and VPC
    • Egress-Only Internet Gateway – relevant to IPv6 only to allow egress traffic from private subnet to internet, without allowing ingress traffic
    • VPC Flow Logs enables you to capture information about the IP traffic going to and from network interfaces in the VPC and can help in monitoring the traffic or troubleshooting any connectivity issues
    • Security Groups vs NACLs esp. Security Groups are stateful and NACLs are stateless.
    • VPC Peering provides a connection between two VPCs that enables routing of traffic between them using private IP addresses.
    • VPC Endpoints enables the creation of a private connection between VPC to supported AWS services and VPC endpoint services powered by PrivateLink using its private IP address
    • Ability to debug networking issues like EC2 not accessible, EC2 not reachable, or not able to communicate with others or Internet.
  • Route 53 provides a scalable DNS system
    • supports the ALIAS record type, which helps map zone apex records to ELB, CloudFront, and S3 endpoints.
    • Understand Routing Policies and their use cases
      • Failover routing policy helps to configure active-passive failover.
      • Geolocation routing policy helps route traffic based on the location of the users.
      • Geoproximity routing policy helps route traffic based on the location of the resources and, optionally, shift traffic from resources in one location to resources in another.
      • Latency routing policy helps when resources are in multiple AWS Regions and you want to route traffic to the Region that provides the best latency with less round-trip time.
      • Weighted routing policy helps route traffic to multiple resources in specified proportions.
    • Focus on Weighted, Latency routing policies
  • Understand ELB, ALB, and NLB and what features they provide like
    • Understand key differences ELB vs ALB vs NLB
    • ALB provides content and path routing
    • NLB provides the ability to give static IPs to the load balancer esp. if there is a requirement to whitelist IPs.
    • LB access logs provide the source IP address
    • supports Sticky sessions to enable the load balancer to bind a user’s session to a specific target.
  • Understand CloudFront and use cases
    • CloudFront can be used with S3 to expose static data and website
  • Know VPN and Direct Connect to provide AWS to on-premises connectivity. Not covered in detail.
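
For the VPC route table bullets above, it helps to remember that the most specific matching route (longest prefix) wins. A minimal sketch of that lookup follows; the CIDR ranges and the target names igw-example and nat-example are hypothetical.

```python
import ipaddress

# Hypothetical route tables: destination CIDR -> target.
PUBLIC_SUBNET_ROUTES = {
    "10.0.0.0/16": "local",       # traffic inside the VPC
    "0.0.0.0/0": "igw-example",   # everything else via the Internet Gateway
}
PRIVATE_SUBNET_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-example",   # outbound-only access via a NAT gateway
}


def route_target(route_table: dict, destination_ip: str):
    """Return the target of the most specific route matching the destination."""
    ip = ipaddress.ip_address(destination_ip)
    best_net, best_target = None, None
    for cidr, target in route_table.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    return best_target  # None means no route matched


print(route_target(PUBLIC_SUBNET_ROUTES, "10.0.1.5"))   # stays inside the VPC
print(route_target(PRIVATE_SUBNET_ROUTES, "8.8.8.8"))   # leaves via the NAT gateway
```

The only difference between the two tables is the target of the default route, which is what makes a subnet public or private.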

Compute

  • Understand EC2 in depth
    • Understand EC2 instance types and use cases.
    • Understand EC2 purchase options esp. spot instances and improved reserved instances options.
    • Understand EC2 Metadata & Userdata.
    • Understand EC2 Security. 
      • Use IAM Roles with EC2 instances to access services
      • IAM Role can now be attached to stopped and running instances
    • AMIs provide the information required to launch an instance, which is a virtual server in the cloud.
      • AMIs are regional and can be shared publicly or with other accounts
      • Only AMIs with unencrypted volumes or encrypted with a CMK (customer-managed keys) can be shared.
      • The best practice is to use prebaked or golden images to reduce startup time for the applications. Leverage EC2 Image Builder.
    • Troubleshooting EC2 issues
      • RequestLimitExceeded
      • InstanceLimitExceeded – Concurrent running instance limit, default is 20, has been reached in a region. Request increase in limits.
      • InsufficientInstanceCapacity – AWS does not currently have enough available capacity to service the request. Change AZ or Instance Type.
    • Monitoring EC2 instances
      • System status checks failure – Stop and Start
      • Instance status checks failure – Reboot
    • EC2 supports Instance Recovery where the recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.
    • EC2 Image Builder can be used to pre-bake images with software to speed up booting and launching time.
  • Understand Placement groups
    • Cluster Placement Group provides low latency, High-Performance Computing by the logical grouping of instances within a Single AZ
    • Spread Placement Group is a group of instances that are each placed on distinct underlying hardware i.e. each instance on a distinct rack across AZs
    • Partition Placement Group is a group of instances spread across partitions i.e. group of instances spread across racks across AZs
  • Understand Auto Scaling
    • Auto Scaling can be configured with multiple AZs for high availability to launch instances across multiple AZs
    • Auto Scaling attempts to distribute instances evenly between the AZs that are enabled for the Auto Scaling group
    • Auto Scaling supports
      • Dynamic scaling, which allows you to scale automatically in response to the changing demand
      • Scheduled scaling, which allows you to scale the application in response to predictable load changes
      • Manual scaling can be performed by changing the desired capacity or adding and removing instances
    • Auto Scaling life cycle hooks can be used to perform activities before instance termination.
  • Understand Lambda and its use cases
    • Lambda functions can be hosted in VPC with internet access controlled by a NAT instance.
    • RDS Proxy acts as an intermediary between the application and an RDS database. RDS Proxy establishes and manages the necessary connection pools to the database so that the application creates fewer database connections.

Storage

  • S3 provides object storage service
    • Understand storage classes with lifecycle policies
    • S3 data protection provides encryption at rest and encryption in transit
      • S3 default encryption can be used to encrypt the data with S3 bucket policies to prevent or reject unencrypted object uploads.
    • Multi-part handling for fault-tolerant and performant large file uploads
    • static website hosting, CORS
    • S3 Versioning can help recover from accidental deletes and overwrites.
    • Pre-Signed URLs for both upload and download
    • S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket using globally distributed edge locations in CloudFront.
  • Understand Glacier as archival storage. Glacier does not provide immediate access to the data even with expedited retrievals.
  • Understand EBS storage option
  • Storage Gateway allows storage of data in the AWS cloud for scalable and cost-effective storage while maintaining data security.
    • Gateway-cached volumes store the data in S3 and retain a copy of recently read data locally for low-latency access to the frequently accessed data
    • Gateway-stored volumes maintain the entire data set locally to provide low latency access
  • EFS is a cost-optimized, serverless, scalable, and fully managed file storage for use with AWS Cloud and on-premises resources.
    • supports data at rest encryption only during the creation. After creation, the file system cannot be encrypted and the data must be copied over to a new encrypted file system.
    • supports General purpose and Max I/O performance mode.
    • If hitting PercentIOLimit issue move to Max I/O performance mode.
  • FSx makes it easy and cost-effective to launch, run, and scale feature-rich, high-performance file systems in the cloud
  • FSx for Windows supports SMB protocol and a Multi-AZ file system to provide high availability across multiple AZs.
  • AWS Backup can be used to automate backup for EC2 instances and EFS file systems
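
For the S3 multi-part upload bullet above, the part size has to respect the S3 limits of at most 10,000 parts per upload and 5 MiB to 5 GiB per part (the last part may be smaller). A minimal sketch of picking a part size; the function name and the 8 MiB default are assumptions for illustration, not an SDK API.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # minimum size of every part except the last
MAX_PART_SIZE = 5 * 1024 ** 3    # 5 GiB per part
MAX_PARTS = 10_000               # maximum number of parts per upload


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB):
    """Return (part_size, part_count) that fits within the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size when the preferred size would need too many parts.
    if ceil_div(object_size, part_size) > MAX_PARTS:
        part_size = ceil_div(object_size, MAX_PARTS)
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return part_size, ceil_div(object_size, part_size)


print(plan_parts(100 * MIB))   # small object: preferred part size is fine
print(plan_parts(1024 ** 4))   # 1 TiB object: part size grows to stay within 10,000 parts
```

Each part can be retried independently, which is where the fault tolerance of multi-part uploads comes from.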

Databases

  • RDS provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks.
    • Understand RDS Multi-AZ vs Read Replicas and use cases
    • Multi-AZ deployment provides high availability, durability, and failover support
    • Read replicas enable increased scalability and database availability in the case of an AZ failure.
    • Automated backups and database change logs enable point-in-time recovery of the database during the backup retention period, up to the last five minutes of database usage.
  • Aurora is a fully managed, MySQL- and PostgreSQL-compatible, relational database engine
    • Backtracking “rewinds” the DB cluster to the specified time and performs in-place restore and does not create a new instance.
    • Automated Backups help restore the DB as a new instance
  • Know ElastiCache use cases, mainly for caching performance
    • Understand ElastiCache Redis vs Memcached
    • Redis provides Multi-AZ support for high availability across AZs and online resharding to scale dynamically.
    • ElastiCache can be used as a caching layer for RDS.
  • Know DynamoDB. Not covered in detail

Security

  • IAM provides Identity and Access Management services.
  • S3 Encryption supports data at rest and in transit encryption
    • Understand S3 with SSE-S3, SSE-C, SSE-KMS
    • S3 default encryption can help encrypt objects; however, it does not encrypt objects that existed before the setting was enabled. You can use S3 Inventory to list the objects and S3 Batch Operations to encrypt them.
  • Understand KMS for key management and envelope encryption
    • KMS with imported customer key material does not support automatic rotation; rotation has to be done manually.
  • AWS WAF – Web Application Firewall helps protect the applications against common web exploits like XSS or SQL Injection and bots that may affect availability, compromise security, or consume excessive resources
  • AWS GuardDuty is a threat detection service that continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
  • AWS Secrets Manager can help securely expose credentials as well as rotate them.
    • Secrets Manager integrates with Lambda and supports credentials rotation
  • AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS
  • Amazon Inspector
    • is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS.
    • automatically assesses applications for exposure, vulnerabilities, and deviations from best practices.
  • AWS Certificate Manager (ACM) handles the complexity of creating, storing, and renewing public and private SSL/TLS X.509 certificates and keys that protect the AWS websites and applications.
  • Know AWS Artifact as on-demand access to compliance reports

Analytics

  • Amazon Athena can be used to query S3 data using SQL queries without duplicating the data
  • Elasticsearch service is a distributed search and analytics engine built on Apache Lucene.
    • Elasticsearch production setup would be 3 AZs, 3 dedicated master nodes, 6 nodes with two replicas in each AZ.

Integration Tools

  • Understand SQS as message queuing service and SNS as pub/sub notification service
    • Focus on SQS as a decoupling service
    • Understand SQS FIFO, make sure you know the differences between standard and FIFO
  • Understand CloudWatch integration with SNS for notification

Financial Management

  • Know AWS Organizations
    • Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization centrally.
  • Consolidated billing enables consolidating payments from multiple AWS accounts and includes combined usage and volume discounts including sharing of Reserved Instances across accounts.
  • Understand how to setup Billing Alerts using CloudWatch
  • Cost allocation tags can be used to differentiate resource costs, which can be analyzed using Cost Explorer or a Cost Allocation report.

All the Best

Google Cloud Certified – Cloud Digital Leader Learning Path

Google Cloud Certified - Cloud Digital Leader Certificate

Google Cloud – Cloud Digital Leader Certification Learning Path

Continuing on the Google Cloud journey, I am glad to have passed my seventh certification, the Cloud Digital Leader. Google Cloud was missing an entry-level certification similar to the AWS Cloud Practitioner certification, and the Cloud Digital Leader certification was introduced to fill that gap. Cloud Digital Leader focuses on general cloud knowledge and Google Cloud knowledge, including its products and services.

Google Cloud – Cloud Digital Leader Certification Summary

  • Had 59 questions (somewhat odd !!) to be answered in 90 minutes.
  • Covers a wide range of General Cloud and Google Cloud services and products knowledge.
  • This exam does not require much hands-on experience; theoretical knowledge is good enough to clear the exam.

Google Cloud – Cloud Digital Leader Certification Resources

Google Cloud – Cloud Digital Leader Certification Topics

General cloud knowledge

  1. Define basic cloud technologies. Considerations include:
    1. Differentiate between traditional infrastructure, public cloud, and private cloud
      1. Traditional infrastructure includes on-premises data centers
      2. Public cloud includes Google Cloud, AWS, and Azure
      3. Private Cloud includes services like AWS Outposts
    2. Define cloud infrastructure ownership
    3. Shared Responsibility Model
      1. Security of the Cloud is Google Cloud’s responsibility
      2. Security on the Cloud depends on the services used and is shared between Google Cloud and the Customer
    4. Essential characteristics of cloud computing
      1. On-demand computing
      2. Pay-as-you-use
      3. Scalability and Elasticity
      4. High Availability and Resiliency
      5. Security
  2. Differentiate cloud service models. Considerations include:
    1. Infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
      1. IaaS – everything is done by you – more flexibility more management
      2. PaaS – most of the things are done by Cloud with few things done by you – moderate flexibility and management
      3. SaaS – everything is taken care of by the Cloud, you would just use it – no flexibility and management
    2. Describe the trade-offs between level of management versus flexibility when comparing cloud services
    3. Define the trade-offs between costs versus responsibility
    4. Appropriate implementation and alignment with given budget and resources
  3. Identify common cloud procurement financial concepts. Considerations include:
    1. Operating expenses (OpEx), capital expenditures (CapEx), and total cost of operations (TCO)
      1. On-premises has more CapEx and less OpEx
      2. Cloud has little to no CapEx and more OpEx
    2. Recognize the relationship between OpEx and CapEx related to networking and compute infrastructure
    3. Summarize the key cost differentiators between cloud and on-premises environments

General Google Cloud knowledge

  1. Recognize how Google Cloud meets common compliance requirements. Considerations include:
    1. Locating current Google Cloud compliance requirements
    2. Familiarity with Compliance Reports Manager
  2. Recognize the main elements of Google Cloud resource hierarchy. Considerations include:
    1. Describe the relationship between organization, folders, projects, and resources i.e. Organization -> Folder -> Folder or Projects -> Resources
  3. Describe controlling and optimizing Google Cloud costs. Considerations include:
    1. Google Cloud billing models and applicability to different service classes
    2. Define a consumption-based use model
    3. Application of discounts (e.g., flat-rate, committed-use discounts [CUD], sustained-use discounts [SUD])
      1. Sustained-use discounts [SUD] are automatic discounts for running specific resources for a significant portion of the billing month
      2. Committed use discounts [CUD] help with committed use contracts in return for deeply discounted prices for VM usage
  4. Describe Google Cloud’s geographical segmentation strategy. Considerations include:
    1. Regions are collections of zones. Zones have high-bandwidth, low-latency network connections to other zones in the same region. Regions help design fault-tolerant and highly available solutions.
    2. Zones are deployment areas within a region and provide the lowest latency usually less than 10ms
    3. Regional resources are accessible by any resources within the same region
    4. Zonal resources are hosted in a zone and are also called per-zone resources.
    5. Multiregional resources or Global resources are accessible by any resource in any zone within the same project.
  5. Define Google Cloud support options. Considerations include:
    1. Distinguish between billing support, technical support, role-based support, and enterprise support
      1. Role-Based Support provides more predictable rates and a flexible configuration. Although they are legacy, the exam does cover these.
      2. Enterprise Support provides the fastest case response times and a dedicated Technical Account Management (TAM) contact who helps you execute a Google Cloud strategy.
    2. Recognize a variety of Service Level Agreement (SLA) applications

Google Cloud products and services

  1. Describe the benefits of Google Cloud virtual machine (VM)-based compute options. Considerations include:
    1. Compute Engine provides virtual machines (VM) hosted on Google’s infrastructure.
    2. Google Cloud VMware Engine helps easily lift and shift VMware-based applications to Google Cloud without changes to the apps, tools, or processes
    3. Bare Metal lets businesses run specialized workloads such as Oracle databases close to Google Cloud while lowering overall costs and reducing risks associated with migration
    4. Custom versus standard sizing
    5. Free, premium, and custom service options
    6. Attached storage/disk options
    7. A preemptible VM is an instance that can be created and run at a much lower price than normal instances.
  2. Identify and evaluate container-based compute options. Considerations include:
    1. Define the function of a container registry
      1. Container Registry is a single place to manage Docker images, perform vulnerability analysis, and decide who can access what with fine-grained access control.
    2. Distinguish between VMs, containers, and Google Kubernetes Engine
  3. Identify and evaluate serverless compute options. Considerations include:
    1. Define the function and use of App Engine, Cloud Functions, and Cloud Run
    2. Define rationale for versioning with serverless compute options
    3. Cost and performance tradeoffs of scale to zero
      1. Scale to zero provides cost efficiency by scaling down to zero when there is no load, but comes with the issue of cold starts
      2. Serverless technologies like Cloud Functions, Cloud Run, and App Engine standard environment provide these capabilities
  4. Identify and evaluate multiple data management offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s relational and non-relational database offerings
      1. Cloud SQL provides fully managed, relational SQL databases and offers MySQL, PostgreSQL, MSSQL databases as a service
      2. Cloud Spanner provides fully managed, relational SQL databases with joins and secondary indexes
      3. Cloud Bigtable provides a scalable, fully managed, non-relational NoSQL wide-column analytical big data database service suitable for low-latency single-point lookups and precalculated analytics
      4. BigQuery provides fully managed, no-ops, OLAP, enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
    2. Describe Google Cloud’s database offerings and how they compare to commercial offerings
  5. Distinguish between ML/AI offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s hardware accelerators (e.g., Vision API, AI Platform, TPUs)
    2. Identify when to train your own model, use a Google Cloud pre-trained model, or build on an existing model
      1. Vision API provides out-of-the-box pre-trained models to extract data from images
      2. AutoML provides the ability to train models
      3. BigQuery Machine Learning provides support for limited models and SQL interface
  6. Differentiate between data movement and data pipelines. Considerations include:
    1. Describe Google Cloud’s data pipeline offerings
      1. Cloud Pub/Sub provides reliable, many-to-many, asynchronous messaging between applications. By decoupling senders and receivers, Google Cloud Pub/Sub allows developers to communicate between independently written applications.
      2. Cloud Dataflow is a fully managed service for strongly consistent, parallel data-processing pipelines
      3. Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building & managing data pipelines
      4. BigQuery Service is a fully managed, highly scalable data analysis service that enables businesses to analyze Big Data.
      5. Looker provides an enterprise platform for business intelligence, data applications, and embedded analytics.
    2. Define data ingestion options
  7. Apply use cases to a high-level Google Cloud architecture. Considerations include:
    1. Define Google Cloud’s offerings around the Software Development Life Cycle (SDLC)
    2. Describe Google Cloud’s platform visibility and alerting offerings – covers Cloud Monitoring and Cloud Logging
  8. Describe solutions for migrating workloads to Google Cloud. Considerations include:
    1. Identify data migration options
    2. Differentiate when to use Migrate for Compute Engine versus Migrate for Anthos
      1. Migrate for Compute Engine provides fast, flexible, and safe migration to Google Cloud
      2. Migrate for Anthos and GKE makes it fast and easy to modernize traditional applications away from virtual machines and into native containers. This significantly reduces the cost and labor that would be required for a manual application modernization project.
    3. Distinguish between lift and shift versus application modernization
      1. Lift and shift migrates workloads with zero to minimal changes and is usually performed under time constraints
      2. Application modernization requires a redesign of infra and applications and takes time. It can include moving legacy monolithic architecture to microservices architecture, building CI/CD pipelines for automated builds and deployments, frequent releases with zero downtime, etc.
  9. Describe networking to on-premises locations. Considerations include:
    1. Define Software-Defined WAN (SD-WAN) – did not have any questions regarding the same.
    2. Determine the best connectivity option based on networking and security requirements – covers Cloud VPN, Interconnect, and Peering.
    3. Private Google Access provides access from VM instances to Google-provided services like Cloud Storage or third-party provided services
  10. Define identity and access features. Considerations include:
    1. Cloud Identity & Access Management (Cloud IAM) provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    2. Google Cloud Directory Sync enables administrators to synchronize users, groups, and other data from an Active Directory/LDAP service to their Google Cloud domain directory.

Google Cloud – Professional Cloud Developer Certification learning path

Google Cloud Professional Cloud Developer Certificate

Continuing on the Google Cloud Journey, glad to have passed the sixth certification with the Professional Cloud Developer certification.

Google Cloud – Professional Cloud Developer Certification Summary

  • Had 60 questions to be answered in 2 hours, whereas the other exams had 50 questions in the same 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on application and deployment services
  • Make sure you cover the case studies beforehand. I got ~5-6 questions on them, and knowing them can really be a savior in the exam.
  • As mentioned for all the exams, hands-on is a MUST. If you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands
  • I did Coursera and A Cloud Guru courses, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Developer Certification Resources

Google Cloud – Professional Cloud Developer Certification Topics

Case Studies

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered, mostly from the security aspect
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine-grained control
    • Compute Engine is recommended to be used with Service Account with the least privilege to provide access to Google services and the information can be queried from instance metadata.
    • Compute Engine Persistent disks can be attached to multiple VMs in read-only mode.
    • Reasons for Compute Engine launch issues
      • Boot disk is full.
      • Boot disk is corrupted
      • Boot Disk has an invalid master boot record (MBR).
      • Quota Errors
      • Can be debugged using Serial console
    • Preemptible VMs and their use cases. HINT –  shutdown script to perform cleanup actions
  • Google Kubernetes Engine
    • Google Kubernetes Engine enables running containers on Google Cloud
    • Understand GKE containers, Pods, Deployments, Service, DaemonSet, StatefulSets
      • Pods are the smallest, most basic deployable objects in Kubernetes. A Pod represents a single instance of a running process in the cluster and can contain single or multiple containers
      • Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of the application and automatically replaces any instances that fail or become unresponsive.
      • StatefulSets represent a set of Pods with unique, persistent identities and stable hostnames that GKE maintains regardless of where they are scheduled
      • DaemonSets manage groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes
      • A Service groups a set of Pod endpoints into a single resource. GKE Services can be exposed as ClusterIP, NodePort, and LoadBalancer
      • Ingress object defines rules for routing HTTP(S) traffic to applications running in a cluster. An Ingress object is associated with one or more Service objects, each of which is associated with a set of Pods
    • GKE supports Horizontal Pod Autoscaler (HPA) to autoscale deployments based on CPU and Memory
    • GKE supports health checks using liveness and readiness probes
      • Readiness probes are designed to let Kubernetes know when the app is ready to serve traffic.
      • Liveness probes let Kubernetes know if the app is alive or dead.
    • Understand Workload Identity for security, which is a recommended way to provide Pods running on the cluster access to Google resources.
    • GKE integrates with Istio to provide mutual TLS (mTLS) between services
  • Google App Engine
  • Cloud Tasks
    • is a fully managed service that allows you to manage the execution, dispatch, and delivery of a large number of distributed tasks.
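
The Horizontal Pod Autoscaler note above can be made concrete with the scaling rule documented for Kubernetes: desired replicas = ceil(current replicas × current metric / target metric). Below is a minimal Python sketch of that rule. The function name and the default min/max bounds are illustrative assumptions, not part of any GKE API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the HPA scaling rule, then clamp to the configured bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale proportionally to how far the observed metric is from the target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas averaging 90% CPU against a 60% target would be scaled to 6 replicas, since 4 × 90 / 60 = 6.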

Security Services

  • Cloud Identity-Aware Proxy
    • Identity-Aware Proxy IAP allows managing access to HTTP-based apps both on Google Cloud and outside of Google Cloud.
    • IAP uses Google identities and IAM and can leverage external identity providers as well like OAuth with Facebook, Microsoft, SAML, etc.
    • Signed headers using JWT provide secondary security in case someone bypasses IAP.
  • Cloud Data Loss Prevention – DLP
    • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
    • provides two key features
      • Classification is the process to inspect the data and know what data we have, how sensitive it is, and the likelihood.
      • De-identification is the process of removing, masking, redacting, or replacing information in data.
  • Web Security Scanner
    • Web Security Scanner identifies security vulnerabilities in the App Engine, GKE, and Compute Engine web applications.
    • scans provide information about application vulnerability findings, such as OWASP-listed issues, cross-site scripting (XSS), Flash injection, outdated libraries, clear-text passwords, or use of mixed content
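
To illustrate the two DLP features noted above, classification and de-identification, here is a toy Python sketch. It does not call the Cloud DLP API. The regular expression and the function names are assumptions made for illustration, and real DLP detectors are far more robust.

```python
import re

# Hypothetical email pattern for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify(text):
    """Classification step: return the email-like findings in the text."""
    return EMAIL.findall(text)


def mask(text, mask_char="*"):
    """De-identification step: replace each finding with mask characters."""
    return EMAIL.sub(lambda m: mask_char * len(m.group()), text)
```

Masking keeps the length and position of the sensitive value while hiding its content; redaction or replacement with a surrogate token are alternative transformations.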

Networking Services

  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface.
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.

Identity Services

  • Resource Manager
    • Understand the Resource Manager hierarchy Organization -> Folders -> Projects -> Resources
    • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
    • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
    • Understand IAM Best Practices
      • Use groups for users requiring the same responsibilities
      • Use service accounts for server-to-server interactions.
      • Use Organization Policy Service to get centralized and programmatic control over the organization’s cloud resources.
    • Domain-wide delegation of authority grants third-party and internal applications access to the users’ data, e.g. Google Drive.
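
The policy inheritance rule from the Resource Manager notes above (effective policy = policy on the resource plus everything inherited from its ancestors) can be sketched in a few lines of Python. The hierarchy, members, and bindings below are hypothetical examples, and the model ignores features such as deny policies and conditions.

```python
# Hypothetical hierarchy: child -> parent (None marks the organization root).
PARENT = {"project-a": "folder-dev", "folder-dev": "org", "org": None}

# Hypothetical (role, member) bindings set directly on each node.
BINDINGS = {
    "org": {("roles/viewer", "group:auditors@example.com")},
    "folder-dev": {("roles/editor", "group:devs@example.com")},
    "project-a": {("roles/storage.admin", "user:alice@example.com")},
}


def effective_policy(resource):
    """Union of the bindings set on the resource and on every ancestor."""
    policy = set()
    while resource is not None:
        policy |= BINDINGS.get(resource, set())
        resource = PARENT[resource]  # walk up towards the organization
    return policy
```

Because the result is a union, a binding granted at the organization or folder level cannot be taken away by a policy set lower in the hierarchy.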

Storage Services

  • Cloud Storage
    • Cloud Storage is cost-effective object storage for unstructured data and provides an option for long term data retention
    • Understand Signed URL to give temporary access and the users do not need to be GCP users HINT: Signed URL would work for direct upload to GCS without routing the traffic through App Engine or CE
    • Understand Google Cloud Storage Classes and Object Lifecycle Management to transition objects
    • Retention Policies help define the retention period for the bucket, before which the objects in the bucket cannot be deleted.
    • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained. The feature also allows locking the data retention policy, permanently preventing the policy from being reduced or removed
    • Know Cloud Storage Best Practices esp. GCS auto-scaling performs well if requests ramp up gradually rather than having a sudden spike. Also, retry using exponential back-off strategy
    • Cloud Storage can be used to host static websites
    • Cloud CDN can be used with Cloud Storage to improve performance and enable caching
  • Datastore/Firestore
    • Cloud Datastore/Firestore provides a managed NoSQL document database built for automatic scaling, high performance, and ease of application development.
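
The exponential back-off retry strategy mentioned in the Cloud Storage best practices above can be sketched as follows. This is a generic illustration, not the Cloud Storage client library's built-in retry. The function name, the default delays, and the injectable `sleep` parameter are assumptions made for the example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=32.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**attempt (capped), plus jitter, and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # a real client should catch only retryable errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, 1))  # jitter spreads out retries
```

The random jitter keeps many clients that failed at the same moment from retrying in lockstep and recreating the spike.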

Developer Tools

  • Google Cloud Build
    • Cloud Build integrates with Cloud Source Repositories, GitHub, and GitLab and can be used for Continuous Integration and Deployments.
    • Cloud Build can import source code, execute build to the specifications, and produce artifacts such as Docker containers or Java archives
    • Cloud Build build config file specifies the instructions to perform, with steps defined for each task like test, build, and deploy.
    • Cloud Build supports custom images as well for the steps
    • Cloud Build uses a directory named /workspace as a working directory and the assets produced by one step can be passed to the next one via the persistence of the /workspace directory.
  • Google Cloud Code
    • Cloud Code helps write, debug, and deploy the cloud-based applications for IntelliJ, VS Code, or in the browser.
  • Google Cloud Client Libraries
    • Google Cloud Client Libraries provide client libraries and SDKs in various languages for calling Google Cloud APIs.
    • If the language is not supported, Cloud REST APIs can be used.
  • Deployment Techniques
    • Recreate deployment – fully scale down the existing application version before you scale up the new application version.
    • Rolling update – update a subset of running application instances instead of simultaneously updating every application instance
    • Blue/Green deployment (also known as a red/black deployment) – perform two identical deployments of your application and switch traffic from the current version (blue) to the new version (green)
    • GKE supports Rolling and Recreate deployments.
      • Rolling deployments support maxSurge (how many extra new Pods may be created) and maxUnavailable (how many existing Pods may be taken down) during the update
    • Managed instance groups support rolling deployments using the maxSurge (how many extra new instances may be created) and maxUnavailable (how many existing instances may be taken down) configurations
  • Testing Strategies
    • Canary testing – partially roll out a change and then evaluate its performance against a baseline deployment
    • A/B testing – test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.
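
The maxSurge and maxUnavailable settings from the deployment notes above bound how many instances exist during a rolling update. A small Python sketch of that arithmetic follows. The function name is made up for illustration, and only absolute counts are handled, although Kubernetes also accepts percentages.

```python
def rolling_update_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) instances during a rolling update.

    max_surge: extra instances allowed above the desired count.
    max_unavailable: instances allowed to be down below the desired count.
    """
    if max_surge == 0 and max_unavailable == 0:
        # With both at zero the update could never make progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas - max_unavailable, replicas + max_surge
```

With 10 replicas, maxSurge 2, and maxUnavailable 1, at least 9 instances stay available and at most 12 exist at any point in the rollout.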

Data Services

  • Bigtable
  • Cloud Pub/Sub
    • Understand Cloud Pub/Sub as an asynchronous messaging service
    • Know patterns for One to Many, Many to One, and Many to Many
    • roles/pubsub.publisher and roles/pubsub.subscriber provide applications with the ability to publish and consume.
  • Cloud SQL
    • Cloud SQL is a fully managed service that provides MySQL, PostgreSQL, and Microsoft SQL Server.
    • HA configuration provides data redundancy and failover capability with minimal downtime when a zone or instance becomes unavailable due to a zonal outage, or an instance corruption
    • Read replicas help scale horizontally the use of data in a database without degrading performance
  • Cloud Spanner
    • is a fully managed relational database with unlimited scale, strong consistency, and up to 99.999% availability.
    • can read and write up-to-date strongly consistent data globally
    • Multi-region instances give higher availability guarantees (99.999% availability) and global scale.
    • Cloud Spanner’s table interleaving is a good choice for many parent-child relationships where the child table’s primary key includes the parent table’s primary key columns.
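
For the one-to-many Pub/Sub pattern listed above, the key idea is that each subscription receives its own copy of every published message. The toy in-memory model below illustrates only that fan-out behaviour. It is not the Pub/Sub client library, and the class and method names are invented for the sketch.

```python
from collections import defaultdict, deque


class Topic:
    """Toy in-memory topic: every subscription gets a copy of each message."""

    def __init__(self):
        self.subscriptions = defaultdict(deque)

    def subscribe(self, name):
        self.subscriptions[name]  # touching the key creates an empty queue

    def publish(self, message):
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, name):
        queue = self.subscriptions[name]
        return queue.popleft() if queue else None
```

Here the publisher never refers to a consumer directly, which is the decoupling that lets independently written applications communicate.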

Monitoring

  • Google Cloud Monitoring or Stackdriver
    • provides everything from monitoring, alert, error reporting, metrics, diagnostics, debugging, trace.
    • Cloud Monitoring helps gain visibility into the performance, availability, and health of your applications and infrastructure.
  • Google Cloud Logging or Stackdriver logging
    • Cloud Logging provides real-time log management and analysis
    • Cloud Logging allows ingestion of custom log data from any source
    • Logs can be exported by configuring log sinks to BigQuery, Cloud Storage, or Pub/Sub.
    • Cloud Logging Agent can be installed for logging and capturing application logs.
  • Cloud Error Reporting
    • counts, analyzes, and aggregates the crashes in the running cloud services
  • Cloud Trace
    • is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Debugger
    • is a feature of Google Cloud that lets you inspect the state of a running application in real-time, without stopping or slowing it down
    • Debug Logpoints allow logging injection into running services without restarting or interfering with the normal function of the service
    • Debug Snapshots help capture local variables and the call stack at a specific line location in your app’s source code

All the Best !!

Google Cloud – Professional Cloud Security Engineer Certification learning path

GCP - Professional Cloud Security Engineer Certificate

Continuing on the Google Cloud journey, I have just cleared the Professional Cloud Security Engineer certification. The Google Cloud – Professional Cloud Security Engineer certification exam focuses on almost all of the Google Cloud security services, and covers storage, compute, and networking services from their security aspects only.

Google Cloud – Professional Cloud Security Engineer Certification Summary

  • Has 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on security and network services
  • As mentioned for all the exams, hands-on is a MUST. If you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands
  • I did Coursera and A Cloud Guru courses, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Security Engineer Certification Resources

Google Cloud – Professional Cloud Security Engineer Certification Topics

Security Services

  • Google Cloud – Security Services Cheat Sheet
  • Cloud Key Management Service – KMS
    • Cloud KMS provides a centralized, scalable, fast cloud key management service to manage encryption keys
    • KMS Key is a named object containing one or more key versions, along with metadata for the key.
    • KMS KeyRing groups keys with related permissions, allowing you to grant, revoke, or modify permissions to those keys at the key ring level without needing to act on each key individually.
  • Cloud Armor
    • Cloud Armor protects the applications from multiple types of threats, including DDoS attacks and application attacks like XSS and SQLi
    • works with the external HTTP(S) load balancer to automatically block network protocol and volumetric DDoS attacks such as protocol floods (SYN, TCP, HTTP, and ICMP) and amplification attacks (NTP, UDP, DNS)
    • with GKE needs to be configured with GKE Ingress
    • can be used to blacklist IPs
    • supports preview mode to understand patterns without blocking the users
  • Cloud Identity-Aware Proxy
    • Identity-Aware Proxy IAP allows managing access to HTTP-based apps both on Google Cloud and outside of Google Cloud.
    • IAP uses Google identities and IAM and can leverage external identity providers as well like OAuth with Facebook, Microsoft, SAML, etc.
    • Signed headers using JWT provide secondary security in case someone bypasses IAP.
  • Cloud Data Loss Prevention – DLP
    • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
    • provides two key features
      • Classification is the process to inspect the data and know what data we have, how sensitive it is, and the likelihood.
      • De-identification is the process of removing, masking, redacting, or replacing information in data.
    • supports text, image, and storage classification with scans on data stored in Cloud Storage, Datastore, and BigQuery
    • supports scanning of binary, text, image, Microsoft Word, PDF, and Apache Avro files
  • Web Security Scanner
    • Web Security Scanner identifies security vulnerabilities in the App Engine, GKE, and Compute Engine web applications.
    • scans provide information about application vulnerability findings, such as OWASP-listed issues, cross-site scripting (XSS), Flash injection, outdated libraries, clear-text passwords, or use of mixed content
  • Security Command Center – SCC
    • is a Security and risk management platform that helps generate curated insights and provides a unique view of incoming threats and attacks to the assets
    • displays possible security risks, called findings, that are associated with each asset.
  • Forseti Security
    • is an open-source security toolkit for Google Cloud that can be used alongside third-party security information and event management (SIEM) applications
    • keeps track of the environment with inventory snapshots of GCP resources on a recurring cadence
  • Access Context Manager
    • Access Context Manager allows organization administrators to define fine-grained, attribute-based access control for projects and resources
    • Access Context Manager helps reduce the size of the privileged network and move to a model where endpoints do not carry ambient authority based on the network.
    • Access Context Manager helps prevent data exfiltration with proper access levels and security perimeter rules
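
The Cloud KMS notes above describe a hierarchy of key ring, key, and key version. That hierarchy is visible in the resource names Cloud KMS uses, which the following Python sketch assembles. The helper function names and the sample project, location, and key names are hypothetical.

```python
def key_ring_path(project, location, key_ring):
    """Resource name of a key ring, the grouping level for permissions."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}"


def crypto_key_path(project, location, key_ring, key):
    """Resource name of a key, which lives inside exactly one key ring."""
    return f"{key_ring_path(project, location, key_ring)}/cryptoKeys/{key}"


def crypto_key_version_path(project, location, key_ring, key, version):
    """Resource name of one version of a key."""
    return f"{crypto_key_path(project, location, key_ring, key)}/cryptoKeyVersions/{version}"
```

Granting a role on the key ring path applies to every key beneath it, which is why related keys are grouped in one key ring.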

Compliance

  • FIPS 140-2 Validated
    • FIPS 140-2 Validated certification was established to aid in the protection of digitally stored unclassified, yet sensitive, information.
    • Google Cloud uses a FIPS 140-2 validated encryption module called BoringCrypto in the production environment. This means that both data in transit to the customer and between data centers, and data at rest are encrypted using FIPS 140-2 validated encryption.
    • BoringCrypto module that achieved FIPS 140-2 validation is part of the BoringSSL library.
    • BoringSSL library as a whole is not FIPS 140-2 validated
  • PCI/DSS Compliance
    • PCI/DSS compliance is a shared responsibility model
    • Egress rules cannot be controlled for App Engine, Cloud Functions, and Cloud Storage. Google recommends using Compute Engine and GKE to ensure that all egress traffic is authorized.
    • Antivirus software and File Integrity monitoring must be used on all systems commonly affected by malware to protect systems from current and evolving malicious software threats including containers
    • For payment processing, security can be improved and compliance proved by isolating each environment into its own VPC network, which reduces the scope of systems subject to PCI audit standards

Networking Services

  • Refer Google Cloud Security Services Cheat Sheet
  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • Firewall rules control the traffic to and from instances. HINT: rules with lower integers have higher priority. Firewall rules can be applied to specific tags.
    • Know implied firewall rules which deny all ingress and allow all egress
    • Understand the difference between using Service Account vs Network Tags for filtering in Firewall rules. HINT: Use SA over tags as it provides access control while tags can be easily inferred.
    • VPC Peering allows internal or private IP address connectivity across two VPC networks regardless of whether they belong to the same project or the same organization. HINT: VPC Peering uses private IPs and does not support transitive peering
    • Shared VPC allows an organization to connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface.
    • VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
    • Firewall Rules Logging enables auditing, verifying, and analyzing the effects of the firewall rules
  • Hybrid Connectivity
    • Understand Hybrid Connectivity options in terms of security.
    • Cloud VPN provides secure connectivity from the on-premises data center to the GCP network through IPsec tunnels over the public internet. Traffic is encrypted but does not travel over a private, dedicated connection
    • Cloud Interconnect provides direct connectivity from the on-premises data center to the GCP network
  • Cloud NAT
    • Cloud NAT allows VM instances without external IP addresses and private GKE clusters to send outbound packets to the internet and receive any corresponding established inbound response packets.
    • Requests would not be routed through Cloud NAT if the VM has an external IP address
  • Cloud DNS
    • Understand Cloud DNS and its features 
    • supports DNSSEC, a feature of DNS, that authenticates responses to domain name lookups and protects the domains from spoofing and cache poisoning attacks
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.
    • Understand Google Load Balancing options and their use cases, esp. which are global or internal and whether they support SSL offloading
      • Network Load Balancer – regional, external, pass through and supports TCP/UDP
      • Internal TCP/UDP Load Balancer – regional, internal, pass through and supports TCP/UDP
      • HTTP/S Load Balancer – regional/global, external, proxy and supports HTTP/S
      • Internal HTTP/S Load Balancer – regional/global, internal, proxy and supports HTTP/S
      • SSL Proxy Load Balancer – regional/global, external, proxy, supports SSL with SSL offload capability
      • TCP Proxy Load Balancer – regional/global, external, proxy, supports TCP without SSL offload capability
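
The firewall hints in the VPC notes above (lower integer means higher priority, with an implied deny-all ingress rule as the fallback) can be sketched as a small evaluation function. The rule tuples and ports below are hypothetical, and the model matches on port only, whereas real firewall rules also match on sources, targets, and protocols.

```python
# Hypothetical ingress rules: (priority, action, port). Lower number = higher priority.
RULES = [
    (1000, "allow", 443),
    (900, "deny", 22),
    (1000, "allow", 22),
]
IMPLIED_INGRESS = (65535, "deny", None)  # implied deny-all ingress rule


def evaluate_ingress(port, rules=RULES):
    """Return the action of the highest-priority rule matching the port."""
    matching = [r for r in rules if r[2] == port]
    matching.append(IMPLIED_INGRESS)  # always applies as the last resort
    # Sort by priority; at equal priority a deny rule takes precedence over allow.
    matching.sort(key=lambda r: (r[0], r[1] != "deny"))
    return matching[0][1]
```

In this sample rule set, port 22 is denied because the priority-900 deny rule wins over the priority-1000 allow rule, and any port without an explicit rule falls through to the implied deny.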

Identity Services

  • Resource Manager
    • Understand the Resource Manager hierarchy Organization -> Folders -> Projects -> Resources
    • IAM Policy inheritance is transitive and resources inherit the policies of all of their parent resources.
    • Effective policy for a resource is the union of the policy set on that resource and the policies inherited from higher up in the hierarchy.
  • Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • A service account is a special kind of account used by an application or a virtual machine (VM) instance, not a person.
    • Service Account, if accidentally deleted, can be recovered if the time gap is less than 30 days and a service account by the same name wasn’t created
    • Understand IAM Best Practices
      • Use groups for users requiring the same responsibilities
      • Use service accounts for server-to-server interactions.
      • Use Organization Policy Service to get centralized and programmatic control over the organization’s cloud resources.
    • Domain-wide delegation of authority grants third-party and internal applications access to the users’ data, e.g. Google Drive.
  • Cloud Identity
    • Cloud Identity provides IDaaS (Identity as a Service) and provides single sign-on functionality and federation with external identity providers like Active Directory.
    • Cloud Identity supports federating with Active Directory using GCDS to implement the synchronization

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered, mostly from the security aspect
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine-grained control
    • Managing access using OS Login or project and instance metadata
    • Compute Engine is recommended to be used with Service Account with the least privilege to provide access to Google services and the information can be queried from instance metadata.
  • Google Kubernetes Engine
    • Google Kubernetes Engine enables running containers on Google Cloud
    • Understand Best Practices for Building Containers
      • Package a single app per container
      • Properly handle PID 1, signal handling, and zombie processes
      • Optimize for the Docker build cache
      • Remove unnecessary tools
      • Build the smallest image possible
      • Scan images for vulnerabilities
      • Restrict the use of public images
      • Use managed base images

Storage Services

  • Cloud Storage
    • Cloud Storage is cost-effective object storage for unstructured data and provides an option for long term data retention
    • Understand Cloud Storage Security features
      • Understand various Data Encryption techniques including Envelope Encryption, CMEK, and CSEK. HINT: CSEK works with Cloud Storage and Persistent Disks only. CSEK manages KEK and not DEK.
      • Cloud Storage default encryption uses AES-256
      • Understand Signed URL to give temporary access and the users do not need to be GCP users
      • Understand access control and permissions – IAM (Uniform) vs ACLs (fine-grained control)
      • Bucket Lock feature allows configuring a data retention policy for a bucket that governs how long objects in the bucket must be retained. The feature also allows locking the data retention policy, permanently preventing the policy from being reduced or removed

Monitoring

  • Google Cloud Monitoring or Stackdriver
    • provides monitoring, alerting, error reporting, metrics, diagnostics, debugging, and tracing.
  • Google Cloud Logging or Stackdriver logging
    • Audit logs are provided through Cloud logging using Admin Activity and Data Access Audit logs
    • VPC Flow logs and Firewall Rules logs help monitor traffic to and from Compute Engine instances.
    • Log sinks can export logs to external providers via Cloud Pub/Sub

All the Best !!

Google Cloud – Professional Cloud Network Engineer Certification learning path

Google Cloud – Professional Cloud Network Engineer certification exam focuses on almost all of the Google Cloud network services.

Google Cloud – Professional Cloud Network Engineer Certification Summary

  • Has 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on network services
  • Hands-on is a MUST. If you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless for some of the questions and commands
  • I did the Coursera and A Cloud Guru courses, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud Network Engineer Certification Resources

Google Cloud – Professional Cloud Network Engineer Certification Topics

Network Services

  • Refer Google Cloud Networking Services Cheat Sheet
  • Virtual Private Cloud
    • Understand Virtual Private Cloud (VPC), subnets, and host applications within them
    • VPC Routes determine the next hop for the traffic. HINT: Routes can be defined for specific tags as well. The most specific route takes priority.
    • Firewall rules control the traffic to and from instances. HINT: Rules with lower priority numbers have higher priority. Firewall rules can be applied to specific tags.
    • VPC Peering allows internal or private IP address connectivity across two VPC networks regardless of whether they belong to the same project or the same organization. HINT: VPC Peering uses private IPs and does not support transitive peering
    • Shared VPC allows an organization to connect resources from multiple projects to a common VPC network so that they can communicate with each other securely and efficiently using internal IPs from that network. HINT: VLAN attachments and Cloud Routers for Interconnect must be created in the host project
    • Understand the concept of internal and external IPs and the difference between static and ephemeral IPs
    • VPC Subnets support primary and secondary (alias) IP range
    • Primary IP range of an existing subnet can be expanded by modifying its subnet mask, setting the prefix length to a smaller number.
    • Private Access options for services allow instances with internal IP addresses to communicate with Google APIs and services.
    • Private Google Access allows VMs to connect to the set of external IP addresses used by Google APIs and services by enabling Private Google Access on the subnet used by the VM’s network interface. HINT: Private Google Access is enabled on the subnet and not on the VPC level
    • VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
    • Firewall Rules Logging enables auditing, verifying, and analyzing the effects of the firewall rules. HINT: The default implicit ingress deny rule is not captured by firewall rules logging. Add an explicit deny rule to log denied traffic
    • Resources within a VPC network can communicate with one another by using internal IPv4 addresses
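
The route hint above ("the most specific route takes priority") can be sketched with Python's standard `ipaddress` module. This is a simplified illustration, not Google's implementation: the `Route` type, the `pick_route` function, and the next-hop labels are hypothetical. It assumes that the longest matching prefix wins and that ties are broken by the lowest priority number.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dest: str             # destination range in CIDR notation
    next_hop: str         # hypothetical label for the next hop
    priority: int = 1000  # lower number wins among equally specific routes


def pick_route(routes: List[Route], dst_ip: str) -> Optional[Route]:
    """Return the route used for dst_ip, or None if no route matches."""
    ip = ipaddress.ip_address(dst_ip)
    matching = [r for r in routes if ip in ipaddress.ip_network(r.dest)]
    if not matching:
        return None
    # Longest prefix (most specific) first, then lowest priority number.
    return min(
        matching,
        key=lambda r: (-ipaddress.ip_network(r.dest).prefixlen, r.priority),
    )


routes = [
    Route("0.0.0.0/0", "default-internet-gateway"),
    Route("10.0.0.0/8", "vpn-tunnel"),
    Route("10.1.0.0/16", "appliance"),
]
print(pick_route(routes, "10.1.2.3").next_hop)
```

Here traffic to `10.1.2.3` matches all three routes, and the `/16` route is chosen because it is the most specific.
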
  • Hybrid Connectivity
  • Cloud VPN
    • Cloud VPN provides secure connectivity from the on-premises data center to the GCP network through an IPsec tunnel over the public internet. Cloud VPN does not provide a dedicated private connection; traffic is encrypted but still traverses the internet
    • Understand the requirements to set up Cloud VPN.
    • Cloud VPN is quick to set up, which makes it useful for testing hybrid connectivity
    • Understand the limitations of Cloud VPN, esp. the 3 Gbps limit per tunnel, and how throughput can be improved with multiple tunnels.
    • Cloud VPN requires non-overlapping primary and secondary IP address ranges between the on-premises and GCP VPC networks
    • HA VPN provides a highly available and secure connection between the on-premises and the VPC network through an IPsec VPN connection in a single region
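
Two of the Cloud VPN points above, non-overlapping ranges and the per-tunnel bandwidth limit, lend themselves to a quick pre-flight check. This is a minimal sketch: `find_overlaps` and `tunnels_needed` are hypothetical helper names, and the tunnel estimate assumes that traffic spreads evenly across tunnels, which real traffic may not do.

```python
import ipaddress
import math
from typing import List, Tuple


def find_overlaps(on_prem: List[str], vpc: List[str]) -> List[Tuple[str, str]]:
    """Return every (on-prem, VPC) pair of CIDR ranges that overlap."""
    overlaps = []
    for a in on_prem:
        for b in vpc:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                overlaps.append((a, b))
    return overlaps


def tunnels_needed(target_gbps: float, per_tunnel_gbps: float = 3.0) -> int:
    """Rough minimum number of tunnels for a target aggregate bandwidth."""
    return math.ceil(target_gbps / per_tunnel_gbps)


print(find_overlaps(["10.0.0.0/16", "192.168.0.0/24"], ["10.0.1.0/24"]))
print(tunnels_needed(10))
```

Any pair returned by `find_overlaps` would need to be renumbered before the VPN is set up.
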
  • Cloud Interconnect
    • Cloud Interconnect provides direct connectivity from the on-premises data center to GCP network
    • Dedicated Interconnect provides a direct physical connection between the on-premises network and Google’s network. Supports 10 Gbps and above
    • Partner Interconnect provides connectivity between the on-premises and VPC networks through a supported service provider. Supports 50 Mbps to 10 Gbps
    • Understand Dedicated Interconnect vs Partner Interconnect and when to choose each
    • Know Interconnect as the reliable high speed, low latency, and dedicated bandwidth option.
    • Cloud Monitoring monitors interconnect links. Circuit Operational Status metric threshold tracks the circuits while Interconnect Operational Status metric tracks all the links
  • Cloud Router
    • Cloud Router provides dynamic routing using BGP with HA VPN and Cloud Interconnect
    • Cloud Router Global routing mode provides visibility to resources in all regions
    • Cloud Router uses Multi-exit Discriminator (MED) value to route traffic. The same MED value results in Active/Active connection and different MED results in Active/Passive connection
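
The MED behaviour described above can be illustrated with a short sketch. It assumes the usual BGP convention that the lowest MED is preferred; `active_paths` and the tunnel names are hypothetical and do not come from any Google Cloud API.

```python
from typing import Dict, List


def active_paths(med_by_path: Dict[str, int]) -> List[str]:
    """Return the paths that carry traffic: those sharing the lowest MED."""
    if not med_by_path:
        return []
    best = min(med_by_path.values())
    return sorted(path for path, med in med_by_path.items() if med == best)


# Same MED on both tunnels: both carry traffic (Active/Active).
print(active_paths({"tunnel-a": 100, "tunnel-b": 100}))
# Different MEDs: only the lower-MED tunnel carries traffic (Active/Passive).
print(active_paths({"tunnel-a": 100, "tunnel-b": 200}))
```
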
  • Cloud NAT
    • Cloud NAT allows VM instances without external IP addresses and private GKE clusters to send outbound packets to the internet and receive any corresponding established inbound response packets.
    • Requests from a VM are not routed through Cloud NAT if the VM has an external IP address
  • Cloud Peering
    • Google Cloud Peering provides Direct Peering and Carrier Peering
    • Peering provides a direct path from the on-premises network to Google services, including Google Cloud products that can be exposed through one or more public IP addresses. It does not provide a private dedicated connection
  • Cloud Load Balancing
    • Google Cloud Load Balancing provides scaling, high availability, and traffic management for your internet-facing and private applications.
    • Understand Google Load Balancing options and their use cases esp. which is global and internal and what protocols they support.
      • Network Load Balancer – regional, external, pass through and supports TCP/UDP
      • Internal TCP/UDP Load Balancer – regional, internal, pass through and supports TCP/UDP
      • HTTP/S Load Balancer – regional/global, external, proxy and supports HTTP/S
      • Internal HTTP/S Load Balancer – regional/global, internal, proxy and supports HTTP/S
      • SSL Proxy Load Balancer – regional/global, external, proxy, supports SSL with SSL offload capability
      • TCP Proxy Load Balancer – regional/global, external, proxy, supports TCP without SSL offload capability
    • Cloud Load Balancing supports health checks with managed instance groups
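
As a study aid, the load balancer options listed above can be turned into a small decision helper. This is a simplified sketch of the list, not an official selection guide: `pick_load_balancer` and its parameters are hypothetical, and it ignores regional vs global scope.

```python
def pick_load_balancer(traffic: str, internal: bool, proxy: bool = False) -> str:
    """Pick a load balancer name from the list above.

    traffic is one of "HTTP/S", "SSL", "TCP", "UDP"; proxy only matters
    for external TCP traffic.
    """
    if traffic == "HTTP/S":
        return "Internal HTTP/S Load Balancer" if internal else "HTTP/S Load Balancer"
    if internal:
        if traffic in ("TCP", "UDP"):
            return "Internal TCP/UDP Load Balancer"
        raise ValueError(f"no internal option listed for {traffic}")
    if traffic == "SSL":
        return "SSL Proxy Load Balancer"
    if traffic == "TCP" and proxy:
        return "TCP Proxy Load Balancer"
    if traffic in ("TCP", "UDP"):
        return "Network Load Balancer"
    raise ValueError(f"unknown traffic type: {traffic}")


print(pick_load_balancer("UDP", internal=False))
```
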
  • Cloud CDN
    • Understand Cloud CDN as the global content delivery network
    • Know that Cloud CDN works only with the global external HTTP/S Load Balancer
    • Cache is not removed if the underlying origin data is removed. Cache has to be invalidated explicitly, or is removed once expired.
    • Cloud CDN does not compress responses itself; it serves the response from the origin as is. HINT: Because the load balancer adds a Via header, some web servers do not compress responses and must be configured to ignore the Via header
  • Cloud DNS
    • Understand Cloud DNS and its features
    • supports migration or importing of records from on-premises using BIND zone file or YAML records format
    • supports DNSSEC, a feature of DNS, that authenticates responses to domain name lookups and protects the domains from spoofing and cache poisoning attacks

Identity Services

  • Cloud Identity and Access Management
    • Identity and Access Management – IAM provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    • The Compute Network Admin role does not provide access to SSL certificates and firewall rules. The Security Admin role needs to be assigned for those

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are covered lightly, mostly from a networking perspective
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for compute and provides fine-grained control
    • Understand the difference between managed and unmanaged instance groups, and the auto-healing feature
    • Regional Managed Instance group helps spread load across instances in multiple zones within the same region providing scalability and HA
    • Managed Instance group helps perform canary and rolling updates
    • Managed Instance group autoscaling can be configured on CPU or load balancer metrics or custom metrics.
    • Managing access using OS Login or project and instance metadata
  • Google Kubernetes Engine

Security Services

  • Cloud Armor
    • Cloud Armor protects the applications from multiple types of threats, including DDoS attacks and application attacks like XSS and SQLi
    • With GKE, Cloud Armor needs to be configured through GKE Ingress
    • Can be used to blacklist IP addresses
    • Supports preview mode to understand traffic patterns without blocking the users
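
The IP blacklist and preview mode points above can be sketched as a simplified rule evaluator. This is a toy model, not Cloud Armor's actual engine: `Rule` and `evaluate` are hypothetical names. It assumes that rules are checked in priority order (lowest number first), that the first enforced match decides the outcome, and that a preview rule is only recorded and never enforced.

```python
import ipaddress
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    priority: int
    src_range: str         # CIDR range the rule matches
    action: str            # "allow" or "deny"
    preview: bool = False  # preview rules are logged but not enforced


def evaluate(rules: List[Rule], client_ip: str, default: str = "allow") -> Tuple[str, List[int]]:
    """Return (enforced action, priorities of preview rules that matched)."""
    ip = ipaddress.ip_address(client_ip)
    previewed = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.src_range):
            if rule.preview:
                previewed.append(rule.priority)  # record only, keep evaluating
                continue
            return rule.action, previewed
    return default, previewed


rules = [
    Rule(100, "203.0.113.0/24", "deny"),
    Rule(200, "198.51.100.0/24", "deny", preview=True),
]
print(evaluate(rules, "203.0.113.9"))
print(evaluate(rules, "198.51.100.7"))
```

The second client is still allowed, but the matched preview rule is recorded, which is how preview mode lets a new rule be assessed before it is enforced.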

All the Best !!