AWS Transit VPC

  • Transit Gateway can be used instead of Transit VPC. AWS Transit Gateway offers the same advantages as a transit VPC, but it is a managed service that scales elastically and is highly available.
  • Transit VPC helps connect multiple, geographically dispersed VPCs and remote networks in order to create a global network transit center.
  • Transit VPC can solve some of the shortcomings of VPC peering by introducing a hub and spoke design for inter-VPC connectivity.
  • A transit VPC simplifies network management and minimizes the number of connections required to connect multiple VPCs and remote networks.
  • Transit VPC allows an easy way to implement shared services or packet inspection/replication in a VPC.
  • Transit VPC can be used to support important use cases
    • Private Networking – build a private network that spans two or more AWS Regions.
    • Shared Connectivity – Multiple VPCs can share connections to data centers, partner networks, and other clouds.
    • Cross-Account AWS Usage – The VPCs and the AWS resources within them can reside in multiple AWS accounts.
  • Transit VPC design helps implement more complex routing rules, such as network address translation between overlapping network ranges, or to add additional network-level packet filtering or inspection

Transit VPC Configuration

  • A transit VPC network consists of a central hub VPC connected to every other (spoke) VPC through a VPN connection, typically leveraging BGP over IPsec.
  • The central VPC contains EC2 instances running software appliances that route incoming traffic to their destinations using the VPN overlay.
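
Below is a minimal, hedged sketch (Python/boto3) of what wiring one spoke VPC into the hub looks like at the AWS API level: a virtual private gateway on the spoke side, a customer gateway representing the hub's EC2 routing appliance, and a BGP-over-IPsec VPN connection between them. All IDs, IPs, and ASNs are placeholders; a real transit VPC deployment also automates the appliance configuration for every spoke.

```python
# Sketch only: attach one spoke VPC to the transit hub over a BGP/IPsec VPN.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Virtual private gateway on the spoke VPC side (placeholder VPC ID).
vgw = ec2.create_vpn_gateway(Type="ipsec.1", AmazonSideAsn=64512)["VpnGateway"]
ec2.attach_vpn_gateway(VpnGatewayId=vgw["VpnGatewayId"], VpcId="vpc-0spoke1234567890")

# 2. Customer gateway representing the EC2 routing appliance in the hub VPC
#    (placeholder public IP and ASN).
cgw = ec2.create_customer_gateway(
    Type="ipsec.1", PublicIp="198.51.100.10", BgpAsn=65000
)["CustomerGateway"]

# 3. VPN connection between the two; dynamic (BGP) routing over the IPsec tunnels
#    forms the overlay that the hub appliances route on.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    VpnGatewayId=vgw["VpnGatewayId"],
    CustomerGatewayId=cgw["CustomerGatewayId"],
    Options={"StaticRoutesOnly": False},
)
print(vpn["VpnConnection"]["VpnConnectionId"])
```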

Transit VPC Advantages & Disadvantages

  • supports transitive routing using the overlay VPN network, allowing for a simpler hub-and-spoke design, and can be used to provide shared services such as VPC endpoints, Direct Connect connections, etc.
  • supports network address translation between overlapping network ranges.
  • supports vendor functionality around advanced security (layer 7 firewall/Intrusion Prevention System (IPS)/Intrusion Detection System (IDS) ) using third-party software on EC2
  • leverages instance-based routing that increases costs while lowering availability and limiting the bandwidth.
  • Customers are responsible for managing the HA and redundancy of EC2 instances running the third-party vendor virtual appliances

Transit VPC High Availability

Transit VPC vs VPC Peering vs Transit Gateway

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Under increased cyber security concerns, a company is deploying a near real-time intrusion detection system (IDS) solution. A system must be put in place as soon as possible. The architecture consists of many AWS accounts, and all results must be delivered to a central location. Which solution will meet this requirement, while minimizing downtime and costs?
    1. Deploy a third-party vendor solution to perform deep packet inspection in a transit VPC.
    2. Enable VPC Flow Logs on each VPC. Set up a stream of the flow logs to a central Amazon Elasticsearch cluster.
    3. Enable Amazon Macie on each AWS account and configure central reporting.
    4. Enable Amazon GuardDuty on each account as members of a central account.
  2. Your company has set up a VPN connection between their on-premises infrastructure and AWS. They have multiple VPCs defined. They also need to ensure that all traffic flows through a security VPC from their on-premises infrastructure. How would you architect the solution? (Select TWO)
    1. Create a VPN connection between the on-premises environment and the Security VPC (Transit VPC pattern)
    2. Create a VPN connection between the on-premises environment and all other VPCs
    3. Create a VPN connection between the Security VPC and all other VPCs (Transit VPC pattern)
    4. Create a VPC peering connection between the Security VPC and all other VPCs

References

AWS_Transit_VPC

Let’s Talk About Cloud Security

Guest post by Dustin Albertson – Manager of Cloud & Applications, Product Management – Veeam.

I want to discuss something that’s important to me: security. Far too often I have discussions with customers and other engineers where they’re discussing an architecture or a problem they are running into, and I spot issues with the design or holes in the thought process. One of the best things about the cloud model is also one of its worst traits: it’s “easy.” What I mean by this is that it’s easy to log into AWS and set up an EC2 instance, connect it to the internet, and configure basic settings. This usually leads to issues down the road because basic security or architectural best practices were not followed. Therefore, I want to talk about a few things that everyone should be aware of.

The Well-Architected Framework

AWS Well-Architected Framework

AWS has done a great job at creating a framework for its customers to adhere to when planning and deploying workloads in AWS. This framework is called the AWS Well-Architected Framework. The framework has 6 pillars that help you learn architectural best practices for designing and operating secure, reliable, efficient, cost-effective, and sustainable workloads in the AWS Cloud. The pillars are:

  • Operational Excellence: The ability to support the development and run workloads effectively, gain insight into their operations, and continuously improve supporting processes and procedures to deliver business value.
  • Security: The security pillar describes how to take advantage of cloud technologies to protect data, systems, and assets in a way that can improve your security posture.
  • Reliability: The reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. This includes the ability to operate and test the workload through its total lifecycle. This paper provides in-depth, best practice guidance for implementing reliable workloads on AWS.
  • Performance Efficiency: The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
  • Cost Optimization: The ability to run systems to deliver business value at the lowest price point.
  • Sustainability: The ability to continually improve sustainability impacts by reducing energy consumption and increasing efficiency across all components of a workload by maximizing the benefits from the provisioned resources and minimizing the total resources required.

This framework is important to read and understand not only for a customer but for a software vendor or a services provider as well. As a company that provides software in the AWS Marketplace, Veeam must go through a few processes prior to listing in the marketplace. Those processes are what’s called a W.A.R. (Well-Architected Review) and a T.F.R. (Technical Foundation Review). A W.A.R. is a deep dive into the product and APIs to make sure that best practices are being followed, not only in the way the products interact with the APIs in AWS but also in how the software is deployed and the architecture it uses. The T.F.R. is a review to validate that all the appropriate documentation and help guides are in place so that a customer can easily find out how to deploy, protect, secure, and obtain support when using a product deployed via the AWS Marketplace. This can give customers peace of mind when deploying software from the marketplace because they’ll know that it has been rigorously tested and validated.

I have mostly been talking at a high level here and want to break this down into a real-world example. Veeam has a product in the AWS Marketplace called Veeam Backup for AWS. One of the best practices for this product is to deploy it into a separate AWS account from your production account.

Veeam Data Protection

The reason for this is that the software will reach into the production account and back up the instances you wish to protect into an isolated protection account where you can limit the number of people who have access. It’s also a best practice to have your backup data stored away from production data. Now here is where the story gets interesting: a lot of people like to use encryption on their EBS volumes. But since it’s so easy to enable encryption, most people just turn it on and move on. The root of the issue is that AWS has made it easy to encrypt a volume, since they offer a default key that you can choose when creating an instance.

They have also made it easy to set a policy that every new volume is encrypted and the default choice is the default key.

This is where the problem begins. Now, this may be fine for now or for a lot of users, but what it does is create issues later down the road. Default encryption keys cannot be shared outside of the account that the key resides in. This means that you would not be able to back that instance up to another account; you can’t rotate the keys, you can’t delete the keys, you can’t audit the keys, and more. Customer managed keys (CMK) give you the ability to create, rotate, disable, enable, and audit the encryption key used to protect the data. I don’t want to go too deep here, but this is an example that I run into a lot, and people don’t realize the impact of this setting until it’s too late. Changing from a default key to a CMK requires downtime of the instance and is a very manual process; although it can be scripted out, it can still be a very cumbersome task if we are talking about hundreds to thousands of instances.
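
To make that concrete, here is a hedged sketch (Python/boto3) of one common way to move a volume off the default aws/ebs key: snapshot it, copy the snapshot while re-encrypting it under a shareable CMK, and build a new volume from the copy. The IDs, CMK ARN, and Availability Zone are placeholders, and the instance still has to be stopped and the volumes swapped, which is exactly the downtime and manual effort described above.

```python
# Sketch only: re-encrypt an EBS volume under a customer managed key (CMK).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cmk_arn = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"  # placeholder CMK

# 1. Snapshot the volume that is currently encrypted with the default (aws/ebs) key.
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                           Description="pre-CMK-migration")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot, re-encrypting it under the shareable CMK.
copy = ec2.copy_snapshot(SourceRegion="us-east-1",
                         SourceSnapshotId=snap["SnapshotId"],
                         Encrypted=True,
                         KmsKeyId=cmk_arn)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Create a replacement volume from the CMK-encrypted snapshot and swap it in
#    (stopping the instance and detaching/attaching volumes is still manual).
new_vol = ec2.create_volume(SnapshotId=copy["SnapshotId"],
                            AvailabilityZone="us-east-1a")
print(new_vol["VolumeId"])
```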

Don’t just take my word for it: Trend Micro also lists this as a Medium Risk.

Aqua Vulnerability Database also lists this as a threat.

Conclusion

I am not trying to scare people or shame people for not knowing this information. A lot of the time in the field, we are so busy that we just get things working and move on. My goal here is to try to get you to stop for a second and think about whether the choices you are making are the best ones for your security. Take advantage of the resources and help that companies like AWS and Veeam are offering and learn about data protection and security best practices. Take a step back from time to time and evaluate the architecture or design that you are implementing. Get a second set of eyes on the project. It may sound complicated or confusing, but I promise it’s not that hard, and the best bet is to just ask others. Also, don’t forget to check the “Choose Your Cloud Adventure” interactive e-book to learn how to manage your AWS data like a hero.

Thank you for reading.

Google Cloud – Professional Cloud DevOps Engineer Certification learning path

Google Cloud Professional Cloud DevOps Engineer Certification

Continuing on the Google Cloud journey, I am glad to have passed the 8th certification with the Professional Cloud DevOps Engineer certification. The Google Cloud Professional Cloud DevOps Engineer certification exam focuses on almost all of the Google Cloud DevOps services, covering the Cloud Developer tools, the Operations Suite, and SRE concepts.

Google Cloud – Professional Cloud DevOps Engineer Certification Summary

  • Had 50 questions to be answered in 2 hours.
  • Covers a wide range of Google Cloud services mainly focusing on DevOps toolset including Cloud Developer tools, Operations Suite with a focus on monitoring and logging, and SRE concepts.
  • The exam has been updated to use
    • Cloud Operations, Cloud Monitoring & Logging and does not refer to Stackdriver in any of the questions.
    • Artifact Registry instead of Container Registry.
  • There are no case studies for the exam.
  • As mentioned for all the exams, hands-on is a MUST. If you have not worked on GCP before, make sure you do lots of labs, else you would be absolutely clueless about some of the questions and commands.
  • I did Coursera and ACloud Guru, which are really vast, but hands-on or practical knowledge is a MUST.

Google Cloud – Professional Cloud DevOps Engineer Certification Resources

Google Cloud – Professional Cloud DevOps Engineer Certification Topics

Developer Tools

  • Google Cloud Build
    • Cloud Build integrates with Cloud Source Repositories, GitHub, and GitLab and can be used for Continuous Integration and Deployments.
    • Cloud Build can import source code, execute build to the specifications, and produce artifacts such as Docker containers or Java archives
    • Cloud Build can trigger builds on source commits in Cloud Source Repositories or other git repositories.
    • Cloud Build build config file specifies the instructions to perform, with steps defined for each task like test, build, and deploy.
    • Cloud Build step specifies an action to be performed and is run in a Docker container.
    • Cloud Build supports custom images as well for the steps
    • Cloud Build integrates with Pub/Sub to publish messages on build’s state changes.
    • Cloud Build can trigger the Spinnaker pipeline through Cloud Pub/Sub notifications.
    • Cloud Build should use a Service Account with a Container Developer role to perform deployments on GKE
    • Cloud Build uses a directory named /workspace as a working directory and the assets produced by one step can be passed to the next one via the persistence of the /workspace directory.
  • Binary Authorization and Vulnerability Scanning
    • Binary Authorization provides software supply-chain security for container-based applications. It enables you to configure a policy that the service enforces when an attempt is made to deploy a container image on one of the supported container-based platforms.
    • Binary Authorization uses attestations to verify that an image was built by a specific build system or continuous integration (CI) pipeline.
    • Vulnerability scanning helps scan images for vulnerabilities by Container Analysis.
    • Hint: For security and compliance reasons, if the deployed image needs to be trusted, use Binary Authorization
  • Google Source Repositories
    • Cloud Source Repositories are fully-featured, private Git repositories hosted on Google Cloud.
    • Cloud Source Repositories can be used for collaborative, version-controlled development of any app or service
    • Hint: If the code needs to be version controlled and needs collaboration with multiple members, choose Git-related options
  • Google Container Registry/Artifact Registry
    • Google Artifact Registry supports all types of artifacts as compared to Container Registry which was limited to container images
    • Container Registry is not referred to in the exam
    • Artifact Registry supports both regional and multi-regional repositories
  • Google Cloud Code
    • Cloud Code helps write, debug, and deploy the cloud-based applications for IntelliJ, VS Code, or in the browser.
  • Google Cloud Client Libraries
    • Google Cloud Client Libraries provide client libraries and SDKs in various languages for calling Google Cloud APIs.
    • If the language is not supported, Cloud Rest APIs can be used.
  • Deployment Techniques
    • Recreate deployment – fully scale down the existing application version before you scale up the new application version.
    • Rolling update – update a subset of running application instances instead of simultaneously updating every application instance
    • Blue/Green deployment – (also known as a red/black deployment), you perform two identical deployments of your application
    • GKE supports Rolling and Recreate deployments.
      • Rolling deployments support maxSurge (new pods would be created) and maxUnavailable (existing pods would be deleted)
    • Managed Instance Groups support Rolling deployments using the maxSurge (new instances would be created) and maxUnavailable (existing instances would be taken out of service) configurations
  • Testing Strategies
    • Canary testing – partially roll out a change and then evaluate its performance against a baseline deployment
    • A/B testing – test a hypothesis by using variant implementations. A/B testing is used to make business decisions (not only predictions) based on the results derived from data.
  • Spinnaker
    • Spinnaker supports Blue/Green rollouts by dynamically enabling and disabling traffic to a particular Kubernetes resource.
    • Spinnaker recommends comparing the canary against an equivalent baseline deployed at the same time, rather than against the production deployment.

Cloud Operations Suite

  • Cloud Operations Suite provides everything from monitoring, alerting, error reporting, metrics, and diagnostics to debugging and tracing.
  • Google Cloud Monitoring or Stackdriver Monitoring
    • Cloud Monitoring helps gain visibility into the performance, availability, and health of your applications and infrastructure.
    • Cloud Monitoring Agent/Ops Agent helps capture additional metrics like Memory utilization, Disk IOPS, etc.
    • Cloud Monitoring supports log exports, where the logs can be routed through sinks to Cloud Storage, Pub/Sub, BigQuery, or an external destination like Splunk.
    • Cloud Monitoring API supports pushing or exporting custom metrics.
    • Uptime checks help check if a resource responds. They can check the availability of any public service on a VM, App Engine, a URL, GKE, or an AWS Load Balancer.
    • Process health checks can be used to check if any process is healthy
  • Google Cloud Logging or Stackdriver logging
    • Cloud Logging provides real-time log management and analysis
    • Cloud Logging allows ingestion of custom log data from any source (see the sketch after this list).
    • Logs can be exported by configuring log sinks to BigQuery, Cloud Storage, or Pub/Sub.
    • Cloud Logging Agent can be installed for logging and capturing application logs.
    • Cloud Logging Agent uses fluentd, and fluentd filters can be applied to filter or modify logs before they are pushed to Cloud Logging.
    • VPC Flow Logs helps record network flows sent from and received by VM instances.
    • Cloud Logging Log-based metrics can be used to create alerts on logs.
    • Hint: If the logs from VM do not appear on Cloud Logging, check if the agent is installed and running and it has proper permissions to write the logs to Cloud Logging.
  • Cloud Error Reporting
    • counts, analyzes and aggregates the crashes in the running cloud services
  • Cloud Profiler
    • Cloud Profiler continuously gathers CPU usage and memory-allocation information from applications running on GCP or on-premises.
  • Cloud Trace
    • is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Debugger
    • is a feature of Google Cloud that lets you inspect the state of a running application in real-time, without stopping or slowing it down
    • Debug Logpoints allow logging injection into running services without restarting or interfering with the normal function of the service
    • Debug Snapshots help capture local variables and the call stack at a specific line location in your app’s source code
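
As a small illustration of the custom log ingestion mentioned above, here is a minimal sketch using the google-cloud-logging client library; the project ID and log name are placeholders.

```python
# Sketch only: ingest custom plain-text and structured log entries that can later
# drive log-based metrics and alerts.
from google.cloud import logging

client = logging.Client(project="my-project")   # placeholder project ID
logger = client.logger("payment-service")       # custom log name

# Severity drives later filtering and alerting.
logger.log_text("checkout completed", severity="INFO")
logger.log_struct(
    {"event": "checkout_failed", "latency_ms": 430, "user": "anonymous"},
    severity="ERROR",
)
```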

Compute Services

  • Compute services like Google Compute Engine and Google Kubernetes Engine are lightly covered more from the security aspects
  • Google Compute Engine
    • Google Compute Engine is the best IaaS option for computing and provides fine-grained control
    • Preemptible VMs and their use cases. HINT – use for short term needs
    • Committed Use Discounts – CUDs help provide cost benefits for long-term, stable, and predictable usage.
    • Managed Instance Group can help scale VMs as per the demand. It also helps provide auto-healing and high availability with health checks, in case an application fails.
  • Google Kubernetes Engine
    • GKE can be scaled using
      • Cluster AutoScaler to scale the cluster
      • Vertical Pod Scaler to scale the pods with increasing resource needs
      • Horizontal Pod Autoscaler helps scale Kubernetes workload by automatically increasing or decreasing the number of Pods in response to the workload’s CPU or memory consumption, or in response to custom metrics reported from within Kubernetes or external metrics from sources outside of your cluster.
    • Kubernetes Secrets can be used to store secrets (although they are just base64 encoded values)
    • Kubernetes supports rolling and recreate deployment strategies.
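
A hedged sketch of the Horizontal Pod Autoscaler described above, using the official Kubernetes Python client against a hypothetical "web" Deployment; the namespace, names, and thresholds are assumptions.

```python
# Sketch only: a CPU-based HorizontalPodAutoscaler (autoscaling/v1) for a Deployment.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=60,  # scale out above 60% average CPU
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```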

Security

  • Cloud Key Management Service – KMS
    • Cloud KMS can be used to store keys to encrypt data in Cloud Storage and other integrated storage
  • Cloud Secret Manager
    • Cloud Secret Manager can be used to store secrets as well
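
As a quick illustration, a minimal sketch of reading a secret with the google-cloud-secret-manager client library; the project and secret names are placeholders.

```python
# Sketch only: read the latest version of a secret from Secret Manager.
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/db-password/versions/latest"
response = client.access_secret_version(request={"name": name})
db_password = response.payload.data.decode("UTF-8")
```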

Site Reliability Engineering – SRE

  • SRE is a DevOps implementation and focuses on increasing reliability and observability, collaboration, and reducing toil using automation.
  • SLOs help specify a target level for the reliability of your service using SLIs which provide actual measurements.
  •  SLI Types
    • Availability
    • Freshness
    • Latency
    • Quality
  • SLOs – Choosing the measurement method
    • Synthetic clients to measure user experience
    • Client-side instrumentation
    • Application and Infrastructure metrics
    • Logs processing
  • SLOs help define the Error Budget and Error Budget Policy, which need to be aligned with all the stakeholders and help plan releases to balance features vs. reliability.
  • SRE focuses on Reducing Toil – Identifying repetitive tasks and automating them.
  • Production Readiness Review – PRR
    • Applications should be performance tested for volumes before being deployed to production
    • SLOs should not be modified/adjusted to facilitate production deployments. Teams should work to make the applications SLO compliant before they are deployed to production.
  • SRE Practices include
    • Incident Management and Response
      • Priority should be to mitigate the issue, and then investigate and find the root cause. Mitigating would include
        • Rolling back the release causing the issue
        • Routing traffic to a working site to restore the user experience
      • Incident Live State Document helps track the events and decision making which can be useful for postmortem.
      • involves the following roles
        • Incident Commander/Manager
          • Setup a communication channel for all to collaborate
          • Assign and delegate roles. IC would assume any role, if not delegated.
          • Responsible for Incident Live State Document
        • Communications Lead
          • Provide periodic updates to all the stakeholders and customers
        • Operations Lead
          • Responds to the incident and should be the only group modifying the system during an incident.
    • Postmortem
      • should contain the root cause
      • should be Blameless
      • should be shared with all for collaboration and feedback
      • should be shared with all the stakeholders
      • should have proper action items to prevent recurrence with an owner and collaborators, if required.

All the Best !!

SRE – Site Reliability Engineering Best Practices

Site Reliability Engineering Best Practices

SRE implements DevOps. The goal of SRE is to accelerate product development teams and keep services running in a reliable and continuous way.

SRE Concepts

  • Remove Silos and help increase sharing and collaboration between the Development and Operations team
  • Accidents are Normal. It is more profitable to focus on speeding recovery than preventing accidents.
  • Focus on small and gradual changes. This strategy, coupled with automatic testing of smaller changes and reliable rollback of bad changes, leads to approaches to change management like CI/CD.
  • Measurement is Crucial.

SRE Foundations

  • SLIs, SLOs, and SLAs
  • Monitoring
  • Alerting
  • Toil reduction
  • Simplicity

SLI, SLO, and SLAs

  • SRE does not attempt to give everything 100% availability
  • SLIs – Service Level Indicators
    • “A carefully defined quantitative measure of some aspect of the level of service that is provided”
    • SLIs define what to measure
    • SLIs are metrics over time – specific to a user journey such as request/response, data processing – which shows how well the service is performing.
    • An SLI is the ratio between two numbers, the good and the total:
      • Success Rate = No. of successful HTTP request / total HTTP requests
      • Throughput Rate = No. of consumed jobs in a queue / total number of jobs in a queue
    • An SLI is divided into specification and implementation, e.g.
      • Specification: the ratio of requests loaded in < 100 ms
      • Implementation is a way to measure for e.g. based on: a) server logs b) client code instrumentation
    • SLI ranges from 0% to 100%, where 0% means nothing works, and 100% means nothing is broken
    • Types of SLIs
      • Availability – The proportion of requests which result in a successful state
      • Latency – The proportion of requests below some time threshold
      • Freshness – The proportion of data that is fresher than some time threshold, e.g. for replication or a data pipeline
      • Correctness – The proportion of input that produces correct output
      • Durability – The proportion of records written that can be successfully read
  • SLO – Service Level Objective
    • “SLOs specify a target level for the reliability of your service.”
    • SLO is a goal that the service provider wants to reach.
    • SLOs are tools to help determine what engineering work to prioritize.
    • SLO is a target percentage based on SLIs and can be a single target value or range of values for e.g. SLI <= SLO or (lower bound <= SLI <= upper bound) = SLO
    • SLOs also define the concept of error budget.
    • The Product and SRE team should select an appropriate availability target for the service and its user base, and the service is managed to that SLO.
  • Error Budget
    • Error budgets are a tool for balancing reliability with other engineering work, and a great way to decide which projects will have the most impact.
    • An Error budget is 100% minus the SLO (a worked example follows this list)
    • If an Error budget is exhausted, a team can declare an emergency with high-level approval to deprioritize all external demands until the service meets SLOs and exit criteria.
  • SLOs & Error budget approach
    • SLOs are agreed and approved by all stakeholders
    • It is possible to meet the SLO targets under normal conditions
    • The organization is committed to using the error budget for decision making and prioritizing
    • Error budget policy should cover the policy if the error budget is exhausted.
  • SLO and SLI in practice
    • The strategy to implement SLO, SLI in the company is to start small.
    • Consider the following aspects when working on the first SLO.
      • Choose one application for which you want to define SLOs
      • Decide on a few key SLIs specs that matter to your service and users
      • Consider common ways and tasks through which your users interact with service
      • Draw a high-level architecture diagram of your system
      • Show the key components, the request flow, and the data flow
    • The result is a narrow and focused proof of concept that would help to make the benefits of SLO, SLI concise and clear.
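
The worked example below (plain Python, made-up request counts) ties the SLI ratio, the SLO target, and the error budget together for a 30-day window.

```python
# Worked example with assumed numbers: availability SLI, SLO compliance, error budget.
total_requests = 1_000_000
good_requests = 999_100

sli = good_requests / total_requests          # 0.9991 -> 99.91% availability
slo = 0.999                                   # target: 99.9%
error_budget = 1 - slo                        # 0.001 -> 1,000 allowed bad requests

allowed_bad = error_budget * total_requests   # 1000.0
actual_bad = total_requests - good_requests   # 900
budget_remaining = 1 - actual_bad / allowed_bad   # 0.1 -> 10% of the budget left

print(f"SLI={sli:.4%}, SLO met: {sli >= slo}, budget remaining: {budget_remaining:.0%}")
```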

Monitoring

  • Monitoring allows you to gain visibility into a system, which is a core requirement for judging service health and diagnosing your service when things go wrong
  • from an SRE perspective, monitoring is used to
    • Alert on conditions that require attention
    • Investigate and diagnose issues
    • Display information about the system visually
    • Gain insight into system health and resource usage for long-term planning
    • Compare the behavior of the system before and after a change, or between two control groups
  • Monitoring features that might be relevant
    • Speed of data retrieval and freshness of data.
    • Data retention and calculations
    • Interfaces: graphs, tables, charts. High level or low level.
    • Alerts: multiple categories, notifications flow, suppress functionality.
  • Monitoring sources
    • Metrics are numerical measurements representing attributes and events, typically harvested via many data points at regular time intervals.
    • Logs are an append-only record of events.

Alerting

  • Alerting helps ensure alerts are triggered for a significant event, an event that consumes a large fraction of the error budget.
  • Alerting should be configured to notify an on-caller only when there are actionable, specific threats to the error budget.
  • Alerting considerations
    • Precision – The proportion of events detected that were significant.
    • Recall – The proportion of significant events detected.
    • Detection time – How long it takes to send notification in various conditions. Long detection time negatively impacts the error budget.
    • Reset time – How long alerts fire after an issue is resolved
  • Ways to alerts
    • The recommendation is to combine several strategies to enhance your alert quality from different directions.
    • Target error rate ≥ SLO threshold.
      • Choose a small time window (for example, 10 minutes) and alert if the error rate over that window exceeds the SLO.
      • Upsides: Short detection time, Fast recall time
      • Downsides: Precision is low
    • Increased Alert Windows.
      • By increasing the window size, you spend a higher amount of the error budget before triggering an alert, e.g. alert only once an event consumes 5% of the 30-day error budget – a 36-hour window.
      • Upsides: good detection time, better precision
      • Downside: poor reset time
    • Increment Alert Duration.
      • For how long alert should be triggered to be significant.
      • Upsides: Higher precision.
      • Downside: poor recall and poor detection time
    • Alert on Burn Rate.
      • How fast, relative to SLO, the service consumes an error budget.
      • Example: 5% of the error budget consumed over a 1-hour period.
      • Upside: Good precision, short time window, good detection time.
      • Downside: low recall, long reset time
    • Multiple Burn Rate Alerts.
      • Burn rate is how fast, relative to the SLO, the service consumes the error budget
      • Depending on the burn rate, determine the severity of the alert, which leads to a page notification or a ticket
      • Upsides: good recall, good precision
      • Downsides: More parameters to manage, long reset time
    • Multi-window, multi-burn-rate alerts.
      • Upsides: Flexible alert framework, good precision, good recall
      • Downside: even harder to manage, lots of parameters to specify
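
A hedged sketch in plain Python of the burn-rate idea and a multi-window, multi-burn-rate page condition; the 14.4x/6x thresholds follow commonly cited SRE guidance but are assumptions here, not a prescription.

```python
# Sketch only: burn-rate calculation and a multi-window, multi-burn-rate alert check.
SLO = 0.999
ERROR_BUDGET = 1 - SLO  # 0.001

def burn_rate(error_rate: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being burned."""
    return error_rate / ERROR_BUDGET

def should_page(err_1h: float, err_5m: float, err_6h: float, err_30m: float) -> bool:
    # Fast burn: 14.4x over both 1 hour and 5 minutes (2% of a 30-day budget in 1h).
    fast = burn_rate(err_1h) >= 14.4 and burn_rate(err_5m) >= 14.4
    # Slow burn: 6x over both 6 hours and 30 minutes (5% of a 30-day budget in 6h).
    slow = burn_rate(err_6h) >= 6 and burn_rate(err_30m) >= 6
    return fast or slow

# Example: 2% errors in the last hour and last 5 minutes trips the fast-burn condition.
print(should_page(err_1h=0.02, err_5m=0.02, err_6h=0.004, err_30m=0.004))  # True
```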

Toil Reduction

It’s better to fix root causes when possible. If only the symptom is fixed, there is no incentive to fix the root cause.

  • Toil is a repetitive, predictable, constant stream of tasks related to maintaining a service.
  • Any time spent on operational tasks means time not spent on project work and project work is how we make our services more reliable and scalable.
  • Toil can be defined using following characteristics
    • Manual. When the tmp directory on a web server reaches 95% utilization, you need to log in and find space to clean up (see the automation sketch after this list)
    • Repetitive. A full tmp directory is unlikely to be a one-time event
    • Automatable. If the instructions are well defined then it’s better to automate the problem detection and remediation
    • Reactive. When you receive too many alerts of “disks full”, they distract more than help. So, potentially high-severity alerts could be missed
    • Lacks enduring value. The satisfaction of completing the task is short-term, because the work does nothing to prevent the issue in the future
    • Grows at least as fast as its source. The growing popularity of the service will require more infrastructure and more toil work
  • Potential benefits of toil automation
    • Engineering work might reduce toil in the future
    • Increased team morale and reduced burnout
    • Less context switching for interrupts, which raises team productivity
    • Increased process clarity and standardization
    • Enhanced technical skills and career growth for team members
    • Reduced training time
    • Fewer outages attributable to human errors
    • Improved security
    • Shorter response times for user requests
  • Toil Measurement
    • Identify it.
    • Measure the amount of human effort applied to this toil
    • Track these measurements before, during, and after toil reduction efforts
  • Toil categorization
    • Business processes. The most common source of toil.
    • Production interrupts. The key tasks to keep the system running.
    • Product releases. Depending on the tooling and release size they could generate toil (release requests, rollbacks, hotfixes, and repetitive manual configuration changes)
    • Migrations. Large-scale migration or even small database structure change is likely done manually as a one-time effort. Such thinking is a mistake because this work is repetitive.
    • Cost engineering and capacity planning. Ensure a cost-effective baseline. Prepare for critical high traffic events.
    • Troubleshooting
  • Toil management strategies in practices
    • Identify and measure
    • Engineer toil out of the system
    • Reject the toil
    • Use SLO to reduce toil
    • Organizational:
      • Start with human-backed interfaces. For complex business problems, start with a partially automated approach.
      • Get support from management and colleagues. Toil reduction is a worthwhile goal.
      • Promote toil reduction as a feature. Create a strong business case for toil reduction.
      • Start small and then improve
    • Standardization and automation:
      • Increase uniformity. Lean toward standard tools, equipment, and processes.
      • Assess risk within automation. Automation with admin-level privileges should have a safety mechanism that checks automation actions against the system. It will prevent outages caused by bugs in automation tools.
      • Automate toil response. Think about how to approach toil automation. It shouldn’t eliminate human understanding of what’s going on.
      • Use open-source and third-party tools.
    • Use feedback to improve. Seek feedback from users who interact with your tools, workflows, and automation.
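
As a small illustration of engineering toil out of the system, here is a hedged sketch that automates the "tmp directory is filling up" example from the characteristics list above; the path, threshold, and age cut-off are assumptions to adapt per service.

```python
# Sketch only: detect high disk utilization and delete old temp files automatically.
import os
import shutil
import time

TMP_DIR = "/tmp"
USAGE_THRESHOLD = 0.95          # act at 95% utilization
MAX_AGE_SECONDS = 24 * 3600     # only delete files older than a day

def disk_usage_ratio(path: str) -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def cleanup(path: str) -> None:
    now = time.time()
    for entry in os.scandir(path):
        try:
            if entry.is_file(follow_symlinks=False) and now - entry.stat().st_mtime > MAX_AGE_SECONDS:
                os.remove(entry.path)
        except OSError:
            pass  # file vanished or no permission; leave it for a human

if disk_usage_ratio(TMP_DIR) >= USAGE_THRESHOLD:
    cleanup(TMP_DIR)
```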

Simplicity

  • Simple software breaks less often and is easier and faster to fix when it does break.
  • Simple systems are easier to understand, easier to maintain, and easier to test
  • Measure complexity
    • Training time. How long does it take for a newcomer engineer to get up to full speed.
    • Explanation time. The time it takes to provide a view on system internals.
    • Administrative diversity. How many ways are there to configure similar settings
    • Diversity of deployed configuration
    • Age. How old is the system
  • SRE work on simplicity
    • SREs understand the systems as a whole to prevent and fix sources of complexity
    • SRE should be involved in the design, system architecture, configuration, deployment processes, or elsewhere.
    • SRE leadership empowers SRE teams to push for simplicity and to explicitly reward these efforts.

SRE Practices

SRE practices apply software engineering solutions to operational problems

  • SRE teams are responsible for the day-to-day functioning of the systems they support, so their engineering work often focuses on keeping those systems reliable and scalable.

Incident Management & Response

  • Incident Management involves coordinating the efforts of responding teams in an efficient manner and ensuring that communication flows both between the responders and those interested in the incident’s progress.
  • Incident management is to respond to an incident in a structured way.
  • Incident Response involves mitigating the impact and/or restoring the service to its previous condition.
  • Basic principles of incident response include the following:
    • Maintain a clear line of command.
    • Designate clearly defined roles.
    • Keep a working record of debugging and mitigation as you go.
    • Declare incidents early and often.
  • Key roles in an Incident Response
    • Incident Commander (IC)
      • the person who declares the incident typically steps into the IC role and directs the high-level state of the incident
      • Commands and coordinates the incident response, delegating roles as needed.
      • By default, the IC assumes all roles that have not been delegated yet.
      • Communicates effectively.
      • Stays in control of the incident response.
      • Works with other responders to resolve the incident.
      • Remove roadblocks that prevent Ops from working most effectively.
    • Communications Lead (CL)
      • CL is the public face of the incident response team.
      • The CL’s main duties include providing periodic updates to the incident response team and stakeholders and managing inquiries about the incident.
    • Operations or Ops Lead (OL)
      • OL works to respond to the incident by applying operational tools to mitigate or resolve the incident.
      • The operations team should be the only group modifying the system during an incident.
  • Live Incident State Document
    • Live Incident State Document can live in a wiki, but should ideally be editable by several people concurrently.
    • This living doc can be messy, but must be functional, and it is not usually shared with stakeholders.
    • Live Incident State Document is Incident commander’s most important responsibility.
    • Using a template makes generating this documentation easier, and keeping the most important information at the top makes it more usable.
    • Retain this documentation for postmortem analysis and, if necessary, meta analysis.
  • Incident Management Best Practices
    • Prioritize – Stop the bleeding, restore service, and preserve the evidence for root-causing.
    • Prepare – Develop and document your incident management procedures in advance, in consultation with incident participants.
    • Trust – Give full autonomy within the assigned role to all incident participants.
    • Introspect – Pay attention to your emotional state while responding to an incident. If you start to feel panicky or overwhelmed, solicit more support.
    • Consider alternatives – Periodically consider your options and re-evaluate whether it still makes sense to continue what you’re doing or whether you should be taking another tack in incident response.
    • Practice – Use the process routinely so it becomes second nature.
    • Change it around – Were you incident commander last time? Take on a different role this time. Encourage every team member to acquire familiarity with each role.

Postmortem

  • A postmortem is a written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring.
  • Postmortems are expected after any significant undesirable event.
  • The primary goals of writing a postmortem are to ensure that the incident is documented, that all contributing root cause(s) are well understood, and, especially, that effective preventive actions are put in place to reduce the likelihood and/or impact of recurrence.
  • Writing a postmortem is not punishment – it is a learning opportunity for the entire company.
  • Postmortem Best Practices
    • Blameless
      • Postmortems should be Blameless.
      • It must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or inappropriate behavior.
    • Collaborate and Share Knowledge
      • Postmortems should be used to collaborate and share knowledge. They should be shared broadly, typically with the larger engineering team or on an internal mailing list.
      • The goal should be to share postmortems to the widest possible audience that would benefit from the knowledge or lessons imparted.
    • No Postmortem Left Unreviewed
      • An unreviewed postmortem might as well never have existed.
    • Ownership
      • Declaring official ownership results in accountability, which leads to action.
      • It’s better to have a single owner and multiple collaborators.

Conclusions

  • SRE practices require a significant amount of time and skilled SRE people to implement right
  • A lot of tools are involved in day-to-day SRE work
  • SRE processes are one of the keys to the success of a tech company

AWS Key Management Service – KMS

AWS KMS - Owned vs Managed vs Customer Managed Keys

AWS Key Management Service – KMS

  • AWS Key Management Service – KMS is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
  • provides a highly available key storage, management, and auditing solution to encrypt the data across AWS services & within applications.
  • uses hardware security modules (HSMs) validated under the FIPS 140-2 Cryptographic Module Validation Program to protect and validate the keys.
  • seamlessly integrates with several AWS services to make encrypting data in those services easy.
  • is integrated with AWS CloudTrail to provide encryption key usage logs to help meet auditing, regulatory, and compliance needs.
  • is regional and keys are only stored and used in the region in which they are created. They cannot be transferred to another region.
  • enforces usage and management policies, to control which IAM user, role from the account, or other accounts can manage and use keys.
  • can create and manage keys by
    • Create, edit, and view symmetric and asymmetric keys, including HMAC keys.
    • Control access to the keys by using key policies, IAM policies, and grants. Policies can be further refined using condition keys.
    • Supports attribute-based access control (ABAC).
    • Create, delete, list, and update aliases for the keys.
    • Tag the keys for identification, automation, and cost tracking.
    • Enable and disable keys.
    • Enable and disable automatic rotation of the cryptographic material in keys.
    • Delete keys to complete the key lifecycle.
  • supports the following cryptographic operations
    • Encrypt, decrypt, and re-encrypt data with symmetric or asymmetric keys.
    • Sign and verify messages with asymmetric keys.
    • Generate exportable symmetric data keys and asymmetric data key pairs.
    • Generate and verify HMAC codes. 
    • Generate random numbers suitable for cryptographic applications
  • supports multi-region keys, which act like copies of the same KMS key in different AWS Regions that can be used interchangeably – as though you had the same key in multiple Regions.
  • supports VPC private endpoint to connect KMS privately from a VPC.
  • supports keys in a CloudHSM key store backed by the CloudHSM cluster.

Envelope encryption

  • AWS cloud services integrated with AWS KMS use a method called envelope encryption to protect the data.
  • Envelope encryption is an optimized method for encrypting data that uses two different keys (Master key and Data key)
  • With Envelope encryption
    • A data key is generated and used by the AWS service to encrypt each piece of data or resource.
    • Data key is encrypted under a defined master key.
    • Encrypted data key is then stored by the AWS service.
    • For data decryption by the AWS service, the encrypted data key is passed to KMS and decrypted under the master key it was originally encrypted with, so the service can then decrypt the data.
  • When the data is encrypted directly with KMS, it must be transferred over the network.
  • Envelope encryption can offer significant performance benefits, as KMS only supports sending data less than 4 KB to be encrypted directly.
  • Envelope encryption reduces the network load for the application or AWS cloud service as only the request and fulfillment of the data key must go over the network.

KMS Service Concepts

KMS Usage

KMS Keys OR Customer Master Keys (CMKs)

  • AWS KMS key is a logical representation of a cryptographic key.
  • KMS Keys can be used to create symmetric or asymmetric keys for encryption or signing OR HMAC keys to generate and verify HMAC tags.
  • Symmetric keys and the private keys of asymmetric keys never leave AWS KMS unencrypted.
  • A KMS key contains metadata, such as the key ID, key spec, key usage, creation date, description, key state, and a reference to the key material that is used to run cryptographic operations with the KMS key.
  • Symmetric keys are 256-bit AES keys that are not exportable.
  • KMS keys can be used to generate, encrypt, and decrypt the data keys, which are used outside of AWS KMS to encrypt the data [Envelope Encryption]
  • Key material for symmetric keys and the private keys of asymmetric key never leaves AWS KMS unencrypted.

Customer Keys and AWS Keys

AWS KMS - Owned vs Managed vs Customer Managed Keys

AWS Managed Keys

  • AWS Managed keys are created, managed, and used on your behalf by AWS services in your AWS account.
  • keys are automatically rotated every year (~365 days) and the rotation schedule cannot be changed.
  • have permission to view the AWS managed keys in your account, view their key policies, and audit their use in CloudTrail logs.
  • cannot manage or rotate these keys, change their key policies, or use them in cryptographic operations directly; the service that creates them uses them on your behalf.

Customer managed keys

  • Customer managed keys are created by you to encrypt your service resources in your account.
  • Automatic rotation is Optional and if enabled, keys are automatically rotated every year.
  • provides full control over these keys, including establishing and maintaining their key policies, IAM policies, and grants, enabling and disabling them, rotating their cryptographic material, adding tags, creating aliases referring to the KMS keys, and scheduling the KMS keys for deletion.

AWS Owned Keys

  • AWS owned keys are a collection of KMS keys that an AWS service owns and manages for use in multiple AWS accounts.
  • AWS owned keys are not in your AWS account, however, an AWS service can use the associated AWS owned keys to protect the resources in your account.
  • cannot view, use, track, or audit them

Key Material

  • KMS keys contain a reference to the key material used to encrypt and decrypt data.
  • By default, AWS KMS generates the key material for a newly created key.
  • KMS key can be created without key material and then your own key material can be imported or created in the AWS CloudHSM cluster associated with an AWS KMS custom key store.
  • Key material cannot be extracted, exported, viewed, or managed.
  • Key material cannot be deleted; you must delete the KMS key.

Key Material Origin

  • Key material origin is a KMS key property that identifies the source of the key material in the KMS key.
  • Symmetric encryption KMS keys can have one of the following key material origin values.
    • AWS_KMS
      • AWS KMS creates and manages the key material for the KMS key in AWS KMS.
    • EXTERNAL
      • Key has imported key material. 
      • Management and security of the key are the customer’s responsibility.
      • Only symmetric keys are supported.
      • Automatic rotation is not supported and needs to be manually rotated.
    • AWS_CLOUDHSM
      • AWS KMS created the key material for the KMS key in the AWS CloudHSM cluster associated with the custom key store.
    • EXTERNAL_KEY_STORE
      • Key material is a cryptographic key in an external key manager outside of AWS.
      • This origin is supported only for KMS keys in an external key store.

Data Keys

  • Data keys are encryption keys that you can use to encrypt data, including large amounts of data and other data encryption keys.
  • KMS does not store, manage, or track your data keys.
  • Data keys must be used by services outside of KMS.

Encryption Context

  • Encryption context provides an optional set of key–value pairs that can contain additional contextual information about the data.
  • AWS KMS uses the encryption context as additional authenticated data (AAD) to support authenticated encryption.
  • Encryption context is not secret and not encrypted and appears in plaintext in CloudTrail Logs so you can use it to identify and categorize your cryptographic operations.
  • Encryption context should not include sensitive information.
  • Encryption context usage
    • When an encryption context is included in an encryption request, it is cryptographically bound to the ciphertext such that the same encryption context is required to decrypt the data.
    • If the encryption context provided in the decryption request is not an exact, case-sensitive match, the decrypt request fails.
  • Only the order of the key-value pairs in the encryption context can vary.
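
A minimal sketch (Python/boto3) of binding an encryption context to an Encrypt call and supplying the same context on Decrypt; the key alias and context values are placeholders.

```python
# Sketch only: encryption context as additional authenticated data (AAD).
import boto3

kms = boto3.client("kms")
context = {"department": "finance", "purpose": "backup"}

encrypted = kms.encrypt(
    KeyId="alias/app-key",
    Plaintext=b"sensitive-but-small-payload",
    EncryptionContext=context,
)

# Decryption succeeds only with the exact same (case-sensitive) context...
plaintext = kms.decrypt(
    CiphertextBlob=encrypted["CiphertextBlob"],
    EncryptionContext=context,
)["Plaintext"]
# ...and fails with an InvalidCiphertextException if the context does not match.
```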

Key Policies

  • help determine who can use and manage those keys.
  • can add, remove, or change permissions at any time for a customer-managed key.
  • cannot edit the key policy for AWS owned or managed keys.

Grants

  • provides permissions, an alternative to the key policy and IAM policy, that allows AWS principals to use the KMS keys.
  • are often used for temporary permissions because you can create one, use its permissions, and delete it without changing the key policies or IAM policies.
  • permissions specified in the grant might not take effect immediately due to eventual consistency.

Grant Tokens

  • help mitigate the potential delay with grants.
  • use the grant token received in the response to CreateGrant API request to make the permissions in the grant take effect immediately.
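
A hedged sketch (Python/boto3) of creating a grant and using its grant token so the permission is usable immediately; the account ID, role, and key ID are placeholders, and in practice the grantee principal would be the one making the Decrypt call.

```python
# Sketch only: temporary Decrypt permission via a grant, used via its grant token.
import boto3

kms = boto3.client("kms")
key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"   # placeholder key ID

grant = kms.create_grant(
    KeyId=key_id,
    GranteePrincipal="arn:aws:iam::111122223333:role/worker-role",
    Operations=["Decrypt"],
)

# Something previously encrypted under the key (here created inline for the demo).
ciphertext_blob = kms.encrypt(KeyId=key_id, Plaintext=b"demo")["CiphertextBlob"]

# The grant token makes the newly granted permission usable immediately.
plaintext = kms.decrypt(
    CiphertextBlob=ciphertext_blob,
    GrantTokens=[grant["GrantToken"]],
)["Plaintext"]

# Retire the grant once the temporary access is no longer needed.
kms.retire_grant(KeyId=key_id, GrantId=grant["GrantId"])
```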

Alias

  • Alias helps provide a friendly name for a KMS key.
  • can be used to refer to different KMS keys in each AWS Region.
  • can be used to point to different keys without changing the code.
  • can allow and deny access to KMS keys based on their aliases without editing policies or managing grants.
  • aliases are independent resources, not properties of a KMS key, and can be added, changed, and deleted without affecting the associated KMS key.

Encryption & Decryption Process

  • Use KMS to get an encrypted and a plaintext data key using the CMK.
  • Use the plaintext data key to encrypt the data and store the encrypted data key with the data.
  • Use KMS Decrypt on the stored encrypted data key to get back the plaintext data key and decrypt the data.
  • Remove the plaintext data key from memory, once the operation is completed.
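
A hedged sketch (Python/boto3 plus the cryptography package) of this data-key flow; the key alias is a placeholder.

```python
# Sketch only: envelope encryption with a KMS-generated data key.
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")

# 1. Get a plaintext data key plus a copy of it encrypted under the CMK.
dk = kms.generate_data_key(KeyId="alias/app-key", KeySpec="AES_256")
plaintext_key, encrypted_key = dk["Plaintext"], dk["CiphertextBlob"]

# 2. Encrypt the data locally; store the encrypted data key alongside the ciphertext.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b"large payload ...", None)
del plaintext_key  # 4. drop the plaintext key from memory once done

# 3. Later: ask KMS to decrypt the stored data key, then decrypt the data locally.
restored_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
payload = AESGCM(restored_key).decrypt(nonce, ciphertext, None)
```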

KMS Working

  • KMS centrally manages and securely stores the keys.
  • Keys can be generated or imported from the key management infrastructure (KMI).
  • Keys can be used from within the applications and supported AWS services to protect the data, but the key never leaves KMS.
  • Data is submitted to KMS to be encrypted, or decrypted, under keys that you control.
  • Usage policies can be set on these keys that determine which users can use them to encrypt and decrypt data.

KMS Access Control

  • Primary way to manage access to AWS KMS keys is with policies.
  • KMS keys access can be controlled using
    • Key Policies
      • are resource-based policies
      • every KMS key has a key policy
      • is a primary mechanism for controlling access to a key.
      • can be used alone to control access to the keys.
    • IAM policies
      • use IAM policies in combination with the key policy to control access to keys.
      • helps manage all of the permissions for your IAM identities in IAM.
    • Grants
      • Use grants in combination with the key policy and IAM policies to allow access to keys.
      • helps allow access to the keys in the key policy, and to allow users to delegate their access to others.
  • To allow access to a KMS CMK, a key policy MUST be used, either alone or in combination with IAM policies or grants.
  • IAM policies by themselves are not sufficient to allow access to keys, though they can be used in combination with a key policy.
  • The IAM user who creates a KMS key is not considered the key owner and does not automatically have permission to use or manage the KMS key that they created.
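
A hedged sketch (Python/boto3) of creating a customer managed key with an explicit key policy that enables IAM policies for the account and grants usage to one role; the account ID and role name are placeholders.

```python
# Sketch only: key policy attached at key creation time.
import json
import boto3

kms = boto3.client("kms")

key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Without a statement like this, IAM policies alone cannot grant access.
            "Sid": "EnableIAMUserPermissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {
            "Sid": "AllowUseOfTheKey",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-role"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*",
        },
    ],
}

key = kms.create_key(Description="app data key", Policy=json.dumps(key_policy))
print(key["KeyMetadata"]["KeyId"])
```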

Rotating KMS or Customer Master Keys

  • Key rotation changes only the key material, which is the cryptographic secret that is used in encryption operations. 
  • KMS keys can be enabled for automatic key rotation, where KMS generates new cryptographic material for the key every year.
  • KMS saves all previous versions of the cryptographic material in perpetuity so it can decrypt any data encrypted with that key.
  • KMS does not delete any rotated key material until you delete the KMS key.
  • All new encryption requests against a key are encrypted under the newest version of the key.
  • Rotation can be tracked in CloudWatch and CloudTrail.

Automatic Key Rotation

  • Automatic key rotation has the following benefits
    • properties of the KMS key like ID, ARN, region, policies, and permissions do not change.
    • applications or aliases referring to the key do not need to change
    • Rotating key material does not affect the use of the KMS key in any AWS service.
  • Automatic key rotation is supported only on symmetric encryption KMS keys with key material that KMS generates i.e. Origin = AWS_KMS.
  • Automatic key rotation is not supported for
    • asymmetric keys, 
    • HMAC keys,
    • keys in custom key stores, and
    • keys with imported key material.
  • AWS managed keys
    • automatically rotated every 1 year (updated from 3 years before)
    • rotation cannot be enabled or disabled
  • Customer Managed keys
    • automatic key rotation is supported but is optional.
    • automatic key rotation is disabled, by default, and needs to be enabled.
    • keys can be rotated every year.
  • CMKs with imported key material or keys generated in a CloudHSM cluster using the KMS custom key store feature
    • do not support automatic key rotation.
    • provide flexibility to manually rotate keys as required.
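
A minimal sketch (Python/boto3) of opting a customer managed key into automatic rotation; the key ID is a placeholder.

```python
# Sketch only: enable yearly automatic rotation and confirm the setting.
import boto3

kms = boto3.client("kms")
key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"   # placeholder key ID

kms.enable_key_rotation(KeyId=key_id)
print(kms.get_key_rotation_status(KeyId=key_id)["KeyRotationEnabled"])  # True
```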

Manual Key Rotation

  • Manual key rotation can be performed by creating a new KMS key and updating the applications or aliases to point to the new key.
  • does not retain the ID, ARN, and policies of the old key.
  • can help control the rotation frequency, especially if keys need to be rotated more frequently than once a year.
  • is also a good solution for KMS keys that are not eligible for automatic key rotation, such as asymmetric keys, HMAC keys, keys in custom key stores, and keys with imported key material.
  • For manually rotated keys, data has to be re-encrypted depending on the application’s configuration.
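
A hedged sketch (Python/boto3) of manual rotation by repointing an alias, with an optional ReEncrypt of existing ciphertext; the alias is assumed to already exist and is a placeholder.

```python
# Sketch only: manual rotation by creating a new key and repointing the alias.
import boto3

kms = boto3.client("kms")

# Data currently encrypted under the old key behind the alias.
old_ciphertext = kms.encrypt(KeyId="alias/app-key", Plaintext=b"demo")["CiphertextBlob"]

# 1. Create the replacement key and repoint the alias so applications pick it up
#    without code changes.
new_key_id = kms.create_key(Description="app key v2")["KeyMetadata"]["KeyId"]
kms.update_alias(AliasName="alias/app-key", TargetKeyId=new_key_id)

# 2. Optionally re-encrypt existing ciphertext under the new key.
new_ciphertext = kms.re_encrypt(
    CiphertextBlob=old_ciphertext, DestinationKeyId=new_key_id
)["CiphertextBlob"]
```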

KMS Deletion

  • KMS key deletion deletes the key material and all metadata associated with the key and is irreversible.
  • Data encrypted by the deleted key cannot be recovered, once the key is deleted.
  • AWS recommends disabling the key before deleting it.
  • AWS Managed and Owned keys cannot be deleted. Only Customer managed keys can be scheduled for deletion.
  • KMS never deletes the keys unless you explicitly schedule them for deletion and the mandatory waiting period expires.
  • KMS requires setting a waiting period of 7-30 days for key deletion. During the waiting period, the KMS key status and key state is Pending deletion.
    • Key pending deletion cannot be used in any cryptographic operations.
    • Key material of keys that are pending deletion is not rotated.
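
A minimal sketch (Python/boto3) of the recommended disable-then-schedule-deletion flow; the key ID is a placeholder.

```python
# Sketch only: disable the key first, then schedule deletion with a 7-day wait.
import boto3

kms = boto3.client("kms")
key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"   # placeholder key ID

kms.disable_key(KeyId=key_id)
resp = kms.schedule_key_deletion(KeyId=key_id, PendingWindowInDays=7)
print(resp["DeletionDate"])  # deletion can still be cancelled until this date
# kms.cancel_key_deletion(KeyId=key_id)  # if you change your mind during the wait
```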

KMS Multi-Region Keys

  • AWS KMS supports multi-region keys, which are AWS KMS keys in different AWS Regions that can be used interchangeably – as though you had the same key in multiple Regions.
  • Multi-Region keys have the same key material and key ID, so data can be encrypted in one AWS Region and decrypted in a different AWS Region without re-encrypting or making a cross-Region call to AWS KMS.
  • Multi-Region keys never leave AWS KMS unencrypted.
  • Multi-Region keys are not global and each multi-region key needs to be replicated and managed independently.

KMS Features

  • Create keys with a unique alias and description
  • Import your own keys
  • Control which IAM users and roles can manage keys
  • Control which IAM users and roles can use keys to encrypt & decrypt data
  • Choose to have AWS KMS automatically rotate keys on an annual basis
  • Temporarily disable keys so they cannot be used by anyone
  • Re-enable disabled keys
  • Delete keys that you no longer use
  • Audit use of keys by inspecting logs in AWS CloudTrail

KMS with VPC Interface Endpoint

  • AWS KMS can be accessed through a private interface endpoint in a Virtual Private Cloud (VPC), as shown in the sketch after this list.
  • Interface VPC endpoint ensures the communication between the VPC and AWS KMS is conducted entirely within the AWS network.
  • Interface VPC endpoint connects the VPC directly to KMS without an internet gateway, NAT device, VPN, or Direct Connect connection.
  • Instances in the VPC do not need public IP addresses to communicate with AWS KMS.
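
A minimal boto3 sketch of creating a KMS interface endpoint (the VPC, subnet, and security group IDs are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.kms",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # lets the default KMS endpoint name resolve to the private endpoint
)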

KMS vs CloudHSM

AWS KMS vs CloudHSM

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are designing a personal document-archiving solution for your global enterprise with thousands of employees. Each employee has potentially gigabytes of data to be backed up in this archiving solution. The solution will be exposed to the employees as an application, where they can just drag and drop their files to the archiving system. Employees can retrieve their archives through a web interface. The corporate network has high bandwidth AWS Direct Connect connectivity to AWS. You have regulatory requirements that all data needs to be encrypted before being uploaded to the cloud. How do you implement this in a highly available and cost-efficient way?
    1. Manage encryption keys on-premises in an encrypted relational database. Set up an on-premises server with sufficient storage to temporarily store files and then upload them to Amazon S3, providing a client-side master key. (Storing files temporarily increases cost and this is not a highly available option)
    2. Manage encryption keys in a Hardware Security Module (HSM) appliance on-premise server with sufficient storage to temporarily store, encrypt, and upload files directly into amazon Glacier. (Not cost effective)
    3. Manage encryption keys in amazon Key Management Service (KMS), upload to amazon simple storage service (s3) with client-side encryption using a KMS customer master key ID and configure Amazon S3 lifecycle policies to store each object using the amazon glacier storage tier. (With CSE-KMS the encryption happens at client side before the object is upload to S3 and KMS is cost effective as well)
    4. Manage encryption keys in an AWS CloudHSM appliance. Encrypt files prior to uploading on the employee desktop and then upload directly into amazon glacier (Not cost effective)
  2. An AWS customer is deploying an application that is composed of an Auto Scaling group of EC2 Instances. The customer’s security policy requires that every outbound connection from these instances to any other service within the customer’s Virtual Private Cloud must be authenticated using a unique X.509 certificate that contains the specific instance-id. In addition, the X.509 certificates must be signed by the customer’s key management service in order to be trusted for authentication.
    Which of the following configurations will support these requirements?
    1. Configure an IAM Role that grants access to an Amazon S3 object containing a signed certificate and configure the Auto Scaling group to launch instances with this role. Have the instances bootstrap get the certificate from Amazon S3 upon first boot.
    2. Embed a certificate into the Amazon Machine Image that is used by the Auto Scaling group Have the launched instances generate a certificate signature request with the instance’s assigned instance-id to the Key management service for signature.
    3. Configure the Auto Scaling group to send an SNS notification of the launch of a new instance to the trusted key management service. Have the Key management service generate a signed certificate and send it directly to the newly launched instance.
    4. Configure the launched instances to generate a new certificate upon first boot. Have the Key management service poll the AutoScaling group for associated instances and send new instances a certificate signature that contains the specific instance-id.
  3. A company has a customer master key (CMK) with imported key materials. Company policy requires that all encryption keys must be rotated every year. What can be done to implement the above policy?
    1. Enable automatic key rotation annually for the CMK.
    2. Use AWS Command Line interface to create an AWS Lambda function to rotate the existing CMK annually.
    3. Import new key material to the existing CMK and manually rotate the CMK.
    4. Create a new CMK, import new key material to it, and point the key alias to the new CMK.
  4. An organization policy states that all encryption keys must be automatically rotated every 12 months. Which AWS Key Management Service (KMS) key type should be used to meet this requirement? (Select TWO)
    1. AWS managed Customer Master Key (CMK) (Now supports every year. It was every 3 years before.)
    2. Customer managed CMK with AWS generated key material
    3. Customer managed CMK with imported key material
    4. AWS managed data key

References

AWS_Key_Management_Service

Google Cloud Operations

Google Cloud Operations

Google Cloud Operations provides integrated monitoring, logging, and trace managed services for applications and systems running on Google Cloud and beyond.

Google Cloud Operations Suite
Credit Priyanka Vergadia

Cloud Monitoring

  • Cloud Monitoring collects measurements of key aspects of the service and of the Google Cloud resources used.
  • Cloud Monitoring provides tools to visualize and monitor this data.
  • Cloud Monitoring helps gain visibility into the performance, availability, and health of the applications and infrastructure.
  • Cloud Monitoring collects metrics, events, and metadata from Google Cloud, AWS, hosted uptime probes, and application instrumentation.

Cloud Logging

  • Cloud Logging is a service for storing, viewing, and interacting with logs (see the sketch after this list).
  • Answers the questions “Who did what, where and when” within the GCP projects
  • Maintains tamper-proof audit logs for each project and organization
  • Log buckets are a regional resource, which means the infrastructure that stores, indexes, and searches the logs is located in a specific geographic location.
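
A minimal sketch using the google-cloud-logging Python client to write text and structured entries (the log name is arbitrary):

from google.cloud import logging  # google-cloud-logging client library

client = logging.Client()
logger = client.logger("my-app-log")  # hypothetical log name

# Write a plain-text entry and a structured entry.
logger.log_text("User signed in")
logger.log_struct({"user": "alice", "action": "login"}, severity="INFO")

# Entries can then be viewed, filtered, and exported from Cloud Logging.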

Error Reporting

  • Error Reporting aggregates and displays errors produced in the running cloud services.
  • Error Reporting provides a centralized error management interface, to help find the application’s top or new errors so that they can be fixed faster.

Cloud Profiler

  • Cloud Profiler provides continuous profiling of CPU, heap, and other parameters to improve performance and reduce costs.
  • Cloud Profiler is a continuous profiling tool that is designed for applications running on Google Cloud:
    • It’s a statistical, or sampling, profiler that has low overhead and is suitable for production environments.
    • It supports common languages and collects multiple profile types.
  • Cloud Profiler consists of the profiling agent, which collects the data, and a console interface on Google Cloud, which lets you view and analyze the data collected by the agent.
  • Cloud Profiler is supported for Compute Engine, App Engine, GKE, and applications running on-premises as well.

Cloud Trace

  • Cloud Trace is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Trace helps understand how long it takes the application to handle incoming requests from users or applications, and how long it takes to complete operations like RPC calls performed when handling the requests.
  • Cloud Trace can track how requests propagate through the application and provide detailed near real-time performance insights.
  • Cloud Trace automatically analyzes all of the application’s traces to generate in-depth latency reports to surface performance degradations and can capture traces from all the VMs, containers, or App Engines.

Cloud Debugger

  • Cloud Debugger helps inspect the state of an application, at any code location, without stopping or slowing down the running app.
  • Cloud Debugger makes it easier to view the application state without adding logging statements.
  • Cloud Debugger adds less than 10ms to the request latency only when the application state is captured. In most cases, this is not noticeable by users.
  • Cloud Debugger can be used with or without access to your app’s source code.
  • Cloud Debugger supports Cloud Source Repositories, GitHub, Bitbucket, or GitLab as the source code repository. If the source code repository is not supported, the source files can be uploaded.
  • Cloud Debugger allows collaboration by sharing the debug session by sending the Console URL.
  • Cloud Debugger supports a range of IDEs.

Debug Snapshots

  • Debug Snapshots capture local variables and the call stack at a specific line location in the app’s source code without stopping or slowing it down.
  • Certain conditions and locations can be specified to return a snapshot of the app’s data.
  • Debug Snapshots support canarying wherein the debugger agent tests the snapshot on a subset of the instances.

Debug Logpoints

  • Debug Logpoints allow you to inject logging into running services without restarting or interfering with the normal function of the service.
  • Debug Logpoints are useful for debugging production issues without having to add log statements and redeploy.
  • Debug Logpoints remain active for 24 hours after creation, or until they are deleted or the service is redeployed.
  • If a logpoint is placed on a line that receives lots of traffic, the Debugger throttles the logpoint to reduce its impact on the application.
  • Debug Logpoints support canarying wherein the debugger agent tests the logpoints on a subset of the instances.

References

Google_Cloud_Operations

Google Cloud CI/CD – Continuous Integration & Continuous Deployment

Google Cloud CI/CD

Google Cloud CI/CD provides various tools for continuous integration and deployment and also integrates seamlessly with third-party solutions.

Google Cloud CI/CD - Continuous Integration Continuous Deployment

Google Cloud Source Repositories – CSR

  • Cloud Source Repositories are fully-featured, private Git repositories hosted on Google Cloud.
  • Cloud Source Repositories can be used for collaborative, version-controlled development of any app or service, including those that run on App Engine and Compute Engine.
  • Cloud Source Repositories can connect to an existing GitHub or Bitbucket repository. Connected repositories are synchronized with Cloud Source Repositories automatically.
  • Cloud Source Repositories automatically send logs on repository activity to Cloud Logging to help track and troubleshoot data access.
  • Cloud Source Repositories offer security key detection to block git push transactions that contain sensitive information which helps improve the security of the source code.
  • Cloud Source Repositories provide built-in integrations with other GCP tools like Cloud Build, Cloud Debugger, Cloud Operations, Cloud Logging, Cloud Functions, and others that let you automatically build, test, deploy, and debug code within minutes.
  • Cloud Source Repositories publishes messages about the repository to a Pub/Sub topic.
  • Cloud Source Repositories provide a search feature to search for specific files or code snippets.
  • Cloud Source Repositories allow permissions to be controlled at the project level (applying to all repositories in the project) or at the individual repository level.

Cloud Build

  • Cloud Build is a fully-managed, serverless service that executes builds on Google Cloud Platform’s infrastructure.
  • Cloud Build can pull/import source code from a variety of repositories or cloud storage spaces, execute a build to produce containers or artifacts, and push them to the artifact registry.
  • Cloud Build executes the build as a series of build steps, where each build step specifies an action to be performed and is run in a Docker container.
  • Build steps can be provided by Cloud Build and the Cloud Build community or can be custom as well.
  • A build config file contains instructions for Cloud Build to perform tasks based on your specifications; for example, it can contain instructions to build, package, and push Docker images.
  • Builds can be started either manually or using build triggers.
  • Cloud Build uses build triggers to enable CI/CD automation.
  • Build triggers can listen for incoming events, such as when a new commit is pushed to a repository or when a pull request is initiated, and then automatically execute a build when new events come in.
  • Cloud Build publishes messages on a Pub/Sub topic called cloud-builds when the build’s state changes, such as when the build is created, when the build transitions to a working state, and when the build completes.

Container Registry

  • Container Registry is a private container image registry that supports Docker Image Manifest V2 and OCI image formats.
  • Container Registry provides a subset of Artifact Registry features.
  • Container Registry stores its tags and layer files for container images in a Cloud Storage bucket in the same project as the registry.
  • Access to the bucket is configured using Cloud Storage’s identity and access management (IAM) settings.
  • Container Registry integrates seamlessly with Google Cloud services and works with popular continuous integration and continuous delivery systems, including Cloud Build and third-party tools such as Jenkins.

Artifact Registry

  • Artifact Registry is a fully managed service with support for both container images and non-container artifacts, and it extends the capabilities of Container Registry.
  • Artifact Registry is the recommended service for container image storage and management on Google Cloud.
  • Artifact Registry comes with fine-grained access control via Cloud IAM. This enables scoping permissions as granularly as possible, for example to specific regions or environments as necessary.
  • Artifact Registry supports the creation of regional repositories.

Container Registry vs Artifact Registry

Google Cloud Container Registry Vs Artifact Registry

Google Cloud DevOps
Credit Priyanka Vergadia

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.


Google Cloud Container Registry Vs Artifact Registry

Container Registry vs Artifact Registry

Google Cloud - Container Registry vs Artifact Registry

Container Registry

  • Container Registry is a private container image registry that supports Docker Image Manifest V2 and OCI image formats.
  • provides a subset of Artifact Registry features.
  • stores its tags and layer files for container images in a Cloud Storage bucket in the same project as the registry.
  • does not support fine-grained IAM access control. Access to the bucket is configured using Cloud Storage’s permissions.
  • integrates seamlessly with Google Cloud services and works with popular continuous integration and continuous delivery systems including Cloud Build and third-party tools such as Jenkins.
  • is used to store only Docker images and does not support language or OS packages.
  • is only multi-regional and does not support regional repositories.
  • supports a single repository within a project and automatically creates a repository in a multi-region.
  • uses gcr.io hosts.
  • uses gcloud container images commands.
  • supports CMEK (Customer-Managed Encryption Keys) to encrypt the storage buckets that contain the images.
  • supports several authentication methods for pushing and pulling images with a third-party client.
  • caches the most frequently requested Docker Hub images on mirror.gcr.io
  • supports VPC-Service Controls and can be added to a service perimeter.
  • hosts Google provided images on gcr.io
  • publishes changes to the gcr topic.
  • images can be viewed and managed from the Container registry section of Cloud Console.
  • pricing is based on Cloud Storage usage, including storage and network egress.

Artifact Registry

  • Artifact Registry is a fully managed service with support for both container images and non-container artifacts, and it extends the capabilities of Container Registry.
  • Artifact Registry is the recommended service for container image storage and management on Google Cloud. It is considered the successor of the Container Registry.
  • Artifact Registry comes with fine-grained access control via Cloud IAM using Artifact Registry permissions. This enables scoping permissions as granularly as possible, e.g., to specific regions or environments as necessary.
  • supports multi-regional or regional repositories.
  • uses pkg.dev hosts.
  • uses gcloud artifacts docker commands.
  • supports CMEK (Customer-Managed Encryption Keys) to encrypt individual repositories.
  • supports multiple repositories within the project and the repository should be manually created before pushing any images.
  • supports multiple artifact formats, including Container images, Java packages, and Node.js modules.
  • supports the same authentication method as Container Registry.
  • mirror.gcr.io continues to cache frequently requested images from Docker Hub.
  • supports VPC-Service Controls and can be added to a service perimeter.
  • hosts Google provided images on gcr.io
  • publishes changes to the gcr topic.
  • Artifact Registry and Container Registry repositories can be viewed from the Artifact Registry section of Cloud Console.
  • pricing is based on storage and network egress.

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.


References

Artifact Registry vs Container Registry Feature Comparison

IAM Role – Identity Providers and Federation

IAM Role – Identity Providers and Federation

  • An identity provider (IdP) can be used to grant external user identities permissions to AWS resources without having to create IAM users within your AWS account.
  • External user identities can be authenticated either through the organization’s authentication system or through a well-known identity provider such as Amazon, Google, etc.
  • Identity providers help keep the AWS account secure without the need to distribute or embed long-term security credentials in the application.
  • To use an IdP, an IAM identity provider entity can be created to establish a trust relationship between the AWS account and the IdP.
  • IAM supports IdPs that are compatible with OpenID Connect (OIDC) or SAML 2.0 (Security Assertion Markup Language 2.0)

Web Identity Federation without Cognito

IAM Web Identity Federation

  1. Mobile or Web Application needs to be configured with the IdP which gives each application a unique ID or client ID (also called audience)
  2. Create an Identity Provider entity for OIDC compatible IdP in IAM.
  3. Create an IAM role and define the
    1. Trust policy – specify the IdP (like Amazon) as the Principal (the trusted entity), and include a Condition that matches the IdP assigned app ID
    2. Permission policy – specify the permissions the application can assume
  4. Application calls the sign-in interface for the IdP to login
  5. IdP authenticates the user and returns an authentication token (OAuth access token or OIDC ID token) with information about the user to the application
  6. Application then makes an unsigned call to the STS service with the AssumeRoleWithWebIdentity action to request temporary security credentials (see the sketch after this list).
  7. Application passes the IdP’s authentication token along with the Amazon Resource Name (ARN) for the IAM role created for that IdP.
  8. AWS verifies that the token is trusted and valid and if so, returns temporary security credentials (access key, secret access key, session token, expiry time) to the application that has the permissions for the role that you name in the request.
  9. STS response also includes metadata about the user from the IdP, such as the unique user ID that the IdP associates with the user.
  10. Application makes signed requests to AWS using the Temporary credentials
  11. User ID information from the identity provider can distinguish users in the app for e.g., objects can be put into S3 folders that include the user ID as prefixes or suffixes. This lets you create access control policies that lock the folder so only the user with that ID can access it.
  12. Application can cache the temporary security credentials and refresh them before their expiry accordingly. Temporary credentials, by default, are good for an hour.
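
A minimal boto3 sketch of steps 6-11, assuming a hypothetical role ARN and bucket name; the web identity token is whatever the IdP returned to the application:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# AssumeRoleWithWebIdentity is an unsigned call, so no AWS credentials are needed.
sts = boto3.client("sts", config=Config(signature_version=UNSIGNED))

resp = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::111122223333:role/WebIdentityAppRole",  # placeholder role
    RoleSessionName="app-user-session",
    WebIdentityToken="<OIDC ID token returned by the IdP>",
    DurationSeconds=3600,  # temporary credentials are good for an hour by default
)

creds = resp["Credentials"]
user_id = resp["SubjectFromWebIdentityToken"]  # unique user ID assigned by the IdP

# Use the temporary credentials to make signed requests, e.g. to S3,
# prefixing objects with the user ID so access policies can lock the folder.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.put_object(Bucket="my-app-bucket", Key=f"{user_id}/profile.json", Body=b"{}")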

An interactive website provides a very good way to understand the flow

Mobile or Web Identity Federation with Cognito

  • Amazon Cognito as the identity broker is recommended for almost all web identity federation scenarios
  • Cognito is easy to use and provides additional capabilities like anonymous (unauthenticated) access
  • Cognito supports anonymous users, MFA and also helps synchronizing user data across devices and providers

Web Identify Federation using Cognito

SAML 2.0-based Federation

  • AWS supports identity federation with SAML 2.0 (Security Assertion Markup Language 2.0), an open standard used by many identity providers (IdPs).
  • SAML 2.0 based federation feature enables federated single sign-on (SSO),  so users can log into the AWS Management Console or call the AWS APIs without having to create an IAM user for everyone in the organization
  • SAML helps simplify the process of configuring federation with AWS by using the IdP’s service instead of writing custom identity proxy code.
  • This is useful in organizations that have integrated their identity systems (such as Windows Active Directory or OpenLDAP) with software that can produce SAML assertions to provide information about user identity and permissions (such as Active Directory Federation Services or Shibboleth)

SAML based Federation

  1. Create a SAML provider entity in AWS using the SAML metadata document provided by the organization’s IdP to establish a “trust” between your AWS account and the IdP
  2. SAML metadata document includes the issuer name, a creation date, an expiration date, and keys that AWS can use to validate authentication responses (assertions) from your organization.
  3. Create IAM roles which define
    1. Trust policy with the SAML provider as the principal, which establishes a trust relationship between the organization and AWS
    2. Permission policy establishes what users from the organization are allowed to do in AWS
  4. SAML trust is completed by configuring the Organization’s IdP with information about AWS and the role(s) that you want the federated users to use. This is referred to as configuring relying party trust between your IdP and AWS
  5. Application calls the sign-in interface for the Organization IdP to login
  6. IdP authenticates the user and generates a SAML authentication response which includes assertions that identify the user and include attributes about the user
  7. Application then makes an unsigned call to the STS service with the AssumeRoleWithSAML action to request temporary security credentials (see the sketch after this list).
  8. Application passes the ARN of the SAML provider, the ARN of the role to assume, the SAML assertion about the current user returned by IdP, and the time for which the credentials should be valid. An optional IAM Policy parameter can be provided to further restrict the permissions to the user
  9. AWS verifies that the SAML assertion is trusted and valid and if so, returns temporary security credentials (access key, secret access key, session token, expiry time) to the application that has the permissions for the role named in the request.
  10. STS response also includes metadata about the user from the IdP, such as the unique user ID that the IdP associates with the user.
  11. Using the Temporary credentials, the application makes signed requests to AWS to access the services
  12. Application can cache the temporary security credentials and refresh them before their expiry accordingly. Temporary credentials, by default, are good for an hour.
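
A minimal boto3 sketch of the AssumeRoleWithSAML call (steps 7-11), with placeholder ARNs and assertion:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# AssumeRoleWithSAML, like AssumeRoleWithWebIdentity, does not require AWS credentials.
sts = boto3.client("sts", config=Config(signature_version=UNSIGNED))

resp = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::111122223333:role/SAMLFederatedRole",          # placeholder
    PrincipalArn="arn:aws:iam::111122223333:saml-provider/CorporateIdP",  # placeholder
    SAMLAssertion="<base64-encoded SAML assertion returned by the IdP>",
    DurationSeconds=3600,
)

creds = resp["Credentials"]
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
# The session can now make signed requests with the federated role's permissions.
print(session.client("sts").get_caller_identity()["Arn"])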

AWS SSO with SAML

  • SAML 2.0 based federation can also be used to grant access to the federated users to the AWS Management console.
  • This requires the use of the AWS SSO endpoint instead of directly calling the AssumeRoleWithSAML API.
  • The endpoint calls the API for the user and returns a URL that automatically redirects the user’s browser to the AWS Management Console.

SAML based SSO to AWS Console

  1. User browses the organization’s portal and selects the option to go to the AWS Management Console.
  2. Portal performs the function of the identity provider (IdP) that handles the exchange of trust between the organization and AWS.
  3. Portal verifies the user’s identity in the organization.
  4. Portal generates a SAML authentication response that includes assertions that identify the user and include attributes about the user.
  5. Portal sends this response to the client browser.
  6. Client browser is redirected to the AWS SSO endpoint and posts the SAML assertion.
  7. AWS SSO endpoint handles the call for the AssumeRoleWithSAML API action on the user’s behalf and requests temporary security credentials from STS and creates a console sign-in URL that uses those credentials.
  8. AWS sends the sign-in URL back to the client as a redirect.
  9. Client browser is redirected to the AWS Management Console. If the SAML authentication response includes attributes that map to multiple IAM roles, the user is first prompted to select the role to use for access to the console.

Custom Identity Broker Federation

Custom Identity broker Federation

  • If the organization doesn’t have a SAML-compatible IdP, a custom identity broker can be used to provide the access.
  • Custom Identity Broker should perform the following steps
    • Verify that the user is authenticated by the local identity system.
    • Call the AWS STS AssumeRole (recommended) or GetFederationToken (credentials valid for 12 hours by default, up to a maximum of 36 hours) API to obtain temporary security credentials for the user.
    • Temporary credentials limit the permissions the user has on AWS resources.
    • Call an AWS federation endpoint and supply the temporary security credentials to get a sign-in token.
    • Construct a URL for the console that includes the token (see the sketch after this list).
    • URL that the federation endpoint provides is valid for 15 minutes after it is created.
    • Give the URL to the user or invoke the URL on the user’s behalf.
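
A minimal sketch of a custom broker using GetFederationToken and the AWS federation endpoint; the federated user name, scoped-down policy, issuer, and destination URLs are placeholders:

import json
import urllib.parse
import urllib.request

import boto3

# The broker runs with its own long-term IAM credentials and requests
# federation credentials for the already-authenticated corporate user.
sts = boto3.client("sts")

federated = sts.get_federation_token(
    Name="corp-user-alice",                      # placeholder federated user name
    Policy=json.dumps({                          # scope down permissions for this user
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"}],
    }),
    DurationSeconds=3600,
)
creds = federated["Credentials"]

# Exchange the temporary credentials for a sign-in token at the AWS federation endpoint.
session_json = json.dumps({
    "sessionId": creds["AccessKeyId"],
    "sessionKey": creds["SecretAccessKey"],
    "sessionToken": creds["SessionToken"],
})
token_url = ("https://signin.aws.amazon.com/federation"
             "?Action=getSigninToken&Session=" + urllib.parse.quote_plus(session_json))
signin_token = json.loads(urllib.request.urlopen(token_url).read())["SigninToken"]

# Construct the console URL (valid for 15 minutes) and hand it to the user.
login_url = ("https://signin.aws.amazon.com/federation?Action=login"
             "&Issuer=" + urllib.parse.quote_plus("https://broker.example.com")
             + "&Destination=" + urllib.parse.quote_plus("https://console.aws.amazon.com/")
             + "&SigninToken=" + signin_token)
print(login_url)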

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A photo-sharing service stores pictures in Amazon Simple Storage Service (S3) and allows application sign-in using an OpenID Connect-compatible identity provider. Which AWS Security Token Service approach to temporary access should you use for the Amazon S3 operations?
    1. SAML-based Identity Federation
    2. Cross-Account Access
    3. AWS IAM users
    4. Web Identity Federation
  2. Which technique can be used to integrate AWS IAM (Identity and Access Management) with an on-premise LDAP (Lightweight Directory Access Protocol) directory service?
    1. Use an IAM policy that references the LDAP account identifiers and the AWS credentials.
    2. Use SAML (Security Assertion Markup Language) to enable single sign-on between AWS and LDAP
    3. Use AWS Security Token Service from an identity broker to issue short-lived AWS credentials. (Refer Link)
    4. Use IAM roles to automatically rotate the IAM credentials when LDAP credentials are updated.
    5. Use the LDAP credentials to restrict a group of users from launching specific EC2 instance types.
  3. You are designing a photo-sharing mobile app. The application will store all pictures in a single Amazon S3 bucket. Users will upload pictures from their mobile device directly to Amazon S3 and will be able to view and download their own pictures directly from Amazon S3. You want to configure security to handle potentially millions of users in the most secure manner possible. What should your server-side application do when a new user registers on the photo-sharing mobile application? [PROFESSIONAL]
    1. Create a set of long-term credentials using AWS Security Token Service with appropriate permissions Store these credentials in the mobile app and use them to access Amazon S3.
    2. Record the user’s Information in Amazon RDS and create a role in IAM with appropriate permissions. When the user uses their mobile app create temporary credentials using the AWS Security Token Service ‘AssumeRole’ function. Store these credentials in the mobile app’s memory and use them to access Amazon S3. Generate new credentials the next time the user runs the mobile app.
    3. Record the user’s Information in Amazon DynamoDB. When the user uses their mobile app create temporary credentials using AWS Security Token Service with appropriate permissions. Store these credentials in the mobile app’s memory and use them to access Amazon S3 Generate new credentials the next time the user runs the mobile app.
    4. Create IAM user. Assign appropriate permissions to the IAM user Generate an access key and secret key for the IAM user, store them in the mobile app and use these credentials to access Amazon S3.
    5. Create an IAM user. Update the bucket policy with appropriate permissions for the IAM user Generate an access Key and secret Key for the IAM user, store them In the mobile app and use these credentials to access Amazon S3.
  4. Your company has recently extended its datacenter into a VPC on AWS to add burst computing capacity as needed. Members of your Network Operations Center need to be able to go to the AWS Management Console and administer Amazon EC2 instances as necessary. You don’t want to create new IAM users for each NOC member and make those users sign in again to the AWS Management Console. Which option below will meet the needs of your NOC members? [PROFESSIONAL]
    1. Use OAuth 2.0 to retrieve temporary AWS security credentials to enable your NOC members to sign in to the AWS Management Console.
    2. Use Web Identity Federation to retrieve AWS temporary security credentials to enable your NOC members to sign in to the AWS Management Console.
    3. Use your on-premises SAML 2.O-compliant identity provider (IDP) to grant the NOC members federated access to the AWS Management Console via the AWS single sign-on (SSO) endpoint.
    4. Use your on-premises SAML 2.0-compliant identity provider (IDP) to retrieve temporary security credentials to enable NOC members to sign in to the AWS Management Console
  5. A corporate web application is deployed within an Amazon Virtual Private Cloud (VPC) and is connected to the corporate data center via an IPsec VPN. The application must authenticate against the on-premises LDAP server. After authentication, each logged-in user can only access an Amazon Simple Storage Service (S3) keyspace specific to that user. Which two approaches can satisfy these objectives? (Choose 2 answers) [PROFESSIONAL]
    1. Develop an identity broker that authenticates against IAM security Token service to assume a IAM role in order to get temporary AWS security credentials. The application calls the identity broker to get AWS temporary security credentials with access to the appropriate S3 bucket. (Needs to authenticate against LDAP and not IAM)
    2. The application authenticates against LDAP and retrieves the name of an IAM role associated with the user. The application then calls the IAM Security Token Service to assume that IAM role. The application can use the temporary credentials to access the appropriate S3 bucket. (Authenticates with LDAP and calls the AssumeRole)
    3. Develop an identity broker that authenticates against LDAP and then calls IAM Security Token Service to get IAM federated user credentials The application calls the identity broker to get IAM federated user credentials with access to the appropriate S3 bucket. (Custom Identity broker implementation, with authentication with LDAP and using federated token)
    4. The application authenticates against LDAP the application then calls the AWS identity and Access Management (IAM) Security Token service to log in to IAM using the LDAP credentials the application can use the IAM temporary credentials to access the appropriate S3 bucket. (Can’t login to IAM using LDAP credentials)
    5. The application authenticates against IAM Security Token Service using the LDAP credentials the application uses those temporary AWS security credentials to access the appropriate S3 bucket. (Need to authenticate with LDAP)
  6. Company B is launching a new game app for mobile devices. Users will log into the game using their existing social media account to streamline data capture. Company B would like to directly save player data and scoring information from the mobile app to a DynamoDB table named Score Data. When a user saves their game, the progress data will be stored in the Game State S3 bucket. What is the best approach for storing data to DynamoDB and S3? [PROFESSIONAL]
    1. Use an EC2 Instance that is launched with an EC2 role providing access to the Score Data DynamoDB table and the GameState S3 bucket that communicates with the mobile app via web services.
    2. Use temporary security credentials that assume a role providing access to the Score Data DynamoDB table and the Game State S3 bucket using web identity federation
    3. Use Login with Amazon allowing users to sign in with an Amazon account providing the mobile app with access to the Score Data DynamoDB table and the Game State S3 bucket.
    4. Use an IAM user with access credentials assigned a role providing access to the Score Data DynamoDB table and the Game State S3 bucket for distribution with the mobile app.
  7. A user has created a mobile application which makes calls to DynamoDB to fetch certain data. The application is using the DynamoDB SDK and root account access/secret access key to connect to DynamoDB from mobile. Which of the below mentioned statements is true with respect to the best practice for security in this scenario?
    1. User should create a separate IAM user for each mobile application and provide DynamoDB access with it
    2. User should create an IAM role with DynamoDB and EC2 access. Attach the role with EC2 and route all calls from the mobile through EC2
    3. The application should use an IAM role with web identity federation which validates calls to DynamoDB with identity providers, such as Google, Amazon, and Facebook
    4. Create an IAM Role with DynamoDB access and attach it with the mobile application
  8. You are managing the AWS account of a big organization. The organization has more than 1,000 employees and wants to provide access to various services to most of the employees. Which of the below mentioned options is the best possible solution in this case?
    1. The user should create a separate IAM user for each employee and provide access to them as per the policy
    2. The user should create an IAM role and attach STS with the role. The user should attach that role to the EC2 instance and setup AWS authentication on that server
    3. The user should create IAM groups as per the organization’s departments and add each user to the group for better access control
    4. Attach an IAM role with the organization’s authentication service to authorize each user for various AWS services
  9. Your Fortune 500 company has undertaken a TCO analysis evaluating the use of Amazon S3 versus acquiring more hardware. The outcome was that all employees would be granted access to use Amazon S3 for storage of their personal documents. Which of the following will you need to consider so you can set up a solution that incorporates single sign-on from your corporate AD or LDAP directory and restricts access for each user to a designated user folder in a bucket? (Choose 3 Answers) [PROFESSIONAL]
    1. Setting up a federation proxy or identity provider
    2. Using AWS Security Token Service to generate temporary tokens
    3. Tagging each folder in the bucket
    4. Configuring IAM role
    5. Setting up a matching IAM user for every user in your corporate directory that needs access to a folder in the bucket
  10. An AWS customer is deploying a web application that is composed of a front-end running on Amazon EC2 and of confidential data that is stored on Amazon S3. The customer’s security policy requires that all access operations to this sensitive data must be authenticated and authorized by a centralized access management system that is operated by a separate security team. In addition, the web application team that owns and administers the EC2 web front-end instances is prohibited from having any ability to access the data in a way that circumvents this centralized access management system. Which of the following configurations will support these requirements? [PROFESSIONAL]
    1. Encrypt the data on Amazon S3 using a CloudHSM that is operated by the separate security team. Configure the web application to integrate with the CloudHSM for decrypting approved data access operations for trusted end-users. (S3 doesn’t integrate directly with CloudHSM, also there is no centralized access management system control)
    2. Configure the web application to authenticate end-users against the centralized access management system. Have the web application provision trusted users STS tokens entitling the download of approved data directly from Amazon S3 (Controlled access and admins cannot access the data as it needs authentication)
    3. Have the separate security team create an IAM role that is entitled to access the data on Amazon S3. Have the web application team provision their instances with this role while denying their IAM users access to the data on Amazon S3 (Web team would have access to the data)
    4. Configure the web application to authenticate end-users against the centralized access management system using SAML. Have the end-users authenticate to IAM using their SAML token and download the approved data directly from S3. (not the way SAML auth works and not sure if the centralized access management system is SAML compliant)
  11. What is web identity federation?
    1. Use of an identity provider like Google or Facebook to become an AWS IAM User.
    2. Use of an identity provider like Google or Facebook to exchange for temporary AWS security credentials.
    3. Use of AWS IAM User tokens to log in as a Google or Facebook user.
    4. Use of AWS STS Tokens to log in as a Google or Facebook user.
  12. Games-R-Us is launching a new game app for mobile devices. Users will log into the game using their existing Facebook account and the game will record player data and scoring information directly to a DynamoDB table. What is the most secure approach for signing requests to the DynamoDB API?
    1. Create an IAM user with access credentials that are distributed with the mobile app to sign the requests
    2. Distribute the AWS root account access credentials with the mobile app to sign the requests
    3. Request temporary security credentials using web identity federation to sign the requests
    4. Establish cross account access between the mobile app and the DynamoDB table to sign the requests
  13. You are building a mobile app for consumers to post cat pictures online. You will be storing the images in AWS S3. You want to run the system very cheaply and simply. Which one of these options allows you to build a photo sharing application without needing to worry about scaling expensive uploads processes, authentication/authorization and so forth?
    1. Build the application out using AWS Cognito and web identity federation to allow users to log in using Facebook or Google Accounts. Once they are logged in, the secret token passed to that user is used to directly access resources on AWS, like AWS S3. (Amazon Cognito is a superset of the functionality provided by web identity federation. Refer link)
    2. Use JWT or SAML compliant systems to build authorization policies. Users log in with a username and password, and are given a token they can use indefinitely to make calls against the photo infrastructure.
    3. Use AWS API Gateway with a constantly rotating API Key to allow access from the client-side. Construct a custom build of the SDK and include S3 access in it.
    4. Create an AWS OAuth Service Domain and grant public signup and access to the domain. During setup, add at least one major social media site as a trusted Identity Provider for users.
  14. The Marketing Director in your company asked you to create a mobile app that lets users post sightings of good deeds known as random acts of kindness in 80-character summaries. You decided to write the application in JavaScript so that it would run on the broadest range of phones, browsers, and tablets. Your application should provide access to Amazon DynamoDB to store the good deed summaries. Initial testing of a prototype shows that there aren’t large spikes in usage. Which option provides the most cost-effective and scalable architecture for this application? [PROFESSIONAL]
    1. Provide the JavaScript client with temporary credentials from the Security Token Service using a Token Vending Machine (TVM) on an EC2 instance to provide signed credentials mapped to an Amazon Identity and Access Management (IAM) user allowing DynamoDB puts and S3 gets. You serve your mobile application out of an S3 bucket enabled as a web site. Your client updates DynamoDB. (Single EC2 instance not a scalable architecture)
    2. Register the application with a Web Identity Provider like Amazon, Google, or Facebook, create an IAM role for that provider, and set up permissions for the IAM role to allow S3 gets and DynamoDB puts. You serve your mobile application out of an S3 bucket enabled as a web site. Your client updates DynamoDB. (Can work with JavaScript SDK, is scalable and cost effective)
    3. Provide the JavaScript client with temporary credentials from the Security Token Service using a Token Vending Machine (TVM) to provide signed credentials mapped to an IAM user allowing DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows DynamoDB puts. Your server updates DynamoDB. (Is Scalable but Not cost effective)
    4. Register the JavaScript application with a Web Identity Provider like Amazon, Google, or Facebook, create an IAM role for that provider, and set up permissions for the IAM role to allow DynamoDB puts. You serve your mobile application out of Apache EC2 instances that are load-balanced and autoscaled. Your EC2 instances are configured with an IAM role that allows DynamoDB puts. Your server updates DynamoDB. (Is Scalable but Not cost effective)

References

AWS IAM User Guide – Id Role Providers

AWS Snow Family

AWS Snow Family

  • AWS Snow Family helps physically transport up to exabytes of data into and out of AWS.
  • AWS Snow Family helps customers that need to run operations in austere, non-data center environments, and in locations where there’s a lack of consistent network connectivity.
  • Snow Family devices are AWS owned & managed and integrate with AWS security, monitoring, storage management, and computing capabilities.
  • The AWS Snow Family, comprising AWS Snowcone, AWS Snowball, and AWS Snowmobile, offers a number of physical devices and capacity points, most with built-in computing capabilities.

AWS Snowcone

  • AWS Snowcone is a portable, rugged, and secure device that provides edge computing and data transfer capabilities.
  • Snowcone can be used to collect, process, and move data to AWS, either offline by shipping the device, or online with AWS DataSync.
  • AWS Snowcone stores data securely in edge locations, and can run edge computing workloads that use AWS IoT Greengrass or EC2 instances.
  • Snowcone devices are small and weigh 4.5 lbs. (2.1 kg), so you can carry one in a backpack or fit it in tight spaces for IoT, vehicular, or even drone use cases.

AWS Snowball

  • AWS Snowball is a data migration and edge computing device that comes in two device options:
    • Compute Optimized
      • Snowball Edge Compute Optimized devices provide 52 vCPUs, 42 terabytes of usable block or object storage, and an optional GPU for use cases such as advanced machine learning and full-motion video analysis in disconnected environments.
    • Storage Optimized.
      • Snowball Edge Storage Optimized devices provide 40 vCPUs of compute capacity coupled with 80 terabytes of usable block or S3-compatible object storage.
      • It is well-suited for local storage and large-scale data transfer.
  • Customers can use these two options for data collection, machine learning and processing, and storage in environments with intermittent connectivity (such as manufacturing, industrial, and transportation) or in extremely remote locations (such as military or maritime operations) before shipping it back to AWS.
  • Snowball devices may also be rack mounted and clustered together to build larger, temporary installations.

AWS Snowmobile

  • AWS Snowmobile moves up to 100 PB of data in a 45-foot long ruggedized shipping container and is ideal for multi-petabyte or Exabyte-scale digital media migrations and data center shutdowns.
  • A Snowmobile arrives at the customer site and appears as a network-attached data store for more secure, high-speed data transfer.
  • After data is transferred to Snowmobile, it is driven back to an AWS Region where the data is loaded into S3.
  • Snowmobile is tamper-resistant, waterproof, and temperature controlled with multiple layers of logical and physical security – including encryption, fire suppression, dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an escort security vehicle during transit.

AWS Snow Family Feature Comparison

AWS Snow Family Feature Comparison

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A company wants to transfer petabyte-scale data to AWS for analytics; however, it is constrained by its internet connectivity. Which AWS service can help it transfer the data quickly?
    1. S3 enhanced uploader
    2. Snowmobile
    3. Snowball
    4. Direct Connect
  2. A company wants to transfer its video library data, which runs into exabytes, to AWS. Which AWS service can help the company transfer the data?
    1. Snowmobile
    2. Snowball
    3. S3 upload
    4. S3 enhanced uploader

References

AWS_Snow