AWS CloudFormation

AWS CloudFormation

  • AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provision and update them in an orderly and predictable fashion.
  • CloudFormation consists of
    • Template
      • is an architectural diagram and provides logical resources
      • a JSON or YAML-format, text-based file that describes all the AWS resources needed to deploy and run the application.
    • Stack
      • is the end result of that diagram and provisions physical resources mapped to the logical resources.
      • is the set of AWS resources that are created and managed as a single unit when CloudFormation instantiates a template.
  • CloudFormation template can be used to set up the resources consistently and repeatedly over and over across multiple regions.
  • Resources can be updated, deleted, and modified in a controlled and predictable way, in effect applying version control to the infrastructure as done for software code
  • AWS CloudFormation Template consists of elements:-
    • List of AWS resources and their configuration values
    • An optional template file format version number
    • An optional list of template parameters (input values supplied at stack creation time)
    • An optional list of output values like public IP address using the Fn:GetAtt function
    • An optional list of data tables used to lookup static configuration values for e.g., AMI names per AZ
  • CloudFormation supports Chef & Puppet Integration to deploy and configure right down the application layer
  • CloudFormation provides a set of application bootstrapping scripts that enable you to install packages, files, and services on the EC2 instances by simply describing them in the CloudFormation template
  • By default, automatic rollback on error feature is enabled, which will cause all the AWS resources that CloudFormation created successfully for a stack up to the point where an error occurred to be deleted.
  • In case of automatic rollback, charges would still be applied for the resources, the time they were up and running
  • CloudFormation provides a WaitCondition resource that acts as a barrier, blocking the creation of other resources until a completion signal is received from an external source e.g. application or management system
  • CloudFormation allows deletion policies to be defined for resources in the template for e.g. resources to be retained or snapshots can be created before deletion useful for preserving S3 buckets when the stack is deleted

AWS CloudFormation Concepts

AWS CloudFormation, you work with templates and stacks

Templates

  • act as blueprints for building AWS resources.
  • is a JSON or YAML formatted text file, saved with any extension, such as .json, .yaml, .template, or .txt.
  • have additional capabilities to build complex sets of resources and reuse those templates in multiple contexts for e.g. using input parameters to create generic and reusable templates
  • Name used for a resource within the template is a logical name but when CloudFormation creates the resource, it generates a physical name that is based on the combination of the logical name, the stack name, and a unique ID

Stacks

  • Stacks manage related resources as a single unit,
  • Collection of resources can be created, updated, and deleted by creating, updating, and deleting stacks.
  • All the resources in a stack are defined by the stack’s AWS CloudFormation template
  • CloudFormation makes underlying service calls to AWS to provision and configure the resources in the stack and can perform only actions that the users have permission to do.

Change Sets

  • Change Sets presents a summary or preview of the proposed changes that CloudFormation will make when a stack is updated.
  • Change Sets help check how the changes might impact running resources, especially critical resources, before implementing them.
  • CloudFormation makes the changes to the stack only when the change set is executed, allowing you to decide whether to proceed with the proposed changes or explore other changes by creating another change set.
  • Change sets don’t indicate whether AWS CloudFormation will successfully update a stack for e.g. if account limits are hit or the user does not have permission.

CloudFormation Change Sets

Custom Resources

  • Custom resources help write custom provisioning logic in templates that CloudFormation runs anytime the stacks are created, updated, or deleted.
  • Custom resources help include resources that aren’t available as AWS CloudFormation resource types and can still be managed in a single stack.
  • AWS recommends using CloudFormation Registry instead.

Nested Stacks

  • Nested stacks are stacks created as part of other stacks.
  • A nested stack can be created within another stack by using the AWS::CloudFormation::Stack resource.
  • Nested stacks can be used to define common, repeated patterns and components and create dedicated templates which then can be called from other stacks.
  • Root stack is the top-level stack to which all the nested stacks ultimately belong. Nested stacks can themselves contain other nested stacks, resulting in a hierarchy of stacks.
  • In addition, each nested stack has an immediate parent stack. For the first level of nested stacks, the root stack is also the parent stack.
  • Certain stack operations, such as stack updates, should be initiated from the root stack rather than performed directly on nested stacks themselves.

Drift Detection

  • Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.
  • Drift detection help identify stack resources to which configuration changes have been made outside of CloudFormation management
  • Drift detection can detect drift on an entire stack or individual resources
  • Corrective action can be taken to make sure the stack resources are again in sync with the definitions in the stack template, such as updating the drifted resources directly so that they agree with their template definition
  • Resolving drift helps to ensure configuration consistency and successful stack operations.
  • CloudFormation detects drift on those AWS resources that support drift detection. Resources that don’t support drift detection are assigned a drift status of NOT_CHECKED.
  • Drift detection can be performed on stacks with the following statuses: CREATE_COMPLETEUPDATE_COMPLETEUPDATE_ROLLBACK_COMPLETE, and UPDATE_ROLLBACK_FAILED.
  • CloudFormation does not detect drift on any nested stacks that belong to that stack. Instead, you can initiate a drift detection operation directly on the nested stack.

CloudFormation Template Anatomy

  • Resources (required)
    • Specifies the stack resources and their properties, such as an EC2 instance or an S3 bucket that would be created.
    • Resources can be referred to in the Resources and Outputs sections
  • Parameters (optional)
    • Pass values to the template at runtime (during stack creation or update)
    • Parameters can be referred from the Resources and Outputs sections
    • Can be referred using Fn::Ref or !Ref
  • Mappings (optional)
    • A mapping of keys and associated values that used to specify conditional parameter values, similar to a lookup table.
    • Can be referred using Fn::FindInMap or !FindInMap
  • Outputs (optional)
    • Describes the values that are returned whenever you view your stack’s properties.
  • Format Version (optional)
    • AWS CloudFormation template version that the template conforms to.
  • Description (optional)
    • A text string that describes the template. This section must always follow the template format version section.
  • Metadata (optional)
    • Objects that provide additional information about the template.
  • Rules (optional)
    • Validates a parameter or a combination of parameters passed to a template during stack creation or stack update.
  • Conditions (optional)
    • Conditions control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update.
  • Transform (optional)
    • For serverless applications (also referred to as Lambda-based applications), specifies the version of the AWS Serverless Application Model (AWS SAM) to use.
    • When you specify a transform, you can use AWS SAM syntax to declare resources in the template. The model defines the syntax that you can use and how it’s processed.

CloudFormation Template Sample

CloudFormation Access Control

  • IAM
    • IAM can be applied with CloudFormation to access control for users whether they can view stack templates, create stacks, or delete stacks
    • IAM permissions need to be provided for the user to the AWS services and resources provisioned when the stack is created
    • Before a stack is created, AWS CloudFormation validates the template to check for IAM resources that it might create
  • Service Role
    • A service role is an AWS IAM role that allows AWS CloudFormation to make calls to resources in a stack on the user’s behalf
    • By default, AWS CloudFormation uses a temporary session that it generates from the user credentials for stack operations.
    • For a service role, AWS CloudFormation uses the role’s credentials.
    • When a service role is specified, AWS CloudFormation always uses that role for all operations that are performed on that stack.

Template Resource Attributes

  • CreationPolicy Attribute
    • is invoked during the associated resource creation.
    • can be associated with a resource to prevent its status from reaching create complete until CloudFormation receives a specified number of success signals or the timeout period is exceeded.
    • helps to wait on resource configuration actions before stack creation proceeds for e.g. software installation on an EC2 instance
  • DeletionPolicy Attribute
    • preserve or (in some cases) backup a resource when its stack is deleted
    • CloudFormation deletes the resource if a resource has no DeletionPolicy attribute, by default.
    • To keep a resource when its stack is deleted,
      • default, Delete where the resources would be deleted.
      • specify Retain for that resource, to prevent deletion.
      • specify Snapshot to create a snapshot before deleting the resource, if the snapshot capability is supported e.g. RDS, EC2 volume, etc.
  • DependsOn Attribute
    • helps determine dependency order and specify that the creation of a specific resource follows another.
    • the resource is created only after the creation of the resource specified in the DependsOn attribute.
  • Metadata Attribute
    • enables association of structured data with a resource
  • UpdatePolicy Attribute
    • Defines how AWS CloudFormation handles updates to the resources
    • For AWS::AutoScaling::AutoScalingGroup resources, CloudFormation invokes one of three update policies depending on the type of change or whether a scheduled action is associated with the Auto Scaling group.
      • The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:
        • Change the Auto Scaling group’s AWS::AutoScaling::LaunchConfiguration
        • Change the Auto Scaling group’s VPCZoneIdentifier property
        • Change the Auto Scaling group’s LaunchTemplate property
        • Update an Auto Scaling group that contains instances that don’t match the current LaunchConfiguration.
      • The AutoScalingScheduledAction policy applies when you update a stack that includes an Auto Scaling group with an associated scheduled action.
    • For AWS::Lambda::Alias resources, CloudFormation performs a CodeDeploy deployment when the version changes on the alias.

CloudFormation Termination Protection

  • Termination protection helps prevent a stack from being accidentally deleted.
  • Termination protection on stacks is disabled by default.
  • Termination protection can be enabled on a stack creation
  • Termination protection can be set on a stack with any status except DELETE_IN_PROGRESS or DELETE_COMPLETE
  • Enabling or disabling termination protection on a stack sets it for any nested stacks belonging to that stack as well. You can’t enable or disable termination protection directly on a nested stack.
  • If a user attempts to directly delete a nested stack belonging to a stack that has termination protection enabled, the operation fails and the nested stack remains unchanged.
  • If a user performs a stack update that would delete the nested stack, AWS CloudFormation deletes the nested stack accordingly.

CloudFormation Stack Policy

  • Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update.
  • By default, all update actions are allowed on all resources and anyone with stack update permissions can update all of the resources in the stack.
  • During an update, some resources might require an interruption or be completely replaced, resulting in new physical IDs or completely new storage and hence need to be prevented.
  • A stack policy is a JSON document that defines the update actions that can be performed on designated resources.
  • After you set a stack policy, all of the resources in the stack are protected by default.
  • Updates on specific resources can be added using an explicit Allow statement for those resources in the stack policy.
  • Only one stack policy can be defined per stack, but multiple resources can be protected within a single policy.
  • A stack policy applies to all CloudFormation users who attempt to update the stack. You can’t associate different stack policies with different users
  • A stack policy applies only during stack updates. It doesn’t provide access controls like an IAM policy.

CloudFormation StackSets

  • CloudFormation StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and Regions with a single operation.
  • Using an administrator account, an AWS CloudFormation template can be defined, managed, and used as the basis for provisioning stacks into selected target accounts across specified AWS Regions.

CloudFormation StackSets

CloudFormation Registry

  • CloudFormation registry helps manage extensions, both public and private, such as resources, modules, and hooks that are available for use in your AWS account.
  • CloudFormation registry offers several advantages over custom resources
    • Supports the modeling, provisioning, and managing of third-party application resources
    • Supports the CreateReadUpdateDelete, and List (CRUDL) operations
    • Supports drift detection on private and third-party resource types

CloudFormation Helper Scripts

Refer blog Post @ CloudFormation Helper Scripts

CloudFormation Best Practices

Refer blog Post @ CloudFormation Best Practices

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon CloudFormation provide?
    1. The ability to setup Autoscaling for Amazon EC2 instances.
    2. A templated resource creation for Amazon Web Services.
    3. A template to map network resources for Amazon Web Services
    4. None of these
  2. A user is planning to use AWS CloudFormation for his automatic deployment requirements. Which of the below mentioned components are required as a part of the template?
    1. Parameters
    2. Outputs
    3. Template version
    4. Resources
  3. In regard to AWS CloudFormation, what is a stack?
    1. Set of AWS templates that are created and managed as a template
    2. Set of AWS resources that are created and managed as a template
    3. Set of AWS resources that are created and managed as a single unit
    4. Set of AWS templates that are created and managed as a single unit
  4. A large enterprise wants to adopt CloudFormation to automate administrative tasks and implement the security principles of least privilege and separation of duties. They have identified the following roles with the corresponding tasks in the company: (i) network administrators: create, modify and delete VPCs, subnets, NACLs, routing tables, and security groups (ii) application operators: deploy complete application stacks (ELB, Auto -Scaling groups, RDS) whereas all resources must be deployed in the VPCs managed by the network administrators (iii) Both groups must maintain their own CloudFormation templates and should be able to create, update and delete only their own CloudFormation stacks. The company has followed your advice to create two IAM groups, one for applications and one for networks. Both IAM groups are attached to IAM policies that grant rights to perform the necessary task of each group as well as the creation, update and deletion of CloudFormation stacks. Given setup and requirements, which statements represent valid design considerations? Choose 2 answers [PROFESSIONAL]
    1. Network stack updates will fail upon attempts to delete a subnet with EC2 instances (Subnets cannot be deleted with instances in them)
    2. Unless resource level permissions are used on the CloudFormation: DeleteStack action, network administrators could tear down application stacks (Network administrators themselves need permission to delete resources within the application stack & CloudFormation makes calls to create, modify, and delete those resources on their behalf)
    3. The application stack cannot be deleted before all network stacks are deleted (Application stack can be deleted before network stack)
    4. Restricting the launch of EC2 instances into VPCs requires resource level permissions in the IAM policy of the application group (IAM permissions need to be given explicitly to launch instances )
    5. Nesting network stacks within application stacks simplifies management and debugging, but requires resource level permissions in the IAM policy of the network group (Although stacks can be nested, Network group will need to have all the application group permissions)
  5. Your team is excited about the use of AWS because now they have access to programmable infrastructure. You have been asked to manage your AWS infrastructure in a manner similar to the way you might manage application code. You want to be able to deploy exact copies of different versions of your infrastructure, stage changes into different environments, revert back to previous versions, and identify what versions are running at any particular time (development, test, QA, production). Which approach addresses this requirement?
    1. Use cost allocation reports and AWS Opsworks to deploy and manage your infrastructure.
    2. Use AWS CloudWatch metrics and alerts along with resource tagging to deploy and manage your infrastructure.
    3. Use AWS Beanstalk and a version control system like GIT to deploy and manage your infrastructure.
    4. Use AWS CloudFormation and a version control system like GIT to deploy and manage your infrastructure.
  6. A user is usingCloudFormation to launch an EC2 instance and then configure an application after the instance is launched. The user wants the stack creation of ELB and AutoScaling to wait until the EC2 instance is launched and configured properly. How can the user configure this?
    1. It is not possible that the stack creation will wait until one service is created and launched
    2. The user can use the HoldCondition resource to wait for the creation of the other dependent resources
    3. The user can use the DependentCondition resource to hold the creation of the other dependent resources
    4. The user can use the WaitCondition resource to hold the creation of the other dependent resources
  7. A user has created a CloudFormation stack. The stack creates AWS services, such as EC2 instances, ELB, AutoScaling, and RDS. While creating the stack it created EC2, ELB and AutoScaling but failed to create RDS. What will CloudFormation do in this scenario?
    1. CloudFormation can never throw an error after launching a few services since it verifies all the steps before launching
    2. It will warn the user about the error and ask the user to manually create RDS
    3. Rollback all the changes and terminate all the created services
    4. It will wait for the user’s input about the error and correct the mistake after the input
  8. A user is planning to use AWS CloudFormation. Which of the below mentioned functionalities does not help him to correctly understand CloudFormation?
    1. CloudFormation follows the DevOps model for the creation of Dev & Test
    2. AWS CloudFormation does not charge the user for its service but only charges for the AWS resources created with it
    3. CloudFormation works with a wide variety of AWS services, such as EC2, EBS, VPC, IAM, S3, RDS, ELB, etc
    4. CloudFormation provides a set of application bootstrapping scripts which enables the user to install Software
  9. A customer is using AWS for Dev and Test. The customer wants to setup the Dev environment with CloudFormation. Which of the below mentioned steps are not required while using CloudFormation?
    1. Create a stack
    2. Configure a service
    3. Create and upload the template
    4. Provide the parameters configured as part of the template
  10. A marketing research company has developed a tracking system that collects user behavior during web marketing campaigns on behalf of their customers all over the world. The tracking system consists of an auto-scaled group of Amazon Elastic Compute Cloud (EC2) instances behind an elastic load balancer (ELB), and the collected data is stored in Amazon DynamoDB. After the campaign is terminated, the tracking system is torn down and the data is moved to Amazon Redshift, where it is aggregated, analyzed and used to generate detailed reports. The company wants to be able to instantiate new tracking systems in any region without any manual intervention and therefore adopted AWS CloudFormation. What needs to be done to make sure that the AWS CloudFormation template works in every AWS region? Choose 2 answers [PROFESSIONAL]
    1. IAM users with the right to start AWS CloudFormation stacks must be defined for every target region. (IAM users are global)
    2. The names of the Amazon DynamoDB tables must be different in every target region. (DynamoDB names should be unique only within a region)
    3. Use the built-in function of AWS CloudFormation to set the AvailabilityZone attribute of the ELB resource.
    4. Avoid using DeletionPolicies for EBS snapshots. (Don’t want the data to be retained)
    5. Use the built-in Mappings and FindInMap functions of AWS CloudFormation to refer to the AMI ID set in the ImageId attribute of the Auto Scaling::LaunchConfiguration resource.
  11. A gaming company adopted AWS CloudFormation to automate load -testing of their games. They have created an AWS CloudFormation template for each gaming environment and one for the load -testing stack. The load – testing stack creates an Amazon Relational Database Service (RDS) Postgres database and two web servers running on Amazon Elastic Compute Cloud (EC2) that send HTTP requests, measure response times, and write the results into the database. A test run usually takes between 15 and 30 minutes. Once the tests are done, the AWS CloudFormation stacks are torn down immediately. The test results written to the Amazon RDS database must remain accessible for visualization and analysis. Select possible solutions that allow access to the test results after the AWS CloudFormation load -testing stack is deleted. Choose 2 answers. [PROFESSIONAL]
    1. Define a deletion policy of type Retain for the Amazon QDS resource to assure that the RDS database is not deleted with the AWS CloudFormation stack.
    2. Define a deletion policy of type Snapshot for the Amazon RDS resource to assure that the RDS database can be restored after the AWS CloudFormation stack is deleted.
    3. Define automated backups with a backup retention period of 30 days for the Amazon RDS database and perform point -in -time recovery of the database after the AWS CloudFormation stack is deleted. (as the environment is required for limited time the automated backup will not serve the purpose)
    4. Define an Amazon RDS Read-Replica in the load-testing AWS CloudFormation stack and define a dependency relation between master and replica via the DependsOn attribute. (read replica not needed and will be deleted when the stack is deleted)
    5. Define an update policy to prevent deletion of the Amazon RDS database after the AWS CloudFormation stack is deleted. (UpdatePolicy does not apply to RDS)
  12. When working with AWS CloudFormation Templates what is the maximum number of stacks that you can create?
    1. 5000
    2. 500
    3. 2000 (Refer link – The limit keeps on changing to check for the latest)
    4. 100
  13. What happens, by default, when one of the resources in a CloudFormation stack cannot be created?
    1. Previously created resources are kept but the stack creation terminates
    2. Previously created resources are deleted and the stack creation terminates
    3. Stack creation continues, and the final results indicate which steps failed
    4. CloudFormation templates are parsed in advance so stack creation is guaranteed to succeed.
  14. You need to deploy an AWS stack in a repeatable manner across multiple environments. You have selected CloudFormation as the right tool to accomplish this, but have found that there is a resource type you need to create and model, but is unsupported by CloudFormation. How should you overcome this challenge? [PROFESSIONAL]
    1. Use a CloudFormation Custom Resource Template by selecting an API call to proxy for create, update, and delete actions. CloudFormation will use the AWS SDK, CLI, or API method of your choosing as the state transition function for the resource type you are modeling.
    2. Submit a ticket to the AWS Forums. AWS extends CloudFormation Resource Types by releasing tooling to the AWS Labs organization on GitHub. Their response time is usually 1 day, and they complete requests within a week or two.
    3. Instead of depending on CloudFormation, use Chef, Puppet, or Ansible to author Heat templates, which are declarative stack resource definitions that operate over the OpenStack hypervisor and cloud environment.
    4. Create a CloudFormation Custom Resource Type by implementing create, update, and delete functionality, either by subscribing a Custom Resource Provider to an SNS topic, or by implementing the logic in AWS Lambda. (Refer link)
  15. What is a circular dependency in AWS CloudFormation?
    1. When a Template references an earlier version of itself.
    2. When Nested Stacks depend on each other.
    3. When Resources form a DependOn loop. (Refer link, to resolve a dependency error, add a DependsOn attribute to resources that depend on other resources in the template. Some cases for e.g. EIP and VPC with IGW where EIP depends on IGW need explicitly declaration for the resources to be created in correct order)
    4. When a Template references a region, which references the original Template.
  16. You need to run a very large batch data processing job one time per day. The source data exists entirely in S3, and the output of the processing job should also be written to S3 when finished. If you need to version control this processing job and all setup and teardown logic for the system, what approach should you use?
    1. Model an AWS EMR job in AWS Elastic Beanstalk. (cannot directly model EMR Clusters)
    2. Model an AWS EMR job in AWS CloudFormation. (EMR cluster can be modeled using CloudFormation. Refer link)
    3. Model an AWS EMR job in AWS OpsWorks. (cannot directly model EMR Clusters)
    4. Model an AWS EMR job in AWS CLI Composer. (does not exist)
  17. Your company needs to automate 3 layers of a large cloud deployment. You want to be able to track this deployment’s evolution as it changes over time, and carefully control any alterations. What is a good way to automate a stack to meet these requirements? [PROFESSIONAL]
    1. Use OpsWorks Stacks with three layers to model the layering in your stack.
    2. Use CloudFormation Nested Stack Templates, with three child stacks to represent the three logical layers of your cloud. (CloudFormation allows source controlled, declarative templates as the basis for stack automation and Nested Stacks help achieve clean separation of layers while simultaneously providing a method to control all layers at once when needed)
    3. Use AWS Config to declare a configuration set that AWS should roll out to your cloud.
    4. Use Elastic Beanstalk Linked Applications, passing the important DNS entries between layers using the metadata interface.
  18. You have been asked to de-risk deployments at your company. Specifically, the CEO is concerned about outages that occur because of accidental inconsistencies between Staging and Production, which sometimes cause unexpected behaviors in Production even when Staging tests pass. You already use Docker to get high consistency between Staging and Production for the application environment on your EC2 instances. How do you further de-risk the rest of the execution environment, since in AWS, there are many service components you may use beyond EC2 virtual machines? [PROFESSIONAL]
    1. Develop models of your entire cloud system in CloudFormation. Use this model in Staging and Production to achieve greater parity. (Only CloudFormation’s JSON Templates allow declarative version control of repeatedly deployable models of entire AWS clouds. Refer link)
    2. Use AWS Config to force the Staging and Production stacks to have configuration parity. Any differences will be detected for you so you are aware of risks.
    3. Use AMIs to ensure the whole machine, including the kernel of the virual machines, is consistent, since Docker uses Linux Container (LXC) technology, and we need to make sure the container environment is consistent.
    4. Use AWS ECS and Docker clustering. This will make sure that the AMIs and machine sizes are the same across both environments.
  19. Which code snippet below returns the URL of a load balanced web site created in CloudFormation with an AWS::ElasticLoadBalancing::LoadBalancer resource name “ElasticLoad Balancer”? [Developer]
    1. “Fn::Join” : [“”, [ “http://”, {“Fn::GetAtt” : [ “ElasticLoadBalancer”,”DNSName”]}]] (Refer link)
    2. “Fn::Join” : [“”,[ “http://”, {“Fn::GetAtt” : [ “ElasticLoadBalancer”,”Url”]}]]
    3. “Fn::Join” : [“”, [ “http://”, {“Ref” : “ElasticLoadBalancerUrl”}]]
    4. “Fn::Join” : [“”, [ “http://”, {“Ref” : “ElasticLoadBalancerDNSName”}]]
  20. For AWS CloudFormation, which stack state refuses UpdateStack calls? [Developer]
    1. <code>UPDATE_ROLLBACK_FAILED</code> (Refer link)
    2. <code>UPDATE_ROLLBACK_COMPLETE</code>
    3. <code>UPDATE_COMPLETE</code>
    4. <code>CREATE_COMPLETE</code>
  21. Which of these is not a Pseudo Parameter in AWS CloudFormation? [Developer]
    1. AWS::StackName
    2. AWS::AccountId
    3. AWS::StackArn (Refer link)
    4. AWS::NotificationARNs
  22. Which of these is not an intrinsic function in AWS CloudFormation? [Developer]
    1. Fn::SplitValue (Refer link)
    2. Fn::FindInMap
    3. Fn::Select
    4. Fn::GetAZs
  23. Which of these is not a CloudFormation Helper Script? [Developer]
    1. cfn-signal
    2. cfn-hup
    3. cfn-request (Refer link)
    4. cfn-get-metadata
  24. What method should I use to author automation if I want to wait for a CloudFormation stack to finish completing in a script? [Developer]
    1. Event subscription using SQS.
    2. Event subscription using SNS.
    3. Poll using <code>ListStacks</code> / <code>list-stacks</code>. (Only polling will make a script wait to complete. ListStacks / list-stacks is a real method. Refer link)
    4. Poll using <code>GetStackStatus</code> / <code>get-stack-status</code>. (GetStackStatus / get-stack-status does not exist)
  25. Which status represents a failure state in AWS CloudFormation? [Developer]
    1. <code>UPDATE_COMPLETE_CLEANUP_IN_PROGRESS</code> (UPDATE_COMPLETE_CLEANUP_IN_PROGRESS means an update was successful, and CloudFormation is deleting any replaced, no longer used resources)
    2. <code>DELETE_COMPLETE_WITH_ARTIFACTS</code> (DELETE_COMPLETE_WITH_ARTIFACTS does not exist)
    3. <code>ROLLBACK_IN_PROGRESS</code> (ROLLBACK_IN_PROGRESS means an UpdateStack operation failed and the stack is in the process of trying to return to the valid, pre-update state Refer link)
    4. <code>ROLLBACK_FAILED</code> (ROLLBACK_FAILED is not a CloudFormation state but UPDATE_ROLLBACK_FAILED is)
  26. Which of these is not an intrinsic function in AWS CloudFormation? [Developer]
    1. Fn::Equals
    2. Fn::If
    3. Fn::Not
    4. Fn::Parse (Complete list of Intrinsic Functions: Fn::Base64, Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or, Fn::FindInMap, Fn::GetAtt, Fn::GetAZs, Fn::Join, Fn::Select, Refer link)
  27. You need to create a Route53 record automatically in CloudFormation when not running in production during all launches of a Template. How should you implement this? [Developer]
    1. Use a <code>Parameter</code> for <code>environment</code>, and add a <code>Condition</code> on the Route53 <code>Resource</code> in the template to create the record only when <code>environment</code> is not <code>production</code>. (Best way to do this is with one template, and a Condition on the resource. Route53 does not allow null strings for Refer link)
    2. Create two templates, one with the Route53 record value and one with a null value for the record. Use the one without it when deploying to production.
    3. Use a <code>Parameter</code> for <code>environment</code>, and add a <code>Condition</code> on the Route53 <code>Resource</code> in the template to create the record with a null string when <code>environment</code> is <code>production</code>.
    4. Create two templates, one with the Route53 record and one without it. Use the one without it when deploying to production.

References

AWS Organizations Service Control Policies – SCPs

AWS Organizations Service Control Policies

  • AWS Organizations Service control policies – SCPs offer central control over the maximum available permissions for all of the accounts in the organization, ensuring member accounts stay within the organization’s access control guidelines.
  • are one type of policy that help manage the organization.
  • are available only in an organization that has all features enabled, and aren’t available if the organization has enabled only the consolidated billing features.
  • are NOT sufficient for granting access to the accounts in the organization.
  • defines a guardrail for what actions accounts within the organization root or OU can do, but IAM policies need to be attached to the users and roles in the organization’s accounts to grant permissions to them.
  • Effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
  • with an SCP attached to member accounts, identity-based and resource-based policies grant permissions to entities only if those policies and the SCP allow the action.
  • don’t affect users or roles in the management account. They affect only the member accounts in your organization.

SCPs Effects on Permissions

  • never grant permissions but define the maximum permissions for the affected accounts.
  • Users and roles must still be granted permissions with appropriate IAM permission policies. A user without any IAM permission policies has no access at all, even if the applicable SCPs allow all services and all actions.
  • limits permissions for entities in member accounts, including each AWS account root user.
  • does not limit actions performed by the master or management account.
  • does not affect any service-linked role. Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
  • affect only IAM users or roles that are managed by accounts that are part of the organization. They don’t affect users or roles from accounts outside the organization.
  • don’t affect resource-based policies directly.

SCPs Strategies

  • By default, an SCP named FullAWSAccess is attached to every root, OU, and account, which allows all actions and all services.
  • Blacklist or Deny Strategy
    • actions are allowed by default and services and actions to be prohibited need to be specified.
    • blacklist permissions using deny statements can be assigned in combination with the default FullAWSAccess SCP.
    • using deny statements in SCPs require less maintenance because they don’t need to be updated when AWS adds new services.
    • deny statements usually use less space, thus making it easier to stay within SCP size limits.
  • Whitelist or Allow Strategy
    • actions are prohibited by default, and you specify what services and actions are allowed.
    • whitelist permissions can be assigned, by removing the default FullAWSAccess SCP.
    • allows SCP that explicitly permits only those allowed services and actions

SCPs Testing Effects

  • don’t attach SCPs to the root of the organization without thoroughly testing the impact that the policy has on accounts.
  • Create an OU that the accounts can be moved into one at a time, or at least in small numbers, to ensure that users are not inadvertently locked out of key services.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company is planning on setting up multiple accounts in AWS. The IT Security department has a requirement to ensure that certain services and actions are not allowed across all accounts. How would the system admin achieve this in the most EFFECTIVE way possible?
    1. Create a common IAM policy that can be applied across all accounts
    2. Create an IAM policy per account and apply them accordingly​
    3. Deny the services to be used across accounts by contacting AWS​ support
    4. Use AWS Organizations and Service Control Policies
  2. You are in the process of implementing AWS Organizations for your company. At your previous company, you saw an Organizations implementation go bad when an SCP (Service Control Policy) was applied at the root of the organization before being thoroughly tested. In what way can an SCP be properly tested and implemented?
    1. Back up your entire Organization to S3 and restore rollback and restore if something goes wrong
    2. The SCP must be verified with AWS before it is implemented to avoid any problems.
    3. Mirror your Organizational Unit in another region. Apply the SCP and test it. Once testing is complete, attach the SCP to the root of your organization.
    4. Create an Organizational Unit (OU). Attach the SCP to this new OU. Move your accounts in one at a time to ensure that you don’t inadvertently lock users out of key services.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

AWS Data Analytics - Specialty DAS-C01 Certificate

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

  • Recertified with the AWS Certified Data Analytics – Specialty (DAS-C01) which tends to cover a lot of big data topics focused on AWS services.
  • Data Analytics – Specialty (DAS-C01) has replaced the previous Big Data – Specialty (BDS-C01).

AWS Certified Data Analytics – Specialty (DAS-C01) exam basically validates

  • Define AWS data analytics services and understand how they integrate with each other.
  • Explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization.

Refer AWS Certified Data Analytics – Specialty Exam Guide for details

AWS Certified Data Analytics - Specialty DAS-C01 Domains

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Resources

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Summary

  • Specialty exams are tough, lengthy, and tiresome. Most of the questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • DAS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
  • DAS-C01 exam includes two types of questions, multiple-choice and multiple-response.
  • DAS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Specialty exams currently cost $ 300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review and move on and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.
  • AWS exams can be taken either remotely or online, I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Topics

  • AWS Certified Data Analytics – Specialty exam, as its name suggests, covers a lot of Big Data concepts right from data collection, ingestion, transfer, storage, pre and post-processing, analytics, and visualization with the added concepts for data security at each layer.

Analytics

  • Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Glue, Kinesis, and Redshift.
  • AWS Analytics Services Cheat Sheet
  • Glue
    • DAS-C01 covers Glue in great detail.
    • AWS Glue is a fully managed, ETL service that automates the time-consuming steps of data preparation for analytics.
    • supports server-side encryption for data at rest and SSL for data in motion.
    • Glue ETL engine to Extract, Transform, and Load data that can automatically generate Scala or Python code.
    • Glue Data Catalog is a central repository and persistent metadata store to store structural and operational metadata for all the data assets. It works with Apache Hive as its metastore.
    • Glue Crawlers scan various data stores to automatically infer schemas and partition structures to populate the Data Catalog with corresponding table definitions and statistics.
    • Glue Job Bookmark tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run.
    • Glue Streaming ETL enables performing ETL operations on streaming data using continuously-running jobs.
    • Glue provides flexible scheduler that handles dependency resolution, job monitoring, and retries.
    • Glue Studio offers a graphical interface for authoring AWS Glue jobs to process data allowing you to define the flow of the data sources, transformations, and targets in the visual interface and generating Apache Spark code on your behalf.
    • Glue Data Quality helps reduces manual data quality efforts by automatically measuring and monitoring the quality of data in data lakes and pipelines.
    • Glue DataBrew helps prepare, visualize, clean, and normalize data directly from the data lake, data warehouses, and databases, including S3, Redshift, Aurora, and RDS.
  • Redshift
    • Redshift is also covered in depth.
    • Cover Redshift Advanced topics
      • Redshift Distribution Style determines how data is distributed across compute nodes and helps minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed.
      • Redshift Enhanced VPC routing forces all COPY and UNLOAD traffic between the cluster and the data repositories through the VPC.
      • Workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.
      • Redshift Spectrum helps query and retrieve structured and semistructured data from files in S3 without having to load the data into Redshift tables.
      • Federated Query feature allows querying and analyzing data across operational databases, data warehouses, and data lakes.
      • Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries.
      • Redshift Serverless is a serverless option of Redshift that makes it more efficient to run and scale analytics in seconds without the need to set up and manage data warehouse infrastructure.
    • Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
      • COPY command which allows parallelism, and performs better than multiple COPY commands
      • COPY command can use manifest files to load data
      • COPY command handles encrypted data
    • Redshift Resizing cluster options (elastic resize did not support node type changes before, but does now)
    • Redshift supports encryption at rest and in transit
    • Redshift supports encrypting an unencrypted cluster using KMS. However, you can’t enable hardware security module (HSM) encryption by modifying the cluster. Instead, create a new, HSM-encrypted cluster and migrate your data to the new cluster.
    • Know Redshift views to control access to data.
  • Elastic Map Reduce
    • Understand EMRFS
      • Use Consistent view to make sure S3 objects referred by different applications are in sync. Although, it is not needed now.
    • Know EMR Best Practices (hint: start with many small nodes instead of few large nodes)
    • Know EMR Encryption options
      • supports SSE-S3, SS3-KMS, CSE-KMS, and CSE-Custom encryption for EMRFS
      • supports LUKS encryption for local disks
      • supports TLS for data in transit encryption
      • supports EBS encryption
    • Hive metastore can be externally hosted using RDS, Aurora, and AWS Glue Data Catalog
    • Know also different technologies
      • Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
      • Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
      • Zeppelin/Jupyter as a notebook for interactive data exploration and provides open-source web application that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
      • Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
  • Kinesis
    • Understand Kinesis Data Streams and Kinesis Data Firehose in depth
    • Know Kinesis Data Streams vs Kinesis Firehose
      • Know Kinesis Data Streams is open-ended for both producer and consumer. It supports KCL and works with Spark.
      • Know Kinesis Firehose is open-ended for producers only. Data is stored in S3, Redshift, and OpenSearch.
      • Kinesis Firehose works in batches with minimum 60secs intervals and in near-real time.
      • Kinesis Firehose supports out-of-the-box transformation and  custom transformation using Lambda
    • Kinesis supports encryption at rest using server-side encryption
    • Kinesis Producer Library supports batching
    • Kinesis Data Analytics
      • helps transform and analyze streaming data in real time using Apache Flink.
      • supports anomaly detection using Random Cut Forest ML
      • supports reference data stored in S3.
  • OpenSearch
    • OpenSearch is a search service that supports indexing, full-text search, faceting, etc.
    • OpenSearch can be used for analysis and supports visualization using OpenSearch Dashboards which can be real-time.
    • OpenSearch Service Storage tiers support Hot, UltraWarm, and Cold and the data can be transitioned using Index State management.
  • QuickSight
    • Know Visual Types (hint: esp. word clouds, plotting line, bar, and story based visualizations)
    • Know Supported Data Sources
    • QuickSight provides IP addresses that need to be whitelisted for QuickSight to access the data store.
    • QuickSight provides direct integration with Microsoft AD
    • QuickSight supports Row level security using dataset rules to control access to data at row granularity based on permissions associated with the user interacting with the data.
    • QuickSight supports ML insights as well
    • QuickSight supports users defined via IAM or email signup.
  • Athena
    • is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
    • provides a simplified, flexible way to analyze data in an S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python without loading the data.
    • integrates with QuickSight for visualizing the data or creating dashboards.
    • uses a managed Glue Data Catalog to store information and schemas about the databases and tables that you create for the data stored in S3
    • Workgroups can be used to separate users, teams, applications, or workloads, to set limits on the amount of data each query or the entire workgroup can process, and to track costs.
    • Athena best practices recommended partitioning the data, partition projection, and using the Columnar file format like ORC or Parquet as they support compression and are splittable.
  • Know Data Pipeline for data transfer

Security, Identity & Compliance

Management & Governance Tools

  • Understand AWS CloudWatch for Logs and Metrics.
  • CloudWatch Subscription Filters can be used to route data to Kinesis Data Streams, Kinesis Data Firehose, and Lambda.

Whitepapers and articles

On the Exam Day

  • Make sure you are relaxed and get some good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you would not be allowed to take the take if you are late by more than 30 minutes.
    • Make sure you have your desk clear, no hand-watches, or external monitors, keep your phones away, and nobody can enter the room.

Finally, All the Best 🙂

AWS Directory Services

AWS Directory Services

  • AWS Directory Services is a managed service offering, providing directories that contain information about the organization, including users, groups, computers, and other resources.
  • AWS Directory Services provides multiple ways including
    • Simple AD – a standalone directory service
    • AD Connector – acts as a proxy to use On-Premise Microsoft Active Directory with other AWS services.
    • AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also referred to as Microsoft AD

Simple AD

  • is a Microsoft Active Directory compatible directory from AWS Directory Service that is powered by Samba 4.
  • is the least expensive option and the best choice if there are 5,000 or fewer users & don’t need the more advanced Microsoft Active Directory features.
  • supports commonly used Active Directory features such as user accounts, group memberships, domain-joining EC2 instances running Linux and Windows, Kerberos-based single sign-on (SSO), and group policies.
  • does not support features like DNS dynamic update, schema extensions, multi-factor authentication, communication over LDAPS, PowerShell AD cmdlets, and the transfer of FSMO roles
  • provides daily automated snapshots to enable point-in-time recovery
  • Trust relationships between Simple AD and other Active Directory domains cannot be set up.
  • does not support MFA, RDS SQL Server, or AWS SSO.

AD Connector

  • helps connect to an existing on-premises Active Directory to AWS
  • is the best choice to leverage an existing on-premises directory with AWS services
  • requires VPN or Direct Connect connection
  • is a proxy service for connecting on-premises Microsoft Active Directory to AWS without requiring complex directory synchronization technologies or the cost and complexity of hosting a federation infrastructure
  • forwards sign-in requests to the Active Directory domain controllers for authentication and provides the ability for applications to query the directory for data
  • enables consistent enforcement of existing security policies, such as password expiration, password history, and account lockouts, whether users are accessing resources on-premises or in the AWS cloud

Microsoft Active Directory (Enterprise Edition)

  • is a feature-rich managed Microsoft Active Directory hosted on AWS
  • is the best choice if there are more than 5,000 users
  • supports trust relationship (forest trust) set up between an AWS-hosted directory and on-premises directories providing users and groups with access to resources in either domain, using single sign-on (SSO) without the need to synchronize or replicate the users, groups, or passwords.
  • requires a VPN or Direct Connect connection.
  • provides much of the functionality offered by Microsoft Active Directory plus integration with AWS applications.
  • provides a highly available pair of domain controllers running in different AZs connected to the VPC in a Region of your choice.
  • supports MFA by integrating with an existing RADIUS-based MFA infrastructure to provide an additional layer of security when users access AWS applications.
  • automatically configures and manages host monitoring and recovery, data replication, snapshots, and software updates.
  • supports RDS for SQL Server, AWS Workspaces, Quicksight, WorkDocs, etc.

AWS Directory Services - Microsoft AD Use Cases

Microsoft AD Connectivity Options

  • If the VGW is used to connect to the On-Premise AD is not stable or has connectivity issues, the following options can be explored
    • Simple AD
      • lower cost, low scale, basic AD compatible, or LDAP compatibility
      • provides a standalone instance for the Microsoft AD in AWS
      • No single point of Authentication or Authorization, as a separate copy is maintained
      • trust relationships cannot be set up between Simple AD and other Active Directory domains
    • Read-only Domain Controllers (RODCs)
      • works out as a Read-only Active Directory
      • holds a copy of the Active Directory Domain Service (AD DS) database and responds to authentication requests.
      • are typically deployed in locations where physical security cannot be guaranteed.
      • they cannot be written to by applications or other servers.
      • helps maintain a single point to authentication & authorization controls, however, needs to be synced.
    • Writable Domain Controllers
      • are expensive to setup
      • operate in a multi-master model; changes can be made on any writable server in the forest, and those changes are replicated to servers throughout the entire forest

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. The majority of your Infrastructure is on-premises and you have a small footprint on AWS. Your company has decided to roll out a new application that is heavily dependent on low latency connectivity to LDAP for authentication. Your security policy requires minimal changes to the company’s existing application user management processes. What option would you implement to successfully launch this application?
    1. Create a second, independent LDAP server in AWS for your application to use for authentication (independent would not work for authentication as its a separate copy)
    2. Establish a VPN connection so your applications can authenticate against your existing on-premises LDAP servers (not a low latency solution)
    3. Establish a VPN connection between your data center and AWS create an LDAP replica on AWS and configure your application to use the LDAP replica for authentication (RODCs low latency and minimal setup)
    4. Create a second LDAP domain on AWS establish a VPN connection to establish a trust relationship between your new and existing domains and use the new domain for authentication (Not minimal effort)
  2. A company is preparing to give AWS Management Console access to developers Company policy mandates identity federation and role-based access control. Roles are currently assigned using groups in the corporate Active Directory. What combination of the following will give developers access to the AWS console? (Select 2) Choose 2 answers
    1. AWS Directory Service AD Connector (for Corporate Active directory)
    2. AWS Directory Service Simple AD
    3. AWS Identity and Access Management groups
    4. AWS Identity and Access Management roles
    5. AWS Identity and Access Management users
  3. An Enterprise customer is starting their migration to the cloud, their main reason for migrating is agility, and they want to make their internal Microsoft Active Directory available to any applications running on AWS; this is so internal users only have to remember one set of credentials and as a central point of user control for leavers and joiners. How could they make their Active Directory secure, and highly available, with minimal on-premises infrastructure changes, in the most cost and time-efficient way? Choose the most appropriate
    1. Using Amazon Elastic Compute Cloud (EC2), they would create a DMZ using a security group; within the security group they could provision two smaller Amazon EC2 instances that are running Openswan for resilient IPSEC tunnels, and two larger instances that are domain controllers; they would use multiple Availability Zones (Whats Openswan? Refer Implementation)
    2. Using VPC, they could create an extension to their data center and make use of resilient hardware IPSEC tunnels; they could then have two domain controller instances that are joined to their existing domain and reside within different subnets, in different Availability Zones (highly available with 2 AZ’s, secure with VPN connection and minimal changes)
    3. Within the customer’s existing infrastructure, they could provision new hardware to run Active Directory Federation Services; this would present Active Directory as a SAML2 endpoint on the internet; any new application on AWS could be written to authenticate using SAML2 (not minimal on-premises hardware changes)
    4. The customer could create a stand-alone VPC with its own Active Directory Domain Controllers; two domain controller instances could be configured, one in each Availability Zone; new applications would authenticate with those domain controllers (not a central location, but a copy)
  4. A company needs to deploy virtual desktops to its customers in a virtual private cloud, leveraging existing security controls. Which set of AWS services and features will meet the company’s requirements?
    1. Virtual Private Network connection. AWS Directory Services, and ClassicLink (ClassicLink allows you to link an EC2-Classic instance to a VPC in your account, within the same region)
    2. Virtual Private Network connection. AWS Directory Services, and Amazon Workspaces (WorkSpaces for Virtual desktops, and AWS Directory Services to authenticate to an existing on-premises AD through VPN)
    3. AWS Directory Service, Amazon Workspaces, and AWS Identity and Access Management (AD service needs a VPN connection to interact with an On-premise AD directory)
    4. Amazon Elastic Compute Cloud, and AWS Identity and Access Management (Need WorkSpaces for virtual desktops)
  5. An Enterprise customer is starting their migration to the cloud, their main reason for migrating is agility and they want to make their internal Microsoft active directory available to any applications running on AWS, this is so internal users only have to remember one set of credentials and as a central point of user control for leavers and joiners. How could they make their active directory secure and highly available with minimal on-premises infrastructure changes in the most cost and time-efficient way? Choose the most appropriate:
    1. Using Amazon EC2, they could create a DMZ using a security group, within the security group they could provision two smaller Amazon EC2 instances that are running Openswan for resilient IPSEC tunnels and two larger instances that are domain controllers, they would use multiple availability zones.
    2. Using VPC, they could create an extension to their data center and make use of resilient hardware IPSEC tunnels, they could then have two domain controller instances that are joined to their existing domain and reside within different subnets in different availability zones.
    3. Within the customer’s existing infrastructure, they could provision new hardware to run active directory federation services, this would present active directory as a SAML2 endpoint on the internet and any new application on AWS could be written to authenticate using SAML2 (not a  minimal change to the existing infrastructure)
    4. The customer could create a stand alone VPC with its own active directory domain controllers, two domain controller instances could be configured, one in each availability zone, new applications would authenticate with those domain controllers. (Standalone cannot use the same security)
  6. You run a 2000-engineer organization. You are about to begin using AWS at a large scale for the first time. You want to integrate with your existing identity management system running on Microsoft Active Directory because your organization is a power-user of Active Directory. How should you manage your AWS identities in the simplest manner?
    1. Use a large AWS Directory Service Simple AD.
    2. Use a large AWS Directory Service AD Connector. (AD Connector can be used as power-user of Microsoft Active Directory. Simple AD only works with a subset of AD functionality)
    3. Use a Sync Domain running on AWS Directory Service.
    4. Use an AWS Directory Sync Domain running on AWS Lambda.

References

AWS Security Hub

AWS Security Hub

  • AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.
  • collects security data from across AWS accounts, services, and supported third-party partner products and helps analyze the security trends and identify the highest priority security issues.
  • is Regional and only receives and processes findings from the Region where the Security Hub is enabled. However, it supports cross-region aggregation of findings via the designation of an aggregator region.
  • must be enabled in each region to view findings in that region.
  • automatically runs continuous, account-level configuration and security checks based on AWS best practices and industry standards which include
    • CIS AWS Foundations
    • Payment Card Industry Data Security Standard (PCI DSS)
    • AWS Foundational Security Best Practices
  • can consume, aggregate, organize, and prioritize findings from
  • consolidates the security findings across accounts and provider products and displays results on the Security Hub console.
  • supports integration with Amazon EventBridge. Custom actions can be defined when a finding is received.
  • only detects and consolidates findings that are generated after the Security Hub is enabled.
  • has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
  • uses service-linked AWS Config rules to perform most of its security checks for controls. AWS Config must be enabled on all accounts – both the administrator account and member accounts – in each Region where Security Hub is enabled.
  • works with a service-linked role named AWSServiceRoleForSecurityHub which includes the permissions and trust policy to do the following:
    • Detect and aggregate findings from Amazon GuardDuty, Amazon Inspector, and Amazon Macie
    • Configure the requisite AWS Config infrastructure to run security checks for the supported standards

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A security engineer has been asked to continuously monitor the company’s AWS account using automated compliance checks based on AWS best practices and Center for Internet Security (CIS) AWS Foundations Benchmarks. How can the security engineer accomplish this using AWS services?
    1. AWS Config + AWS Security Hub
    2. Amazon Inspector + AWS GuardDuty
    3. Amazon Inspector + AWS Shield
    4. AWS Config + Amazon Inspector

References

AWS_Security_Hub

Amazon GuardDuty

Amazon GuardDuty

Amazon GuardDuty

  • Amazon GuardDuty is a threat detection service that continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
  • is a continuous security monitoring service that analyzes and processes the following data sources:
  • uses threat intelligence feeds, such as lists of malicious IP addresses and domains, and machine learning to identify unexpected and potentially unauthorized and malicious activity within the AWS environment.
  • combines machine learning, anomaly detection, network monitoring, and malicious file discovery, utilizing both AWS-developed and industry-leading third-party sources to help protect workloads and data on AWS
  • is a Regional service and is recommended to be enabled in all supported AWS Regions. This helps generate findings of unauthorized or unusual activity even in Regions not actively used.
  • does not look at historical data, it monitors only the activity that starts after it is enabled.
  • operates completely independent of the AWS resources and therefore has no impact on the performance or availability of the accounts or workloads.
  • GuardDuty supports
    • Suppression rules, allow the creation of very specific combinations of attributes to suppress findings.
    • Trusted IP List for highly secure communication with the AWS environment. Findings are not generated based on trusted IP lists.
    • Threat List for known malicious IP addresses. Findings are generated based on threat lists.
  • Security findings are retained and made available through the GuardDuty console and APIs for 90 days, after which they are discarded.
  • Findings are assigned a severity, and actions can be automated by integrating with Security Hub, EventBridge, Lambda, and Step Functions.
  • Amazon Detective is also tightly integrated with GuardDuty which helps perform deeper forensic and root cause investigations.
  • GuardDuty Malware Protection feature helps to detect malicious files on EBS volumes attached to an EC2 instance and container workloads.

Amazon GuardDuty

GuardDuty with Multiple Accounts

  • GuardDuty has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
  • The delegated administrator (DA) account is a centralized account that consolidates all findings and can configure all member accounts.
  • The administrator account helps to associate and manage multiple AWS accounts.
  • All security findings are aggregated to the administrator account for review and remediation.
  • CloudWatch Events are also aggregated to the administrator account when using this configuration.

GuardDuty Automated Remediation

  • GuardDuty security findings can be remediated automatically using EventBridge and AWS Lambda
  • For example, a Lambda function can be created to modify the AWS security group rules based on security findings. For a GuardDuty finding indicating one of your EC2 instances is being probed by a known malicious IP, the address can be added through an EventBridge rule, initiating a Lambda function to automatically modify the security group rules and restrict access on that port.

GuardDuty Malware Protection

  • GuardDuty Malware Protection helps scan EBS volume data for possible malware and identifies suspicious behavior indicative of malicious software in EC2 instances or container workloads.
  • is optimized to consume large data volumes for near real-time processing of security detections.
  • scans a replica EBS volume that GuardDuty generates based on the snapshot of the EBS volume for trojans, worms, crypto miners, rootkits, bots, and more.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Which AWS service makes detecting and reporting unexpected and potentially malicious activity in your AWS environment easy?
    1. AWS Shield
    2. AWS Inspector
    3. AWS GuardDuty
    4. AWS WAF

References

Amazon_GuardDuty

AWS Redshift Advanced

AWS Redshift Advanced

  • Redshift Distribution Style determines how data is distributed across compute nodes and helps minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed.
  • Redshift enhanced VPC routing forces all COPY and UNLOAD traffic between the cluster and the data repositories through the VPC.
  • Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.
  • Redshift Spectrum helps query and retrieve structured and semistructured data from files in S3 without having to load the data into Redshift tables.
  • Redshift Federated Query feature allows querying and analyzing data across operational databases, data warehouses, and data lakes.

Distribution Styles

  • Table distribution style determines how data is distributed across compute nodes and helps minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed.
  • Redshift supports four distribution styles; AUTO, EVEN, KEY, or ALL.

KEY distribution

  • A single column acts as a distribution key (DISTKEY) and helps place matching values on the same node slice.
  • As a rule of thumb, choose a column that:
    • Is uniformly distributed – Otherwise skew data will cause unbalances in the volume of data that will be stored in each compute node leading to undesired situations where some slices will process bigger amounts of data than others and causing bottlenecks.
    • acts as a JOIN column – for tables related to dimensions tables (star-schema), it is better to choose as DISTKEY the field that acts as the JOIN field with the larger dimension table, so that matching values from the common columns are physically stored together, reducing the amount of data that needs to be broadcasted through the network.

EVEN distribution

  • distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column
  • Choose EVEN distribution
    • when the table does not participate in joins
    • when there is not a clear choice between KEY and ALL distribution.

ALL distribution

  • Whole table is replicated in every compute node.
  • ensures that every row is collocated for every join that the table participates in.
  • ideal for relatively slow-moving tables, tables that are not updated frequently or extensively.
  • Small dimension tables DO NOT benefit significantly from ALL distribution, because the cost of redistribution is low.

AUTO distribution

  • Redshift assigns an optimal distribution style based on the size of the table data for e.g. apply ALL distribution for a small table and as it grows changes it to Even distribution
  • Amazon Redshift applies AUTO distribution, by default.

Sort Key

  • Sort keys define the order in which the data will be stored.
  • Sorting enables efficient handling of range-restricted predicates.
  • Only one sort key per table can be defined, but it can be composed of one or more columns.
  • Redshift stores columnar data in 1 MB disk blocks. The min and max values for each block are stored as part of the metadata. If the query uses a range-restricted predicate, the query processor can use the min and max values to rapidly skip over large numbers of blocks during table scans
  • The are two kinds of sort keys in Redshift: Compound and Interleaved.

Compound Keys

  • A compound key is made up of all of the columns listed in the sort key definition, in the order, they are listed.
  • A compound sort key is more efficient when query predicates use a prefix, or query’s filter applies conditions, such as filters and joins, which is a subset of the sort key columns in order.
  • Compound sort keys might speed up joins, GROUP BY and ORDER BY operations, and window functions that use PARTITION BY and ORDER BY.

Interleaved Sort Keys

  • An interleaved sort key gives equal weight to each column in the sort key, so query predicates can use any subset of the columns that make up the sort key, in any order.
  • An interleaved sort key is more efficient when multiple queries use different columns for filters.
  • Don’t use an interleaved sort key on columns with monotonically increasing attributes, such as identity columns, dates, or timestamps.
  • Use cases involve performing ad-hoc multi-dimensional analytics, which often requires pivoting, filtering, and grouping data using different columns as query dimensions.

Constraints

  • Redshift does not support Indexes.
  • Redshift supports UNIQUE, PRIMARY KEY, and FOREIGN KEY constraints, however, they are only for informational purposes.
  • Redshift does not perform integrity checks for these constraints and is used by the query planner, as hints, in order to optimize executions.
  • Redshift does enforce NOT NULL column constraints.

Redshift Enhanced VPC Routing

  • Redshift enhanced VPC routing forces all COPY and UNLOAD traffic between the cluster and the data repositories through the VPC.
  • Without enhanced VPC routing, Redshift would route traffic through the internet, including traffic to other services within the AWS network.

Redshift Workload Management

  • Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.
  • Redshift provides query queues, in order to manage concurrency and resource planning. Each queue can be configured with the following parameters:
    • Slots: number of concurrent queries that can be executed in this queue.
    • Working memory: percentage of memory assigned to this queue.
    • Max. Execution Time: the amount of time a query is allowed to run before it is terminated.
  • Queries can be routed to different queues using Query Groups and User Groups.
  • As a rule of thumb, it is considered a best practice to have separate queues for long running resource-intensive queries and fast queries that don’t require big amounts of memory and CPU.
  • By default, Redshift configures one queue with a concurrency level of five, which enables up to five queries to run concurrently, plus one predefined Superuser queue, with a concurrency level of one.
  • A maximum of eight queues can be defined, with each queue configured with a maximum concurrency level of 50. The maximum total concurrency level for all user-defined queues (not including the Superuser queue) is 50.
  • Redshift WLM supports two modes – Manual and Automatic
    • Automatic WLM supports queue priorities.

Redshift Short Query Acceleration – SQA

  • Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries.
  • SQA runs short-running queries in a dedicated space, so that SQA queries aren’t forced to wait in queues behind longer queries.
  • SQA only prioritizes queries that are short-running and are in a user-defined queue.

Redshift Loading Data

  • A COPY command is the most efficient way to load a table.
    • COPY command is able to read from multiple data files or multiple data streams simultaneously.
    • Redshift allocates the workload to the cluster nodes and performs the load operations in parallel, including sorting the rows and distributing data across node slices.
    • COPY command supports loading data from S3, EMR, DynamoDB, and remote hosts such as EC2 instances using SSH.
    • COPY supports decryption and can decrypt the data as it performs the load if the data is encrypted
    • COPY can then speed up the load process by uncompressing the files as they are read if the data is compressed.
    • COPY command can be used with COMPUPDATE set to ON to analyze and apply compression automatically based on sample data.
    • Optimizing storage for narrow tables (multiple rows few columns) by using Single COPY command instead of multiple COPY commands, as it would not work well due to hidden fields and compression issues.
  • Auto Copy
    • Auto-copy provides the ability to automate copy statements by tracking S3 folders and ingesting new files without customer intervention
    • Without Auto-copy, a copy statement immediately starts the file ingestion process for existing files.
    • Auto-copy extends the existing copy command and provides the ability to
      • Automate file ingestion process by monitoring specified S3 paths for new files
      • Re-use copy configurations, reducing the need to create and run new copy statements for repetitive ingestion tasks and
      • Keep track of loaded files to avoid data duplication.
  • INSERT command
    • Clients can connect to Amazon Redshift using ODBC or JDBC and issue ‘insert’ SQL commands to insert the data.
    • INSERT command is much less efficient than using COPY as they are routed through the single leader node.

Redshift Resizing Cluster

  • Elastic resize
    • Use elastic resize to change the node type, number of nodes, or both. (Circa April 2020 – Changing node type is available recently and was not supported before)
    • If only the number of nodes is changed, then queries are temporarily paused and connections are held open if possible.
    • During the resize operation, the cluster is read-only.
    • Elastic resize takes 10–15 minutes.
  • Classic resize
    • Use classic resize to change the node type, number of nodes, or both.
    • During the resize operation, data is copied to a new cluster and the source cluster is read-only
    • Classic resize takes 2 hours – 2 days or longer, depending on the data’s size
  • Snapshot and restore with classic resize
    • To keep the cluster available during a classic resize, create a snapshot , make a copy of an existing cluster, then resize the new cluster.

Redshift Spectrum

  • Redshift Spectrum helps query and retrieve structured and semistructured data from files in S3 without having to load the data into Redshift tables.
  • Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Much of the processing occurs in the Redshift Spectrum layer, and most of the data remains in S3.
  • Multiple clusters can concurrently query the same dataset in S3 without the need to make copies of the data for each cluster.
  • Redshift Spectrum resides on dedicated Redshift servers that are independent of the existing cluster.
  • Redshift Spectrum pushes many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer.
  • Redshift Spectrum also scales automatically, based on the demands of the queries, and can potentially use thousands of instances to take advantage of massively parallel processing.
  • Supports external data catalog using Glue, Athena, or Hive metastore
  • Redshift cluster and the S3 bucket must be in the same AWS Region.
  • Redshift Spectrum external tables are read-only. You can’t COPY or INSERT to an external table.

Redshift Federated Query

  • Redshift Federated Query feature allows querying and analyzing data across operational databases, warehouses, and lakes.
  • Redshift Federated Query allows integrating queries on live data in RDS for PostgreSQL and Aurora PostgreSQL with queries across Redshift and S3.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A Redshift data warehouse has different user teams that need to query the same table with very different query types. These user teams are experiencing poor performance. Which action improves performance for the user teams in this situation?
    1. Create custom table views.
    2. Add interleaved sort keys per team.
    3. Maintain team-specific copies of the table.
    4. Add support for workload management queue hopping.

Amazon Athena

Athena

Amazon Athena

  • Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
  • provides a simplified, flexible way to analyze petabytes of data in an S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python without loading the data.
  • is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.
  • is highly available and runs queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable
  • can process unstructured, semi-structured, and structured datasets.
  • integrates with QuickSight for visualizing the data or creating dashboards.
  • supports various standard data formats, including CSV, TSV, JSON, ORC, Avro, and Parquet.
  • supports compressed data in Snappy, Zlib, LZO, and GZIP formats. You can improve performance and reduce costs by compressing, partitioning, and using columnar formats.
  • can handle complex analysis, including large joins, window functions, and arrays
  • uses a managed Glue Data Catalog to store information and schemas about the databases and tables that you create for the data stored in S3
  • uses schema-on-read technology, which means that the table definitions are applied to the data in S3 when queries are being applied. There’s no data loading or transformation required. Table definitions and schema can be deleted without impacting the underlying data stored in S3.
  • supports fine-grained access control with AWS Lake Formation which allows for centrally managing permissions and access control for data catalog resources in the S3 data lake.

Athena
Source: Amazon

Athena Workgroups

  • Athena workgroups can be used to separate users, teams, applications, or workloads, to set limits on amount of data each query or the entire workgroup can process, and to track costs.
  • Resource-level identity-based policies can be used to control access to a specific workgroup.
  • Workgroups help view query-related metrics in CloudWatch, control costs by configuring limits on the amount of data scanned, create thresholds, and trigger actions, such as SNS, when these thresholds are breached.
  • Workgroups integrate with IAM, CloudWatch, Simple Notification Service, and AWS Cost and Usage Reports as follows:
    • IAM identity-based policies with resource-level permissions control who can run queries in a workgroup.
    • Athena publishes the workgroup query metrics to CloudWatch if you enable query metrics.
    • SNS topics can be created that issue alarms to specified workgroup users when data usage controls for queries in a workgroup exceed the established thresholds.
    • Workgroup tag can be configured as a cost allocation tag in the Billing and Cost Management console and the costs associated with running queries in that workgroup appear in the Cost and Usage Reports with that cost allocation tag.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A SysOps administrator is storing access logs in Amazon S3 and wants to use standard SQL to query data and generate a report
    without having to manage infrastructure. Which AWS service will allow the SysOps administrator to accomplish this task?

    1. Amazon Inspector
    2. Amazon CloudWatch
    3. Amazon Athena
    4. Amazon RDS
  2. A Solutions Architect must design a storage solution for incoming billing reports in CSV format. The data does not need to be
    scanned frequently and is discarded after 30 days. Which service will be MOST cost-effective in meeting these requirements?

    1. Import the logs into an RDS MySQL instance
    2. Use AWS Data pipeline to import the logs into a DynamoDB table
    3. Write the files to an S3 bucket and use Amazon Athena to query the data
    4. Import the logs to an Amazon Redshift cluster

References

Amazon_Athena

AWS Data Pipeline

AWS Data Pipeline

  • AWS Data Pipeline is a web service that makes it easy to automate and schedule regular data movement and data processing activities in AWS
  • helps define data-driven workflows
  • integrates with on-premises and cloud-based storage systems
  • helps quickly define a pipeline, which defines a dependent chain of data sources, destinations, and predefined or custom data processing activities
  • supports scheduling where the pipeline regularly performs processing activities such as distributed data copy, SQL transforms, EMR applications, or custom scripts against destinations such as S3, RDS, or DynamoDB.
  • ensures that the pipelines are robust and highly available by executing the scheduling, retry, and failure logic for the workflows as a highly scalable and fully managed service.

AWS Data Pipeline features

  • Distributed, fault-tolerant, and highly available
  • Managed workflow orchestration service for data-driven workflows
  • Infrastructure management service, as it will provision and terminate resources as required
  • Provides dependency resolution
  • Can be scheduled
  • Supports Preconditions for readiness checks.
  • Grants control over retries, including frequency and number
  • Native integration with S3, DynamoDB, RDS, EMR, EC2 and Redshift
  • Support for both AWS based and external on-premise resources

AWS Data Pipeline Concepts

Pipeline Definition

  • Pipeline definition helps the business logic to be communicated to the AWS Data Pipeline
  • Pipeline definition defines the location of data (Data Nodes), activities to be performed, the schedule, resources to run the activities, per-conditions, and actions to be performed

Pipeline Components, Instances, and Attempts

  • Pipeline components represent the business logic of the pipeline and are represented by the different sections of a pipeline definition.
  • Pipeline components specify the data sources, activities, schedule, and preconditions of the workflow
  • When AWS Data Pipeline runs a pipeline, it compiles the pipeline components to create a set of actionable instances and contains all the information needed to perform a specific task
  • Data Pipeline provides durable and robust data management as it retries a failed operation depending on frequency & defined number of retries

Task Runners

  • A task runner is an application that polls AWS Data Pipeline for tasks and then performs those tasks
  • When Task Runner is installed and configured,
    • it polls AWS Data Pipeline for tasks associated with activated pipelines
    • after a task is assigned to Task Runner, it performs that task and reports its status back to Pipeline.
  • A task is a discreet unit of work that the Pipeline service shares with a task runner and differs from a pipeline, which defines activities and resources that usually yields several tasks
  • Tasks can be executed either on the AWS Data Pipeline managed or user-managed resources.

Data Nodes

  • Data Node defines the location and type of data that a pipeline activity uses as source (input) or destination (output)
  • supports S3, Redshift, DynamoDB, and SQL data nodes

Databases

  • supports JDBC, RDS, and Redshift database

Activities

  • An activity is a pipeline component that defines the work to perform
  • Data Pipeline provides pre-defined activities for common scenarios like sql transformation, data movement, hive queries, etc
  • Activities are extensible and can be used to run own custom scripts to support endless combinations

Preconditions

  • Precondition is a pipeline component containing conditional statements that must be satisfied (evaluated to True) before an activity can run
  • A pipeline supports
    • System-managed preconditions
      • are run by the AWS Data Pipeline web service on your behalf and do not require a computational resource
      • Includes source data and keys check for e.g. DynamoDB data, table exists or S3 key exists or prefix not empty
    • User-managed preconditions
      • run on user defined and managed computational resources
      • Can be defined as Exists check or Shell command

Resources

  • A resource is a computational resource that performs the work that a pipeline activity specifies
  • supports AWS Data Pipeline-managed and self-managed resources
  • AWS Data Pipeline-managed resources include EC2 and EMR, which are launched by the Data Pipeline service only when they’re needed
  • Self managed on-premises resources can also be used, where a Task Runner package is installed which  continuously polls the AWS Data Pipeline service for work to perform
  • Resources can run in the same region as their working data set or even on a region different than AWS Data Pipeline
  • Resources launched by AWS Data Pipeline are counted within the resource limits and should be taken into account

Actions

  • Actions are steps that a pipeline takes when a certain event like success, or failure occurs.
  • Pipeline supports SNS notifications and termination action on resources

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. An International company has deployed a multi-tier web application that relies on DynamoDB in a single region. For regulatory reasons they need disaster recovery capability in a separate region with a Recovery Time Objective of 2 hours and a Recovery Point Objective of 24 hours. They should synchronize their data on a regular basis and be able to provision the web application rapidly using CloudFormation. The objective is to minimize changes to the existing web application, control the throughput of DynamoDB used for the synchronization of data and synchronize only the modified elements. Which design would you choose to meet these requirements?
    1. Use AWS data Pipeline to schedule a DynamoDB cross region copy once a day. Create a ‘Lastupdated’ attribute in your DynamoDB table that would represent the timestamp of the last update and use it as a filter. (Refer Blog Post)
    2. Use EMR and write a custom script to retrieve data from DynamoDB in the current region using a SCAN operation and push it to DynamoDB in the second region. (No Schedule and throughput control)
    3. Use AWS data Pipeline to schedule an export of the DynamoDB table to S3 in the current region once a day then schedule another task immediately after it that will import data from S3 to DynamoDB in the other region. (With AWS Data pipeline the data can be copied directly to other DynamoDB table)
    4. Send each item into an SQS queue in the second region; use an auto-scaling group behind the SQS queue to replay the write in the second region. (Not Automated to replay the write)
  2. Your company produces customer commissioned one-of-a-kind skiing helmets combining nigh fashion with custom technical enhancements. Customers can show off their Individuality on the ski slopes and have access to head-up-displays, GPS rear-view cams and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data rich and complex including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards. Assessments are a mixture of human and automated assessments you need to add a new set of assessment to model the failure modes of the custom electronics using GPUs with CUD across a cluster of servers with low latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time?
    1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
    2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
    3. Use Amazon Simple Workflow (SWF) to manage assessments movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 and SR-IOV won’t provide GPU as well as Enhanced networking needs to be enabled)
    4. Use AWS data Pipeline to manage movement of data & meta-data and assessments use auto-scaling group of C3 with SR-IOV (Single Root I/O virtualization). (Involves mixture of human assessments)

References

AWS Lake Formation

AWS Lake Formation

AWS Lake Formation

  • AWS Lake Formation easily creates secure data lakes, making data available for wide-ranging analytics.
  • is an integrated data lake service that helps to discover, ingest, clean, catalog, transform, and secure data and make it available for analysis and ML.
  • automatically manages access to the registered data in S3 through services including AWS Glue, Athena, Redshift, QuickSight, and EMR using Zeppelin notebooks with Apache Spark to ensure compliance with your defined policies.
  • helps configure and manage your data lake without manually integrating multiple underlying AWS services.
  • can manage data ingestion through AWS Glue. Data is automatically classified, and relevant data definitions, schema, and metadata are stored in the central Glue Data Catalog. Once the data is in the S3 data lake, access policies, including table-and-column-level access controls can be defined, and encryption for data at rest enforced.
  • uses a shared infrastructure with AWS Glue, including console controls, ETL code creation and job monitoring, blueprints to create workflows for data ingest, the same data catalog, and a serverless architecture.
  • integrates with IAM so authenticated users and roles can be automatically mapped to data protection policies that are stored in the data catalog. The IAM integration also supports Microsoft Active Directory or LDAP to federate into IAM using SAML.
  • helps centralize data access policy controls. Users and roles can be defined to control access, down to the table and column level.
  • supports private endpoints in the VPC and records all activity in AWS CloudTrail for network isolation and auditability.

AWS Lake Formation

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

AWS_Lake_Formation