AWS DynamoDB Accelerator – DAX

August 10, 2023 ~ Last updated on: August 29, 2023 ~ jayendrapatil

AWS DynamoDB Accelerator DAX

DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from ms to µs – even at millions of requests per second.
DAX, as a managed service, handles cache invalidation, data population, and cluster management.
DAX provides API compatibility with DynamoDB. Therefore, it requires only minimal functional changes to use with an existing application.
DAX saves costs by reducing the read load (RCU) on DynamoDB.
DAX helps prevent hot partitions.
DAX is intended for high-performance read applications. As a write-through cache, DAX writes to DynamoDB directly, so that writes are immediately reflected in the item cache.
DAX serves only eventually consistent reads from its cache; strongly consistent read requests are passed through to DynamoDB.
DAX is fault-tolerant and scalable.
A DAX cluster has a primary node and zero or more read-replica nodes. If the primary node fails, DAX automatically fails over and elects a new primary. To scale, add or remove read replicas.
DAX supports server-side encryption.
DAX supports encryption in transit, ensuring that all requests and responses between the application and the cluster are encrypted by TLS, and connections to the cluster can be authenticated by verification of a cluster x509 certificate.

DAX Cluster

A DAX cluster is a logical grouping of one or more nodes that DAX manages as a unit.
One of the nodes in the cluster is designated as the primary node, and the other nodes (if any) are read replicas.
Primary Node is responsible for
- Fulfilling application requests for cached data.
- Handling write operations to DynamoDB.
- Evicting data from the cache according to the cluster’s eviction policy.
Read replicas are responsible for
- Fulfilling application requests for cached data.
- Evicting data from the cache according to the cluster’s eviction policy.
Only the primary node writes to DynamoDB, read replicas don’t write to DynamoDB.
For production, it is recommended to have DAX with at least three nodes with each node placed in different Availability Zones.
Three nodes are required for a DAX cluster to be fault-tolerant.
A DAX cluster in an AWS Region can only interact with DynamoDB tables that are in the same Region.

DynamoDB Accelerator Operations

Eventual Read operations
- If DAX has the item available (a cache hit), DAX returns the item without accessing DynamoDB.
- If DAX does not have the item available (a cache miss), DAX passes the request through to DynamoDB. When it receives the response from DynamoDB, DAX returns the results to the application and also writes them to the cache on the primary node.
Strongly Consistent Read operations
- DAX passes the request through to DynamoDB. The results from DynamoDB are not cached in DAX; they are simply returned to the application.
- DAX is not ideal for applications that require strongly consistent reads (or that cannot tolerate eventually consistent reads).
For Write operations
- Data is first written to the DynamoDB table, and then to the DAX cluster.
- Operation is successful only if the data is successfully written to both the table and to DAX.
- DAX is not ideal for applications that are write-intensive, or that do not perform much read activity.
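
The read and write paths described above can be sketched with a toy model. This is only an illustration in plain Python, not the DAX client: the `ToyTable` and `ToyDax` classes are hypothetical stand-ins, and the sketch ignores TTL, eviction, and caching of missing items.

```python
class ToyTable:
    """Stand-in for a DynamoDB table; counts reads so cache hits are visible."""

    def __init__(self):
        self.items = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.items.get(key)

    def put(self, key, value):
        self.items[key] = value


class ToyDax:
    """Toy model of the DAX request flow (not the real DAX client)."""

    def __init__(self, table):
        self.table = table
        self.item_cache = {}

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent read: pass through, do not cache the result.
            return self.table.get(key)
        if key in self.item_cache:
            # Cache hit: answer without touching the table.
            return self.item_cache[key]
        # Cache miss: read from the table, then populate the cache.
        value = self.table.get(key)
        if value is not None:
            self.item_cache[key] = value
        return value

    def put_item(self, key, value):
        # Write-through: table first, then the item cache.
        self.table.put(key, value)
        self.item_cache[key] = value


table = ToyTable()
dax = ToyDax(table)
dax.put_item("user#1", {"name": "Ana"})
first = dax.get_item("user#1")                         # served from the cache
strong = dax.get_item("user#1", consistent_read=True)  # goes to the table
print(first, strong, table.reads)
```

Only the strongly consistent read reaches the table here, which is why read-heavy, eventually consistent workloads benefit the most.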

DynamoDB Accelerator Caches

DAX cluster has two distinct caches – Item cache and Query cache
Item cache
- DAX maintains an item cache to store the results from GetItem and BatchGetItem operations.
- An item remains in the DAX item cache, subject to the Time to Live (TTL) setting and the least recently used (LRU) algorithm for the cache.
- DAX provides a write-through cache, keeping the DAX item cache consistent with the underlying DynamoDB tables.
Query cache
- DAX caches the results from Query and Scan requests in its query cache.
- Query and Scan results don’t affect the item cache at all, as the result set is saved in the query cache – not in the item cache.
- Writes to the Item cache don’t affect the Query cache
Both the item cache and the query cache have a default TTL of 5 minutes.
DAX assigns a timestamp to every entry it writes to the cache. The entry expires if it has remained in the cache for longer than the TTL setting.
DAX maintains an LRU list for both the item cache and the query cache. The LRU list tracks when an item was first written to the cache and when it was last read. If the cache becomes full, DAX evicts older items (even if they haven’t expired yet) to make room for new entries.
LRU algorithm is always enabled for both the item and query cache and is not user-configurable.
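
The interaction between TTL expiry and LRU eviction can be illustrated with a small cache class. This is a sketch under simplifying assumptions, not how DAX is implemented: the `TtlLruCache` name is hypothetical, capacity is counted in entries rather than memory, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
from collections import OrderedDict


class TtlLruCache:
    """Toy cache with a TTL per entry and LRU eviction when full."""

    def __init__(self, capacity, ttl_seconds=300):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        # key -> (value, written_at); least recently used entry comes first
        self.entries = OrderedDict()

    def put(self, key, value, now):
        if key in self.entries:
            del self.entries[key]
        elif len(self.entries) >= self.capacity:
            # Cache is full: evict the least recently used entry,
            # even if its TTL has not expired yet.
            self.entries.popitem(last=False)
        self.entries[key] = (value, now)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at > self.ttl_seconds:
            # Entry stayed in the cache longer than the TTL: expired.
            del self.entries[key]
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return value


cache = TtlLruCache(capacity=2, ttl_seconds=300)
cache.put("a", 1, now=0)
cache.put("b", 2, now=10)
cache.get("a", now=20)             # "a" becomes the most recently used entry
cache.put("c", 3, now=30)          # cache is full, so "b" is evicted
evicted = cache.get("b", now=40)   # None: evicted before its TTL ran out
expired = cache.get("a", now=301)  # None: older than the 300 second TTL
fresh = cache.get("c", now=301)    # still within its TTL
```

The default of 300 seconds mirrors the 5 minute TTL mentioned above.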

DynamoDB Accelerator Write Strategies

Write-Through

DAX item cache implements a write-through policy
For write operations, DAX ensures that the cached item is synchronized with the item as it exists in DynamoDB.

Write-Around

With a write-around strategy, the application writes directly to DynamoDB and bypasses DAX, which reduces write latency.
Ideal for bulk uploads or writing large quantities of data.
The item cache doesn’t remain in sync with the data in DynamoDB.
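
The difference between the two strategies can be shown with two dictionaries standing in for the table and the item cache. This is a minimal sketch; all function names are hypothetical, and removing the cache entry by hand stands in for TTL expiry or LRU eviction.

```python
table = {}        # stand-in for the DynamoDB table
item_cache = {}   # stand-in for the DAX item cache


def write_through(key, value):
    # The write goes through the cache layer, so table and cache agree.
    table[key] = value
    item_cache[key] = value


def write_around(key, value):
    # The write bypasses the cache and goes straight to the table.
    table[key] = value


def cached_read(key):
    # Eventually consistent read: prefer the cache, fall back to the table.
    if key not in item_cache:
        item_cache[key] = table[key]
    return item_cache[key]


write_through("sku#1", {"price": 10})
write_around("sku#1", {"price": 12})
stale = cached_read("sku#1")      # still the old price from the cache
item_cache.pop("sku#1")           # stands in for TTL expiry or LRU eviction
refreshed = cached_read("sku#1")  # re-read from the table
```

After a write-around, readers keep seeing the cached value until the entry expires or is evicted.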

DynamoDB Accelerator Scenarios

As an in-memory cache, DAX increases performance and reduces the response times of eventually consistent read workloads by an order of magnitude from single-digit milliseconds to microseconds.
DAX reduces operational and application complexity by providing a managed service that is API-compatible with DynamoDB. It requires only minimal functional changes to use with an existing application.
For read-heavy or bursty workloads, DAX provides increased throughput and potential operational cost savings by reducing the need to overprovision read capacity units.

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A company has setup an application in AWS that interacts with DynamoDB. DynamoDB is currently responding in milliseconds, but the application response guidelines require it to respond within microseconds. How can the performance of DynamoDB be further improved?
1. Use ElastiCache in front of DynamoDB
2. Use DynamoDB inbuilt caching
3. Use DynamoDB Accelerator
4. Use RDS with ElastiCache instead

AWS Service Catalog

July 16, 2023 ~ jayendrapatil

AWS Service Catalog

AWS Service Catalog helps centrally manage cloud resources and achieve governance at scale for infrastructure as code (IaC) templates written in CloudFormation or Terraform.
allows IT administrators to create, manage, and distribute catalogs of approved products to end users, who can then access the products they need in a personalized portal.
can help control which users have access to each product to enforce compliance with organizational business policies while making sure the customers can quickly deploy the cloud resources they need.
increases agility and reduces costs as end users can find and launch only the products they need from a controlled catalog.
is a regional service; portfolios and products are regional constructs that must be created in each region and are visible and usable only in the regions in which they were created.
supports VPC Endpoints to privately access Service Catalog APIs from VPC without the need for an Internet gateway, NAT gateway, or VPN connection.

Service Catalog Portfolios and Products

Service Catalog portfolio is a collection of products, with configuration information that determines who can use those products and how they can use them.
Each Service Catalog product is based on an infrastructure-as-code (IaC) template using CloudFormation or Terraform.
Customized portfolios can be created for each type of user in an organization and selectively granted access to the appropriate portfolio.
When an administrator adds a new version of a product to a portfolio, that version is automatically available to all current portfolio users.
Same product can be included in multiple portfolios.
Portfolios can be shared with other AWS accounts and extended by applying additional constraints.

Service Catalog Access Control

Launch Constraint
- provides AWS Service Catalog with the capability to perform actions on behalf of users even when those users do not have the necessary IAM permissions to perform those actions directly.
- is an IAM Role that AWS Service Catalog assumes when an end user launches a product.
- Service Catalog products without a launch constraint will launch and manage products using the end user’s IAM credentials; if the end user credentials are not sufficient for those activities, errors will result either in provisioning or in management activities.
Template Constraint
- defines rules that limit the parameter values that a user enters when launching a product.
- is applied when provisioning a new product or updating a product that is already in use.
- applies the most restrictive constraint among all constraints applied to the portfolio and the product.
- is not supported for Terraform configurations.
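
The "most restrictive constraint" behaviour can be pictured as an intersection of allowed values. This is a simplified sketch, not Service Catalog's rule engine: it assumes each constraint reduces to a list of allowed values for a single parameter, and the function names and instance types are hypothetical.

```python
def effective_allowed_values(constraints):
    """Intersect the allowed values of all constraints (needs at least one)."""
    sets = [set(values) for values in constraints]
    return set.intersection(*sets)


def satisfies_all(value, constraints):
    """A value is accepted only if every constraint allows it."""
    return all(value in allowed for allowed in constraints)


# Hypothetical constraints on an InstanceType parameter.
portfolio_constraint = ["t3.micro", "t3.small", "t3.medium"]
product_constraint = ["t3.micro", "t3.small", "m5.large"]
constraints = [portfolio_constraint, product_constraint]

allowed = sorted(effective_allowed_values(constraints))
accepted = satisfies_all("t3.small", constraints)
rejected = satisfies_all("m5.large", constraints)
print(allowed, accepted, rejected)
```

A value allowed by only one of the two constraints, such as `m5.large` here, is rejected.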

Service Catalog AppRegistry

Service Catalog AppRegistry allows organizations to understand the application context of their AWS resources.
AppRegistry provides a repository for the information that describes the applications and associated resources that you use within your enterprise.
AppRegistry provides a single, up-to-date definition of the applications within the AWS environment.

AWS Certification Exam Practice Questions

A company has several business units that want to use Amazon EC2. The company wants to require all business units to provision their EC2 instances by using only approved EC2 instance configurations. What should a SysOps administrator do to implement this requirement?
1. Create an EC2 instance launch configuration. Allow the business units to launch EC2 instances by specifying this launch configuration in the AWS Management Console.
2. Develop an IAM policy that limits the business units to provision EC2 instances only. Instruct the business units to launch instances by using an AWS CloudFormation template.
3. Publish a product and launch constraint role for EC2 instances by using AWS Service Catalog. Allow the business units to perform actions in AWS Service Catalog only.
4. Share an AWS CloudFormation template with the business units. Instruct the business units to pass a role to AWS CloudFormation to allow the service to manage EC2 instances.

AWS Config

July 15, 2023 ~ Last updated on: July 16, 2023 ~ jayendrapatil

AWS Config

AWS Config is a fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
provides a detailed view of the configuration of AWS resources in the AWS account.
is a regional service.
is strictly a detective service and does not prevent changes but it integrates with other AWS services for remediation.
gives point-in-time and historical states thereby allowing users to see changes visually in a timeline.
records only the latest configuration of a resource when several configuration changes are made to it in quick succession (i.e., within a span of a few minutes); this represents the cumulative impact of that entire set of changes.
does not cover all AWS services; for unsupported services, the configuration management process can be automated using APIs and code, and the collected data can be used to compare current and past configurations.
provides customizable, predefined rules as well as the ability to define custom rules.
can help with the following:
- Evaluate the AWS resource configurations for desired settings.
- Get a snapshot of the current configurations of the supported resources that are associated with your AWS account.
- Retrieve configurations of one or more resources that exist in the account.
- Retrieve historical configurations of one or more resources.
- Receive a notification whenever a resource is created, modified, or deleted.
- View relationships between resources e.g., you might want to find all resources that use a particular security group.

AWS Config Use Cases

Security Analysis & Resource Administration
- enables continuous monitoring and governance over resource configurations and helps evaluate them for any misconfigurations leading to security gaps or weaknesses.
Auditing & Compliance
- helps maintain a complete inventory of all resources and their configuration attributes as well as their point-in-time history.
- helps retrieve historical configurations that can be very useful for audits and for ensuring compliance with internal policies and best practices.
Change Management
- helps understand relationships between resources so that the impact of the change can be proactively assessed.
- can be configured to notify whenever resources are created, modified, or deleted without having to monitor these changes by polling the calls made to each resource
Troubleshooting
- helps to quickly identify and troubleshoot issues by using the historical configurations to compare the last working configuration with the recent change that is causing issues.
Discovery
- helps discover resources that exist within an account leading to better inventory and asset management.
- Get a snapshot of the current configurations of the supported resources that are associated with the AWS account

AWS Config Concepts

AWS Resources
- AWS resources are entities that you create and manage in AWS, e.g. EC2 instances and security groups.
Resource Relationship
- AWS Config discovers AWS resources in the account and then creates a map of relationships between them, e.g. an EBS volume linked to an EC2 instance.
Configuration Items
- A configuration item represents a point-in-time view of the supported AWS resource
- Components of a configuration item include metadata, attributes, relationships, current configuration, and related events.
Configuration Snapshot
- A configuration snapshot is a collection of the configuration items for the supported resources that exist in the account.
Configuration History
- A configuration history is a collection of the configuration items for a given resource over any time period.
Configuration Stream
- Configuration stream is an automatically updated list of all configuration items for the resources that AWS Config is recording.
Configuration Recorder
- Configuration recorder stores the configurations of the supported resources in your account as configuration items.
- A configuration recorder must be created and started before recording begins; by default, it records all supported resources in the region.
AWS Config Rules
- AWS Config Rules help define desired configuration settings for specific resources or for the entire account.
- AWS Config continuously tracks the resource configuration changes against the rules and if violated marks the resource as non-compliant.
- supports Managed Rules and Custom Rules.
- supports Proactive (before resource provisioning) and Detective (after resource provisioning) evaluation modes.
- can be triggered either periodically or on configuration changes.

AWS Config Flow

When AWS Config is turned on, it discovers the supported resources that exist in the account and generates a configuration item for each resource.
By default, AWS Config creates configuration items for every supported resource in the region, but this can be customized to record only specific resource types.
AWS Config
- generates configuration items when the configuration of a resource changes, and it maintains historical records of the configuration items of the resources from the time the configuration recorder is started.
- keeps track of all changes to the resources by invoking the Describe or the List API call for each resource as well as related resources in the account.
- also tracks the configuration changes that were not initiated by the API. It examines the resource configurations periodically and generates configuration items for the configurations that have changed.
Configuration items are delivered in a configuration stream to an S3 bucket.
AWS Config rules, if configured,
- are evaluated continuously for resource configurations for desired settings.
- resources are evaluated either in response to configuration changes or periodically, depending on the rule.
- when a resource is evaluated, AWS Config invokes the rule’s AWS Lambda function, which contains the evaluation logic for the rule.
- The function returns the compliance status of the evaluated resources.
- If a resource violates the conditions of a rule, the resource and the rule are flagged as non-compliant and a notification is sent to the SNS topic.
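
The evaluation step for a custom rule can be sketched as a Lambda-style handler. This is a minimal sketch under stated assumptions: the rule itself (EBS volumes must be encrypted) is only an example, the `encrypted` field in the recorded configuration is assumed, the sample event is made up, and the handler returns the evaluation instead of reporting it back to AWS Config.

```python
import json


def evaluate_compliance(configuration_item):
    """Toy rule: EBS volumes must be encrypted."""
    if configuration_item["resourceType"] != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    # Assumption: the recorded configuration exposes an "encrypted" flag.
    if configuration_item["configuration"].get("encrypted"):
        return "COMPLIANT"
    return "NON_COMPLIANT"


def lambda_handler(event, context=None):
    # AWS Config passes the triggering change as a JSON string.
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]
    # A deployed rule would report this result back to AWS Config
    # (PutEvaluations with event["resultToken"]); the sketch only returns it.
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_compliance(item),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }


# Hypothetical event for an unencrypted volume.
sample_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Volume",
            "resourceId": "vol-0123456789abcdef0",
            "configurationItemCaptureTime": "2023-07-15T10:00:00.000Z",
            "configuration": {"encrypted": False},
        }
    }),
    "resultToken": "example-token",
}
result = lambda_handler(sample_event)
print(result["ComplianceType"])
```

Keeping the compliance logic in a pure function such as `evaluate_compliance` makes the rule easy to test without any AWS calls.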

AWS Config Remediation

AWS Config is strictly a detective service and does not prevent changes but it integrates with other AWS services for remediation.
allows remediation of noncompliant resources that are evaluated by config rules.
Remediation is applied using Systems Manager Automation documents, which define the actions to be performed on noncompliant AWS resources.
provides a set of managed automation documents with remediation actions.
Custom automation documents can also be created and associated with rules.

Multi-Account Multi-Region Data Aggregation

An aggregator helps collect AWS Config configuration and compliance data from the following:
- Multiple accounts and multiple regions.
- Single account and multiple regions.
- An organization in AWS Organizations and all the accounts in that organization that have AWS Config enabled.

AWS Config vs CloudTrail

AWS Config reports on WHAT has changed, whereas CloudTrail reports on WHO made the change, WHEN, and from WHICH location.
AWS Config focuses on the configuration of the AWS resources and reports with detailed snapshots on HOW the resources have changed, whereas CloudTrail focuses on the events or API calls that drive those changes. CloudTrail focuses on the user, application, and activity performed on the system.

AWS Certification Exam Practice Questions

One of the challenges in managing AWS resources is to keep track of changes in the resource configuration over time. Which one of the following statements provides the best solution?
1. Use strict syntax tagging on the resources
2. Create a custom application to automate the configuration management process
3. Use AWS Config for supported services and use an automated process via APIs for unsupported services
4. Use resource groups and tagging along with CloudTrail so that you can audit changes using the logs
Fill the blanks: ____ helps us track AWS API calls and transitions, ____ helps to understand what resources we have now, and ____ allows auditing credentials and logins.
1. AWS Config, CloudTrail, IAM Credential Reports
2. CloudTrail, IAM Credential Reports, AWS Config
3. CloudTrail, AWS Config, IAM Credential Reports
4. AWS Config, IAM Credential Reports, CloudTrail

AWS CloudFormation

July 15, 2023 ~ Last updated on: July 20, 2023 ~ jayendrapatil

AWS CloudFormation

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, and to provision and update them in an orderly and predictable fashion.
CloudFormation consists of
- Template
  - is an architectural diagram and provides logical resources
  - a JSON or YAML-format, text-based file that describes all the AWS resources needed to deploy and run the application.
- Stack
  - is the end result of that diagram and provisions physical resources mapped to the logical resources.
  - is the set of AWS resources that are created and managed as a single unit when CloudFormation instantiates a template.
A CloudFormation template can be used to set up the resources consistently and repeatedly across multiple regions.
Resources can be updated, deleted, and modified in a controlled and predictable way, in effect applying version control to the infrastructure as is done for software code.
An AWS CloudFormation template consists of the following elements:
- List of AWS resources and their configuration values
- An optional template file format version number
- An optional list of template parameters (input values supplied at stack creation time)
- An optional list of output values, like a public IP address, using the Fn::GetAtt function
- An optional list of data tables used to look up static configuration values, e.g. AMI names per region
CloudFormation supports Chef & Puppet integration to deploy and configure right down to the application layer.
CloudFormation provides a set of application bootstrapping scripts that enable you to install packages, files, and services on the EC2 instances by simply describing them in the CloudFormation template
By default, automatic rollback on error feature is enabled, which will cause all the AWS resources that CloudFormation created successfully for a stack up to the point where an error occurred to be deleted.
In case of an automatic rollback, charges still apply for the time the resources were up and running.
CloudFormation provides a WaitCondition resource that acts as a barrier, blocking the creation of other resources until a completion signal is received from an external source e.g. application or management system
CloudFormation allows deletion policies to be defined for resources in the template, e.g. resources can be retained or snapshots can be created before deletion. This is useful for preserving resources such as S3 buckets when the stack is deleted.

AWS CloudFormation Concepts

In AWS CloudFormation, you work with templates and stacks.

Templates

act as blueprints for building AWS resources.
is a JSON or YAML formatted text file, saved with any extension, such as .json, .yaml, .template, or .txt.
have additional capabilities to build complex sets of resources and reuse those templates in multiple contexts, e.g. by using input parameters to create generic and reusable templates.
The name used for a resource within the template is a logical name; when CloudFormation creates the resource, it generates a physical name that is based on the combination of the logical name, the stack name, and a unique ID.

Stacks

Stacks manage related resources as a single unit.
Collection of resources can be created, updated, and deleted by creating, updating, and deleting stacks.
All the resources in a stack are defined by the stack’s AWS CloudFormation template
CloudFormation makes underlying service calls to AWS to provision and configure the resources in the stack and can perform only actions that the users have permission to do.

Change Sets

Change Sets present a summary or preview of the proposed changes that CloudFormation will make when a stack is updated.
Change Sets help check how the changes might impact running resources, especially critical resources, before implementing them.
CloudFormation makes the changes to the stack only when the change set is executed, allowing you to decide whether to proceed with the proposed changes or explore other changes by creating another change set.
Change sets don’t indicate whether AWS CloudFormation will successfully update a stack, e.g. the update can still fail if account limits are hit or the user does not have permission.
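
The idea of a change set as a preview can be sketched as a diff between the Resources sections of the current and the proposed template. This is a rough illustration only: the `preview_changes` function and the resource definitions are hypothetical, and a real change set carries more detail, such as whether a modification requires replacement.

```python
def preview_changes(current_resources, proposed_resources):
    """List (action, logical_id) pairs without applying anything."""
    changes = []
    for logical_id in sorted(set(current_resources) | set(proposed_resources)):
        before = current_resources.get(logical_id)
        after = proposed_resources.get(logical_id)
        if before is None:
            changes.append(("Add", logical_id))
        elif after is None:
            changes.append(("Remove", logical_id))
        elif before != after:
            changes.append(("Modify", logical_id))
    return changes


current = {
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {"DelaySeconds": 5}},
}
proposed = {
    "Bucket": {
        "Type": "AWS::S3::Bucket",
        "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
    },
    "Topic": {"Type": "AWS::SNS::Topic", "Properties": {}},
}

changes = preview_changes(current, proposed)
for action, logical_id in changes:
    print(action, logical_id)
```

Nothing is changed by computing the preview; the stack is only modified when the change set is executed.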

Custom Resources

Custom resources help write custom provisioning logic in templates that CloudFormation runs anytime the stacks are created, updated, or deleted.
Custom resources help include resources that aren’t available as AWS CloudFormation resource types and can still be managed in a single stack.
AWS recommends using CloudFormation Registry instead.

Nested Stacks

Nested stacks are stacks created as part of other stacks.
A nested stack can be created within another stack by using the AWS::CloudFormation::Stack resource.
Nested stacks can be used to define common, repeated patterns and components and create dedicated templates which then can be called from other stacks.
Root stack is the top-level stack to which all the nested stacks ultimately belong. Nested stacks can themselves contain other nested stacks, resulting in a hierarchy of stacks.
In addition, each nested stack has an immediate parent stack. For the first level of nested stacks, the root stack is also the parent stack.
Certain stack operations, such as stack updates, should be initiated from the root stack rather than performed directly on nested stacks themselves.

Drift Detection

Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.
Drift detection helps identify stack resources to which configuration changes have been made outside of CloudFormation management.
Drift detection can detect drift on an entire stack or on individual resources.
Corrective action can be taken to make sure the stack resources are again in sync with the definitions in the stack template, such as updating the drifted resources directly so that they agree with their template definition.
Resolving drift helps to ensure configuration consistency and successful stack operations.
CloudFormation detects drift on those AWS resources that support drift detection. Resources that don’t support drift detection are assigned a drift status of NOT_CHECKED.
Drift detection can be performed on stacks with the following statuses: CREATE_COMPLETE, UPDATE_COMPLETE, UPDATE_ROLLBACK_COMPLETE, and UPDATE_ROLLBACK_FAILED.
When detecting drift on a stack, CloudFormation does not detect drift on any nested stacks that belong to that stack. Instead, you can initiate a drift detection operation directly on the nested stack.
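
Conceptually, drift detection compares the property values defined in the template with the live configuration of each resource. The sketch below is an illustration under assumptions: the set of supported types is a tiny made-up subset, the function names are hypothetical, and only properties explicitly set in the template are compared.

```python
# Illustrative subset; the real list of supported types is much longer.
DRIFT_SUPPORTED_TYPES = {"AWS::S3::Bucket", "AWS::SQS::Queue"}


def resource_drift_status(resource_type, expected, actual):
    """Compare template properties with the live configuration."""
    if resource_type not in DRIFT_SUPPORTED_TYPES:
        return "NOT_CHECKED"
    if actual is None:
        return "DELETED"
    for name, value in expected.items():
        # Only properties that the template sets explicitly are compared.
        if actual.get(name) != value:
            return "MODIFIED"
    return "IN_SYNC"


def stack_drift_status(resource_statuses):
    drifted = any(s in ("MODIFIED", "DELETED") for s in resource_statuses)
    return "DRIFTED" if drifted else "IN_SYNC"


statuses = [
    resource_drift_status(
        "AWS::S3::Bucket",
        {"VersioningConfiguration": {"Status": "Enabled"}},
        {"VersioningConfiguration": {"Status": "Suspended"}},
    ),
    resource_drift_status(
        "AWS::SQS::Queue",
        {"DelaySeconds": 5},
        {"DelaySeconds": 5, "VisibilityTimeout": 30},
    ),
    resource_drift_status("Custom::Example", {"Key": "value"}, {}),
]
overall = stack_drift_status(statuses)
print(statuses, overall)
```

A single modified or deleted resource is enough for the whole stack to count as drifted.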

CloudFormation Template Anatomy

Resources (required)
- Specifies the stack resources and their properties, such as an EC2 instance or an S3 bucket that would be created.
- Resources can be referred to in the Resources and Outputs sections
Parameters (optional)
- Pass values to the template at runtime (during stack creation or update)
- Parameters can be referenced from the Resources and Outputs sections
- Can be referenced using the Ref function, or !Ref in YAML short form
Mappings (optional)
- A mapping of keys and associated values that can be used to specify conditional parameter values, similar to a lookup table.
- Can be referred using Fn::FindInMap or !FindInMap
Outputs (optional)
- Describes the values that are returned whenever you view your stack’s properties.
Format Version (optional)
- AWS CloudFormation template version that the template conforms to.
Description (optional)
- A text string that describes the template. This section must always follow the template format version section.
Metadata (optional)
- Objects that provide additional information about the template.
Rules (optional)
- Validates a parameter or a combination of parameters passed to a template during stack creation or stack update.
Conditions (optional)
- Conditions control whether certain resources are created or whether certain resource properties are assigned a value during stack creation or update.
Transform (optional)
- For serverless applications (also referred to as Lambda-based applications), specifies the version of the AWS Serverless Application Model (AWS SAM) to use.
- When you specify a transform, you can use AWS SAM syntax to declare resources in the template. The model defines the syntax that you can use and how it’s processed.

CloudFormation Template Sample
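
A minimal sketch of a template that uses several of the sections above, built as a Python dictionary and serialized to JSON. The logical names (`InstanceTypeParam`, `RegionMap`, `WebServer`) are made up for illustration and the AMI IDs are placeholders, not real images.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal illustrative template with one EC2 instance.",
    "Parameters": {
        "InstanceTypeParam": {
            "Type": "String",
            "Default": "t3.micro",
            "AllowedValues": ["t3.micro", "t3.small"],
        }
    },
    "Mappings": {
        "RegionMap": {
            # Placeholder AMI IDs, not real images.
            "us-east-1": {"AMI": "ami-00000000000000000"},
            "eu-west-1": {"AMI": "ami-11111111111111111"},
        }
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                # Ref resolves to the parameter value supplied at stack creation.
                "InstanceType": {"Ref": "InstanceTypeParam"},
                # Fn::FindInMap looks up the AMI for the stack's region.
                "ImageId": {
                    "Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]
                },
            },
        }
    },
    "Outputs": {
        "PublicIp": {
            "Description": "Public IP address of the instance",
            "Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]},
        }
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

Only the Resources section is required; the other sections are included here to show how parameters, mappings, and outputs reference each other.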

CloudFormation Access Control

IAM
- IAM can be used with CloudFormation to control whether users can view stack templates, create stacks, or delete stacks.
- The user also needs IAM permissions for the AWS services and resources that are provisioned when the stack is created.
- Before a stack is created, AWS CloudFormation validates the template to check for IAM resources that it might create
Service Role
- A service role is an AWS IAM role that allows AWS CloudFormation to make calls to resources in a stack on the user’s behalf
- By default, AWS CloudFormation uses a temporary session that it generates from the user credentials for stack operations.
- For a service role, AWS CloudFormation uses the role’s credentials.
- When a service role is specified, AWS CloudFormation always uses that role for all operations that are performed on that stack.

Template Resource Attributes

CreationPolicy Attribute
- is invoked during the associated resource creation.
- can be associated with a resource to prevent its status from reaching create complete until CloudFormation receives a specified number of success signals or the timeout period is exceeded.
- helps to wait on resource configuration actions before stack creation proceeds for e.g. software installation on an EC2 instance
DeletionPolicy Attribute
- preserve or (in some cases) backup a resource when its stack is deleted
- CloudFormation deletes the resource by default if the resource has no DeletionPolicy attribute.
- Supported values are
  - Delete (default), where the resource is deleted.
  - Retain, to keep the resource when its stack is deleted.
  - Snapshot, to create a snapshot before deleting the resource, if the snapshot capability is supported e.g. RDS, EC2 volume, etc.
DependsOn Attribute
- helps determine dependency order and specify that the creation of a specific resource follows another.
- the resource is created only after the creation of the resource specified in the DependsOn attribute.
Metadata Attribute
- enables association of structured data with a resource
UpdatePolicy Attribute
- Defines how AWS CloudFormation handles updates to the resources
- For AWS::AutoScaling::AutoScalingGroup resources, CloudFormation invokes one of three update policies depending on the type of change or whether a scheduled action is associated with the Auto Scaling group.
  - The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:
    - Change the Auto Scaling group’s AWS::AutoScaling::LaunchConfiguration
    - Change the Auto Scaling group’s VPCZoneIdentifier property
    - Change the Auto Scaling group’s LaunchTemplate property
    - Update an Auto Scaling group that contains instances that don’t match the current LaunchConfiguration.
  - The AutoScalingScheduledAction policy applies when you update a stack that includes an Auto Scaling group with an associated scheduled action.
- For AWS::Lambda::Alias resources, CloudFormation performs a CodeDeploy deployment when the version changes on the alias.
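
The ordering effect of the DependsOn attribute described above can be sketched as a topological sort. This is an illustration only: the `creation_order` function and the resource names are hypothetical, the sketch considers only explicit DependsOn entries (CloudFormation also infers dependencies from references between resources), and it relies on `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def creation_order(resources):
    """Order logical IDs so that every dependency is created first."""
    graph = {}
    for logical_id, definition in resources.items():
        depends_on = definition.get("DependsOn", [])
        if isinstance(depends_on, str):
            # DependsOn accepts a single logical ID or a list of them.
            depends_on = [depends_on]
        graph[logical_id] = set(depends_on)
    # The graph maps each resource to the resources it must wait for.
    return list(TopologicalSorter(graph).static_order())


resources = {
    "Dns": {"Type": "AWS::Route53::RecordSet", "DependsOn": "AppServer"},
    "AppServer": {"Type": "AWS::EC2::Instance", "DependsOn": ["Database"]},
    "Database": {"Type": "AWS::RDS::DBInstance"},
}

order = creation_order(resources)
print(order)
```

A circular DependsOn chain makes `static_order()` raise `graphlib.CycleError`, which corresponds to a template that cannot be ordered.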

CloudFormation Termination Protection

Termination protection helps prevent a stack from being accidentally deleted.
Termination protection on stacks is disabled by default.
Termination protection can be enabled when a stack is created.
Termination protection can be set on a stack with any status except DELETE_IN_PROGRESS or DELETE_COMPLETE
Enabling or disabling termination protection on a stack sets it for any nested stacks belonging to that stack as well. You can’t enable or disable termination protection directly on a nested stack.
If a user attempts to directly delete a nested stack belonging to a stack that has termination protection enabled, the operation fails and the nested stack remains unchanged.
If a user performs a stack update that would delete the nested stack, AWS CloudFormation deletes the nested stack accordingly.

CloudFormation Stack Policy

Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update.
By default, all update actions are allowed on all resources and anyone with stack update permissions can update all of the resources in the stack.
During an update, some resources might require an interruption or be completely replaced, resulting in new physical IDs or completely new storage; a stack policy can prevent such updates on critical resources.
A stack policy is a JSON document that defines the update actions that can be performed on designated resources.
After you set a stack policy, all of the resources in the stack are protected by default.
Updates on specific resources can be allowed using an explicit Allow statement for those resources in the stack policy.
Only one stack policy can be defined per stack, but multiple resources can be protected within a single policy.
A stack policy applies to all CloudFormation users who attempt to update the stack. You can’t associate different stack policies with different users.
A stack policy applies only during stack updates. It doesn’t provide access controls like an IAM policy.
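The sketch below shows what such a JSON stack policy could look like, together with a deliberately simplified check of whether an update to a given resource would be permitted. The logical ID `ProductionDatabase` and the `update_allowed` helper are hypothetical, and the helper only handles exact resource matches and `*`; real policies also support specific update actions, wildcards, and conditions.

```python
import json

# Allow updates everywhere except on one protected resource.
stack_policy = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {
            "Effect": "Deny",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "LogicalResourceId/ProductionDatabase",
        },
    ]
}


def update_allowed(policy, logical_id):
    """Simplified model: protected by default, explicit Deny beats Allow."""
    target = f"LogicalResourceId/{logical_id}"
    effects = {
        statement["Effect"]
        for statement in policy["Statement"]
        if statement["Resource"] in ("*", target)
    }
    return "Allow" in effects and "Deny" not in effects


print(json.dumps(stack_policy, indent=2))
print(update_allowed(stack_policy, "WebServer"))
print(update_allowed(stack_policy, "ProductionDatabase"))
```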

CloudFormation StackSets

CloudFormation StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and Regions with a single operation.
Using an administrator account, an AWS CloudFormation template can be defined, managed, and used as the basis for provisioning stacks into selected target accounts across specified AWS Regions.

CloudFormation Registry

CloudFormation registry helps manage extensions, both public and private, such as resources, modules, and hooks that are available for use in your AWS account.
CloudFormation registry offers several advantages over custom resources
- Supports the modeling, provisioning, and managing of third-party application resources
- Supports the Create, Read, Update, Delete, and List (CRUDL) operations
- Supports drift detection on private and third-party resource types

CloudFormation Helper Scripts

Refer to the blog post on CloudFormation Helper Scripts

CloudFormation Best Practices

Refer to the blog post on CloudFormation Best Practices

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.

Open to further feedback, discussion and correction.

What does Amazon CloudFormation provide?
1. The ability to setup Autoscaling for Amazon EC2 instances.
2. A templated resource creation for Amazon Web Services.
3. A template to map network resources for Amazon Web Services
4. None of these
A user is planning to use AWS CloudFormation for his automatic deployment requirements. Which of the below mentioned components are required as a part of the template?
1. Parameters
2. Outputs
3. Template version
4. Resources
In regard to AWS CloudFormation, what is a stack?
1. Set of AWS templates that are created and managed as a template
2. Set of AWS resources that are created and managed as a template
3. Set of AWS resources that are created and managed as a single unit
4. Set of AWS templates that are created and managed as a single unit
A large enterprise wants to adopt CloudFormation to automate administrative tasks and implement the security principles of least privilege and separation of duties. They have identified the following roles with the corresponding tasks in the company: (i) network administrators: create, modify and delete VPCs, subnets, NACLs, routing tables, and security groups (ii) application operators: deploy complete application stacks (ELB, Auto Scaling groups, RDS) whereas all resources must be deployed in the VPCs managed by the network administrators (iii) Both groups must maintain their own CloudFormation templates and should be able to create, update and delete only their own CloudFormation stacks. The company has followed your advice to create two IAM groups, one for applications and one for networks. Both IAM groups are attached to IAM policies that grant rights to perform the necessary task of each group as well as the creation, update and deletion of CloudFormation stacks. Given setup and requirements, which statements represent valid design considerations? Choose 2 answers [PROFESSIONAL]
1. Network stack updates will fail upon attempts to delete a subnet with EC2 instances (Subnets cannot be deleted with instances in them)
2. Unless resource level permissions are used on the cloudformation:DeleteStack action, network administrators could tear down application stacks (Network administrators themselves need permission to delete resources within the application stack & CloudFormation makes calls to create, modify, and delete those resources on their behalf)
3. The application stack cannot be deleted before all network stacks are deleted (Application stack can be deleted before network stack)
4. Restricting the launch of EC2 instances into VPCs requires resource level permissions in the IAM policy of the application group (IAM permissions need to be given explicitly to launch instances )
5. Nesting network stacks within application stacks simplifies management and debugging, but requires resource level permissions in the IAM policy of the network group (Although stacks can be nested, Network group will need to have all the application group permissions)
Your team is excited about the use of AWS because now they have access to programmable infrastructure. You have been asked to manage your AWS infrastructure in a manner similar to the way you might manage application code. You want to be able to deploy exact copies of different versions of your infrastructure, stage changes into different environments, revert back to previous versions, and identify what versions are running at any particular time (development, test, QA, production). Which approach addresses this requirement?
1. Use cost allocation reports and AWS Opsworks to deploy and manage your infrastructure.
2. Use AWS CloudWatch metrics and alerts along with resource tagging to deploy and manage your infrastructure.
3. Use AWS Beanstalk and a version control system like GIT to deploy and manage your infrastructure.
4. Use AWS CloudFormation and a version control system like GIT to deploy and manage your infrastructure.
A user is using CloudFormation to launch an EC2 instance and then configure an application after the instance is launched. The user wants the stack creation of ELB and AutoScaling to wait until the EC2 instance is launched and configured properly. How can the user configure this?
1. It is not possible that the stack creation will wait until one service is created and launched
2. The user can use the HoldCondition resource to wait for the creation of the other dependent resources
3. The user can use the DependentCondition resource to hold the creation of the other dependent resources
4. The user can use the WaitCondition resource to hold the creation of the other dependent resources
A user has created a CloudFormation stack. The stack creates AWS services, such as EC2 instances, ELB, AutoScaling, and RDS. While creating the stack it created EC2, ELB and AutoScaling but failed to create RDS. What will CloudFormation do in this scenario?
1. CloudFormation can never throw an error after launching a few services since it verifies all the steps before launching
2. It will warn the user about the error and ask the user to manually create RDS
3. Rollback all the changes and terminate all the created services
4. It will wait for the user’s input about the error and correct the mistake after the input
A user is planning to use AWS CloudFormation. Which of the below mentioned functionalities does not help him to correctly understand CloudFormation?
1. CloudFormation follows the DevOps model for the creation of Dev & Test
2. AWS CloudFormation does not charge the user for its service but only charges for the AWS resources created with it
3. CloudFormation works with a wide variety of AWS services, such as EC2, EBS, VPC, IAM, S3, RDS, ELB, etc
4. CloudFormation provides a set of application bootstrapping scripts which enables the user to install Software
A customer is using AWS for Dev and Test. The customer wants to setup the Dev environment with CloudFormation. Which of the below mentioned steps are not required while using CloudFormation?
1. Create a stack
2. Configure a service
3. Create and upload the template
4. Provide the parameters configured as part of the template
A marketing research company has developed a tracking system that collects user behavior during web marketing campaigns on behalf of their customers all over the world. The tracking system consists of an auto-scaled group of Amazon Elastic Compute Cloud (EC2) instances behind an elastic load balancer (ELB), and the collected data is stored in Amazon DynamoDB. After the campaign is terminated, the tracking system is torn down and the data is moved to Amazon Redshift, where it is aggregated, analyzed and used to generate detailed reports. The company wants to be able to instantiate new tracking systems in any region without any manual intervention and therefore adopted AWS CloudFormation. What needs to be done to make sure that the AWS CloudFormation template works in every AWS region? Choose 2 answers [PROFESSIONAL]
1. IAM users with the right to start AWS CloudFormation stacks must be defined for every target region. (IAM users are global)
2. The names of the Amazon DynamoDB tables must be different in every target region. (DynamoDB names should be unique only within a region)
3. Use the built-in function of AWS CloudFormation to set the AvailabilityZone attribute of the ELB resource.
4. Avoid using DeletionPolicies for EBS snapshots. (Don’t want the data to be retained)
5. Use the built-in Mappings and FindInMap functions of AWS CloudFormation to refer to the AMI ID set in the ImageId attribute of the Auto Scaling::LaunchConfiguration resource.
A gaming company adopted AWS CloudFormation to automate load-testing of their games. They have created an AWS CloudFormation template for each gaming environment and one for the load-testing stack. The load-testing stack creates an Amazon Relational Database Service (RDS) Postgres database and two web servers running on Amazon Elastic Compute Cloud (EC2) that send HTTP requests, measure response times, and write the results into the database. A test run usually takes between 15 and 30 minutes. Once the tests are done, the AWS CloudFormation stacks are torn down immediately. The test results written to the Amazon RDS database must remain accessible for visualization and analysis. Select possible solutions that allow access to the test results after the AWS CloudFormation load-testing stack is deleted. Choose 2 answers. [PROFESSIONAL]
1. Define a deletion policy of type Retain for the Amazon RDS resource to assure that the RDS database is not deleted with the AWS CloudFormation stack.
2. Define a deletion policy of type Snapshot for the Amazon RDS resource to assure that the RDS database can be restored after the AWS CloudFormation stack is deleted.
3. Define automated backups with a backup retention period of 30 days for the Amazon RDS database and perform point-in-time recovery of the database after the AWS CloudFormation stack is deleted. (as the environment is required for limited time the automated backup will not serve the purpose)
4. Define an Amazon RDS Read-Replica in the load-testing AWS CloudFormation stack and define a dependency relation between master and replica via the DependsOn attribute. (read replica not needed and will be deleted when the stack is deleted)
5. Define an update policy to prevent deletion of the Amazon RDS database after the AWS CloudFormation stack is deleted. (UpdatePolicy does not apply to RDS)
When working with AWS CloudFormation Templates what is the maximum number of stacks that you can create?
1. 5000
2. 500
3. 2000 (Refer link – The limit keeps changing, so check for the latest)
4. 100
What happens, by default, when one of the resources in a CloudFormation stack cannot be created?
1. Previously created resources are kept but the stack creation terminates
2. Previously created resources are deleted and the stack creation terminates
3. Stack creation continues, and the final results indicate which steps failed
4. CloudFormation templates are parsed in advance so stack creation is guaranteed to succeed.
You need to deploy an AWS stack in a repeatable manner across multiple environments. You have selected CloudFormation as the right tool to accomplish this, but have found that there is a resource type you need to create and model, but is unsupported by CloudFormation. How should you overcome this challenge? [PROFESSIONAL]
1. Use a CloudFormation Custom Resource Template by selecting an API call to proxy for create, update, and delete actions. CloudFormation will use the AWS SDK, CLI, or API method of your choosing as the state transition function for the resource type you are modeling.
2. Submit a ticket to the AWS Forums. AWS extends CloudFormation Resource Types by releasing tooling to the AWS Labs organization on GitHub. Their response time is usually 1 day, and they complete requests within a week or two.
3. Instead of depending on CloudFormation, use Chef, Puppet, or Ansible to author Heat templates, which are declarative stack resource definitions that operate over the OpenStack hypervisor and cloud environment.
4. Create a CloudFormation Custom Resource Type by implementing create, update, and delete functionality, either by subscribing a Custom Resource Provider to an SNS topic, or by implementing the logic in AWS Lambda. (Refer link)
What is a circular dependency in AWS CloudFormation?
1. When a Template references an earlier version of itself.
2. When Nested Stacks depend on each other.
3. When Resources form a DependsOn loop. (Refer link, to resolve a dependency error, add a DependsOn attribute to resources that depend on other resources in the template. Some cases, e.g. EIP and VPC with IGW where EIP depends on IGW, need explicit declaration for the resources to be created in the correct order)
4. When a Template references a region, which references the original Template.
You need to run a very large batch data processing job one time per day. The source data exists entirely in S3, and the output of the processing job should also be written to S3 when finished. If you need to version control this processing job and all setup and teardown logic for the system, what approach should you use?
1. Model an AWS EMR job in AWS Elastic Beanstalk. (cannot directly model EMR Clusters)
2. Model an AWS EMR job in AWS CloudFormation. (EMR cluster can be modeled using CloudFormation. Refer link)
3. Model an AWS EMR job in AWS OpsWorks. (cannot directly model EMR Clusters)
4. Model an AWS EMR job in AWS CLI Composer. (does not exist)
Your company needs to automate 3 layers of a large cloud deployment. You want to be able to track this deployment’s evolution as it changes over time, and carefully control any alterations. What is a good way to automate a stack to meet these requirements? [PROFESSIONAL]
1. Use OpsWorks Stacks with three layers to model the layering in your stack.
2. Use CloudFormation Nested Stack Templates, with three child stacks to represent the three logical layers of your cloud. (CloudFormation allows source controlled, declarative templates as the basis for stack automation and Nested Stacks help achieve clean separation of layers while simultaneously providing a method to control all layers at once when needed)
3. Use AWS Config to declare a configuration set that AWS should roll out to your cloud.
4. Use Elastic Beanstalk Linked Applications, passing the important DNS entries between layers using the metadata interface.
You have been asked to de-risk deployments at your company. Specifically, the CEO is concerned about outages that occur because of accidental inconsistencies between Staging and Production, which sometimes cause unexpected behaviors in Production even when Staging tests pass. You already use Docker to get high consistency between Staging and Production for the application environment on your EC2 instances. How do you further de-risk the rest of the execution environment, since in AWS, there are many service components you may use beyond EC2 virtual machines? [PROFESSIONAL]
1. Develop models of your entire cloud system in CloudFormation. Use this model in Staging and Production to achieve greater parity. (Only CloudFormation’s JSON Templates allow declarative version control of repeatedly deployable models of entire AWS clouds. Refer link)
2. Use AWS Config to force the Staging and Production stacks to have configuration parity. Any differences will be detected for you so you are aware of risks.
3. Use AMIs to ensure the whole machine, including the kernel of the virtual machines, is consistent, since Docker uses Linux Container (LXC) technology, and we need to make sure the container environment is consistent.
4. Use AWS ECS and Docker clustering. This will make sure that the AMIs and machine sizes are the same across both environments.
Which code snippet below returns the URL of a load balanced web site created in CloudFormation with an AWS::ElasticLoadBalancing::LoadBalancer resource named “ElasticLoadBalancer”? [Developer]
1. "Fn::Join" : ["", [ "http://", {"Fn::GetAtt" : [ "ElasticLoadBalancer", "DNSName"]}]] (Refer link)
2. "Fn::Join" : ["", [ "http://", {"Fn::GetAtt" : [ "ElasticLoadBalancer", "Url"]}]]
3. "Fn::Join" : ["", [ "http://", {"Ref" : "ElasticLoadBalancerUrl"}]]
4. "Fn::Join" : ["", [ "http://", {"Ref" : "ElasticLoadBalancerDNSName"}]]
For AWS CloudFormation, which stack state refuses UpdateStack calls? [Developer]
1. UPDATE_ROLLBACK_FAILED (Refer link)
2. UPDATE_ROLLBACK_COMPLETE
3. UPDATE_COMPLETE
4. CREATE_COMPLETE
Which of these is not a Pseudo Parameter in AWS CloudFormation? [Developer]
1. AWS::StackName
2. AWS::AccountId
3. AWS::StackArn (Refer link)
4. AWS::NotificationARNs
Which of these is not an intrinsic function in AWS CloudFormation? [Developer]
1. Fn::SplitValue (Refer link)
2. Fn::FindInMap
3. Fn::Select
4. Fn::GetAZs
Which of these is not a CloudFormation Helper Script? [Developer]
1. cfn-signal
2. cfn-hup
3. cfn-request (Refer link)
4. cfn-get-metadata
What method should I use to author automation if I want to wait for a CloudFormation stack to finish completing in a script? [Developer]
1. Event subscription using SQS.
2. Event subscription using SNS.
3. Poll using ListStacks / list-stacks. (Only polling will make a script wait to complete. ListStacks / list-stacks is a real method. Refer link)
4. Poll using GetStackStatus / get-stack-status. (GetStackStatus / get-stack-status does not exist)
Which status represents a failure state in AWS CloudFormation? [Developer]
1. UPDATE_COMPLETE_CLEANUP_IN_PROGRESS (UPDATE_COMPLETE_CLEANUP_IN_PROGRESS means an update was successful, and CloudFormation is deleting any replaced, no longer used resources)
2. DELETE_COMPLETE_WITH_ARTIFACTS (DELETE_COMPLETE_WITH_ARTIFACTS does not exist)
3. ROLLBACK_IN_PROGRESS (ROLLBACK_IN_PROGRESS means an UpdateStack operation failed and the stack is in the process of trying to return to the valid, pre-update state Refer link)
4. ROLLBACK_FAILED (ROLLBACK_FAILED is not a CloudFormation state but UPDATE_ROLLBACK_FAILED is)
Which of these is not an intrinsic function in AWS CloudFormation? [Developer]
1. Fn::Equals
2. Fn::If
3. Fn::Not
4. Fn::Parse (Intrinsic functions include Fn::Base64, Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or, Fn::FindInMap, Fn::GetAtt, Fn::GetAZs, Fn::Join, Fn::Select, Refer link)
You need to create a Route53 record automatically in CloudFormation when not running in production during all launches of a Template. How should you implement this? [Developer]
1. Use a Parameter for environment, and add a Condition on the Route53 Resource in the template to create the record only when environment is not production. (Best way to do this is with one template, and a Condition on the resource. Route53 does not allow null strings for Refer link)
2. Create two templates, one with the Route53 record value and one with a null value for the record. Use the one without it when deploying to production.
3. Use a Parameter for environment, and add a Condition on the Route53 Resource in the template to create the record with a null string when environment is production.
4. Create two templates, one with the Route53 record and one without it. Use the one without it when deploying to production.
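The single-template approach with a Parameter and a Condition, as described in the last question above, could be sketched as follows. The parameter name `Environment`, the condition name `NotProduction`, and the hosted zone and record values are placeholders for illustration.

```python
import json

# Hypothetical template: names and DNS values are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "Environment": {
            "Type": "String",
            "AllowedValues": ["production", "staging", "dev"],
        }
    },
    "Conditions": {
        # True whenever the Environment parameter is anything but production.
        "NotProduction": {
            "Fn::Not": [{"Fn::Equals": [{"Ref": "Environment"}, "production"]}]
        }
    },
    "Resources": {
        "DnsRecord": {
            "Type": "AWS::Route53::RecordSet",
            # The record is created only when the condition evaluates to true.
            "Condition": "NotProduction",
            "Properties": {
                "HostedZoneName": "example.com.",
                "Name": "test.example.com.",
                "Type": "CNAME",
                "TTL": "300",
                "ResourceRecords": ["internal.example.com"],
            },
        }
    },
}

print(json.dumps(template, indent=2))
```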

References

AWS_CloudFormation_User_Guide

AWS Organizations Service Control Policies – SCPs

July 14, 2023 ~ Last updated on : July 15, 2023 ~ jayendrapatil ~ 2 Comments

AWS Organizations Service Control Policies

AWS Organizations Service control policies – SCPs offer central control over the maximum available permissions for all of the accounts in the organization, ensuring member accounts stay within the organization’s access control guidelines.
are one type of policy that help manage the organization.
are available only in an organization that has all features enabled, and aren’t available if the organization has enabled only the consolidated billing features.
are NOT sufficient for granting access to the accounts in the organization.
define a guardrail for what actions accounts within the organization root or OU can do, but IAM policies need to be attached to the users and roles in the organization’s accounts to grant permissions to them.
Effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
with an SCP attached to member accounts, identity-based and resource-based policies grant permissions to entities only if those policies and the SCP allow the action.
don’t affect users or roles in the management account. They affect only the member accounts in your organization.
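The "logical intersection" idea can be illustrated with plain Python sets. This is a simplified model that treats permissions as exact action names; real policy evaluation also involves wildcards, explicit denies, resource-based policies, and conditions. The action lists and the `effective_permissions` helper are made up for this example.

```python
def effective_permissions(scp_allowed, iam_allowed):
    """Actions usable in a member account: allowed by both SCP and IAM."""
    return scp_allowed & iam_allowed


# Hypothetical sets of allowed actions.
scp_allowed = {"s3:GetObject", "s3:PutObject", "ec2:DescribeInstances"}
iam_allowed = {"s3:GetObject", "dynamodb:GetItem"}

print(effective_permissions(scp_allowed, iam_allowed))

# An SCP alone grants nothing: with no IAM permissions the result is empty.
print(effective_permissions(scp_allowed, set()))
```

Here `dynamodb:GetItem` is granted by IAM but falls outside the SCP guardrail, so it is not effective, while the SCP-allowed EC2 and S3 write actions remain unusable until an IAM policy grants them.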

SCPs Effects on Permissions

never grant permissions but define the maximum permissions for the affected accounts.
Users and roles must still be granted permissions with appropriate IAM permission policies. A user without any IAM permission policies has no access at all, even if the applicable SCPs allow all services and all actions.
limit permissions for entities in member accounts, including each AWS account root user.
do not limit actions performed by the master or management account.
do not affect any service-linked role. Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
affect only IAM users or roles that are managed by accounts that are part of the organization. They don’t affect users or roles from accounts outside the organization.
don’t affect resource-based policies directly.

SCPs Strategies

By default, an SCP named FullAWSAccess is attached to every root, OU, and account, which allows all actions and all services.
Blacklist or Deny Strategy
- actions are allowed by default and services and actions to be prohibited need to be specified.
- blacklist permissions using deny statements can be assigned in combination with the default FullAWSAccess SCP.
- using deny statements in SCPs requires less maintenance because they don’t need to be updated when AWS adds new services.
- deny statements usually use less space, thus making it easier to stay within SCP size limits.
Whitelist or Allow Strategy
- actions are prohibited by default, and you specify what services and actions are allowed.
- whitelist permissions can be assigned, by removing the default FullAWSAccess SCP.
- attach an SCP that explicitly permits only the allowed services and actions
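A deny-list SCP, meant to sit alongside the default FullAWSAccess policy, might look like the sketch below. The choice of denied actions and the `Sid` are only an example, and the 5,120-character size limit used in the check is an assumption that should be verified against the current AWS Organizations quotas.

```python
import json

# Hypothetical deny-list SCP; the denied actions are just an example.
deny_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLeavingOrgAndStoppingTrails",
            "Effect": "Deny",
            "Action": [
                "organizations:LeaveOrganization",
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
            ],
            "Resource": "*",
        }
    ],
}

# Assumed SCP size limit in characters; check the current quota.
ASSUMED_MAX_SCP_SIZE = 5120

# Compact serialization keeps the document as small as possible.
document = json.dumps(deny_scp, separators=(",", ":"))
print(document)
print(len(document) <= ASSUMED_MAX_SCP_SIZE)
```

Because everything not listed stays allowed by FullAWSAccess, this policy does not need to change when AWS launches new services.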

SCPs Testing Effects

Don’t attach SCPs to the root of the organization without thoroughly testing the impact that the policy has on accounts.
Create an OU that the accounts can be moved into one at a time, or at least in small numbers, to ensure that users are not inadvertently locked out of key services.

AWS Certification Exam Practice Questions

Your company is planning on setting up multiple accounts in AWS. The IT Security department has a requirement to ensure that certain services and actions are not allowed across all accounts. How would the system admin achieve this in the most EFFECTIVE way possible?
1. Create a common IAM policy that can be applied across all accounts
2. Create an IAM policy per account and apply them accordingly
3. Deny the services to be used across accounts by contacting AWS support
4. Use AWS Organizations and Service Control Policies
You are in the process of implementing AWS Organizations for your company. At your previous company, you saw an Organizations implementation go bad when an SCP (Service Control Policy) was applied at the root of the organization before being thoroughly tested. In what way can an SCP be properly tested and implemented?
1. Back up your entire Organization to S3, then roll back and restore if something goes wrong
2. The SCP must be verified with AWS before it is implemented to avoid any problems.
3. Mirror your Organizational Unit in another region. Apply the SCP and test it. Once testing is complete, attach the SCP to the root of your organization.
4. Create an Organizational Unit (OU). Attach the SCP to this new OU. Move your accounts in one at a time to ensure that you don’t inadvertently lock users out of key services.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

July 11, 2023 ~ Last updated on : October 4, 2023 ~ jayendrapatil ~ 25 Comments

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

Recertified with the AWS Certified Data Analytics – Specialty (DAS-C01) which tends to cover a lot of big data topics focused on AWS services.
Data Analytics – Specialty (DAS-C01) has replaced the previous Big Data – Specialty (BDS-C01).

AWS Certified Data Analytics – Specialty (DAS-C01) exam basically validates the ability to

Define AWS data analytics services and understand how they integrate with each other.
Explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization.

Refer to the AWS Certified Data Analytics – Specialty Exam Guide for details

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Data Analytics Specialty Exam
- Whizlabs – AWS Certified Data Analytics – Specialty Course
Practice tests
- Braincert – AWS Certified Data Analytics – Specialty DAS-C01 Practice Exams
- Stephane Maarek – Practice Exams | AWS Certified Data Analytics Specialty
- Whizlabs – AWS Certified Data Analytics – Specialty Practice Tests

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Summary

Specialty exams are tough, lengthy, and tiresome. Most of the question and answer options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
DAS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
DAS-C01 exam includes two types of questions, multiple-choice and multiple-response.
DAS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
Specialty exams currently cost $300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
As always, mark the questions for review and move on and come back to them after you are done with all.
As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Read the other 2 answers to check the difference area and that would help you reach the right answer or at least have a 50% chance of getting it right.
AWS exams can be taken either at a test center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Topics

AWS Certified Data Analytics – Specialty exam, as its name suggests, covers a lot of Big Data concepts right from data collection, ingestion, transfer, storage, pre and post-processing, analytics, and visualization with the added concepts for data security at each layer.

Analytics

Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Glue, Kinesis, and Redshift.
AWS Analytics Services Cheat Sheet
Glue
- DAS-C01 covers Glue in great detail.
- AWS Glue is a fully managed ETL service that automates the time-consuming steps of data preparation for analytics.
- supports server-side encryption for data at rest and SSL for data in motion.
- Glue ETL engine extracts, transforms, and loads data, and can automatically generate Scala or Python code.
- Glue Data Catalog is a central repository and persistent metadata store to store structural and operational metadata for all the data assets. It is compatible with the Apache Hive metastore.
- Glue Crawlers scan various data stores to automatically infer schemas and partition structures to populate the Data Catalog with corresponding table definitions and statistics.
- Glue Job Bookmark tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run.
- Glue Streaming ETL enables performing ETL operations on streaming data using continuously-running jobs.
- Glue provides a flexible scheduler that handles dependency resolution, job monitoring, and retries.
- Glue Studio offers a graphical interface for authoring AWS Glue jobs to process data. It lets you define the flow of the data sources, transformations, and targets in the visual interface and generates Apache Spark code on your behalf.
- Glue Data Quality helps reduce manual data quality efforts by automatically measuring and monitoring the quality of data in data lakes and pipelines.
- Glue DataBrew helps prepare, visualize, clean, and normalize data directly from the data lake, data warehouses, and databases, including S3, Redshift, Aurora, and RDS.
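The job bookmark idea from the list above can be pictured with a small pure-Python sketch. This is not Glue's implementation or API: `BookmarkStore`, `run_job`, and the file names are hypothetical, and the state is kept in memory here rather than persisted by the service.

```python
from dataclasses import dataclass, field


@dataclass
class BookmarkStore:
    """Hypothetical stand-in for the state a job bookmark keeps between runs."""
    processed: set = field(default_factory=set)


def run_job(available_files, bookmark):
    """Process only files not seen in a previous run, then update the bookmark."""
    new_files = [f for f in sorted(available_files) if f not in bookmark.processed]
    # ... the transform/load step for new_files would go here ...
    bookmark.processed.update(new_files)
    return new_files


bookmark = BookmarkStore()
first_run = run_job({"day1.csv", "day2.csv"}, bookmark)
second_run = run_job({"day1.csv", "day2.csv", "day3.csv"}, bookmark)
print(first_run)   # ['day1.csv', 'day2.csv']
print(second_run)  # ['day3.csv']
```

The second run skips the two files already recorded, which is the behaviour the bookmark is meant to provide for incremental ETL.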
Redshift
- Redshift is also covered in depth.
- Cover Redshift Advanced topics
  - Redshift Distribution Style determines how data is distributed across compute nodes and helps minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed.
  - Redshift Enhanced VPC routing forces all COPY and UNLOAD traffic between the cluster and the data repositories through the VPC.
  - Workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.
  - Redshift Spectrum helps query and retrieve structured and semistructured data from files in S3 without having to load the data into Redshift tables.
  - Federated Query feature allows querying and analyzing data across operational databases, data warehouses, and data lakes.
  - Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries.
  - Redshift Serverless is a serverless option of Redshift that makes it more efficient to run and scale analytics in seconds without the need to set up and manage data warehouse infrastructure.
- Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
  - Use a single COPY command to load from multiple files; it loads in parallel and performs better than multiple COPY commands
  - COPY command can use manifest files to load data
  - COPY command handles encrypted data
- Redshift Resizing cluster options (elastic resize did not support node type changes before, but does now)
- Redshift supports encryption at rest and in transit
- Redshift supports encrypting an unencrypted cluster using KMS. However, you can’t enable hardware security module (HSM) encryption by modifying the cluster. Instead, create a new, HSM-encrypted cluster and migrate your data to the new cluster.
- Know Redshift views to control access to data.
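As a rough illustration of the COPY points above, the sketch below builds a manifest and a single COPY statement that loads every listed file. The bucket, table, and role ARN are made-up placeholders, and data format options are left out, so treat it as a starting point rather than a complete load script.

```python
import json


def build_manifest(s3_urls):
    """Return a COPY manifest; mandatory=True makes the load fail if a file is missing."""
    return {"entries": [{"url": url, "mandatory": True} for url in s3_urls]}


def build_copy_sql(table, manifest_url, iam_role_arn):
    """Return one COPY statement that loads all files listed in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Placeholder names, not taken from the article.
files = [f"s3://example-bucket/sales/part-{i:04d}" for i in range(4)]
manifest = build_manifest(files)
copy_sql = build_copy_sql(
    "sales",
    "s3://example-bucket/sales/load.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(json.dumps(manifest, indent=2))
print(copy_sql)
```

Splitting the data into several files and issuing one COPY lets Redshift load the files in parallel, which is the reason a single COPY is preferred over several.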
Elastic Map Reduce
- Understand EMRFS
  - Use Consistent view to make sure S3 objects referred to by different applications are in sync, although it is no longer needed now that S3 is strongly consistent.
- Know EMR Best Practices (hint: start with many small nodes instead of few large nodes)
- Know EMR Encryption options
  - supports SSE-S3, SSE-KMS, CSE-KMS, and CSE-Custom encryption for EMRFS
  - supports LUKS encryption for local disks
  - supports TLS for data in transit encryption
  - supports EBS encryption
- Hive metastore can be externally hosted using RDS, Aurora, and AWS Glue Data Catalog
- Know also different technologies
  - Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
  - Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
  - Zeppelin and Jupyter are notebooks for interactive data exploration; they are open-source web applications that can be used to create and share documents that contain live code, equations, visualizations, and narrative text
  - Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
Kinesis
- Understand Kinesis Data Streams and Kinesis Data Firehose in depth
- Know Kinesis Data Streams vs Kinesis Firehose
  - Know Kinesis Data Streams is open-ended for both producer and consumer. It supports KCL and works with Spark.
  - Know Kinesis Firehose is open-ended for producers only. Data is stored in S3, Redshift, and OpenSearch.
  - Kinesis Firehose works in batches with a minimum buffer interval of 60 seconds, so delivery is near-real-time.
  - Kinesis Firehose supports out-of-the-box transformation and custom transformation using Lambda
- Kinesis supports encryption at rest using server-side encryption
- Kinesis Producer Library supports batching
- Kinesis Data Analytics
  - helps transform and analyze streaming data in real time using Apache Flink.
  - supports anomaly detection using Random Cut Forest ML
  - supports reference data stored in S3.
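The batching behaviour described for Firehose can be sketched as a buffer that flushes when either a size limit or a time interval is reached, whichever comes first. `BufferedDelivery` and its parameters are hypothetical names for illustration; the real service flushes on its own timer, whereas this sketch only checks the clock when a record arrives.

```python
class BufferedDelivery:
    """Hypothetical buffer: flush when the size or the interval limit is reached."""

    def __init__(self, max_bytes, max_seconds, start_time=0.0):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.buffered_bytes = 0
        self.last_flush = start_time
        self.batches = []  # stands in for objects delivered to the destination

    def put(self, record, now):
        """Buffer one record and flush if either limit has been reached."""
        self.records.append(record)
        self.buffered_bytes += len(record)
        too_big = self.buffered_bytes >= self.max_bytes
        too_old = now - self.last_flush >= self.max_seconds
        if too_big or too_old:
            self.flush(now)

    def flush(self, now):
        """Deliver the buffered records as one batch and reset the buffer."""
        if self.records:
            self.batches.append(b"".join(self.records))
        self.records = []
        self.buffered_bytes = 0
        self.last_flush = now


buf = BufferedDelivery(max_bytes=10, max_seconds=60)
buf.put(b"aaaa", now=1)      # 4 bytes buffered, no flush
buf.put(b"bbbbbbb", now=2)   # 11 bytes >= 10, flushed by size
buf.put(b"cc", now=30)       # 2 bytes, 28 s since last flush, no flush
buf.put(b"dd", now=62)       # 60 s since the flush at t=2, flushed by time
print(buf.batches)           # [b'aaaabbbbbbb', b'ccdd']
```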
OpenSearch
- OpenSearch is a search service that supports indexing, full-text search, faceting, etc.
- OpenSearch can be used for analysis and supports visualization using OpenSearch Dashboards which can be real-time.
- OpenSearch Service Storage tiers support Hot, UltraWarm, and Cold and the data can be transitioned using Index State management.
QuickSight
- Know Visual Types (hint: esp. word clouds, plotting line, bar, and story based visualizations)
- Know Supported Data Sources
- QuickSight provides IP addresses that need to be whitelisted for QuickSight to access the data store.
- QuickSight provides direct integration with Microsoft AD
- QuickSight supports Row level security using dataset rules to control access to data at row granularity based on permissions associated with the user interacting with the data.
- QuickSight supports ML insights as well
- QuickSight supports users defined via IAM or email signup.
Athena
- is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
- provides a simplified, flexible way to analyze data in an S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python without loading the data.
- integrates with QuickSight for visualizing the data or creating dashboards.
- uses a managed Glue Data Catalog to store information and schemas about the databases and tables that you create for the data stored in S3
- Workgroups can be used to separate users, teams, applications, or workloads, to set limits on the amount of data each query or the entire workgroup can process, and to track costs.
- Athena best practices recommended partitioning the data, partition projection, and using the Columnar file format like ORC or Parquet as they support compression and are splittable.
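To make the partitioning advice concrete, here is a minimal sketch of Hive-style `key=value` prefixes and of how a filter on partition columns narrows the prefixes that need to be read. The bucket name and the helper functions are assumptions for illustration; Athena performs this pruning itself once the table is partitioned.

```python
from datetime import date


def partition_prefix(table_root, day):
    """Build a Hive-style key prefix (key=value folders) for one day's data."""
    return f"{table_root}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_to_scan(table_root, days, year, month):
    """Keep only the prefixes whose partition values match the query filter."""
    return [
        partition_prefix(table_root, d)
        for d in days
        if d.year == year and d.month == month
    ]


days = [date(2023, 7, 31), date(2023, 8, 1), date(2023, 8, 2)]
scanned = prefixes_to_scan("s3://example-bucket/events", days, 2023, 8)
for prefix in scanned:
    print(prefix)
```

A query filtered on `year = 2023 AND month = 8` would only touch the two August prefixes, so less data is scanned and billed.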
Know Data Pipeline for data transfer

Security, Identity & Compliance

Data security is a key concept covered in the Data Analytics – Specialty exam
Identity and Access Management (IAM)
- Understand IAM in depth
- Understand IAM Roles
- Understand Identity Providers & Federation
- Understand IAM Policies
Deep dive into Key Management Service (KMS). There would be quite a few questions on this.
- Understand how KMS works
- Understand IAM Policies, Key Policies, Grants
- Know that KMS keys are regional and how to use them in other regions.
Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in S3.
Understand AWS Cognito esp. authentication across devices

Management & Governance Tools

Understand AWS CloudWatch for Logs and Metrics.
CloudWatch Subscription Filters can be used to route data to Kinesis Data Streams, Kinesis Data Firehose, and Lambda.
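When a subscription filter targets Lambda, the log data arrives base64-encoded and gzip-compressed under `awslogs.data`. The sketch below decodes such a payload; the sample event is built locally with made-up log group and message values, and `handler` is only an illustrative shape for a Lambda entry point.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the base64 + gzip payload delivered by a Logs subscription."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context=None):
    """Return the raw log messages contained in the event."""
    payload = decode_subscription_event(event)
    return [log_event["message"] for log_event in payload["logEvents"]]


# Build a sample event locally; the values are placeholders.
sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/example/app",
    "logStream": "stream-1",
    "logEvents": [
        {"id": "1", "timestamp": 1690000000000, "message": "ERROR disk full"}
    ],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample_payload).encode()))
sample_event = {"awslogs": {"data": encoded.decode()}}

messages = handler(sample_event)
print(messages)  # ['ERROR disk full']
```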

Whitepapers and articles

On the Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
If you are taking the AWS Online exam
- Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
- The online verification process does take some time and usually, there are glitches.
- Remember, you would not be allowed to take the exam if you are late by more than 30 minutes.
- Make sure your desk is clear, with no watches or external monitors; keep your phones away, and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Directory Services

July 8, 2023 ~ Last updated on : July 21, 2023 ~ jayendrapatil ~ 3 Comments

AWS Directory Services

AWS Directory Services is a managed service offering, providing directories that contain information about the organization, including users, groups, computers, and other resources.
AWS Directory Services provides multiple directory options, including
- Simple AD – a standalone directory service
- AD Connector – acts as a proxy to use On-Premise Microsoft Active Directory with other AWS services.
- AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also referred to as Microsoft AD

Simple AD

is a Microsoft Active Directory compatible directory from AWS Directory Service that is powered by Samba 4.
is the least expensive option and the best choice if there are 5,000 or fewer users & don’t need the more advanced Microsoft Active Directory features.
supports commonly used Active Directory features such as user accounts, group memberships, domain-joining EC2 instances running Linux and Windows, Kerberos-based single sign-on (SSO), and group policies.
does not support features like DNS dynamic update, schema extensions, multi-factor authentication, communication over LDAPS, PowerShell AD cmdlets, and the transfer of FSMO roles
provides daily automated snapshots to enable point-in-time recovery
Trust relationships between Simple AD and other Active Directory domains cannot be set up.
does not support MFA, RDS SQL Server, or AWS SSO.

AD Connector

helps connect an existing on-premises Active Directory to AWS
is the best choice to leverage an existing on-premises directory with AWS services
requires VPN or Direct Connect connection
is a proxy service for connecting on-premises Microsoft Active Directory to AWS without requiring complex directory synchronization technologies or the cost and complexity of hosting a federation infrastructure
forwards sign-in requests to the Active Directory domain controllers for authentication and provides the ability for applications to query the directory for data
enables consistent enforcement of existing security policies, such as password expiration, password history, and account lockouts, whether users are accessing resources on-premises or in the AWS cloud

Microsoft Active Directory (Enterprise Edition)

is a feature-rich managed Microsoft Active Directory hosted on AWS
is the best choice if there are more than 5,000 users
supports trust relationship (forest trust) set up between an AWS-hosted directory and on-premises directories providing users and groups with access to resources in either domain, using single sign-on (SSO) without the need to synchronize or replicate the users, groups, or passwords.
requires a VPN or Direct Connect connection.
provides much of the functionality offered by Microsoft Active Directory plus integration with AWS applications.
provides a highly available pair of domain controllers running in different AZs connected to the VPC in a Region of your choice.
supports MFA by integrating with an existing RADIUS-based MFA infrastructure to provide an additional layer of security when users access AWS applications.
automatically configures and manages host monitoring and recovery, data replication, snapshots, and software updates.
supports RDS for SQL Server, AWS Workspaces, Quicksight, WorkDocs, etc.

AWS Directory Services - Microsoft AD Use Cases

Microsoft AD Connectivity Options

If the VGW used to connect to the on-premises AD is not stable or has connectivity issues, the following options can be explored
- Simple AD
  - lower cost, low scale, basic AD compatible, or LDAP compatibility
  - provides a standalone instance for the Microsoft AD in AWS
  - No single point of Authentication or Authorization, as a separate copy is maintained
  - trust relationships cannot be set up between Simple AD and other Active Directory domains
- Read-only Domain Controllers (RODCs)
  - works out as a Read-only Active Directory
  - holds a copy of the Active Directory Domain Service (AD DS) database and responds to authentication requests.
  - are typically deployed in locations where physical security cannot be guaranteed.
  - they cannot be written to by applications or other servers.
  - helps maintain a single point of authentication & authorization control; however, it needs to be kept in sync.
- Writable Domain Controllers
  - are expensive to set up
  - operate in a multi-master model; changes can be made on any writable server in the forest, and those changes are replicated to servers throughout the entire forest

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.

Open to further feedback, discussion and correction.

The majority of your Infrastructure is on-premises and you have a small footprint on AWS. Your company has decided to roll out a new application that is heavily dependent on low latency connectivity to LDAP for authentication. Your security policy requires minimal changes to the company’s existing application user management processes. What option would you implement to successfully launch this application?
1. Create a second, independent LDAP server in AWS for your application to use for authentication (independent would not work for authentication as its a separate copy)
2. Establish a VPN connection so your applications can authenticate against your existing on-premises LDAP servers (not a low latency solution)
3. Establish a VPN connection between your data center and AWS create an LDAP replica on AWS and configure your application to use the LDAP replica for authentication (RODCs low latency and minimal setup)
4. Create a second LDAP domain on AWS establish a VPN connection to establish a trust relationship between your new and existing domains and use the new domain for authentication (Not minimal effort)
A company is preparing to give AWS Management Console access to developers. Company policy mandates identity federation and role-based access control. Roles are currently assigned using groups in the corporate Active Directory. What combination of the following will give developers access to the AWS console? (Choose 2 answers)
1. AWS Directory Service AD Connector (for Corporate Active directory)
2. AWS Directory Service Simple AD
3. AWS Identity and Access Management groups
4. AWS Identity and Access Management roles
5. AWS Identity and Access Management users
An Enterprise customer is starting their migration to the cloud, their main reason for migrating is agility, and they want to make their internal Microsoft Active Directory available to any applications running on AWS; this is so internal users only have to remember one set of credentials and as a central point of user control for leavers and joiners. How could they make their Active Directory secure, and highly available, with minimal on-premises infrastructure changes, in the most cost and time-efficient way? Choose the most appropriate
1. Using Amazon Elastic Compute Cloud (EC2), they would create a DMZ using a security group; within the security group they could provision two smaller Amazon EC2 instances that are running Openswan for resilient IPSEC tunnels, and two larger instances that are domain controllers; they would use multiple Availability Zones (Whats Openswan? Refer Implementation)
2. Using VPC, they could create an extension to their data center and make use of resilient hardware IPSEC tunnels; they could then have two domain controller instances that are joined to their existing domain and reside within different subnets, in different Availability Zones (highly available with 2 AZ’s, secure with VPN connection and minimal changes)
3. Within the customer’s existing infrastructure, they could provision new hardware to run Active Directory Federation Services; this would present Active Directory as a SAML2 endpoint on the internet; any new application on AWS could be written to authenticate using SAML2 (not minimal on-premises hardware changes)
4. The customer could create a stand-alone VPC with its own Active Directory Domain Controllers; two domain controller instances could be configured, one in each Availability Zone; new applications would authenticate with those domain controllers (not a central location, but a copy)
A company needs to deploy virtual desktops to its customers in a virtual private cloud, leveraging existing security controls. Which set of AWS services and features will meet the company’s requirements?
1. Virtual Private Network connection. AWS Directory Services, and ClassicLink (ClassicLink allows you to link an EC2-Classic instance to a VPC in your account, within the same region)
2. Virtual Private Network connection. AWS Directory Services, and Amazon Workspaces (WorkSpaces for Virtual desktops, and AWS Directory Services to authenticate to an existing on-premises AD through VPN)
3. AWS Directory Service, Amazon Workspaces, and AWS Identity and Access Management (AD service needs a VPN connection to interact with an On-premise AD directory)
4. Amazon Elastic Compute Cloud, and AWS Identity and Access Management (Need WorkSpaces for virtual desktops)
An Enterprise customer is starting their migration to the cloud, their main reason for migrating is agility and they want to make their internal Microsoft active directory available to any applications running on AWS, this is so internal users only have to remember one set of credentials and as a central point of user control for leavers and joiners. How could they make their active directory secure and highly available with minimal on-premises infrastructure changes in the most cost and time-efficient way? Choose the most appropriate:
1. Using Amazon EC2, they could create a DMZ using a security group, within the security group they could provision two smaller Amazon EC2 instances that are running Openswan for resilient IPSEC tunnels and two larger instances that are domain controllers, they would use multiple availability zones.
2. Using VPC, they could create an extension to their data center and make use of resilient hardware IPSEC tunnels, they could then have two domain controller instances that are joined to their existing domain and reside within different subnets in different availability zones.
3. Within the customer’s existing infrastructure, they could provision new hardware to run active directory federation services, this would present active directory as a SAML2 endpoint on the internet and any new application on AWS could be written to authenticate using SAML2 (not a minimal change to the existing infrastructure)
4. The customer could create a stand alone VPC with its own active directory domain controllers, two domain controller instances could be configured, one in each availability zone, new applications would authenticate with those domain controllers. (Standalone cannot use the same security)
You run a 2000-engineer organization. You are about to begin using AWS at a large scale for the first time. You want to integrate with your existing identity management system running on Microsoft Active Directory because your organization is a power-user of Active Directory. How should you manage your AWS identities in the simplest manner?
1. Use a large AWS Directory Service Simple AD.
2. Use a large AWS Directory Service AD Connector. (AD Connector can be used as power-user of Microsoft Active Directory. Simple AD only works with a subset of AD functionality)
3. Use a Sync Domain running on AWS Directory Service.
4. Use an AWS Directory Sync Domain running on AWS Lambda.

References

AWS Directory Service Administrative Guide

AWS Security Hub

July 6, 2023 ~ Last updated on : July 28, 2023 ~ jayendrapatil

AWS Security Hub

AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts, and enables automated remediation.
collects security data from across AWS accounts, services, and supported third-party partner products and helps analyze the security trends and identify the highest priority security issues.
is Regional and only receives and processes findings from the Region where the Security Hub is enabled. However, it supports cross-region aggregation of findings via the designation of an aggregator region.
must be enabled in each region to view findings in that region.
automatically runs continuous, account-level configuration and security checks based on AWS best practices and industry standards which include
- CIS AWS Foundations
- Payment Card Industry Data Security Standard (PCI DSS)
- AWS Foundational Security Best Practices
can consume, aggregate, organize, and prioritize findings from
- AWS services like
  - Amazon GuardDuty,
  - Amazon Inspector,
  - Amazon Macie,
  - AWS IAM Access Analyzer,
  - AWS Firewall Manager
- other supported third-party partner products.
consolidates the security findings across accounts and provider products and displays results on the Security Hub console.
supports integration with Amazon EventBridge. Custom actions can be defined when a finding is received.
only detects and consolidates findings that are generated after the Security Hub is enabled.
has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
uses service-linked AWS Config rules to perform most of its security checks for controls. AWS Config must be enabled on all accounts – both the administrator account and member accounts – in each Region where Security Hub is enabled.
works with a service-linked role named AWSServiceRoleForSecurityHub which includes the permissions and trust policy to do the following:
- Detect and aggregate findings from Amazon GuardDuty, Amazon Inspector, and Amazon Macie
- Configure the requisite AWS Config infrastructure to run security checks for the supported standards
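The EventBridge integration mentioned above is typically driven by an event pattern. The sketch below shows a pattern that selects imported findings with HIGH or CRITICAL severity, together with a deliberately simplified matcher to illustrate how such a pattern is evaluated. The `matches` function is an assumption for illustration and supports only exact-value matching; it is not EventBridge's matching engine, and the sample events are made up.

```python
# Pattern in EventBridge's JSON shape: each leaf is a list of accepted values.
pattern = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must match at least one event value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        candidates = actual if isinstance(actual, list) else [actual]
        if isinstance(expected, dict):
            # Nested pattern: at least one nested object must match it.
            if not any(isinstance(c, dict) and matches(expected, c) for c in candidates):
                return False
        elif not any(c in expected for c in candidates):
            return False
    return True


def make_event(label):
    """Build a made-up finding event with the given severity label."""
    return {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"Title": "Example finding", "Severity": {"Label": label}}]},
    }


print(matches(pattern, make_event("HIGH")))  # True
print(matches(pattern, make_event("LOW")))   # False
```

A rule with this pattern could then target a Lambda function or another automation for the custom action.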

AWS Certification Exam Practice Questions

A security engineer has been asked to continuously monitor the company’s AWS account using automated compliance checks based on AWS best practices and Center for Internet Security (CIS) AWS Foundations Benchmarks. How can the security engineer accomplish this using AWS services?
1. AWS Config + AWS Security Hub
2. Amazon Inspector + AWS GuardDuty
3. Amazon Inspector + AWS Shield
4. AWS Config + Amazon Inspector

References

AWS_Security_Hub

Amazon GuardDuty

July 3, 2023 ~ Last updated on : July 28, 2023 ~ jayendrapatil

Amazon GuardDuty

Amazon GuardDuty is a threat detection service that continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
is a continuous security monitoring service that analyzes and processes the following data sources:
- CloudTrail S3 data events and management event logs,
- DNS logs,
- EKS audit logs, and
- VPC flow logs.
uses threat intelligence feeds, such as lists of malicious IP addresses and domains, and machine learning to identify unexpected and potentially unauthorized and malicious activity within the AWS environment.
combines machine learning, anomaly detection, network monitoring, and malicious file discovery, utilizing both AWS-developed and industry-leading third-party sources to help protect workloads and data on AWS
is a Regional service and is recommended to be enabled in all supported AWS Regions. This helps generate findings of unauthorized or unusual activity even in Regions not actively used.
does not look at historical data; it monitors only the activity that starts after it is enabled.
operates completely independently of the AWS resources and therefore has no impact on the performance or availability of the accounts or workloads.
GuardDuty supports
- Suppression rules, which allow the creation of very specific combinations of attributes to suppress findings.
- Trusted IP List for highly secure communication with the AWS environment. Findings are not generated based on trusted IP lists.
- Threat List for known malicious IP addresses. Findings are generated based on threat lists.
Security findings are retained and made available through the GuardDuty console and APIs for 90 days, after which they are discarded.
Findings are assigned a severity, and actions can be automated by integrating with Security Hub, EventBridge, Lambda, and Step Functions.
Amazon Detective is also tightly integrated with GuardDuty which helps perform deeper forensic and root cause investigations.
GuardDuty Malware Protection feature helps to detect malicious files on EBS volumes attached to an EC2 instance and container workloads.

GuardDuty with Multiple Accounts

GuardDuty has multi-account management through AWS Organizations integration, which allows delegating an administrator account for the organization.
The delegated administrator (DA) account is a centralized account that consolidates all findings and can configure all member accounts.
The administrator account helps to associate and manage multiple AWS accounts.
All security findings are aggregated to the administrator account for review and remediation.
CloudWatch Events are also aggregated to the administrator account when using this configuration.

GuardDuty Automated Remediation

GuardDuty security findings can be remediated automatically using EventBridge and AWS Lambda
For example, a Lambda function can be created to modify the AWS security group rules based on security findings. When a GuardDuty finding indicates that one of your EC2 instances is being probed by a known malicious IP, an EventBridge rule can match the finding and invoke a Lambda function that automatically modifies the security group rules to restrict access on that port.
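A minimal sketch of the parsing half of such a Lambda function is shown below. It reads a port-probe finding and returns a remediation plan instead of calling EC2, so it runs without AWS access. The field names follow the GuardDuty finding format as I understand it and should be checked against a real finding; `plan_remediation`, the instance ID, and the IP address are illustrative assumptions.

```python
def plan_remediation(event):
    """Turn a GuardDuty port-probe finding into a list of proposed blocking actions."""
    if event.get("source") != "aws.guardduty":
        return []
    detail = event["detail"]
    action = detail["service"]["action"]
    if action.get("actionType") != "PORT_PROBE":
        return []
    instance_id = detail["resource"]["instanceDetails"]["instanceId"]
    plan = []
    for probe in action["portProbeAction"]["portProbeDetails"]:
        plan.append({
            "instance_id": instance_id,
            "port": probe["localPortDetails"]["port"],
            "block_cidr": probe["remoteIpDetails"]["ipAddressV4"] + "/32",
        })
    return plan


# Made-up event in the shape EventBridge delivers for a GuardDuty finding.
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "type": "Recon:EC2/PortProbeUnprotectedPort",
        "resource": {"instanceDetails": {"instanceId": "i-0123456789abcdef0"}},
        "service": {
            "action": {
                "actionType": "PORT_PROBE",
                "portProbeAction": {
                    "portProbeDetails": [
                        {
                            "localPortDetails": {"port": 22},
                            "remoteIpDetails": {"ipAddressV4": "198.51.100.7"},
                        }
                    ]
                },
            }
        },
    },
}

remediation_plan = plan_remediation(sample_event)
print(remediation_plan)
```

A real function would apply the plan with the EC2 API, for example by tightening the security group's ingress rules for that port. Security groups only hold allow rules, so an explicit block of one address is usually done with a network ACL instead.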

GuardDuty Malware Protection

GuardDuty Malware Protection helps scan EBS volume data for possible malware and identifies suspicious behavior indicative of malicious software in EC2 instances or container workloads.
is optimized to consume large data volumes for near real-time processing of security detections.
scans a replica EBS volume that GuardDuty generates based on the snapshot of the EBS volume for trojans, worms, crypto miners, rootkits, bots, and more.

AWS Certification Exam Practice Questions

Which AWS service makes detecting and reporting unexpected and potentially malicious activity in your AWS environment easy?
1. AWS Shield
2. AWS Inspector
3. AWS GuardDuty
4. AWS WAF

References

Amazon_GuardDuty

AWS Data Pipeline

June 27, 2023 ~ Last updated on : July 18, 2023 ~ jayendrapatil ~ 4 Comments

AWS Data Pipeline

AWS Data Pipeline is a web service that makes it easy to automate and schedule regular data movement and data processing activities in AWS
helps define data-driven workflows
integrates with on-premises and cloud-based storage systems
helps quickly define a pipeline, which defines a dependent chain of data sources, destinations, and predefined or custom data processing activities
supports scheduling where the pipeline regularly performs processing activities such as distributed data copy, SQL transforms, EMR applications, or custom scripts against destinations such as S3, RDS, or DynamoDB.
ensures that the pipelines are robust and highly available by executing the scheduling, retry, and failure logic for the workflows as a highly scalable and fully managed service.

AWS Data Pipeline features

Distributed, fault-tolerant, and highly available
Managed workflow orchestration service for data-driven workflows
Infrastructure management service, as it will provision and terminate resources as required
Provides dependency resolution
Can be scheduled
Supports Preconditions for readiness checks.
Grants control over retries, including frequency and number
Native integration with S3, DynamoDB, RDS, EMR, EC2 and Redshift
Support for both AWS based and external on-premise resources

AWS Data Pipeline Concepts

Pipeline Definition

Pipeline definition helps the business logic to be communicated to the AWS Data Pipeline
Pipeline definition defines the location of data (Data Nodes), activities to be performed, the schedule, resources to run the activities, preconditions, and actions to be performed
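The parts of a pipeline definition listed above can be pictured as a list of objects that reference each other by `id`. The sketch below is modelled on the JSON definition format but written as a Python dict; the object types and field names should be verified against the pipeline object reference, and the bucket, topic ARN, and `unresolved_refs` helper are illustrative assumptions.

```python
pipeline_definition = {
    "objects": [
        # Schedule: when the pipeline runs.
        {"id": "DailySchedule", "type": "Schedule", "period": "1 day",
         "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        # Data nodes: where the data lives.
        {"id": "InputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/input/",
         "precondition": {"ref": "InputExists"}},
        {"id": "OutputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/output/"},
        # Precondition: readiness check before the activity runs.
        {"id": "InputExists", "type": "S3PrefixNotEmpty",
         "s3Prefix": "s3://example-bucket/input/"},
        # Resource: the compute that performs the work.
        {"id": "Worker", "type": "Ec2Resource", "terminateAfter": "1 Hour"},
        # Activity: the work itself, wired to the objects above.
        {"id": "CopyStep", "type": "CopyActivity",
         "schedule": {"ref": "DailySchedule"},
         "input": {"ref": "InputData"}, "output": {"ref": "OutputData"},
         "runsOn": {"ref": "Worker"}, "onFail": {"ref": "FailureAlarm"}},
        # Action: what happens on failure.
        {"id": "FailureAlarm", "type": "SnsAlarm",
         "topicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
         "subject": "Copy failed", "message": "CopyStep failed"},
    ]
}


def unresolved_refs(definition):
    """Return (object id, field, ref) for every reference to a missing object."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for key, value in obj.items():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], key, value["ref"]))
    return missing


print(unresolved_refs(pipeline_definition))  # []
```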

Pipeline Components, Instances, and Attempts

Pipeline components represent the business logic of the pipeline and are represented by the different sections of a pipeline definition.
Pipeline components specify the data sources, activities, schedule, and preconditions of the workflow
When AWS Data Pipeline runs a pipeline, it compiles the pipeline components to create a set of actionable instances, each of which contains all the information needed to perform a specific task
Data Pipeline provides durable and robust data management as it retries a failed operation depending on frequency & defined number of retries

Task Runners

A task runner is an application that polls AWS Data Pipeline for tasks and then performs those tasks
When Task Runner is installed and configured,
- it polls AWS Data Pipeline for tasks associated with activated pipelines
- after a task is assigned to Task Runner, it performs that task and reports its status back to Pipeline.
A task is a discrete unit of work that the Pipeline service shares with a task runner. It differs from a pipeline, which defines activities and resources that usually yield several tasks
Tasks can be executed either on the AWS Data Pipeline managed or user-managed resources.
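The poll, perform, and report cycle of a task runner can be sketched with an in-memory stand-in for the service. `FakePipelineService` and the task shapes are hypothetical; the method names are loosely modelled on the service's PollForTask and SetTaskStatus actions, and a real runner keeps polling instead of stopping when the queue is empty.

```python
from collections import deque


class FakePipelineService:
    """Hypothetical in-memory stand-in for the service side of the protocol."""

    def __init__(self, tasks):
        self.queue = deque(tasks)
        self.statuses = {}

    def poll_for_task(self):
        """Hand out the next task, or None when nothing is waiting."""
        return self.queue.popleft() if self.queue else None

    def set_task_status(self, task_id, status):
        """Record the final status reported by the runner."""
        self.statuses[task_id] = status


def run_task_runner(service, handlers):
    """Poll until no task is returned; perform each task and report its status."""
    while (task := service.poll_for_task()) is not None:
        try:
            handlers[task["type"]](task)
            service.set_task_status(task["id"], "FINISHED")
        except Exception:
            service.set_task_status(task["id"], "FAILED")


def copy_task(task):
    """Pretend to copy data; succeeds without doing anything."""


def shell_task(task):
    """Pretend to run a script that fails."""
    raise RuntimeError("script exited with a non-zero status")


service = FakePipelineService([
    {"id": "t1", "type": "copy"},
    {"id": "t2", "type": "shell"},
])
run_task_runner(service, {"copy": copy_task, "shell": shell_task})
print(service.statuses)  # {'t1': 'FINISHED', 't2': 'FAILED'}
```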

Data Nodes

Data Node defines the location and type of data that a pipeline activity uses as source (input) or destination (output)
supports S3, Redshift, DynamoDB, and SQL data nodes

Databases

supports JDBC, RDS, and Redshift database

Activities

An activity is a pipeline component that defines the work to perform
Data Pipeline provides pre-defined activities for common scenarios like SQL transformation, data movement, Hive queries, etc.
Activities are extensible and can be used to run your own custom scripts to support endless combinations

Preconditions

Precondition is a pipeline component containing conditional statements that must be satisfied (evaluated to True) before an activity can run
A pipeline supports
- System-managed preconditions
  - are run by the AWS Data Pipeline web service on your behalf and do not require a computational resource
  - Includes source data and key checks, e.g. DynamoDB data exists, DynamoDB table exists, S3 key exists, or S3 prefix not empty
- User-managed preconditions
  - run on user defined and managed computational resources
  - Can be defined as Exists check or Shell command
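How preconditions gate an activity, together with the retry control described earlier, can be sketched in a few lines. The function name, the status strings, and the `maximum_retries` parameter are illustrative assumptions, not the service's internal logic.

```python
def run_with_preconditions(activity, preconditions, maximum_retries):
    """Run the activity only if every precondition is True; retry on failure."""
    if not all(check() for check in preconditions):
        return {"status": "skipped", "attempts": 0}
    attempts = 0
    while attempts <= maximum_retries:
        attempts += 1
        try:
            activity()
            return {"status": "finished", "attempts": attempts}
        except Exception:
            continue  # try again until the retry budget is used up
    return {"status": "failed", "attempts": attempts}


calls = {"n": 0}


def flaky_copy():
    """Fail on the first two calls, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")


def always_fails():
    """Activity that never succeeds."""
    raise RuntimeError("permanent failure")


ready = run_with_preconditions(flaky_copy, [lambda: True], maximum_retries=2)
not_ready = run_with_preconditions(flaky_copy, [lambda: False], maximum_retries=2)
exhausted = run_with_preconditions(always_fails, [lambda: True], maximum_retries=1)
print(ready)      # {'status': 'finished', 'attempts': 3}
print(not_ready)  # {'status': 'skipped', 'attempts': 0}
print(exhausted)  # {'status': 'failed', 'attempts': 2}
```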

Resources

A resource is a computational resource that performs the work that a pipeline activity specifies
supports AWS Data Pipeline-managed and self-managed resources
AWS Data Pipeline-managed resources include EC2 and EMR, which are launched by the Data Pipeline service only when they’re needed
Self-managed on-premises resources can also be used by installing a Task Runner package, which continuously polls the AWS Data Pipeline service for work to perform
Resources can run in the same region as their working data set, or even in a region different from that of AWS Data Pipeline
Resources launched by AWS Data Pipeline count toward the account's resource limits and should be taken into account
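
A hedged sketch of the two resource styles: an activity that runs on a Data Pipeline-managed EC2 resource through `runsOn`, and one routed to a self-managed Task Runner through a worker group. The ids, instance type, worker group name and the `execution_target` helper are hypothetical.

```python
# Managed resource: launched by the service when needed (values are placeholders).
worker = {
    "id": "Worker",
    "type": "Ec2Resource",
    "instanceType": "t2.micro",
    "terminateAfter": "2 Hours",
}
managed_activity = {
    "id": "CloudStep",
    "type": "ShellCommandActivity",
    "command": "echo cloud",
    "runsOn": {"ref": "Worker"},
}
# Self-managed: picked up by a Task Runner polling for this worker group.
self_managed_activity = {
    "id": "OnPremStep",
    "type": "ShellCommandActivity",
    "command": "echo on-prem",
    "workerGroup": "onprem-workers",
}


def execution_target(activity):
    """Say where an activity runs: a managed resource or a worker group."""
    if "runsOn" in activity:
        return ("managed", activity["runsOn"]["ref"])
    if "workerGroup" in activity:
        return ("self-managed", activity["workerGroup"])
    raise ValueError("activity has neither runsOn nor workerGroup")
```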

Actions

Actions are steps that a pipeline takes when a certain event, such as success or failure, occurs.
Data Pipeline supports SNS notifications and a terminate action on resources
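
As a minimal sketch, actions can be written as objects that an activity or resource references per event. The topic ARN, account number, ids and messages are placeholders, and `action_for` is an illustrative lookup helper, not an AWS API.

```python
objects = [
    {"id": "FailureAlarm", "type": "SnsAlarm",
     "topicArn": "arn:aws:sns:us-east-1:111122223333:example-topic",
     "subject": "Copy failed",
     "message": "The copy step failed.",
     "role": "DataPipelineDefaultRole"},
    {"id": "SuccessAlarm", "type": "SnsAlarm",
     "topicArn": "arn:aws:sns:us-east-1:111122223333:example-topic",
     "subject": "Copy succeeded",
     "message": "The copy step finished.",
     "role": "DataPipelineDefaultRole"},
    # Terminate action, here attached to a resource that runs late.
    {"id": "StopWorker", "type": "Terminate"},
    {"id": "Worker", "type": "Ec2Resource",
     "onLateAction": {"ref": "StopWorker"}},
    {"id": "CopyStep", "type": "CopyActivity",
     "onSuccess": {"ref": "SuccessAlarm"},
     "onFail": {"ref": "FailureAlarm"}},
]


def action_for(objs, object_id, event):
    """Return the action object wired to an event, or None if unset."""
    by_id = {o["id"]: o for o in objs}
    ref = by_id[object_id].get(event)
    return by_id[ref["ref"]] if ref else None
```

For example, `action_for(objects, "CopyStep", "onFail")` returns the `FailureAlarm` object, while an event with no action configured yields `None`.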

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

An international company has deployed a multi-tier web application that relies on DynamoDB in a single region. For regulatory reasons they need disaster recovery capability in a separate region with a Recovery Time Objective of 2 hours and a Recovery Point Objective of 24 hours. They should synchronize their data on a regular basis and be able to provision the web application rapidly using CloudFormation. The objective is to minimize changes to the existing web application, control the throughput of DynamoDB used for the synchronization of data and synchronize only the modified elements. Which design would you choose to meet these requirements?
1. Use AWS Data Pipeline to schedule a DynamoDB cross region copy once a day. Create a ‘Lastupdated’ attribute in your DynamoDB table that would represent the timestamp of the last update and use it as a filter. (Refer Blog Post)
2. Use EMR and write a custom script to retrieve data from DynamoDB in the current region using a SCAN operation and push it to DynamoDB in the second region. (No Schedule and throughput control)
3. Use AWS Data Pipeline to schedule an export of the DynamoDB table to S3 in the current region once a day then schedule another task immediately after it that will import data from S3 to DynamoDB in the other region. (With AWS Data Pipeline the data can be copied directly to the other DynamoDB table)
4. Send each item into an SQS queue in the second region; use an auto-scaling group behind the SQS queue to replay the write in the second region. (Not Automated to replay the write)
Your company produces customer-commissioned, one-of-a-kind skiing helmets combining high fashion with custom technical enhancements. Customers can show off their individuality on the ski slopes and have access to head-up displays, GPS rear-view cams and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data-rich and complex, including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards. Assessments are a mixture of human and automated assessments. You need to add a new set of assessments to model the failure modes of the custom electronics using GPUs with CUDA across a cluster of servers with low-latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time?
1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
3. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 and SR-IOV won’t provide GPU as well as Enhanced networking needs to be enabled)
4. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (Involves mixture of human assessments)

References

AWS_Data_Pipeline_Developer_Guide