Using AWS in a Hybrid Environment

As public cloud adoption grows, more and more organizations are finding themselves running a hybrid cloud model. This adoption is driven by necessity more than by any architectural design decision. When you look closely at what makes up a hybrid cloud, it becomes clear why so many of them are using AWS as part of it.

Of course, when we think about why you would use a hybrid cloud, the mixture of different computing models is the most obvious reason. With a hybrid cloud model consisting of on-premises infrastructure, private cloud services, and public cloud offerings, and the ability to orchestrate the deployment of workloads and components across all of them, it becomes much easier to distribute an application in a resilient manner.

AWS services

AWS provides a huge range of services, from IaaS virtual machines and PaaS relational database offerings through to many different types of storage at very low cost. When we look at how to leverage AWS in a hybrid cloud world, it becomes obvious that storage is at the forefront of those decisions, but the key driver behind everything in the cloud is agility. It has been a buzzword for quite some time, but the ability to spin up and destroy workloads on demand, without having to invest a large amount of capital to deliver a new service or capability to the business, is what is driving hybrid cloud adoption. Of course, introducing new platforms into an existing environment brings complexity, and overcoming that complexity is what makes a hybrid cloud implementation successful. Considerations around security, authentication, networking, and connectivity must be addressed. In short, these considerations are the same as when a new data center is implemented; it is simply a different platform.

Veeam, AWS, and the Hybrid Cloud

As more and more businesses adopt this hybrid cloud approach, protecting and migrating workloads across these different platforms becomes extremely complex. This is where Veeam can help businesses deliver on the true promise of a hybrid cloud. Veeam offers multiple products that can be used individually, in a modular way, to provide data protection and management of individual resources and services, or combined to provide a centralized data management solution.

Let’s look at a real-world scenario. In this example, we have workloads running in AWS that need to be migrated to our on-premises data center. Maybe we are facing latency issues, or we have a security requirement for this application to sit closer to services running on-premises. Using Veeam, we can easily protect those workloads and migrate them to another data center.

By protecting workloads in AWS with Veeam, you can simply move them across different platforms. It doesn’t matter which direction you want to move workloads either: you can just as easily take a virtual machine running on VMware vSphere or Microsoft Hyper-V and migrate it to AWS EC2 as an instance. Moving workloads across multiple platforms or hypervisors becomes extremely easy.

Summary

Introducing and implementing a hybrid cloud with AWS may seem daunting, but it needn’t be complex. By taking a considered approach to aspects such as networking, connectivity, and migration, leveraging AWS in a hybrid cloud model alongside your existing on-premises implementation provides an agile, simple, and quick way to deliver new services and capabilities. Combine that with products from companies like Veeam, and implementing a true hybrid cloud data management solution becomes straightforward, giving you the flexibility to move workloads across multiple platforms while delivering a reliable service to the end customers of the business.

For more information on Veeam, please visit the Veeam Backup for AWS page. Also, don’t forget to check out the “Choose Your Cloud Adventure” interactive e-book to learn how to manage your AWS data like a hero.

AWS Elastic File System – EFS

  • Elastic File System – EFS provides simple, fully managed, easy to set up, scalable, serverless, and cost-optimized file storage for use with AWS Cloud and on-premises resources (see the provisioning sketch after this list).
  • can automatically scale from gigabytes to petabytes of data without needing to provision storage.
  • provides managed NFS (Network File System) file systems that can be mounted on and accessed by multiple EC2 instances across multiple AZs simultaneously.
  • offers high durability, scalability, and availability.
    • stores data redundantly across multiple AZs in the same region
    • grows and shrinks automatically as files are added and removed, so there is no need to manage storage procurement or provisioning.
  • supports the Network File System version 4 (NFSv4.1 and NFSv4.0) protocol
  • provides file system access semantics, such as strong data consistency and file locking
  • is compatible with Linux-based AMIs for EC2; it is a POSIX file system (~Linux) with a standard file API
  • is a shared POSIX file system for Linux systems and does not work for Windows
  • offers encryption of data at rest using KMS as well as encryption in transit.
  • can be accessed from on-premises using an AWS Direct Connect or AWS VPN connection between the on-premises datacenter and VPC.
  • can be accessed concurrently from servers in the on-premises data center as well as EC2 instances in the VPC
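
As a rough illustration of how an EFS file system is typically provisioned, the boto3 sketch below creates an encrypted file system and a mount target in one subnet. The region, subnet ID, and security group ID are placeholders chosen for this example, not values from this post.

```python
import boto3

efs = boto3.client("efs", region_name="us-east-1")

# Create an encrypted, general purpose EFS file system.
fs = efs.create_file_system(
    CreationToken="shared-app-data",       # idempotency token
    PerformanceMode="generalPurpose",      # or "maxIO"
    Encrypted=True,                        # encryption at rest via KMS
)

# In practice you would wait for the file system's LifeCycleState to become
# "available" before creating mount targets.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",        # placeholder subnet in one AZ
    SecurityGroups=["sg-0123456789abcdef0"],    # must allow NFS (TCP 2049)
)

# EC2 instances in the VPC (or on-premises servers over Direct Connect/VPN)
# can then mount the file system over NFSv4.1.
```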

EFS Storage Classes

Standard storage classes

  • EFS Standard and Standard-Infrequent Access (Standard-IA), offer multi-AZ resilience and the highest levels of durability and availability.
  • For file systems using Standard storage classes, a mount target can be created in each Availability Zone in the AWS Region.
  • Standard
    • regional storage class for frequently accessed data.
    • offers the highest levels of availability and durability by storing file system data redundantly across multiple AZs in an AWS Region.
    • ideal for active file system workloads and you pay only for the file system storage you use per month
  • Standard-Infrequent Access (Standard-IA)
    • regional, low-cost storage class that is cost-optimized for infrequently accessed files, i.e. files not accessed every day
    • offers the highest levels of availability and durability by storing file system data redundantly across multiple AZs in an AWS Region
    • there is a cost to retrieve files, but a lower price to store them

One Zone storage classes

  • EFS One Zone and One Zone-Infrequent Access (One Zone-IA) offer additional savings by storing the data in a single AZ.
  • For file systems using One Zone storage classes, only a single mount target that is in the same Availability Zone as the file system needs to be created.
  • EFS One Zone
    • For frequently accessed files stored redundantly within a single AZ in an AWS Region.
  • EFS One Zone-IA (One Zone-IA)
    • A lower-cost storage class for infrequently accessed files stored redundantly within a single AZ in an AWS Region.

EFS Lifecycle Management

  • EFS lifecycle management automatically manages cost-effective file storage for the file systems.
  • When enabled, lifecycle management migrates files that haven’t been accessed for a set period of time to an infrequent access storage class, Standard-IA or One Zone-IA
  • Lifecycle management automatically moves data to the EFS IA storage class according to the lifecycle policy, e.g. files can be moved into EFS IA automatically 14 days after they were last accessed (see the sketch after this list).
  • Lifecycle management uses an internal timer to track when a file was last accessed and not the POSIX file system attribute that is publicly viewable.
  • Whenever a file in Standard or One Zone storage is accessed, the lifecycle management timer is reset.
  • After lifecycle management moves a file into one of the IA storage classes, the file remains there indefinitely if EFS Intelligent-Tiering is not enabled.
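
The lifecycle policy itself is just a configuration on the file system. Below is a minimal boto3 sketch, assuming a placeholder file system ID, that transitions files to IA after 14 days without access and moves them back to Standard on first access.

```python
import boto3

efs = boto3.client("efs")

# Note: this call replaces any existing lifecycle configuration on the file system.
efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",   # placeholder file system ID
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_14_DAYS"},               # move to IA after 14 days of no access
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},  # move back to Standard on access
    ],
)
```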

EFS Performance Modes

General Purpose (Default)

  • latency-sensitive use cases
  • ideal for web serving environments, content management systems, home directories, and general file serving, etc.

Max I/O

  • can scale to higher levels of aggregate throughput and operations per second.
  • with a tradeoff of slightly higher latencies for file metadata operations
  • ideal for highly parallelized applications and workloads, such as big data analysis, media processing, and genomic analysis
  • is not available for file systems using One Zone storage classes.

EFS Throughput Modes

Provisioned Throughput

  • throughput of the file system (in MiB/s) can be instantly provisioned independent of the amount of data stored (see the sketch after this section).

Bursting Throughput

  • throughput on EFS scales as the size of the file system in the EFS Standard or One Zone storage class grows
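
A small boto3 sketch of switching throughput modes on an existing file system (placeholder ID); note that the performance mode, unlike the throughput mode, is fixed when the file system is created.

```python
import boto3

efs = boto3.client("efs")

# Switch from bursting to provisioned throughput at 128 MiB/s.
efs.update_file_system(
    FileSystemId="fs-0123456789abcdef0",   # placeholder file system ID
    ThroughputMode="provisioned",
    ProvisionedThroughputInMibps=128,
)
```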

EFS Security

  • EFS supports authentication, authorization, and encryption capabilities to help meet security and compliance requirements.
  • EFS supports two forms of encryption for file systems,
    • Encryption in transit
      • Encryption in Transit can be enabled when you mount the file system.
    • Encryption at rest.
      • encrypts all the data and metadata
      • can be enabled only when creating an EFS file system.
      • to encrypt an existing unencrypted EFS file system, create a new encrypted EFS file system, and migrate the data using AWS DataSync.
  • NFS client access to EFS is controlled by both AWS IAM policies and network security policies like security groups.

EFS Access Points

  • EFS access points are application-specific entry points into an EFS file system that make it easier to manage application access to shared datasets (a creation sketch follows this list).
  • Access points can enforce a user identity, including the user’s POSIX groups, for all file system requests that are made through the access point.
  • Access points can enforce a different root directory for the file system so that clients can only access data in the specified directory or its subdirectories.
  • AWS IAM policies can be used to enforce that a specific application uses a specific access point.
  • IAM policies with access points provide secure access to specific datasets for the applications.
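
A minimal boto3 sketch of creating such an access point, assuming a placeholder file system ID and an illustrative /app1 root directory and POSIX identity.

```python
import boto3

efs = boto3.client("efs")

efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",       # placeholder file system ID
    PosixUser={"Uid": 1001, "Gid": 1001},      # identity enforced for all requests
    RootDirectory={
        "Path": "/app1",                       # clients only see this subtree
        "CreationInfo": {                      # created if the directory does not exist
            "OwnerUid": 1001,
            "OwnerGid": 1001,
            "Permissions": "750",
        },
    },
    Tags=[{"Key": "Name", "Value": "app1-access-point"}],
)
```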

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. An administrator runs a highly available application in AWS. A file storage layer is needed that can be shared between instances and scale the platform more easily. The storage should also be POSIX compliant. Which AWS service can perform this action?
    1. Amazon EBS
    2. Amazon S3
    3. Amazon EFS
    4. Amazon EC2 Instance store

References

AWS Elastic File System – EFS

Google Cloud Certified – Cloud Digital Leader Learning Path

Google Cloud – Cloud Digital Leader Certification Learning Path

Continuing on the Google Cloud journey, glad to have passed the seventh certification with the Cloud Digital Leader certification. Google Cloud was missing an initial entry-level certification similar to the AWS Cloud Practitioner certification, and the Cloud Digital Leader certification was introduced to fill that gap. Cloud Digital Leader focuses on general cloud knowledge and on Google Cloud knowledge of its products and services.

Google Cloud – Cloud Digital Leader Certification Summary

  • Had 59 questions (somewhat odd!) to be answered in 90 minutes.
  • Covers a wide range of General Cloud and Google Cloud services and products knowledge.
  • This exam does not require much hands-on experience; theoretical knowledge is good enough to clear the exam.

Google Cloud – Cloud Digital Leader Certification Resources

Google Cloud – Cloud Digital Leader Certification Topics

General cloud knowledge

  1. Define basic cloud technologies. Considerations include:
    1. Differentiate between traditional infrastructure, public cloud, and private cloud
      1. Traditional infrastructure includes on-premises data centers
      2. Public cloud includes Google Cloud, AWS, and Azure
      3. Private Cloud includes services like AWS Outposts
    2. Define cloud infrastructure ownership
    3. Shared Responsibility Model
      1. Security of the Cloud is Google Cloud’s responsibility
      2. Security in the Cloud depends on the services used and is shared between Google Cloud and the Customer
    4. Essential characteristics of cloud computing
      1. On-demand computing
      2. Pay-as-you-use
      3. Scalability and Elasticity
      4. High Availability and Resiliency
      5. Security
  2. Differentiate cloud service models. Considerations include:
    1. Infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS)
      1. IaaS – everything is done by you – more flexibility, more management
      2. PaaS – most things are done by the Cloud with a few things done by you – moderate flexibility and management
      3. SaaS – everything is taken care of by the Cloud, you just use it – no flexibility and management
    2. Describe the trade-offs between level of management versus flexibility when comparing cloud services
    3. Define the trade-offs between costs versus responsibility
    4. Appropriate implementation and alignment with given budget and resources
  3. Identify common cloud procurement financial concepts. Considerations include:
    1. Operating expenses (OpEx), capital expenditures (CapEx), and total cost of operations (TCO)
      1. On-premises has more CapEx and less OpEx
      2. Cloud has little to no CapEx and more OpEx
    2. Recognize the relationship between OpEx and CapEx related to networking and compute infrastructure
    3. Summarize the key cost differentiators between cloud and on-premises environments

General Google Cloud knowledge

  1. Recognize how Google Cloud meets common compliance requirements. Considerations include:
    1. Locating current Google Cloud compliance requirements
    2. Familiarity with Compliance Reports Manager
  2. Recognize the main elements of Google Cloud resource hierarchy. Considerations include:
    1. Describe the relationship between organization, folders, projects, and resources i.e. Organization -> Folder -> Folder or Projects -> Resources
  3. Describe controlling and optimizing Google Cloud costs. Considerations include:
    1. Google Cloud billing models and applicability to different service classes
    2. Define a consumption-based use model
    3. Application of discounts (e.g., flat-rate, committed-use discounts [CUD], sustained-use discounts [SUD])
      1. Sustained-use discounts [SUD] are automatic discounts for running specific resources for a significant portion of the billing month
      2. Committed use discounts [CUD] help with committed use contracts in return for deeply discounted prices for VM usage
  4. Describe Google Cloud’s geographical segmentation strategy. Considerations include:
    1. Regions are collections of zones. Zones have high-bandwidth, low-latency network connections to other zones in the same region. Regions help design fault-tolerant and highly available solutions.
    2. Zones are deployment areas within a region and provide the lowest latency usually less than 10ms
    3. Regional resources are accessible by any resources within the same region
    4. Zonal resources are hosted in a zone and are also called per-zone resources.
    5. Multiregional resources or Global resources are accessible by any resource in any zone within the same project.
  5. Define Google Cloud support options. Considerations include:
    1. Distinguish between billing support, technical support, role-based support, and enterprise support
      1. Role-Based Support provides more predictable rates and a flexible configuration. Although they are legacy, the exam does cover these.
      2. Enterprise Support provides the fastest case response times and a dedicated Technical Account Management (TAM) contact who helps you execute a Google Cloud strategy.
    2. Recognize a variety of Service Level Agreement (SLA) applications

Google Cloud products and services

  1. Describe the benefits of Google Cloud virtual machine (VM)-based compute options. Considerations include:
    1. Compute Engine provides virtual machines (VM) hosted on Google’s infrastructure.
    2. Google Cloud VMware Engine helps easy lift and shift VMware-based applications to Google Cloud without changes to the apps, tools, or processes
    3. Bare Metal lets businesses run specialized workloads such as Oracle databases close to Google Cloud while lowering overall costs and reducing risks associated with migration
    4. Custom versus standard sizing
    5. Free, premium, and custom service options
    6. Attached storage/disk options
    7. Preemptible VMs are instances that can be created and run at a much lower price than normal instances.
  2. Identify and evaluate container-based compute options. Considerations include:
    1. Define the function of a container registry
      1. Container Registry is a single place to manage Docker images, perform vulnerability analysis, and decide who can access what with fine-grained access control.
    2. Distinguish between VMs, containers, and Google Kubernetes Engine
  3. Identify and evaluate serverless compute options. Considerations include:
    1. Define the function and use of App Engine, Cloud Functions, and Cloud Run
    2. Define rationale for versioning with serverless compute options
    3. Cost and performance tradeoffs of scale to zero
      1. Scale to zero provides cost efficiency by scaling down to zero when there is no load, but comes with the issue of cold starts
      2. Serverless technologies like Cloud Functions, Cloud Run, and App Engine Standard provide these capabilities
  4. Identify and evaluate multiple data management offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s relational and non-relational database offerings
      1. Cloud SQL provides fully managed, relational SQL databases and offers MySQL, PostgreSQL, MSSQL databases as a service
      2. Cloud Spanner provides fully managed, horizontally scalable, relational SQL databases with joins and secondary indexes
      3. Cloud Bigtable provides a scalable, fully managed, non-relational NoSQL wide-column analytical big data database service suitable for low-latency single-point lookups and precalculated analytics
      4. BigQuery provides fully managed, no-ops, OLAP, enterprise data warehouse (EDW) with SQL and fast ad-hoc queries.
    2. Describe Google Cloud’s database offerings and how they compare to commercial offerings
  5. Distinguish between ML/AI offerings. Considerations include:
    1. Describe the differences and benefits of Google Cloud’s hardware accelerators (e.g., Vision API, AI Platform, TPUs)
    2. Identify when to train your own model, use a Google Cloud pre-trained model, or build on an existing model
      1. Vision API provides out-of-the-box pre-trained models to extract data from images
      2. AutoML provides the ability to train models
      3. BigQuery Machine Learning provides support for limited models and SQL interface
  6. Differentiate between data movement and data pipelines. Considerations include:
    1. Describe Google Cloud’s data pipeline offerings
      1. Cloud Pub/Sub provides reliable, many-to-many, asynchronous messaging between applications. By decoupling senders and receivers, Google Cloud Pub/Sub allows developers to communicate between independently written applications.
      2. Cloud Dataflow is a fully managed service for strongly consistent, parallel data-processing pipelines
      3. Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building & managing data pipelines
      4. BigQuery Service is a fully managed, highly scalable data analysis service that enables businesses to analyze Big Data.
      5. Looker provides an enterprise platform for business intelligence, data applications, and embedded analytics.
    2. Define data ingestion options
  7. Apply use cases to a high-level Google Cloud architecture. Considerations include:
    1. Define Google Cloud’s offerings around the Software Development Life Cycle (SDLC)
    2. Describe Google Cloud’s platform visibility and alerting offerings – covers Cloud Monitoring and Cloud Logging
  8. Describe solutions for migrating workloads to Google Cloud. Considerations include:
    1. Identify data migration options
    2. Differentiate when to use Migrate for Compute Engine versus Migrate for Anthos
      1. Migrate for Compute Engine provides fast, flexible, and safe migration to Google Cloud
      2. Migrate for Anthos and GKE makes it fast and easy to modernize traditional applications away from virtual machines and into native containers. This significantly reduces the cost and labor that would be required for a manual application modernization project.
    3. Distinguish between lift and shift versus application modernization
      1. Lift and shift involves migration with zero to minimal changes and is usually performed under time constraints
      2. Application modernization requires a redesign of infra and applications and takes time. It can include moving legacy monolithic architecture to microservices architecture, building CI/CD pipelines for automated builds and deployments, frequent releases with zero downtime, etc.
  9. Describe networking to on-premises locations. Considerations include:
    1. Define Software-Defined WAN (SD-WAN) – did not have any questions regarding the same.
    2. Determine the best connectivity option based on networking and security requirements – covers Cloud VPN, Interconnect, and Peering.
    3. Private Google Access provides access from VM instances to Google-provided services like Cloud Storage or third-party provided services
  10. Define identity and access features. Considerations include:
    1. Cloud Identity & Access Management (Cloud IAM) provides administrators the ability to manage cloud resources centrally by controlling who can take what action on specific resources.
    2. Google Cloud Directory Sync enables administrators to synchronize users, groups, and other data from an Active Directory/LDAP service to their Google Cloud domain directory.

Google Cloud Compute Options

Compute Engine

  • provides Infrastructure as a Service (IaaS) in the Google Cloud
  • provides full control/flexibility on the choice of OS, resources like CPU and memory
  • Usage patterns
    • lift and shift migrations of existing systems
    • existing VM images to move to the cloud
    • need low-level access to or fine-grained control of the operating system, network, and other operational characteristics.
    • require custom kernel or arbitrary OS
    • software that can’t be easily containerized
    • using third-party licensed software
  • Usage anti-patterns
    • containerized applications – Choose App Engine, GKE, or Cloud Run
    • stateless event-driven applications – Choose Cloud Functions

App Engine

  • helps build highly scalable web and mobile backend applications on a fully managed serverless platform
  • Usage patterns
    • Rapidly developing CRUD-heavy applications
    • HTTP/S based applications
    • Deploying complex APIs
  • Usage anti-patterns
    • Stateful applications requiring lots of in-memory state to meet performance or functional requirements
    • Systems that require protocols other than HTTP

Google Kubernetes Engine – GKE

  • provides a managed environment for deploying, managing, and scaling containerized applications using Google infrastructure.
  • Usage patterns
    • containerized applications or those that can be easily containerized
    • Hybrid or multi-cloud environments
    • Systems leveraging stateful and stateless services
    • Strong CI/CD Pipelines
  • Usage anti-patterns
    • non-containerized applications – Choose CE or App engine
    • applications requiring very low-level access to the underlying hardware like custom kernel, networking, etc. – Choose CE
    • stateless event-driven applications – Choose Cloud Functions

Cloud Run

  • provides a serverless managed compute platform to run stateless, isolated containers without orchestration that can be invoked via web requests or Pub/Sub events.
  • abstracts away all infrastructure management allowing users to focus on building great applications.
  • is built on Knative.
  • Usage patterns
    • Stateless services that are easily containerized
    • Event-driven applications and systems
    • Applications that require custom system and language dependencies
  • Usage anti-patterns
    • Highly stateful systems
    • Systems that require protocols other than HTTP
    • Compliance requirements that demand strict controls over the low-level environment and infrastructure (might be okay with the Knative GKE mode)

Cloud Functions

  • provides serverless compute for event-driven apps (see the sketch below)
  • Usage patterns
    • ephemeral and event-driven applications and functions
    • fully managed environment
    • pay only for what you use
    • quick data transformations (ETL)
  • Usage anti-patterns
    • continuous stateful application – Choose CE, App Engine or GKE
Credit @ https://thecloudgirl.dev/
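
As a rough illustration of the event-driven model, here is a minimal Python Cloud Function sketch for a Cloud Storage object-finalize trigger; the function name and the bucket that would trigger it are assumptions for this example.

```python
def handle_upload(event, context):
    """Background function triggered by a google.storage.object.finalize event.

    `event` carries the Cloud Storage object metadata; `context` carries event
    metadata such as the event ID. The function runs only when invoked, and the
    platform scales instances up and down (to zero) with the event volume.
    """
    bucket = event["bucket"]
    name = event["name"]
    print(f"Processing new object gs://{bucket}/{name} (event ID {context.event_id})")
```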

Google Cloud Compute Options Decision Tree

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your organization is developing a new application. This application responds to events created by already running applications. The business goal for the new application is to scale to handle spikes in the flow of incoming events while minimizing administrative work for the team. Which Google Cloud product or feature should you choose?
    1. Cloud Run
    2. Cloud Run for Anthos
    3. App Engine standard environment
    4. Compute Engine
  2. A company wants to build an application that stores images in a Cloud Storage bucket and wants to generate thumbnails as well as resize the images. They want to use managed service which will help them scale automatically from zero to scale and back to zero. Which GCP service satisfies the requirement?
    1. Google Compute Engine
    2. Google Kubernetes Engine
    3. Google App Engine
    4. Cloud Functions

Google Cloud Composer

Cloud Composer

  • Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, enabling workflow creation that spans across clouds and on-premises data centers.
  • Cloud Composer requires no installation and has no management overhead.
  • Cloud Composer integrates with Cloud Logging and Cloud Monitoring to provide a central place to view all Airflow service and workflow logs.

Cloud Composer Components

  • Cloud Composer helps define a series of tasks as Workflow executed within an Environment
  • Workflows are created using DAGs, or Directed Acyclic Graphs (a minimal DAG sketch follows this list)
  • A DAG is a collection of tasks that are scheduled and executed, organized in a way that reflects their relationships and dependencies.
  • DAGs are stored in Cloud Storage
  • Each Task can represent anything from ingestion, transform, filtering, monitoring, preparing, etc.
  • Environments are self-contained Airflow deployments based on Google Kubernetes Engine, and they work with other Google Cloud services using connectors built into Airflow.
  • Cloud Composer environment is a wrapper around Apache Airflow with components like GKE Cluster, Web Server, Database, Cloud Storage.
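
As a rough illustration, here is a minimal Airflow DAG that Cloud Composer could schedule; the DAG ID, schedule, and trivial BashOperator tasks are placeholders for real ingestion, transform, and load steps.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Saved into the environment's DAGs bucket in Cloud Storage, from where
# Composer's Airflow scheduler picks it up automatically.
with DAG(
    dag_id="daily_sample_workflow",          # placeholder DAG name
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = BashOperator(task_id="transform", bash_command="echo transforming")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> transform >> load   # task dependencies form the DAG
```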

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?
    1. Cloud Dataflow
    2. Cloud Composer
    3. Cloud Dataprep
    4. Cloud Dataproc
  2. Your company is working on a multi-cloud initiative. The data processing pipelines require creating workflows that connect data, transfer data, processing, and using services across clouds. What cloud-native tool should be used for orchestration?
    1. Cloud Scheduler
    2. Cloud Dataflow
    3. Cloud Composer
    4. Cloud Dataproc

Google Cloud Dataflow vs Dataproc

Cloud Dataproc

  • Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open-source data tools for batch processing, querying, streaming, and machine learning.
  • Cloud Dataproc provides a Hadoop cluster on GCP, with access to Hadoop-ecosystem tools (e.g. Apache Pig, Hive, and Spark); this has strong appeal if you are already familiar with Hadoop tools and have existing Hadoop jobs
  • Ideal for Lift and Shift migration of existing Hadoop environment
  • Requires manual provisioning of clusters
  • Consider Dataproc
    • If you have a substantial investment in Apache Spark or Hadoop on-premises and are considering moving to the cloud
    • If you are looking at a hybrid cloud and need portability across a private/multi-cloud environment
    • If, in the current environment, Spark is the primary machine learning tool and platform
    • If the code depends on any custom packages along with a distributed computing need

Cloud Dataflow

  • Google Cloud Dataflow is a fully managed, serverless service for unified stream and batch data processing requirements (see the pipeline sketch below)
  • Consider Dataflow when using it as a pre-processing pipeline for an ML model that can be deployed in GCP AI Platform Training (earlier called Cloud ML Engine)
  • Consider Dataflow when none of the above considerations made for Cloud Dataproc are relevant
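
A minimal Apache Beam (Python SDK) word-count sketch: the same pipeline code runs on Dataflow by supplying the DataflowRunner plus project, region, and staging options; the bucket paths below are placeholders.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# For Dataflow, add runner/project/region/temp_location options here;
# without them the pipeline runs locally with the DirectRunner.
options = PipelineOptions()

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/input/*.txt")
        | "SplitWords" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "SumPerKey" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda word, count: f"{word},{count}")
        | "Write" >> beam.io.WriteToText("gs://example-bucket/output/wordcount")
    )
```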

Cloud Dataflow vs Dataproc Decision Tree

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company is forecasting a sharp increase in the number and size of Apache Spark and Hadoop jobs being run on your local data center. You want to utilize the cloud to help you scale this upcoming demand with the least amount of operations work and code change. Which product should you use?
    1. Google Cloud Dataflow
    2. Google Cloud Dataproc
    3. Google Compute Engine
    4. Google Container Engine
  2. A startup plans to use a data processing platform, which supports both batch and streaming applications. They would prefer to have a hands-off/serverless data processing platform to start with. Which GCP service is suited for them?
    1. Dataproc
    2. Dataprep
    3. Dataflow
    4. BigQuery

Google Cloud BigQuery Data Transfer Service

  • BigQuery Data Transfer Service automates data movement into BigQuery on a scheduled, managed basis (a configuration sketch follows this list)
  • After a data transfer is configured, the BigQuery Data Transfer Service automatically loads data into BigQuery on a regular basis.
  • BigQuery Data Transfer Service can also initiate data backfills to recover from any outages or gaps.
  • BigQuery Data Transfer Service can only sink data to BigQuery and cannot be used to transfer data out of BigQuery.
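
Below is a sketch of configuring a recurring Cloud Storage load with the Python client library; the project, dataset, table, bucket path, and schedule are assumptions chosen for the example.

```python
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Transfer configuration: load matching CSV files from a bucket into a
# BigQuery table once a day. Once created, the service runs it on schedule.
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="analytics",          # placeholder dataset
    display_name="daily-gcs-load",
    data_source_id="google_cloud_storage",       # Cloud Storage data source
    schedule="every 24 hours",
    params={
        "data_path_template": "gs://example-bucket/exports/*.csv",
        "destination_table_name_template": "events",
        "file_format": "CSV",
        "skip_leading_rows": "1",
    },
)

client.create_transfer_config(
    parent=client.common_project_path("example-project"),   # placeholder project
    transfer_config=transfer_config,
)
```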

BigQuery Data Transfer Service Sources

  • BigQuery Data Transfer Service supports loading data from the following data sources:
    • Google Software as a Service (SaaS) apps
    • Campaign Manager
    • Cloud Storage
    • Google Ad Manager
    • Google Ads
    • Google Merchant Center (beta)
    • Google Play
    • Search Ads 360 (beta)
    • YouTube Channel reports
    • YouTube Content Owner reports
    • External cloud storage providers
      • Amazon S3
    • Data warehouses
      • Teradata
      • Amazon Redshift

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company uses Google Analytics for tracking. You need to export the session and hit data from a Google Analytics 360 reporting view on a scheduled basis into BigQuery for analysis. How can the data be exported?
    1. Configure a scheduler in Google Analytics to convert the Google Analytics data to JSON format, then import directly into BigQuery using bq command line.
    2. Use gsutil to export the Google Analytics data to Cloud Storage, then import into BigQuery and schedule it using Cron.
    3. Import data to BigQuery directly from Google Analytics using Cron
    4. Use BigQuery Data Transfer Service to import the data from Google Analytics

Reference

Google Cloud BigQuery Data Transfer Service

Google Cloud BigQuery Security

BigQuery Encryption

  • BigQuery automatically encrypts all data before it is written to disk
  • By default, Google uses the Default Encryption at Rest and manages the key encryption keys used for data protection.
  • BigQuery also supports customer-managed encryption keys (CMEK) with Cloud KMS to protect data at rest, and AEAD encryption functions to encrypt individual values within a table (see the sketch after this list).
  • BigQuery uses TLS for data in transit encryption
  • Cloud Data Loss Prevention (Cloud DLP) can be used to scan the BigQuery tables and to protect sensitive data and meet compliance requirements.
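
A minimal sketch of creating a CMEK-protected table with the BigQuery Python client; the project, dataset, table, and Cloud KMS key name are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Fully qualified Cloud KMS key; the BigQuery service account needs
# encrypt/decrypt permission on this key.
kms_key = (
    "projects/example-project/locations/us/keyRings/example-ring/cryptoKeys/example-key"
)

table = bigquery.Table("example-project.analytics.orders")
table.encryption_configuration = bigquery.EncryptionConfiguration(kms_key_name=kms_key)

client.create_table(table)   # data written to this table is protected by the CMEK
```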

BigQuery IAM Roles

  • BigQuery supports access control of datasets and tables using IAM
  • Primitive Roles
    • primitive roles act at the project level
    • By default, granting access to a project also grants access to datasets within it unless overridden
    • are not limited to BigQuery resources only
    • cannot separate data access permissions from job-running permissions
    • Viewer
      • View all datasets
      • Run Jobs/Queries
      • View and update all jobs that they started
    • Editor
      • All Viewer access
      • Modify or delete all tables
      • Create new datasets
    • Owner
      • All Editor access
      • list, modify, or delete all datasets
      • View all jobs
  • Predefined Roles
    • dataViewer, dataEditor, and dataOwner roles
      • are similar to the primitive roles except
        • can be assigned for individual datasets
        • don’t give users permission to run jobs or queries
    • user, jobUser roles
      • give users permission to run jobs or queries
      • A jobUser can only start jobs and cancel jobs, but cannot list datasets or tables
      • A user, on the other hand, can perform a variety of other tasks, such as listing or creating datasets
      • User or group granted the user role at the project level can create datasets and can run query jobs against tables in those datasets.
      • user role does not give permission to query data, view table data, or view table schema details for datasets the user did not create. Need to have the dataViewer role for the same.

Authorized Views

  • Authorized views help provide view access to a dataset
  • Use authorized views to restrict access at a lower resource level such as the table, column, row, or cell.
  • An authorized view allows sharing query results with particular users and groups without giving them access to the underlying tables.
  • Authorized View’s SQL query can be used to restrict the columns (fields) the users are able to query.
  • Authorized views HAVE to be created in a separate dataset from the source dataset. As access controls can be assigned only at the dataset level, if the view is created in the same dataset as the source data, the users would have access to both the view and the data.
  • Authorized View creation process (sketched below)
    • Create a separate dataset to store the view.
    • Create the view in the new dataset
    • Give the group read access to the dataset containing the view
    • Authorize the view to access the source dataset
    • Give the group the bigquery.user role to run jobs, including query jobs, within the project
  • Project-level bigquery.user role does not give the users the ability to view or query table data in the dataset containing the tables queried by the view. They need READER access to the dataset containing the view.
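
Below is a sketch of the creation process above using the BigQuery Python client; the project, dataset, and view names are placeholders, and the IAM grants in the last step would still be applied separately.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# 1. Create the view in a dataset that is separate from the source data.
view = bigquery.Table("example-project.shared_views.orders_summary")
view.view_query = """
    SELECT order_id, order_date, total_amount
    FROM `example-project.source_data.orders`
"""
view = client.create_table(view)

# 2. Authorize the view against the source dataset so queries through the view
#    succeed without the callers having access to the underlying tables.
source_dataset = client.get_dataset("example-project.source_data")
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])

# 3. Separately, grant the analyst group READER access on the shared_views
#    dataset and bigquery.user at the project level so they can run query jobs.
```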

Fine-Grained Access Control

  • BigQuery supports access controls at the project, dataset, and table levels
  • BigQuery also supports fine-grained row and column level security
  • BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data.
  • Using BigQuery column-level security, you can create policies that check, at query time, whether a user has proper access.
  • Row-level security extends the principle of least privilege by enabling fine-grained access control to a subset of data in a BigQuery table, by means of row-level access policies.

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You have multiple Data Analysts who work with the dataset hosted in BigQuery within the same project. As a BigQuery Administrator, you are required to grant the data analysts only the privilege to create jobs/queries and the ability to cancel self-submitted jobs. Which role should be assigned to the users?
    1. User
    2. Jobuser
    3. Owner
    4. Viewer
  2. Your analytics system executes queries against a BigQuery dataset. The SQL query is executed in batch and passes the contents of a SQL file to the BigQuery CLI. Then it redirects the BigQuery CLI output to another process. However, you are getting a permission error from the BigQuery CLI when the queries are executed. You want to resolve the issue. What should you do?
    1. Grant the service account BigQuery Data Viewer and BigQuery Job User roles.
    2. Grant the service account BigQuery Data Editor and BigQuery Data Viewer roles.
    3. Create a view in BigQuery from the SQL query and SELECT * from the view in the CLI.
    4. Create a new dataset in BigQuery, and copy the source table to the new dataset Query the new dataset and table from the CLI.
  3. You are responsible for the security and access control of a BigQuery dataset hosted within a project. Multiple users from multiple teams need to have access to different tables within the dataset. How can access be controlled?
    1. Create Authorized views for tables in a separate project and grant access to the teams
    2. Create Authorized views for tables in the same project and grant access to the teams
    3. Create Materialized views for tables in a separate project and grant access to the teams
    4. Create Materialized views for tables in the same project and grant access to the teams

References

Google Cloud BigQuery Data Governance

Google Cloud Dataproc

  • Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open-source data tools for batch processing, querying, streaming, and machine learning.
  • Dataproc automation helps to create clusters quickly, manage them easily, and save money by turning clusters on and off as needed.
  • Dataproc helps reduce the time and money spent on administration and lets you focus on your jobs and your data.
  • Dataproc clusters are quick to start, scale, and shutdown, with each of these operations taking 90 seconds or less, on average
  • Dataproc has built-in integration with other GCP services, such as BigQuery, Cloud Storage, Bigtable, Cloud Logging, and Monitoring
  • Dataproc clusters support preemptible instances that have lower compute prices to reduce costs further.
  • Dataproc supports connectors for BigQuery, Bigtable, Cloud Storage
  • Dataproc also supports Anaconda, HBase, Flink, Hive WebHCat, Druid, Jupyter, Presto, Solr, Zeppelin, Ranger, Zookeeper, and much more.

Dataproc Cluster High Availability

  • A Dataproc cluster can be configured for High Availability by specifying the number of master instances in the cluster (see the sketch after this list)
  • Dataproc supports two master configurations:
    • Single Node Cluster – 1 master – 0 Workers (default, non HA)
      • provides one node for both master and worker
      • if the master fails, the in-flight jobs will necessarily fail and need to be retried, and HDFS will be inaccessible until the single NameNode fully recovers on reboot.
    • High Availability Cluster – 3 masters – N Workers (Hadoop HA)
      • HDFS High Availability and YARN High Availability are configured to allow uninterrupted YARN and HDFS operations despite any single-node failures/reboots.
  • All nodes in a High Availability cluster reside in the same zone. If there is a failure that impacts all nodes in a zone, the failure will not be mitigated.
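
Below is a sketch of creating an HA cluster (three masters) with the Dataproc Python client; the project, region, cluster name, machine types, and worker count are placeholders for the example.

```python
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "example-project",
    "cluster_name": "ha-analytics-cluster",
    "config": {
        # Three masters enables HDFS and YARN High Availability.
        "master_config": {"num_instances": 3, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 4, "machine_type_uri": "n1-standard-4"},
    },
}

operation = client.create_cluster(
    request={"project_id": "example-project", "region": region, "cluster": cluster}
)
print(operation.result().cluster_name)   # blocks until the cluster is created
```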

Dataproc Cluster Scaling

  • Dataproc cluster can be adjusted to scale by increasing or decreasing the number of primary or secondary worker nodes (horizontal scaling)
  • Dataproc cluster can be scaled at any time, even when jobs are running on the cluster.
  • Machine type of an existing cluster (vertical scaling) cannot be changed. To vertically scale, create a cluster using a supported machine type, then migrate jobs to the new cluster.
  • Dataproc cluster can help scale
    • to increase the number of workers to make a job run faster
    • to decrease the number of workers to save money
    • to increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage

Dataproc Cluster Autoscaling

  • Dataproc Autoscaling provides a mechanism for automating cluster resource management by automatically scaling the number of cluster workers.
  • An Autoscaling Policy is a reusable configuration that describes how clusters using the autoscaling policy should scale.
  • It defines scaling boundaries, frequency, and aggressiveness to provide fine-grained control over cluster resources throughout cluster lifetime.
  • Autoscaling is recommended for
    • on clusters that store data in external services, such as Cloud Storage
    • on clusters that process many jobs
    • to scale up single-job clusters
  • Autoscaling is not recommended with/for:
    • HDFS: Autoscaling is not intended for scaling on-cluster HDFS
    • YARN Node Labels: Autoscaling does not support YARN Node Labels. YARN incorrectly reports cluster metrics when node labels are used.
    • Spark Structured Streaming: Autoscaling does not support Spark Structured Streaming
    • Idle Clusters: Autoscaling is not recommended for the purpose of scaling a cluster down to minimum size when the cluster is idle. It is better to delete an Idle cluster.

Dataproc Workers

  • Primary workers are standard Compute Engine VMs
  • Secondary workers can be used to scale with the below limitations
    • Processing only
      • Secondary workers do not store data.
      • can only function as processing nodes
      • useful to scale compute without scaling storage.
    • No secondary-worker-only clusters
      • Cluster must have primary workers
      • Dataproc adds two primary workers to the cluster, by default, if no primary workers are specified.
    • Machine type
      • use the machine type of the cluster’s primary workers.
    • Persistent disk size
      • are created, by default, with the smaller of 100GB or the primary worker boot disk size.
      • This disk space is used for local caching of data and is not available through HDFS.
    • Asynchronous Creation
      • Dataproc manages secondary workers using Managed Instance Groups (MIGs), which create VMs asynchronously as soon as they can be provisioned

Dataproc Initialization Actions

  • Dataproc supports initialization actions in executables or scripts that will run on all nodes in the cluster immediately after the cluster is set up
  • Initialization actions often set up job dependencies, such as installing Python packages, so that jobs can be submitted to the cluster without having to install dependencies when the jobs are run.

Dataproc Cloud Storage Connector

  • The Dataproc Cloud Storage connector helps Dataproc use Google Cloud Storage as the persistent store instead of HDFS (see the PySpark sketch after this list).
  • Cloud Storage connector helps separate the storage from the cluster lifecycle and allows the cluster to be shut down when not processing data
  • Cloud Storage connector benefits
    • Direct data access – Store the data in Cloud Storage and access it directly. You do not need to transfer it into HDFS first.
    • HDFS compatibility – can easily access your data in Cloud Storage using the gs:// prefix instead of hdfs://
    • Interoperability – Storing data in Cloud Storage enables seamless interoperability between Spark, Hadoop, and Google services.
    • Data accessibility – data is accessible even after shutting down the cluster, unlike HDFS.
    • High data availability – Data stored in Cloud Storage is highly available and globally replicated without a loss of performance.
    • No storage management overhead – Unlike HDFS, Cloud Storage requires no routine maintenance, such as checking the file system, or upgrading or rolling back to a previous version of the file system.
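
A minimal PySpark sketch of a job that reads from and writes to Cloud Storage directly via the connector’s gs:// scheme instead of staging data in HDFS; the bucket and paths are placeholders.

```python
from pyspark.sql import SparkSession

# On Dataproc the Cloud Storage connector is pre-installed, so gs:// paths
# work exactly like hdfs:// paths.
spark = SparkSession.builder.appName("gcs-wordcount").getOrCreate()

lines = spark.read.text("gs://example-bucket/input/*.txt")
words = lines.rdd.flatMap(lambda row: row.value.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

# Results stay available in Cloud Storage even after the cluster is deleted.
counts.toDF(["word", "count"]).write.csv("gs://example-bucket/output/wordcount")
```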

Cloud Dataproc vs Dataflow

Refer to the blog post @ Cloud Dataproc vs Dataflow

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Your company is forecasting a sharp increase in the number and size of Apache Spark and Hadoop jobs being run on your local data center. You want to utilize the cloud to help you scale this upcoming demand with the least amount of operations work and code change. Which product should you use?
    1. Google Cloud Dataflow
    2. Google Cloud Dataproc
    3. Google Compute Engine
    4. Google Container Engine
  2. Your company is migrating to Google Cloud and looking for an HBase alternative. The current solution uses a lot of custom code using the observer coprocessor. You are required to find the best alternative for migration while using managed services, if possible?
    1. Dataflow
    2. HBase on Dataproc
    3. Bigtable
    4. BigQuery

References

Google Cloud Dataproc