Google Cloud – EHR Healthcare Case Study

July 20, 2021 ~ Last updated on : February 2, 2022 ~ jayendrapatil

EHR Healthcare is a leading provider of electronic health record software to the medical industry. EHR Healthcare provides its software as a service to multi-national medical offices, hospitals, and insurance providers.

Executive statement

Our on-premises strategy has worked for years but has required a major investment of time and money in training our team on distinctly different systems, managing similar but separate environments, and responding to outages. Many of these outages have been a result of misconfigured systems, inadequate capacity to manage spikes in traffic, and inconsistent monitoring practices. We want to use Google Cloud to leverage a scalable, resilient platform that can span multiple environments seamlessly and provide a consistent and stable user experience that positions us for future growth.

EHR Healthcare wants to move to Google Cloud to expand and to build scalable, highly available applications. It also wants to leverage automation and Infrastructure as Code (IaC) to provide consistency across environments and reduce provisioning errors.

Solution Concept

Due to rapid changes in the healthcare and insurance industry, EHR Healthcare’s business has been growing exponentially year over year. They need to be able to scale their environment, adapt their disaster recovery plan, and roll out new continuous deployment capabilities to update their software at a fast pace. Google Cloud has been chosen to replace its current colocation facilities.

EHR wants to build a scalable, highly available setup with disaster recovery, and to introduce continuous integration and deployment.

Existing Technical Environment

EHR’s software is currently hosted in multiple colocation facilities. The lease on one of the data centers is about to expire.
Customer-facing applications are web-based, and many have recently been containerized to run on a group of Kubernetes clusters. Data is stored in a mixture of relational and NoSQL databases (MySQL, MS SQL Server, Redis, and MongoDB).
EHR is hosting several legacy file- and API-based integrations with insurance providers on-premises. These systems are scheduled to be replaced over the next several years. There is no plan to upgrade or move these systems at the current time.
Users are managed via Microsoft Active Directory. Monitoring is currently being done via various open-source tools. Alerts are sent via email and are often ignored.

As the lease on one of the data centers is about to expire, time is critical.
Some web applications are containerized, use SQL and NoSQL databases, and can be moved.
Some of the systems are legacy, are scheduled to be replaced, and need not be migrated.

The team uses multiple monitoring tools, which might need consolidation.

Business requirements

On-board new insurance providers as quickly as possible.

Provide a minimum 99.9% availability for all customer-facing systems.

Availability can be increased by hosting applications across multiple zones and by using managed services that span multiple zones.
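
To make the 99.9% target concrete, here is a minimal sketch of the arithmetic. The 30-day window and the 99% per-zone figure are illustrative assumptions, not numbers from the case study, and the multi-zone formula assumes zone failures are independent, which real zones only approximate.

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window at the given availability."""
    return (1.0 - availability) * days * 24 * 60


def combined_availability(single: float, replicas: int) -> float:
    """Availability of N replicas when any one healthy replica can serve traffic.

    Assumes independent failures (an assumption made for illustration).
    """
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # Downtime budget for the 99.9% requirement over an assumed 30-day month.
    print(round(allowed_downtime_minutes(0.999), 1))
    # Two zones, each at a hypothetical 99% availability.
    print(round(combined_availability(0.99, 2), 6))
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, and two independent 99% zones together reach about 99.99%.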

Provide centralized visibility and proactive action on system performance and usage.

Cloud Monitoring can provide centralized visibility, and its alerting can drive proactive action.

Cloud Logging can also be used for log monitoring and alerting.

Increase ability to provide insights into healthcare trends.

Data can be pushed to and analyzed in BigQuery, and insights can be visualized using Data Studio.

Reduce latency to all customers.

Performance can be improved using Global Load Balancer to expose the applications.
Applications can also be hosted across regions for low latency access.

Maintain regulatory compliance.

Regulatory compliance can be maintained using data localization, data retention policies as well as security measures.

Decrease infrastructure administration costs.

Infrastructure administration costs can be reduced using automation with either Terraform or Deployment Manager

Make predictions and generate reports on industry trends based on provider data.

Data can be pushed and analyzed using BigQuery.

Technical requirements

Maintain legacy interfaces to insurance providers with connectivity to both on-premises systems and cloud providers.

Provide a consistent way to manage customer-facing applications that are container-based.

Container-based applications can be deployed on GKE or Cloud Run with a consistent CI/CD experience.

Provide a secure and high-performance connection between on-premises systems and Google Cloud.

Cloud VPN, Dedicated Interconnect, or Partner Interconnect connections can be established between on-premises and Google Cloud

Provide consistent logging, log retention, monitoring, and alerting capabilities.

Cloud Monitoring and Cloud Logging can be used to provide a single tool for monitoring, logging, and alerting.
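
One way to get consistent logs out of containerized applications is to have every service write one JSON object per line to standard output. The sketch below uses only the Python standard library. It assumes the platform's logging agent (for example on GKE or Cloud Run) picks up the `severity` and `message` fields from such lines. The logger name and the `environment` label are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            # Hypothetical label, passed via logging's `extra` argument.
            "environment": getattr(record, "environment", "unknown"),
        }
        return json.dumps(entry)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("ehr.portal")
    log.warning("claim upload retried", extra={"environment": "prod"})
```

Using the same formatter in every service keeps field names identical across environments, which makes shared alerting rules and retention policies easier to define.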

Maintain and manage multiple container-based environments.

Use Infrastructure as Code (IaC) tools such as Deployment Manager or Terraform to provide consistent implementations across environments.
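
Deployment Manager accepts templates written in Python, so a single template can be reused for every environment with different properties. The following is a minimal sketch rather than a production template: the `-backend` name suffix, the `zone` and `machineType` property names, the `default` network, and the Debian image family are assumptions chosen for illustration.

```python
COMPUTE_URL_BASE = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Deployment Manager entry point: return the resources for one environment."""
    project = context.env["project"]
    zone = context.properties["zone"]
    machine_type = context.properties["machineType"]

    instance = {
        # Name derives from the deployment, e.g. "staging-backend".
        "name": context.env["deployment"] + "-backend",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": (
                f"{COMPUTE_URL_BASE}projects/{project}/zones/{zone}"
                f"/machineTypes/{machine_type}"
            ),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    # Placeholder image family; swap in the approved base image.
                    "sourceImage": COMPUTE_URL_BASE
                    + "projects/debian-cloud/global/images/family/debian-11",
                },
            }],
            # No accessConfigs entry, so the instance gets no external IP.
            "networkInterfaces": [{
                "network": f"{COMPUTE_URL_BASE}projects/{project}"
                           "/global/networks/default",
            }],
        },
    }
    return {"resources": [instance]}
```

Because the environment-specific values arrive through `context.properties`, development, test, and production deployments share one definition and differ only in their configuration files.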

Dynamically scale and provision new environments.

Applications deployed on GKE can be scaled using Cluster Autoscaler and HPA for deployments.
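
The core of the Horizontal Pod Autoscaler decision is documented by Kubernetes as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below implements only that formula plus min/max clamping. It leaves out the tolerance band and stabilization windows that the real controller applies, and the default bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Apply the basic HPA scaling formula, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
    print(desired_replicas(4, 90, 60))
    # The same pods at 30% CPU scale in to 2 pods.
    print(desired_replicas(4, 30, 60))
```

When the HPA asks for more pods than the existing nodes can hold, the Cluster Autoscaler adds nodes, which is why the two are usually configured together.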

Create interfaces to ingest and process data from new providers.

GCP Certification Exam Practice Questions

Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).

GCP services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.

GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.

Open to further feedback, discussion and correction.

For this question, refer to the EHR Healthcare case study. In the past, configuration errors put IP addresses on backend servers that should not have been accessible from the internet. You need to ensure that no one can put external IP addresses on backend Compute Engine instances and that external IP addresses can only be configured on the front end Compute Engine instances. What should you do?
1. Create an organizational policy with a constraint to allow external IP addresses on the front end Compute Engine instances
2. Revoke the compute.networkadmin role from all users in the project with front end instances
3. Create an Identity and Access Management (IAM) policy that maps the IT staff to the compute.networkadmin role for the organization
4. Create a custom Identity and Access Management (IAM) role named GCE_FRONTEND with the compute.addresses.create permission
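
For the organization policy option, the relevant list constraint is `constraints/compute.vmExternalIpAccess`, whose allowed values are full instance paths. The sketch below builds such a policy as a plain dictionary and checks instances against it locally. The project, zone, and instance names are hypothetical, and the check function only imitates the effect of the policy for illustration.

```python
import json

CONSTRAINT = "constraints/compute.vmExternalIpAccess"


def instance_path(project: str, zone: str, name: str) -> str:
    """Build the instance identifier format used in the allow list."""
    return f"projects/{project}/zones/{zone}/instances/{name}"


def build_policy(allowed_instances):
    """Return a list policy that allows external IPs only on the given instances."""
    return {
        "constraint": CONSTRAINT,
        "listPolicy": {"allowedValues": sorted(allowed_instances)},
    }


def may_have_external_ip(policy, instance: str) -> bool:
    """Local illustration of the policy's effect; not an API call."""
    return instance in policy["listPolicy"]["allowedValues"]


if __name__ == "__main__":
    # Hypothetical front end instances.
    frontends = [
        instance_path("ehr-prod", "us-central1-a", "frontend-1"),
        instance_path("ehr-prod", "us-central1-b", "frontend-2"),
    ]
    print(json.dumps(build_policy(frontends), indent=2))
```

Unlike revoking or granting IAM roles, a policy of this shape restricts what can be configured regardless of who makes the change.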

EHR Healthcare References

EHR Healthcare Case Study

Google Cloud – HipLocal Case Study

June 18, 2021 ~ Last updated on : June 19, 2021 ~ jayendrapatil

HipLocal is a community application designed to facilitate communication between people in close proximity. It is used for event planning and organizing sporting events, and for businesses to connect with their local communities. HipLocal launched recently in a few neighborhoods in Dallas and is rapidly growing into a global phenomenon. Its unique style of hyper-local community communication and business outreach is in demand around the world.

Key point here is HipLocal is expanding globally

HipLocal Solution Concept

HipLocal wants to expand their existing service with updated functionality in new locations to better serve their global customers. They want to hire and train a new team to support these locations in their time zones. They will need to ensure that the application scales smoothly and provides clear uptime data, and that they analyze and respond to any issues that occur.

Key points here are HipLocal wants to expand globally, with an ability to scale and provide clear observability, alerting and ability to react.

HipLocal Existing Technical Environment

HipLocal’s environment is a mixture of on-premises hardware and infrastructure running in Google Cloud. The HipLocal team understands their application well, but has limited experience in globally scaled applications. Their existing technical environment is as follows:

Existing APIs run on Compute Engine virtual machine instances hosted in Google Cloud.

State is stored in a single instance MySQL database in Google Cloud.

Release cycles include development freezes to allow for QA testing.
The application has no consistent logging.
Applications are manually deployed by infrastructure engineers during periods of slow traffic on weekday evenings.

There are basic indicators of uptime; alerts are frequently fired when the APIs are unresponsive.

Business requirements

HipLocal’s investors want to expand their footprint and support the increase in demand they are experiencing. Their requirements are:

Expand availability of the application to new locations.
- Availability can be achieved using either
  - scaling the application and exposing it through Global Load Balancer OR
  - deploying the applications across multiple regions.

Support 10x as many concurrent users.
- As the APIs run on Compute Engine, scaling can be implemented using Managed Instance Groups fronted by a Load Balancer, App Engine, or a container-based application deployment.
- Scaling policies can be defined to scale as per the demand.

Ensure a consistent experience for users when they travel to different locations.
- Consistent experience for the users can be provided using either
  - Google Cloud Global Load Balancer which uses GFE and routes traffic close to the users
  - multi-region setup targeting each region
Obtain user activity metrics to better understand how to monetize their product.
- User activity data can also be exported to BigQuery for analytics and monetization
- Cloud Monitoring and Logging can be configured for application logs and metrics to provide observability, alerting, and reporting.
- Cloud Logging can be exported to BigQuery for analytics
Ensure compliance with regulations in the new regions (for example, GDPR).
- Compliance is a shared responsibility: Google Cloud ensures compliance of its services, while the applications hosted on Google Cloud are the customer's responsibility.
- GDPR or other data residency regulations can be met using a per-region setup, so that the data resides within the region.
Reduce infrastructure management time and cost.
- As the infrastructure is spread across on-premises and Google Cloud, it would make sense to consolidate the infrastructure into one place i.e. Google Cloud
- Consolidation would help in automation, maintenance, as well as provide cost benefits.
Adopt the Google-recommended practices for cloud computing:
- Develop standardized workflows and processes around application lifecycle management.
- Define service level indicators (SLIs) and service level objectives (SLOs).
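
As a rough illustration of how an SLI, an SLO, and the resulting error budget relate, consider a request-based availability SLI. The 99.9% objective and the request counts below are hypothetical and are not taken from the case study.

```python
def sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the success criterion."""
    return good_events / total_events if total_events else 1.0


def error_budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Share of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 API requests, 500 of them failed.
    print(sli(999_500, 1_000_000))
    print(round(error_budget_remaining(0.999, 999_500, 1_000_000), 3))
```

In this example the SLI is 99.95%, and about half of the error budget allowed by a 99.9% SLO has been consumed.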

Technical requirements

Provide secure communications between the on-premises data center and cloud hosted applications and infrastructure
- Secure communications can be enabled between the on-premises data center and the cloud using Cloud VPN or Cloud Interconnect.
The application must provide usage metrics and monitoring.
- Cloud Monitoring and Logging can be configured for application logs and metrics to provide observability, alerting, and reporting.

APIs require authentication and authorization.
- APIs can be configured for various Authentication mechanisms.
- APIs can be exposed through a centralized Cloud Endpoints gateway
- Internal Applications can be exposed using Cloud Identity-Aware Proxy
Implement faster and more accurate validation of new features.
- QA Testing can be improved using automated testing
- Production Release cycles can be improved using canary deployments to test the applications on a smaller base before rolling out to all.
- The application can be deployed to App Engine, which supports traffic splitting out of the box for canary releases.
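
The idea behind a canary release can be sketched as a deterministic, percentage-based split. This is only an illustration of the concept and not how App Engine implements traffic splitting. The version labels and the hashing scheme are assumptions.

```python
import hashlib


def assign_version(user_id: str, canary_percent: int,
                   stable: str = "v1", canary: str = "v2") -> str:
    """Map a user to a version so the same user always gets the same answer."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return canary if bucket < canary_percent else stable


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    on_canary = sum(assign_version(u, 10) == "v2" for u in users)
    print(f"{on_canary / len(users):.1%} of users routed to the canary")
```

Keeping the assignment sticky per user means errors seen on the canary can be attributed to the new version rather than to users bouncing between versions.
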
Logging and performance metrics must provide actionable information to be able to provide debugging information and alerts.
- Cloud Monitoring and Logging can be configured for application logs and metrics to provide observability, alerting, and reporting.
- Cloud Logging can be exported to BigQuery for analytics
Must scale to meet user demand.
- As the APIs run on Compute Engine, scaling can be implemented using Managed Instance Groups fronted by a Load Balancer, with scaling policies defined as per the demand.
- The single-instance MySQL database can be migrated to Cloud SQL as-is, without application code changes. Cloud SQL can then scale vertically by resizing the instance and horizontally for reads by adding read replicas.

GCP Certification Exam Practice Questions

Which database should HipLocal use for storing state while minimizing application changes?
1. Firestore
2. BigQuery
3. Cloud SQL
4. Cloud Bigtable
Which architecture should HipLocal use for log analysis?
1. Use Cloud Spanner to store each event.
2. Start storing key metrics in Memorystore.
3. Use Cloud Logging with a BigQuery sink.
4. Use Cloud Logging with a Cloud Storage sink.
HipLocal wants to improve the resilience of their MySQL deployment, while also meeting their business and technical requirements. Which configuration should they choose?
1. Use the current single instance MySQL on Compute Engine and several read-only MySQL servers on Compute Engine.
2. Use the current single instance MySQL on Compute Engine, and replicate the data to Cloud SQL in an external master configuration.
3. Replace the current single instance MySQL instance with Cloud SQL, and configure high availability.
4. Replace the current single instance MySQL instance with Cloud SQL, and Google provides redundancy without further configuration.

Which service should HipLocal use to enable access to internal apps?
1. Cloud VPN
2. Cloud Armor
3. Virtual Private Cloud
4. Cloud Identity-Aware Proxy
Which database should HipLocal use for storing user activity?
1. BigQuery
2. Cloud SQL
3. Cloud Spanner
4. Cloud Datastore

Reference

Case_Study_HipLocal

Google Cloud – TerramEarth Case Study

February 27, 2021 ~ Last updated on : February 1, 2022 ~ jayendrapatil ~ 2 Comments

TerramEarth manufactures heavy equipment for the mining and agricultural industries. They currently have over 500 dealers and service centers in 100 countries. Their mission is to build products that make their customers more productive.

Key points here are 500 dealers and service centers are spread across the world and they want to make their customers more productive.

Solution Concept

There are 2 million TerramEarth vehicles in operation currently, and we see 20% yearly growth. Vehicles collect telemetry data from many sensors during operation. A small subset of critical data is transmitted from the vehicles in real time to facilitate fleet management. The rest of the sensor data is collected, compressed, and uploaded daily when the vehicles return to home base. Each vehicle usually generates 200 to 500 megabytes of data per day.

Key points here are TerramEarth has 2 million vehicles. Only critical data is transferred in real-time while the rest of the data is uploaded in bulk daily.
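
The figures in the solution concept imply a sizeable daily ingest. A quick back-of-the-envelope sketch follows, assuming decimal units (1 TB = 1,000,000 MB) and that every vehicle uploads every day. Both are simplifications for illustration.

```python
VEHICLES = 2_000_000   # from the case study
YEARLY_GROWTH = 0.20   # from the case study
MB_PER_TB = 1_000_000  # decimal units, an assumption


def daily_volume_tb(vehicles: int, mb_per_vehicle: float) -> float:
    """Total daily upload volume in terabytes."""
    return vehicles * mb_per_vehicle / MB_PER_TB


def fleet_after(years: int, vehicles: int = VEHICLES,
                growth: float = YEARLY_GROWTH) -> int:
    """Projected fleet size after compounding yearly growth."""
    return round(vehicles * (1 + growth) ** years)


if __name__ == "__main__":
    low = daily_volume_tb(VEHICLES, 200)
    high = daily_volume_tb(VEHICLES, 500)
    print(f"Daily upload today: {low:.0f} to {high:.0f} TB")
    print(f"Fleet size in 5 years: {fleet_after(5):,}")
```

Under these assumptions the fleet produces between 400 TB and 1 PB per day, and the fleet grows to roughly 5 million vehicles within five years, so ingestion and storage need to scale well beyond current volumes.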

Executive Statement

Our competitive advantage has always been our focus on the customer, with our ability to provide excellent customer service and minimize vehicle downtimes. After moving multiple systems into Google Cloud, we are seeking new ways to provide best-in-class online fleet management services to our customers and improve operations of our dealerships. Our 5-year strategic plan is to create a partner ecosystem of new products by enabling access to our data, increasing autonomous operation capabilities of our vehicles, and creating a path to move the remaining legacy systems to the cloud.

Key point here is the company wants to improve further in operations, customer experience, and partner ecosystem by allowing them to reuse the data.

Existing Technical Environment

TerramEarth’s vehicle data aggregation and analysis infrastructure resides in Google Cloud and serves clients from all around the world. A growing amount of sensor data is captured from their two main manufacturing plants and sent to private data centers that contain their legacy inventory and logistics management systems. The private data centers have multiple network interconnects configured to Google Cloud.
The web frontend for dealers and customers is running in Google Cloud and allows access to stock management and analytics.

Key point here is the company is hosting its infrastructure in Google Cloud and private data centers. Google Cloud hosts the web frontend and the vehicle data aggregation and analysis infrastructure, while sensor data from the manufacturing plants is sent to the private data centers.

Business Requirements

Predict and detect vehicle malfunction and rapidly ship parts to dealerships for just-in-time repair where possible.

Cloud IoT Core can provide a fully managed service to easily and securely connect, manage, and ingest data from globally dispersed devices.
Existing legacy inventory and logistics management systems running in the private data centers can be migrated to Google Cloud.
Existing data can be migrated one time using Transfer Appliance.

Decrease cloud operational costs and adapt to seasonality.

- Google Cloud allows resources to be configured to scale elastically based on demand.

Increase speed and reliability of development workflow.

- Google Cloud CI/CD tools like Cloud Build and open-source tools like Spinnaker can be used to increase the speed and reliability of the deployments.

Allow remote developers to be productive without compromising code or data security.

Cloud Function to Function authentication

Create a flexible and scalable platform for developers to create custom API services for dealers and partners.

Google Cloud provides multiple fully managed serverless and scalable application hosting solutions like Cloud Run and Cloud Functions
Managed Instance Groups of Compute Engine instances and GKE clusters with autoscaling can also be used to provide scalable, highly available compute services.

Technical Requirements

Create a new abstraction layer for HTTP API access to their legacy systems to enable a gradual move into the cloud without disrupting operations.

- Google Cloud API Gateway & Cloud Endpoints can be used to provide an abstraction layer to expose the data externally over a variety of backends.
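
The gradual move behind such an abstraction layer can be pictured as a routing facade: callers keep one stable API while individual path prefixes are switched from the legacy backend to the cloud backend. This is a conceptual sketch in plain Python and not an API Gateway or Cloud Endpoints configuration. The base URLs and paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApiFacade:
    """Resolve API paths to the legacy or the cloud backend."""
    legacy_base: str
    cloud_base: str
    migrated_prefixes: set = field(default_factory=set)

    def migrate(self, prefix: str) -> None:
        """Mark a path prefix as served from the cloud from now on."""
        self.migrated_prefixes.add(prefix.rstrip("/"))

    def resolve(self, path: str) -> str:
        """Return the backend URL that should serve the given path."""
        for prefix in self.migrated_prefixes:
            if path == prefix or path.startswith(prefix + "/"):
                return self.cloud_base + path
        return self.legacy_base + path


if __name__ == "__main__":
    # Hypothetical endpoints for illustration only.
    facade = ApiFacade("https://legacy.example.internal", "https://api.example.com")
    facade.migrate("/inventory")
    print(facade.resolve("/inventory/parts/42"))
    print(facade.resolve("/logistics/shipments"))
```

Because callers only see the facade, each legacy system can be moved when it is ready, without a coordinated cutover.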

Modernize all CI/CD pipelines to allow developers to deploy container-based workloads in highly scalable environments.

Google Cloud CI/CD - Continuous Integration Continuous Deployment

- Google Cloud provides DevOps tools like Cloud Build and supports open-source tools like Spinnaker to provide CI/CD features.
- Cloud Source Repositories are fully-featured, private Git repositories hosted on Google Cloud.
- Cloud Build is a fully-managed, serverless service that executes builds on Google Cloud Platform’s infrastructure.
- Container Registry is a private container image registry that supports Docker Image Manifest V2 and OCI image formats.
- Artifact Registry is a fully-managed service with support for both container images and non-container artifacts; it extends the capabilities of Container Registry.

Allow developers to run experiments without compromising security and governance requirements

- Google Cloud deployments can be configured for Canary or A/B testing to allow experimentation.

Create a self-service portal for internal and partner developers to create new projects, request resources for data analytics jobs, and centrally manage access to the API endpoints.

Use cloud-native solutions for keys and secrets management and optimize for identity-based access

- Google Cloud supports Key Management Service (KMS) for key management and Secret Manager for managing secrets.

Improve and standardize tools necessary for application and network monitoring and troubleshooting.

- Google Cloud provides Cloud Operations Suite which includes Cloud Monitoring and Logging to cover both on-premises and Cloud resources.
- Cloud Monitoring collects measurements of key aspects of the service and of the Google Cloud resources used
- Cloud Monitoring Uptime check is a request sent to a publicly accessible IP address on a resource to see whether it responds.
- Cloud Logging is a service for storing, viewing, and interacting with logs.
- Error Reporting aggregates and displays errors produced in the running cloud services.
- Cloud Profiler continuously profiles CPU, heap, and other resource usage to help improve performance and reduce costs.
- Cloud Trace is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
- Cloud Debugger helps inspect the state of an application, at any code location, without stopping or slowing down the running app.

Reference Cellular Upload Architecture

Batch Upload Replacement Architecture

Reference

Google Cloud – TerramEarth case study

Google Cloud – Dress4win Case Study

January 14, 2019 ~ Last updated on : May 12, 2021 ~ jayendrapatil

Dress4Win is a web-based company that helps their users organize and manage their personal wardrobe using a web app and mobile application. The company also cultivates an active social network that connects their users with designers and retailers. They monetize their services through advertising, e-commerce, referrals, and a freemium app model. The application has grown from a few servers in the founder’s garage to several hundred servers and appliances in a colocated data center. However, the capacity of their infrastructure is now insufficient for the application’s rapid growth. Because of this growth and the company’s desire to innovate faster, Dress4Win is committing to a full migration to a public cloud.

The key here is that the company wants to migrate completely to a public cloud because of the current infrastructure's inability to scale.

Solution Concept

For the first phase of their migration to the cloud, Dress4Win is moving their development and test environments. They are also building a disaster recovery site, because their current infrastructure is at a single location. They are not sure which components of their architecture they can migrate as is and which components they need to change before migrating them.

The key here is that Dress4Win wants to move the development and test environments first. They also want to build a DR site for their current production site, which would continue to be hosted on-premises.

Executive Statement

Our investors are concerned about our ability to scale and contain costs with our current infrastructure. They are also concerned that a competitor could use a public cloud platform to offset their up-front investment and free them to focus on developing better features. Our traffic patterns are highest in the mornings and weekend evenings; during other times, 80% of our capacity is sitting idle.

Our capital expenditure is now exceeding our quarterly projections. Migrating to the cloud will likely cause an initial increase in spending, but we expect to fully transition before our next hardware refresh cycle. Our total cost of ownership (TCO) analysis over the next 5 years for a public cloud strategy achieves a cost reduction between 30% and 50% over our current model.

The key here is that the company wants to improve on the application scalability, efficiency (hardware sitting idle most of the time), capex cost reduction, and improve TCO over a period of time

Existing Technical Environment

The Dress4Win application is served out of a single data center location. All servers run Ubuntu LTS v16.04.

Databases:

MySQL. 1 server for user data, inventory, static data:
- MySQL 5.8
- 8 core CPUs
- 128 GB of RAM
- 2x 5 TB HDD (RAID 1)
Redis 3 server cluster for metadata, social graph, caching. Each server is:
- Redis 3.2
- 4 core CPUs
- 32GB of RAM

The MySQL server can be migrated directly to Cloud SQL, which is Google Cloud's managed relational database service and supports MySQL.
For the Redis cluster, Memorystore can be used, which is a fully-managed in-memory data store service for Redis.
Neither migration should require changes to the application.

Compute:

40 Web Application servers providing micro-services based APIs and static content.
- Tomcat – Java
- Nginx
- 4 core CPUs
- 32 GB of RAM

20 Apache Hadoop/Spark servers:
- Data analysis
- Real-time trending calculations
- 8 core CPUs
- 128 GB of RAM
- 4x 5 TB HDD (RAID 1)

3 RabbitMQ servers for messaging, social notifications, and events:
- 8 core CPUs
- 32GB of RAM

Miscellaneous servers:
- Jenkins, monitoring, bastion hosts, security scanners
- 8 core CPUs
- 32GB of RAM

Web Application servers with Java and Nginx can be supported using Compute Engine, App Engine, or Google Kubernetes Engine (formerly Container Engine) with autoscaling configured.
Although the core and RAM combination would need a custom machine type, the configuration can be tuned to fit an existing machine type.

Apache Hadoop/Spark servers can be easily migrated to Cloud Dataproc
Google Cloud does not currently offer a managed RabbitMQ service, so messaging can be supported using either of the following:
- Cloud Pub/Sub messaging. However, this would need changes to the code and would not be a seamless migration.
- Compute Engine instances hosting the RabbitMQ servers.
Jenkins, Bastion hosts, Security scanners can be hosted using Google Compute Engine (GCE)
Monitoring can be provided using Stackdriver (now Cloud Monitoring).
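
Whether a server fits a predefined machine type or needs a custom one largely comes down to its memory-per-core ratio. The sketch below classifies the compute servers listed above. The 4 GB and 8 GB per vCPU ratios for standard and high-memory shapes are assumptions for illustration and should be checked against the current machine family documentation.

```python
# Assumed memory-per-vCPU ratios (GB) of predefined shapes; verify before use.
ASSUMED_RATIOS = {"standard": 4.0, "highmem": 8.0}

# (cores, RAM in GB) taken from the existing environment.
SERVERS = {
    "web": (4, 32),
    "hadoop": (8, 128),
    "rabbitmq": (8, 32),
    "misc": (8, 32),
}


def gb_per_core(cores: int, ram_gb: int) -> float:
    """Memory available per core."""
    return ram_gb / cores


def suggest_shape(cores: int, ram_gb: int, ratios=ASSUMED_RATIOS) -> str:
    """Pick the leanest assumed shape whose ratio covers the server, else 'custom'."""
    ratio = gb_per_core(cores, ram_gb)
    for shape, limit in sorted(ratios.items(), key=lambda kv: kv[1]):
        if ratio <= limit:
            return shape
    return "custom"


if __name__ == "__main__":
    for name, (cores, ram) in SERVERS.items():
        print(name, gb_per_core(cores, ram), suggest_shape(cores, ram))
```

With these assumed ratios, the web servers sit at the high-memory boundary, the RabbitMQ and miscellaneous servers match a standard shape, and the Hadoop/Spark servers would need either a custom shape or a rethink, which a move to Cloud Dataproc allows.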

Storage appliances:

iSCSI for VM hosts
Fiber channel SAN – MySQL databases
- 1 PB total storage; 400 TB available
NAS – image storage, logs, backups
- 100 TB total storage; 35 TB available

iSCSI for VM hosts can be supported using Cloud persistent disks as it needs a block level storage
SAN for MySQL databases can be supported using Cloud persistent disks as it needs a block level storage. However, a single disk cannot scale to 1PB and multiple disks need to be combined to create the storage
NAS for image storage, logs and backups can be supported using Cloud Storage which provides unlimited storage capacity
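
The point about combining disks can be estimated with simple arithmetic. The sketch assumes a per-disk ceiling of 64 TB, which should be verified against current persistent disk limits, and treats 1 PB as 1,000 TB.

```python
import math

MAX_DISK_TB = 64          # assumed per-disk size limit; verify before relying on it
SAN_TOTAL_TB = 1_000      # 1 PB total, from the case study
SAN_AVAILABLE_TB = 400    # from the case study


def disks_needed(data_tb: float, max_disk_tb: float = MAX_DISK_TB) -> int:
    """Minimum number of disks required to hold the given volume."""
    return math.ceil(data_tb / max_disk_tb)


if __name__ == "__main__":
    used_tb = SAN_TOTAL_TB - SAN_AVAILABLE_TB
    print(f"Used SAN capacity: {used_tb} TB -> {disks_needed(used_tb)} disks")
    print(f"Full SAN capacity: {SAN_TOTAL_TB} TB -> {disks_needed(SAN_TOTAL_TB)} disks")
```

Only the 600 TB currently in use has to be migrated, but even that spans several disks under this assumption, so the storage layout needs to be planned before the move.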

Business Requirements

Build a reliable and reproducible environment with scaled parity of production.
- can be handled by provisioning services or using GCP managed services with the same scale as on-premises resources and with Cloud Deployment Manager for creating repeatable deployments
Improve security by defining and adhering to a set of security and Identity and Access Management (IAM) best practices for cloud.
- can be handled using IAM by implementing best practices such as least privilege and separating dev/test/production projects to control access
Improve business agility and speed of innovation through rapid provisioning of new resources.
- can be handled using Cloud Deployment Manager for repeatable and automated provisioning of resources
- deployments of applications and new releases can be handled efficiently using rolling updates, A/B testing
Analyze and optimize architecture for performance in the cloud.
- can be handled by autoscaling Compute Engine instances based on the demand
- can be handled using Stackdriver for monitoring and fine tuning the specs

Technical Requirements

Easily create non-production environments in the cloud.
- most of the services can be created using GCP managed services and the environment creation can be standardized and automated using templates and configurations

Implement an automation framework for provisioning resources in cloud.
- can be handled using Cloud Deployment Manager, which provides an Infrastructure as Code service for provisioning resources in cloud.
Implement a continuous deployment process for deploying applications to the on-premises datacenter or cloud.
- continuous deployments can be handled using tools like Jenkins available on both the environments
Support failover of the production environment to cloud during an emergency.
- can be handled by replicating all the data to the cloud environment and ability to provision the servers quickly.
- can be handled by using DNS to repoint from on-premises environment to cloud environment
Encrypt data on the wire and at rest.
- GCP services encrypt data on the wire and at rest by default. Encryption can be performed using Google-provided or customer-provided keys.

Support multiple private connections between the production data center and cloud environment.
- can be handled using VPN (multiple VPNs for better performance) or dedicated Interconnect connection between the production data center and the cloud environment

References

Google Cloud – Dress4Win case study