Google Cloud Monitoring – Stackdriver

Google Cloud Monitoring

  • Cloud Monitoring collects measurements of key aspects of the service and of the Google Cloud resources used.
  • Cloud Monitoring provides tools to visualize and monitor this data.
  • Cloud Monitoring helps gain visibility into the performance, availability, and health of the applications and infrastructure.
  • Cloud Monitoring collects metrics, events, and metadata from Google Cloud, AWS, hosted uptime probes, and application instrumentation.
  • Using the BindPlane service, data can be collected from over 150 common application components, on-premises systems, and hybrid cloud systems.

Cloud Monitoring Workspaces

  • Cloud Monitoring uses Workspaces to organize monitoring information.
  • A Workspace is a tool for monitoring resources across Google Cloud projects.
  • A Workspace accesses metric data from its monitored projects, but the metric data remains in those projects.
  • Every Workspace has a host project. If you delete the host project, you also delete the Workspace.
  • A Workspace always monitors its Google Cloud host project.
  • The host project is the project used to create the Workspace. The name of the Workspace is set to the name of the host project. This isn’t configurable.
  • The host project stores all of the configuration content for the dashboards, alerting policies, uptime checks, notification channels, and group definitions that you configure.
  • A Workspace can monitor multiple projects, but a Google Cloud project can be monitored by exactly one Workspace.
  • Projects can be moved from one Workspace to another.
  • Two different Workspaces can be merged into a single Workspace.
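
The ownership rules above can be sketched as a small in-memory model. This is only a toy illustration, not the Cloud Monitoring API: the class names `Workspace` and `WorkspaceRegistry`, the method names, and the rule that a host project cannot be moved out of its own Workspace are assumptions made for the example.

```python
class Workspace:
    """Toy model of a Workspace: a host project plus the projects it monitors."""

    def __init__(self, host_project):
        self.host_project = host_project
        # The Workspace name follows the host project name.
        self.name = host_project
        # A Workspace always monitors its own host project.
        self.monitored = {host_project}


class WorkspaceRegistry:
    """Hypothetical helper that enforces 'one Workspace per project'."""

    def __init__(self):
        self.owner = {}  # project name -> Workspace that monitors it

    def create(self, host_project):
        if host_project in self.owner:
            raise ValueError(f"{host_project} is already monitored")
        workspace = Workspace(host_project)
        self.owner[host_project] = workspace
        return workspace

    def add_project(self, workspace, project):
        """Add a project, moving it out of its previous Workspace if needed."""
        current = self.owner.get(project)
        if current is workspace:
            return
        if current is not None:
            # Assumption of this sketch: a host project stays with its Workspace.
            if current.host_project == project:
                raise ValueError(f"{project} hosts another Workspace")
            current.monitored.discard(project)
        workspace.monitored.add(project)
        self.owner[project] = workspace

    def merge(self, target, source):
        """Fold every project monitored by `source` into `target`."""
        if target is source:
            return
        for project in source.monitored:
            target.monitored.add(project)
            self.owner[project] = target
        source.monitored = set()
```

In this model, adding `project-b` to a second Workspace removes it from the first, so the `owner` mapping never points a project at more than one Workspace.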

Cloud Monitoring Metrics

  • Metrics are a collection of measurements that help you understand how the applications and system services are performing.
  • Measurements might include the latency of requests to a service, the amount of disk space available on a machine, the number of tables in the SQL database, the number of widgets sold, and so forth.
  • Metric value types include:
    • For measurements consisting of a single value at a time
      • BOOL, a boolean
      • INT64, a 64-bit integer
      • DOUBLE, a double-precision float
      • STRING, a string
    • For distribution measurements, the value isn’t a single value but a group of values.
      • The value type for distribution measurements is DISTRIBUTION.
      • Values in distribution include the mean, count, max, and other statistics, computed for a group of values.
      • Latency metrics typically capture data as distributions
  • Metric kinds include:
    • Gauge metric – Value is measured at a specific instant in time, e.g., CPU utilization or current temperature.
    • Delta metric – Value is measured as the change since it was last recorded, e.g., metrics measuring request counts are delta metrics; each value records how many requests were received since the last data point was recorded.
    • Cumulative metric – Value constantly increases over time, e.g., a metric for “sent bytes” might be cumulative; each value records the total number of bytes sent by a service at that time.
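
The relationship between delta and cumulative metrics, and the kind of statistics a distribution carries, can be illustrated with plain Python lists. This is a minimal sketch with made-up sample numbers; the function names are hypothetical, and real Cloud Monitoring time series also carry timestamps and can reset, which is ignored here.

```python
from statistics import mean


def cumulative_to_delta(points):
    """Turn cumulative readings into the change between consecutive readings."""
    return [later - earlier for earlier, later in zip(points, points[1:])]


def delta_to_cumulative(deltas, start=0):
    """Accumulate per-interval deltas back into a running total."""
    total = start
    running = []
    for delta in deltas:
        total += delta
        running.append(total)
    return running


def summarize(values):
    """Distribution-style summary: statistics computed over a group of values."""
    return {"count": len(values), "mean": mean(values), "max": max(values)}


# Cumulative "sent bytes" readings (illustrative numbers only).
sent_bytes_total = [100, 250, 250, 900]
print(cumulative_to_delta(sent_bytes_total))         # [150, 0, 650]
print(delta_to_cumulative([150, 0, 650], start=100))  # [250, 250, 900]

# Request latencies in milliseconds, summarized as a distribution would be.
print(summarize([12, 20, 40]))
```

A gauge needs no such conversion: each point is simply the value at that instant.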

Cloud Monitoring Agent

  • Google Cloud’s operations suite provides the following agents for collecting metrics on Linux and Windows VM instances.
  • Ops Agent
    • The primary and preferred agent for collecting telemetry from Compute Engine instances.
    • This agent combines logging and metrics into a single agent, provides YAML-based configuration for collecting logs and metrics, and features high-throughput logging.
    • Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics.
  • Legacy Monitoring Agent
    • The agent is a collectd-based daemon that gathers system and application metrics from VM instances and sends them to Cloud Monitoring.
    • By default, the legacy monitoring agent collects disk, CPU, network, and process metrics.
    • The agent can be configured to monitor third-party applications, which extends collection to the full list of agent metrics.

Cloud Monitoring – Uptime Checks

  • An uptime check is a request sent to a publicly accessible IP address on a resource to see whether it responds.
  • Uptime checks can determine the availability of the following:
    • URLs
    • Kubernetes LoadBalancer Services
    • VM instances
    • App Engine services
    • AWS load balancers
  • The availability of a resource can be monitored by creating an alerting policy that creates an incident when the uptime check fails.
  • The alerting policy can be configured to notify by email or through a different channel, and that notification can include details about the resource that failed to respond.
  • The results of uptime checks can also be observed in the Monitoring uptime-check dashboards.
  • For non-publicly available resources, the resource’s firewall must be configured to permit incoming traffic from the uptime-check servers.
  • Uptime checks are unable to reach resources that don’t have an external IP address.
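
Conceptually, an uptime check is just a request that either gets a response or does not. The sketch below mimics a single HTTP probe with the Python standard library. It is not how Cloud Monitoring implements its checks: the function name `probe`, the `opener` parameter (there only so the probe can be exercised without network access), and the choice to treat 2xx and 3xx statuses as "up" are assumptions of this example.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL responds with a 2xx or 3xx status in time."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP error statuses, refused connections, DNS failures,
        # timeouts, and malformed URLs: all count as "did not respond".
        return False
```

Calling `probe("https://example.com/")` performs one check; an alerting policy would correspond to acting on a `False` result, for example by sending a notification.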

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day, so both the questions and the answers might be outdated soon; research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. You need to monitor resources that are distributed over different projects in Google Cloud Platform. You want to consolidate reporting under the same Stackdriver Monitoring dashboard. What should you do?
    1. Use Shared VPC to connect all projects, and link Stackdriver to one of the projects.
    2. For each project, create a Stackdriver account. In each project, create a service account for that project and grant it the role of Stackdriver Account Editor in all other projects.
    3. Configure a single Stackdriver account, and link all projects to the same account.
    4. Configure a single Stackdriver account for one of the projects. In Stackdriver, create a Group and add the other project names as criteria for that Group.
  2. You are asked to set up application performance monitoring on Google Cloud projects A, B, and C as a single pane of glass. You want to monitor CPU, memory, and disk. What should you do?
    1. Enable API and then share charts from projects A, B, and C.
    2. Enable API and then give the metrics.reader role to projects A, B, and C.
    3. Enable API and then use default dashboards to view all projects in sequence.
    4. Enable API, create a workspace under project A, and then add projects B and C.
