Google Cloud Operations

Google Cloud Operations provides integrated managed services for monitoring, logging, and tracing of applications and systems running on Google Cloud and beyond.

Google Cloud Operations Suite (image credit: Priyanka Vergadia)

Cloud Monitoring

  • Cloud Monitoring collects measurements of key aspects of the service and of the Google Cloud resources used.
  • Cloud Monitoring provides tools to visualize and monitor this data.
  • Cloud Monitoring provides visibility into the performance, availability, and health of applications and infrastructure.
  • Cloud Monitoring collects metrics, events, and metadata from Google Cloud, AWS, hosted uptime probes, and application instrumentation (see the sketch below).
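
For example, application instrumentation can push custom metrics through the Cloud Monitoring API. A minimal sketch, assuming the google-cloud-monitoring Python client library; the project ID and metric name are hypothetical:

```python
# A minimal sketch: write one point of a custom metric to Cloud Monitoring.
# Assumes `pip install google-cloud-monitoring`; names below are hypothetical.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"  # hypothetical project ID

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/my_app/request_count"  # hypothetical metric
series.resource.type = "global"

now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10**9)
interval = monitoring_v3.TimeInterval({"end_time": {"seconds": seconds, "nanos": nanos}})
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
series.points = [point]

# One call can write points for multiple time series at once.
client.create_time_series(name=project_name, time_series=[series])
```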

Cloud Logging

  • Cloud Logging is a service for storing, viewing, and interacting with logs.
  • Answers the question “Who did what, where, and when?” within GCP projects.
  • Maintains tamper-proof audit logs for each project and organization.
  • Logs buckets are a regional resource, which means the infrastructure that stores, indexes, and searches the logs is located in a specific geographical location.

Error Reporting

  • Error Reporting aggregates and displays errors produced in the running cloud services.
  • Error Reporting provides a centralized error management interface to help find the application’s top or new errors so that they can be fixed faster (see the sketch below).
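
A minimal sketch of reporting an exception from application code, assuming the google-cloud-error-reporting Python client library:

```python
# A minimal sketch: send an exception to Error Reporting.
# Assumes `pip install google-cloud-error-reporting`.
from google.cloud import error_reporting

client = error_reporting.Client()

try:
    raise ValueError("simulated failure")  # placeholder for real application work
except Exception:
    # Reports the active exception, including its stack trace,
    # so it is grouped and surfaced in the Error Reporting console.
    client.report_exception()
```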

Cloud Profiler

  • Cloud Profiler provides continuous profiling of CPU usage, heap allocations, and other parameters to improve performance and reduce costs.
  • Cloud Profiler is a continuous profiling tool that is designed for applications running on Google Cloud:
    • It’s a statistical, or sampling, profiler that has low overhead and is suitable for production environments.
    • It supports common languages and collects multiple profile types.
  • Cloud Profiler consists of the profiling agent, which collects the data, and a console interface on Google Cloud, which lets you view and analyze the data collected by the agent.
  • Cloud Profiler is supported for Compute Engine, App Engine, GKE, and applications running on-premises as well (see the sketch below).
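
The agent is typically started once at application startup. A minimal sketch, assuming the google-cloud-profiler Python agent; the service name and version are hypothetical:

```python
# A minimal sketch: start the Cloud Profiler agent at application startup.
# Assumes `pip install google-cloud-profiler`; service name/version are hypothetical.
import googlecloudprofiler

try:
    googlecloudprofiler.start(
        service="my-service",
        service_version="1.0.0",
        verbose=3,  # log agent activity, useful while setting up
    )
except (ValueError, NotImplementedError) as exc:
    # Profiling is best-effort; the application keeps running if the agent fails.
    print(exc)
```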

Cloud Trace

  • Cloud Trace is a distributed tracing system that collects latency data from the applications and displays it in the Google Cloud Console.
  • Cloud Trace helps understand how long it takes the application to handle incoming requests from users or applications, and how long it takes to complete operations like RPC calls performed when handling the requests.
  • Cloud Trace can track how requests propagate through the application and provide detailed, near real-time performance insights.
  • Cloud Trace automatically analyzes all of the application’s traces to generate in-depth latency reports that surface performance degradations, and can capture traces from all VMs, containers, or App Engine services (see the sketch below).
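
Traces can be exported from application code via OpenTelemetry. A minimal sketch, assuming the opentelemetry-sdk and opentelemetry-exporter-gcp-trace packages; the span names are hypothetical:

```python
# A minimal sketch: export spans from an application to Cloud Trace.
# Assumes `pip install opentelemetry-sdk opentelemetry-exporter-gcp-trace`.
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Batch finished spans and ship them to Cloud Trace.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("handle-request"):   # hypothetical request span
    with tracer.start_as_current_span("rpc-call"):     # nested operation being timed
        pass  # application work goes here
```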

Cloud Debugger

  • Cloud Debugger helps inspect the state of an application, at any code location, without stopping or slowing down the running app.
  • Cloud Debugger makes it easier to view the application state without adding logging statements.
  • Cloud Debugger adds less than 10ms to the request latency only when the application state is captured. In most cases, this is not noticeable by users.
  • Cloud Debugger can be used with or without access to your app’s source code.
  • Cloud Debugger supports Cloud Source Repositories, GitHub, Bitbucket, or GitLab as the source code repository. If the source code repository is not supported, the source files can be uploaded.
  • Cloud Debugger allows collaboration by sharing the debug session by sending the Console URL.
  • Cloud Debugger supports a range of IDEs (see the sketch below for enabling the agent).
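
A minimal sketch of enabling the Python debugger agent at startup, assuming the google-python-cloud-debugger package; the module name and version are hypothetical, and the canary flag relates to the snapshot/logpoint canarying described below:

```python
# A minimal sketch: enable the Cloud Debugger agent at application startup.
# Assumes `pip install google-python-cloud-debugger`; module/version are hypothetical.
try:
    import googleclouddebugger

    googleclouddebugger.enable(
        module="my-app",
        version="1.0",
        breakpoint_enable_canary=True,  # test snapshots/logpoints on a subset of instances first
    )
except ImportError:
    pass  # the agent is optional; the app runs unchanged without it
```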

Debug Snapshots

  • Debug Snapshots capture local variables and the call stack at a specific line location in the app’s source code without stopping or slowing it down.
  • Certain conditions and locations can be specified to return a snapshot of the app’s data.
  • Debug Snapshots support canarying wherein the debugger agent tests the snapshot on a subset of the instances.

Debug Logpoints

  • Debug Logpoints allow you to inject logging into running services without restarting or interfering with the normal function of the service.
  • Debug Logpoints are useful for debugging production issues without having to add log statements and redeploy.
  • Debug Logpoints remain active for 24 hours after creation, or until they are deleted or the service is redeployed.
  • If a logpoint is placed on a line that receives lots of traffic, the Debugger throttles the logpoint to reduce its impact on the application.
  • Debug Logpoints support canarying wherein the debugger agent tests the logpoints on a subset of the instances.

References

Google Cloud Operations

Google Cloud Logging – Stackdriver

Google Cloud Logging

  • Cloud Logging is a service for storing, viewing, and interacting with logs (see the sketch after this list for basic usage).
  • Answers the question “Who did what, where, and when?” within GCP projects.
  • Maintains tamper-proof audit logs for each project and organization.
  • Logs buckets are a regional resource, which means the infrastructure that stores, indexes, and searches the logs is located in a specific geographical location. Google manages that infrastructure so that the applications are available redundantly across the zones within that region.
  • Cloud Logging is scoped by the project.
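
A minimal sketch of writing and reading log entries, assuming the google-cloud-logging Python client library; the log name and payloads are hypothetical:

```python
# A minimal sketch: write text and structured entries, then read them back.
# Assumes `pip install google-cloud-logging`; log name/payloads are hypothetical.
from google.cloud import logging

client = logging.Client()          # uses the current project by default
logger = client.logger("my-log")   # hypothetical log name

logger.log_text("A simple text entry")
logger.log_struct({"event": "signup", "user": "alice"}, severity="INFO")

# Read back the most recent entries from this log.
for entry in logger.list_entries(max_results=5):
    print(entry.timestamp, entry.payload)
```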

Cloud Logging Process

Google Cloud Logging export flow (image)

  • For each Google Cloud project, Logging automatically creates two logs buckets: _Required and _Default.
    • _Required bucket
      • holds Admin Activity audit logs, System Event audit logs, and Access Transparency logs.
      • retains them for 400 days; this retention period cannot be modified.
      • logs stored in _Required are not charged for.
    • _Default bucket
      • holds all other ingested logs in a Google Cloud project except for the logs held in the _Required bucket.
      • logs stored in _Default are charged for.
      • logs are retained for 30 days by default; the retention period can be customized from 1 to 3650 days (see the sketch below).
    • neither bucket can be deleted.
  • All logs generated in the project are stored in the _Required and _Default logs buckets, which live in the project in which the logs are generated.
  • Logs buckets only have regional availability, including those created in the global region.
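
A minimal sketch of customizing the _Default bucket’s retention, assuming the google-cloud-logging Python client library’s low-level config client; the project ID is hypothetical:

```python
# A minimal sketch: set the _Default bucket's retention to 90 days.
# Assumes `pip install google-cloud-logging`; the project ID is hypothetical.
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogBucket, UpdateBucketRequest
from google.protobuf import field_mask_pb2

client = ConfigServiceV2Client()
client.update_bucket(
    request=UpdateBucketRequest(
        name="projects/my-project/locations/global/buckets/_Default",
        bucket=LogBucket(retention_days=90),  # customizable from 1 to 3650 days
        update_mask=field_mask_pb2.FieldMask(paths=["retention_days"]),
    )
)
```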

Cloud Logging Types

Cloud Platform Logs

  • Cloud platform logs are service-specific logs that can help troubleshoot and debug issues, as well as better understand the Google Cloud services being used.
  • Cloud platform logs are generated by GCP services and vary depending on which Google Cloud resources are used in the Google Cloud project or organization.

Security Logs

  • Audit Logs
    • Cloud Audit Logs includes three types of audit logs:
      • Admin Activity,
      • Data Access, and
      • System Event.
    • Cloud Audit Logs provide audit trails of administrative changes and data accesses of the Google Cloud resources (see the sketch after this list for reading them).
      • Admin Activity
        • captures user-initiated resource configuration changes
        • enabled by default
        • no additional charge
        • admin activity – administrative actions and API calls
        • have 400-day retention
      • System Events
        • captures system initiated resource configuration changes
        • enabled by default
        • no additional charge
        • system events – GCE system events like live migration
        • have 400-day retention
      • Data Access logs
        • log API calls that create, modify, or read user-provided data, e.g., an object created in a GCS bucket.
        • 30-day retention
        • disabled by default
        • log volume can be huge
        • charged beyond free limits
        • available for GCP-visible services only; not available for public resources.
  • Access Transparency Logs
    • provides logs of actions taken by Google staff when accessing the Google Cloud content.
    • can help track compliance with the organization’s legal and regulatory requirements.
    • have 400-day retention
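
A minimal sketch of reading Admin Activity audit log entries, assuming the google-cloud-logging Python client library and suitable IAM permissions; the project ID is hypothetical:

```python
# A minimal sketch: list recent Admin Activity audit log entries.
# Assumes `pip install google-cloud-logging`; the project ID is hypothetical.
from google.cloud import logging

client = logging.Client(project="my-project")

# Admin Activity audit logs are written under the cloudaudit.googleapis.com log name.
audit_filter = 'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"'

for entry in client.list_entries(filter_=audit_filter, max_results=10):
    print(entry.timestamp, entry.payload)
```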

User Logs

  • User logs are generated by user software, services, or applications and written to Cloud Logging using a logging agent, the Cloud Logging API, or the Cloud Logging client libraries (see the sketch after this list).
  • Agent logs
    • produced by the logging agent installed on the VM, which collects logs from user applications and the VM itself
    • covers log data from third-party applications
    • charged beyond free limits
    • 30-day retention
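
Besides the agent, applications can also attach the client library to Python’s standard logging module. A minimal sketch, assuming the google-cloud-logging client library:

```python
# A minimal sketch: route Python's standard `logging` calls to Cloud Logging.
# Assumes `pip install google-cloud-logging`.
import logging

import google.cloud.logging

client = google.cloud.logging.Client()
client.setup_logging()  # attaches a Cloud Logging handler to the root logger

logging.info("Application started")  # now appears in Cloud Logging
```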

Cloud Logging Export

  • Log entries are stored in logs buckets for a specified length of time, i.e. the retention period, after which they are deleted and cannot be recovered.
  • Logs can be exported by configuring log sinks, which then continue to export log entries as they arrive in Logging.
  • A sink includes a destination and a filter that selects the log entries to export.
  • Exporting involves writing a filter that selects the log entries to be exported, and choosing a destination from the following options:
    • Cloud Storage: JSON files stored in buckets for long term retention
    • BigQuery: Tables created in BigQuery datasets, for analytics.
    • Pub/Sub: JSON messages delivered to Pub/Sub topics to stream to other resources. Supports third-party integrations, such as Splunk
    • Another Google Cloud project: Log entries held in Cloud Logging logs buckets.
  • Every time a log entry arrives in a project, folder, billing account, or organization resource, Logging compares the log entry to the sinks in that resource. Each sink whose filter matches the log entry writes a copy of the log entry to the sink’s export destination.
  • Exporting happens for new log entries only; it is not retrospective (a sink-creation sketch follows this list).
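
A minimal sketch of creating a sink that exports ERROR-level entries to a Cloud Storage bucket, assuming the google-cloud-logging Python client library; the sink name and bucket are hypothetical:

```python
# A minimal sketch: create a sink exporting ERROR entries to Cloud Storage.
# Assumes `pip install google-cloud-logging`; sink name/bucket are hypothetical.
from google.cloud import logging

client = logging.Client()
sink = client.sink(
    "error-log-sink",
    filter_="severity>=ERROR",                           # selects entries to export
    destination="storage.googleapis.com/my-log-archive",
)
sink.create()
# The sink's writer identity must still be granted write access on the bucket.
```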

Log-based Metrics

  • Log-based metrics are based on the content of log entries; e.g., they can record the number of log entries containing particular messages, or extract latency information reported in log entries.
  • Log-based metrics can be used in Cloud Monitoring charts and alerting policies.
  • Log-based metrics apply only to a single Google Cloud project. You can’t create them for Logging buckets or for other Google Cloud resources such as Cloud Billing accounts or organizations.
  • Log-based metrics are of two kinds
    • System-defined log-based metrics
      • provided by Cloud Logging for use by all Google Cloud projects.
      • System log-based metrics are calculated from included logs only, i.e., only from logs that have been ingested by Logging. If a log has been explicitly excluded from ingestion by Logging, it isn’t included in these metrics.
    • User-defined log-based metric
      • user-created to track things in the Google Cloud project, e.g., a log-based metric that counts the number of log entries matching a given filter (see the sketch after this list).
      • User-defined log-based metrics are calculated from both included and excluded logs, i.e., from all logs received by the Logging API for the Cloud project, regardless of any inclusion or exclusion filters that may apply to the Cloud project.
  • Log-based metrics support the following types
    • Counter metrics count the number of log entries matching a given filter.
    • Distribution metrics accumulate numeric data from log entries matching a filter.
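
A minimal sketch of creating a user-defined counter metric, assuming the google-cloud-logging Python client library; the metric name and filter are hypothetical:

```python
# A minimal sketch: create a user-defined counter log-based metric.
# Assumes `pip install google-cloud-logging`; metric name/filter are hypothetical.
from google.cloud import logging

client = logging.Client()
metric = client.metric(
    "error-count",
    filter_="severity>=ERROR",  # entries matching this filter are counted
    description="Number of ERROR-level log entries",
)
metric.create()
```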

Cloud Logging Agent

  • Cloud Logging agent streams logs from VM instances and from selected third-party software packages to Cloud Logging.
  • Cloud Logging Agent helps capture logs from GCE and AWS EC2 instances
  • VM images for GCE and Amazon EC2 don’t include the Logging agent, so it must be installed explicitly.
  • Cloud Logging Agent uses fluentd for capturing logs
  • Logging features include:
    • Standard system logs (/var/log/syslog and /var/log/messages for Linux, Windows Event Log) collected with no setup.
    • High throughput capability, taking full advantage of multi-core architecture.
    • Efficient resource (e.g. memory, CPU) management.
    • Custom log files.
    • JSON logs.
    • Plain text logs.
    • Regex-based parsing.
    • JSON-based parsing.
  • Logging agent is pre-configured to send logs from VM instances to Cloud Logging, including syslog and logs from third-party applications like Redis.
  • Cloud Logging Agent provides additional plugins and configurations, like filter_record_transformer, that can help modify or delete log entries before they are pushed to Cloud Logging, e.g., masking sensitive PII (see the sketch after this list).
  • Ops Agent doesn’t directly support automatic log parsing for third-party applications, but it can be configured to parse these files.
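
As a hypothetical illustration of the filter_record_transformer plugin mentioned above, a snippet like the following (e.g. dropped under /etc/google-fluentd/config.d/ for the Logging agent) could mask a PII field before entries are shipped; the tag and field name are assumptions:

```
# Hypothetical google-fluentd snippet: mask an email field before export.
# The tag (my_app_log) and record field (email) are assumptions.
<filter my_app_log>
  @type record_transformer
  <record>
    # Overwrite the email field with a fixed placeholder before export
    email REDACTED
  </record>
</filter>
```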

Cloud Logging IAM Roles

  • Logs Viewer – View logs except Data Access/Access Transparency logs
  • Private Logs Viewer – View all logs
  • Logging Admin – Full access to all logging actions
  • Project Viewer – View logs except Data Access/Access Transparency logs
  • Project Editor – Write, view, and delete logs, and create log-based metrics. However, it cannot create export sinks or view Data Access/Access Transparency logs.
  • Project Owner – Full access to all logging actions

GCP Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • GCP services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep pace with GCP updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your organization is a financial company that needs to store audit log files for 3 years. Your organization has hundreds of Google Cloud projects. You need to implement a cost-effective approach for log file retention. What should you do?
    1. Create an export to the sink that saves logs from Cloud Audit to BigQuery.
    2. Create an export to the sink that saves logs from Cloud Audit to a Coldline Storage bucket.
    3. Write a custom script that uses logging API to copy the logs from Stackdriver logs to BigQuery.
    4. Export these logs to Cloud Pub/Sub and write a Cloud Dataflow pipeline to store logs to Cloud SQL.

Reference

Google Cloud Logging