Google Cloud Data Loss Prevention – DLP

  • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
  • DLP inspects data so that you know what sensitive data you hold and can make informed decisions about how to secure it.
  • DLP reduces data risk with de-identification methods such as masking and tokenization.
  • DLP inspects and transforms both structured and unstructured data.

Cloud Data Loss Prevention (DLP) Action

  • A Cloud Data Loss Prevention (DLP) action is something that occurs after a DLP job completes successfully or, in the case of email notifications, on error.
  • DLP supports the following types of actions:
    • Save findings to BigQuery (inspection and risk jobs)
    • Publish to Pub/Sub (inspection and risk jobs)
    • Publish to Security Command Center (inspection jobs)
    • Publish to Data Catalog (inspection jobs)
    • Publish to Google Cloud’s operations suite (inspection jobs)
    • Notify by email (inspection and risk jobs)

Cloud Data Loss Prevention Key Concepts

  • Classification is the process of inspecting data to learn what data you have, how sensitive it is, and how likely each finding is to be a true match.
  • De-identification is the process of removing identifying information from data.
  • De-identification techniques supported by Cloud DLP
    • Redaction: Deletes all or part of a detected sensitive value.
    • Replacement: Replaces a detected sensitive value with a specified surrogate value.
    • Masking: Replaces a number of characters of a sensitive value with a specified surrogate character, such as a hash (#) or asterisk (*).
    • Pseudonymization: Replaces sensitive data values with cryptographically generated tokens.
    • Generalization: Abstracts a distinguishing value into a more general, less distinguishing value. Generalization attempts to preserve data utility while also reducing the identifiability of the data.
    • Bucketing: “Generalizes” a sensitive value by replacing it with a range of values. (For example, replacing a specific age with an age range, or temperatures with ranges corresponding to “Hot,” “Medium,” and “Cold.”)
    • Date shifting: Shifts sensitive date values by a random amount of time.
    • Time extraction: Extracts or preserves specified portions of date and time values.
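
A few of the techniques above can be sketched locally in plain Python. This is an illustration of the ideas only, not the Cloud DLP API: the function names, the default masking character, the 10-year bucket width, and the use of HMAC-SHA-256 as the token generator are all assumptions made for the example.

```python
import hashlib
import hmac


def mask(value: str, masking_char: str = "#", keep_last: int = 4) -> str:
    """Masking: replace all but the last `keep_last` characters."""
    hidden = max(len(value) - max(keep_last, 0), 0)
    return masking_char * hidden + value[hidden:]


def bucket_age(age: int, width: int = 10) -> str:
    """Bucketing: replace an exact age with the range that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonymize(value: str, key: bytes) -> str:
    """Pseudonymization: derive a deterministic token with a keyed hash."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(mask("4111111111111111"))   # ############1111
    print(bucket_age(37))             # 30-39
    print(pseudonymize("alice@example.com", b"demo-key")[:16])
```

Because the token is deterministic for a given key, the same input always maps to the same token, so joins across tables keep working while the raw value stays hidden.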

Cloud Data Loss Prevention InfoTypes

  • Cloud Data Loss Prevention (DLP) uses information types – or infoTypes – to define what it scans for.
  • An infoType is a type of sensitive data, such as a name, email address, telephone number, identification number, credit card number, and so on.
  • An infoType detector is the corresponding detection mechanism that matches an infoType’s matching criteria.
  • Cloud DLP uses infoType detectors in the configuration for its scans to determine what to inspect for and how to transform findings. InfoType names are also used when displaying or reporting scan results.
  • Cloud DLP supports the following infoType detectors:
    • Built-in infoType detectors, which are specified by name and cover country- or region-specific sensitive data types as well as globally applicable ones.
    • Custom infoType detectors, which you define yourself:
      • Small custom dictionary detectors
        • simple word lists that Cloud DLP matches on
        • ideal for lists of up to several tens of thousands of words or phrases
        • preferred if the word list doesn’t change significantly
      • Large custom dictionary detectors
        • generated by Cloud DLP from large lists of words or phrases stored in either Cloud Storage or BigQuery
        • ideal for large lists of words or phrases, up to tens of millions
      • Regular expression (regex) detectors
        • enable Cloud DLP to detect matches based on a regex pattern
  • Cloud DLP supports inspection rules, attached to a built-in or custom infoType detector, to fine-tune scan results:
    • Exclusion rules decrease the number of findings returned.
    • Hotword rules increase the number of findings returned or change their likelihood value.
  • DLP uses a bucketized representation of likelihood, which is intended to indicate how likely it is that a piece of data matches a given infoType
    • LIKELIHOOD_UNSPECIFIED: Default value; same as POSSIBLE.
    • VERY_UNLIKELY: Very unlikely that the data matches the given infoType.
    • UNLIKELY: Unlikely that the data matches the given infoType.
    • POSSIBLE: Possible that the data matches the given infoType.
    • LIKELY: Likely that the data matches the given infoType.
    • VERY_LIKELY: Very likely that the data matches the given infoType.
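
To make the interplay of a regex detector, inspection rules, and likelihood concrete, here is a small local simulation. It does not call Cloud DLP. The `EMPLOYEE_ID` infoType, the "employee" hotword, the 30-character proximity window, and the choice to start regex matches at POSSIBLE are hypothetical choices for this sketch.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Likelihood(IntEnum):
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


@dataclass
class Finding:
    info_type: str
    quote: str
    likelihood: Likelihood


# Hypothetical custom infoType: an employee ID such as "EMP-123456".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
HOTWORD = re.compile(r"employee", re.IGNORECASE)
HOTWORD_WINDOW = 30  # characters inspected before each match


def inspect(text, excluded=frozenset(), min_likelihood=Likelihood.POSSIBLE):
    findings = []
    for match in EMPLOYEE_ID.finditer(text):
        if match.group() in excluded:  # exclusion rule: drop known false positives
            continue
        likelihood = Likelihood.POSSIBLE
        window = text[max(match.start() - HOTWORD_WINDOW, 0):match.start()]
        if HOTWORD.search(window):  # hotword rule: raise the likelihood
            likelihood = Likelihood.VERY_LIKELY
        if likelihood >= min_likelihood:  # keep only findings at or above the threshold
            findings.append(Finding("EMPLOYEE_ID", match.group(), likelihood))
    return findings
```

Calling `inspect(text, min_likelihood=Likelihood.LIKELY)` keeps only matches that the hotword promoted, which mirrors how raising the minimum likelihood trades recall for fewer false positives.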

DLP Classification and De-identification

  • Cloud DLP can easily classify, redact and de-identify sensitive data contained in text-based content and images, including content stored in Google Cloud storage repositories.
  • Text Classification and Redaction
    • Text Classification returns classification findings.
    • Automatic Text Redaction produces an output in which sensitive data matches are replaced with a *** placeholder.
  • Image Classification and Redaction
    • DLP uses Optical Character Recognition (OCR) technology to recognize text prior to classification. Similar to text classification, it returns findings, but it also adds a bounding box where the text was found.
    • Inspection
      • Cloud DLP inspects the submitted base64-encoded image for the specified infoTypes.
      • It returns the detected infoTypes, along with one or more sets of pixel coordinates and dimensions.
      • Each set of pixel coordinate and dimension values indicates the bottom-left corner and the dimensions of a bounding box, respectively.
      • Each bounding box corresponds to all or part of a Cloud DLP finding.
    • Redaction
      • Cloud DLP redacts any sensitive data findings by masking them with opaque rectangles.
      • It returns the redacted base64-encoded image in the same image format as the original image.
      • Color of the redaction boxes can be configured in the request.
  • Storage classification
    • scans data stored in Cloud Storage, Datastore, and BigQuery
    • supports scanning of binary, text, image, Microsoft Word, PDF, and Apache Avro files
    • unrecognized file types are scanned as binary files.
  • Dates, if considered PII, can be handled
    • Using generalization, or bucketing, which can however remove the utility of the dates, for example when all dates are generalized to just the year.
    • Using date obfuscation by date shifting, which shifts a set of dates by a random amount but preserves the sequence and duration of a period of time.
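
Date shifting preserves sequence and duration because every date that belongs to the same subject moves by the same offset. The sketch below shows that idea locally. Deriving the offset from a key and a subject ID via HMAC, and the 100-day bound, are assumptions of this example, not a description of how Cloud DLP computes shifts.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_for(subject_id: str, key: bytes, max_days: int = 100) -> timedelta:
    """Derive one pseudo-random shift in [-max_days, max_days] per subject."""
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return timedelta(days=offset)


def shift_dates(dates, subject_id: str, key: bytes, max_days: int = 100):
    """Shift every date of one subject by the same amount."""
    delta = shift_for(subject_id, key, max_days)
    return [d + delta for d in dates]


if __name__ == "__main__":
    visits = [date(2021, 1, 5), date(2021, 1, 19), date(2021, 3, 1)]
    shifted = shift_dates(visits, "patient-42", b"demo-key")
    # The gaps between visits are unchanged even though the dates moved.
    print([(b - a).days for a, b in zip(shifted, shifted[1:])])  # [14, 41]
```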

DLP Re-identification Risk Analysis

  • DLP Re-identification risk analysis is the process of analyzing sensitive data to find properties that might increase the risk of subjects being identified, or of sensitive information about individuals being revealed.
  • Risk analysis methods can be used before de-identification to help determine an effective de-identification strategy, or after de-identification to monitor for any changes or outliers.
  • Re-identification is the process of matching up de-identified data with other available data to determine the person to whom the data belongs.
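
One property commonly measured in this kind of analysis is k-anonymity: the size of the smallest group of records that share the same quasi-identifier values. A low k means some subjects are easy to single out. The sketch below computes it over a list of dictionaries; the column names and sample rows are made up for illustration, and this is a local calculation rather than a Cloud DLP risk job.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


if __name__ == "__main__":
    rows = [
        {"age": "30-39", "zip": "941**", "diagnosis": "flu"},
        {"age": "30-39", "zip": "941**", "diagnosis": "asthma"},
        {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
    ]
    # The third row is unique on (age, zip), so k is 1.
    print(k_anonymity(rows, ["age", "zip"]))  # 1
```

Running the same measurement before and after de-identification shows whether a chosen generalization actually raised k.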

Cloud Data Loss Prevention Templates

  • DLP Templates help decouple configuration information from the implementation of the requests
  • Templates provide a robust way to manage large-scale rollouts of Cloud DLP capabilities.
  • Cloud DLP supports two types of templates:
    • Inspection Templates: Templates for saving configuration information for inspection scan jobs, including what predefined or custom detectors to use.
    • De-identification Templates: Templates for saving configuration information for de-identification jobs, including both infoType and structured dataset transformations.
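
The decoupling that templates provide can be pictured with a plain dictionary standing in for stored templates. Everything here is hypothetical (the `INSPECT_TEMPLATES` store, the template name `pii-basic`, and the request shape); it only illustrates that callers reference configuration by name instead of repeating it in every request.

```python
# Hypothetical in-memory stand-in for stored inspection templates.
INSPECT_TEMPLATES = {
    "pii-basic": {
        "info_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
        "min_likelihood": "POSSIBLE",
    },
}


def build_inspect_request(item: str, template_name: str, overrides=None) -> dict:
    """Combine stored configuration with the content to inspect."""
    config = dict(INSPECT_TEMPLATES[template_name])  # copy so the template is not mutated
    config.update(overrides or {})
    return {"item": item, "inspect_config": config}


if __name__ == "__main__":
    print(build_inspect_request("Call me at 555-0100", "pii-basic"))
```

Changing the stored template changes the behavior of every caller that references it, which is what makes large rollouts manageable.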

Google Cloud Security Services Cheat Sheet

Cloud Armor

  • Cloud Armor protects the applications from multiple types of threats, including distributed denial-of-service (DDoS) attacks and application attacks like cross-site scripting (XSS) and SQL injection (SQLi).
  • Cloud Armor provides protection only to applications running behind an external HTTP(S) load balancer or a TCP/SSL proxy load balancer.
  • Cloud Armor supports applications deployed on Google Cloud, in a hybrid deployment, or in a multi-cloud architecture.
  • Cloud Armor is implemented at the edge of Google’s network in Google’s points of presence (PoP).
  • Security policies protect applications running behind a load balancer from DDoS and other web-based attacks.
  • A backend service can have only one security policy associated with it.
  • Prioritized rules in a security policy define configurable match conditions, actions (allow or deny), and the order in which the rules are evaluated.
  • Cloud Armor provides Preview mode that helps evaluate and preview the rules before going live.
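
The way prioritized rules and preview mode interact can be sketched with a small evaluator. It assumes that a lower priority number is evaluated first, that the first enforced match decides the outcome, and that a preview rule is only logged before evaluation moves on. The `Rule` class, the CIDR-only match condition, and the default action are simplifications for this example and not Cloud Armor's configuration format.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int          # lower number = evaluated first
    src_range: str         # CIDR match condition
    action: str            # "allow" or "deny"
    preview: bool = False  # preview rules are logged but not enforced


def evaluate(rules, client_ip, default_action="allow"):
    """Return (action, log) for a request from client_ip."""
    ip = ipaddress.ip_address(client_ip)
    log = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.src_range):
            if rule.preview:
                log.append(f"preview: priority {rule.priority} would {rule.action}")
                continue
            return rule.action, log
    return default_action, log


if __name__ == "__main__":
    policy = [
        Rule(1000, "203.0.113.0/24", "deny"),
        Rule(2000, "0.0.0.0/0", "allow"),
        Rule(500, "198.51.100.0/24", "deny", preview=True),
    ]
    print(evaluate(policy, "203.0.113.7"))   # ('deny', [])
    print(evaluate(policy, "198.51.100.9"))  # ('allow', ['preview: priority 500 would deny'])
```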

Cloud Identity-Aware Proxy

  • Identity-Aware Proxy (IAP) allows managing access to HTTP-based apps both on Google Cloud and outside of Google Cloud.
  • IAP intercepts the web requests sent to the application, authenticates the user making the request using the Google Identity Service, and only lets the requests through if they come from an authorized user. In addition, it can modify the request headers to include information about the authenticated user.
  • IAP helps establish a central authorization layer for applications accessed over HTTPS, so you can use an application-level access control model instead of relying on network-level firewalls.
  • IAP uses Google identities and IAM, and can also leverage external identity providers through OAuth (for example Facebook, GitHub, and Microsoft) or SAML.
  • IAP can be configured to use JSON Web Tokens (JWT) as signed headers to make sure that a request to the app is authorized and doesn’t bypass IAP.

Cloud Data Loss Prevention – DLP

  • Cloud Data Loss Prevention – DLP is a fully managed service designed to help discover, classify, and protect the most sensitive data.
  • provides two key features
    • Classification is the process to inspect the data and know what data we have, how sensitive it is, and the likelihood.
    • De-identification is the process of removing, masking, or replacing identifying information in data.
  • uses information types – or infoTypes – to define what it scans for, such as credit card numbers, email addresses, etc.
  • provides various built-in infoType detectors and supports custom ones
  • supports inspection rules to fine-tune scan results using
    • Exclusion rules decrease the number of findings
    • Hotword rules increase the quantity or change the likelihood value of findings
  • provides likelihood, which indicates how likely it is that a piece of data matches a given infoType like VERY_LIKELY or POSSIBLE, etc.
  • supports Text Classification and Redaction
  • supports Image Classification and Redaction, where the image is handled using its base64-encoded version
  • supports storage classification with scans on data stored in Cloud Storage, Datastore, and BigQuery
  • supports scanning of binary, text, image, Microsoft Word, PDF, and Apache Avro files
  • supports Templates, which help decouple configuration information from the implementation of the requests and manage large-scale rollouts

Security Command Center – SCC

  • is a Security and risk management platform
  • helps generate curated insights that provide a unique view of incoming threats and attacks to the assets, which include organization, projects, instances, and applications
  • displays possible security risks, called findings, that are associated with each asset.
  • provides services
    • Security Health Analytics provides managed vulnerability assessment scanning that can automatically detect the highest severity vulnerabilities and misconfigurations across assets.
    • Web Security Scanner custom scans provide granular information about application vulnerability findings like outdated libraries, XSS, etc.
    • Cloud Data Loss Prevention discovers, classifies, and protects sensitive data
    • Cloud Armor protects Google Cloud deployments against threats
    • Anomaly Detection identifies security anomalies for the projects and VM instances, like potential leaked credentials and coin mining, etc.
    • Container Threat Detection can detect the most common container runtime attacks
    • Integration with Forseti Security, the open-source security toolkit, and with third-party security information and event management (SIEM) applications
    • Event Threat Detection monitors the organization’s Cloud Logging stream and consumes logs to detect Malware, Cryptomining, etc.
    • Phishing Protection helps prevent users from accessing phishing sites by classifying malicious content that uses the brand and reporting the unsafe URLs to Google Safe Browsing
    • Continuous Exports, which automatically manage the export of new findings to Pub/Sub.

DDoS Protection and Mitigation

  • Distributed Denial of Service (DDoS) Protection and Mitigation is a shared responsibility between Google Cloud and the Customer
  • A DDoS attack is an attempt to render a service or application unavailable to its end users by attacking it from multiple sources
  • DDoS Protection and Mitigation Best Practices
    • Reduce the Attack Surface
      • Isolate and secure the network using VPC, subnets, firewall rules, tags, and IAM
      • Google provides Anti-spoofing protection and Automatic isolation between virtual networks
    • Isolate Internal Traffic
      • Use private IPs and avoid using public IPs
      • Use NAT Gateway and Bastion host
      • Use Internal Load Balancer for internal traffic
    • Enable Proxy-based Load Balancing
      • An HTTP(S) or SSL proxy load balancer uses the Google Front End (GFE), which helps mitigate and absorb layer 4 and other attacks
      • Disperse traffic across multiple regions
    • Scale to Absorb the Attack
      • Use GFE for protection
      • Use anycast-based load balancing to provide a single anycast IP for the front end
      • Use Autoscaling to scale backend services as per the demand
    • Protection using CDN Offloading
      • CDN acts as a proxy and can serve cached content, reducing the load on the origin servers
    • Deploy Third-party DDoS Protection solutions
    • App Engine Deployment
      • A fully multi-tenant system with isolation
    • Google Cloud Storage
      • Use signed URLs to access Google Cloud Storage
    • API Rate Limiting
      • Define rate limiting based on the number of allowed requests
      • API rate limits are applied on a per-project basis
    • Resource Quotas
      • Quotas help prevent unforeseen spikes in usage
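
Rate limiting on the number of allowed requests, applied per project, is often implemented with a token bucket. The sketch below is one generic way to do that and is not how Google Cloud enforces its own limits. The class names, the injectable `clock` parameter, and the example rate and capacity are assumptions made for the illustration.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` requests on average, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerProjectLimiter:
    """Keep one independent bucket per project ID."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self._args = (rate_per_sec, capacity, clock)
        self._buckets = {}

    def allow(self, project_id: str) -> bool:
        if project_id not in self._buckets:
            self._buckets[project_id] = TokenBucket(*self._args)
        return self._buckets[project_id].allow()
```

Because each project has its own bucket, a burst from one project exhausts only that project's allowance and leaves the others unaffected.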

Access Context Manager

  • Access Context Manager allows organization administrators to define fine-grained, attribute-based access control for projects and resources
  • helps prevent data exfiltration
  • helps reduce the size of the privileged network and move to a model where endpoints do not carry ambient authority based on the network.
  • helps define desired rules and policy but isn’t responsible for policy enforcement. The policy is configured and enforced across various points, such as VPC Service Controls.
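
Attribute-based access rules of this kind can be pictured as a set of conditions that a request's context must satisfy. The sketch below models a simplified access level with two attributes, source IP range and region. The `AccessLevel` class and its field names are hypothetical, and, in line with the last point above, the function only evaluates the definition; enforcement would happen elsewhere.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AccessLevel:
    name: str
    ip_subnetworks: list = field(default_factory=list)  # empty list = any IP
    regions: list = field(default_factory=list)         # empty list = any region


def satisfies(level: AccessLevel, client_ip: str, region: str) -> bool:
    """Return True when the request context meets every condition of the level."""
    ip = ipaddress.ip_address(client_ip)
    ip_ok = not level.ip_subnetworks or any(
        ip in ipaddress.ip_network(net) for net in level.ip_subnetworks
    )
    region_ok = not level.regions or region in level.regions
    return ip_ok and region_ok


if __name__ == "__main__":
    corp = AccessLevel("corp-network", ["203.0.113.0/24"], ["US", "DE"])
    print(satisfies(corp, "203.0.113.10", "US"))  # True
    print(satisfies(corp, "198.51.100.5", "US"))  # False
```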

FIPS 140-2 Validated

  • NIST developed the Federal Information Processing Standard (FIPS) Publication 140-2 as a security standard that sets forth requirements for cryptographic modules, including hardware, software, and/or firmware, for U.S. federal agencies.
  • FIPS 140-2 Validated certification was established to aid in the protection of digitally stored unclassified, yet sensitive, information.
  • Google Cloud Platform uses a FIPS 140-2 validated encryption module called BoringCrypto in its production environment.
  • Data in transit to the customer and between data centers, and data at rest are encrypted using FIPS 140-2 validated encryption.
  • The BoringCrypto module that achieved FIPS 140-2 validation is part of the BoringSSL library.
  • The BoringSSL library as a whole is not FIPS 140-2 validated.
  • In order to operate using only FIPS-validated implementations:
    • Google’s Local SSD storage product is automatically encrypted with NIST approved ciphers, but Google’s current implementation for this product doesn’t have a FIPS 140-2 validation certificate. If you require FIPS-validated encryption on Local SSD storage, you must provide your own encryption with a FIPS-validated cryptographic module.
    • Google automatically encrypts traffic between VMs that travels between Google data centers using NIST-approved encryption algorithms, but this implementation does not have a FIPS validation certificate. If you require this traffic to be encrypted with a FIPS-validated implementation, you must provide your own.
    • Clients connecting to Google infrastructure with TLS clients must be configured to require use of secure FIPS-compliant algorithms; if the TLS client and GCP’s TLS services agree on an encryption method that is incompatible with FIPS, a non-validated encryption implementation will be used.
    • Applications built and operated on GCP might include their own cryptographic implementations; in order for the data they process to be secured with a FIPS-validated cryptographic module, you must integrate such an implementation yourself.
  • All Google Cloud regions and zones currently support FIPS 140-2 validated encryption.