Google Cloud Pub/Sub – Asynchronous Messaging

Google Cloud Pub/Sub

  • Pub/Sub is a fully managed, asynchronous messaging service designed to be highly reliable and scalable.
  • Pub/Sub service allows applications to exchange messages reliably, quickly, and asynchronously
  • Pub/Sub allows services to communicate asynchronously, with latencies on the order of 100 milliseconds.
  • Pub/Sub enables the creation of event producers and consumers, called publishers and subscribers.
  • Publishers communicate with subscribers asynchronously by broadcasting events, rather than by synchronous remote procedure calls.
  • Pub/Sub offers at-least-once message delivery and best-effort ordering to existing subscribers
  • Pub/Sub accepts a maximum of 1,000 messages in a batch, and the size of a batch can not exceed 10 megabytes.

Pub/Sub Core Concepts

  • Topic: A named resource to which messages are sent by publishers.
  • Publisher: An application that creates and sends messages to a topic(s).
  • Subscriber: An application with a subscription to a topic(s) to receive messages from it.
  • Subscription: A named resource representing the stream of messages from a single, specific topic, to be delivered to the subscribing application.
  • Message: The combination of data and (optional) attributes that a publisher sends to a topic and is eventually delivered to subscribers.
  • Message attribute: A key-value pair that a publisher can define for a message.
  • Acknowledgment (or “ack”): A signal sent by a subscriber to Pub/Sub after it has received a message successfully. Acked messages are removed from the subscription’s message queue.
  • Schema: A schema is a format that messages must follow, creating a contract between publisher and subscriber that Pub/Sub will enforce
  • Push and pull: The two message delivery methods. A subscriber receives messages either by Pub/Sub pushing them to the subscriber’s chosen endpoint or by the subscriber pulling them from the service.

Message lifecycle

Pub/Sub Subscription Properties

  • Delivery method
    • Messages can be received with pull or push delivery.
    • In pull delivery, the subscriber application initiates requests to the Pub/Sub server to retrieve messages.
    • In push delivery, Pub/Sub initiates requests to the subscriber application to deliver messages. The push endpoint must be a publicly accessible HTTPS address.
    • If unspecified, Pub/Sub subscriptions use pull delivery.
    • Messages published before a subscription is created will not be delivered to that subscription
  • Acknowledgment deadline
    • Message not acknowledged before the deadline is sent again.
    • Default acknowledgment deadline is 10 secs. with a max of 10 mins.
  • Message retention duration
    • Message retention duration specifies how long Pub/Sub retains messages after publication.
    • Acknowledged messages are no longer available to subscribers and are deleted, by default
    • After the message retention duration, Pub/Sub might discard the message, regardless of its acknowledgment state.
    • Default message retention duration is 7 days with a min-max of 10 mins – 7 days
  • Dead-letter topics
    • If a subscriber can’t acknowledge a message, Pub/Sub can forward the message to a dead-letter topic.
    • With a dead-letter topic, message ordering can’t be enabled
    • With a dead-letter topic, the maximum number of delivery attempts can be specified.
    • Default is 5 delivery attempts; with a min-max of 5-100
  • Expiration period
    • Subscriptions expire without any subscriber activity such as open connections, active pulls, or successful pushes
    • Subscription deletion clock restarts, if subscriber activity is detected
    • Default expiration period is 31 days with a min-max of 1 day-never
  • Retry policy
    • If the acknowledgment deadline expires or a subscriber responds with a negative acknowledgment, Pub/Sub can send the message again using exponential backoff.
    • If the retry policy isn’t set, Pub/Sub resends the message as soon as the acknowledgment deadline expires or a subscriber responds with a negative acknowledgment.
  • Message ordering
    • If publishers send messages with an ordering key, are in the same region and message ordering is set, Pub/Sub delivers the messages in order.
    • If not set, Pub/Sub doesn’t deliver messages in order, including messages with ordering keys.
  • Filter
    • Filter is a string with a filtering expression where the subscription only delivers the messages that match the filter.
    • Pub/Sub service automatically acknowledges the messages that don’t match the filter.
    • Message can be filtered using their attributes.

Pub/Sub Seek Feature

  • Acknowledged messages are no longer available to subscribers and are deleted
  • Subscriber clients must process every message in a subscription even if only a subset is needed.
  • Seek feature extends subscriber functionality by allowing you to alter the acknowledgment state of messages in bulk
  • Timestamp Seeking
    • With Seek feature, you can replay previously acknowledged messages or purge messages in bulk
    • Seeking to a time marks every message received by Pub/Sub before the time as acknowledged, and all messages received after the time as unacknowledged.
    • Seeking to a time in the future allows you to purge messages.
    • Seeking to a time in the past allows replay and reprocess previously acknowledged messages
    • Timestamp seeking approach is imprecise as
      • Possible clock skew among Pub/Sub servers.
      • Pub/Sub has to work with the arrival time of the publish request rather than when an event occurred in the source system.
  • Snapshot Seeking
    • State of one subscription can be copied to another by using seek in combination with a Snapshot.
    • Once a snapshot is created, it retains:
      • All messages that were unacknowledged in the source subscription at the time of the snapshot’s creation.
      • Any messages published to the topic thereafter.

Pub/Sub Locations

  • Pub/Sub servers run in all GCP regions around the world, which helps offer fast, global data access while giving users control over where messages are stored
  • Cloud Pub/Sub offers global data access in that publisher and subscriber clients are not aware of the location of the servers to which they connect or how those services route the data.
  • Pub/Sub’s load balancing mechanisms direct publisher traffic to the nearest GCP data center where data storage is allowed, as defined in the Resource Location Restriction
  • Publishers in multiple regions may publish messages to a single topic with low latency. Any individual message is stored in a single region. However, a topic may have messages stored in many regions.
  • Subscriber client requesting messages published to this topic connects to the nearest server which aggregates data from all messages published to the topic for delivery to the client.
  • Message Storage Policy
    • Message Storage Policy helps ensure that messages published to a topic are never persisted outside a set of specified Google Cloud regions, regardless of where the publish requests originate.
    • Pub/Sub chooses the nearest allowed region, when multiple regions are allowed by the policy

Pub/Sub Security

  • Pub/Sub encrypts messages with Google-managed keys, by default.
  • Every message is encrypted at the following states and layers:
    • At rest
      • Hardware layer
      • Infrastructure layer
      • Application layer
        • Pub/Sub individually encrypts incoming messages as soon as the message is received
    • In transit
  • Pub/Sub does not encrypt message attributes at the application layer.
  • Message attributes are still encrypted at the hardware and infrastructure layers.

Common use cases

  • Ingestion user interaction and server events
  • Real-time event distribution
  • Replicating data among databases
  • Parallel processing and workflows
  • Data streaming from IoT devices
  • Refreshing distributed caches
  • Load balancing for reliability

GCP Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

Google_Cloud_Pub/Sub