Amazon Managed Streaming for Apache Kafka – MSK

Managed Streaming for Apache Kafka – MSK

  • Managed Streaming for Apache Kafka – MSK is an AWS streaming data service that manages Apache Kafka infrastructure and operations.
  • Apache Kafka
    • is an open-source, high-performance, fault-tolerant, and scalable streaming data store platform for building real-time streaming data pipelines and applications.
    • stores streaming data in a fault-tolerant way, providing a buffer between producers and consumers.
    • stores events as a continuous series of records and, within each partition, preserves the order in which the records were produced.
    • runs as a cluster and stores data records in topics, which are partitioned and replicated across one or more brokers that can be spread across multiple AZs for high availability.
    • allows many producers to write and many consumers to read; consumers process records from a topic partition on a first-in-first-out basis, preserving the order in which the data was produced.
  • makes it easy for developers and DevOps managers to run Kafka applications and Kafka Connect connectors on AWS, without the need to become experts in operating Kafka.
  • operates, maintains, and scales Kafka clusters, provides enterprise-grade security features out of the box, and has built-in AWS integrations that accelerate development of streaming data applications.
  • always runs within a VPC managed by MSK and is made available to your own selected VPC, subnet, and security group when the cluster is set up.
  • IP addresses from the VPC are attached to the MSK resources through elastic network interfaces (ENIs), and all network traffic stays within the AWS network and is not accessible to the internet by default.
  • integrates with CloudWatch for monitoring, metrics, and logging.
  • MSK Serverless is a cluster type for MSK that makes it easy for you to run Apache Kafka clusters without having to manage compute and storage capacity. With MSK Serverless, you can run your applications without having to provision, configure, or optimize clusters, and you pay for the data volume you stream and retain.
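
The topic, partition, and ordering behaviour described for Apache Kafka above can be illustrated with a small in-memory model. This is only a sketch, not a Kafka client: `ToyTopic` and its methods are hypothetical names, and the md5-based partitioner is an assumption made for simplicity (real Kafka clients use a murmur2 hash for keyed records by default). It shows that records with the same key land in the same partition and are read back in the order they were produced.

```python
import hashlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a Kafka topic; not a real Kafka client."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Each partition is an append-only list of (key, value) records.
        self.partitions = defaultdict(list)

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        # md5 is an assumption for this sketch; Kafka's default is murmur2.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def produce(self, key, value):
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        # Return (partition, offset) of the record just written.
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition, offset=0):
        # Records come back in append order: FIFO within one partition.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
for i in range(4):
    topic.produce("sensor-a", f"a{i}")
    topic.produce("sensor-b", f"b{i}")

part_a = topic.partition_for("sensor-a")
values_a = [v for k, v in topic.consume(part_a) if k == "sensor-a"]
print(values_a)  # ['a0', 'a1', 'a2', 'a3']
```

Ordering is guaranteed only within a partition, which is why producers that need related events kept in order typically give them the same key.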

MSK Serverless

  • MSK Serverless is a cluster type that helps run Kafka clusters without having to manage compute and storage capacity.
  • fully manages partitions, including monitoring them and moving them to even out load across a cluster.
  • creates two replicas for each partition and places them in different AZs. Additionally, MSK Serverless automatically detects and recovers failed backend resources to maintain high availability.
  • encrypts all traffic in transit and all data at rest using AWS Key Management Service (KMS).
  • allows clients to connect over a private connection using AWS PrivateLink without exposing the traffic to the public internet.
  • offers IAM Access Control to manage client authentication and client authorization to Kafka resources such as topics.
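
As a rough illustration of IAM Access Control for Kafka resources such as topics, the sketch below builds an IAM policy document that lets one client connect to a cluster, read and write a single topic, and join a single consumer group. The function name `msk_topic_policy` and all identifier values (region, account ID, cluster name and UUID, topic, group) are placeholders assumed for this example; the set of `kafka-cluster:` actions a real application needs may differ.

```python
import json


def msk_topic_policy(region, account_id, cluster_name, cluster_uuid, topic, group):
    """Build a narrowly scoped IAM policy for one client on one topic (sketch)."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    suffix = f"{cluster_name}/{cluster_uuid}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow the client to authenticate and connect to the cluster.
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": f"{base}:cluster/{suffix}",
            },
            {
                # Allow producing to and consuming from a single topic.
                "Effect": "Allow",
                "Action": [
                    "kafka-cluster:DescribeTopic",
                    "kafka-cluster:WriteData",
                    "kafka-cluster:ReadData",
                ],
                "Resource": f"{base}:topic/{suffix}/{topic}",
            },
            {
                # Allow membership in a single consumer group.
                "Effect": "Allow",
                "Action": ["kafka-cluster:AlterGroup", "kafka-cluster:DescribeGroup"],
                "Resource": f"{base}:group/{suffix}/{group}",
            },
        ],
    }


# Placeholder values, assumed purely for illustration.
policy = msk_topic_policy(
    region="us-east-1",
    account_id="111122223333",
    cluster_name="demo-cluster",
    cluster_uuid="00000000-0000-0000-0000-000000000000-1",
    topic="orders",
    group="orders-consumers",
)
print(json.dumps(policy, indent=2))
```

Scoping each statement to a specific cluster, topic, and group ARN, rather than using wildcards, keeps the policy close to least privilege.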

MSK Security

  • MSK uses EBS server-side encryption and KMS keys to encrypt storage volumes.
  • Clusters have encryption in transit enabled via TLS for inter-broker communication. For provisioned clusters, you can opt out of using encryption in transit when a cluster is created.
  • MSK clusters running Kafka version 2.5.1 or greater support TLS in-transit encryption between Kafka brokers and ZooKeeper nodes.
  • For provisioned clusters, you have three options for authentication (AuthN) and authorization (AuthZ):
    • IAM Access Control for both AuthN and AuthZ (recommended)
    • TLS certificate authentication (certificates issued by a CA) for AuthN and access control lists (ACLs) for AuthZ
    • SASL/SCRAM for AuthN and access control lists (ACLs) for AuthZ
  • MSK recommends using IAM Access Control as it defaults to least privilege access and is the most secure option.
  • For serverless clusters, IAM Access Control can be used for both authentication and authorization.
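
The three authentication options map to different Kafka client settings. The sketch below returns the client properties typically used for each option; the helper names `client_properties` and `to_properties_text` are hypothetical, the credential and keystore arguments are placeholders, and the port numbers are the ones MSK documents for access from within AWS, so verify them against your cluster's bootstrap broker string.

```python
# Broker ports for clients connecting from within AWS (verify for your cluster).
BROKER_PORTS = {"iam": 9098, "scram": 9096, "tls": 9094}


def client_properties(auth, username=None, password=None,
                      keystore_path=None, keystore_password=None):
    """Return Kafka client properties for one of the three auth options."""
    if auth == "iam":
        # Java clients need the aws-msk-iam-auth library on the classpath.
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config":
                "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    if auth == "scram":
        jaas = (
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{username}" password="{password}";'
        )
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": jaas,
        }
    if auth == "tls":
        # Mutual TLS: the client presents a certificate from its keystore.
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": keystore_path,
            "ssl.keystore.password": keystore_password,
        }
    raise ValueError(f"unknown auth option: {auth}")


def to_properties_text(props):
    """Render a dict as the contents of a client.properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(to_properties_text(client_properties("iam")))
```

For serverless clusters only the `"iam"` branch applies, since IAM Access Control is the option available there.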

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.