Amazon DynamoDB Streams

  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Streams is a serverless data streaming feature that makes it straightforward to track, process, and react to item-level changes in DynamoDB tables in near real-time.
  • DynamoDB Streams stores stream records for 24 hours, after which they are removed.
  • DynamoDB Streams maintains an ordered sequence of the events per item; however, sequence across items is not maintained.
  • Example:
    • For example, suppose you have a DynamoDB table tracking high scores for a game, where each item in the table represents an individual player. If you make the following three updates in this order:
      • Update 1: Change Player 1’s high score to 100 points
      • Update 2: Change Player 2’s high score to 50 points
      • Update 3: Change Player 1’s high score to 125 points
    • DynamoDB Streams will maintain the order of Player 1’s score events. However, it does not maintain order across players, so the Player 2 event is not guaranteed to appear between the two Player 1 events.
  • Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
  • DynamoDB Streams APIs help developers consume updates and receive the item-level data before and after items are changed.
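The per-item ordering guarantee above can be sketched as a small check: group stream records by their item key and verify that sequence numbers only ever increase within each group (the record shape mirrors the stream record format; the `PlayerId` key is a hypothetical example).

```python
import json
from collections import defaultdict

def check_per_item_ordering(records):
    """Return True if sequence numbers are non-decreasing per item key.

    This is the only ordering DynamoDB Streams guarantees: ordering
    across different items is not maintained.
    """
    by_item = defaultdict(list)
    for r in records:
        # Serialize the key attributes so they can be used as a dict key.
        key = json.dumps(r["dynamodb"]["Keys"], sort_keys=True)
        by_item[key].append(int(r["dynamodb"]["SequenceNumber"]))
    return all(seqs == sorted(seqs) for seqs in by_item.values())
```

Running this over the three updates in the example, any interleaving of the Player 2 event passes, while a stream that reversed the two Player 1 events would fail.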

DynamoDB Streams Features

  • Streams allow reads at up to twice the rate of the provisioned write capacity of the DynamoDB table.
  • Streams have to be enabled on a per-table basis. When enabled on a table, DynamoDB captures information about every modification to data items in the table.
  • Streams support Encryption at rest to encrypt the data.
  • Streams are designed for No Duplicates so that every update made to the table will be represented exactly once in the stream.
  • Streams write stream records in near-real time so that applications can consume these streams and take action based on the contents.
  • Stream records contain information about a data modification to a single item in a DynamoDB table.
  • Each stream record has a sequence number that reflects the order in which the record was published to the stream.
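Stream records can be consumed directly with the low-level DynamoDB Streams API (`GetShardIterator`/`GetRecords`). A minimal sketch of draining one shard is below; the client is passed in as a parameter (in real use, `boto3.client("dynamodbstreams")`), and the stream ARN and shard ID would come from `DescribeStream`.

```python
def read_shard(streams_client, stream_arn, shard_id):
    """Read the currently available records from one shard.

    TRIM_HORIZON starts at the oldest record still retained
    (records are kept for 24 hours).
    """
    it = streams_client.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    records = []
    while it:
        page = streams_client.get_records(ShardIterator=it, Limit=1000)
        records.extend(page["Records"])
        it = page.get("NextShardIterator")  # absent once a shard is closed
        if not page["Records"] and it:
            break  # caught up; a long-running consumer would keep polling
    return records
```

Most applications should not hand-roll this loop: Lambda triggers or the KCL adapter (covered below) handle shard discovery, checkpointing, and resharding.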

Stream View Types

  • When enabling a stream on a table, you must specify the stream view type, which determines what information is written to the stream:
  • KEYS_ONLY: Only the key attributes of the modified item.
  • NEW_IMAGE: The entire item, as it appears after it was modified.
  • OLD_IMAGE: The entire item, as it appeared before it was modified.
  • NEW_AND_OLD_IMAGES: Both the new and the old images of the item (recommended for maximum flexibility).
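The effect of each view type on a stream record can be illustrated with a small helper; this is a sketch of the documented behavior (what a record carries for a given modification), not the service implementation.

```python
def build_stream_record(view_type, keys, old_image, new_image):
    """Show which parts of a modified item each stream view type captures.

    old_image is None for an INSERT; new_image is None for a REMOVE.
    """
    record = {"Keys": keys}  # key attributes are always present
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES") and new_image is not None:
        record["NewImage"] = new_image
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES") and old_image is not None:
        record["OldImage"] = old_image
    return record
```

Note that the view type is fixed when the stream is enabled; with KEYS_ONLY or NEW_IMAGE, the pre-modification state is simply never written to the stream and cannot be recovered later.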

Use Cases

  • Multi-Region Replication: Keep other data stores up-to-date with the latest changes to DynamoDB (used by DynamoDB Global Tables).
  • Real-time Analytics: Stream data to analytics services for real-time insights.
  • Event-Driven Architectures: Trigger actions based on changes made to the table.
  • Data Aggregation: Aggregate data from multiple tables into a single view.
  • Audit and Compliance: Maintain audit logs of all changes to data.
  • Search Index Updates: Keep search indexes (e.g., OpenSearch) synchronized with DynamoDB data.
  • Cache Invalidation: Invalidate caches when data changes.
  • Notifications: Send notifications when specific data changes occur.

Processing DynamoDB Streams

  • Stream records can be processed using multiple methods:

AWS Lambda

  • Most common and recommended approach for processing DynamoDB Streams.
  • Lambda polls the stream and invokes the function synchronously when new records are available.
  • Lambda automatically handles scaling, retries, and error handling.
  • Supports batch processing of stream records.
  • Can filter events using event filtering to reduce invocations and costs.
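A minimal Lambda handler for a stream trigger might look like the sketch below, assuming the table's stream uses NEW_AND_OLD_IMAGES (field names follow the stream record format; what you do with each change is application-specific).

```python
def lambda_handler(event, context):
    """Process a batch of DynamoDB stream records delivered by Lambda.

    Returning an empty batchItemFailures list tells Lambda the whole
    batch succeeded (partial-batch response reporting).
    """
    processed = []
    for record in event.get("Records", []):
        op = record["eventName"]        # INSERT | MODIFY | REMOVE
        ddb = record["dynamodb"]
        change = {
            "op": op,
            "keys": ddb["Keys"],
            "old": ddb.get("OldImage"),  # absent for INSERT
            "new": ddb.get("NewImage"),  # absent for REMOVE
        }
        processed.append(change)         # replace with real business logic
    return {"batchItemFailures": [], "processed": len(processed)}
```

Because a record can occasionally be delivered more than once to the consumer (e.g. on retries), the business logic here should be idempotent, as noted under Best Practices below.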

Kinesis Data Streams

  • DynamoDB can stream change data directly to Amazon Kinesis Data Streams.
  • Provides longer data retention (up to 365 days vs. 24 hours for DynamoDB Streams).
  • Enables integration with Kinesis Data Firehose, Kinesis Data Analytics, and other Kinesis consumers.
  • Supports fan-out to multiple consumers.
  • Better for high-throughput scenarios requiring multiple consumers.
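Routing a table's change data to Kinesis uses the `EnableKinesisStreamingDestination` API. A sketch follows, with the client injected as a parameter (in real use, `boto3.client("dynamodb")`); the table name and stream ARN are illustrative.

```python
def enable_kinesis_destination(dynamodb_client, table_name, kinesis_stream_arn):
    """Start streaming a table's item-level changes to a Kinesis data stream.

    The Kinesis stream must already exist; DynamoDB begins writing change
    records to it once the destination becomes ACTIVE.
    """
    return dynamodb_client.enable_kinesis_streaming_destination(
        TableName=table_name,
        StreamArn=kinesis_stream_arn,
    )
```

Note that this is separate from enabling DynamoDB Streams on the table; the two capture mechanisms can be enabled independently.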

Kinesis Client Library (KCL)

  • KCL can be used to build custom applications that process DynamoDB Streams.
  • DynamoDB Streams Kinesis Adapter allows KCL applications to consume DynamoDB Streams.
  • KCL 3.0 Support (June 2025): DynamoDB Streams now supports Kinesis Client Library 3.0.
    • Reduces compute costs to process streaming data by up to 33% compared to previous KCL versions.
    • Improved load balancing algorithm based on CPU utilization.
    • Enhanced performance and efficiency.
    • Note: KCL 1.x reaches end-of-support on January 30, 2026. Migrate to KCL 3.x.

AWS PrivateLink Support (March 2025)

  • Announced in March 2025, DynamoDB Streams now supports AWS PrivateLink.
  • Allows invoking DynamoDB Streams APIs from within your Amazon VPC without traversing the public internet.
  • Only interface endpoints are supported for DynamoDB Streams (gateway endpoints are not supported).
  • Enables private connectivity for stream processing applications running on-premises or in other Regions.
  • Supports FIPS endpoints in US and Canada commercial AWS Regions (announced November 2025).
  • Enhances security by keeping stream data within the AWS network.
  • Critical for compliance requirements that mandate private network connectivity.
  • Can be accessed from on-premises via AWS Direct Connect or Site-to-Site VPN.

DynamoDB Streams vs. Kinesis Data Streams

  • DynamoDB Streams:
    • 24-hour data retention
    • Automatically scales with table
    • No additional cost (included with DynamoDB)
    • Simpler to set up and use
    • Best for simple event-driven architectures
  • Kinesis Data Streams:
    • Up to 365 days data retention
    • Manual capacity management (or on-demand mode)
    • Additional cost for Kinesis
    • More complex but more flexible
    • Best for multiple consumers and longer retention needs
  • Recommendation: Use DynamoDB Streams for simple use cases with Lambda. Use Kinesis Data Streams for complex scenarios requiring multiple consumers or longer retention.

Best Practices

  • Choose the Right View Type: Use NEW_AND_OLD_IMAGES for maximum flexibility unless you have specific requirements.
  • Handle Duplicates: Although designed for no duplicates, implement idempotent processing logic.
  • Monitor Stream Processing: Use CloudWatch metrics to monitor Lambda invocations, errors, and iterator age.
  • Use Event Filtering: Filter events in Lambda to reduce unnecessary invocations and costs.
  • Batch Processing: Configure appropriate batch sizes for Lambda to optimize throughput and cost.
  • Error Handling: Implement proper error handling and configure dead-letter queues for failed records.
  • Consider Kinesis for Multiple Consumers: If you need multiple consumers, use Kinesis Data Streams instead.
  • Migrate to KCL 3.0: If using KCL, migrate to version 3.0 for cost savings and performance improvements.
  • Use PrivateLink for Security: Enable AWS PrivateLink for enhanced security and compliance.
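The event-filtering practice above is configured on the Lambda event source mapping. A hedged sketch is below, again with the client injected (in real use, `boto3.client("lambda")`); the function name, ARN, and the choice to forward only INSERT events are illustrative.

```python
import json

def add_stream_trigger(lambda_client, function_name, stream_arn):
    """Create an event source mapping that invokes the function only
    for INSERT records, reducing invocations and cost."""
    return lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="LATEST",   # or TRIM_HORIZON for existing records
        BatchSize=100,               # tune batch size for throughput/cost
        FilterCriteria={
            "Filters": [{"Pattern": json.dumps({"eventName": ["INSERT"]})}]
        },
    )
```

Filter patterns can also match on the `dynamodb` payload itself (e.g. specific key or attribute values), so a function can be invoked only for the subset of changes it cares about.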

Limitations and Considerations

  • Stream records are available for only 24 hours.
  • Streams do not guarantee ordering across different items (only per-item ordering).
  • Stream records are eventually consistent with the table.
  • Enabling streams does not affect table performance.
  • Streams cannot be enabled on tables with local secondary indexes that use non-key attributes in the projection.
  • For Global Tables with MREC, streams are enabled by default and cannot be disabled.
  • For Global Tables with MRSC, streams are not used for replication but can be enabled separately.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. An application currently writes a large number of records to a DynamoDB table in one region. There is a requirement for a secondary application to retrieve new records written to the DynamoDB table every 2 hours and process the updates accordingly. Which of the following is an ideal way to ensure that the secondary application gets the relevant changes from the DynamoDB table?
    1. Insert a timestamp for each record and then scan the entire table for the timestamp as per the last 2 hours.
    2. Create another DynamoDB table with the records modified in the last 2 hours.
    3. Use DynamoDB Streams to monitor the changes in the DynamoDB table.
    4. Transfer records to S3 which were modified in the last 2 hours.
  2. A company needs to process DynamoDB stream records from an on-premises application without exposing traffic to the public internet. What should they implement?
    1. Use a NAT gateway to access DynamoDB Streams.
    2. Create an interface VPC endpoint for DynamoDB Streams using AWS PrivateLink.
    3. Create a gateway VPC endpoint for DynamoDB Streams.
    4. Use an internet gateway with security groups.
  3. A company wants to reduce costs for processing DynamoDB Streams using KCL. What should they do?
    1. Switch from KCL to Lambda for processing.
    2. Migrate from KCL 1.x to KCL 3.0 for up to 33% cost reduction.
    3. Reduce the number of shards in the stream.
    4. Increase the batch size for stream processing.
  4. A company needs to maintain an audit log of all changes to a DynamoDB table for 90 days. DynamoDB Streams only retains data for 24 hours. What is the BEST solution?
    1. Enable PITR on the DynamoDB table.
    2. Stream DynamoDB changes to Kinesis Data Streams with 90-day retention.
    3. Use Lambda to copy stream records to S3 every 24 hours.
    4. Create on-demand backups every 24 hours.
  5. A developer wants to capture both the old and new values of items when they are modified in a DynamoDB table. Which stream view type should they configure?
    1. KEYS_ONLY
    2. NEW_IMAGE
    3. OLD_IMAGE
    4. NEW_AND_OLD_IMAGES
  6. Which of the following statements about DynamoDB Streams are correct? (Select TWO)
    1. Stream records are available for 24 hours.
    2. Streams guarantee ordering across all items in the table.
    3. Streams maintain ordered sequence of events per item.
    4. Streams can be processed only by Lambda functions.
    5. Enabling streams impacts table write performance.
  7. A company has multiple applications that need to process the same DynamoDB change events. What is the BEST approach?
    1. Create multiple Lambda functions triggered by the same DynamoDB Stream.
    2. Stream DynamoDB changes to Kinesis Data Streams and use multiple consumers.
    3. Enable multiple DynamoDB Streams on the same table.
    4. Use DynamoDB Streams with fan-out to multiple Lambda functions.
