Amazon S3 Replication

Amazon S3 Replication

  • S3 Replication enables automatic, asynchronous copying of objects across S3 buckets in the same or different AWS regions.
  • S3 Cross-Region Replication – CRR is used to copy objects across S3 buckets in different AWS Regions.
  • S3 Same-Region Replication – SRR is used to copy objects across S3 buckets in the same AWS Region.
  • S3 Replication helps to
    • Replicate objects while retaining metadata
    • Replicate objects into different storage classes
    • Maintain object copies under different ownership
    • Keep objects stored over multiple AWS Regions
    • Replicate objects within 15 minutes using S3 Replication Time Control (S3 RTC)
  • S3 can replicate all or a subset of objects with specific key name prefixes
  • S3 encrypts all data in transit across AWS regions using SSL
  • Object replicas in the destination bucket are exact replicas of the objects in the source bucket with the same key names and the same metadata.
  • Objects may be replicated to a single destination bucket or multiple destination buckets.
  • Cross-Region Replication can be useful for the following scenarios:
    • Compliance requirement to have data backed up across regions
    • Minimize latency by allowing users across geographies to access objects from a nearer region
    • Operational reasons, e.g. compute clusters in two different regions that analyze the same set of objects
  • Same-Region Replication can be useful for the following scenarios:
    • Aggregate logs into a single bucket
    • Configure live replication between production and test accounts
    • Abide by data sovereignty laws to store multiple copies

S3 Replication Requirements

  • source and destination buckets must be versioning-enabled
  • for CRR, the source and destination buckets must be in different AWS regions.
  • S3 must have permission to replicate objects from the source bucket to the destination bucket on your behalf.
  • If the source bucket owner also owns the object, the bucket owner has full permission to replicate the object. If not, the source bucket owner must be granted permission for the S3 actions s3:GetObjectVersion and s3:GetObjectVersionACL to read the object and its ACL
  • When setting up replication in a cross-account scenario (where the source and destination buckets are owned by different AWS accounts), the destination bucket owner must grant the source bucket owner permission to replicate objects into the destination bucket.
  • if the source bucket has S3 Object Lock enabled, the destination buckets must also have S3 Object Lock enabled.
  • destination buckets cannot be configured as Requester Pays buckets
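
The requirements above translate directly into a replication configuration applied to the source bucket. Below is a minimal boto3 sketch, assuming placeholder bucket names and IAM role ARN, with both buckets already versioning-enabled:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket names and replication role ARN for illustration.
s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        # Role that S3 assumes to replicate objects on your behalf
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-logs",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "logs/"},  # replicate only a key-name prefix subset
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket",
                    "StorageClass": "STANDARD_IA",  # replicas can land in a different storage class
                },
            }
        ],
    },
)
```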

S3 Replication – Replicated & Not Replicated

  • Only new objects created after you add a replication configuration are replicated. S3 does NOT retroactively replicate objects that existed before you added replication configuration.
  • Objects encrypted using customer-provided keys (SSE-C) are NOT replicated. Objects encrypted at rest under an S3 managed key (SSE-S3) are replicated by default, while objects encrypted with a KMS key stored in AWS Key Management Service (SSE-KMS) are replicated only if SSE-KMS replication is explicitly enabled in the replication configuration.
  • S3 replicates only objects in the source bucket for which the bucket owner has permission to read objects and read ACLs
  • Any object ACL updates are replicated, although there can be some delay before S3 can bring the two in sync. This applies only to objects created after you add a replication configuration to the bucket.
  • S3 does NOT replicate objects in the source bucket for which the bucket owner does not have permission.
  • Updates to bucket-level S3 subresources are NOT replicated, allowing different bucket configurations on the source and destination buckets
  • Only customer actions are replicated & actions performed by lifecycle configuration are NOT replicated
  • Replication chaining is NOT allowed; objects in the source bucket that are themselves replicas created by another replication are NOT replicated.
  • S3 does NOT replicate the delete marker by default. However, you can add delete marker replication to non-tag-based rules to override it.
  • S3 does NOT replicate deletion by object version ID. This protects data from malicious deletions.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

S3_Replication

Amazon S3 Event Notifications

S3 Event Notifications

  • S3 notification feature enables notifications to be triggered when certain events happen in the bucket.
  • Notifications are enabled at the Bucket level
  • Notifications can be configured to be filtered by the prefix and suffix of the key name of objects. However, filtering rules cannot be defined with overlapping prefixes, overlapping suffixes, or overlapping combinations of prefixes and suffixes
  • S3 can publish the following events
    • New Object created events
      • Can be enabled for PUT, POST, or COPY operations
      • You will not receive event notifications from failed operations
    • Object Removal events
      • Can publish delete events for object deletion, object version deletion, or insertion of a delete marker
      • You will not receive event notifications from automatic deletes from lifecycle policies or from failed operations.
    • Restore object events
      • restoration of objects archived to the S3 Glacier storage classes
    • Reduced Redundancy Storage (RRS) object lost events
      • Can be used to reproduce/recreate the Object
    • Replication events
      • for replication configurations that have S3 replication metrics or S3 Replication Time Control (S3 RTC) enabled
  • S3 can publish events to the following destinations: SNS topics, SQS queues, Lambda functions, and Amazon EventBridge
  • For S3 to be able to publish events to the destination, the S3 principal should be granted the necessary permissions
  • S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
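
A minimal sketch of wiring a bucket to an SQS queue with boto3 (bucket name, queue ARN, and filter values are placeholders; the queue's access policy must already grant the S3 principal permission to send messages):

```python
import boto3

s3 = boto3.client("s3")

# Notify an SQS queue whenever a .jpg object is created under images/.
s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "images/"},
                            {"Name": "suffix", "Value": ".jpg"},
                        ]
                    }
                },
            }
        ]
    },
)
```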

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

Amazon_S3_Event_Notifications

Amazon S3 Static Website Hosting

S3 Static Website Hosting

  • Amazon S3 can be used for Static Website hosting with Client-side scripts.
  • S3 does not support server-side scripting.
  • S3, in conjunction with Route 53, supports hosting a website at the root domain which can point to the S3 website endpoint
  • S3 website endpoints do not support HTTPS or access points
  • For S3 website hosting the content should be made publicly readable which can be provided using a bucket policy or an ACL on an object.
  • Users can configure the index and error documents, as well as conditional redirection based on the object key name
  • Bucket policy applies only to objects owned by the bucket owner. If the bucket contains objects not owned by the bucket owner, then public READ permission on those objects should be granted using the object ACL.
  • Requester Pays buckets or DevPay buckets do not allow access through the website endpoint. Any request to such a bucket will receive a 403 Access Denied response
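
A minimal boto3 sketch of enabling website hosting (bucket name and document keys are placeholders; the objects must be publicly readable, e.g. via a bucket policy):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="www.companyabcd.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served for requests to the root or a folder
        "ErrorDocument": {"Key": "error.html"},     # served for 4xx-class errors
    },
)

# The site is then served from the region-specific website endpoint, e.g.
# http://www.companyabcd.com.s3-website-us-east-1.amazonaws.com
```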

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. Company ABCD is currently hosting their corporate site in an Amazon S3 bucket with Static Website Hosting enabled. Currently, when visitors go to http://www.companyabcd.com the index.html page is returned. Company ABCD now would like a new page welcome.html to be returned when a visitor enters http://www.companyabcd.com in the browser. Which of the following steps will allow Company ABCD to meet this requirement? Choose 2 answers.
    1. Upload an html page named welcome.html to their S3 bucket
    2. Create a welcome subfolder in their S3 bucket
    3. Set the Index Document property to welcome.html
    4. Move the index.html page to a welcome subfolder
    5. Set the Error Document property to welcome.html

References

Amazon_S3_Website_Hosting

AWS S3 Storage Classes

AWS S3 Storage Classes

  • AWS S3 offers a range of S3 Storage Classes to match the use case scenario and performance access requirements.
  • S3 storage classes are designed to sustain the concurrent loss of data in one or two facilities.
  • S3 storage classes allow lifecycle management for automatic transition of objects for cost savings.
  • All S3 storage classes provide the same durability and first-byte latency, and support SSL encryption of data in transit and data encryption at rest.
  • S3 also regularly verifies the integrity of the data using checksums and provides the auto-healing capability.

S3 Standard

  • STANDARD is the default storage class, if none is specified during upload
  • Low latency and high throughput performance
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.99% availability over a given year
  • Resilient against events that impact an entire Availability Zone and designed to sustain the loss of data in two facilities
  • Ideal for performance-sensitive use cases and frequently accessed data
  • S3 Standard is appropriate for a wide variety of use cases, including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.

S3 Intelligent Tiering (S3 Intelligent-Tiering)

  • S3 Intelligent Tiering storage class is designed to optimize storage costs by automatically moving data to the most cost-effective storage access tier, without performance impact or operational overhead.
  • Delivers automatic cost savings by moving data at a granular object level between two access tiers when access patterns change:
    • one tier that is optimized for frequent access and
    • another lower-cost tier that is optimized for infrequently accessed data.
  • Ideal to optimize storage costs automatically for long-lived data when access patterns are unknown or unpredictable.
  • For a small monthly monitoring and automation fee per object, S3 monitors access patterns of the objects and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier.
  • There are no separate retrieval fees when using the Intelligent Tiering storage class. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier.
  • No additional fees apply when objects are moved between access tiers
  • Suitable for objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for a minimum of 30 days)
  • Same low latency and high throughput performance of S3 Standard
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.9% availability over a given year

S3 Standard-Infrequent Access (S3 Standard-IA)

  • S3 Standard-Infrequent Access storage class is optimized for long-lived and less frequently accessed data, e.g. backups and older data where access is limited, but the use case still demands high performance.
  • Ideal for the primary or only copy of data that can’t be recreated.
  • Data is stored redundantly across multiple geographically separated AZs and is resilient to the loss of an Availability Zone.
  • offers greater availability and resiliency than the ONEZONE_IA class.
  • Objects are available for real-time access.
  • Suitable for larger objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for minimum 30 days)
  • Same low latency and high throughput performance of Standard
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.9% availability over a given year
  • S3 charges a retrieval fee for these objects, so they are most suitable for infrequently accessed data.

S3 One Zone-Infrequent Access (S3 One Zone-IA)

  • S3 One Zone-Infrequent Access storage class is designed for long-lived and infrequently accessed data, but available for millisecond access (similar to the STANDARD and STANDARD_IA storage classes).
  • Ideal when the data can be recreated if the AZ fails, and for object replicas when setting cross-region replication (CRR).
  • Objects are available for real-time access.
  • Suitable for objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for a minimum of 30 days)
  • Stores the object data in only one AZ, which makes it less expensive than Standard-Infrequent Access
  • Data is not resilient to the physical loss of the AZ resulting from disasters, such as earthquakes and floods.
  • One Zone-Infrequent Access storage class is as durable as Standard-Infrequent Access, but it is less available and less resilient.
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects in a single AZ
  • Designed for 99.5% availability over a given year
  • S3 charges a retrieval fee for these objects, so they are most suitable for infrequently accessed data.
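
The storage class is chosen per object at upload time; a minimal boto3 sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Upload an infrequently accessed backup directly into Standard-IA.
s3.put_object(
    Bucket="example-bucket",
    Key="backups/2023-01-01.tar.gz",
    Body=b"...",
    StorageClass="STANDARD_IA",  # e.g. ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
)
```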

Reduced Redundancy Storage – RRS

  • NOTE – AWS recommends not to use this storage class. The STANDARD storage class is more cost-effective now.
  • Reduced Redundancy Storage (RRS) storage class is designed for non-critical, reproducible data stored at lower levels of redundancy than the STANDARD storage class, which reduces storage costs
  • Designed for durability of 99.99% of objects
  • Designed for 99.99% availability over a given year
  • Lower level of redundancy results in less durability and availability
  • RRS stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive.
  • RRS does not replicate objects as many times as S3 standard storage and is designed to sustain the loss of data in a single facility.
  • If an RRS object is lost, S3 returns a 405 error on requests made to that object
  • S3 can send an event notification, configured on the bucket, to alert a user or start a workflow when it detects that an RRS object is lost which can be used to replace the lost object

S3 Glacier Instant Retrieval

  • Use for archiving data that is rarely accessed and requires milliseconds retrieval.
  • Storage class has a minimum storage duration period of 90 days
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.99% availability

S3 Glacier Flexible Retrieval – S3 Glacier

  • S3 GLACIER storage class is suitable for low-cost data archiving where data access is infrequent and retrieval time of minutes to hours is acceptable.
  • Storage class has a minimum storage duration period of 90 days
  • Provides configurable retrieval times, from minutes to hours
    • Expedited retrieval: 1-5 mins
    • Standard retrieval: 3-5 hours
    • Bulk retrieval: 5-12 hours
  • GLACIER storage class uses the very low-cost Glacier storage service, but the objects in this storage class are still managed through S3
  • For accessing GLACIER objects,
    • the object must be restored which can take anywhere between minutes to hours
    • objects are only available for the time period (the number of days) specified during the restoration request
    • object’s storage class remains GLACIER
    • charges are levied for both the archive (GLACIER rate) and the copy restored temporarily
  • Vault Lock feature enforces compliance via a lockable policy.
  • Offers the same durability and resiliency as the STANDARD storage class
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.99% availability
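
A minimal boto3 sketch of restoring a GLACIER object (bucket and key are placeholders); the temporary copy stays available for the requested number of days while the object's storage class remains GLACIER:

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary 7-day copy using the Standard retrieval tier (3-5 hours).
s3.restore_object(
    Bucket="example-archive-bucket",
    Key="archives/report-2020.zip",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},  # Expedited | Standard | Bulk
    },
)

# Poll until the restore completes; 'Restore' reports ongoing status and expiry.
resp = s3.head_object(Bucket="example-archive-bucket", Key="archives/report-2020.zip")
print(resp.get("Restore"))  # e.g. ongoing-request="false", expiry-date="..."
```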

S3 Glacier Deep Archive

  • Glacier Deep Archive storage class provides the lowest-cost data archiving where data access is infrequent and retrieval time of hours is acceptable.
  • Has a minimum storage duration period of 180 days and can be accessed at a default retrieval time of 12 hours.
  • Supports long-term retention and digital preservation for data that may be accessed once or twice a year
  • Designed for 99.999999999% i.e. 11 9’s Durability of objects across AZs
  • Designed for 99.9% availability over a given year
  • DEEP_ARCHIVE retrieval costs can be reduced by using bulk retrieval, which returns data within 48 hours.
  • Ideal alternative to magnetic tape libraries

S3 Analytics – S3 Storage Classes Analysis

  • S3 Analytics – Storage Class Analysis helps analyze storage access patterns to decide when to transition the right data to the right storage class.
  • S3 Analytics feature observes data access patterns to help determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
  • Storage Class Analysis can be configured to analyze all the objects in a bucket or filters to group objects.
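
A minimal boto3 sketch of configuring Storage Class Analysis (bucket names, filter prefix, and configuration ID are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Analyze access patterns of objects under logs/ and export daily CSV reports.
s3.put_bucket_analytics_configuration(
    Bucket="example-bucket",
    Id="logs-analysis",
    AnalyticsConfiguration={
        "Id": "logs-analysis",
        "Filter": {"Prefix": "logs/"},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::example-analytics-results",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)
```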

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does RRS stand for when talking about S3?
    1. Redundancy Removal System
    2. Relational Rights Storage
    3. Regional Rights Standard
    4. Reduced Redundancy Storage
  2. What is the durability of S3 RRS?
    1. 99.99%
    2. 99.95%
    3. 99.995%
    4. 99.999999999%
  3. What is the Reduced Redundancy option in Amazon S3?
    1. Less redundancy for a lower cost
    2. It doesn’t exist in Amazon S3, but in Amazon EBS.
    3. It allows you to destroy any copy of your files outside a specific jurisdiction.
    4. It doesn’t exist at all
  4. An application is generating a log file every 5 minutes. The log file is not critical but may be required only for verification in case of some major issue. The file should be accessible over the internet whenever required. Which of the below mentioned options is a best possible storage solution for it?
    1. AWS S3
    2. AWS Glacier
    3. AWS RDS
    4. AWS S3 RRS (Reduced Redundancy Storage (RRS) is an Amazon S3 storage option that enables customers to store noncritical, reproducible data at lower levels of redundancy than Amazon S3’s standard storage. RRS is designed to sustain the loss of data in a single facility.)
  5. A user has moved an object to Glacier using the life cycle rules. The user requests to restore the archive after 6 months. When the restore request is completed the user accesses that archive. Which of the below mentioned statements is not true in this condition?
    1. The archive will be available as an object for the duration specified by the user during the restoration request
    2. The restored object’s storage class will be RRS (After the object is restored, the storage class still remains GLACIER.)
    3. The user can modify the restoration period only by issuing a new restore request with the updated period
    4. The user needs to pay storage for both RRS (restored) and Glacier (Archive) Rates
  6. Your department creates regular analytics reports from your company’s log files. All log data is collected in Amazon S3 and processed by daily Amazon Elastic Map Reduce (EMR) jobs that generate daily PDF reports and aggregated tables in CSV format for an Amazon Redshift data warehouse. Your CFO requests that you optimize the cost structure for this system. Which of the following alternatives will lower costs without compromising average performance of the system or data integrity for the raw data? [PROFESSIONAL]
    1. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift. (Spot instances impacts performance)
    2. Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot instances and Reserved Instances for Amazon EMR jobs. Use Reserved instances for Amazon Redshift (A combination of Spot and Reserved Instances will guarantee performance and help reduce cost. Also, RRS would reduce cost and guarantee data integrity, which is different from data durability)
    3. Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs. Use Reserved Instances for Amazon Redshift (Spot instances impacts performance)
    4. Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot Instances for Amazon Redshift. (Spot instances impacts performance)
  7. Which of the below mentioned options can be a good use case for storing content in AWS RRS?
    1. Storing mission critical data Files
    2. Storing infrequently used log files
    3. Storing a video file which is not reproducible
    4. Storing image thumbnails
  8. A newspaper organization has an on-premises application which allows the public to search its back catalogue and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers into JPEGs (approx. 17TB) and used Optical Character Recognition (OCR) to populate a commercial search product. The hosting platform and software is now end of life and the organization wants to migrate its archive to AWS and produce a cost efficient architecture and still be designed for availability and durability. Which is the most appropriate? [PROFESSIONAL]
    1. Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 Instances and configure with auto-scaling and an Elastic Load Balancer. (RRS impacts durability and commercial search would add to cost)
    2. Model the environment using CloudFormation. Use an EC2 instance running Apache webserver and an open source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index. (Using EBS is not cost effective for storing files)
    3. Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones. (Standard S3 and Elastic Beanstalk provides availability and durability, Standard S3 and CloudSearch provides cost effective storage and search)
    4. Use a single-AZ RDS MySQL instance to store the search index and the JPEG images use an EC2 instance to serve the website and translate user queries into SQL. (RDS is not ideal and cost effective to store files, Single AZ impacts availability)
    5. Use a CloudFront download distribution to serve the JPEGs to the end users and Install the current commercial search product, along with a Java Container for the website on EC2 instances and use Route53 with DNS round-robin. (CloudFront needs a source and using commercial search product is not cost effective)
  9. A research scientist is planning for the one-time launch of an Elastic MapReduce cluster and is encouraged by her manager to minimize the costs. The cluster is designed to ingest 200TB of genomics data with a total of 100 Amazon EC2 instances and is expected to run for around four hours. The resulting data set must be stored temporarily until archived into an Amazon RDS Oracle instance. Which option will help save the most money while meeting requirements? [PROFESSIONAL]
    1. Store ingest and output files in Amazon S3. Deploy on-demand for the master and core nodes and spot for the task nodes.
    2. Optimize by deploying a combination of on-demand, RI and spot-pricing models for the master, core and task nodes. Store ingest and output files in Amazon S3 with a lifecycle policy that archives them to Amazon Glacier. (Master and Core must be RI or On Demand. Cannot be Spot)
    3. Store the ingest files in Amazon S3 RRS and store the output files in S3. Deploy Reserved Instances for the master and core nodes and on-demand for the task nodes. (Need better durability for ingest file. Spot instances can be used for task nodes for cost saving.)
    4. Deploy on-demand master, core and task nodes and store ingest and output files in Amazon S3 RRS (Input must be in S3 standard)

AWS S3 Subresources

AWS S3 Subresources

  • S3 Subresources provide support to store and manage the bucket configuration information.
  • S3 subresources only exist in the context of a specific bucket or object; they are subordinate to and always associated with some other entity, such as a bucket or an object.
  • S3 supports various options to configure a bucket, e.g. the bucket can be configured for website hosting, for managing the lifecycle of objects in the bucket, and for logging all access to the bucket.

S3 Object Lifecycle

Refer blog post @ S3 Object Lifecycle Management

Static Website Hosting

Refer blog post @ Amazon S3 Static Website Hosting

S3 Versioning

Refer blog post @ S3 Object Versioning

Policy & Access Control List (ACL)

Refer blog post @ S3 Permissions

CORS (Cross Origin Resource Sharing)

  • All browsers implement the Same-Origin policy, for security reasons, where the web page from a domain can only request resources from the same domain.
  • CORS allows client web applications loaded in one domain to request restricted resources from another domain.
  • With CORS support, S3 allows cross-origin access to S3 resources
  • CORS configuration rules identify the origins allowed to access the bucket, the operations (HTTP methods) that would be supported for each origin, and other operation-specific information.
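
This maps to a CORS configuration on the bucket holding the cross-origin assets. A minimal boto3 sketch (bucket name and origin are placeholders, echoing the web-fonts scenario in the questions below):

```python
import boto3

s3 = boto3.client("s3")

# Allow GET requests for this bucket's objects from the website's origin.
s3.put_bucket_cors(
    Bucket="abcdfonts",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["http://www.companyabcd.com"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight response
            }
        ]
    },
)
```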

S3 Access Logs

  • S3 Access Logs enable tracking access requests to an S3 bucket.
  • S3 Access logs are disabled by default.
  • Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, etc.
  • Access log information can be useful in security and access audits and also help learn about the customer base and understand the S3 bill.
  • S3 periodically collects access log records, consolidates the records in log files, and then uploads log files to a target bucket as log objects.
  • Logging can be enabled on multiple source buckets with the same target bucket which will have access logs for all those source buckets, but each log object will report access log records for a specific source bucket.
  • Source and target buckets should be in the same region.
  • Source and target buckets should be different to avoid an infinite loop of logs issue.
  • Target bucket can be encrypted using SSE-S3 default encryption. However, default encryption with AWS KMS keys (SSE-KMS) is not supported.
  • S3 Object Lock cannot be enabled on the target bucket.
  • S3 uses a special log delivery account to write server access logs.
    • AWS recommends updating the bucket policy on the target bucket to grant access to the logging service principal (logging.s3.amazonaws.com) for access log delivery.
    • Access for access log delivery can also be granted to the S3 log delivery group through the bucket ACL. Granting access to the S3 log delivery group using your bucket ACL is not recommended.
  • Access log records are delivered on a best-effort basis. The completeness and timeliness of server logging is not guaranteed i.e. log record for a particular request might be delivered long after the request was actually processed, or it might not be delivered at all.
  • S3 Access Logs can be analyzed using data analysis tools or Athena.
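
A minimal boto3 sketch of enabling server access logging (bucket names are placeholders; the target bucket must be in the same region and should already grant logging.s3.amazonaws.com access via its bucket policy):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="example-source-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",
            "TargetPrefix": "logs/example-source-bucket/",  # keeps multi-source logs separable
        }
    },
)
```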

Tagging

  • S3 provides the tagging subresource to store and manage tags on a bucket
  • Cost allocation tags can be added to the bucket to categorize and track AWS costs.
  • AWS can generate a cost allocation report with usage and costs aggregated by the tags applied to the buckets.
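
A minimal boto3 sketch of applying cost allocation tags to a bucket (keys and values are placeholders):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_tagging(
    Bucket="example-bucket",
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "alpha"},
            {"Key": "cost-center", "Value": "1234"},
        ]
    },
)
```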

Location

  • AWS region needs to be specified during bucket creation and it cannot be changed.
  • S3 stores this information in the location subresource and provides an API for retrieving this information

Event Notifications

Refer blog post @ Amazon S3 Event Notifications

Cross-Region Replication & Same-Region Replication

Refer blog post @ Amazon S3 Replication

S3 Inventory

  • S3 Inventory helps manage the storage and can be used to audit and report on the replication and encryption status of the objects for business, compliance, and regulatory needs.
  • S3 inventory provides a scheduled alternative to the S3 synchronous List API operation.
  • S3 inventory provides CSV, ORC, or Apache Parquet output files that list the objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.

Requester Pays

  • By default, buckets are owned by the AWS account that created them (the bucket owner) and the AWS account pays for storage costs, downloads, and data transfer charges associated with the bucket.
  • Using Requester Pays subresource:
    • Bucket owner specifies that the requester requesting the download will be charged for the download
    • However, the bucket owner still pays the storage costs
  • Enabling Requester Pays on a bucket
    • disables anonymous access to that bucket
    • does not support BitTorrent
    • does not support SOAP requests
    • cannot be enabled for end-user logging bucket
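
A minimal boto3 sketch of enabling Requester Pays on a bucket (bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# Requesters are then billed for requests and downloads;
# the bucket owner continues to pay for storage.
s3.put_bucket_request_payment(
    Bucket="example-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)
```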

Torrent

  • Default distribution mechanism for S3 data is via client/server download
  • Bucket owner bears the cost of Storage as well as the request and transfer charges which can increase linearly for a popular object
  • S3 also supports the BitTorrent protocol
    • BitTorrent is an open-source Internet distribution protocol
    • BitTorrent addresses this problem by recruiting the very clients that are downloading the object as distributors themselves
    • S3 bandwidth rates are inexpensive, but BitTorrent allows developers to further save on bandwidth costs for a popular piece of data by letting users download from Amazon and other users simultaneously
  • Benefit for a publisher is that for large, popular files the amount of data actually supplied by S3 can be substantially lower than what it would have been serving the same clients via client/server download
  • Any object in S3 that is publicly available and can be read anonymously can be downloaded via BitTorrent
  • Torrent file can be retrieved for any publicly available object by simply adding a “?torrent” query string parameter at the end of the REST GET request for the object
  • Generating the .torrent for an object takes time proportional to the size of that object, so it’s recommended to make the first torrent request yourself to generate the file so that subsequent requests are faster
  • Torrent is enabled only for objects that are less than 5 GB in size.
  • Torrent subresource can only be retrieved, and cannot be created, updated, or deleted

Object ACL

Refer blog post @ S3 Permissions

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. An organization’s security policy requires multiple copies of all critical data to be replicated across at least a primary and backup data center. The organization has decided to store some critical data on Amazon S3. Which option should you implement to ensure this requirement is met?
    1. Use the S3 copy API to replicate data between two S3 buckets in different regions
    2. You do not need to implement anything since S3 data is automatically replicated between regions
    3. Use the S3 copy API to replicate data between two S3 buckets in different facilities within an AWS Region
    4. You do not need to implement anything since S3 data is automatically replicated between multiple facilities within an AWS Region
  2. A customer wants to track access to their Amazon Simple Storage Service (S3) buckets and also use this information for their internal security and access audits. Which of the following will meet the Customer requirement?
    1. Enable AWS CloudTrail to audit all Amazon S3 bucket access.
    2. Enable server access logging for all required Amazon S3 buckets
    3. Enable the Requester Pays option to track access via AWS Billing
    4. Enable Amazon S3 event notifications for Put and Post.
  3. A user is enabling a static website hosting on an S3 bucket. Which of the below mentioned parameters cannot be configured by the user?
    1. Error document
    2. Conditional error on object name
    3. Index document
    4. Conditional redirection on object name
  4. Company ABCD is running their corporate website on Amazon S3 accessed from http://www.companyabcd.com. Their marketing team has published new web fonts to a separate S3 bucket accessed by the S3 endpoint: https://s3-us-west1.amazonaws.com/abcdfonts. While testing the new web fonts, Company ABCD recognized the web fonts are being blocked by the browser. What should Company ABCD do to prevent the web fonts from being blocked by the browser?
    1. Enable versioning on the abcdfonts bucket for each web font
    2. Create a policy on the abcdfonts bucket to enable access to everyone
    3. Add the Content-MD5 header to the request for webfonts in the abcdfonts bucket from the website
    4. Configure the abcdfonts bucket to allow cross-origin requests by creating a CORS configuration
  5. Company ABCD is currently hosting their corporate site in an Amazon S3 bucket with Static Website Hosting enabled. Currently, when visitors go to http://www.companyabcd.com the index.html page is returned. Company ABCD now would like a new page welcome.html to be returned when a visitor enters http://www.companyabcd.com in the browser. Which of the following steps will allow Company ABCD to meet this requirement? Choose 2 answers.
    1. Upload an html page named welcome.html to their S3 bucket
    2. Create a welcome subfolder in their S3 bucket
    3. Set the Index Document property to welcome.html
    4. Move the index.html page to a welcome subfolder
    5. Set the Error Document property to welcome.html

AWS S3 Security

AWS S3 Security

  • AWS S3 Security is a shared responsibility between AWS and the Customer
  • S3 is a fully managed service that is protected by the AWS global network security procedures
  • AWS handles basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery.
  • Security and compliance of S3 are assessed by third-party auditors as part of multiple AWS compliance programs including SOC, PCI DSS, HIPAA, etc.
  • S3 provides several other features to handle security, which are the customers’ responsibility.
  • S3 Encryption supports both data at rest and data in transit encryption.
    • Data in transit encryption can be provided by enabling communication via SSL or using client-side encryption
    • Data at rest encryption can be provided using Server Side or Client Side encryption
  • S3 permissions can be handled using IAM policies, bucket policies, and Access Control Lists (ACLs)
  • S3 Object Lock helps to store objects using a WORM model and can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
  • S3 Access Points simplify data access for any AWS service or customer application that stores data in S3.
  • S3 Versioning with MFA Delete can be enabled on a bucket to ensure that data in the bucket cannot be accidentally overwritten or deleted.
  • S3 Block Public Access provides controls across an entire AWS Account or at the individual S3 bucket level to ensure that objects never have public access, now and in the future.
  • S3 Access Analyzer monitors the access policies, ensuring that the policies provide only the intended access to your S3 resources.

S3 Encryption

  • S3 allows the protection of data in transit by enabling communication via SSL or using client-side encryption
  • S3 provides data-at-rest encryption using
    • Server-Side Encryption: S3 handles the encryption
      • SSE-S3
        • S3 handles the encryption and decryption using S3 managed keys
      • SSE-KMS
        • S3 handles the encryption and decryption using keys managed through AWS KMS.
      • SSE-C
        • S3 handles the encryption and decryption using keys managed and provided by the Customer.
    • Client Side Encryption: Customer handles the encryption
      • CSE-CMK
        • Customer handles the encryption and decryption using keys managed through AWS KMS.
      • Client-side Master Key
        • Customer handles the encryption and decryption using keys managed by them.

S3 Permissions

Refer blog post @ S3 Permissions

S3 Object Lock

  • S3 Object Lock helps to store objects using a write-once-read-many (WORM) model.
  • can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
  • can help meet regulatory requirements that require WORM storage or add an extra layer of protection against object changes and deletion.
  • can be enabled only for new buckets and works only in versioned buckets.
  • provides two retention modes that apply different levels of protection to the objects
    • Governance mode
      • Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
      • Objects can be protected from being deleted by most users, but some users can be granted permission to alter the retention settings or delete the object if necessary.
      • Can be used to test retention-period settings before creating a compliance-mode retention period.
    • Compliance mode
      • A protected object version can’t be overwritten or deleted by any user, including the root user in the AWS account.
      • Object retention mode can’t be changed, and its retention period can’t be shortened.
      • Object versions can’t be overwritten or deleted for the duration of the retention period.

S3 Access Points

  • S3 access points simplify data access for any AWS service or customer application that stores data in S3.
  • Access points are named network endpoints that are attached to buckets and can be used to perform S3 object operations, such as GetObject and PutObject.
  • Each access point has distinct permissions and network controls that S3 applies for any request that is made through that access point.
  • Each access point enforces a customized access point policy that works in conjunction with the bucket policy, attached to the underlying bucket.
  • An access point can be configured to accept requests only from a VPC to restrict S3 data access to a private network.
  • Custom block public access settings can be configured for each access point.
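
A minimal sketch of creating a VPC-restricted access point via the s3control API (account ID, names, and VPC ID are placeholders):

```python
import boto3

s3control = boto3.client("s3control")

s3control.create_access_point(
    AccountId="123456789012",
    Name="analytics-ap",
    Bucket="example-bucket",
    # Only requests originating from this VPC are accepted through the access point.
    VpcConfiguration={"VpcId": "vpc-0abc1234"},
)
```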

S3 VPC Gateway Endpoint

  • A VPC endpoint enables connections between a VPC and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
  • VPC is not exposed to the public internet.
  • Gateway Endpoint is a gateway that is a target for a route in your route table, used for traffic destined to either S3 or DynamoDB.

S3 Block Public Access

  • S3 Block Public Access provides controls across an entire AWS Account or at the individual S3 bucket level to ensure that objects never have public access, now and in the future.
  • S3 Block Public Access provides settings for access points, buckets, and accounts to help manage public access to S3 resources.
  • By default, new buckets, access points, and objects don’t allow public access. However, users can modify bucket policies, access point policies, or object permissions to allow public access.
  • S3 Block Public Access settings override these policies and permissions so that public access to these resources can be limited.
  • S3 Block Public Access allows account administrators and bucket owners to easily set up centralized controls to limit public access to their S3 resources that are enforced regardless of how the resources are created.
  • S3 doesn’t support block public access settings on a per-object basis.
  • S3 Block Public Access settings when applied to an account apply to all AWS Regions globally.
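
A minimal boto3 sketch of applying all four Block Public Access settings at the bucket level (bucket name is a placeholder; the account-wide variant uses the s3control client with an AccountId instead):

```python
import boto3

s3 = boto3.client("s3")

s3.put_public_access_block(
    Bucket="example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # reject public bucket policies
        "RestrictPublicBuckets": True,  # restrict access granted by any public policy
    },
)
```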

S3 Access Analyzer

  • S3 Access Analyzer monitors the access policies, ensuring that the policies provide only the intended access to your S3 resources.
  • S3 Access Analyzer evaluates the bucket access policies and enables you to discover and swiftly remediate buckets with potentially unintended access.

S3 Security Best Practices

S3 Preventative Security Best Practices

  • Ensure S3 buckets use the correct policies and are not publicly accessible
    • Use S3 block public access
    • Identify Bucket policies and ACLs that allow public access
    • Use AWS Trusted Advisor to inspect the S3 implementation.
  • Implement least privilege access
  • Use IAM roles for applications and AWS services that require S3 access
  • Enable Multi-factor authentication (MFA) Delete to help prevent accidental bucket deletions
  • Consider Data at Rest Encryption
  • Enforce Data in Transit Encryption
  • Consider S3 Object Lock to store objects using a “Write Once Read Many” (WORM) model.
  • Enable versioning to easily recover from both unintended user actions and application failures.
  • Consider S3 Cross-Region replication
  • Consider VPC endpoints for S3 access to provide private S3 connectivity and help prevent traffic from potentially traversing the open internet.

S3 Monitoring and Auditing Best Practices

  • Identify and Audit all S3 buckets to have visibility of all the S3 resources to assess their security posture and take action on potential areas of weakness.
  • Implement monitoring using AWS monitoring tools
  • Enable S3 server access logging, which provides detailed records of the requests that are made to a bucket useful for security and access audits
  • Use AWS CloudTrail, which provides a record of actions taken by a user, a role, or an AWS service in S3.
  • Enable AWS Config, which enables you to assess, audit, and evaluate the configurations of the AWS resources
  • Consider using Amazon Macie with S3 to automatically discover, classify, and protect sensitive data in AWS.
  • Monitor AWS security advisories to regularly check security advisories posted in Trusted Advisor for the AWS account.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.

References

AWS_S3_Security

AWS S3 Object Lock

AWS S3 Object Lock

  • S3 Object Lock helps to store objects using a write-once-read-many (WORM) model.
  • can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
  • can help meet regulatory requirements that require WORM storage or add an extra layer of protection against object changes and deletion.
  • can be enabled only for new buckets. For an existing bucket, you need to contact AWS Support.
  • works only in versioned buckets.
  • Once Object Lock is enabled
    • Object Lock can’t be disabled
    • automatically enables versioning for the bucket
    • versioning can’t be suspended for the bucket.
  • provides two ways to manage object retention.
    • Retention period
      • protects an object version for a fixed amount of time, during which an object remains locked.
      • During this period, the object is WORM-protected and can’t be overwritten or deleted.
      • can be applied on an object version either explicitly or through a bucket default setting.
      • S3 stores a timestamp in the object version’s metadata to indicate when the retention period expires. After the retention period expires, the object version can be overwritten or deleted unless you also placed a legal hold on the object version.
    • Legal hold
      • protects an object version, as a retention period, but it has no expiration date.
      • remains in place until you explicitly remove it.
      • can be freely placed and removed by any user who has the s3:PutObjectLegalHold permission.
      • are independent of retention periods.
    • Retention periods and legal holds apply to individual object versions.
    • Placing a retention period or legal hold on an object protects only the version specified in the request. It doesn’t prevent new versions of the object from being created.
    • An object version can have both a retention period and a legal hold, one but not the other, or neither.
  • provides two retention modes that apply different levels of protection to the objects
    • Governance mode
    • Compliance mode
  • S3 buckets with S3 Object Lock can’t be used as destination buckets for server access logs.
  • has been assessed by Cohasset Associates for use in environments that are subject to SEC 17a-4, CFTC, and FINRA regulations.

S3 Object Lock – Retention Modes

Governance mode

  • Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
  • Objects can be protected from being deleted by most users, but some users can be granted permission to alter the retention settings or delete the object if necessary.
  • Can be used to test retention-period settings before creating a compliance-mode retention period.
  • To override or remove governance-mode retention settings, a user must have the s3:BypassGovernanceRetention permission and must explicitly include x-amz-bypass-governance-retention:true as a request header.

Compliance mode

  • A protected object version can’t be overwritten or deleted by any user, including the root user in the AWS account.
  • Object retention mode can’t be changed, and its retention period can’t be shortened.
  • Object versions can’t be overwritten or deleted for the duration of the retention period.
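
A minimal boto3 sketch of applying a retention period and a legal hold to an object version (bucket, key, and date are placeholders; the bucket must have been created with Object Lock enabled):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

# Lock this object version in governance mode until the given date.
s3.put_object_retention(
    Bucket="example-locked-bucket",
    Key="records/2023-ledger.csv",
    Retention={
        "Mode": "GOVERNANCE",  # or "COMPLIANCE"
        "RetainUntilDate": datetime(2033, 1, 1, tzinfo=timezone.utc),
    },
)

# A legal hold is independent of the retention period and lasts until removed.
s3.put_object_legal_hold(
    Bucket="example-locked-bucket",
    Key="records/2023-ledger.csv",
    LegalHold={"Status": "ON"},
)
```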

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A company needs to store its accounting records in Amazon S3. No one at the company; including administrative users and root users, should be able to delete the records for an entire 10-year period. The records must be stored with maximum resiliency. Which solution will meet these requirements?
    1. Use an access control policy to deny deletion of the records for a period of 10 years.
    2. Use an IAM policy to deny deletion of the records. After 10 years, change the IAM policy to allow deletion.
    3. Use S3 Object Lock in compliance mode for a period of 10 years.
    4. Use S3 Object Lock in governance mode for a period of 10 years.

References

Amazon_S3_Object_Lock

AWS S3 Encryption

AWS S3 Encryption

  • AWS S3 Encryption supports both data at rest and data in transit encryption.
  • Data in-transit
    • S3 allows protection of data in transit by enabling communication via SSL or using client-side encryption
  • Data at Rest
    • Server-Side Encryption
      • S3 encrypts the object before saving it on disks in its data centers and decrypts it when the objects are downloaded
    • Client-Side Encryption
      • data is encrypted at the client-side and uploaded to S3.
      • the encryption process, the encryption keys, and related tools are managed by the user.

S3 Server-Side Encryption

  • Server-side encryption is about data encryption at rest
  • Server-side encryption encrypts only the object data; object metadata is not encrypted.
  • S3 handles the encryption (as it writes to disks) and decryption (when objects are accessed) of the data objects
  • There is no difference in the access mechanism for both encrypted and unencrypted objects and is handled transparently by S3

Server-Side Encryption with S3-Managed Keys – SSE-S3

  • Encryption keys are handled and managed by AWS
  • Each object is encrypted with a unique data key employing strong multi-factor encryption.
  • SSE-S3 encrypts the data key with a master key that is regularly rotated.
  • S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt the data.
  • Whether objects are encrypted with SSE-S3 can't be enforced when they are uploaded using pre-signed URLs, because server-side encryption can be specified only through the AWS Management Console or through an HTTP request header.
  • Request must set the header x-amz-server-side-encryption to AES256
  • For enforcing server-side encryption for all of the objects that are stored in a bucket, use a bucket policy that denies permissions to upload an object unless the request includes x-amz-server-side-encryption header to request server-side encryption.
SSE-S3 : Server Side Encryption using S3 Managed Keys
Source: Oreilly
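A minimal boto3 sketch of both sides, assuming a placeholder bucket name: an upload that requests SSE-S3, and a bucket policy that denies uploads without the header:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Upload with SSE-S3; boto3 sends x-amz-server-side-encryption: AES256
    s3.put_object(Bucket="my-bucket", Key="doc.txt", Body=b"hello",
                  ServerSideEncryption="AES256")

    # Deny any PutObject request that doesn't ask for SSE-S3
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {"StringNotEquals": {
                "s3:x-amz-server-side-encryption": "AES256"}},
        }],
    }
    s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))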

Server-Side Encryption with AWS KMS-Managed Keys – SSE-KMS

Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)

  • SSE-KMS is similar to SSE-S3, but it uses AWS Key Management Services (KMS) which provides additional benefits along with additional charges
    • KMS is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud.
    • KMS uses customer master keys (CMKs) to encrypt the S3 objects.
    • The master key is never made available.
    • KMS enables you to centrally create encryption keys, and define the policies that control how keys can be used.
    • Allows audit of keys used to prove they are being used correctly, by inspecting logs in AWS CloudTrail.
    • Allows keys to be temporarily disabled and re-enabled.
    • Allows keys to be rotated regularly.
    • Security controls in AWS KMS can help meet encryption-related compliance requirements.
  • SSE-KMS enables separate permissions for the use of an envelope key (that is, a key that protects the data’s encryption key) that provides added protection against unauthorized access to the objects in S3.
  • SSE-KMS provides the option to create and manage encryption keys yourself, or use a default customer master key (CMK) that is unique to you, the service you’re using, and the region you’re working in.
  • Creating and Managing CMK gives more flexibility, including the ability to create, rotate, disable, and define access controls, and audit the encryption keys used to protect the data.
  • Data keys used to encrypt the data are also encrypted and stored alongside the data they protect and are unique to each object.
  • Process flow
    • An application or AWS service client requests an encryption key to encrypt data and passes a reference to a master key under the account.
    • Client requests are authenticated based on whether they have access to use the master key.
    • A new data encryption key is created, and a copy of it is encrypted under the master key.
    • Both the data key and encrypted data key are returned to the client.
    • Data key is used to encrypt customer data and then deleted as soon as is practical.
    • Encrypted data key is stored for later use and sent back to AWS KMS when the source data needs to be decrypted.
  • S3 supports only symmetric KMS keys, not asymmetric keys.
  • Request must set the header x-amz-server-side-encryption to aws:kms
SSE-KMS : Server Side Encryption using AWS KMS managed keys
Source: Oreilly
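A short boto3 sketch of an SSE-KMS upload; the bucket name and KMS key ID below are placeholders, and omitting SSEKMSKeyId falls back to the default aws/s3 key for the account:

    import boto3

    s3 = boto3.client("s3")

    s3.put_object(
        Bucket="my-bucket",
        Key="doc.txt",
        Body=b"hello",
        ServerSideEncryption="aws:kms",  # x-amz-server-side-encryption: aws:kms
        SSEKMSKeyId="1234abcd-12ab-34cd-56ef-1234567890ab",  # placeholder key ID
    )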

Server-Side Encryption with Customer-Provided Keys – SSE-C

AWS S3 Server Side Encryption using Customer Provided Keys SSE-C

  • Encryption keys can be managed and provided by the Customer and S3 manages the encryption, as it writes to disks, and decryption, when you access the objects
  • When you upload an object, the encryption key is provided as a part of the request and S3 uses that encryption key to apply AES-256 encryption to the data and removes the encryption key from memory.
  • When you download an object, the same encryption key must be provided as a part of the request. S3 first verifies the encryption key, and if it matches, the object is decrypted before being returned to you.
  • As each object and each object’s version can be encrypted with a different key, you are responsible for maintaining the mapping between the object and the encryption key used.
  • SSE-C requests must be done through HTTPS and S3 will reject any requests made over HTTP when using SSE-C.
  • For security considerations, AWS recommends considering any key sent erroneously using HTTP to be compromised and it should be discarded or rotated.
  • S3 does not store the encryption key provided. Instead, a randomly salted HMAC value of the encryption key is stored which can be used to validate future requests. The salted HMAC value cannot be used to decrypt the contents of the encrypted object or to derive the value of the encryption key. That means, if you lose the encryption key, you lose the object.
SSE-C : Server-Side Encryption with Customer-Provided Keys
Source: Oreilly
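A minimal SSE-C round trip with boto3 (placeholder bucket and key names; boto3 base64-encodes the key and computes the MD5 header for you). The same key must accompany every read:

    import os
    import boto3

    s3 = boto3.client("s3")
    key = os.urandom(32)  # 256-bit customer key; losing it loses the object

    s3.put_object(Bucket="my-bucket", Key="doc.txt", Body=b"secret",
                  SSECustomerAlgorithm="AES256", SSECustomerKey=key)

    obj = s3.get_object(Bucket="my-bucket", Key="doc.txt",
                        SSECustomerAlgorithm="AES256", SSECustomerKey=key)
    print(obj["Body"].read())  # b'secret'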

Client-Side Encryption

Client-side encryption refers to encrypting data before sending it to S3 and decrypting the data after downloading it

AWS KMS-managed Customer Master Key – CMK

  • Customer can maintain the encryption CMK with AWS KMS and can provide the CMK id to the client to encrypt the data
  • Uploading Object
    • AWS S3 encryption client first sends a request to AWS KMS for the key to encrypt the object data.
    • AWS KMS returns a randomly generated data encryption key in two versions: a plaintext version for encrypting the data and a cipher blob to be uploaded with the object as object metadata
    • Client obtains a unique data encryption key for each object it uploads.
    • AWS S3 encryption client uploads the encrypted data and the cipher blob with object metadata
  • Download Object
    • AWS Client first downloads the encrypted object along with the cipher blob version of the data encryption key stored as object metadata
    • AWS Client then sends the cipher blob to AWS KMS to get the plaintext version of the data encryption key, so that it can decrypt the object data.
Client Side Encryption - Customer Master Keys CSE-CMK
Source: Oreilly
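This flow is what the SDK encryption clients implement; as an illustrative stand-in (not the official client), here is a sketch using boto3's KMS API plus the cryptography package, with a placeholder key alias and a hypothetical metadata field for the cipher blob:

    import os
    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    kms = boto3.client("kms")
    s3 = boto3.client("s3")

    # 1. Request a data key; KMS returns plaintext + encrypted (cipher blob) copies
    resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")

    # 2. Encrypt locally, upload ciphertext with the cipher blob as metadata
    nonce = os.urandom(12)
    ciphertext = AESGCM(resp["Plaintext"]).encrypt(nonce, b"sensitive data", None)
    s3.put_object(Bucket="my-bucket", Key="doc.enc", Body=nonce + ciphertext,
                  Metadata={"enc-key": resp["CiphertextBlob"].hex()})

    # 3. Download, send the cipher blob back to KMS, decrypt locally
    obj = s3.get_object(Bucket="my-bucket", Key="doc.enc")
    data_key = kms.decrypt(
        CiphertextBlob=bytes.fromhex(obj["Metadata"]["enc-key"]))["Plaintext"]
    body = obj["Body"].read()
    print(AESGCM(data_key).decrypt(body[:12], body[12:], None))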

Client-Side master key

  • Encryption master keys are completely maintained at the Client-side
  • Uploading Object
    • S3 encryption client (e.g. AmazonS3EncryptionClient in the AWS SDK for Java) locally generates a random one-time-use symmetric key (also known as a data encryption key or data key).
    • Client encrypts the data encryption key using the customer-provided master key
    • Client uses this data encryption key to encrypt the data of a single S3 object (for each object, the client generates a separate data key).
    • Client then uploads the encrypted data to S3 and also saves the encrypted data key and its material description as object metadata (x-amz-meta-x-amz-key) in S3 by default
  • Downloading Object
    • Client first downloads the encrypted object from S3 along with the object metadata.
    • Using the material description in the metadata, the client first determines which master key to use to decrypt the encrypted data key.
    • Using that master key, the client decrypts the data key and uses it to decrypt the object
  • Client-side master keys and your unencrypted data are never sent to AWS
  • If the master key is lost the data cannot be decrypted
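The same envelope pattern with a purely local master key might look like the sketch below (again an illustrative stand-in for the SDK client, with placeholder names). Note the master key never leaves the client:

    import os
    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    s3 = boto3.client("s3")
    master_key = os.urandom(32)  # client-side only; if lost, data is unrecoverable

    # One-time data key per object, wrapped (encrypted) under the master key
    data_key = os.urandom(32)
    wrap_nonce = os.urandom(12)
    wrapped_key = AESGCM(master_key).encrypt(wrap_nonce, data_key, None)

    # Encrypt the object data locally and upload ciphertext + wrapped key
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, b"sensitive data", None)
    s3.put_object(Bucket="my-bucket", Key="doc.enc", Body=nonce + ciphertext,
                  Metadata={"wrapped-key": (wrap_nonce + wrapped_key).hex()})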

Enforcing S3 Encryption

  • S3 Encryption in Transit
    • S3 Bucket Policy can be used to enforce SSL communication with S3 using a Deny effect with the condition aws:SecureTransport set to false.
  • S3 Default Encryption
    • helps set the default encryption behavior for an S3 bucket so that all new objects are encrypted when they are stored in the bucket.
    • Objects are encrypted using SSE with either S3-managed keys (SSE-S3) or AWS KMS keys stored in AWS KMS (SSE-KMS).
  • S3 Bucket Policy
    • can be applied that denies permissions to upload an object unless the request includes x-amz-server-side-encryption header to request server-side encryption.
    • is not required, if S3 default encryption is enabled
    • are evaluated before the default encryption.

S3 Bucket Policy Enforce Encryption
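Sketches of both controls with boto3, assuming a placeholder bucket name: a policy denying non-SSL requests, and default encryption so new objects are encrypted even when the request omits the header:

    import json
    import boto3

    s3 = boto3.client("s3")

    # Deny any request made without SSL/TLS
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))

    # Default encryption: new objects are stored encrypted with SSE-KMS
    s3.put_bucket_encryption(
        Bucket="my-bucket",
        ServerSideEncryptionConfiguration={"Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
        ]},
    )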

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might become outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers
    1. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys
    2. Use Amazon S3 server-side encryption with customer-provided keys
    3. Use Amazon S3 server-side encryption with EC2 key pair.
    4. Use Amazon S3 bucket policies to restrict access to the data at rest.
    5. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key
    6. Use SSL to encrypt the data while in transit to Amazon S3.
  2. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at Rest. If the user is supplying his own keys for encryption (SSE-C) which of the below mentioned statements is true?
    1. The user should use the same encryption key for all versions of the same object
    2. It is possible to have different encryption keys for different versions of the same object
    3. AWS S3 does not allow the user to upload his own keys for server side encryption
    4. The SSE-C does not work when versioning is enabled
  3. A storage admin wants to encrypt all the objects stored in S3 using server side encryption. The user does not want to use the AES 256 encryption key provided by S3. How can the user achieve this?
    1. The admin should upload his secret key to the AWS console and let S3 decrypt the objects
    2. The admin should use CLI or API to upload the encryption key to the S3 bucket. When making a call to the S3 API mention the encryption key URL in each request
    3. S3 does not support client supplied encryption keys for server side encryption
    4. The admin should send the keys and encryption algorithm with each API call
  4. A user has enabled versioning on an S3 bucket. The user is using server side encryption for data at rest. If the user is supplying his own keys for encryption (SSE-C), what is recommended to the user for the purpose of security?
    1. User should not use his own security key as it is not secure
    2. Configure S3 to rotate the user’s encryption key at regular intervals
    3. Configure S3 to store the user’s keys securely with SSL
    4. Keep rotating the encryption key manually at the client side
  5. A system admin is planning to encrypt all objects being uploaded to S3 from an application. The system admin does not want to implement his own encryption algorithm; instead he is planning to use server side encryption by supplying his own key (SSE-C). Which parameter is not required while making a call for SSE-C?
    1. x-amz-server-side-encryption-customer-key-AES-256
    2. x-amz-server-side-encryption-customer-key
    3. x-amz-server-side-encryption-customer-algorithm
    4. x-amz-server-side-encryption-customer-key-MD5
  6. You are designing a personal document-archiving solution for your global enterprise with thousands of employees. Each employee has potentially gigabytes of data to be backed up in this archiving solution. The solution will be exposed to the employees as an application, where they can just drag and drop their files to the archiving system. Employees can retrieve their archives through a web interface. The corporate network has high bandwidth AWS Direct Connect connectivity to AWS. You have regulatory requirements that all data needs to be encrypted before being uploaded to the cloud. How do you implement this in a highly available and cost-efficient way?
    1. Manage encryption keys on-premises in an encrypted relational database. Set up an on-premises server with sufficient storage to temporarily store files and then upload them to Amazon S3, providing a client-side master key. (Storing temporarily increases cost and is not a high-availability option)
    2. Manage encryption keys in a Hardware Security Module (HSM) appliance on-premises; use a server with sufficient storage to temporarily store, encrypt, and upload files directly into Amazon Glacier. (Not cost-effective)
    3. Manage encryption keys in AWS Key Management Service (KMS), upload to Amazon Simple Storage Service (S3) with client-side encryption using a KMS customer master key ID, and configure Amazon S3 lifecycle policies to store each object using the Amazon Glacier storage tier. (With CSE-KMS the encryption happens at the client side before the object is uploaded to S3, and KMS is cost-effective as well)
    4. Manage encryption keys in an AWS CloudHSM appliance. Encrypt files prior to uploading on the employee desktop and then upload directly into Amazon Glacier. (Not cost-effective)
  7. A user has enabled server side encryption with S3. The user downloads the encrypted object from S3. How can the user decrypt it?
    1. S3 does not support server side encryption
    2. S3 provides a server side key to decrypt the object
    3. The user needs to decrypt the object using their own private key
    4. S3 manages encryption and decryption automatically
  8. When uploading an object, what request header can be explicitly specified in a request to Amazon S3 to encrypt object data when saved on the server side?
    1. x-amz-storage-class
    2. Content-MD5
    3. x-amz-security-token
    4. x-amz-server-side-encryption
  9. A company must ensure that any objects uploaded to an S3 bucket are encrypted. Which of the following actions should the SysOps Administrator take to meet this requirement? (Select TWO.)
    1. Implement AWS Shield to protect against unencrypted objects stored in S3 buckets.
    2. Implement Object access control list (ACL) to deny unencrypted objects from being uploaded to the S3 bucket.
    3. Implement Amazon S3 default encryption to make sure that any object being uploaded is encrypted before it is stored.
    4. Implement Amazon Inspector to inspect objects uploaded to the S3 bucket to make sure that they are encrypted.
    5. Implement S3 bucket policies to deny unencrypted objects from being uploaded to the buckets.

References

AWS_S3_Encryption

AWS S3 Versioning

S3 Versioning

  • S3 Versioning helps to keep multiple variants of an object in the same bucket and can be used to preserve, retrieve, and restore every version of every object stored in the S3 bucket.
  • S3 Object Versioning can be used to protect from unintended overwrites and accidental deletions
  • As Versioning maintains multiple copies of the same object as a whole, charges accrue for all versions, e.g. a 1GB file with 5 versions with minor differences would consume 5GB of S3 storage space, and you would be charged for all of them.
  • Buckets can be in one of three states
    • Unversioned (the default)
    • Versioning-enabled
    • Versioning-suspended
  • S3 Object Versioning is not enabled by default and has to be explicitly enabled for each bucket.
  • Versioning, once enabled, cannot be disabled; it can only be suspended
  • Versioning enabled on a bucket applies to all the objects within the bucket
  • Permissions are set at the version level. Each version has its own object owner; an AWS account that creates the object version is the owner. So, you can set different permissions for different versions of the same object.
  • Irrespective of the Versioning, each object in the bucket has a version.
    • For Non Versioned bucket, the version ID for each object is null
    • For Versioned buckets, a unique version ID is assigned to each object
  • With Versioning, version ID forms a key element to define the uniqueness of an object within a bucket along with the bucket name and object key
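A small boto3 sketch (placeholder bucket) that enables versioning, overwrites a key, and then lists and fetches specific versions:

    import boto3

    s3 = boto3.client("s3")

    # Enable versioning; it can later only be suspended, never disabled
    s3.put_bucket_versioning(Bucket="my-bucket",
                             VersioningConfiguration={"Status": "Enabled"})

    # Each overwrite of the same key creates a new version
    for body in (b"v1", b"v2"):
        s3.put_object(Bucket="my-bucket", Key="report.txt", Body=body)

    versions = s3.list_object_versions(Bucket="my-bucket",
                                       Prefix="report.txt")["Versions"]
    for v in versions:  # newest first
        print(v["VersionId"], v["IsLatest"])

    # A plain GET returns the current version; pass VersionId for a non-current one
    old = s3.get_object(Bucket="my-bucket", Key="report.txt",
                        VersionId=versions[-1]["VersionId"])
    print(old["Body"].read())  # b'v1'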

Object Retrieval

  • For Non Versioned bucket
    • An Object retrieval always returns the only object available.
  • For Versioned bucket
    • An object retrieval returns the current (latest) version of the object.
    • Non-Current objects can be retrieved by specifying the version ID.

Object Addition

  • For Non Versioned bucket
    • If an object with the same key is uploaded again it overwrites the object
  • For Versioned bucket
    • If an object with the same key is uploaded, the newly uploaded object becomes the current version and the previous object becomes the non-current version.
    • A non-current version of an object can be retrieved and restored, hence protecting against accidental overwrites

Object Deletion

  • For Non Versioned bucket
    • An object is permanently deleted and cannot be recovered
  • For the Versioned bucket,
    • All versions remain in the bucket and S3 inserts a delete marker which becomes the current version
    • A non-current version of the object can be retrieved and restored, hence protecting against accidental deletions
    • If an object with a specific version ID is deleted, a permanent deletion happens and the object cannot be recovered

Delete marker

  • Delete Marker object does not have any data or ACL associated with it, just the key and the version ID
  • An object retrieval on a bucket with a delete marker as the Current version would return a 404
  • Only a DELETE operation is allowed on the Delete Marker object
  • If the Delete marker object is deleted by specifying its version ID, the previous non-current version object becomes the current version object
  • If a DELETE request is fired on an object whose current version is a Delete Marker, the existing Delete Marker is not deleted; another Delete Marker is added on top

S3 Versioning - Delete Operation
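The delete-marker behavior can be seen in a short boto3 sketch (placeholder bucket and key):

    import boto3

    s3 = boto3.client("s3")

    # A plain DELETE on a versioned bucket only inserts a delete marker
    resp = s3.delete_object(Bucket="my-bucket", Key="report.txt")
    print(resp.get("DeleteMarker"), resp["VersionId"])  # True, marker's version ID

    # GET without a version ID now returns a 404; deleting the marker by its
    # version ID makes the previous non-current version current again ("undelete")
    s3.delete_object(Bucket="my-bucket", Key="report.txt",
                     VersionId=resp["VersionId"])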

Restoring Previous Versions

  • Copy a previous version of the object into the same bucket. The copied object becomes the current version of that object and all object versions are preserved – Recommended as it keeps all the versions.
  • Permanently delete the current version of the object. When you delete the current object version, you, in effect, turn the previous version into the current version of that object.
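A sketch of the recommended copy-based restore with boto3 (placeholder names; the version ID would come from list_object_versions):

    import boto3

    s3 = boto3.client("s3")

    # Copying a previous version over the same key makes it the new current
    # version while preserving every existing version
    s3.copy_object(
        Bucket="my-bucket",
        Key="report.txt",
        CopySource={"Bucket": "my-bucket", "Key": "report.txt",
                    "VersionId": "PREVIOUS-VERSION-ID"},  # placeholder
    )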

Versioning Suspended Bucket

  • Versioning can be suspended to stop accruing new versions of the same object in a bucket.
  • Existing objects in the bucket do not change and only future requests behavior changes.
  • An object with version ID null is added for each new object addition.
  • For each object addition with the same key name, the object with the version ID null is overwritten.
  • An object retrieval request will always return the current version of the object.
  • A DELETE request on the bucket permanently deletes the object with version ID null, if one exists, and inserts a Delete Marker (itself with version ID null)
  • If the bucket has no object with version ID null, a DELETE request removes nothing, but a Delete Marker is still inserted
  • A DELETE request can still be fired with a specific version ID for any previous object with version IDs stored

MFA Delete

  • Additional security can be enabled by configuring a bucket to enable MFA (Multi-Factor Authentication) for the deletion of objects.
  • When MFA Delete is enabled, additional authentication is required for the following operations:
    • Changing the versioning state of the bucket
    • Permanently deleting an object version
  • MFA Delete can be enabled on a bucket to ensure that data in the bucket cannot be accidentally deleted
  • While the bucket owner, the AWS account that created the bucket (root account), and all authorized IAM users can enable versioning, only the bucket owner (root account) can enable MFA Delete.
  • MFA Delete, however, does not require MFA for a simple delete request (which only adds a delete marker), nor does it by itself restore deleted objects; MFA is needed only for permanently deleting an object version or changing the bucket's versioning state.
  • MFA Delete cannot be enabled using the AWS Management Console. You must use the AWS Command Line Interface (AWS CLI) or the API.
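Since the console can't enable it, a CLI or SDK call is required; a boto3 sketch with placeholder bucket, MFA device serial, and token code (the MFA value is the device serial and the current code separated by a space):

    import boto3

    s3 = boto3.client("s3")

    # Must be run with the bucket owner's (root account's) credentials
    s3.put_bucket_versioning(
        Bucket="my-bucket",
        MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
        VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    )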

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might become outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Which set of Amazon S3 features helps to prevent and recover from accidental data loss?
    1. Object lifecycle and service access logging
    2. Object versioning and Multi-factor authentication
    3. Access controls and server-side encryption
    4. Website hosting and Amazon S3 policies
  2. You use S3 to store critical data for your company. Several users within your group currently have full permissions to your S3 buckets. You need to come up with a solution that does not impact your users and also protects against the accidental deletion of objects. Which two options will address this issue? Choose 2 answers
    1. Enable versioning on your S3 Buckets
    2. Configure your S3 Buckets with MFA delete
    3. Create a Bucket policy and only allow read only permissions to all users at the bucket level
    4. Enable object life cycle policies and configure the data older than 3 months to be archived in Glacier
  3. To protect S3 data from both accidental deletion and accidental overwriting, you should
    1. enable S3 versioning on the bucket
    2. access S3 data using only signed URLs
    3. disable S3 delete using an IAM bucket policy
    4. enable S3 Reduced Redundancy Storage
    5. enable Multi-Factor Authentication (MFA) protected access
  4. A user has not enabled versioning on an S3 bucket. What will be the version ID of the object inside that bucket?
    1. 0
    2. There will be no version attached
    3. Null
    4. Blank
  5. A user is trying to find the state of an S3 bucket with respect to versioning. Which of the below mentioned states will AWS not return when queried?
    1. versioning-enabled
    2. versioning-suspended
    3. unversioned
    4. versioned

References

AWS S3 Versioning

AWS CloudFront with S3

CloudFront S3 Origin Access Identity - OAI

AWS CloudFront with S3

  • CloudFront can be used to distribute the content from an S3 bucket.
  • For an RTMP distribution, the S3 bucket is the only supported origin, and custom origins cannot be used
  • Using CloudFront over S3 has the following benefits
    • can be more cost-effective if the objects are frequently accessed, as at higher usage the price for CloudFront data transfer is lower than the price for S3 data transfer.
    • downloads are faster with CloudFront than with S3 alone because the objects are stored closer to the users
  • CloudFront provides two ways to send authenticated requests to an S3 origin: Origin Access Control (OAC) and Origin Access Identity (OAI).
  • When using S3 as the origin for a distribution and the bucket is moved to a different region, CloudFront can take up to an hour to update its records to include the change of region when both of the following are true:
    • Origin Access Control (OAC) and Origin Access Identity (OAI) are used to restrict access to the bucket.
    • Bucket is moved to an S3 region that requires Signature Version 4 for authentication

Origin Access Identity – OAI

CloudFront S3 Origin Access Identity - OAI

  • Without OAI, S3 origin objects must be granted public read permissions, and hence the objects are accessible from both S3 as well as CloudFront.
  • Even though CloudFront does not expose the underlying S3 URL, it can be known to the user if shared directly or used by applications.
  • For using CloudFront signed URLs or signed cookies to provide access to the objects, it would be necessary to prevent users from having direct access to the S3 objects.
  • Users accessing S3 objects directly would
    • bypass the controls provided by CloudFront signed URLs or signed cookies, e.g. control over the date and time after which a user can no longer access the content, and over the IP addresses that can be used to access it
    • make CloudFront access logs less useful because they're incomplete.
  • Origin Access Identity (OAI) can be used to prevent users from directly accessing objects from S3.
  • Origin access identity, which is a special CloudFront user, can be created and associated with the distribution.
  • S3 bucket/object permissions need to be configured to only provide access to the Origin Access Identity.
  • When users access the object from CloudFront, it uses the OAI to fetch the content on the user’s behalf, while the S3 object’s direct access is restricted
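A sketch of the bucket policy that restricts reads to the OAI, applied with boto3 (the OAI ID and bucket name are placeholders):

    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontOAIReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2EXAMPLEOAI"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
        }],
    }
    s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))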

Origin Access Control – OAC

CloudFront Origin Access Control - OAC

  • Origin Access Control – OAC is recommended over Origin Access Identity – OAI and supports
    • Enhanced security practices like short-term credentials, frequent credential rotations, and resource-based policies
    • All S3 buckets in all AWS Regions
    • S3 server-side encryption with AWS KMS (SSE-KMS)
    • Comprehensive HTTP methods support including dynamic requests – OAC supports GET, PUT, POST, PATCH, DELETE, OPTIONS, and HEAD.
  • CloudFront OAC needs to be set up with permissions to access the S3 bucket origin, which can be done after creating a CloudFront distribution, but before adding the OAC to the S3 origin in the distribution configuration.
  • For buckets with objects encrypted using server-side encryption with AWS Key Management Service (SSE-KMS), the OAC must be provided with permission to use the AWS KMS key.
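With OAC the policy principal is the CloudFront service itself, scoped to a single distribution via aws:SourceArn; a boto3 sketch with placeholder account and distribution IDs:

    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipalReadOnly",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {"StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
            }},
        }],
    }
    s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))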

CloudFront with S3 Objects

  • CloudFront can be configured to include custom headers or modify existing headers whenever it forwards a request to the origin, to
    • validate the user is not accessing the origin directly, bypassing the CDN
    • identify the CDN from which the request was forwarded, if more than one CloudFront distribution is configured to use the same origin
    • if users use viewers that don’t support CORS, configure CloudFront to forward the Origin header to the origin. That will cause the origin to return the Access-Control-Allow-Origin header for every request

Adding & Updating Objects

  • Objects just need to be added to the Origin and CloudFront would start distributing them when accessed.
  • For objects served by CloudFront, the Origin can be updated either by
    • Overwriting the original object
    • Creating a different version and updating the links exposed to the user.
  • For updating objects, it is recommended to use versioning e.g. have files or the entire folders with versions, so links can be changed when the objects are updated forcing a refresh.
  • With versioning,
    • there is no wait time for an object to expire before CloudFront begins to serve a new version of it.
    • there is no difference in consistency in the object served from the edge
    • no cost is incurred for object invalidation.

Removing/Invalidating Objects

  • Objects, by default, would be removed upon expiry (TTL) and the latest object would be fetched from the Origin
  • Objects can also be removed from the edge cache before it expires
    • File or Object Versioning to serve a different version of the object that has a different name.
    • Invalidate the object from edge caches. For the next request, CloudFront returns to the Origin to fetch the object
  • Object or File Versioning is recommended over Invalidating objects
    • if the objects need to be updated frequently.
    • enables control over which object a request returns even when the user has a version cached either locally or behind a corporate caching proxy.
    • makes it easier to analyze the results of object changes as CloudFront access logs include the names of the objects
    • provides a way to serve different versions to different users.
    • simplifies rolling forward & back between object revisions.
    • is less expensive, as no charges for invalidating objects.
    • for e.g. change header-v1.jpg to header-v2.jpg
  • Invalidating objects from the cache
    • objects in the cache can be invalidated explicitly before they expire to force a refresh
    • allows invalidating selected objects
    • allows invalidating multiple objects, e.g. objects in a directory or all of the objects whose names begin with the same characters, by including the * wildcard at the end of the invalidation path.
    • the user might continue to see the old version until it expires from those caches.
    • The first 1,000 invalidation paths submitted per month are free; a fee is charged for each invalidation path over 1,000 in a month.
    • An invalidation path can be for a single object, e.g. /js/ab.js, or for multiple objects, e.g. /js/*, and is counted as a single path even if the * wildcard request invalidates thousands of objects.
  • For RTMP distribution, objects served cannot be invalidated
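An invalidation request via boto3 (placeholder distribution ID); the /js/* wildcard path counts as a single invalidation path even if it matches thousands of objects:

    import time
    import boto3

    cf = boto3.client("cloudfront")

    cf.create_invalidation(
        DistributionId="EDFDVBD6EXAMPLE",  # placeholder
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/js/*"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )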

Partial Requests (Range GETs)

  • Partial requests using Range headers in a GET request help to download the object in smaller units, improving the efficiency of partial downloads and the recovery from partially failed transfers.
  • For a partial GET range request, CloudFront
    • checks the cache in the edge location for the requested range or the entire object and, if present, serves it immediately
    • if the requested range does not exist, it forwards the request to the origin and may request a larger range than the client requested to optimize performance
    • if the origin supports range header, it returns the requested object range and CloudFront returns the same to the viewer
    • if the origin does not support range header, it returns the complete object and CloudFront serves the entire object and caches it for future.
    • CloudFront uses the cached entire object to serve any future range GET header requests
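A range GET is just a standard Range header; a boto3 sketch against the origin bucket (placeholder names), and the same header works through CloudFront in front of it:

    import boto3

    s3 = boto3.client("s3")

    # Fetch only the first kilobyte of the object
    resp = s3.get_object(Bucket="my-bucket", Key="video.mp4",
                         Range="bytes=0-1023")
    print(resp["ContentRange"], len(resp["Body"].read()))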

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might become outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You are building a system to distribute confidential training videos to employees. Using CloudFront, what method could be used to serve content that is stored in S3, but not publicly accessible from S3 directly?
    1. Create an Origin Access Identity (OAI) for CloudFront and grant access to the objects in your S3 bucket to that OAI.
    2. Add the CloudFront account security group “amazon-cf/amazon-cf-sg” to the appropriate S3 bucket policy.
    3. Create an Identity and Access Management (IAM) User for CloudFront and grant access to the objects in your S3 bucket to that IAM User.
    4. Create an S3 bucket policy that lists the CloudFront distribution ID as the Principal and the target bucket as the Amazon Resource Name (ARN).

References

CloudFront Restrict Access to S3