AWS CloudFront with S3

AWS CloudFront with S3

  • CloudFront can be used to distribute the content from an S3 bucket
  • For an RTMP distribution, the S3 bucket is the only supported origin, and custom origins cannot be used
  • Using CloudFront over S3 has the following benefits
    • can be more cost-effective if the objects are frequently accessed as at higher usage, the price for CloudFront data transfer is much lower than the price for S3 data transfer.
    • downloads are faster with CloudFront than with S3 alone because the objects are stored closer to the users
  • When using S3 as the origin for distribution and the bucket is moved to a different region, CloudFront can take up to an hour to update its records to include the change of region when both of the following are true:
    • Origin Access Identity (OAI) is used to restrict access to the bucket
    • Bucket is moved to an S3 region that requires Signature Version 4 for authentication

Origin Access Identity

CloudFront S3 Origin Access Identity - OAI

  • S3 origin objects must be granted public read permissions and hence the objects are accessible from both S3 as well as CloudFront.
  • Even though CloudFront does not expose the underlying S3 URL, it can be known to the user if shared directly or used by applications.
  • For using CloudFront signed URLs or signed cookies to provide access to the objects, it would be necessary to prevent users from having direct access to the S3 objects.
  • Users accessing S3 objects directly would
    • bypass the controls provided by CloudFront signed URLs or signed cookies, for e.g., control over the date-time that a user can no longer access the content and the IP addresses can be used to access content
    • CloudFront access logs are less useful because they’re incomplete.
  • Origin Access Identity (OAI) can be used to prevent users from directly accessing objects from S3.
  • Origin access identity, which is a special CloudFront user, can be created and associated with the distribution.
  • S3 bucket/object permissions need to be configured to only provide access to the Origin Access Identity.
  • When users access the object from CloudFront, it uses the OAI to fetch the content on the user’s behalf, while the S3 object’s direct access is restricted

CloudFront with S3 Objects

  • CloudFront can be configured to include custom headers or modify existing headers whenever it forwards a request to the origin, to
    • validate the user is not accessing the origin directly, bypassing CDN
    • identify the CDN from which the request was forwarded, if more than one CloudFront distribution is configured to use the same origin
    • if users use viewers that don’t support CORS, configure CloudFront to forward the Origin header to the origin. That will cause the origin to return the Access-Control-Allow-Origin header for every request

Adding & Updating Objects

  • Objects just need to be added to the Origin and CloudFront would start distributing them when accessed
  • Objects served by CloudFront the Origin, can be updated either by
    • Overwriting the Original object
    • Create a different version and updating the links exposed to the user
  • For updating objects, its recommended to use versioning for e.g. have files or the entire folders with versions, so the the links can be changed when the objects are updated forcing a refresh
  • With versioning,
    • there is no time wait for an object to expire before CloudFront begins to serve a new version of it
    • there is no difference in consistency in the object served from the edge
    • no cost involved to pay for object invalidation.

Removing/Invalidating Objects

  • Objects, by default, would be removed upon expiry (TTL) and the latest object would be fetched from the Origin
  • Objects can also be removed from the edge cache before it expires
    • File or Object Versioning to serve a different version of the object that has a different name.
    • Invalidate the object from edge caches. For the next request, CloudFront returns to the Origin to fetch the object
  • Object or File Versioning is recommended over Invalidating objects
    • if the objects need to be updated frequently.
    • enables to control which object a request returns even when the user has a version cached either locally or behind a corporate caching proxy.
    • makes it easier to analyze the results of object changes as CloudFront access logs include the names of the objects
    • provides a way to serve different versions to different users.
    • simplifies rolling forward & back between object revisions.
    • is less expensive, as no charges for invalidating objects.
    • for e.g. change header-v1.jpg to header-v2.jpg
  • Invalidating objects from the cache
    • objects in the cache can be invalidated explicitly before they expire to force a refresh
    • allows to invalidate selected objects
    • allows to invalidate multiple objects for e.g. objects in a directory or all of the objects whose names begin with the same characters, you can include the * wildcard at the end of the invalidation path.
    • the user might continue to see the old version until it expires from those caches.
    • A specified number of invalidation paths can be submitted each month for free. Any invalidation requests more than the allotted no. per month, a fee is charged for each submitted invalidation path
    • The First 1,000 invalidation paths requests submitted per month are free; charges apply for each invalidation path over 1,000 in a month.
    • Invalidation path can be for a single object for e.g. /js/ab.js or for multiple objects for e.g. /js/* and is counted as a single request even if the * wildcard request may invalidate thousands of objects.
  • For RTMP distribution, objects served cannot be invalidated

Partial Requests (Range GETs)

  • Partial requests using Range headers in a GET request helps to download the object in smaller units, improving the efficiency of partial downloads and the recovery from partially failed transfers.
  • For a partial GET range request, CloudFront
    • checks the cache in the edge location for the requested range or the entire object and if exists, serves it immediately
    • if the requested range does not exist, it forwards the request to the origin and may request a larger range than the client requested to optimize performance
    • if the origin supports range header, it returns the requested object range and CloudFront returns the same to the viewer
    • if the origin does not support range header, it returns the complete object and CloudFront serves the entire object and caches it for future.
    • CloudFront uses the cached entire object to serve any future range GET header requests

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are building a system to distribute confidential training videos to employees. Using CloudFront, what method could be used to serve content that is stored in S3, but not publically accessible from S3 directly?
    1. Create an Origin Access Identity (OAI) for CloudFront and grant access to the objects in your S3 bucket to that OAI.
    2. Add the CloudFront account security group “amazon-cf/amazon-cf-sg” to the appropriate S3 bucket policy.
    3. Create an Identity and Access Management (IAM) User for CloudFront and grant access to the objects in your S3 bucket to that IAM User.
    4. Create an S3 bucket policy that lists the CloudFront distribution ID as the Principal and the target bucket as the Amazon Resource Name (ARN).