AWS Storage Gateway

  • AWS Storage Gateway connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between the on-premises IT environment and the AWS storage infrastructure.
  • AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
  • Storage Gateway allows storage of data in the AWS cloud for scalable and cost-effective storage while maintaining data security.
  • Storage Gateway can run either on-premises, as a VM appliance, or in AWS, as an EC2 instance. So if the on-premises data center goes offline and there is no available host, the gateway can be deployed on an EC2 instance.
  • Gateways hosted on EC2 instances can be used for disaster recovery, data mirroring, and providing storage for applications hosted on EC2
  • Storage Gateway, by default, uploads data using SSL and provides data encryption at rest when stored in S3 or Glacier using AES-256
  • Storage Gateway performs encryption of data-in-transit and at-rest.
  • Storage Gateway offers multiple gateway types:
    • File Gateway
    • Volume Gateway
    • Tape Gateway

S3 File Gateway

  • supports a file interface into S3 and combines a service with a virtual software appliance.
  • allows storing and retrieving of objects in S3 using industry-standard file protocols such as NFS and SMB.
  • The software appliance, or gateway, is deployed into the on-premises environment as a VM running on the VMware ESXi or Microsoft Hyper-V hypervisor.
  • provides access to objects in S3 as files or file share mount points. It can be thought of as a file system mounted on S3.
  • durably stores POSIX-style metadata, including ownership, permissions, and timestamps in S3 as object user metadata associated with the file.
  • provides a cost-effective alternative to on-premises storage.
  • provides low-latency access to data through transparent local caching.
  • manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.
  • easily integrates with services like IAM, KMS, CloudWatch, CloudTrail, etc.
  • File Gateway allows you to
    • store and retrieve files directly using the NFS version 3 or 4.1 protocol.
    • store and retrieve files directly using the SMB protocol, versions 2 and 3.
    • access the data directly in S3 from any AWS Cloud application or service.
    • manage S3 data using lifecycle policies, cross-region replication, and versioning.
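A minimal sketch of how a file in a share might map to an S3 object and its POSIX metadata, assuming the share maps to a bucket plus an optional prefix and the file's path relative to the share root becomes the object key. The helper names and the metadata key names (`file-owner`, `file-permissions`, and so on) are hypothetical; the exact keys File Gateway writes are not given in these notes.

```python
import os
import posixpath
import stat


def object_key_for(share_prefix: str, relative_path: str) -> str:
    """Map a path inside the file share to an S3 object key (assumed: prefix + relative path)."""
    parts = [p for p in relative_path.replace("\\", "/").split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError("path escapes the file share")
    prefix = share_prefix.strip("/")
    key = "/".join(parts)
    return posixpath.join(prefix, key) if prefix else key


def posix_user_metadata(st: os.stat_result) -> dict:
    """Collect POSIX-style metadata to store as object user metadata (hypothetical key names)."""
    return {
        "file-owner": str(st.st_uid),
        "file-group": str(st.st_gid),
        "file-permissions": oct(stat.S_IMODE(st.st_mode)),
        "file-mtime": str(int(st.st_mtime)),
    }
```

For example, `object_key_for("projects/", "reports/q1.csv")` returns `"projects/reports/q1.csv"`, and passing `os.stat(path)` for a file with mode 0640 yields `"0o640"` under `file-permissions`.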

Volume Gateways

  • Volume gateways provide cloud-backed storage volumes that can be mounted as Internet Small Computer System Interface (iSCSI) devices from the on-premises application servers.
  • In every configuration, all data is securely stored in AWS; the configurations differ in how much of the data is also stored on-premises.
  • exposes an iSCSI interface on the front end that integrates easily with existing backup applications, which see each volume as just another disk drive.
  • backs up the data incrementally by taking snapshots, which are stored in S3 as EBS snapshots. These snapshots can be restored as a gateway storage volume or used to create EBS volumes that can be attached to an EC2 instance.
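To illustrate what "incremental" means here, the following is a simplified block-level model: each snapshot after the first records only the blocks that changed, and a restore replays those deltas on top of the base. The tiny block size and the function names are hypothetical; real snapshots use far larger blocks and their internal format is not described in these notes.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the example is readable


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a volume image into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def incremental_snapshot(previous: List[bytes], current: List[bytes]) -> Dict[int, bytes]:
    """Keep only blocks that are new or differ from the previous snapshot."""
    return {
        index: block
        for index, block in enumerate(current)
        if index >= len(previous) or previous[index] != block
    }


def restore(base: List[bytes], *deltas: Dict[int, bytes]) -> bytes:
    """Rebuild the volume by applying deltas, oldest first, on top of the base snapshot."""
    blocks = list(base)
    for delta in deltas:
        for index in sorted(delta):
            if index < len(blocks):
                blocks[index] = delta[index]
            else:
                # Volume grew: pad any gap, then append the new block.
                blocks.extend([b""] * (index - len(blocks)))
                blocks.append(delta[index])
    return b"".join(blocks)


v0 = split_blocks(b"AAAABBBBCCCC")
v1 = split_blocks(b"AAAAXXXXCCCCDDDD")
delta = incremental_snapshot(v0, v1)
print(delta)               # {1: b'XXXX', 3: b'DDDD'}
print(restore(v0, delta))  # b'AAAAXXXXCCCCDDDD'
```

Only two of the four blocks are stored for the second snapshot, which is why incremental snapshots keep both upload volume and snapshot storage small.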

Gateway Cached Volumes

  • Gateway Cached Volumes store data in S3, which acts as the primary data store, and retain a copy of recently read data locally for low-latency access to frequently accessed data.
  • Gateway-cached volumes offer substantial cost savings on primary storage and minimize the need to scale the storage on-premises.
  • All gateway-cached volume data and snapshot data are stored in S3, encrypted at rest using server-side encryption (SSE), and cannot be accessed with the S3 API or any other tools.
  • Each gateway configured for gateway-cached volumes can support up to 32 volumes, with each volume ranging from 1 GiB to 32 TiB, for a total maximum storage volume of 1,024 TiB (1 PiB).
  • Gateway VM can be allocated disks
    • Cache storage
      • Cache storage acts as the on-premises durable storage and holds data until it is uploaded to S3
      • Cache storage also stores recently read data for low-latency access
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to S3
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3
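The cached-volume read path can be modeled as a bounded local cache in front of S3. This sketch assumes a least-recently-used (LRU) eviction policy purely for illustration; the actual eviction policy is not described in these notes, and the class name, block indexing, and the immediate write to the stand-in `s3` dict (which skips the upload buffer) are all simplifications.

```python
from collections import OrderedDict


class CachedVolumeModel:
    """Toy model: S3 is the primary store; a bounded local cache keeps recent blocks."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks
        self.s3 = {}                 # stands in for the primary copy in S3
        self.cache = OrderedDict()   # block index -> data, least recently used first
        self.s3_reads = 0            # how many reads had to go to S3

    def _remember(self, index: int, data: bytes) -> None:
        self.cache[index] = data
        self.cache.move_to_end(index)
        while len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block

    def write(self, index: int, data: bytes) -> None:
        self.s3[index] = data        # simplification: the real gateway stages writes first
        self._remember(index, data)  # recently written data stays local

    def read(self, index: int) -> bytes:
        if index in self.cache:      # cache hit: low-latency local read
            self._remember(index, self.cache[index])
            return self.cache[index]
        self.s3_reads += 1           # cache miss: fetch from S3, then keep it locally
        data = self.s3[index]
        self._remember(index, data)
        return data
```

With `cache_blocks=2`, writing blocks 0, 1 and 2 leaves only blocks 1 and 2 cached, so reading block 0 is a miss that goes to S3 and then evicts block 1. This is why sizing the cache to hold the working set matters for latency.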

Gateway Stored Volumes

  • Gateway stored volumes maintain the entire data set locally to provide low-latency access.
  • Gateway asynchronously backs up point-in-time snapshots (in the form of EBS snapshots) of the data to S3 which provides durable off-site backups
  • The gateway-stored volume configuration provides durable and inexpensive off-site backups that you can recover to your local data center or to EC2. For example, if you need replacement capacity for disaster recovery, you can recover the backups to EC2.
  • Each gateway configured for gateway-stored volumes can support up to 32 volumes, each ranging from 1 GiB to 16 TiB, for a total volume storage of 512 TiB.
  • Gateway VM can be allocated disks
    • Volume Storage
      • For storing the actual data
      • Can be mapped to on-premises direct-attached storage (DAS) or storage area network (SAN) disks
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to S3
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in Amazon S3
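The per-gateway limits quoted above for cached and stored volumes can be checked with simple arithmetic before provisioning. This is a hypothetical planning helper; the numbers are the ones stated in these notes, and AWS quotas change over time, so verify them against the current documentation.

```python
GiB = 1024 ** 3
TiB = 1024 * GiB

# Per-gateway limits as stated in these notes (verify against current AWS quotas).
LIMITS = {
    "cached": {"max_volumes": 32, "min_size": 1 * GiB, "max_size": 32 * TiB, "max_total": 1024 * TiB},
    "stored": {"max_volumes": 32, "min_size": 1 * GiB, "max_size": 16 * TiB, "max_total": 512 * TiB},
}


def check_volume_plan(mode: str, sizes_in_bytes: list) -> list:
    """Return a list of problems with a planned set of volumes; an empty list means it fits."""
    limits = LIMITS[mode]
    problems = []
    if len(sizes_in_bytes) > limits["max_volumes"]:
        problems.append(f"{len(sizes_in_bytes)} volumes exceeds the limit of {limits['max_volumes']}")
    for i, size in enumerate(sizes_in_bytes):
        if not limits["min_size"] <= size <= limits["max_size"]:
            problems.append(f"volume {i} ({size / TiB:.2f} TiB) is outside the allowed size range")
    total = sum(sizes_in_bytes)
    if total > limits["max_total"]:
        problems.append(f"total {total / TiB:.0f} TiB exceeds {limits['max_total'] / TiB:.0f} TiB")
    return problems


print(check_volume_plan("cached", [32 * TiB] * 32))  # []
print(check_volume_plan("stored", [20 * TiB]))       # ['volume 0 (20.00 TiB) is outside the allowed size range']
```

Thirty-two cached volumes of 32 TiB exactly reach the 1,024 TiB cap, while a single 20 TiB stored volume already exceeds the 16 TiB per-volume maximum.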

Tape Gateway – Gateway-Virtual Tape Library (VTL)

  • Tape Gateway offers a durable, cost-effective data archival solution.
  • VTL interface can help leverage existing tape-based backup application infrastructure to store data on virtual tape cartridges created on the tape gateway.
  • Each Tape Gateway is preconfigured with a media changer and tape drives, which are available to the existing client backup applications as iSCSI devices. Tape cartridges can be added as needed to archive the data.
  • Gateway-VTL provides a virtual tape infrastructure that scales seamlessly with the business needs and eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure.
  • Gateway VTL has the following components:
    • Virtual Tape
      • A virtual tape is similar to a physical tape cartridge, except that the data is stored in AWS storage
      • Each gateway can contain up to 1,500 tapes or up to 1 PiB of total tape data, with each tape ranging from 100 GiB to 2.5 TiB
    • Virtual Tape Library
      • Virtual tape library is similar to the physical tape library with tape drives (replaced with VTL tape drive) and robotic arms (replaced with Media changer)
      • Tapes in the virtual tape library are backed up in S3
      • Backup software writes data to the gateway, the gateway stores data locally, and then asynchronously uploads it to virtual tapes in S3.
    • Archive or Virtual Tape Shelf (VTS)
      • The virtual tape shelf is similar to an off-site tape holding facility
      • Tapes in the virtual tape shelf are backed up in Glacier, an extremely low-cost storage service for data archiving and backup
      • VTS is located in the same region where the gateway was created and every region would have a single VTS irrespective of the number of gateways
      • Archiving tapes
        • When the backup software ejects a tape, the gateway moves the tape to the VTS for long term storage
      • Retrieving tapes
        • A tape in the VTS cannot be accessed directly; it must first be retrieved to the VTL, where it becomes available in about 24 hours
  • Gateway VM can be allocated disks for
    • Cache storage
      • Cache storage acts as the on-premises durable storage and holds data until it is uploaded to S3.
      • Cache storage also stores recently read data for low-latency access
    • Upload buffer
      • Upload buffer acts as a staging area before the data is uploaded to the Virtual tape.
      • Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3.
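The archive and retrieve flow described above behaves like a small state machine. This is a hypothetical model (the class and method names are illustrative, not the Storage Gateway API); it encodes the rules that ejecting moves a tape from the VTL to the VTS and that an archived tape must be retrieved to the VTL before it can be read. Treating retrieved tapes as read-only is an assumption of this sketch.

```python
from enum import Enum


class TapeLocation(Enum):
    VTL = "virtual tape library (backed by S3)"
    VTS = "virtual tape shelf (backed by Glacier)"


class VirtualTape:
    """Hypothetical model of one virtual tape moving between the VTL and the VTS."""

    def __init__(self, barcode: str):
        self.barcode = barcode
        self.location = TapeLocation.VTL
        self.read_only = False

    def eject(self) -> None:
        """Backup software ejects the tape; the gateway moves it to the VTS for long-term storage."""
        if self.location is not TapeLocation.VTL:
            raise RuntimeError(f"{self.barcode} is not in the VTL")
        self.location = TapeLocation.VTS

    def retrieve(self) -> None:
        """Bring an archived tape back into the VTL (takes hours in practice; not modeled)."""
        if self.location is not TapeLocation.VTS:
            raise RuntimeError(f"{self.barcode} is not archived")
        self.location = TapeLocation.VTL
        self.read_only = True  # assumption of this sketch: retrieved tapes are read-only

    def read(self) -> str:
        """Backup software can only read tapes that are in the VTL."""
        if self.location is TapeLocation.VTS:
            raise RuntimeError(f"{self.barcode} is archived; retrieve it to the VTL first")
        return f"reading {self.barcode}"


tape = VirtualTape("TAPE0001")
tape.eject()
print(tape.location)                # TapeLocation.VTS
tape.retrieve()
print(tape.read(), tape.read_only)  # reading TAPE0001 True
```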

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Which of the following services natively encrypts data at rest within an AWS region? Choose 2 answers
    1. AWS Storage Gateway
    2. Amazon DynamoDB
    3. Amazon CloudFront
    4. Amazon Glacier
    5. Amazon Simple Queue Service
  2. What does the AWS Storage Gateway provide?
    1. It allows to integrate on-premises IT environments with Cloud Storage
    2. A direct encrypted connection to Amazon S3.
    3. It’s a backup solution that provides an on-premises Cloud storage.
    4. It provides an encrypted SSL endpoint for backups in the Cloud.
  3. You’re running an application on-premises due to its dependency on non-x86 hardware and want to use AWS for data backup. Your backup application is only able to write to POSIX-compatible block-based storage. You have 140TB of data and would like to mount it as a single folder on your file server. Users must be able to access portions of this data while the backups are taking place. What backup solution would be most appropriate for this use case?
    1. Use Storage Gateway and configure it to use Gateway Cached volumes.
    2. Configure your backup software to use S3 as the target for your data backups.
    3. Configure your backup software to use Glacier as the target for your data backups
    4. Use Storage Gateway and configure it to use Gateway Stored volumes (the data is also hosted on the on-premises server; the 140TB requirement applies to the on-premises file server, not to AWS, and is mainly there to confuse. Only a backup solution is needed, hence stored instead of cached volumes)
  4. A customer has a single 3-TB volume on-premises that is used to hold a large repository of images and print layout files. This repository is growing at 500 GB a year and must be presented as a single logical volume. The customer is becoming increasingly constrained with their local storage capacity and wants an off-site backup of this data, while maintaining low-latency access to their frequently accessed data. Which AWS Storage Gateway configuration meets the customer requirements?
    1. Gateway-Cached volumes with snapshots scheduled to Amazon S3
    2. Gateway-Stored volumes with snapshots scheduled to Amazon S3
    3. Gateway-Virtual Tape Library with snapshots to Amazon S3
    4. Gateway-Virtual Tape Library with snapshots to Amazon Glacier
  5. You have a proprietary data store on-premises that must be backed up daily by dumping the data store contents to a single compressed 50GB file and sending the file to AWS. Your SLAs state that any dump file backed up within the past 7 days can be retrieved within 2 hours. Your compliance department has stated that all data must be held indefinitely. The time required to restore the data store from a backup is approximately 1 hour. Your on-premise network connection is capable of sustaining 1gbps to AWS. Which backup methods to AWS would be most cost-effective while still meeting all of your requirements?
    1. Send the daily backup files to Glacier immediately after being generated (will not meet the RTO)
    2. Transfer the daily backup files to an EBS volume in AWS and take daily snapshots of the volume (Not cost effective)
    3. Transfer the daily backup files to S3 and use appropriate bucket lifecycle policies to send to Glacier (Store in S3 for seven days and then archive to Glacier)
    4. Host the backup files on a Storage Gateway with Gateway-Cached Volumes and take daily snapshots (not cost-effective, as it requires local storage as well as S3 storage)
  6. A customer implemented AWS Storage Gateway with a gateway-cached volume at their main office. An event takes the link between the main and branch office offline. Which methods will enable the branch office to access their data? Choose 3 answers
    1. Use a HTTPS GET to the Amazon S3 bucket where the files are located (gateway volumes are only accessible from the AWS Storage Gateway and cannot be directly accessed using Amazon S3 APIs)
    2. Restore by implementing a lifecycle policy on the Amazon S3 bucket.
    3. Make an Amazon Glacier Restore API call to load the files into another Amazon S3 bucket within four to six hours.
    4. Launch a new AWS Storage Gateway instance AMI in Amazon EC2, and restore from a gateway snapshot
    5. Create an Amazon EBS volume from a gateway snapshot, and mount it to an Amazon EC2 instance.
    6. Launch an AWS Storage Gateway virtual iSCSI device at the branch office, and restore from a gateway snapshot
  7. A company uses on-premises servers to host its applications. The company is running out of storage capacity. The applications use both block storage and NFS storage. The company needs a high-performing solution that supports local caching without rearchitecting its existing applications. Which combination of actions should a solutions architect take to meet these requirements? (Choose two.)
    1. Mount Amazon S3 as a file system to the on-premises servers.
    2. Deploy an AWS Storage Gateway file gateway to replace NFS storage.
    3. Deploy AWS Snowball Edge to provision NFS mounts to on-premises servers.
    4. Deploy an AWS Storage Gateway volume gateway to replace the block storage.
    5. Deploy Amazon Elastic File System (Amazon EFS) volumes and mount them to on-premises servers.

Storage Options Whitepaper – Storage Gateway – Import/Export – AWS Certification

AWS Storage Options Whitepaper cont.

Provides a brief summary of the ideal use cases and anti-patterns for the Storage Gateway and Import/Export AWS storage options

AWS Storage Gateway

  • Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between the organization’s on-premises IT environment and AWS’s storage infrastructure.
  • Storage Gateway enables you to store data securely in the AWS cloud for scalable and cost-effective storage.
  • It provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in S3.
  • For disaster recovery scenarios, it can serve as a cloud-hosted solution, together with EC2, that mirrors your entire production environment.
  • Storage Gateway can be configured as
    • Gateway-cached volumes
      • Gateway-cached volumes use S3 for primary data storage, while retaining frequently accessed data locally in a cache.
      • These volumes minimize the need to scale the on-premises storage infrastructure, while still providing applications with low-latency access to their frequently accessed data.
      • Data written to the volumes is stored in S3, with only a cache of recently written and recently read data stored locally on the on-premises storage hardware.
    • Gateway-stored volumes
      • Gateway-stored volumes store the complete primary data locally, while asynchronously backing up that data to AWS.
      • These volumes provide the on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups.
      • Data written to the gateway-stored volumes is stored on the on-premises storage hardware, and asynchronously backed up to S3 in the form of EBS snapshots.

Ideal Usage Patterns

  • AWS Storage Gateway use cases include
    • corporate file sharing,
    • enabling existing on-premises backup applications to store primary backups on S3,
    • disaster recovery, and
    • data mirroring to cloud-based compute resources.

Anti-Patterns

  • Database storage
    • For database storage and workloads, EC2 instances using EBS volumes are the natural choice.

Performance

  • As the Storage Gateway VM sits between the application, underlying on-premises storage and S3, the performance experienced will be dependent upon a number of factors, including the speed and configuration of the underlying local disks, the network bandwidth between the iSCSI initiator and gateway VM, the amount of local storage allocated to the gateway VM, and the bandwidth between the gateway VM and S3.
  • For gateway-cached volumes, to provide low-latency read access to the on-premises applications, it’s important to provide enough local cache storage to store the recently accessed data.
  • Storage Gateway efficiently uses the Internet bandwidth to speed up the upload of on-premises application data to AWS.
  • Storage Gateway only uploads incremental changes (data that has changed), which minimizes the amount of data sent over the Internet.
  • AWS Direct Connect can be used to further increase throughput and reduce the network costs by establishing a dedicated network connection between the on-premises gateway and AWS.
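A rough worked example of why uploading only changed data matters: the time to push a day's changes is the changed volume divided by the usable bandwidth. The 80% link-efficiency factor, the dataset size, the change rate, and the link speed are all assumptions for illustration, not Storage Gateway figures.

```python
def upload_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to upload data_gb (decimal GB) over a link of link_mbps at the given efficiency."""
    bits = data_gb * 8e9
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


dataset_gb = 2000    # hypothetical volume contents
daily_change = 0.05  # hypothetical: 5% of the data changes per day
link_mbps = 100      # hypothetical Internet uplink

full = upload_hours(dataset_gb, link_mbps)
incremental = upload_hours(dataset_gb * daily_change, link_mbps)
print(f"full: {full:.1f} h, incremental: {incremental:.1f} h")  # full: 55.6 h, incremental: 2.8 h
```

Raising `link_mbps` (for example over a Direct Connect link) scales both figures down proportionally, while the ratio between them depends only on the change rate.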

Durability and Availability

  • AWS Storage Gateway durably stores on-premises application data by uploading it to S3.
  • S3 stores data in multiple facilities and on multiple devices within each facility.
  • S3 also performs regular, systematic data integrity checks and is built to be automatically self-healing.

Cost Model

  • AWS Storage Gateway has four pricing components:
    • gateway usage (per gateway per month),
    • snapshot storage usage (per GB per month),
    • volume storage usage (per GB per month), and
    • data transfer out (per GB per month).
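The four pricing components combine additively. The sketch below uses placeholder unit prices (the `PLACEHOLDER_PRICES` values are invented for the example, not actual AWS rates); look up current regional pricing before relying on any estimate.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    gateway_per_month: float  # per gateway per month
    snapshot_gb_month: float  # snapshot storage, per GB-month
    volume_gb_month: float    # volume storage, per GB-month
    transfer_out_gb: float    # data transfer out, per GB


# Invented numbers for illustration only; NOT real AWS prices.
PLACEHOLDER_PRICES = UnitPrices(gateway_per_month=100.0, snapshot_gb_month=0.05,
                                volume_gb_month=0.03, transfer_out_gb=0.09)


def monthly_cost(prices: UnitPrices, gateways: int, snapshot_gb: float,
                 volume_gb: float, transfer_out_gb: float) -> float:
    """Sum the four pricing components for one month."""
    return (gateways * prices.gateway_per_month
            + snapshot_gb * prices.snapshot_gb_month
            + volume_gb * prices.volume_gb_month
            + transfer_out_gb * prices.transfer_out_gb)


print(round(monthly_cost(PLACEHOLDER_PRICES, 1, 1000, 2000, 100), 2))  # 219.0
```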

Scalability and Elasticity

  • AWS Storage Gateway stores data in Amazon S3, which has been designed to offer a very high level of scalability and elasticity automatically.

Interfaces

  • The AWS Management Console can be used to download the AWS Storage Gateway VM image, select between a gateway-cached or gateway-stored configuration, activate the on-premises gateway by associating its IP address with your AWS account, select an AWS region, create AWS Storage Gateway volumes, and attach these volumes as iSCSI devices to your on-premises application servers.
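The same console steps can be scripted. This sketch assumes a client with the interface of `boto3.client("storagegateway")` and uses its `activate_gateway`, `add_cache`, `add_upload_buffer`, and `create_cached_iscsi_volume` operations; the activation key (obtained from the deployed gateway VM), the local disk IDs, the gateway name, the target name, and the IP address are placeholders you would supply. It has not been run against a live account, and error handling and waiting for the gateway to come online are omitted.

```python
import uuid


def provision_cached_volume(client, activation_key: str, region: str,
                            cache_disk_ids: list, buffer_disk_ids: list,
                            volume_size_bytes: int, target_name: str,
                            gateway_ip: str) -> tuple:
    """Activate a cached-volume gateway, give it cache and upload-buffer disks, add one volume.

    `client` is assumed to behave like boto3.client("storagegateway").
    """
    gateway_arn = client.activate_gateway(
        ActivationKey=activation_key,
        GatewayName="example-cached-gateway",  # placeholder name
        GatewayTimezone="GMT",
        GatewayRegion=region,
        GatewayType="CACHED",
    )["GatewayARN"]

    # Local disks on the gateway VM: one set for cache storage, one for the upload buffer.
    client.add_cache(GatewayARN=gateway_arn, DiskIds=cache_disk_ids)
    client.add_upload_buffer(GatewayARN=gateway_arn, DiskIds=buffer_disk_ids)

    volume = client.create_cached_iscsi_volume(
        GatewayARN=gateway_arn,
        VolumeSizeInBytes=volume_size_bytes,
        TargetName=target_name,
        NetworkInterfaceId=gateway_ip,  # gateway IP address that serves the iSCSI target
        ClientToken=str(uuid.uuid4()),  # idempotency token
    )
    # The TargetARN identifies the iSCSI target that on-premises servers connect to.
    return gateway_arn, volume["TargetARN"]
```

With boto3 installed and credentials configured, `client` would be `boto3.client("storagegateway", region_name=region)`; the on-premises application servers then connect to the returned target with their iSCSI initiator.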

AWS Import/Export (Upgraded to Snowball)

  • AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport.
  • AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network, bypassing the Internet. This can be much faster and more cost-effective than upgrading connectivity.
  • AWS Import/Export supports importing into several types of AWS storage, including EBS snapshots, S3 buckets, and Glacier vaults and exporting data from S3.

Ideal Usage Patterns

  • AWS Import/Export is ideal for transferring large amounts of data in and out of the AWS cloud, especially in cases where transferring the data over the Internet would be too slow (a week or more) or too costly.
  • Common use cases include
    • initial data upload to AWS,
    • content distribution or regular data interchange to/from your customers or business associates,
    • transfer to Amazon S3 or Amazon Glacier for off-site backup and archival storage, and
    • quick retrieval of large backups from Amazon S3 or Amazon Glacier for disaster recovery.

Anti-Patterns

  • AWS Import/Export may not be the ideal solution for data that is more easily transferred over the Internet in less than one week.

Performance

  • Each AWS Import/Export station is capable of loading data at over 100 MB per second
  • Rate of the data load will be bounded by a combination of the read or write speed of the portable storage device and, for Amazon S3 data loads, the average object (file) size.
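A back-of-the-envelope comparison for the one-week rule of thumb, using 10 TB over a 1 Mbps link (the scenario in the practice question at the end of this section) against a device loaded at the quoted 100 MB per second. Shipping time is ignored, decimal units are assumed, and the function names are hypothetical.

```python
def network_days(data_tb: float, link_mbps: float) -> float:
    """Days to send data_tb (decimal TB) over a fully used link of link_mbps."""
    return data_tb * 8e12 / (link_mbps * 1e6) / 86400


def device_load_hours(data_tb: float, load_mb_per_s: float = 100) -> float:
    """Hours to load data_tb at the station rate (shipping time not included)."""
    return data_tb * 1e12 / (load_mb_per_s * 1e6) / 3600


def prefer_device_shipping(data_tb: float, link_mbps: float, threshold_days: float = 7) -> bool:
    """Rule of thumb from these notes: ship a device if the network transfer takes a week or more."""
    return network_days(data_tb, link_mbps) >= threshold_days


print(f"{network_days(10, 1):.0f} days over the network")   # 926 days over the network
print(f"{device_load_hours(10):.1f} hours to load a device")  # 27.8 hours to load a device
print(prefer_device_shipping(10, 1))                          # True
```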

Durability and Availability

  • After the data has been imported, the durability and availability characteristics of the target storage (EBS, S3, or Glacier) apply.

Cost Model

  • AWS Import/Export has three pricing components: a per-device fee, a data load time charge (per data-loading-hour), and possible return shipping charges (for expedited shipping, or shipping to destinations not local to that AWS Import/Export region).
  • For the destination storage, the standard Amazon EBS snapshot, Amazon S3, and Amazon Glacier request and storage pricing applies.

Scalability and Elasticity

  • Total amount of data you can load using AWS Import/Export is limited only by the capacity of the devices sent to AWS.
  • For Amazon S3, individual files will be loaded as objects in Amazon S3, and may range up to 5 terabytes in size.
  • For Amazon Glacier, individual devices will be loaded as a single archive, and may range up to 4 terabytes in size.
  • Aggregate total amount of data that can be imported is virtually unlimited.

Interfaces

  • To upload or download data, an AWS Import/Export job must be created and submitted for each storage device shipped.
  • Jobs can be created using AWS CLI, AWS SDK or native REST API
  • Each job request requires a manifest file, a YAML-formatted text file that contains a set of key-value pairs that supply the required information—such as your device ID, secret access key, and return address—necessary to complete the job.
  • Job request is tied to the storage device through a signature file in the root directory (for Amazon S3 import jobs), or by a barcode taped to the device (for Amazon EBS and Amazon Glacier jobs).

AWS Certification Exam Practice Questions

  1. You are working with a customer who has 10 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?
    1. Amazon Glacier multipart upload
    2. AWS Storage Gateway
    3. VM Import/Export
    4. AWS Import/Export