AWS Storage Gateway is a hybrid cloud storage service that connects on-premises software appliances with cloud-based storage, providing seamless and secure integration between on-premises environments and the AWS storage infrastructure, with on-premises access to virtually unlimited cloud storage.
Storage Gateway allows storing data in the AWS cloud for scalable and cost-effective storage while maintaining data security.
Storage Gateway can run either on-premises as a VM appliance or in AWS as an EC2 instance, so if the on-premises data center goes offline and no host is available, the gateway can be deployed on an EC2 instance.
Gateways hosted on EC2 instances can be used for disaster recovery, data mirroring, and providing storage for applications hosted on EC2.
Storage Gateway, by default, uploads data using SSL and encrypts data at rest with AES-256 when it is stored in S3 or Glacier, so data is protected both in transit and at rest.
Storage Gateway offers multiple gateway types
File Gateway
Volume Gateway
Tape Gateway
S3 File Gateway
supports a file interface into S3, combining a service with a virtual software appliance.
allows storing and retrieving objects in S3 using industry-standard file protocols such as NFS and SMB.
The software appliance, or gateway, is deployed into the on-premises environment as a VM running on a VMware ESXi or Microsoft Hyper-V hypervisor.
provides access to objects in S3 as files or file share mount points; it can be considered a file system mount on S3.
durably stores POSIX-style metadata, including ownership, permissions, and timestamps in S3 as object user metadata associated with the file.
provides a cost-effective alternative to on-premises storage.
provides low-latency access to data through transparent local caching.
manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.
store and retrieve files directly using the NFS protocol (versions 3 and 4.1).
store and retrieve files directly using the SMB protocol (versions 2 and 3).
access the data directly in S3 from any AWS Cloud application or service.
manage S3 data using lifecycle policies, cross-region replication, and versioning.
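The POSIX-metadata-to-object-metadata mapping described above can be sketched in code. The snippet below is illustrative only: the metadata key names (file-permissions, file-owner, file-group, file-mtime) and value formats are assumptions modeled on the AWS documentation, not a guaranteed contract.

```python
import stat

# Illustrative sketch: how an S3 File Gateway might record POSIX attributes
# as S3 object user metadata. The key names and value formats below are
# assumptions modeled on AWS documentation, not a guaranteed contract.
def posix_to_s3_user_metadata(mode: int, uid: int, gid: int, mtime: float) -> dict:
    return {
        "file-permissions": oct(stat.S_IMODE(mode)),  # e.g. "0o644"
        "file-owner": str(uid),                       # numeric user ID
        "file-group": str(gid),                       # numeric group ID
        "file-mtime": f"{mtime:.9f}",                 # seconds since epoch
    }

print(posix_to_s3_user_metadata(0o100644, 1000, 1000, 1700000000.0))
```

Because the metadata travels with the object, the same files remain usable by any S3-aware application or service.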
Volume Gateways
Volume gateways provide cloud-backed storage volumes that can be mounted as Internet Small Computer System Interface (iSCSI) devices from the on-premises application servers.
While all data is securely stored in AWS, the two configurations differ in how much data is stored on-premises.
exposes a compatible iSCSI interface on the front end to integrate easily with existing backup applications, appearing as just another disk drive.
backs up the data incrementally by taking snapshots, which are stored as EBS snapshots in S3. These snapshots can be restored as a gateway storage volume or used to create EBS volumes to attach to an EC2 instance.
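As a sketch of the snapshot restore path, the helper below builds the arguments for boto3's ec2.create_volume(), which can turn a gateway snapshot (a regular EBS snapshot) into an EBS volume attachable to an EC2 instance. The snapshot ID, availability zone, and volume type shown are placeholders.

```python
# Sketch: restoring a volume gateway snapshot as an EBS volume. Gateway
# snapshots are stored as regular EBS snapshots, so ec2.create_volume()
# can turn one into a volume that attaches to an EC2 instance.
def create_volume_params(snapshot_id: str, availability_zone: str,
                         volume_type: str = "gp3") -> dict:
    """Build keyword arguments for ec2.create_volume()."""
    return {
        "SnapshotId": snapshot_id,              # the gateway snapshot to restore
        "AvailabilityZone": availability_zone,  # must match the target instance's AZ
        "VolumeType": volume_type,
    }

# Placeholder IDs for illustration only.
params = create_volume_params("snap-0123456789abcdef0", "us-east-1a")
print(params["SnapshotId"])

# To actually restore (requires AWS credentials and boto3):
# import boto3
# volume = boto3.client("ec2").create_volume(**params)
```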
Gateway Cached Volumes
Gateway Cached Volumes store data in S3, which acts as a primary data storage, and retains a copy of recently read data locally for low latency access to the frequently accessed data
Gateway-cached volumes offer substantial cost savings on primary storage and minimize the need to scale the storage on-premises.
All gateway-cached volume data and snapshot data are stored in S3, encrypted at rest using server-side encryption (SSE), and cannot be accessed with the S3 API or any other tools.
Each gateway configured for gateway-cached volumes can support up to 32 volumes, with each volume ranging from 1GiB to 32TiB, for a total maximum storage volume of 1,024 TiB (1 PiB).
Gateway VM can be allocated disks for
Cache storage
Cache storage acts as the on-premises durable storage; it stores the data before uploading it to S3
Cache storage also stores recently read data for low-latency access
Upload buffer
Upload buffer acts as a staging area before the data is uploaded to S3
Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3
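A minimal sketch of provisioning a cached volume with the AWS SDK: the helper validates the 1 GiB to 32 TiB size range and builds arguments for boto3's storagegateway.create_cached_iscsi_volume(). The gateway ARN, target name, and network interface IP are placeholders.

```python
import uuid

GIB = 1024 ** 3  # bytes in a gibibyte

# Sketch: building a request for storagegateway.create_cached_iscsi_volume().
# Cached volumes range from 1 GiB to 32 TiB (32,768 GiB) per volume.
def cached_volume_request(gateway_arn: str, size_gib: int,
                          target_name: str, nic_ip: str) -> dict:
    if not 1 <= size_gib <= 32 * 1024:
        raise ValueError("cached volume must be between 1 GiB and 32 TiB")
    return {
        "GatewayARN": gateway_arn,
        "VolumeSizeInBytes": size_gib * GIB,
        "TargetName": target_name,        # becomes part of the iSCSI target name
        "NetworkInterfaceId": nic_ip,     # IP of the gateway VM network interface
        "ClientToken": uuid.uuid4().hex,  # idempotency token
    }

# Placeholder ARN and addresses for illustration only.
req = cached_volume_request(
    "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE",
    1024, "app-volume-1", "10.0.0.5")
print(req["VolumeSizeInBytes"])  # 1099511627776 (1 TiB)

# To actually provision (requires AWS credentials and boto3):
# import boto3
# boto3.client("storagegateway").create_cached_iscsi_volume(**req)
```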
Gateway Stored Volumes
Gateway stored volumes maintain the entire data set locally to provide low-latency access.
Gateway asynchronously backs up point-in-time snapshots (in the form of EBS snapshots) of the data to S3 which provides durable off-site backups
Gateway stored volume configuration provides durable and inexpensive off-site backups that you can recover to your local data center or EC2 for e.g., if you need replacement capacity for disaster recovery, you can recover the backups to EC2.
Each gateway configured for gateway-stored volumes can support up to 32 volumes, with each volume ranging from 1 GiB to 16 TiB, for a total volume storage of 512 TiB
Gateway VM can be allocated disks
Volume Storage
For storing the actual data
Can be mapped to on-premises direct-attached storage (DAS) or storage area network (SAN) disks
Upload buffer
Upload buffer acts as a staging area before the data is uploaded to S3
Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in Amazon S3
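To make the per-gateway limits concrete, here is a small sketch that checks a proposed volume layout against them. The limits encoded below are a snapshot of the figures quoted in this post (cached: 32 volumes of up to 32 TiB, 1 PiB total; stored: 32 volumes of up to 16 TiB, 512 TiB total) and change over time.

```python
# Per-gateway volume limits as quoted in this post; AWS revises these
# periodically, so treat them as a snapshot rather than a contract.
LIMITS = {
    "cached": {"max_volumes": 32, "max_volume_tib": 32, "max_total_tib": 1024},
    "stored": {"max_volumes": 32, "max_volume_tib": 16, "max_total_tib": 512},
}

def fits_gateway(mode: str, volume_sizes_tib: list) -> bool:
    """Return True if the proposed volumes fit on a single gateway."""
    limits = LIMITS[mode]
    return (len(volume_sizes_tib) <= limits["max_volumes"]
            and all(size <= limits["max_volume_tib"] for size in volume_sizes_tib)
            and sum(volume_sizes_tib) <= limits["max_total_tib"])

print(fits_gateway("stored", [16] * 32))  # True: exactly 512 TiB across 32 volumes
print(fits_gateway("stored", [17]))       # False: stored volumes cap at 16 TiB
print(fits_gateway("cached", [33]))       # False: cached volumes cap at 32 TiB
```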
Tape Gateway – Gateway-Virtual Tape Library (VTL)
Tape Gateway offers a durable, cost-effective data archival solution.
VTL interface can help leverage existing tape-based backup application infrastructure to store data on virtual tape cartridges created on the tape gateway.
Each Tape Gateway is preconfigured with a media changer and tape drives, which are available to the existing client backup applications as iSCSI devices. Tape cartridges can be added as needed to archive the data.
Gateway-VTL provides a virtual tape infrastructure that scales seamlessly with the business needs and eliminates the operational burden of provisioning, scaling, and maintaining a physical tape infrastructure.
Gateway VTL has the following components:
Virtual Tape
Virtual tape is similar to the physical tape cartridge, except that the data is stored in the AWS storage solution
Each gateway can contain up to 1,500 tapes or up to 1 PiB of total tape data, with each tape ranging from 100 GiB to 2.5 TiB
Virtual Tape Library
Virtual tape library is similar to the physical tape library, with tape drives (replaced by VTL tape drives) and robotic arms (replaced by a media changer)
Tapes in the virtual tape library are backed by S3
Backup software writes data to the gateway; the gateway stores the data locally and then asynchronously uploads it to virtual tapes in S3.
Archive OR Virtual Tape Shelf
Virtual tape shelf is similar to the offsite tape holding facility
Tapes in the virtual tape shelf are backed by Glacier, providing an extremely low-cost storage service for data archiving and backup
The VTS is located in the same region where the gateway was created, and every region has a single VTS irrespective of the number of gateways
Archiving tapes
When the backup software ejects a tape, the gateway moves the tape to the VTS for long term storage
Retrieving tapes
A tape can be retrieved from the VTS only by first retrieving it to the VTL; the retrieved tape is available in the VTL in about 24 hours
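As a sketch of this retrieval step, the helper below builds the arguments for boto3's storagegateway.retrieve_tape_archive(), which moves an archived tape from the VTS back into a gateway's VTL. The ARNs shown are placeholders.

```python
# Sketch: retrieving an archived tape from the VTS (Glacier) back into a
# gateway's VTL so backup software can read it again. Retrieval takes on
# the order of a day before the tape is available in the VTL.
def retrieve_tape_params(tape_arn: str, gateway_arn: str) -> dict:
    """Build keyword arguments for storagegateway.retrieve_tape_archive()."""
    return {"TapeARN": tape_arn, "GatewayARN": gateway_arn}

# Placeholder ARNs for illustration only.
params = retrieve_tape_params(
    "arn:aws:storagegateway:us-east-1:123456789012:tape/TEST04A2A1",
    "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE")
print(params["TapeARN"])

# To actually retrieve (requires AWS credentials and boto3):
# import boto3
# boto3.client("storagegateway").retrieve_tape_archive(**params)
```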
Gateway VM can be allocated disks for
Cache storage
Cache storage acts as the on-premises durable storage; it stores the data before uploading it to S3.
Cache storage also stores recently read data for low-latency access
Upload buffer
Upload buffer acts as a staging area before the data is uploaded to the Virtual tape.
Gateway uploads data over an encrypted Secure Sockets Layer (SSL) connection to AWS, where it is stored encrypted in S3.
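Tape cartridges are added to a tape gateway through the CreateTapes API. The sketch below validates the 100 GiB to 2.5 TiB tape size range quoted above and builds arguments for boto3's storagegateway.create_tapes(); the gateway ARN and barcode prefix are placeholders.

```python
import uuid

GIB = 1024 ** 3  # bytes in a gibibyte

# Sketch: adding virtual tape cartridges with storagegateway.create_tapes().
# A virtual tape can range from 100 GiB to 2.5 TiB (2,560 GiB).
def create_tapes_request(gateway_arn: str, tape_size_gib: int,
                         count: int, barcode_prefix: str) -> dict:
    if not 100 <= tape_size_gib <= 2560:
        raise ValueError("tape size must be between 100 GiB and 2.5 TiB")
    return {
        "GatewayARN": gateway_arn,
        "TapeSizeInBytes": tape_size_gib * GIB,
        "ClientToken": uuid.uuid4().hex,   # idempotency token
        "NumTapesToCreate": count,
        "TapeBarcodePrefix": barcode_prefix,
    }

# Placeholder ARN and prefix for illustration only.
req = create_tapes_request(
    "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE",
    200, 5, "BKP")
print(req["TapeSizeInBytes"])  # 214748364800 (200 GiB)

# To actually create the tapes (requires AWS credentials and boto3):
# import boto3
# boto3.client("storagegateway").create_tapes(**req)
```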
AWS Certification Exam Practice Questions
Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
AWS services are updated every day, so both the questions and answers might soon be outdated; research accordingly.
AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
Open to further feedback, discussion, and correction.
Which of the following services natively encrypts data at rest within an AWS region? Choose 2 answers
AWS Storage Gateway
Amazon DynamoDB
Amazon CloudFront
Amazon Glacier
Amazon Simple Queue Service
What does the AWS Storage Gateway provide?
It allows to integrate on-premises IT environments with Cloud Storage
A direct encrypted connection to Amazon S3.
It’s a backup solution that provides an on-premises Cloud storage.
It provides an encrypted SSL endpoint for backups in the Cloud.
You’re running an application on-premises due to its dependency on non-x86 hardware and want to use AWS for data backup. Your backup application is only able to write to POSIX-compatible block-based storage. You have 140TB of data and would like to mount it as a single folder on your file server. Users must be able to access portions of this data while the backups are taking place. What backup solution would be most appropriate for this use case?
Use Storage Gateway and configure it to use Gateway Cached volumes.
Configure your backup software to use S3 as the target for your data backups.
Configure your backup software to use Glacier as the target for your data backups
Use Storage Gateway and configure it to use Gateway Stored volumes (Data is hosted on the on-premises server as well; the 140 TB requirement refers to the on-premises file server, not AWS, and is there to confuse. Only a backup solution is needed, hence stored rather than cached volumes.)
A customer has a single 3-TB volume on-premises that is used to hold a large repository of images and print layout files. This repository is growing at 500 GB a year and must be presented as a single logical volume. The customer is becoming increasingly constrained with their local storage capacity and wants an off-site backup of this data, while maintaining low-latency access to their frequently accessed data. Which AWS Storage Gateway configuration meets the customer requirements?
Gateway-Cached volumes with snapshots scheduled to Amazon S3
Gateway-Stored volumes with snapshots scheduled to Amazon S3
Gateway-Virtual Tape Library with snapshots to Amazon S3
Gateway-Virtual Tape Library with snapshots to Amazon Glacier
You have a proprietary data store on-premises that must be backed up daily by dumping the data store contents to a single compressed 50GB file and sending the file to AWS. Your SLAs state that any dump file backed up within the past 7 days can be retrieved within 2 hours. Your compliance department has stated that all data must be held indefinitely. The time required to restore the data store from a backup is approximately 1 hour. Your on-premise network connection is capable of sustaining 1gbps to AWS. Which backup methods to AWS would be most cost-effective while still meeting all of your requirements?
Send the daily backup files to Glacier immediately after being generated (will not meet the RTO)
Transfer the daily backup files to an EBS volume in AWS and take daily snapshots of the volume (Not cost effective)
Transfer the daily backup files to S3 and use appropriate bucket lifecycle policies to send to Glacier (Store in S3 for seven days and then archive to Glacier)
Host the backup files on a Storage Gateway with Gateway-Cached Volumes and take daily snapshots (Not cost-effective, as it uses local storage as well as S3 storage)
A customer implemented AWS Storage Gateway with a gateway-cached volume at their main office. An event takes the link between the main and branch office offline. Which methods will enable the branch office to access their data? Choose 3 answers
Use a HTTPS GET to the Amazon S3 bucket where the files are located (gateway volumes are only accessible from the AWS Storage Gateway and cannot be directly accessed using Amazon S3 APIs)
Restore by implementing a lifecycle policy on the Amazon S3 bucket.
Make an Amazon Glacier Restore API call to load the files into another Amazon S3 bucket within four to six hours.
Launch a new AWS Storage Gateway instance AMI in Amazon EC2, and restore from a gateway snapshot
Create an Amazon EBS volume from a gateway snapshot, and mount it to an Amazon EC2 instance.
Launch an AWS Storage Gateway virtual iSCSI device at the branch office, and restore from a gateway snapshot
A company uses on-premises servers to host its applications. The company is running out of storage capacity. The applications use both block storage and NFS storage. The company needs a high-performing solution that supports local caching without rearchitecting its existing applications.Which combination of actions should a solutions architect take to meet these requirements? (Choose two.)
Mount Amazon S3 as a file system to the on-premises servers.
Deploy an AWS Storage Gateway file gateway to replace NFS storage.
Deploy AWS Snowball Edge to provision NFS mounts to on-premises servers.
Deploy an AWS Storage Gateway volume gateway to replace the block storage.
Deploy Amazon Elastic File System (Amazon EFS) volumes and mount them to on-premises servers.
Q3 # Looking to store block storage. But if you use Storage Gateway, it will store the data in S3, which is used for object storage. And the question mentions 140 TB mounted as a single folder; in that case Storage Gateway is not the right solution, based on the below limitations
Each gateway-cached volume can store up to 32 TB of data. Data written to the volume is cached on your on-premises hardware and asynchronously uploaded to AWS for durable storage.
Each gateway-stored volume can store up to 16 TB of data. Data written to the volume is stored on your on-premises hardware and asynchronously backed up to AWS for point-in-time snapshots.
Please correct me if I am wrong.
Hi Senthil, good catch. S3 is object storage and isn’t meant to serve as a standalone, POSIX-compliant file system so glacier seems to be the only option.
But the only thing unclear is the ability to be able to access data while the backup is happening.
block-based storage = EBS, or in this case storage GW
mount “it” as a single folder; here “it” I guess means “storage”. In other words, it means “mount the storage as a single folder”, which still refers to Storage Gateway
finally, accessing data while backup = this matches the snapshot feature, b/c while a snapshot of an AWS EBS volume is taking place, the data is accessible. Actually, Storage Gateway uses the EBS snapshot mechanism in the background when backing up data.
any comments?
Just curious, size of volume aside (let’s say the volume is 5TB)… the question states that the application is running on non-x86 HW. Let’s say it’s SPARC/Solaris. Doesn’t Storage Gateway present only x86 Linux support? If so, then there would be no answer that would work.
For Q#3
From documentation
Gateway-stored volumes can range from 1 GiB to 16 TiB in size and must be rounded to the nearest GiB. Each gateway configured for gateway-stored volumes can support up to 32 volumes and a total volume storage of 512 TiB (0.5 PiB).
That will prevent from mounting 140TB of data as a single folder
Gateway-cached volumes can range from 1 GiB to 32 TiB in size and must be rounded to the nearest GiB. Each gateway configured for gateway-cached volumes can support up to 32 volumes for a total maximum storage volume of 1,024 TiB (1 PiB).
I am thinking it must be A then??
And even
C is definitely not correct.
“Users must be able to access portions of this data while the backups are taking place.”
To Glacier? It takes 3–4 hours to retrieve data.
Q5. With a 1gbps connection it takes only 10 minutes or so to download a 50GB file (even assuming only 50% line utilization). Therefore the simpler c) satisfies this RTO in a more cost-effective way, and the combination of plain S3 and Glacier handles the compliance adequately. Do you agree?
Agreed, actually the confusing line in the question is actually “The time required to restore the data store from a backup is approximately 1 hour”, which is the time it takes to get the data from the backup and not the RTO which is 2 hours only for the 7 days objects and just need to store the data indefinitely.
Expedited retrievals allow you to quickly access your data when occasional urgent requests for a subset of archives are required. For all but the largest archives (250MB+), data accessed using Expedited retrievals are typically made available within 1 – 5 minutes. There are two types of Expedited retrievals: On-Demand and Provisioned. On-Demand requests are like EC2 On-Demand instances and are available the vast majority of the time. Provisioned requests are guaranteed to be available when you need them.
Yup, it’s the latest enhancement with Glacier where you can do expedited retrievals. Not sure if the AWS exams keep up the pace with the latest enhancements.
We are using a volume cached Storage Gateway. We have 32 TB snapshots and we need to restore now. I didn’t find any direct documentation for restoring this. Any inputs will be appreciated.
Hi Raj, can’t these EBS snapshots be directly used to create volumes and attached to an EC2 instance?
I don’t think any EC2 instance supports EBS volumes larger than 16 TB. I tried and couldn’t create one.
Can I just say, Q1 was in my CSA exam and these scenario questions will do great for people studying for the exam.
Though can you just make the answers at the bottom instead of putting them in bold text?
Cheers!
Thanks Max for the feedback, I am trying to implement a format of that sort but it would take time.
Hi Guys,
Are the answers now correct? I see a lot of discussion around Q3 and Q5.
What are the right answers?
Thanks
Pradeep
Correction: Gateway-stored volumes support 32 volumes of 16 TiB each now
Hi Shankar, these limits keep on changing literally every quarter :). Will update the same.
Please update File Storage Gateway on this page
Please explain Q.5, I have a lot of confusion.
can you please mention which part is causing confusion ?
Please update File Storage Gateway on this page
Sure Omr, will check.
Each gateway configured for gateway-stored volumes can support up to 12 32 volumes, ranging from 1GiB to 16TiB, and a total volume storage of 192 TiB ( change this to 512 TiB)
Q1: Storage Gateway: “All data transferred between the gateway and AWS storage is encrypted using SSL. By default, all data stored by volume gateway in S3 is encrypted server-side with Amazon S3-Managed Encryption Keys (SSE-S3).” I believe this is saying that S3 will encrypt data at rest, not Storage Gateway. https://aws.amazon.com/storagegateway/faqs/
Also regarding Q1: “Amazon DynamoDB offers fully managed encryption at rest. DynamoDB encryption at rest provides enhanced security by encrypting your data at rest using an AWS Key Management Service (AWS KMS) managed encryption key for DynamoDB. This functionality eliminates the operational burden and complexity involved in protecting sensitive data.” https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html
Storage Gateway and Glacier provide encryption at rest by default.
With the latest AWS enhancements, DynamoDB and SQS also allow encryption of data, but on demand.
Thanks for this great blog
one typo
in Gateway stored volumes:
“Gateway-cached volumes can be attached as iSCSI devices from on-premises application servers”
should be
“Gateway-stored volumes can be attached as iSCSI devices from on-premises application servers”
thanks me2resh, updated the same.
Really nice and extensive article for AWS storage gateway!
thumbs up for the great effort.
This needs to be updated .. ” .. Each gateway configured for gateway-stored volumes can support up to 32 volumes, ranging from 1GiB to 16TiB, and a total volume storage of 192 TiB” .. total volume storage now is 512TiB (32×16) for volume storage & 1024Tib for Cached Volumes.
Thanks Venugopal, the post has been updated.
Hi Jayendra,
https://aws.amazon.com/storagegateway/faqs/
For gateway stored volumes, total capacity now stands at 512 TiB. Please update.
thanks Amit, the capacity just keeps on growing 🙂