AWS DataSync

  • AWS DataSync is an online data movement service that simplifies, automates, and accelerates moving data between on-premises storage, other cloud providers, and AWS Storage services.
  • DataSync provides end-to-end security, including encryption and integrity validation.
  • DataSync automates both the management of data-transfer processes and the infrastructure required for high-performance and secure data transfer.
  • DataSync uses a purpose-built network protocol and a parallel, multi-threaded architecture to accelerate the transfers.
  • A DataSync agent is a VM or EC2 instance that AWS DataSync uses to read from or write to a storage system. Agents are commonly used when copying data from on-premises storage to AWS.
  • For transfers between AWS storage services (same or cross-region), no agent is required — data remains within the AWS network.
  • A DataSync transfer is defined by a task; a task execution is an individual run of that task.
  • A task configuration specifies the source and destination locations, the schedule, and how the task handles metadata, deleted files, and permissions.
  • Task scheduling automatically runs tasks on the configured schedule with hourly, daily, or weekly options.
  • Each time a task is started it performs an incremental copy, transferring only the changes from the source to the destination.
  • If a task is interrupted, for instance, if the network connection goes down or the agent is restarted, the next run of the task will transfer missing files, and the data will be complete and consistent at the end of this run.
  • AWS DataSync can be used over an AWS Direct Connect link to reach either public service endpoints or private VPC endpoints.
  • The amount of network bandwidth that AWS DataSync will use can be controlled by configuring the built-in bandwidth throttle.
  • DataSync supports both IPv4 and IPv6 addresses for connecting with the service and for data transfers with supported data sources.
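The task settings described above (locations, schedule, handling of deleted files, bandwidth throttle) correspond to parameters of the DataSync CreateTask API. The following is a minimal sketch that only builds the request dictionary one might pass to the boto3 `datasync` client's `create_task` call. The helper name `build_task_request`, the task name, the ARNs, and the example cron expression are placeholders chosen for illustration, and the field names should be checked against the current API reference before use.

```python
def build_task_request(source_arn, dest_arn, schedule_expression=None,
                       max_bytes_per_second=None, keep_deleted=True):
    """Build keyword arguments for a DataSync CreateTask call (hypothetical helper)."""
    options = {
        # Incremental copy: transfer only data that differs from the destination.
        "TransferMode": "CHANGED",
        # Keep or remove destination files that no longer exist at the source.
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted else "REMOVE",
    }
    if max_bytes_per_second is not None:
        # Built-in bandwidth throttle, expressed in bytes per second.
        options["BytesPerSecond"] = int(max_bytes_per_second)

    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": "example-task",  # placeholder name
        "Options": options,
    }
    if schedule_expression is not None:
        request["Schedule"] = {"ScheduleExpression": schedule_expression}
    return request


# Placeholder ARNs; real values come from the locations created beforehand.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    schedule_expression="cron(0 * * * ? *)",  # assumed hourly schedule
    max_bytes_per_second=10 * 1024 * 1024,    # roughly 10 MiB/s
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("datasync").create_task(**request)
```

Keeping the request construction separate from the API call makes the settings easy to review or unit test without touching an AWS account.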

DataSync Task Modes

  • DataSync tasks run in one of two modes: Enhanced mode or Basic mode.
  • Enhanced mode
    • Transfers virtually unlimited numbers of files or objects with higher performance than Basic mode.
    • Optimizes the data transfer process by listing, preparing, transferring, and verifying data in parallel.
    • Supports transfers between Amazon S3 locations, cross-cloud transfers (Azure Blob, Google Cloud Storage, Oracle Cloud Object Storage) without an agent, and transfers between NFS/SMB file servers and Amazon S3 using an Enhanced mode agent.
    • Not subject to the same file/object count quotas as Basic mode.
  • Basic mode
    • Transfers files or objects between AWS storage and all other supported DataSync locations.
    • Subject to quotas on the number of files, objects, and directories in a dataset.
    • Sequentially prepares, transfers, and verifies data, making it slower than Enhanced mode for most workloads.

DataSync Supported Locations

  • Network File System (NFS) file servers
  • Server Message Block (SMB) file servers
  • Hadoop Distributed File System (HDFS)
  • Object storage systems
  • Amazon S3 (including S3 on Outposts)
  • Amazon EFS file systems
  • Amazon FSx for Windows File Server file systems
  • Amazon FSx for Lustre file systems
  • Amazon FSx for OpenZFS file systems
  • Amazon FSx for NetApp ONTAP file systems
  • Microsoft Azure Blob Storage
  • Google Cloud Storage
  • Oracle Cloud Object Storage
  • Other S3-compatible cloud storage (Wasabi, DigitalOcean Spaces, etc.)

⚠️ Note: AWS Snowcone devices were discontinued effective November 12, 2024. DataSync on Snowcone is no longer available for new orders. Existing customers had support until November 12, 2025. Use DataSync online transfers or AWS Data Transfer Terminal as alternatives.

Cross-Cloud Transfers

  • DataSync Enhanced mode supports agentless cross-cloud transfers, enabling direct data movement between other cloud providers and Amazon S3 without deploying a DataSync agent.
  • Supported cross-cloud sources include Microsoft Azure Blob Storage, Google Cloud Storage, Oracle Cloud Object Storage, and other S3-compatible storage services.
  • Cross-cloud transfers provide higher performance and scalability compared to agent-based Basic mode transfers.
  • Simplifies multi-cloud data migration and data pipeline workflows.

DataSync Manifests

  • DataSync supports manifests, enabling users to provide a definitive list of source files or objects to be transferred by a task.
  • Using manifests, task execution times can be decreased by specifying only the files or objects that need to be processed.
  • The maximum allowable size for a manifest file with Enhanced mode tasks is 20 GB.
  • Useful for scenarios with hundreds of millions of objects where only a specific subset needs to be transferred.
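As an illustration of the manifest idea, the sketch below writes a CSV manifest with one source key per row and builds a `ManifestConfig` structure of the kind accepted by the DataSync task APIs. The helper names, bucket ARN, role ARN, and object keys are hypothetical, and the exact field names should be confirmed against the API reference.

```python
import csv
import io


def build_manifest_csv(keys):
    """Return manifest text with one source file or object key per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        writer.writerow([key])
    return buffer.getvalue()


def build_manifest_config(bucket_arn, manifest_key, role_arn):
    """Describe where the uploaded manifest lives (hypothetical helper)."""
    return {
        "Action": "TRANSFER",
        "Format": "CSV",
        "Source": {
            "S3": {
                "ManifestObjectPath": manifest_key,
                "S3BucketArn": bucket_arn,
                "BucketAccessRoleArn": role_arn,
            }
        },
    }


# Placeholder keys and ARNs for illustration only.
manifest_text = build_manifest_csv(["photos/2024/a.jpg", "photos/2024/b.jpg"])
manifest_config = build_manifest_config(
    "arn:aws:s3:::example-manifest-bucket",
    "manifests/subset.csv",
    "arn:aws:iam::111122223333:role/example-datasync-manifest-role",
)
```

The manifest text would be uploaded to the S3 path named in the configuration, and the configuration passed along when the task is created or started.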

DataSync Task Reports

  • Task reports provide detailed JSON-formatted output files about what DataSync attempts to transfer, skip, verify, and delete during a task execution.
  • Reports include a summary and detailed reports for files transferred, skipped, verified, and deleted.
  • Task reports are generated after the completion of transfer tasks and stored in an Amazon S3 bucket.
  • Useful for tracking and auditing data transfers, monitoring chain of custody, and troubleshooting transfer errors.
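A task report is requested through a `TaskReportConfig` structure when a task is created or started. The sketch below builds such a structure as a plain dictionary. The helper name, bucket ARN, role ARN, and subdirectory are placeholders, and the field names and allowed values should be verified against the current DataSync API reference.

```python
def build_task_report_config(bucket_arn, role_arn, subdirectory="reports/",
                             summary_only=False, errors_only=False):
    """Build a TaskReportConfig dictionary (hypothetical helper)."""
    config = {
        "Destination": {
            "S3": {
                "S3BucketArn": bucket_arn,
                "BucketAccessRoleArn": role_arn,
                "Subdirectory": subdirectory,
            }
        },
        # SUMMARY_ONLY yields just the summary; STANDARD adds detailed reports.
        "OutputType": "SUMMARY_ONLY" if summary_only else "STANDARD",
    }
    if not summary_only:
        # Detailed reports can cover everything or only the items with errors.
        config["ReportLevel"] = (
            "ERRORS_ONLY" if errors_only else "SUCCESSES_AND_ERRORS"
        )
    return config


# Placeholder ARNs for illustration only.
report_config = build_task_report_config(
    "arn:aws:s3:::example-report-bucket",
    "arn:aws:iam::111122223333:role/example-datasync-report-role",
)
```

For an audit or chain-of-custody use case, the standard output with both successes and errors gives the most complete record, at the cost of larger report files.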

DataSync Security Features

  • DataSync integrates with AWS Secrets Manager for credential management across all location types, including HDFS, Amazon FSx for Windows File Server, and Amazon FSx for NetApp ONTAP.
  • Supports VPC endpoint policies for fine-grained access control and FIPS-enabled VPC endpoints in FIPS-enabled AWS Regions.
  • Data is encrypted in transit using TLS.
  • Supports Kerberos authentication for enhanced SMB file transfer security.
  • Location configurations can be updated without recreating locations, simplifying credential rotation and maintenance.

DataSync Agents

  • An agent is required for transfers between on-premises storage and AWS.
  • No agent is needed for transfers between AWS storage services (same or cross-region) — data remains in the AWS network.
  • Enhanced mode supports agentless cross-cloud transfers (Azure Blob, Google Cloud Storage, Oracle Cloud Object Storage to Amazon S3).
  • Enhanced mode agents are available for on-premises NFS/SMB to Amazon S3 transfers with improved performance.
  • Version 1 DataSync agents were discontinued on December 7, 2023 — affected agents must be replaced.
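The agent rules above can be summarized as a small decision function. This is a simplified sketch of the bullets in this section, not an official rule set: the location labels and the helper `agent_required` are invented for illustration, and real deployments have more cases (for example, self-managed object storage).

```python
# Hypothetical location labels used only in this sketch.
AWS_STORAGE = {"s3", "efs", "fsx"}
OTHER_CLOUD = {"azure_blob", "gcs", "oracle_object_storage"}


def agent_required(source, destination, task_mode="BASIC"):
    """Return True if the transfer needs a DataSync agent, per the notes above."""
    pair = {source, destination}

    # AWS-to-AWS transfers stay inside the AWS network: no agent.
    if pair <= AWS_STORAGE:
        return False

    # Enhanced mode moves data between another cloud and Amazon S3 without an agent.
    if task_mode == "ENHANCED" and "s3" in pair and pair & OTHER_CLOUD:
        return False

    # Everything else in this sketch (on-premises NFS/SMB/HDFS, or other clouds
    # in Basic mode) is treated as agent-based.
    return True


examples = {
    ("nfs", "s3", "ENHANCED"): agent_required("nfs", "s3", "ENHANCED"),
    ("s3", "s3", "BASIC"): agent_required("s3", "s3", "BASIC"),
    ("gcs", "s3", "ENHANCED"): agent_required("gcs", "s3", "ENHANCED"),
    ("gcs", "s3", "BASIC"): agent_required("gcs", "s3", "BASIC"),
}
```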

ℹ️ Note: DataSync Discovery was discontinued on May 20, 2025 and is no longer available as a DataSync feature.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the questions and answers might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A company is migrating its applications to AWS. Currently, applications that run on-premises generate hundreds of terabytes of data that is stored on a shared file system. The company is running an analytics application in the cloud that runs hourly to generate insights from this data. The company needs a solution to handle the ongoing data transfer between the on-premises shared file system and Amazon S3. The solution also must be able to handle occasional interruptions in internet connectivity. Which solution should the company use for the data transfer to meet these requirements?
    1. AWS DataSync
    2. AWS Migration Hub
    3. AWS Snowball Edge Storage Optimized
    4. AWS Transfer for SFTP
  2. A company needs to transfer 500 million objects from Google Cloud Storage to Amazon S3 with minimal operational overhead. The transfer should be completed as fast as possible without deploying any infrastructure. Which approach best meets these requirements?
    1. Deploy a DataSync agent on a Google Cloud VM and use Basic mode tasks
    2. Use DataSync Enhanced mode with agentless cross-cloud transfer directly from Google Cloud Storage to Amazon S3
    3. Use AWS Transfer Family with SFTP to pull data from Google Cloud Storage
    4. Use S3 Batch Operations to copy from Google Cloud Storage
  3. A company uses AWS DataSync to replicate data between Amazon S3 buckets in different AWS Regions. The dataset contains over 100 million objects and the company needs improved transfer performance and scalability. Which DataSync configuration should be used?
    1. Use Basic mode with multiple parallel tasks
    2. Deploy a DataSync agent in each Region
    3. Use Enhanced mode for the S3-to-S3 transfer task
    4. Enable S3 Transfer Acceleration on both buckets
  4. An organization needs to transfer only a specific set of 10,000 files from an on-premises NFS server containing millions of files to Amazon S3. They want to minimize the time spent scanning the source. Which DataSync feature should they use?
    1. Task filtering with include patterns
    2. DataSync manifests to specify the exact list of files to transfer
    3. Schedule the task during off-peak hours
    4. Use Enhanced mode with no manifest
  5. A security team requires detailed audit reports of all files transferred, skipped, and deleted during a DataSync task execution for compliance purposes. Which DataSync feature provides this capability?
    1. CloudWatch Logs integration
    2. AWS CloudTrail data events
    3. DataSync task reports
    4. Amazon S3 server access logs
