AWS FSx for Lustre

  • Amazon FSx for Lustre is a fully managed service that makes it easy and cost-effective to launch and run Lustre, the world’s most popular high-performance file system.
  • Lustre is an open-source file system designed for applications that require fast storage – where you want the storage to keep up with the compute.
  • FSx handles the traditional complexity of setting up and managing high-performance Lustre file systems.
  • FSx for Lustre is POSIX-compliant and can be used with existing Linux-based applications without having to make any changes.
  • FSx for Lustre provides a native file system interface and works as any file system does with your Linux operating system.
  • FSx for Lustre provides read-after-write consistency and supports file locking.
  • FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux, Amazon Linux 2, Red Hat Enterprise Linux (RHEL), CentOS, SUSE Linux and Ubuntu.
  • FSx for Lustre is accessible from compute workloads running on EC2 instances and containers running on EKS.
  • FSx for Lustre can be accessed from a Linux instance by installing the open-source Lustre client and mounting the file system using standard Linux commands.
  • FSx for Lustre is ideal for use cases where speed matters, such as machine learning, high-performance computing (HPC), video processing, financial modelling, genome sequencing, and electronic design automation (EDA).
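The mount step mentioned above can be sketched as follows. This is a minimal sketch that only builds the mount command, assuming the Lustre client is already installed on the instance. The DNS name, mount name, and mount point are hypothetical placeholders; the real values come from the file system's details in the console or API.

```python
import shlex


def lustre_mount_command(dns_name, mount_name, mount_point):
    """Build the argv for mounting an FSx for Lustre file system."""
    # A Lustre mount source has the form <dns_name>@tcp:/<mount_name>.
    source = f"{dns_name}@tcp:/{mount_name}"
    # "flock" enables file locking; "relatime" limits access-time updates.
    return ["sudo", "mount", "-t", "lustre", "-o", "relatime,flock", source, mount_point]


# Hypothetical placeholder values, for illustration only.
cmd = lustre_mount_command(
    "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com", "fsx", "/mnt/fsx"
)
print(shlex.join(cmd))
```

The function returns an argument list rather than a single string so that it could be handed to `subprocess.run` without shell quoting concerns.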

FSx for Lustre Deployment Options

Scratch file systems

  • designed for temporary storage and short-term processing of data.
  • provide high burst throughput of up to six times the baseline throughput of 200 MBps per TiB of storage capacity.
  • data is not replicated and does not persist if a file server fails.
  • ideal for cost-optimized storage for short-term, processing-heavy workloads.
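As a rough worked example of the throughput figures above, the arithmetic can be sketched like this. It uses only the numbers quoted in the list (200 MBps per TiB baseline, burst of up to six times that); the 12 TiB capacity is a hypothetical value chosen for illustration.

```python
BASELINE_MBPS_PER_TIB = 200  # baseline figure quoted in the list above
BURST_MULTIPLIER = 6         # "up to six times the baseline"


def scratch_throughput_mbps(capacity_tib):
    """Return (baseline, maximum burst) throughput in MBps for a capacity in TiB."""
    baseline = capacity_tib * BASELINE_MBPS_PER_TIB
    return baseline, baseline * BURST_MULTIPLIER


# Hypothetical 12 TiB scratch file system.
baseline, burst = scratch_throughput_mbps(12)
print(f"baseline: {baseline} MBps, burst up to: {burst} MBps")
```

Because throughput is quoted per TiB, the aggregate figures grow linearly with the provisioned capacity.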

Persistent file systems

  • designed for long-term storage and workloads.
  • is highly available, and data is automatically replicated within the AZ that is associated with the file system.
  • data volumes attached to the file servers are replicated independently from the file servers to which they are attached.
  • if a file server becomes unavailable, it is replaced automatically within minutes of failure.
  • file systems are continuously monitored for hardware failures, and infrastructure components are automatically replaced in the event of a failure.
  • ideal for workloads that run for extended periods or indefinitely, and that might be sensitive to disruptions in availability.
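The choice between the two deployment options is made when the file system is created. The sketch below builds the request parameters in the shape accepted by the `create_file_system` operation of the boto3 `fsx` client, without actually calling AWS. The helper name `lustre_create_params`, the subnet ID, and the capacity and throughput values are hypothetical choices for illustration, not recommendations.

```python
def lustre_create_params(deployment_type, capacity_gib, subnet_id, per_unit_throughput=None):
    """Build keyword arguments for an FSx for Lustre create_file_system request."""
    lustre_config = {"DeploymentType": deployment_type}
    if deployment_type.startswith("PERSISTENT"):
        # Persistent deployments also specify throughput per TiB of storage.
        if per_unit_throughput is None:
            raise ValueError("persistent deployments need a per-unit throughput")
        lustre_config["PerUnitStorageThroughput"] = per_unit_throughput
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": capacity_gib,  # in GiB
        "SubnetIds": [subnet_id],
        "LustreConfiguration": lustre_config,
    }


# Hypothetical values, for illustration only.
scratch = lustre_create_params("SCRATCH_2", 1200, "subnet-0123456789abcdef0")
persistent = lustre_create_params("PERSISTENT_2", 1200, "subnet-0123456789abcdef0", 125)
print(scratch["LustreConfiguration"])
print(persistent["LustreConfiguration"])
```

With boto3 installed and credentials configured, either dictionary could be passed as `boto3.client("fsx").create_file_system(**params)`; the call is left out here so the sketch runs without an AWS account.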

FSx for Lustre - Scratch vs Persistent

FSx for Lustre with S3

  • Amazon FSx also integrates seamlessly with S3, making it easy to process cloud data sets with the Lustre high-performance file system.
  • An Amazon FSx for Lustre file system transparently presents S3 objects as files and allows changed data to be written back to S3.
  • An Amazon FSx for Lustre file system can be linked with a specified S3 bucket, making the data in the bucket accessible to the file system.
  • S3 objects’ names and prefixes will be visible as files and directories.
  • S3 objects are lazy-loaded by default.
    • FSx automatically loads the corresponding objects from S3 only when first accessed by the applications.
    • Subsequent reads of these files are served directly out of the file system with low, consistent latencies.
    • Amazon FSx for Lustre file system can optionally be batch hydrated.
  • Amazon FSx for Lustre uses parallel data transfer techniques to transfer data from S3 at up to hundreds of GB/s.
  • Files from the file system can be exported back to the S3 bucket.
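The lazy-loading behaviour described above can be illustrated with a toy in-memory model. This is not how FSx is implemented and it uses no AWS API: the class `LazyLinkedFileSystem`, its methods, and the sample keys are all hypothetical, and a plain dictionary stands in for the S3 bucket.

```python
import posixpath


class LazyLinkedFileSystem:
    """Toy model of a file system linked to an object store (not the FSx implementation)."""

    def __init__(self, mount_point, bucket_objects):
        self.mount_point = mount_point
        self._bucket = bucket_objects  # stand-in for the S3 bucket: key -> bytes
        self._loaded = {}              # contents already loaded into the file system
        self.loads_from_bucket = 0

    def list_files(self):
        # Object keys show up as paths; key prefixes become directories.
        return sorted(posixpath.join(self.mount_point, key) for key in self._bucket)

    def read(self, path):
        key = posixpath.relpath(path, self.mount_point)
        if key not in self._loaded:    # first access: load from the bucket
            self._loaded[key] = self._bucket[key]
            self.loads_from_bucket += 1
        return self._loaded[key]       # later reads are served from the file system

    def hydrate_all(self):
        # Batch hydration: load every object up front instead of on first access.
        for path in self.list_files():
            self.read(path)


fs = LazyLinkedFileSystem("/mnt/fsx", {"data/a.txt": b"A", "data/b.txt": b"B"})
fs.read("/mnt/fsx/data/a.txt")
fs.read("/mnt/fsx/data/a.txt")
print(fs.loads_from_bucket)  # 1: only the first read touched the bucket
```

Calling `hydrate_all()` afterwards would load the remaining object, which mirrors the optional batch hydration mentioned in the list.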

FSx for Lustre Security

  • FSx for Lustre encrypts the file system and its backups at rest by default, using KMS.
  • FSx encrypts data in transit only when the file system is accessed from supported EC2 instances.

FSx for Lustre Scalability

  • Amazon FSx for Lustre file systems scale to hundreds of GB/s of throughput and millions of IOPS.
  • FSx for Lustre also supports concurrent access to the same file or directory from thousands of compute instances.
  • FSx for Lustre provides consistent, sub-millisecond latencies for file operations.

FSx for Lustre Availability and Durability

  • On a scratch file system, file servers are not replaced if they fail and data is not replicated.
  • On a persistent file system, if a file server becomes unavailable, it is replaced automatically within minutes.
  • Amazon FSx for Lustre provides a parallel file system, where data is stored across multiple network file servers to maximize performance and reduce bottlenecks, and each server has multiple disks.
  • Amazon FSx takes daily automatic incremental backups of the file systems, and allows manual backups at any point.
  • Backups are highly durable and file-system-consistent.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A solutions architect is designing storage for a high performance computing (HPC) environment based on Amazon Linux. The workload stores and processes a large amount of engineering drawings that require shared storage and heavy computing. Which storage option would be the optimal solution?
    1. Amazon Elastic File System (Amazon EFS)
    2. Amazon FSx for Lustre
    3. Amazon EC2 instance store
    4. Amazon EBS Provisioned IOPS SSD (io1)
  2. A company is planning to deploy a High Performance Computing (HPC) cluster in its VPC that requires a scalable, high performance file system. The storage service must be optimized for efficient workload processing, and the data must be accessible via a fast and scalable file system interface. It should also work natively with Amazon S3 that enables you to easily process your S3 data with a high-performance POSIX interface. Which of the following is the MOST suitable service that you should use for this scenario?
    1. Amazon Elastic File System (Amazon EFS)
    2. Amazon FSx for Lustre
    3. Amazon Elastic Block Store
    4. Amazon EBS Provisioned IOPS SSD (io1)
