S3 Vectors & S3 Tables – AI-Native Storage

Q: What is Amazon S3 Vectors?

S3 Vectors is purpose-built, cost-optimized storage for AI embeddings. It stores up to 2 billion vectors per index, returns warm queries in as little as 100 ms, and can cost up to 90% less than specialized vector databases. It integrates natively with Amazon Bedrock and Amazon SageMaker.

Q: What is Amazon S3 Tables?

S3 Tables provides fully managed Apache Iceberg tables in S3. It automates compaction, snapshot management, and unreferenced file removal, and its tables can be queried through Athena, Redshift, and EMR, with built-in Intelligent-Tiering for storage.

Q: When should I use S3 Vectors vs OpenSearch?

Use S3 Vectors for cost-optimized vector storage at scale (billions of embeddings) when similarity search with metadata filtering is enough and latency of roughly 100 ms to 1 second is acceptable. Use OpenSearch when you need hybrid search (vector + keyword), complex filtering, or real-time search with single-digit millisecond latency.

July 1, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

Amazon S3 Vectors & S3 Tables Overview

Amazon S3 has grown beyond simple object storage into a storage platform for AI and analytics workloads through two services: Amazon S3 Vectors and Amazon S3 Tables. Both bring purpose-built capabilities directly into S3, reducing the need for separate specialized databases while keeping S3’s durability, scalability, and cost efficiency.

S3 Vectors (GA December 2025) – The first cloud object storage with native vector support for AI workloads, semantic search, and RAG applications
S3 Tables (GA December 2024) – Fully managed Apache Iceberg tables optimized for analytics workloads with automatic maintenance

Amazon S3 Vectors

What is Amazon S3 Vectors?

Amazon S3 Vectors is the first cloud object store with native support to store and query vector embeddings, providing purpose-built, cost-optimized vector storage for AI agents, AI inference, and semantic search. It reduces the total cost of uploading, storing, and querying vectors by up to 90% compared to specialized vector database solutions.

Key Features

Massive Scale – Store up to 2 billion vectors per index, 10,000 indexes per bucket (up to 20 trillion vectors total)
Sub-second Performance – Warm query latency as low as 100ms; infrequent queries return in under 1 second
Up to 90% Cost Reduction – Lower total cost of uploading, storing, and querying vectors compared to specialized vector databases
Fully Serverless – No infrastructure to provision or manage; pay only for what you use
Strong Consistency – Subsequent queries always include the most recently added data
Metadata Filtering – Up to 50 metadata keys per vector (up to 10 of them non-filterable), so similarity search can be combined with attribute filters
Distance Metrics – Supports Cosine and Euclidean distance metrics
Encryption – SSE-S3 (default) or SSE-KMS with custom keys at bucket or index level
High Write Throughput – Up to 1,000 PUT transactions per second for streaming updates
Top-K Results – Return up to 100 search results per query
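
The two distance metrics named above can be illustrated with a minimal sketch. This is plain Python, not the S3 Vectors API; the function names are hypothetical, and defining cosine distance as 1 minus cosine similarity is an assumption about the convention, not something the service documentation is quoted on here.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed convention); 0.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line (L2) distance between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine distance ignores vector length and compares direction only, which is why it is a common choice for text embeddings; Euclidean distance is sensitive to magnitude as well.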

Architecture Components

Component	Description
Vector Bucket	A new S3 bucket type dedicated to vector storage with vector-specific APIs
Vector Index	Container for vectors within a bucket; defines dimensionality and distance metric
Vectors	Individual embeddings with key, data (float32), and optional metadata
Metadata	Key-value pairs attached to vectors; filterable and non-filterable types
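
To make the relationship between these components concrete, here is a toy in-memory model. It is a sketch only: `ToyVectorIndex` and its methods are hypothetical names, it does brute-force cosine search, and it is not how S3 Vectors is implemented or called. It shows an index fixing the dimension, vectors carrying a key, data, and metadata, and a query combining a metadata filter with top-K similarity.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def _cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class ToyVectorIndex:
    """Hypothetical stand-in for a vector index: fixed dimension, cosine only."""
    dimension: int
    vectors: Dict[str, Tuple[List[float], Dict[str, str]]] = field(default_factory=dict)

    def put(self, key: str, data: List[float],
            metadata: Optional[Dict[str, str]] = None) -> None:
        # The index, not the vector, decides the allowed dimensionality.
        if len(data) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(data)}")
        self.vectors[key] = (list(data), dict(metadata or {}))

    def query(self, query_vector: List[float], top_k: int = 3,
              metadata_filter: Optional[Dict[str, str]] = None) -> List[Tuple[str, float]]:
        if len(query_vector) != self.dimension:
            raise ValueError("query vector has the wrong dimension")
        wanted = metadata_filter or {}
        scored = []
        for key, (data, metadata) in self.vectors.items():
            # Skip vectors whose metadata does not match every filter entry.
            if any(metadata.get(k) != v for k, v in wanted.items()):
                continue
            scored.append((key, _cosine_distance(query_vector, data)))
        scored.sort(key=lambda item: item[1])  # smallest distance first
        return scored[:top_k]
```

For example, after putting `doc-a` at `[1.0, 0.0]` and `doc-b` at `[0.0, 1.0]`, a query for `[1.0, 0.0]` ranks `doc-a` first because its cosine distance to the query is zero.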

AWS Service Integrations

Amazon Bedrock Knowledge Bases – Native integration for cost-effective RAG applications; select S3 Vectors as the vector store when creating a knowledge base
Amazon OpenSearch Service – Tiered strategy: store long-term vectors in S3 Vectors and export high-priority vectors to OpenSearch for real-time, low-latency search
Amazon SageMaker Unified Studio – Create and manage knowledge bases with S3 Vectors within the unified AI development environment
AWS CloudFormation – Deploy and manage vector resources as infrastructure as code
AWS PrivateLink – Private network connectivity for secure access

S3 Vectors Use Cases

Semantic Search – Search across millions of documents, images, videos, and audio based on meaning rather than keywords

Retrieval Augmented Generation (RAG) – Provide contextual information to LLMs from large document collections at reduced cost
AI Agent Memory – Give AI agents lasting memory by storing every interaction and insight across petabytes of vector data
Media Intelligence – Index video content at frame level (e.g., 5,400+ embeddings per hour of video) for instant scene retrieval

Personalization Engines – Store user preference embeddings for recommendation systems
Medical Image Similarity – Compare medical images (radiology, pathology) against vast databases for diagnosis assistance
Code Search – Navigate large codebases using semantic similarity

S3 Vectors Pricing (US East – N. Virginia)

Dimension	Rate
Storage	$0.06 per GB/month
PUT (Upload)	$0.20 per GB uploaded (min 128KB per PUT)
Query – API Fee	$2.50 per million queries
Query – Data Processed (first 100K vectors)	$0.004 per TB
Query – Data Processed (100K–10M vectors)	$0.002 per TB
Query – Data Processed (10M+ vectors)	$0.0004 per TB
Query – Data Returned	$0.01 per GB (first 512KB/query free)
Vector Bucket	Free to create

Pricing Example: 10 million vectors (6.17 KB each) spread across 40 indexes, with 1 million queries per month each returning the top 100 results, costs approximately $11.38/month.
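
As a rough check on the example, the sketch below computes only two of the line items from the rate table: storage and the per-query API fee. The function names are hypothetical, and treating 1 GB as 1,000,000 KB is an assumption. Upload and data-processed charges depend on details the example does not state, so they are not modeled and the result will not reproduce the full quoted total.

```python
# Rates copied from the pricing table above (US East, N. Virginia).
STORAGE_USD_PER_GB_MONTH = 0.06
QUERY_API_USD_PER_MILLION = 2.50


def storage_cost_usd(vector_count: int, avg_vector_kb: float) -> float:
    """Monthly storage cost; assumes 1 GB = 1,000,000 KB."""
    total_gb = vector_count * avg_vector_kb / 1_000_000
    return total_gb * STORAGE_USD_PER_GB_MONTH


def query_api_cost_usd(queries_per_month: int) -> float:
    """Monthly query API fee only (data processed/returned not included)."""
    return queries_per_month / 1_000_000 * QUERY_API_USD_PER_MILLION


if __name__ == "__main__":
    storage = storage_cost_usd(10_000_000, 6.17)
    api_fee = query_api_cost_usd(1_000_000)
    print(f"storage: ${storage:.2f}, query API fee: ${api_fee:.2f}")
```

Under these assumptions, storage and the query API fee together account for a little over half of the quoted monthly figure.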

Amazon S3 Tables

What are Amazon S3 Tables?

Amazon S3 Tables are fully managed Apache Iceberg tables that remove the operational burden of managing data lakes and lakehouses. Their storage is optimized for analytics workloads and maintained automatically, delivering up to 3x faster query performance and 10x higher transactions per second compared to self-managed Iceberg tables in general purpose S3 buckets.

Key Features

Fully Managed Apache Iceberg – Automatic compaction, snapshot management, and unreferenced file removal
Advanced Compaction Strategies – Binpack (default), sort compaction, and z-order compaction for multi-dimensional queries
S3 Tables Intelligent-Tiering – Automatically moves data to the most cost-effective tier, reducing storage costs by up to 80%
Automatic Replication – Cross-region table replication for reduced query latency and disaster recovery
Iceberg REST Catalog API – Compatible with any Iceberg engine (Spark, Trino, Flink, Athena, Redshift, Snowflake)
Apache Iceberg V3 Support – Deletion vectors for efficient batch updates, reducing write amplification
Table-Level Security – Built-in access control, encryption, and lifecycle management per table
AWS Analytics Integration – Native integration with AWS Glue Data Catalog, Lake Formation, Athena, EMR, and Redshift
MCP Support – AI agents and LLMs can interact with S3 Tables via Model Context Protocol
11 Nines Durability – Designed for 99.999999999% durability and 99.99% availability
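
Snapshot management, listed among the features above, usually follows a retention rule: always keep the newest few snapshots and expire older ones past a maximum age. The sketch below illustrates that rule only. The function name, parameter names, and default values are hypothetical and are not S3 Tables configuration settings.

```python
from typing import List


def snapshots_to_expire(snapshot_ages_hours: List[float],
                        min_to_keep: int = 1,
                        max_age_hours: float = 120.0) -> List[float]:
    """Return ages of snapshots a simple retention rule would expire.

    The newest `min_to_keep` snapshots are always retained, even if old;
    any other snapshot older than `max_age_hours` is expired.
    """
    newest_first = sorted(snapshot_ages_hours)
    candidates = newest_first[min_to_keep:]
    return [age for age in candidates if age > max_age_hours]
```

With snapshots aged 1, 50, 130, and 200 hours and the defaults above, the two oldest would be expired while the 1-hour and 50-hour snapshots remain available for time travel.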

How S3 Tables Work

Create a Table Bucket – A new S3 bucket type purpose-built for tabular data
Create Tables – Define Apache Iceberg tables using SQL (via Athena) or Iceberg REST Catalog API
Ingest Data – Write data using Spark, Flink, Firehose, or any Iceberg-compatible engine
Automatic Maintenance – S3 continuously compacts data files, manages snapshots, and removes unreferenced (orphan) files
Query with Any Engine – Use Athena, Redshift, Spark, Trino, DuckDB, or Snowflake
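
The compaction part of the maintenance step can be pictured as bin packing: many small data files are grouped so each group can be rewritten as one larger file near a target size. The sketch below uses a first-fit-decreasing heuristic. The function name and the 512 MB target are assumptions for illustration, and the source does not describe the algorithm S3 Tables actually uses.

```python
from typing import List


def binpack(file_sizes_mb: List[float], target_mb: float = 512.0) -> List[List[float]]:
    """Group files into bins of at most target_mb (first-fit decreasing)."""
    bins: List[List[float]] = []
    for size in sorted(file_sizes_mb, reverse=True):
        for group in bins:
            # Place the file in the first bin that still has room.
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing bin fits, so start a new one.
            bins.append([size])
    return bins
```

Fewer, larger files mean query engines open fewer objects and read less metadata per scan, which is where the performance benefit of compaction comes from.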

S3 Tables Use Cases

Data Lake Modernization – Migrate from Parquet/Hive/Hadoop to managed Iceberg tables with reduced complexity
Streaming Analytics – Stream data from IoT sensors, transactions, and application logs with near real-time queryability
Big Data Analytics – High-throughput workloads benefiting from 10x higher TPS

AI-Powered Analytics – Query data using natural language through MCP for ad-hoc exploration
Transactional Data Lakes – ACID transactions with time-travel and schema evolution
Compliance & Audit – Immutable audit trails with snapshot history and data versioning

S3 Tables Pricing (US West – Oregon)

Dimension	Rate
Storage (Standard)	$0.0265 per GB/month (first 50TB)
PUT Requests	$0.005 per 1,000 requests
GET Requests	$0.0004 per 1,000 requests
Object Monitoring	$0.025 per 1,000 objects
Compaction – Objects Processed	$0.002 per 1,000 objects
Compaction – Data Processed	$0.005 per GB
Replication – Table Updates	$0.010 per 1,000 table updates
Table Bucket	Free to create

Pricing Example: 1 TB table with 30,000 new files/month, 500K GET requests, and automatic compaction = approximately $28.54/month (storage + requests + monitoring + compaction).
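
The storage and request portions of this example can be recomputed from the rate table with a short sketch. The function name is hypothetical and 1 TB is assumed to be 1,024 GB. Monitoring and compaction charges depend on object counts and data volumes the example does not state, so they are left out.

```python
# Rates copied from the S3 Tables pricing table above (US West, Oregon).
STORAGE_USD_PER_GB_MONTH = 0.0265
PUT_USD_PER_1000 = 0.005
GET_USD_PER_1000 = 0.0004


def storage_and_request_cost_usd(table_gb: float, put_requests: int,
                                 get_requests: int) -> float:
    """Monthly storage plus PUT/GET request cost (maintenance not included)."""
    storage = table_gb * STORAGE_USD_PER_GB_MONTH
    puts = put_requests / 1000 * PUT_USD_PER_1000
    gets = get_requests / 1000 * GET_USD_PER_1000
    return storage + puts + gets


if __name__ == "__main__":
    # Assumes 1 TB = 1,024 GB and one PUT per new file.
    print(f"${storage_and_request_cost_usd(1024, 30_000, 500_000):.2f}")
```

Under these assumptions storage dominates the bill, with requests contributing well under a dollar.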

Vector Database Comparison: S3 Vectors vs OpenSearch vs Pinecone vs pgvector

Feature	S3 Vectors	OpenSearch Service	Pinecone	pgvector (Aurora/RDS)
Max Vectors	2B per index (20T per bucket)	Billions (distributed)	Billions (serverless)	Millions (instance-bound)
Query Latency	~100ms (warm), <1s (cold)	Single-digit ms	~10-50ms	~10-100ms (depends on index)
QPS (Queries/Second)	Low-Medium (optimized for infrequent)	Very High (thousands)	High (thousands)	Medium (hundreds)
Cost at Scale	Very Low (up to 90% cheaper)	High (compute + storage)	High (pod/serverless units)	Medium (instance cost)
Infrastructure	Fully Serverless	Managed (Serverless option)	Fully Serverless	Managed instances
Hybrid Search	Metadata filtering only	Full (keyword + vector + filters)	Metadata filtering	Full SQL + vector
Real-time Updates	Strong consistency, 1K TPS writes	Near real-time	Near real-time	Immediate (ACID)
AWS Integration	Native (Bedrock, OpenSearch, SageMaker)	Native AWS service	Third-party (API)	Native (Aurora/RDS)
Best For	Large-scale, cost-sensitive, infrequent queries	Real-time search, high QPS	Quick start, managed vector DB	Existing PostgreSQL apps, ACID needs

When to Use Each Vector Solution

Choose S3 Vectors when:
- You have large vector datasets (millions to billions) with infrequent query patterns
- Cost optimization is the primary concern
- You need long-term durable storage for vector embeddings
- Building RAG applications with Amazon Bedrock Knowledge Bases
- Implementing a tiered strategy (cold vectors in S3, hot vectors in OpenSearch)
- Query latency of 100ms–1s is acceptable

Choose OpenSearch Service when:
- You need single-digit millisecond latency
- High query throughput (thousands of QPS) is required
- You need hybrid search combining keyword, vector, and structured filters
- Real-time applications like product recommendations or fraud detection

Choose Pinecone when:
- You want a fully managed vector-only database with minimal setup
- Multi-cloud or vendor-neutral strategy is important
- You don’t need tight AWS service integration

Choose pgvector when:
- You already use PostgreSQL and want to add vector search to existing data
- You need ACID transactions combining relational and vector data
- Vector dataset is relatively small (millions, not billions)
- You prefer a single database for both structured queries and similarity search
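
The guidance above can be condensed into a small decision helper. This is a sketch of one possible reading: the function name, parameter names, the order in which rules are checked, and the fallback for latencies between 10 ms and 100 ms are all assumptions, not an official selection rule.

```python
def suggest_vector_store(max_latency_ms: float,
                         needs_keyword_search: bool = False,
                         needs_sql_acid: bool = False,
                         multi_cloud: bool = False) -> str:
    """Map a few requirements to one of the four options discussed above."""
    if needs_sql_acid:
        # Relational data and vectors in one transactional database.
        return "pgvector"
    if needs_keyword_search or max_latency_ms < 10:
        # Hybrid search or single-digit millisecond latency.
        return "OpenSearch Service"
    if multi_cloud:
        return "Pinecone"
    if max_latency_ms >= 100:
        # Latency of 100 ms to 1 s is acceptable, so optimize for cost.
        return "S3 Vectors"
    # Assumed fallback for the 10-100 ms range.
    return "OpenSearch Service"
```

A tiered design simply applies the helper per workload: a recommendation feed with a 5 ms budget maps to OpenSearch Service, while an archive search that tolerates 1,000 ms maps to S3 Vectors.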

Analytics Storage Comparison: S3 Tables vs Athena vs Redshift

Feature	S3 Tables	Amazon Athena	Amazon Redshift
Type	Managed Storage Layer (Iceberg)	Serverless Query Engine	Data Warehouse
Infrastructure	No compute to manage	Serverless (no provisioning)	Provisioned clusters or Serverless
Table Format	Apache Iceberg (managed)	Reads Iceberg, Hive, Delta, Parquet, CSV, JSON	Native columnar + Spectrum for S3/Iceberg
Table Maintenance	Automatic (compaction, snapshots, cleanup)	None (user-managed)	Automatic (VACUUM, ANALYZE)
Query Performance	Depends on engine (Athena, Spark, etc.)	Seconds to minutes (scan-based)	Sub-second to seconds (optimized)
Concurrency	10x higher TPS than self-managed Iceberg	Limited concurrent queries	High (hundreds of concurrent queries)
Pricing Model	Storage + requests + compaction	$5 per TB scanned	Per-node-hour or RPU-hour (Serverless)
Best For	Managed lakehouse storage at scale	Ad-hoc queries, exploration	Complex analytics, dashboards, BI

Understanding the Relationship

S3 Tables is a storage layer, not a query engine. It complements Athena and Redshift rather than replacing them:

S3 Tables + Athena – Best for serverless data lake analytics with automatic table optimization. Athena queries S3 Tables directly with improved performance from automatic compaction.
S3 Tables + Redshift – Best for high-concurrency BI dashboards and complex joins. Redshift reads S3 Tables through Spectrum or direct Iceberg integration.
S3 Tables + Spark (EMR/Glue) – Best for large-scale ETL, streaming ingestion, and ML feature engineering.

When to Use Each Analytics Approach

Choose S3 Tables when:
- You want managed Iceberg tables without maintaining compaction jobs
- Your data lake is queried by multiple engines (Athena + Redshift + Spark)
- You ingest streaming data that needs to be queryable in near real-time
- You want Intelligent-Tiering to automatically optimize storage costs
- You need multi-region data access with automatic replication

Choose Athena (with S3 Tables or general purpose S3) when:
- You need ad-hoc exploration of data in S3
- You prefer a pay-per-query model with no idle costs
- Your queries are simple and don’t require complex joins across large datasets
- You query infrequently, so provisioned resources would be wasteful

Choose Redshift when:
- High-concurrency BI dashboards with sub-second response times
- Complex analytical queries with many joins across large tables
- Predictable, heavy workloads that justify provisioned compute
- Advanced features like materialized views, stored procedures, and ML integration

S3 Vectors vs S3 Tables – Quick Comparison

Aspect	S3 Vectors	S3 Tables
Data Type	Vector embeddings (unstructured → vectors)	Tabular/structured data (rows and columns)
Query Type	Similarity search (nearest neighbor)	SQL analytics (filter, aggregate, join)
Primary Use Case	AI/ML, RAG, semantic search	Data lakes, analytics, BI
Bucket Type	Vector Bucket	Table Bucket
Open Standard	Proprietary S3 Vectors API	Apache Iceberg (open)
GA Date	December 2025	December 2024
Complementary Use	Store embeddings for AI agents and search	Store structured data for analytics queries

Together: S3 Vectors and S3 Tables can work in tandem. For example, store product catalog data in S3 Tables for analytical queries while storing product description embeddings in S3 Vectors for semantic search. Both share S3’s durability, security model, and operational simplicity.

AWS Certification Exam Practice Questions

Question 1

A company needs to store 500 million vector embeddings for a semantic search application. The application receives approximately 100 queries per minute and requires results within 1 second. Cost optimization is the highest priority. Which solution best meets these requirements?

A. Amazon OpenSearch Service with vector engine
B. Amazon S3 Vectors
C. Amazon Aurora PostgreSQL with pgvector extension
D. Self-managed Pinecone on Amazon EC2

Answer: B

Explanation: S3 Vectors is designed for large-scale vector storage (up to 2B per index) with sub-second query performance at up to 90% lower cost than specialized vector databases. With 100 queries/minute (infrequent pattern) and cost as the primary concern, S3 Vectors is the ideal choice. OpenSearch would provide faster latency but at significantly higher cost. pgvector would struggle at 500M vectors. Pinecone on EC2 is not a valid deployment model.

Question 2

A data engineering team manages an Apache Iceberg data lake on Amazon S3. They spend significant time running compaction jobs, managing snapshots, and cleaning orphan files. Queries are becoming slower as the data grows. Which solution would reduce their operational overhead while improving query performance?

A. Move data to Amazon Redshift
B. Use Amazon S3 Tables with automatic table maintenance
C. Implement S3 Lifecycle policies to delete old files
D. Add more partitions to the existing Iceberg tables

Answer: B

Explanation: Amazon S3 Tables provides fully managed Apache Iceberg tables with automatic compaction, snapshot management, and unreferenced file removal. This eliminates the operational burden while delivering up to 3x faster query performance. Moving to Redshift changes the architecture entirely. Lifecycle policies don’t understand Iceberg metadata. Adding partitions doesn’t address compaction issues.

Question 3

An AI team is building a RAG application using Amazon Bedrock Knowledge Bases. They need to store vector embeddings for 50 million documents at the lowest possible cost. The application will query the knowledge base approximately 10,000 times per day. Which statements identify the vector store they should select in the Bedrock Knowledge Bases configuration and a benefit of that choice? (Select TWO)

A. Amazon OpenSearch Serverless – provides lowest latency
B. Amazon S3 Vectors – reduces vector storage and query costs by up to 90%
C. Amazon S3 Vectors – provides native integration with Bedrock Knowledge Bases
D. Amazon Aurora PostgreSQL – provides ACID transactions for vectors
E. Amazon OpenSearch Serverless – provides automatic scaling with no minimum charges

Answer: B, C

Explanation: S3 Vectors is natively integrated with Amazon Bedrock Knowledge Bases and reduces the cost of vector storage and querying by up to 90%. With 10,000 queries/day (about 7 per minute), the infrequent query pattern suits S3 Vectors. OpenSearch Serverless provides lower latency, but the question prioritizes cost, and OpenSearch Serverless bills a minimum amount of capacity even when idle. Aurora PostgreSQL with pgvector is also a supported Knowledge Bases vector store, but ACID transactions are not a requirement here and it is not the lowest-cost option for this workload.

Question 4

A company has Apache Iceberg tables stored in general purpose S3 buckets. They want to improve query performance, reduce storage costs with automatic tiering, and replicate tables to a second region for disaster recovery. Which combination of features is available with Amazon S3 Tables? (Select THREE)

A. Automatic compaction with sort and z-order strategies
B. S3 Glacier Deep Archive storage for table data
C. S3 Tables Intelligent-Tiering for automatic storage cost optimization
D. Cross-region table replication
E. Real-time CDC (Change Data Capture) to Amazon DynamoDB
F. Native integration with Amazon Kinesis Data Streams

Answer: A, C, D

Explanation: S3 Tables supports automatic compaction (including sort and z-order), Intelligent-Tiering that reduces storage costs by up to 80%, and automatic cross-region table replication. S3 Tables does not support Glacier storage classes, CDC to DynamoDB, or native Kinesis integration.

Question 5

A solutions architect is designing a tiered vector search architecture. Real-time product recommendations need single-digit millisecond latency, while a historical document search (accessed a few times per hour) needs to be cost-optimized. Which architecture should the architect implement?

A. Use Amazon OpenSearch Service for all vector workloads with auto-scaling
B. Store all vectors in Amazon S3 Vectors and optimize with metadata filtering
C. Store hot vectors in Amazon OpenSearch Service for real-time queries and cold vectors in Amazon S3 Vectors for infrequent historical search
D. Use Amazon Aurora pgvector for product recommendations and Pinecone for document search

Answer: C

Explanation: The tiered strategy combining OpenSearch (for hot, real-time queries requiring single-digit ms latency) and S3 Vectors (for cold, infrequent queries where sub-second latency is acceptable) provides the optimal balance of performance and cost. S3 Vectors natively integrates with OpenSearch, allowing vectors to be exported from S3 to OpenSearch when demand increases. Using OpenSearch for everything would be costly; using S3 Vectors for everything wouldn’t meet the real-time latency requirement.

AWS Cloud Migration Services – 7R Strategies

December 29, 2022 ~ Last updated on : June 26, 2026 ~ jayendrapatil

🔄 MAJOR UPDATE NOTICE – June 2026

The AWS migration services landscape has undergone significant changes:

AWS Migration Hub – No longer accepting new customers (Nov 2025). Replaced by AWS Transform.

AWS Application Discovery Service – No longer accepting new customers (Nov 2025). Replaced by AWS Transform.
AWS Server Migration Service (SMS) – Discontinued (March 2022). Replaced by AWS Transform MGN.

AWS Application Migration Service (MGN) – Rebranded to AWS Transform MGN (June 2026).
AWS Snowmobile – Retired (March 2024).
AWS Snowball Edge – Only available to existing customers (Nov 2025). New customers should use AWS DataSync or AWS Data Transfer Terminal.

See new sections below for AWS Transform, AWS DataSync, AWS Data Transfer Terminal, and AWS Interconnect.

AWS Cloud Migration Services

AWS Cloud Migration services address common use cases such as
- cloud migration,
- disaster recovery,
- data center decommission, and
- content distribution.

For migrating data from on-premises to AWS, the major aspects to consider are
- amount of data and network speed
- data security in transit
- existing application knowledge for recreation
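
The first consideration, data volume against network speed, comes down to simple arithmetic that often decides between an online and an offline transfer. The sketch below estimates transfer time. The function name and the 80% default link utilization are assumptions, and decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) are used.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move data_tb terabytes over a link_mbps link."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    data_bits = data_tb * 1e12 * 8                    # terabytes -> bits
    effective_bps = link_mbps * 1e6 * utilization     # usable bits per second
    return data_bits / effective_bps / 86_400         # seconds -> days


if __name__ == "__main__":
    # 100 TB over a 1 Gbps link at 80% utilization
    print(f"{transfer_days(100, 1000):.1f} days")
```

If the estimate runs to weeks or months on the available link, a higher-bandwidth connection or a physical transfer option is usually worth evaluating.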

Application & Database Cloud Migration Services

AWS Transform

is the next-generation migration and modernization service launched in May 2025, replacing AWS Migration Hub and integrating multiple migration capabilities into a unified platform.

uses agentic AI to automate discovery, dependency mapping, migration planning, network conversion, and EC2 instance optimization.
accelerates full-stack Windows modernization, mainframe modernization, and VMware migration.
provides a unified experience that consolidates capabilities previously spread across Migration Hub, Application Discovery Service, and Application Migration Service.
generates migration plans for tens of thousands of servers and applications in hours.

automatically creates or updates landing zones, modernizes and right-sizes networks, and containerizes applications during migration.
supports custom transformations of code, APIs, frameworks, and more—making tech stacks AI-ready while eliminating technical debt.
Key capabilities include:
- AWS Transform for VMware – Automates VMware-to-AWS migration with dependency mapping, wave planning, and network configuration conversions.
- AWS Transform MGN (formerly Application Migration Service) – Proven replication engine for lift-and-shift migrations.
- Strategy Recommendations – AI-driven migration and modernization strategy building.
- EC2 Instance Recommendations – Cost estimation for running existing servers in AWS.
- Migration Journeys – Prescriptive guided migration and modernization workflows.

AWS Transform MGN (formerly AWS Application Migration Service)

is the primary migration service for lift-and-shift migrations to AWS (rebranded from AWS Application Migration Service in June 2026).
simplifies migration by allowing the same automated process for a wide range of applications, without changes to applications, their architecture, or the migrated servers.
supports non-disruptive tests prior to cutover.
performs continuous block-level replication of source servers to AWS.
supports migration from physical, virtual, or cloud servers to AWS.

replaces both AWS Server Migration Service (SMS) and CloudEndure Migration.
is used to Re-host (lift-and-shift).

AWS Migration Hub (Maintenance Mode)

⚠️ Note: AWS Migration Hub stopped accepting new customers on November 7, 2025. Existing customers can continue using the service. New customers should use AWS Transform.

provides a centralized, single place to discover the existing servers, plan migrations, and track the status of each application migration.

provides visibility into the application portfolio and streamlines planning and tracking.
helps visualize the connections and the status of the migrating servers and databases, regardless of which migration tool is used.
stores all the data in the selected Home Region and provides a single repository of discovery and migration planning information for the entire portfolio and a single view of migrations into multiple AWS Regions.

helps track the status of the migrations in all AWS Regions, provided the migration tools are available in that Region.
helps understand the environment by letting you explore information collected by AWS discovery tools and stored in the AWS Application Discovery Service’s repository.
supports migration status updates from the following tools:
- AWS Transform MGN (formerly Application Migration Service)
- AWS Database Migration Service – DMS

migration tools send migration status to the selected Home Region
supports EC2 instance recommendations, which let you estimate the cost of running the existing servers in AWS.
supports Strategy Recommendations, which help build a migration and modernization strategy for applications running on-premises or in AWS.

All current Migration Hub features, including Strategy Recommendations, EC2 Instance Recommendations, Migration Hub Journeys, and Orchestrator, are available in AWS Transform with improved functionality.

AWS Application Discovery Service (Maintenance Mode)

⚠️ Note: AWS Application Discovery Service stopped accepting new customers on November 7, 2025. The Discovery Connector was deprecated on November 17, 2025. New customers should use AWS Transform for VM discovery and assessment.

AWS Application Discovery Service helps plan migration to the AWS cloud by collecting usage and configuration data about the on-premises servers.

helps enterprises obtain a snapshot of the current state of their data center servers by collecting server specification information, hardware configuration, performance data, details of running processes, and network connections
is integrated with AWS Migration Hub,
- which simplifies migration tracking as it aggregates migration status information into a single console.
- can help view the discovered servers, group them into applications, and then track the migration status of each application.
discovered data for all the regions is stored in the AWS Migration Hub home Region.
The data can be exported for analysis in Microsoft Excel or AWS analysis tools such as Amazon Athena and Amazon QuickSight.

supports Agentless Collector (for VMware environments) and Discovery Agent (for all environments) for performing discovery and collecting data about the on-premises servers.
Note: The Discovery Connector (agentless, vCenter-based) was deprecated on November 17, 2025. The Agentless Collector (supports network connection discovery since November 2024) remains available for existing customers.

AWS Server Migration Service (SMS)

⚠️ DEPRECATED: AWS Server Migration Service was discontinued on March 31, 2022. Use AWS Transform MGN (formerly Application Migration Service) for all lift-and-shift migrations.

was an agentless service that made it easier and faster to migrate thousands of on-premises workloads to AWS.
helped automate, schedule, and track incremental replications of live server volumes, making it easier to coordinate large-scale server migrations.
supported migration of virtual machines from VMware vSphere, Windows Hyper-V and Azure VM to AWS.

replicated each server volume, which was saved as a new AMI, which could be launched as an EC2 instance.
was a significant enhancement of EC2 VM Import/Export service.
was used to Re-host.
Migration Path: Use AWS Transform MGN, which supports physical, virtual, and cloud servers with continuous block-level replication and non-disruptive testing.

AWS Database Migration Service (DMS)

helps migrate databases to AWS quickly and securely.
source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.

supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora.
monitors replication tasks and network or host failures, and automatically provisions a replacement host when a failure can’t be repaired.
supports both one-time data migration into RDS and EC2-based databases and continuous data replication.
supports continuous replication of the data with high availability and can consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3.
provides the free AWS Schema Conversion Tool (SCT), which automates the conversion of Oracle PL/SQL and SQL Server T-SQL code to equivalent code in the Amazon Aurora / MySQL dialect of SQL or to equivalent PL/pgSQL code in PostgreSQL.
AWS DMS Serverless (launched June 2023)
- automatically provisions, scales, and manages migration resources without infrastructure management.
- removes the need for capacity estimation, provisioning, cost-optimization, and version/patch management.
- supports automatic storage scaling beyond the default 100GB limit for large transaction volumes.
- supports S3 source endpoints for migrating CSV or Parquet data.
- supports homogeneous migrations via CLI, SDK, and API with fully automated replication (October 2024).
- supports premigration assessments to identify potential issues before migration.
Note: AWS DMS Fleet Advisor reached end of support on May 20, 2026.

AWS EC2 VM Import/Export

allows easy import of virtual machine images from the existing environment into EC2 instances, and export of those instances back to the on-premises environment.
allows leveraging existing investments in virtual machines built to meet compliance, configuration management, and IT security requirements by bringing those virtual machines into EC2 as ready-to-use instances.
Common usages include
- Migrate Existing Applications and Workloads to EC2, allowing preserving of the software and settings configured in the existing VMs.
- Copy Your VM Image Catalog to EC2
- Create a Disaster Recovery Repository for your VM images
Note: For server migrations, AWS Transform MGN is the recommended service as it provides continuous replication, non-disruptive testing, and automated cutover. VM Import/Export remains available for specific image import/export use cases.

Data Transfer Services

VPN

connection utilizes IPSec to establish encrypted network connectivity between the on-premises network and the VPC over the Internet.
connections can be configured in minutes and are a good solution when you have an immediate need, low to modest bandwidth requirements, and can tolerate the inherent variability of Internet-based connectivity.
still requires Internet connectivity and is configured using a Virtual Private Gateway (VGW) and a Customer Gateway (CGW).

AWS Direct Connect

provides a dedicated physical connection between the corporate network and AWS Direct Connect location with no data transfer over the Internet.

helps bypass Internet service providers (ISPs) in the network path
helps reduce network costs, increase bandwidth throughput, and provide a more consistent network experience than with Internet-based connection
takes time to set up and involves third parties.
connections are not redundant by default; redundancy requires a second Direct Connect connection or a VPN connection as backup.

Security
- provides a dedicated physical connection without internet
- can be combined with a VPN for additional security
- Supports MACsec (IEEE 802.1AE) encryption on dedicated connections and supported partner interconnects for Layer 2 encryption.
Recent Updates:
- Native 400 Gbps Dedicated Connections available at select locations (July 2024).
- Direct Connect gateway can now associate directly with AWS Cloud WAN core network without intermediate Transit Gateway (November 2024).
- 4-byte Autonomous System (AS) number support for virtual interfaces (September 2025).

AWS Interconnect (NEW – GA April 2026)

is a managed connectivity service that simplifies connectivity into AWS, launched as GA in April 2026.
enables customers to establish private, high-speed network connections with dedicated bandwidth to and from AWS across hybrid and multicloud environments.
AWS Interconnect – Last Mile
- automates the end-to-end process of establishing private, resilient connectivity between customer on-premises locations and AWS.
- customers select their location, preferred AWS Region, and bandwidth speed—everything else is automated.
- automates complex network configuration including BGP peering, VLAN configuration, and ASN assignment.
- supports dynamic bandwidth scaling from 1 Gbps to 100 Gbps through the AWS console with zero downtime maintenance.
AWS Interconnect – Multicloud
- enables private, secure connectivity between AWS VPCs and other cloud environments (e.g., Google Cloud).
- uses pre-built capacity pools between AWS and partner cloud providers, eliminating physical cross-connect management.
- connection can be established in minutes through a simple two-step creation and approval process.
simplifies what previously required Direct Connect setup with third-party coordination.

AWS Snow Family

⚠️ Availability Changes:

Snowmobile – Retired (March 2024).
Snowcone (HDD and SSD) – Discontinued (November 2024).
Previous-gen Snowball Edge devices (Storage Optimized 80TB, Compute Optimized 52 vCPU, Compute Optimized GPU) – Discontinued (November 2024).
Snowball Edge (latest generation) – Available to existing customers only (November 2025). New customers should use AWS DataSync for online transfers or AWS Data Transfer Terminal for physical transfers.

AWS Snowball Edge (latest generation)
- is a petabyte-scale data transfer service built around a secure device that moves data into and out of the AWS Cloud quickly and efficiently.
- transfers the data to S3 bucket.
- transfer times are about a week from start to finish.
- commonly used to ship terabytes or petabytes of analytics data, healthcare and life sciences data, video libraries, image repositories, backups, and archives as part of data center shutdown, tape replacement, or application migration projects.
- contains an embedded computing platform that helps perform simple processing tasks.
- can be rack shelved and may also be clustered together, making it simpler to collect and store data in extremely remote locations.
- commonly used in environments with intermittent connectivity (such as manufacturing, industrial, and transportation); or in extremely remote locations (such as military or maritime operations) before shipping them back to AWS data centers.
- delivers serverless computing applications at the network edge using AWS Greengrass and Lambda functions.
- Only available to existing customers as of November 7, 2025.
AWS Snowmobile (RETIRED)
- Retired in March 2024. AWS no longer offers this service.
- Previously moved up to 100PB of data in a 45-foot long ruggedized shipping container.
- Was ideal for multi-petabyte or Exabyte-scale digital media migrations and datacenter shutdowns.
- Alternatives: For large-scale transfers, use AWS Data Transfer Terminal or multiple Snowball Edge devices (existing customers), or AWS DataSync for online transfers.

AWS Import/Export (Legacy – Upgraded to Snowball)

accelerated moving large amounts of data into and out of AWS using secure Snowball appliances
AWS transferred the data directly onto and off of the storage devices using Amazon’s high-speed internal network, bypassing the Internet

Data Migration
- for significant data size, AWS Import/Export was faster than Internet transfer and more cost-effective than upgrading the connectivity
- if loading the data over the Internet would take a week or more, AWS Import/Export should be considered
- data from appliances could be imported to S3, Glacier and EBS volumes and exported from S3
- not suitable for applications that cannot tolerate offline transfer time
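The "week or more" rule of thumb above comes down to simple arithmetic. The following is a minimal sketch, assuming an ideal link with no protocol overhead (real transfers are slower); the function name, the utilization parameter, and the sample figures are illustrative and not an AWS tool.

```python
def transfer_hours(size_bytes: int, link_mbps: float, utilization: float = 1.0) -> float:
    """Estimate the hours needed to send size_bytes over a link of link_mbps megabits/s."""
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization must be in (0, 1]")
    bits = size_bytes * 8
    effective_bits_per_second = link_mbps * 1_000_000 * utilization
    return bits / effective_bits_per_second / 3600


GIB = 2**30

# Hypothetical example: 900 GiB over a fully utilized 45 Mbps link (about two days).
print(f"{transfer_hours(900 * GIB, 45):.1f} hours")

# Assumed 80% utilization: the same transfer takes proportionally longer.
print(f"{transfer_hours(900 * GIB, 45, utilization=0.8):.1f} hours")
```

If the estimate runs to many days, an offline or physical transfer option is worth comparing against upgrading the connection.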
Security
- Snowball uses an industry-standard Trusted Platform Module (TPM) that has a dedicated processor designed to detect any unauthorized modifications to the hardware, firmware, or software to physically secure the AWS Snowball device.
Note: With Snow Family availability changes, new customers should use AWS DataSync or AWS Data Transfer Terminal.

AWS DataSync (Recommended for Online Transfers)

is an online data movement service that simplifies and accelerates data migrations to AWS.

moves data quickly and securely between on-premises storage, edge locations, other cloud providers, and AWS Storage.
automates scheduling, monitoring, encryption, and end-to-end data validation.
recommended replacement for AWS Snow Family for new customers needing online data transfer.
Key Features:
- Transfers file and object data between storage services.
- Supports on-premises NFS, SMB, HDFS, self-managed object storage, AWS S3, EFS, FSx, and more.
- Automatic encryption in-flight and end-to-end data integrity validation.
- DataSync Discovery – Provides visibility into on-premises storage performance and utilization with migration recommendations.
- Enhanced Mode (May 2025) – Supports cross-cloud transfers without requiring a DataSync agent, with higher performance and scalability.

Use Cases:
- Online data migration to AWS Storage services.
- Ongoing data replication between on-premises and cloud.
- Cross-cloud data movement (AWS to/from other cloud providers).
- Large-scale data migrations with automated scheduling.

AWS Data Transfer Terminal (NEW – December 2024)

are physical locations around the world where customers bring data storage devices and connect them to the AWS network for high-speed, secure data transfer.
recommended replacement for AWS Snow Family for new customers needing physical data transfer.

provides a secure, upload-ready, physical location—customers bring their own storage devices.
enables upload to any AWS endpoint including Amazon S3, Amazon EFS, or others using a high-throughput connection.
suited for data transfer or migration use cases where large amounts of data need to be transferred quickly.

customers can also bring Snowball Edge devices to these locations for upload.
Key Differences from Snow Family:
- Customer brings their own storage devices (no AWS-provided appliance).
- No shipping required—customer physically visits the terminal.
- Direct connection to AWS high-speed network at the terminal location.
- On-demand access without device ordering lead times.

AWS Storage Gateway

connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and the AWS storage infrastructure
provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of the data encrypted in S3 or Glacier.
for disaster recovery scenarios, Storage Gateway, together with EC2, can serve as a cloud-hosted solution that mirrors the entire production environment

Gateway Types:
- S3 File Gateway – NFS/SMB access to S3 objects.
- FSx File Gateway – Local cache for Windows-based file shares on FSx for Windows File Server. (No longer accepting new customers as of October 2024.)
- Volume Gateway (Cached) – S3 holds primary data, frequently accessed data cached locally.
- Volume Gateway (Stored) – Entire data stored locally, asynchronously backed up to S3.
- Tape Gateway – iSCSI-based virtual tape library (VTL) for offline data archiving.
Security
- Encrypts all data in transit to and from AWS by using SSL/TLS.
- All data in AWS Storage Gateway is encrypted at rest using AES-256.
- Authentication between the gateway and iSCSI initiators can be secured by using Challenge-Handshake Authentication Protocol (CHAP).

Recent Updates:
- Migrating from Amazon Linux 2 to AL2023 (required before June 30, 2026 AL2 EOL).
- IPv6 support for Storage Gateway endpoints, APIs, and appliance interfaces (September 2025).
- Terraform modules support AL2023 with Elastic IP association for private activations (March 2026).

Simple Storage Service – S3

Data Transfer
- Files up to 5 GB can be transferred using a single PUT operation
- Multipart uploads can be used to upload files up to 5 TB and speed up data uploads by dividing the file into multiple parts
- transfer rate still limited by the network speed
- S3 Transfer Acceleration uses CloudFront edge locations to accelerate uploads over long distances.
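To illustrate how a multipart upload divides a file, here is a small planning sketch. It assumes the documented S3 multipart limits (parts between 5 MiB and 5 GiB, at most 10,000 parts); the function name and the 8 MiB default part size are illustrative choices, and the code only computes byte ranges without calling AWS.

```python
MIB = 2**20
GIB = 2**30
MIN_PART = 5 * MIB    # S3 minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB    # S3 maximum part size
MAX_PARTS = 10_000    # S3 maximum number of parts per multipart upload


def plan_parts(object_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover object_size bytes.

    The part size is raised when needed so the plan stays within MAX_PARTS.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    smallest_allowed = -(-object_size // MAX_PARTS)  # integer ceiling division
    part_size = max(part_size, MIN_PART, smallest_allowed)
    if part_size > MAX_PART:
        raise ValueError("object is too large for a single multipart upload")
    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]


# A hypothetical 20 MiB file splits into two full parts and one smaller final part.
for offset, length in plan_parts(20 * MIB):
    print(offset, length)
```

Each range could then be uploaded in parallel, which is where the speed-up comes from; the network link remains the upper bound.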
Security
- Data in transit can be secured by using SSL/TLS or client-side encryption.
- Encrypt data at rest by performing server-side encryption using Amazon S3-Managed Keys (SSE-S3), AWS Key Management Service (KMS)-Managed Keys (SSE-KMS), or Customer Provided Keys (SSE-C), or by performing client-side encryption using AWS KMS–Managed Customer Master Key (CMK) or Client-Side Master Key.
- Note: SSE-S3 is now applied by default to all new objects (January 2023).

AWS Migration Strategy Summary

Use Case	Recommended Service (2025+)	Previous Service
Migration planning & discovery	AWS Transform	Migration Hub + Application Discovery Service
Lift-and-shift server migration	AWS Transform MGN	SMS → Application Migration Service
Database migration	AWS DMS / DMS Serverless	AWS DMS
Online data transfer	AWS DataSync	Snow Family / Storage Gateway
Physical bulk data transfer	AWS Data Transfer Terminal	Snow Family (Snowball/Snowmobile)
Private network connectivity	AWS Direct Connect / AWS Interconnect	AWS Direct Connect
Hybrid storage	AWS Storage Gateway	AWS Storage Gateway
VM image import	VM Import/Export	VM Import/Export

AWS Certification Exam Practice Questions

Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not always updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not have been updated.

Open to further feedback, discussion and correction.

You must architect the migration of a web application to AWS. The application consists of Linux web servers running a custom web server. You are required to save the logs generated from the application to a durable location. What options could you select to migrate the application to AWS? (Choose 2)
1. Create an AWS Elastic Beanstalk application using the custom web server platform. Specify the web server executable and the application project and source files. Enable log file rotation to Amazon Simple Storage Service (S3). (EB does not work with Custom server executable)
2. Create Dockerfile for the application. Create an AWS OpsWorks stack consisting of a custom layer. Create custom recipes to install Docker and to deploy your Docker container using the Dockerfile. Create custom recipes to install and configure the application to publish the logs to Amazon CloudWatch Logs (OpsWorks Stacks is now deprecated (EOL May 2024). Also, the last sentence mentions configure the application to push the logs to S3, which would need changes to application as it needs to use SDK or CLI)
3. Create Dockerfile for the application. Create an AWS OpsWorks stack consisting of a Docker layer that uses the Dockerfile. Create custom recipes to install and configure Amazon Kinesis to publish the logs into Amazon CloudWatch. (Kinesis not needed, OpsWorks deprecated)
4. Create a Dockerfile for the application. Create an AWS Elastic Beanstalk application using the Docker platform and the Dockerfile. Enable logging the Docker configuration to automatically publish the application logs. Enable log file rotation to Amazon S3. (Use Docker configuration with awslogs and EB with Docker)
5. Use VM import/Export to import a virtual machine image of the server into AWS as an AMI. Create an Amazon Elastic Compute Cloud (EC2) instance from AMI, and install and configure the Amazon CloudWatch Logs agent. Create a new AMI from the instance. Create an AWS Elastic Beanstalk application using the AMI platform and the new AMI. (Use VM Import/Export to create AMI and CloudWatch logs agent to log)
Your company hosts an on-premises legacy engineering application with 900GB of data shared via a central file server. The engineering data consists of thousands of individual files ranging in size from megabytes to multiple gigabytes. Engineers typically modify 5-10 percent of the files a day. Your CTO would like to migrate this application to AWS, but only if the application can be migrated over the weekend to minimize user downtime. You calculate that it will take a minimum of 48 hours to transfer 900GB of data using your company’s existing 45-Mbps Internet connection. After replicating the application’s environment in AWS, which option will allow you to move the application’s data to AWS without losing any data and within the given timeframe?
1. Copy the data to Amazon S3 using multiple threads and multi-part upload for large files over the weekend, and work in parallel with your developers to reconfigure the replicated application environment to leverage Amazon S3 to serve the engineering files. (Still limited by 45 Mbps speed with minimum 48 hours when utilized to max)
2. Sync the application data to Amazon S3 starting a week before the migration, on Friday morning perform a final sync, and copy the entire data set to your AWS file server after the sync completes. (Works best as the data changes can be propagated over the week and are fractional and downtime would be known. Note: AWS DataSync would be ideal for this use case today.)
3. Copy the application data to a 1-TB USB drive on Friday and immediately send overnight, with Saturday delivery, the USB drive to AWS Import/Export to be imported as an EBS volume, mount the resulting EBS volume to your AWS file server on Sunday. (Downtime is not known when the data upload would be done, although Amazon says the same day the package is received)
4. Leverage the AWS Storage Gateway to create a Gateway-Stored volume. On Friday copy the application data to the Storage Gateway volume. After the data has been copied, perform a snapshot of the volume and restore the volume as an EBS volume to be attached to your AWS file server on Sunday. (Still uses the internet)
You are tasked with moving a legacy application from a virtual machine running inside your datacenter to an Amazon VPC. Unfortunately this app requires access to a number of on-premises services and no one who configured the app still works for your company. Even worse there’s no documentation for it. What will allow the application running inside the VPC to reach back and access its internal dependencies without being reconfigured? (Choose 3 answers)
1. An AWS Direct Connect link between the VPC and the network housing the internal services
2. An Internet Gateway to allow a VPN connection. (Virtual and Customer gateway is needed)
3. An Elastic IP address on the VPC instance
4. An IP address space that does not conflict with the one on-premises
5. Entries in Amazon Route 53 that allow the Instance to resolve its dependencies’ IP addresses
6. A VM Import of the current virtual machine
An enterprise runs 103 line-of-business applications on virtual machines in an on-premises data center. Many of the applications are simple PHP, Java, or Ruby web applications, are no longer actively developed, and serve little traffic. Which approach should be used to migrate these applications to AWS with the LOWEST infrastructure costs?
1. Deploy the applications to single-instance AWS Elastic Beanstalk environments without a load balancer.
2. Use AWS SMS to create AMIs for each virtual machine and run them in Amazon EC2. (Note: AWS SMS is deprecated. AWS Transform MGN would be the equivalent today.)
3. Convert each application to a Docker image and deploy to a small Amazon ECS cluster behind an Application Load Balancer.
4. Use VM Import/Export to create AMIs for each virtual machine and run them in single-instance AWS Elastic Beanstalk environments by configuring a custom image.
[NEW] A company needs to migrate 500 VMware virtual machines to AWS with minimal downtime. The company wants automated dependency mapping, wave planning, and network conversion. Which service should they use?
1. AWS Server Migration Service
2. AWS Migration Hub with Application Migration Service
3. AWS Transform for VMware (AWS Transform for VMware provides automated dependency mapping, wave planning, and network configuration conversions using agentic AI.)
4. VM Import/Export with CloudFormation
[NEW] A company needs to transfer 50TB of data to AWS S3 as quickly as possible. They are a new AWS customer. Which combination of services should they consider? (Choose 2)
1. AWS Snowball Edge (Not available to new customers since November 2025)
2. AWS Data Transfer Terminal (Physical location for high-speed upload using customer’s own devices. Available to new customers.)
3. AWS DataSync (Online data transfer with automated scheduling, encryption, and validation.)
4. AWS Snowmobile (Retired in March 2024)
[NEW] A company wants to establish private connectivity between their AWS VPCs and Google Cloud environment without managing physical cross-connects. Which service should they use?
1. AWS Direct Connect with VPN overlay
2. AWS Site-to-Site VPN
3. AWS Interconnect – Multicloud (Provides pre-built capacity pools between AWS and partner cloud providers, eliminating physical cross-connect management. GA April 2026.)
4. AWS Transit Gateway with peering
[NEW] A company wants to migrate databases to AWS with minimal infrastructure management. They need automatic scaling and don’t want to manage replication instances. Which service option should they use?
1. AWS DMS with provisioned replication instances
2. AWS DMS Serverless (Automatically provisions, scales, and manages migration resources. Supports automatic storage scaling and premigration assessments.)
3. AWS SCT with manual migration
4. AWS Glue ETL jobs

AWS Storage Options – S3 & Glacier

May 23, 2016 ~ Last updated on : June 24, 2026 ~ jayendrapatil

📋 Post Updated: June 2026

This post has been updated to reflect the current AWS S3 storage classes (8 classes as of 2025), the deprecation of standalone Amazon Glacier vaults, S3 Glacier storage class renaming, removal of S3 Reduced Redundancy Storage (RRS) recommendation, and new S3 capabilities including S3 Tables, S3 Vectors, and S3 Express One Zone.

Amazon S3

highly-scalable, reliable, and low-latency data storage infrastructure at very low costs.
provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from within Amazon EC2 or from anywhere on the web.

allows you to write, read, and delete objects containing from 1 byte to 5 terabytes of data each.
number of objects you can store in an Amazon S3 bucket is virtually unlimited.
highly secure, supporting encryption at rest and in transit, and providing multiple mechanisms to provide fine-grained control of access to Amazon S3 resources.

as of January 5, 2023, all new objects are automatically encrypted with SSE-S3 (server-side encryption with S3 managed keys) at no additional cost.
highly scalable, allowing concurrent read or write access to Amazon S3 data by many separate clients or application threads.
provides data lifecycle management capabilities, allowing users to define rules to automatically transition data between storage classes (including S3 Glacier classes) or delete data at end of life.

stores data redundantly across a minimum of 3 Availability Zones by default (except One Zone classes), providing built-in resilience against widespread disaster.

S3 Storage Classes

Amazon S3 offers 8 storage classes designed for different access patterns and cost requirements:

S3 Standard – General-purpose storage for frequently accessed data. High throughput and low latency.

S3 Intelligent-Tiering – Automatic cost optimization by moving data between access tiers (Frequent, Infrequent, Archive Instant Access) based on changing access patterns, with no retrieval charges or operational overhead.
S3 Standard-Infrequent Access (S3 Standard-IA) – For data accessed less frequently but requiring rapid access when needed. Lower storage cost with per-GB retrieval charge.
S3 One Zone-Infrequent Access (S3 One Zone-IA) – Lower-cost option for infrequently accessed data that does not require multi-AZ resilience. Replaces the legacy Reduced Redundancy Storage (RRS).

S3 Express One Zone – Single-digit millisecond data access with up to 10x faster performance and 80% lower request costs than S3 Standard. Data stored in a single Availability Zone. Ideal for latency-sensitive applications like ML training and analytics.
S3 Glacier Instant Retrieval – Lowest-cost storage for long-lived data that is rarely accessed (about once per quarter) but requires millisecond retrieval. Up to 68% lower cost than S3 Standard-IA.
S3 Glacier Flexible Retrieval (formerly S3 Glacier) – For archive data accessed once or twice per year. Retrieval options: Expedited (1-5 minutes), Standard (3-5 hours), or free Bulk (5-12 hours). Minimum 90-day storage.

S3 Glacier Deep Archive – Lowest-cost storage class for long-term archive and digital preservation. Retrieval: Standard (within 12 hours) or Bulk (within 48 hours). Minimum 180-day storage.
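Lifecycle rules are what move objects between these classes over time. The sketch below builds one such rule as a plain Python dictionary in the shape accepted by the S3 lifecycle API. The rule ID, the "logs/" prefix, and the day counts are hypothetical values chosen for illustration.

```python
import json

# Assumed example: objects under the hypothetical "logs/" prefix move to
# cheaper classes as they age and are deleted after one year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # S3 Glacier Flexible Retrieval
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def transition_days(rule: dict) -> list[int]:
    """Return the object age (in days) at which each transition fires."""
    return [transition["Days"] for transition in rule["Transitions"]]


rule = lifecycle_configuration["Rules"][0]
# Sanity check: transitions must happen at increasing object ages.
assert transition_days(rule) == sorted(transition_days(rule))
print(json.dumps(lifecycle_configuration, indent=2))

# With boto3 installed and credentials configured, the dictionary could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
```

The day counts are spaced so that each class's minimum storage duration has passed before the next transition or the expiration.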

Ideal Use Cases

Storage & Distribution of static web content and media
- frequently used to host static websites and provides a highly-available and highly-scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript
- works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites as no storage provisioning is required
- content can be served directly from Amazon S3, since each object in Amazon S3 has a unique HTTP URL address
- can also act as an origin store for a Content Delivery Network (CDN) such as Amazon CloudFront
- it works particularly well for hosting web content with extremely spiky bandwidth demands because of S3’s elasticity
Data Store for Large Objects
- can be paired with RDS or a NoSQL database and used to store large objects (e.g. files), while the associated metadata (e.g. name, tags, comments) is stored in RDS or the NoSQL database, where it can be indexed and queried for faster access to relevant data

Data store for computation and large-scale analytics
- commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding.
- data can be accessed from multiple computing nodes concurrently without being constrained by a single connection because of its horizontal scalability
- S3 Tables (launched Dec 2024) provides storage optimized for tabular data in Apache Iceberg format, with up to 3x faster query throughput for analytics workloads
Backup and Archival of critical data
- used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity.
- stores objects redundantly on multiple devices across multiple facilities, it provides the highly-durable storage infrastructure needed for these scenarios.
- its versioning capability is available to protect critical data from inadvertent deletion
AI and Machine Learning
- S3 Vectors (GA Dec 2025) provides native vector storage with subsecond query performance for AI embeddings, reducing costs up to 90% compared to dedicated vector databases
- integrated with Amazon Bedrock Knowledge Bases for retrieval augmented generation (RAG) workloads
Data Lakes
- S3 serves as the foundation for building data lakes, with native integration with analytics services like Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum
- Mountpoint for Amazon S3 (GA Aug 2023) allows mounting S3 buckets as local file systems on Linux compute instances for high-throughput workloads

Anti-Patterns

Amazon S3 has the following anti-patterns, where it is not an optimal solution

Dynamic website hosting
- While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server-side interaction, scripting, or database interaction cannot be hosted on S3 and should instead be hosted on Amazon EC2 or AWS Lambda with API Gateway
Rapidly Changing Data
- Data that needs to be updated frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS volumes, RDS, or DynamoDB.
File System Requirements
- Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly the ‘/’ character) you can emulate hierarchical folder structures within a bucket.
- NOTE: Mountpoint for Amazon S3 provides file system access for read-heavy workloads, but is not a full POSIX file system. For full POSIX compliance, consider Amazon EFS or Amazon FSx.
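The folder emulation described above can be sketched in plain Python. This example mirrors how a prefix plus delimiter listing groups keys into direct objects and "common prefixes"; the function name and the sample keys are made up for illustration, and no AWS call is involved.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split the keys under prefix into direct objects and folder-like common prefixes."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a sub-folder name.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys in a flat bucket namespace.
keys = [
    "photos/2024/a.jpg",
    "photos/2024/b.jpg",
    "photos/2025/c.jpg",
    "photos/index.html",
    "readme.txt",
]

print(list_level(keys))             # top level of the bucket
print(list_level(keys, "photos/"))  # one level down
```

The bucket itself stores only the five full keys; the "folders" exist solely in how the listing is grouped.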

Performance

Access to Amazon S3 from within Amazon EC2 in the same region is fast.
Amazon S3 is designed so that server-side latencies are insignificant relative to Internet latencies.

Amazon S3 automatically scales to high request rates — your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix in a bucket. There are no limits to the number of prefixes in a bucket.
If Amazon S3 is accessed using multiple threads, multiple applications, or multiple clients concurrently, total Amazon S3 aggregate throughput will typically scale to rates that far exceed what any single server can generate or consume.
S3 Express One Zone provides single-digit millisecond latency and up to 10x faster performance than S3 Standard for latency-sensitive workloads.
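The per-prefix request rates above can be turned into a rough planning estimate. This is a sketch under the assumption that load is spread evenly across prefixes and that each prefix sustains exactly the stated baseline; S3 scales gradually, so the result is an estimate and not a guarantee. The function name is illustrative.

```python
PUT_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second per prefix
GET_PER_PREFIX = 5_500  # GET/HEAD requests per second per prefix


def prefixes_needed(target_rps: int, per_prefix_rps: int) -> int:
    """Return the number of prefixes needed to reach target_rps at the baseline rate."""
    if target_rps <= 0 or per_prefix_rps <= 0:
        raise ValueError("rates must be positive")
    return -(-target_rps // per_prefix_rps)  # integer ceiling division


# Hypothetical targets: 50,000 reads per second and 20,000 writes per second.
print(prefixes_needed(50_000, GET_PER_PREFIX))
print(prefixes_needed(20_000, PUT_PER_PREFIX))
```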

S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket using CloudFront’s globally distributed edge locations.

Durability & Availability

Amazon S3 storage provides the highest level of data durability and availability, by automatically and synchronously storing your data across a minimum of three Availability Zones within the selected geographical region
Amazon S3 is designed to sustain the concurrent loss of data in two facilities, making it very well-suited to serve as the primary data storage for mission-critical data.

Amazon S3 is designed for 99.999999999% (11 nines) durability per object and 99.99% availability over a one-year period.
Amazon S3 data can be protected from unintended deletions or overwrites using Versioning.
Versioning can be enabled with MFA (Multi Factor Authentication) Delete on the bucket, which would require two forms of authentication to delete an object

S3 Object Lock provides write-once-read-many (WORM) protection to prevent objects from being deleted or overwritten for a fixed period or indefinitely (Governance or Compliance mode).
~~For Non Critical and Reproducible data, S3 Reduced Redundancy Storage (RRS) was previously available but is no longer recommended.~~ Use S3 One Zone-IA instead for non-critical, reproducible data at lower cost with 99.5% availability.
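To put the 11 nines figure in perspective, the design target can be read as an average annual loss rate per object. The sketch below works under that simplifying assumption; it is an interpretation of a design goal, not a measured failure rate, and the function names are illustrative. Exact fractions are used to avoid floating-point rounding at this scale.

```python
from fractions import Fraction

# 1 - 0.99999999999, expressed exactly.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects expected to be lost per year."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Hypothetical bucket holding 10 million objects.
objects = 10_000_000
print(float(expected_annual_losses(objects)))  # objects per year
print(int(years_per_expected_loss(objects)))   # years per lost object
```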

Cost Model

With Amazon S3, you pay only for what you use and there is no minimum fee.

Amazon S3 pricing components include: storage (per GB per month, varies by storage class), data transfer out (per GB per month), requests and data retrievals (per n thousand requests per month), and optional management/analytics features.
S3 Intelligent-Tiering has a small monthly monitoring and automation charge per object but no retrieval fees, making it ideal for data with unknown or changing access patterns.

Scalability & Elasticity

Amazon S3 has been designed to offer a very high level of scalability and elasticity automatically

Amazon S3 supports a virtually unlimited number of files in any bucket
Amazon S3 bucket can store a virtually unlimited number of bytes
Amazon S3 allows you to store any number of objects (files) in a single bucket, and Amazon S3 will automatically manage scaling and distributing redundant copies of your information across multiple AZs in the same region, all using Amazon’s high-performance infrastructure.

Security & Access Management

Default Encryption: Since January 5, 2023, all new objects are automatically encrypted with SSE-S3. Options include SSE-S3, SSE-KMS (AWS KMS keys), SSE-C (customer-provided keys), and client-side encryption.
SSE-C Disabled by Default: As of April 2026, SSE-C is disabled by default on all new S3 general purpose buckets for improved security.
S3 Access Points: Simplify managing data access at scale by creating named access points with distinct permissions and network controls for different applications or teams.

S3 Block Public Access: Bucket-level and account-level settings to prevent public access.
Bucket Policies & ACLs: Fine-grained access control using IAM policies, bucket policies, and (legacy) Access Control Lists.
VPC Endpoints: Access S3 privately from within a VPC without traversing the public internet.
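As an example of a bucket policy, the sketch below builds the commonly used statement that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name is a placeholder and the helper function is hypothetical; the code only produces the JSON document and does not attach it to a bucket.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects requests made without SSL/TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in the bucket
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder name.
print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```

Because an explicit Deny overrides any Allow, this statement can sit alongside whatever access the bucket otherwise grants.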

Interfaces

Amazon S3 provides standards-based REST APIs for both management and data operations.
NOTE – SOAP support over HTTP was deprecated. New Amazon S3 features are not supported for SOAP. Use the REST API or the AWS SDKs.
Amazon S3 provides SDKs in multiple languages (Java, Python, .NET, Go, JavaScript/TypeScript, PHP, Ruby, and more) that wrap the underlying APIs

AWS CLI provides high-level S3 file commands (ls, cp, mv, sync, etc.) with support for parallel transfers and recursive operations.
AWS Management Console provides a web-based interface for managing S3 buckets and objects
Mountpoint for Amazon S3 – open-source file client that mounts S3 buckets as local file systems on Linux, optimized for high-throughput read-heavy workloads (GA August 2023).

All interfaces provide the ability to store Amazon S3 objects in uniquely-named buckets, with each object identified by a unique Object key within that bucket.

S3 Data Query & Analytics

Amazon Athena – Serverless query service to analyze data in S3 using standard SQL without loading data into a database.
S3 Tables (Dec 2024) – Fully managed Apache Iceberg tables optimized for analytics, with up to 3x faster query throughput. Supports Intelligent-Tiering and replication.

S3 Vectors (GA Dec 2025) – Native vector storage and query for AI embeddings with subsecond performance, up to 2 billion vectors per index.
S3 Storage Lens – Cloud storage analytics providing organization-wide visibility into object storage usage, activity, and cost optimization recommendations.
~~S3 Select~~ – Closed to new customers as of July 25, 2024. Use Amazon Athena, S3 Object Lambda, or client-side filtering as alternatives.

Amazon S3 Glacier

⚠️ Standalone Amazon Glacier Vaults – No Longer Available to New Customers

As of December 15, 2025, the original standalone vault-based Amazon Glacier service stopped accepting new customers. Existing customers can continue using it, and no migration is required.

Recommendation: Use the S3 Glacier storage classes (Instant Retrieval, Flexible Retrieval, Deep Archive) which are fully integrated with Amazon S3 and provide the same low-cost archival storage with better management capabilities.

AWS provides the "Data Transfer from Amazon S3 Glacier Vaults to Amazon S3" guidance for migrating existing vault data to S3 buckets.

Amazon S3 Glacier storage classes provide extremely low-cost storage for data archival and long-term backup:

S3 Glacier Instant Retrieval – Millisecond access for archive data accessed once per quarter. Up to 68% lower cost than S3 Standard-IA. Minimum 90-day storage.
S3 Glacier Flexible Retrieval (formerly S3 Glacier) – For archive data accessed once or twice per year. Retrieval options:
- Expedited: 1-5 minutes
- Standard: 3-5 hours
- Bulk: 5-12 hours (free)
Minimum 90-day storage duration.

S3 Glacier Deep Archive – Lowest-cost storage for data retained for 7-10+ years. Retrieval options:
- Standard: Within 12 hours
- Bulk: Within 48 hours
Minimum 180-day storage duration.

Ideal Usage Patterns

Amazon S3 Glacier classes are ideally suited for long-term archival storage for infrequently accessed data including:
- Offsite enterprise information archiving
- Media asset preservation
- Research and scientific data retention
- Digital preservation and magnetic tape replacement
- Regulatory and compliance archives
- Healthcare records, financial records retention
S3 Glacier Instant Retrieval is ideal for data like medical images, news media assets, or user-generated content archives that need millisecond access but are rarely retrieved.

Anti-Patterns

Amazon S3 Glacier storage classes are not an optimal solution for the following anti-patterns:

Rapidly changing data
- Data that must be updated very frequently should use a storage solution with lower read/write latencies such as Amazon EBS, DynamoDB, or S3 Standard

Real time access (Flexible Retrieval and Deep Archive)
- Data stored in Glacier Flexible Retrieval or Deep Archive cannot be accessed in real time and requires a restore request with retrieval times from minutes to hours. If immediate access is needed, use S3 Standard, S3 Glacier Instant Retrieval, or S3 Intelligent-Tiering.
Short-lived data
- Glacier classes have minimum storage duration charges (90 days for Instant/Flexible, 180 days for Deep Archive). Data deleted before the minimum is charged for the remainder.

Performance

S3 Glacier Instant Retrieval: Millisecond access time, same performance as S3 Standard-IA.
S3 Glacier Flexible Retrieval: Expedited (1-5 min), Standard (3-5 hours), Bulk (5-12 hours, free).

S3 Glacier Deep Archive: Standard (within 12 hours), Bulk (within 48 hours).

Durability and Availability

All S3 Glacier storage classes redundantly store data across a minimum of three Availability Zones.
They are designed to provide 99.999999999% (11 nines) durability per object.
Data is synchronously stored across multiple facilities before returning SUCCESS on upload.

Regular, systematic data integrity checks are performed and the system is built to be automatically self-healing.

Cost Model

S3 Glacier pricing components include storage (per GB per month), data transfer out (per GB), requests (per 1,000 requests), and data retrievals (per GB retrieved).
S3 Glacier Flexible Retrieval Bulk retrievals are free.
Early deletion charges apply if objects are deleted before the minimum storage duration (90 days for Instant/Flexible, 180 days for Deep Archive).

S3 Glacier Deep Archive offers storage starting at approximately $0.00099 per GB per month (lowest cost in the cloud).
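As a rough worked example of the storage and early-deletion components, the sketch below uses the approximate Deep Archive rate quoted above. It assumes early deletion is billed as pro-rated storage for the unused part of the minimum duration. The function names, the 30-day month, and the 1 TB = 1,024 GB conversion are illustrative assumptions, and request, retrieval, and transfer charges are ignored.

```python
# Illustrative arithmetic only; actual rates vary by Region and change over time.
DEEP_ARCHIVE_USD_PER_GB_MONTH = 0.00099  # approximate rate quoted in this post
DEEP_ARCHIVE_MIN_DAYS = 180              # minimum storage duration


def monthly_storage_cost(size_gb: float, usd_per_gb_month: float) -> float:
    """Storage-only cost for one month."""
    return size_gb * usd_per_gb_month


def early_deletion_charge(size_gb: float, usd_per_gb_month: float,
                          days_stored: int, min_days: int,
                          days_per_month: int = 30) -> float:
    """Pro-rated storage charge for the days left in the minimum duration (assumed model)."""
    remaining_days = max(min_days - days_stored, 0)
    return size_gb * usd_per_gb_month * remaining_days / days_per_month


archive_gb = 500 * 1024  # 500 TB expressed in GB (assuming 1 TB = 1,024 GB)
print(round(monthly_storage_cost(archive_gb, DEEP_ARCHIVE_USD_PER_GB_MONTH), 2))

# 1 TB deleted after 60 days still owes storage for the remaining 120 days.
print(round(early_deletion_charge(1024, DEEP_ARCHIVE_USD_PER_GB_MONTH,
                                  60, DEEP_ARCHIVE_MIN_DAYS), 2))
```

Under these assumptions, 500 TB in Deep Archive costs roughly $507 per month for storage alone, which is why the minimum-duration and retrieval charges matter more than the storage rate for short-lived or frequently read data.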

Scalability & Elasticity

Individual objects can be up to 5 TB in size.
There is no limit to the total amount of data stored — Amazon S3 Glacier scales automatically from gigabytes to petabytes.

Interfaces & Lifecycle Integration

S3 Glacier storage classes are fully managed through the Amazon S3 APIs and console — objects are transitioned to Glacier classes via S3 Lifecycle policies or direct PUT with storage class specification.
S3 Lifecycle policies can automatically transition objects from S3 Standard → S3 Standard-IA → S3 Glacier Instant Retrieval → S3 Glacier Flexible Retrieval → S3 Glacier Deep Archive based on age.
Restoring objects from Glacier Flexible Retrieval or Deep Archive creates a temporary copy in S3 Standard for a specified retention period; the archived object remains in Glacier.

S3 Batch Operations can restore archived objects at scale across millions of objects.
Objects in S3 Glacier classes are managed through S3 APIs — they appear in S3 bucket listings and can be managed with standard S3 tools.
For data migration into AWS at scale, use the AWS Snow Family (Snowball Edge, Snowcone) for physical data transport. AWS Import/Export (the legacy disk-based service) has been replaced by the Snow Family.
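The lifecycle transitions and restore requests described in this section can be sketched with boto3. The bucket name, key, prefix, rule ID, and day thresholds below are hypothetical. The dictionaries follow the shapes accepted by `put_bucket_lifecycle_configuration` and `restore_object`, and the AWS calls sit in a function that is not invoked here, so the block runs without credentials.

```python
def build_lifecycle_rule(prefix: str) -> dict:
    """Lifecycle rule that tiers objects down by age (day thresholds are illustrative)."""
    return {
        "ID": "archive-by-age",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER_IR"},
            {"Days": 180, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


def build_restore_request(days: int, tier: str = "Bulk") -> dict:
    """Restore request for an object in Flexible Retrieval or Deep Archive.

    Note: Deep Archive offers only the Standard and Bulk tiers.
    """
    if tier not in ("Expedited", "Standard", "Bulk"):
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def apply_to_bucket(bucket: str, key: str) -> None:
    """Send both requests to S3 (needs boto3 and AWS credentials; not called here)."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [build_lifecycle_rule("logs/")]},
    )
    # Keeps the temporary restored copy available for 7 days.
    s3.restore_object(Bucket=bucket, Key=key,
                      RestoreRequest=build_restore_request(7))


rule = build_lifecycle_rule("logs/")
print([t["StorageClass"] for t in rule["Transitions"]])
```

Building the request bodies separately from the client calls makes the rule easy to review or unit test before it is applied to a real bucket.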

AWS Certification Exam Practice Questions

Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day, and both the questions and answers might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so a question might not reflect current behavior even if the underlying feature has changed.

Open to further feedback, discussion and correction.

You want to pass queue messages that are 1GB each. How should you achieve this?
1. Use Kinesis as a buffer stream for message bodies. Store the checkpoint id for the placement in the Kinesis Stream in SQS.
2. Use the Amazon SQS Extended Client Library for Java and Amazon S3 as a storage mechanism for message bodies. (Amazon SQS messages with Amazon S3 can be useful for storing and retrieving messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java.)
3. Use SQS’s support for message partitioning and multi-part uploads on Amazon S3.
4. Use AWS EFS as a shared pool storage medium. Store filesystem pointers to the files on disk in the SQS message bodies.
Company ABCD has recently launched an online commerce site for bicycles on AWS. They have a “Product” DynamoDB table that stores details for each bicycle, such as, manufacturer, color, price, quantity and size to display in the online store. Due to customer demand, they want to include an image for each bicycle along with the existing details. Which approach below provides the least impact to provisioned throughput on the “Product” table?
1. Serialize the image and store it in multiple DynamoDB tables
2. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
3. Add an image data type to the “Product” table to store the images in binary format
4. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image
A company has 500 TB of archival data that must be retained for 10 years for regulatory compliance. The data is rarely accessed but must be retrievable within 12 hours when needed. Which S3 storage class is the MOST cost-effective?
1. S3 Standard-IA
2. S3 Glacier Instant Retrieval
3. S3 Glacier Flexible Retrieval
4. S3 Glacier Deep Archive (For data retained 7-10+ years with retrieval within 12 hours, Deep Archive provides the lowest cost at approximately $0.00099/GB/month with Standard retrieval within 12 hours.)

A media company stores user-uploaded photos that are frequently accessed for the first 30 days, occasionally accessed for the next 90 days, and rarely accessed after that. They want to minimize storage costs without operational overhead. Which solution is MOST appropriate?
1. Store in S3 Standard and create lifecycle rules to transition to S3 Standard-IA after 30 days and S3 Glacier Flexible Retrieval after 120 days
2. Store in S3 Intelligent-Tiering which automatically moves objects between Frequent, Infrequent, and Archive Instant Access tiers based on access patterns (S3 Intelligent-Tiering eliminates operational overhead by automatically optimizing costs based on changing access patterns with no retrieval charges.)
3. Store in S3 One Zone-IA with lifecycle rules
4. Store in S3 Standard and manually move objects between storage classes
An organization needs to query CSV data stored in S3 without provisioning any infrastructure. The data is several terabytes and they need to run ad-hoc SQL queries. Which AWS service should they use?
1. Amazon RDS
2. Amazon Redshift
3. Amazon Athena (Amazon Athena is a serverless query service that can run SQL queries directly against data in S3 without loading it into a database. It’s ideal for ad-hoc queries on S3 data.)
4. S3 Select

A healthcare company needs to store patient records in S3 that cannot be deleted or modified for 7 years due to compliance regulations. Which S3 feature should they use?
1. S3 Versioning with MFA Delete
2. S3 Bucket Policy denying delete operations
3. S3 Object Lock in Compliance mode with a 7-year retention period (S3 Object Lock in Compliance mode provides WORM protection that cannot be overridden by any user, including the root account, ensuring objects cannot be deleted or overwritten for the retention period.)
4. S3 Glacier Vault Lock
A machine learning team needs to store and query billions of vector embeddings from their AI models with subsecond performance. Which AWS service is purpose-built for this use case?
1. Amazon OpenSearch Service
2. Amazon DynamoDB
3. Amazon S3 with Athena
4. Amazon S3 Vectors (S3 Vectors provides native vector storage and query capabilities with subsecond performance, supporting up to 2 billion vectors per index, purpose-built for AI embedding workloads at S3’s low cost.)

AWS S3 Data Durability

March 8, 2016 ~ Last updated on : July 4, 2026 ~ jayendrapatil

Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage.

S3 is designed to provide 99.999999999% (11 nines) durability of objects over a given year.
S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive redundantly store objects on multiple devices across a minimum of three Availability Zones in an AWS Region.

S3 One Zone-IA stores data redundantly across multiple devices within a single Availability Zone. It still offers 11 nines of durability but may be susceptible to data loss in the unlikely case of the loss or damage to all or part of an AWS Availability Zone.
S3 Express One Zone stores data within a single Availability Zone for high-performance, single-digit millisecond latency access. It is designed for 99.95% availability.
To help ensure data durability, Amazon S3 PUT and PUT Object copy operations synchronously store data across multiple facilities before returning SUCCESS.

Once the objects are stored, Amazon S3 maintains their durability by quickly detecting and repairing any lost redundancy.
Amazon S3 regularly verifies the integrity of data stored using checksums and provides auto-healing capability.
S3 is designed to sustain data in the event of the loss of an entire Availability Zone.
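To put 11 nines in perspective, the short calculation below converts the design durability into an expected number of lost objects per year. It treats the figure as an average annual loss rate per object, which is a simplifying assumption, and the function name is illustrative. `Fraction` is used so the tiny loss rate is not distorted by floating-point rounding.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, durability_percent: str) -> Fraction:
    """Expected objects lost per year, treating durability as an annual per-object rate."""
    annual_loss_rate = 1 - Fraction(durability_percent) / 100
    return object_count * annual_loss_rate


losses = expected_annual_losses(10_000_000, "99.999999999")
print(losses)      # expected objects lost per year
print(1 / losses)  # average number of years per single lost object
```

With 10 million stored objects this works out to 1/10000 of an object per year, or on average one lost object every 10,000 years.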

S3 Data Integrity Protections

As of December 2024, Amazon S3 provides default data integrity protections for all new object uploads.
AWS SDKs automatically calculate a CRC-based checksum for uploads (CRC32 or CRC64NVME, depending on the SDK or tool) as data is transmitted over the network.
S3 independently verifies these checksums and accepts objects only after confirming data integrity was maintained in transit.

If no checksum is provided on upload, S3 automatically calculates and applies a CRC64NVME checksum as default integrity protection.
S3 continually monitors data durability over time with periodic integrity checks of data at rest.
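A minimal local sketch of the checksum idea follows. CRC64NVME is not available in the Python standard library, so this example uses CRC32, which S3 also accepts as a checksum algorithm. The helper names are hypothetical; the encoding (Base64 of the 4-byte big-endian CRC) is the form S3 uses for CRC32 checksum values.

```python
import base64
import zlib


def crc32_checksum_b64(data: bytes) -> str:
    """Base64 of the big-endian 32-bit CRC32 of the payload."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


def verify(data: bytes, expected_b64: str) -> bool:
    """Recompute the checksum and compare, as a receiver would after transfer."""
    return crc32_checksum_b64(data) == expected_b64


payload = b"hello durable world"
checksum = crc32_checksum_b64(payload)

# The untouched payload verifies; a payload changed in transit does not.
print(checksum, verify(payload, checksum), verify(payload + b"!", checksum))
```

With boto3, a value computed this way can be supplied as the `ChecksumCRC32` parameter of `put_object`, or the SDK can compute it when `ChecksumAlgorithm="CRC32"` is set; in both cases S3 rejects the upload if its own calculation does not match.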

S3 Storage Classes – Durability & Availability Comparison

Storage Class	Durability	Availability	AZs
S3 Standard	99.999999999% (11 nines)	99.99%	≥ 3
S3 Intelligent-Tiering	99.999999999% (11 nines)	99.9%	≥ 3
S3 Express One Zone	99.999999999% (11 nines)	99.95%	1
S3 Standard-IA	99.999999999% (11 nines)	99.9%	≥ 3
S3 One Zone-IA	99.999999999% (11 nines)	99.5%	1
S3 Glacier Instant Retrieval	99.999999999% (11 nines)	99.9%	≥ 3
S3 Glacier Flexible Retrieval	99.999999999% (11 nines)	99.99%	≥ 3
S3 Glacier Deep Archive	99.999999999% (11 nines)	99.99%	≥ 3
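The availability percentages in the table are easier to compare when converted into the downtime they would allow over a year. The sketch below does that conversion; it assumes a 365-day year, treats the figures as design targets rather than SLA commitments, and uses an illustrative function name.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def max_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at the given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


for name, pct in [("S3 Standard", 99.99),
                  ("S3 Standard-IA", 99.9),
                  ("S3 One Zone-IA", 99.5)]:
    print(f"{name}: {max_downtime_hours(pct):.2f} hours/year")
```

The gap between 99.99% and 99.5% looks small on paper but corresponds to under one hour versus almost two days of potential unavailability per year, while durability stays at 11 nines for every class.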

Additional Data Protection Features

S3 Versioning – Preserves, retrieves, and restores every version of every object stored in a bucket, allowing easy recovery from unintended user actions and application failures.

S3 Object Lock – Provides Write Once Read Many (WORM) capability, preventing object deletion or overwriting for a specified retention period.
S3 Replication – Enables automatic, asynchronous copying of objects across S3 buckets in same or different AWS Regions for additional redundancy and compliance.
S3 Multi-Region Access Points – Provides a global endpoint to route requests to the nearest replicated bucket, improving availability across regions.

Key Points for Certification Exams

All S3 storage classes are designed for 99.999999999% (11 nines) durability.
S3 Standard stores data across a minimum of 3 AZs – NOT across regions, NOT in a single facility.
S3 One Zone-IA and S3 Express One Zone store data in a single AZ but still provide 11 nines durability.

One Zone classes may lose data if the entire AZ is lost (fire, flood, etc.) – use for re-creatable data only.
S3 provides both durability (data not lost) and availability (data accessible) – these are different metrics.
S3 automatically detects and repairs lost redundancy (auto-healing).

AWS Certification Exam Practice Questions

Question 1:

A customer is leveraging Amazon Simple Storage Service in eu-west-1 to store static content for a web-based property. The customer is storing objects using the Standard Storage class. Where are the customer’s objects replicated?

1. Single facility in eu-west-1 and a single facility in eu-central-1
2. Single facility in eu-west-1 and a single facility in us-east-1
3. Multiple facilities across a minimum of 3 Availability Zones in eu-west-1
4. A single facility in eu-west-1

Answer: 3

S3 Standard stores objects redundantly across a minimum of three Availability Zones within the same AWS Region. Objects are NOT replicated across regions by default.

Question 2:

A company wants to store infrequently accessed backup data at the lowest possible cost. The data can be re-created if lost. Which S3 storage class should they use?

1. S3 Standard
2. S3 Standard-IA
3. S3 One Zone-IA
4. S3 Glacier Deep Archive

Answer: 3

S3 One Zone-IA is the best choice for infrequently accessed, re-creatable data as it costs 20% less than S3 Standard-IA. While it stores data in a single AZ (susceptible to AZ-level disasters), it still provides 11 nines durability and the data can be re-created if lost.

Question 3:

What is the designed durability of Amazon S3?

1. 99.99%
2. 99.999%
3. 99.9999999%
4. 99.999999999%

Answer: 4

Amazon S3 is designed for 99.999999999% (11 nines) durability. This applies to all S3 storage classes. Note that durability (data not lost) is different from availability (data accessible when requested).

Question 4:

Which of the following statements about S3 data integrity are correct? (Choose 2)

1. S3 automatically calculates and verifies checksums for uploaded objects
2. S3 encrypts data at rest by default using customer-managed keys
3. S3 regularly performs integrity checks on stored data and automatically repairs any lost redundancy
4. S3 replicates data across multiple AWS Regions by default

Answer: 1, 3

S3 provides default data integrity protections with automatic CRC-based checksums on upload (since Dec 2024) and performs periodic integrity checks of data at rest with auto-healing. S3 encrypts at rest with SSE-S3 (AWS-managed keys) by default, not customer-managed keys. Cross-region replication must be explicitly configured.
