AWS OpenSearch Service

📌 Service Renamed: Amazon Elasticsearch Service was renamed to Amazon OpenSearch Service on September 8, 2021. OpenSearch is an open-source, community-driven fork of Elasticsearch and Kibana. Existing Elasticsearch domains kept working under the new service name without any action required. The service now runs OpenSearch (versions 1.x, 2.x) and continues to support Elasticsearch versions 1.5 to 7.10.

Amazon OpenSearch Service (formerly Elasticsearch Service) is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud.
OpenSearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, clickstream analytics, vector search for AI/ML, and security analytics (SIEM).

OpenSearch Service provides
- real-time, distributed search and analytics engine
- ability to provision all the resources for the OpenSearch cluster and launch the cluster
- easy-to-use cluster scaling options. Scaling an OpenSearch Service domain by adding or modifying instances and storage volumes is an online operation that does not require any downtime.
- self-healing clusters, which automatically detect and replace failed OpenSearch nodes, reducing the overhead associated with self-managed infrastructures
- domain snapshots to back up and restore domains and replicate domains across AZs
- data durability
- enhanced security with IAM access control, fine-grained access control, and security groups
- node monitoring
- multiple configurations of CPU, memory, and storage capacity, known as instance types
- storage volumes for the data using EBS volumes
- multiple geographical locations for your resources, known as Regions and Availability Zones
- ability to span cluster nodes across multiple AZs in the same region, known as zone awareness, for high availability and redundancy. OpenSearch Service automatically distributes the primary and replica shards across instances in different AZs.
- Multi-AZ with Standby deployment option for 99.99% availability SLA with consistent performance for business-critical workloads
- dedicated master nodes to improve cluster stability
- data visualization using OpenSearch Dashboards (successor to Kibana)
- integration with CloudWatch for monitoring OpenSearch domain metrics
- integration with CloudTrail for auditing configuration API calls to OpenSearch domains
- integration with S3, Kinesis, and DynamoDB for loading streaming data
- ability to handle structured and unstructured data
- HTTP REST APIs
- vector database capabilities for k-nearest neighbor (k-NN) search, enabling AI/ML and RAG workloads
- security analytics for real-time threat detection and SIEM use cases
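As a rough illustration of the HTTP REST APIs listed above, the sketch below builds the request bodies for the `_bulk` and `_search` endpoints without sending anything. The index name `app-logs` and the fields `error_code` and `@timestamp` are hypothetical, and the term query assumes `error_code` is mapped as a keyword field.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON body for the _bulk API.

    Each document becomes an action line followed by a source line,
    and the body ends with a trailing newline as the API expects.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def build_error_search(error_code, size=10):
    """Return a _search body that matches one exact error code, newest first."""
    return {
        "size": size,
        "query": {"term": {"error_code": error_code}},
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


# Hypothetical log documents keyed by document ID.
docs = {
    "1": {"@timestamp": "2024-01-01T00:00:00Z", "error_code": "E1001", "message": "timeout"},
    "2": {"@timestamp": "2024-01-01T00:05:00Z", "error_code": "E2002", "message": "refused"},
}
bulk_body = build_bulk_body("app-logs", docs)
search_body = build_error_search("E1001")
```

The bulk body would be sent with a POST to the domain endpoint's `/_bulk` path, and the search body to `/app-logs/_search`. How the request is authenticated (signed IAM request or basic authentication) depends on how the domain is configured.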

OpenSearch Domains

OpenSearch Service domains are OpenSearch clusters created using the OpenSearch Service console, CLI, or API.

Each domain is an OpenSearch cluster in the cloud with the compute and storage resources you specify.
OpenSearch Service lets you create and delete domains, define infrastructure attributes, and control access and security.
OpenSearch Service automates common administrative tasks, such as performing backups, monitoring instances, and patching software once the domain is running.
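Creating a domain through the API can be sketched as follows. This is a minimal sketch assuming the boto3 `opensearch` client and its `create_domain` call. The domain name, engine version, instance types, and volume size are illustrative values, and the even-distribution check is this sketch's own guard rather than a documented service rule.

```python
def build_create_domain_request(domain_name, data_nodes=3, az_count=3):
    """Build keyword arguments for the boto3 opensearch create_domain call."""
    # Guard added by this sketch: keep data nodes evenly spread across AZs.
    if data_nodes % az_count != 0:
        raise ValueError("data_nodes should be a multiple of az_count")
    return {
        "DomainName": domain_name,
        "EngineVersion": "OpenSearch_2.11",  # illustrative version
        "ClusterConfig": {
            "InstanceType": "r6g.large.search",  # illustrative instance type
            "InstanceCount": data_nodes,
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": "m6g.large.search",
            "DedicatedMasterCount": 3,
            "ZoneAwarenessEnabled": True,
            "ZoneAwarenessConfig": {"AvailabilityZoneCount": az_count},
        },
        "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 100},
    }


def create_domain(request):
    """Send the request to AWS (needs boto3 and credentials; not run here)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("opensearch").create_domain(**request)


request = build_create_domain_request("example-logs", data_nodes=6, az_count=3)
```

Keeping the request builder separate from the API call makes the configuration easy to review or unit test before any resources are created.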

OpenSearch Serverless

Amazon OpenSearch Serverless is a serverless deployment option that removes the need to manage clusters, nodes, or capacity.
Automatically provisions, scales, and optimizes infrastructure based on workload demands.
Next-generation architecture (GA May 2026) features:
- Scale to zero — no idle costs when the collection is not in use
- 20x faster autoscaling — provisions in seconds instead of minutes
- Up to 60% lower cost compared to provisioning clusters for peak capacity
- Decoupled storage and compute with usage-based pricing
- Designed for agentic AI workloads with rapid burst capabilities

Supports three collection types: Search, Time series, and Vector search.
Vector search collections serve as the backend for RAG (Retrieval-Augmented Generation) applications with Amazon Bedrock Knowledge Bases.
Capacity is measured in OpenSearch Compute Units (OCUs).
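The three collection types map onto the `type` parameter of the serverless API. The sketch below assumes the boto3 `opensearchserverless` client and its `create_collection(name=..., type=..., description=...)` call, and only builds the arguments. The collection name is hypothetical.

```python
# Collection types from the text, mapped to the API's type values.
COLLECTION_TYPES = {
    "search": "SEARCH",
    "time series": "TIMESERIES",
    "vector search": "VECTORSEARCH",
}


def build_create_collection_request(name, kind, description=""):
    """Build keyword arguments for opensearchserverless create_collection."""
    key = kind.strip().lower()
    if key not in COLLECTION_TYPES:
        raise ValueError(f"unknown collection type: {kind!r}")
    return {"name": name, "type": COLLECTION_TYPES[key], "description": description}


request = build_create_collection_request(
    "docs-embeddings", "Vector search", "Embeddings for a RAG application"
)
```

In practice the arguments would be passed as `boto3.client("opensearchserverless").create_collection(**request)`. As far as I know, an encryption security policy that matches the collection name has to exist before the collection can be created.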

OpenSearch Optimized Instances (OR1)

OR1 is an OpenSearch-optimized instance family introduced in November 2023.

Uses EBS volumes for primary storage with data copied synchronously to S3 for 11 nines (99.999999999%) of durability.
Delivers up to 30% price-performance improvement over memory-optimized instances.
Ideal for heavy indexing use cases and large-scale log analytics.
Supports multi-tier storage with hot and warm tiers powered by S3-backed storage.

OpenSearch Security

Access to OpenSearch Service management APIs, for operations such as creating and scaling domains, is controlled with AWS IAM policies.
OpenSearch Service domains can be configured to be accessible with an endpoint within the VPC or a public endpoint accessible to the internet.
Network access to VPC endpoints is controlled by security groups. For public endpoints, access can be granted or restricted by IP address.

OpenSearch Service provides user authentication via IAM and basic authentication using username and password.
Authorization can be granted at the domain level (via Domain Access Policies) as well as at the index, document, and field level (via the fine-grained access control feature).
Fine-grained access control extends OpenSearch Dashboards with read-only views and secure multi-tenant support.
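A domain access policy that restricts a public endpoint by IP address might look like the sketch below. The region, account ID, domain name, and CIDR range are placeholders, and the policy is one possible shape rather than a recommended baseline.

```python
import json


def build_ip_access_policy(region, account_id, domain_name, allowed_cidrs):
    """Return a resource-based policy allowing HTTP access from given CIDRs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
                # Only requests from these source IP ranges are allowed.
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }


policy = build_ip_access_policy(
    "us-east-1", "123456789012", "example-logs", ["203.0.113.0/24"]
)
policy_json = json.dumps(policy)  # the API expects the policy as a JSON string
```

The resulting JSON string is what the `AccessPolicies` parameter of the domain configuration APIs accepts. Index-level, document-level, and field-level permissions are configured separately through fine-grained access control.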

OpenSearch Service supports integration with Amazon Cognito, allowing end-users to log in to OpenSearch Dashboards through enterprise identity providers such as Microsoft Active Directory using SAML 2.0, Cognito User Pools, and more.
OpenSearch Service supports encryption at rest through AWS Key Management Service (KMS), node-to-node encryption over TLS, and the ability to require clients to communicate with HTTPS.
Encryption at rest encrypts shards, log files, swap files, and automated S3 snapshots.
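The three encryption settings above correspond to separate option blocks in the domain configuration. The following is a minimal sketch assuming the boto3 `opensearch` parameter names. The KMS key ID is a placeholder and the TLS policy value is one of the available choices, not a requirement.

```python
def build_encryption_options(kms_key_id=None):
    """Build the encryption-related arguments for a domain configuration."""
    at_rest = {"Enabled": True}
    if kms_key_id:
        # Without a key ID the service falls back to its default key.
        at_rest["KmsKeyId"] = kms_key_id
    return {
        "EncryptionAtRestOptions": at_rest,
        "NodeToNodeEncryptionOptions": {"Enabled": True},
        "DomainEndpointOptions": {
            "EnforceHTTPS": True,
            "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07",
        },
    }


options = build_encryption_options("1234abcd-12ab-34cd-56ef-1234567890ab")
```

These blocks can be merged into the arguments of a domain creation request, for example with `{**request, **options}`.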

Security Analytics provides built-in SIEM capabilities with pre-packaged detection rules (Sigma format), automated correlation, and real-time alerting for threat detection.

Multi-AZ with Standby

Multi-AZ with Standby is a deployment option that provides 99.99% availability SLA for business-critical workloads.
Distributes data nodes across three AZs with standby nodes that are fully provisioned and ready to take over if an AZ fails.

Provides consistent performance during AZ failures by automatically failing over to standby nodes.
Available for OpenSearch version 1.3 and above in regions with at least three Availability Zones.
Recommended as the default deployment option for production workloads.
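A cluster configuration for this option can be sketched as below. The checks encode my understanding that the option expects three Availability Zones, three dedicated master nodes, and a data node count in multiples of three. Treat those constraints, and the instance types, as assumptions to verify against the current documentation.

```python
def build_standby_cluster_config(data_nodes, instance_type="r6g.large.search"):
    """Build a ClusterConfig block with Multi-AZ with Standby enabled."""
    # Assumed constraint: data nodes come in multiples of three (one set per AZ).
    if data_nodes < 3 or data_nodes % 3 != 0:
        raise ValueError("data_nodes must be a positive multiple of 3")
    return {
        "InstanceType": instance_type,
        "InstanceCount": data_nodes,
        "ZoneAwarenessEnabled": True,
        "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
        "MultiAZWithStandbyEnabled": True,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m6g.large.search",
        "DedicatedMasterCount": 3,
    }


cluster_config = build_standby_cluster_config(6)
```

The returned dictionary is meant to be used as the `ClusterConfig` argument when creating or updating a domain.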

Vector Search and AI/ML Integration

OpenSearch Service has supported vector database capabilities through k-nearest neighbor (k-NN) search since 2019.

Supports three vector engines: FAISS (Facebook AI Similarity Search), NMSLIB, and Lucene.
Supports both exact nearest-neighbor and approximate nearest-neighbor (ANN) matching.
Integrates with Amazon Bedrock Knowledge Bases as a vector store for RAG applications.
Supports the connector framework to connect to models hosted by various providers (Amazon Bedrock, SageMaker, DeepSeek, OpenAI, Cohere).
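To make the engine choice concrete, the sketch below builds an index definition with a `knn_vector` field and an approximate k-NN query body. The field name `embedding`, the HNSW method, and the `l2` space type are illustrative choices, not values taken from the notes above.

```python
SUPPORTED_ENGINES = {"faiss", "nmslib", "lucene"}


def build_knn_index_body(dimension, engine="faiss"):
    """Return index settings and mappings for an approximate k-NN field."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": engine, "space_type": "l2"},
                }
            }
        },
    }


def build_knn_query(vector, k=5):
    """Return a search body asking for the k nearest neighbors of a vector."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(vector), "k": k}}},
    }


index_body = build_knn_index_body(dimension=3, engine="lucene")
query_body = build_knn_query([0.1, 0.2, 0.3], k=2)
```

The index body would be sent with a PUT to create the index and the query body with a POST to its `_search` path. Real embeddings usually have hundreds or thousands of dimensions; three are used here only to keep the example readable.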

Search flow pipelines enable chaining of ML models with search for semantic search, hybrid search, and conversational search.
Vector search with UltraWarm (2025) enables cost-effective storage of vector embeddings in the warm storage tier.
Ideal for semantic search, recommendation engines, image similarity, and generative AI applications.
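Hybrid search combines keyword relevance with vector similarity. The plain-Python sketch below shows one common way to do that, using min-max normalization followed by a weighted average. It illustrates the idea only and is not the service's implementation; the document IDs, scores, and the 0.3 keyword weight are made up.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range so different scorers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, keyword_weight=0.3):
    """Rank documents by a weighted average of normalized scores."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    combined = {
        doc: keyword_weight * kw.get(doc, 0.0)
        + (1 - keyword_weight) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest combined score first; ties broken by document ID.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


keyword_scores = {"a": 12.0, "b": 4.0, "c": 8.0}  # e.g. BM25 scores
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.5}  # e.g. similarity scores
ranking = hybrid_rank(keyword_scores, vector_scores)
print(ranking)  # ['b', 'c', 'a', 'd']
```

Document `a` has the best keyword score but no vector match, so with most of the weight on vector similarity it ranks below `b` and `c`.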

Zero-ETL Integrations and Direct Query

OpenSearch Service supports zero-ETL integrations that eliminate the need to build custom data pipelines:
- DynamoDB zero-ETL — real-time data replication from DynamoDB tables to OpenSearch
- Amazon DocumentDB zero-ETL — sync DocumentDB collections to OpenSearch indexes
- Amazon S3 zero-ETL — query operational logs in S3 data lakes without ingestion
- Amazon Security Lake zero-ETL — security analytics directly on Security Lake data
- Amazon RDS/Aurora integration — replicate relational data to OpenSearch via OpenSearch Ingestion

Direct Query enables querying data in place (S3, CloudWatch Logs, Security Lake, Amazon Managed Service for Prometheus) without building ingestion pipelines using SQL, PPL, or PromQL.
Amazon OpenSearch Ingestion is a fully managed, serverless data pipeline for delivering data to OpenSearch Service domains and collections.

OpenSearch Dashboards

OpenSearch Dashboards (successor to Kibana) is the visualization and UI tool for OpenSearch Service.

Supports visualizations including line charts, bar graphs, pie charts, heatmaps, and more.
OpenSearch UI (launched November 2024) provides a modern analytics experience with natural language querying powered by Amazon Q Developer.
PPL (Piped Processing Language) provides 35+ commands for log analytics, faceted exploration, and deep analysis.
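A PPL query is a pipeline of commands separated by `|`. The sketch below builds the request body for the `_plugins/_ppl` endpoint. The index name and the `status` and `service` fields are hypothetical.

```python
def build_ppl_request(index, min_status=500, group_by="service"):
    """Return a PPL request body that counts error responses per group."""
    query = (
        f"source={index} "
        f"| where status >= {min_status} "
        f"| stats count() by {group_by}"
    )
    return {"query": query}


ppl_request = build_ppl_request("app-logs")
print(ppl_request["query"])
# source=app-logs | where status >= 500 | stats count() by service
```

Each stage takes the output of the previous one: `source` selects the index, `where` filters rows, and `stats` aggregates what is left.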

Supports querying across multiple managed clusters, serverless collections, and S3 data sources from a single endpoint.

Storage Tiers

OpenSearch Service provides three storage tiers:
- Hot — high-performance storage for frequently accessed data (EBS or OR1 instances)
- UltraWarm — cost-effective warm storage backed by S3 for less frequently accessed data (read-only, now also supports vector search)
- Cold — lowest-cost storage for infrequently accessed data, detached from compute
Data can be automatically moved between tiers using Index State Management (ISM) policies.
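An ISM policy that walks an index through the tiers could be sketched as follows. The ages, the `logs-*` index pattern, and the `@timestamp` field are illustrative assumptions. The action names (`warm_migration`, `cold_migration`, `cold_delete`) are the ones I understand the managed service to use for tier moves.

```python
def build_tiering_policy(warm_after="7d", cold_after="30d", delete_after="90d"):
    """Return an ISM policy moving indexes hot -> warm -> cold -> delete."""

    def transition(state_name, min_age):
        return [{"state_name": state_name, "conditions": {"min_index_age": min_age}}]

    return {
        "policy": {
            "description": "Hot to UltraWarm to cold storage, then delete",
            "default_state": "hot",
            "states": [
                {"name": "hot", "actions": [],
                 "transitions": transition("warm", warm_after)},
                {"name": "warm", "actions": [{"warm_migration": {}}],
                 "transitions": transition("cold", cold_after)},
                {"name": "cold",
                 "actions": [{"cold_migration": {"timestamp_field": "@timestamp"}}],
                 "transitions": transition("delete", delete_after)},
                {"name": "delete", "actions": [{"cold_delete": {}}]},
            ],
            # Attach the policy automatically to new indexes matching the pattern.
            "ism_template": {"index_patterns": ["logs-*"], "priority": 100},
        }
    }


policy = build_tiering_policy()
state_names = [state["name"] for state in policy["policy"]["states"]]
```

The policy body would be sent with a PUT to `_plugins/_ism/policies/<policy-name>` on the domain. The ages are measured from index creation, so each threshold should be larger than the one before it.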

Multi-tier storage with OR1 instances provides a new architecture combining S3 cloud technology with local instance storage.

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.

Open to further feedback, discussion and correction.

You need to perform ad-hoc analysis on log data, including searching quickly for specific error codes and reference numbers. Which should you evaluate first?
1. Amazon OpenSearch Service (OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS cloud. OpenSearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analytics.)
2. Amazon Redshift
3. Amazon EMR
4. Amazon DynamoDB

You are hired as the new head of operations for a SaaS company. Your CTO has asked you to make debugging any part of your entire operation simpler and as fast as possible. She complains that she has no idea what is going on in the complex, service-oriented architecture, because the developers just log to disk, and it’s very hard to find errors in logs on so many services. How can you best meet this requirement and satisfy your CTO?
1. Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Lambda. Use the Lambda to analyze logs as soon as they come in and flag issues.
2. Begin using CloudWatch Logs on every service. Stream all Log Groups into S3 objects. Use AWS EMR cluster jobs to perform adhoc MapReduce analysis and write new queries when needed.
3. Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Kinesis. Use Apache Spark on AWS EMR to perform at-scale stream processing queries on the log chunks and flag issues.
4. Begin using CloudWatch Logs on every service. Stream all Log Groups into an Amazon OpenSearch Service Domain running OpenSearch Dashboards and perform log analysis on a search cluster. (Amazon OpenSearch Service with OpenSearch Dashboards is designed specifically for real-time, ad-hoc log analysis and aggregation)

A company wants to implement a security analytics solution to detect threats across its AWS environment in near real-time. The solution should use pre-built detection rules and provide automated correlation of security events. Which approach is most suitable?
1. Stream VPC Flow Logs to Amazon S3 and query with Amazon Athena on a scheduled basis.
2. Use Amazon GuardDuty to detect threats and send findings to Amazon SNS for manual review.
3. Use Amazon OpenSearch Service Security Analytics with pre-packaged Sigma detection rules to correlate and alert on security events in real time. (OpenSearch Service Security Analytics provides built-in SIEM capabilities with Sigma-format detection rules, automated event correlation, and real-time alerting)
4. Send all logs to Amazon CloudWatch Logs and create metric filters for each threat pattern.

A company needs to build a generative AI application that answers questions based on internal company documents. The solution requires storing vector embeddings and performing similarity searches at scale. Which combination of AWS services should be used? (Select TWO)
1. Amazon OpenSearch Service as the vector database for storing and searching embeddings
2. Amazon Bedrock Knowledge Bases with OpenSearch Service as the backend vector store
3. Amazon DynamoDB with Global Secondary Indexes for similarity search
4. Amazon RDS with full-text search enabled
5. Amazon ElastiCache for Redis with vector similarity search
(Amazon OpenSearch Service provides native k-NN vector search capabilities and integrates with Amazon Bedrock Knowledge Bases for RAG applications. This combination provides scalable vector storage, similarity search, and integration with foundation models.)

A company has operational data in Amazon DynamoDB and wants to enable full-text search and analytics on this data in near real-time without building custom ETL pipelines. Which solution requires the LEAST operational overhead?
1. Enable DynamoDB Streams and write a Lambda function to index data into OpenSearch Service.
2. Export DynamoDB data to S3 and use OpenSearch Ingestion to load it into OpenSearch.
3. Use DynamoDB zero-ETL integration with Amazon OpenSearch Service for automatic real-time replication. (DynamoDB zero-ETL integration with OpenSearch Service provides fully managed, code-free real-time data replication without building custom pipelines)
4. Use AWS Glue to ETL DynamoDB data into OpenSearch on a scheduled basis.

A startup wants to implement full-text search for their application but does not want to manage infrastructure or pay for idle capacity. They expect bursty traffic patterns with periods of zero usage. Which option is most cost-effective?
1. Deploy a small OpenSearch Service managed cluster with a single node.
2. Use Amazon OpenSearch Serverless with scale-to-zero enabled. (OpenSearch Serverless next-generation architecture supports scale to zero with no idle costs, 20x faster autoscaling, and up to 60% cost savings compared to provisioning for peak capacity)
3. Use Amazon CloudSearch for managed search.
4. Deploy Elasticsearch on EC2 instances with Auto Scaling.