Amazon Q Business – Enterprise AI Assistant Guide

Amazon Q Business Overview

  • Amazon Q Business is a fully managed, generative AI-powered enterprise assistant built on Amazon Bedrock that answers questions, provides summaries, generates content, and completes tasks based on enterprise data.
  • Provides permissions-aware responses with citations from enterprise data sources for use cases such as IT helpdesk, HR, benefits, and compliance.
  • Supports Retrieval Augmented Generation (RAG) — combining enterprise knowledge retrieval with LLM-powered response generation.
  • Integrates with 40+ data source connectors, built-in and custom plugins, and Amazon Q Apps for citizen-developed AI applications.
  • Available through a web experience, browser extensions (Chrome, Firefox, Edge), Slack, and Microsoft Teams integrations.
  • Important: Amazon Q Business will no longer be open to new customers starting July 31, 2026. Existing customers remain fully supported. AWS recommends migrating to Amazon Quick for similar and enhanced capabilities.

Amazon Q Business Architecture

Core Components

  • Application
    • Top-level container that encapsulates the entire Q Business deployment.
    • Each application has its own configuration, data sources, plugins, guardrails, and web experience.
    • Linked to an IAM Identity Center instance or IAM Federation for user authentication.
    • Supports both authenticated (IAM Identity Center/IAM Federation) and anonymous access modes.
  • Index
    • Stores and organizes ingested enterprise documents for retrieval.
    • Two index types available:
      • Starter Index — runs in 1 AZ, ideal for proof-of-concept; includes 20,000 documents or 200 MB extracted text capacity and 100 hours connector usage.
      • Enterprise Index — runs across 3 AZs for high availability; same base capacity with support for customer managed key (CMK) encryption.
    • Capacity can be scaled by adding additional index units.
  • Retriever
    • Responsible for fetching relevant documents from the index to answer user queries.
    • Two retriever options:
      • Native Retriever — built-in retriever managed by Amazon Q Business with semantic search capabilities.
      • Amazon Kendra Retriever — uses an existing Amazon Kendra index for retrieval, ideal for organizations already using Kendra with advanced search tuning.
  • Data Sources
    • Connectors that crawl, ingest, and synchronize enterprise content into the index.
    • Support scheduled sync (incremental and full) to keep the index current.
    • Crawl Access Control Lists (ACLs) by default for document-level security.
  • Web Experience
    • Managed chat interface for end users to interact with Amazon Q Business.
    • Customizable with organization branding, visual themes, and conversation starters.
    • Supports single sign-on (SSO) via IAM Identity Center.
    • Can be embedded directly into applications and websites.

How RAG Works in Q Business

  1. User submits a natural language query through the web experience or integration.
  2. The retriever searches the index for relevant enterprise documents.
  3. ACLs are evaluated to ensure the user has permission to access retrieved documents.
  4. Retrieved documents (with citations) are passed to the underlying LLM.
  5. The LLM generates a comprehensive, contextual response grounded in enterprise data.
  6. Response is returned with source citations for verification.
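
The six steps above can be sketched in a few lines of Python. This is only an illustrative toy, not how Amazon Q Business is implemented: the `Doc` class, the keyword-overlap scoring (a stand-in for semantic search), and all document IDs and group names are hypothetical, and the final LLM call is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)  # empty set = no ACL entries


def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, index, top_k=3):
    """Step 2: rank documents by naive keyword overlap (stand-in for semantic search)."""
    terms = tokens(query)
    scored = [(len(terms & tokens(d.text)), d) for d in index]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [d for _, d in ranked[:top_k]]


def permitted(doc, user_groups):
    """Step 3: a document with no ACL entries is visible to every authenticated user."""
    return not doc.allowed_groups or bool(doc.allowed_groups & user_groups)


def build_prompt(query, docs):
    """Step 4: pass retrieved passages, tagged with citation ids, to the LLM."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {query}"


def answer(query, index, user_groups):
    """Steps 1-4 and 6; step 5 (the LLM call itself) is omitted from this sketch."""
    docs = [d for d in retrieve(query, index) if permitted(d, user_groups)]
    return {"prompt": build_prompt(query, docs), "citations": [d.doc_id for d in docs]}


index = [
    Doc("hr-001", "PTO policy: employees accrue 15 days of paid leave", {"all-staff"}),
    Doc("hr-002", "Executive PTO policy and compensation details", {"hr-admins"}),
    Doc("it-001", "VPN setup guide for remote employees"),
]
result = answer("What is the PTO policy?", index, {"all-staff"})
print(result["citations"])
```

Both HR documents match the query, but only the one whose ACL includes the caller's group reaches the prompt, which is the behavior the permission check in step 3 is meant to guarantee.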

Data Source Connectors

  • Amazon Q Business provides 40+ pre-built connectors to synchronize data from enterprise content repositories.
  • Connectors can be scheduled for automatic sync (full or incremental) to keep the index up-to-date.
  • All connectors crawl ACLs by default to maintain document-level security.

Cloud Storage & File Systems

Connector | Description
Amazon S3 | Indexes documents stored in S3 buckets. Supports PDF, HTML, Word, PowerPoint, Excel, CSV, and text files. Configurable with prefix filters.
Amazon FSx for Windows | Indexes documents from FSx Windows file shares with NTFS ACL support.
Box | Crawls files, folders, comments, and tasks from Box enterprise accounts.
Dropbox | Indexes files, paper documents, and shared folders from Dropbox Business accounts.
Google Drive | Crawls Google Docs, Sheets, Slides, PDFs, and shared drives with Google Workspace ACL support.
Microsoft OneDrive | Indexes personal and shared files from Microsoft 365 OneDrive accounts.

Collaboration & Productivity

Connector | Description
Confluence (Cloud) | Crawls spaces, pages, blogs, comments, and attachments from Atlassian Confluence Cloud.
Confluence (Server) | Indexes on-premises Confluence Server/Data Center instances.
Microsoft SharePoint (Cloud) | Crawls sites, document libraries, lists, and pages from SharePoint Online with Microsoft 365 ACLs.
Microsoft SharePoint Server | Supports SharePoint Server 2016, 2019, and Subscription Edition for on-premises deployments.
Microsoft Teams | Indexes channel messages, files, wikis, and meeting notes from Teams.
Slack | Crawls public and private channel messages, threads, and shared files.
Smartsheet | Indexes sheets, reports, and dashboards from Smartsheet workspaces.
Quip | Crawls documents, spreadsheets, and chat threads from Salesforce Quip (legacy connector).

Communication & Email

Connector | Description
Gmail | Indexes email messages and attachments from Google Workspace Gmail accounts.
Google Calendar (Preview) | Crawls calendar events and descriptions from Google Workspace.
Microsoft Exchange | Indexes emails, calendar events, contacts, and attachments from Exchange Online.

Project Management & ITSM

Connector | Description
Jira | Crawls issues, projects, comments, attachments, and worklogs from Jira Cloud.
ServiceNow Online | Indexes knowledge articles, incidents, catalog items, and attachments from ServiceNow.
Zendesk | Crawls tickets, articles, comments, and community posts from Zendesk.
Asana (Preview) | Indexes tasks, projects, and comments from Asana workspaces.

CRM & Business Applications

Connector | Description
Salesforce Online | Crawls knowledge articles, accounts, cases, opportunities, feeds, and custom objects.

Source Code & Development

Connector | Description
GitHub (Cloud) | Indexes repositories, issues, pull requests, READMEs, and wiki pages from GitHub.com.
GitHub (Server) | Crawls on-premises GitHub Enterprise Server instances.

Web & Custom

Connector | Description
Amazon Q Web Crawler | Crawls and indexes content from specified websites with configurable depth and URL filters.
Custom Data Source Connector | Enables integration with any data source using the Amazon Q Business API. Developers push documents programmatically via the BatchPutDocument API.

Database Connectors (via Custom Connector)

  • Database sources like MySQL, PostgreSQL, and Oracle can be integrated using the Custom Data Source Connector.
  • Developers extract data from databases, format as documents, and push to Q Business via the BatchPutDocument API.
  • Supports any structured data source that can be programmatically accessed.
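
As a rough sketch of that extract, format, and push flow, the snippet below turns database rows into document payloads and groups them into batches. The field names (`id`, `title`, `contentType`, `content.blob`) follow my understanding of the BatchPutDocument request shape and should be checked against the current API reference; the batch size of 10, the `tickets` table, and the row layout are assumptions for illustration. The actual boto3 call is shown only as a comment so the example runs without AWS credentials.

```python
import json


def rows_to_documents(rows, table):
    """Format each database row as one document payload (assumed request shape)."""
    docs = []
    for row in rows:
        docs.append({
            "id": f"{table}-{row['id']}",            # stable ID so re-syncs overwrite
            "title": row["title"],
            "contentType": "PLAIN_TEXT",
            "content": {"blob": json.dumps(row, sort_keys=True).encode("utf-8")},
        })
    return docs


def batched(items, size=10):
    """Yield fixed-size batches; 10 per call is an assumed API limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical rows, as if fetched from MySQL/PostgreSQL/Oracle with a DB driver.
rows = [{"id": i, "title": f"Ticket {i}", "body": "Printer offline"} for i in range(1, 24)]
documents = rows_to_documents(rows, table="tickets")
batches = list(batched(documents))

# With credentials configured, each batch would be pushed roughly like this:
#   client = boto3.client("qbusiness")
#   client.batch_put_document(applicationId=APP_ID, indexId=INDEX_ID, documents=batch)
print([len(b) for b in batches])
```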

Plugins & Actions

  • Plugins enable Amazon Q Business users to perform actions in third-party applications directly from the chat interface.
  • Users can create tickets, update records, send notifications, and query application data using natural language.
  • Plugins are only available with the Pro subscription tier.
  • Amazon Q Business supports 50+ action types across built-in and custom plugins.

Built-in Plugins

Plugin | Capabilities
Jira Cloud | Create issues, update status, add comments, assign tickets, search issues, transition workflows
ServiceNow | Create/update incidents, search knowledge base, manage change requests, catalog items
Zendesk | Create/update tickets, search articles, manage users, add comments
Salesforce | Create/update cases, search accounts and contacts, manage opportunities
PagerDuty | Create/acknowledge/resolve incidents, manage on-call schedules, escalation policies
Smartsheet | Create/update rows, search sheets, manage attachments, update cells

Custom Plugins

  • Custom plugins allow integration with any third-party application using an OpenAPI schema definition.
  • Steps to create a custom plugin:
    1. Define an OpenAPI 3.0 specification describing the API endpoints, parameters, and responses.
    2. Configure authentication (OAuth 2.0, API key, or no auth).
    3. Upload the schema to Amazon Q Business and configure the plugin.
    4. Amazon Q Business automatically discovers available actions from the schema.
  • Use cases: submit time-off requests, send meeting invites, query internal APIs, trigger CI/CD pipelines.
  • Custom plugins support OAuth 2.0 authorization code flow for secure per-user authentication.
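
To make step 1 and step 4 concrete, here is a minimal OpenAPI 3.0 document for a hypothetical time-off API, written as a Python dictionary, together with a helper that lists the operations a schema exposes. The API, its URL, and the operation names are invented for illustration, and `list_actions` only approximates the idea of action discovery; it is not the logic Amazon Q Business uses.

```python
# Minimal OpenAPI 3.0 document for a hypothetical internal HR API.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Time-Off API", "version": "1.0.0"},
    "servers": [{"url": "https://hr.example.com/api"}],
    "paths": {
        "/time-off": {
            "post": {
                "operationId": "submitTimeOffRequest",
                "summary": "Submit a time-off request for the signed-in user",
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "required": ["startDate", "endDate"],
                        "properties": {
                            "startDate": {"type": "string", "format": "date"},
                            "endDate": {"type": "string", "format": "date"},
                        },
                    }}},
                },
                "responses": {"201": {"description": "Request created"}},
            },
            "get": {
                "operationId": "listTimeOffRequests",
                "summary": "List the signed-in user's time-off requests",
                "responses": {"200": {"description": "List of requests"}},
            },
        }
    },
}

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def list_actions(openapi_spec):
    """Return (operationId, METHOD, path) for every operation in the schema."""
    actions = []
    for path, item in openapi_spec["paths"].items():
        for method, operation in item.items():
            if method in HTTP_METHODS:
                actions.append((operation["operationId"], method.upper(), path))
    return sorted(actions)


for action in list_actions(spec):
    print(action)
```

Clear `operationId` and `summary` values matter in practice, since they are the natural-language hints an assistant has for matching a user request to an operation.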

Chat Orchestration

  • Amazon Q Business automatically orchestrates end user chat requests across configured plugins and data sources.
  • Determines whether a query requires knowledge retrieval, plugin action execution, or both.
  • Enables multi-step workflows combining data retrieval and actions in a single conversation.

Amazon Q Apps

  • Amazon Q Apps enables users to build lightweight, purpose-built AI applications without any coding — empowering citizen developers.
  • Available exclusively to Pro subscription users (since July 1, 2024).
  • Users create Q Apps directly from the web experience interface using natural language descriptions or by converting chat conversations into reusable apps.

App Builder

  • Q Apps are composed of cards — modular building blocks that define inputs, processing, and outputs:
    • Text Input Card — accepts user text input
    • File Upload Card — allows file uploads (up to 10 MB per card)
    • Query Card — sends a prompt to the LLM with optional enterprise data context
    • Output Card — displays generated responses
  • Cards can be connected in sequence to create multi-step workflows.
  • Apps can leverage enterprise data sources configured in the Q Business application.

Sharing & Permissions

  • Private sharing — share apps with specific users within the Q Business application environment.
  • Library publishing — publish apps to the organization’s app library for broader discovery.
  • App creators control visibility and access at a granular level.
  • Administrators can enable/disable Q Apps at the application level.

Data Collection

  • Q Apps support data collection forms that allow shared apps to collect structured data from multiple users.
  • Useful for surveys, feedback collection, intake forms, and structured workflows.

Example Use Cases

  • Meeting summary generator — upload meeting notes, get action items and summaries
  • RFP response assistant — input requirements, generate proposal drafts from company knowledge
  • Onboarding checklist app — guide new hires through company policies and procedures
  • Competitive analysis tool — input competitor info, get insights from internal research documents

Admin Controls & Guardrails

  • Amazon Q Business provides configurable guardrails (chat controls) to manage and control the end user chat experience.
  • Controls are organized into global controls and topic-level controls.

Global Controls

  • Response source controls — specify whether responses use:
    • Enterprise data only (strict RAG mode)
    • Enterprise data + LLM model knowledge (when enterprise data lacks answers)
  • Blocked phrases — define specific words or phrases that Amazon Q Business must never include in responses.
  • File upload control — enable or disable end user file uploads during chat sessions.
  • Chat personalization — control whether responses are personalized using IAM Identity Center user attributes (address, job info).
  • Chat orchestration — enable/disable automatic routing of requests across plugins and data sources.
  • Hallucination detection — enable automatic checking and correction of responses for inconsistencies.
  • Global controls cannot be created or deleted — only updated.

Topic-Level Controls

  • Define natural language topics that Amazon Q Business should handle in specific ways.
  • For each topic, configure:
    • Topic description — natural language description of the topic area
    • Example user messages — sample queries that fall under this topic
    • Response behavior:
      • Allow responses from enterprise data only
      • Allow responses from enterprise data + model knowledge
      • Block the topic entirely (refuse to answer)
    • Custom response message — provide a specific response for blocked topics
  • Topic controls can be scoped to specific users and groups for fine-grained governance.

Blocked Topics

  • Administrators can block entire topics to prevent the assistant from discussing sensitive subjects.
  • Common blocked topics: competitor information, executive compensation, unreleased products, legal opinions.
  • When a blocked topic is detected, Q Business returns the configured custom response message.

Access Control & Security

  • Amazon Q Business implements defense-in-depth security with multiple layers of access control.
  • Built on Amazon Bedrock, inheriting automated abuse detection and responsible AI controls.

IAM Identity Center Integration

  • AWS IAM Identity Center (recommended) provides centralized identity management for Q Business.
  • Supports single sign-on (SSO) with external identity providers such as Okta, Microsoft Entra ID (formerly Azure AD), and Ping Identity.
  • Manages user subscriptions, group memberships, and application access centrally.
  • Enables automatic subscription deduplication across multiple Q Business applications sharing the same Identity Center instance.
  • IAM Federation (alternative) — supports OIDC and SAML identity providers for organizations not using Identity Center.

Document-Level Security (ACL Crawling)

  • Amazon Q Business crawls Access Control Lists (ACLs) from data sources by default.
  • Maps source system users/groups to IAM Identity Center identities via a User Store.
  • Ensures users only receive answers from documents they have permission to access in the source system.
  • ACL crawling supports:
    • User-level permissions
    • Group-level permissions
    • Inherited permissions (folder hierarchies)
  • Once ACL crawling is enabled, it cannot be disabled — this is a permanent setting.
  • Documents without ACL entries are accessible to all authenticated users by default.

Encryption

  • Encryption at rest — all data in the index is encrypted using AWS KMS keys.
  • Customer Managed Keys (CMK) — supported with Enterprise index type for full key control.
  • Encryption in transit — all communications use TLS 1.2+.
  • Data source credentials stored securely in AWS Secrets Manager.

Network Security

  • Amazon Q Business supports VPC endpoints (AWS PrivateLink) for private connectivity.
  • Data source connections can traverse VPCs for on-premises connectors.
  • All API calls are logged in AWS CloudTrail for auditing.

Subscription Management

  • Amazon Q Business uses a per-user subscription model with charges for both user subscriptions and index capacity.

User Subscription Tiers

Feature Lite Plan ($3/user/month) Pro Plan ($20/user/month)
Ideal for Enterprise-wide deployment, frontline workers Knowledge workers, power users
Q&A on knowledge bases ✅ With citations ✅ With citations
Q&A on LLM knowledge
File upload to chat
Content generation
Amazon Q Apps
Built-in plugins
Custom plugins
Slack/Teams integrations Browser extensions only ✅ Full integrations
QuickSight integration ✅ Reader Pro
Chat orchestration
Web experience (SSO)
Permissions-aware responses

Index Pricing

Index Type | Pricing | Included Capacity
Starter | $0.14/hour per unit | 20,000 docs or 200 MB text, 100 hrs connector usage
Enterprise | $0.264/hour per unit | 20,000 docs or 200 MB text, 100 hrs connector usage + CMK support
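
A back-of-the-envelope estimate of index cost can be derived from these figures. The sketch below assumes that each additional index unit adds the same 20,000-document / 200 MB capacity, that whichever limit is reached first determines the unit count, and that a month has roughly 730 billable hours; none of these assumptions are stated in the table itself, so treat the result as an approximation.

```python
import math

HOURLY_RATE = {"starter": 0.14, "enterprise": 0.264}  # USD per index unit, from the table
DOCS_PER_UNIT = 20_000
MB_PER_UNIT = 200


def units_needed(doc_count, extracted_mb):
    """Units required so that neither the document nor the text limit is exceeded."""
    return max(1,
               math.ceil(doc_count / DOCS_PER_UNIT),
               math.ceil(extracted_mb / MB_PER_UNIT))


def monthly_index_cost(index_type, doc_count, extracted_mb, hours=730):
    """Approximate monthly index charge; 730 hours/month is an assumption."""
    units = units_needed(doc_count, extracted_mb)
    return round(units * HOURLY_RATE[index_type] * hours, 2)


print(units_needed(50_000, 300))
print(monthly_index_cost("enterprise", 50_000, 300))
print(monthly_index_cost("starter", 10_000, 50))
```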

Subscription Billing Details

  • Charges start only after first use by the user.
  • Subscriptions are prorated when created or upgraded (based on remaining days in the month).
  • Cancellations and downgrades are not prorated — they apply at the start of the next billing month.
  • AWS deduplicates subscriptions across Q Business applications sharing the same IAM Identity Center instance — each user is charged only once at their highest subscription level.
  • For IAM Federation, users are charged once per IAM Identity Provider.
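
The deduplication and proration rules can be illustrated with a small calculator. This is a sketch under stated assumptions: the per-user prices come from the tier table above, proration is assumed to be linear in the remaining days of the month, and the user and application names are hypothetical.

```python
PRICE = {"lite": 3, "pro": 20}   # USD per user per month, from the tier table
RANK = {"lite": 0, "pro": 1}     # higher rank wins during deduplication


def deduplicate(assignments):
    """assignments: iterable of (user, app, tier). Keep each user's highest tier."""
    best = {}
    for user, _app, tier in assignments:
        if user not in best or RANK[tier] > RANK[best[user]]:
            best[user] = tier
    return best


def monthly_subscription_cost(assignments):
    """Each user is charged once, at their highest tier across applications."""
    return sum(PRICE[tier] for tier in deduplicate(assignments).values())


def prorated_charge(tier, days_remaining, days_in_month):
    """First-month charge for a new or upgraded subscription (assumed linear)."""
    return round(PRICE[tier] * days_remaining / days_in_month, 2)


assignments = [
    ("alice", "hr-assistant", "lite"),
    ("alice", "it-helpdesk", "pro"),   # alice is billed once, as Pro
    ("bob", "hr-assistant", "lite"),
]
print(monthly_subscription_cost(assignments))
print(prorated_charge("pro", days_remaining=15, days_in_month=30))
```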

Amazon Q Business vs Bedrock Knowledge Bases vs Amazon Kendra

Feature | Amazon Q Business | Bedrock Knowledge Bases | Amazon Kendra
Primary Purpose | Enterprise AI assistant (turnkey RAG + actions) | Managed RAG for custom AI applications | Intelligent enterprise search
Target User | Business users & admins (no-code) | Developers building AI apps | Developers & search admins
Built-in Chat UI | ✅ Web experience, browser extensions | ❌ (requires custom UI) | ❌ (search UI only, needs custom chat)
Data Connectors | 40+ managed connectors | S3, Confluence, SharePoint, Web Crawler, Google Drive, OneDrive | 30+ managed connectors
Retrieval Method | Native or Kendra retriever | Vector search (OpenSearch, Pinecone, etc.) | Semantic + keyword search
LLM Integration | Built-in (managed by AWS) | Choose any Bedrock FM | Requires custom LLM integration
Plugins/Actions | ✅ Built-in + custom (OpenAPI) | ✅ Via Bedrock Agents | ❌
Citizen Developer Apps | ✅ Q Apps | ❌ | ❌
Access Control | ACL crawling, IAM Identity Center | Metadata filtering | ACL crawling, token-based
Admin Guardrails | ✅ Topic controls, blocked phrases | ✅ Bedrock Guardrails (separate) | ❌ (search-level only)
Pricing Model | Per user/month + index capacity | Per KB storage + retrieval queries | Per index hour + connector usage
Best For | Rapid enterprise AI assistant deployment | Custom RAG applications with specific FMs | Enterprise search with NLP ranking
Availability Status | Closing to new customers July 31, 2026 (migrate to Amazon Quick) | GA, actively developed | Closing to new customers (migrate to Quick)

Use Cases

Internal Knowledge Base

  • Connect company wikis, SharePoint, Confluence, and file shares to provide instant answers about policies, procedures, and institutional knowledge.
  • Reduce time employees spend searching across multiple systems.
  • Maintain permissions — users only see information they’re authorized to access.

IT Helpdesk

  • Index IT documentation, runbooks, and knowledge articles from ServiceNow.
  • Use plugins to create/update tickets directly from the chat interface.
  • Automate common L1 support queries (password resets, VPN setup, software installation guides).
  • Escalate complex issues by creating tickets with pre-populated context.

HR Assistant

  • Answer employee questions about benefits, PTO policies, expense procedures, and onboarding.
  • Connect to HR systems via plugins for actions like submitting time-off requests.
  • Reduce HR ticket volume by providing instant self-service answers.
  • Use topic-level controls to block sensitive HR topics (individual salaries, disciplinary actions).

Customer Support (Internal)

  • Equip support agents with instant access to product documentation, troubleshooting guides, and customer history.
  • Reduce average handle time by surfacing relevant solutions in real-time.
  • Create Zendesk/Salesforce tickets with full context directly from the assistant.

Compliance & Legal Q&A

  • Index regulatory documents, compliance policies, audit reports, and legal guidelines.
  • Provide rapid answers about compliance requirements with document citations.
  • Use guardrails to ensure responses don’t constitute legal advice (blocked topic with custom message).
  • Maintain strict access controls — only compliance team members can access sensitive regulatory documents.

Migration to Amazon Quick

  • AWS announced that Amazon Q Business will no longer accept new customers starting July 31, 2026.
  • Existing customers remain fully supported with bug fixes and security updates, but no new features.
  • AWS recommends migrating to Amazon Quick — the next evolution of Q Business with enhanced capabilities.
  • Amazon Quick provides:
    • Quick Flows — workflow automation (replacing Q Apps)
    • QuickSight integration — structured data analysis and visualization
    • Quick Research — in-depth analysis and expert insights
    • Spaces — unified knowledge management
    • MCP (Model Context Protocol) — open standard for connecting to external tools and data sources
  • Migration path: Use Bring Your Own Index (BYOI) to connect existing Q Business index to Quick without disrupting current operations.
  • Q Apps must be manually migrated to Quick Flows.
  • Guardrails and User Store configurations are not included in BYOI — must be recreated in Quick.

AWS Certification Exam Practice Questions

Question 1: A company wants to deploy Amazon Q Business for their 5,000 employees. Frontline workers need basic Q&A access, while 200 knowledge workers need full capabilities including content generation and plugins. What is the most cost-effective subscription approach?

  A. Subscribe all 5,000 users to Pro plan
  B. Subscribe 4,800 users to Lite plan and 200 users to Pro plan
  C. Subscribe all users to Lite plan and upgrade on request
  D. Use anonymous access for all users to avoid subscription costs

Answer: B

Explanation: The Lite plan ($3/user/month) provides Q&A on knowledge bases with citations and permissions-aware responses, sufficient for frontline workers. The Pro plan ($20/user/month) adds content generation, plugins, Q Apps, and integrations needed by knowledge workers. This gives $14,400/month for Lite users + $4,000/month for Pro users = $18,400/month vs. $100,000/month for all Pro.

Question 2: An organization uses Amazon Q Business with documents stored across SharePoint, Confluence, and S3. A user asks a question, but receives no answer despite the information existing in Confluence. What is the MOST likely cause?

  A. The Confluence connector has not completed its sync schedule
  B. The user does not have ACL permissions to access the Confluence document
  C. Amazon Q Business does not support Confluence as a data source
  D. The Enterprise index type is required for multiple data sources

Answer: B

Explanation: Amazon Q Business crawls ACLs by default and provides permissions-aware responses. If a user doesn’t have access to a document in the source system (Confluence), Q Business will not include that document in its response, even if the information exists. Option A is possible but less likely if the connector is configured for regular syncs.

Question 3: A company wants to prevent Amazon Q Business from answering questions about competitor pricing and executive compensation. Which feature should the administrator configure?

  A. IAM policies to restrict user access
  B. Global controls with blocked phrases
  C. Topic-level controls with blocked topic behavior
  D. Remove all documents mentioning competitors from data sources

Answer: C

Explanation: Topic-level controls allow administrators to define natural language topics (e.g., “competitor pricing,” “executive compensation”) and configure blocked behavior with custom response messages. Global blocked phrases only block specific words/phrases in responses, not entire topics. Topic-level controls provide more comprehensive governance over sensitive subjects.

Question 4: A development team wants Amazon Q Business users to create Jira tickets directly from the chat interface when they encounter issues. Which component is needed?

  A. Jira data source connector
  B. Jira built-in plugin
  C. Custom data source connector with Jira API
  D. Amazon Q Apps with Jira integration

Answer: B

Explanation: The Jira built-in plugin enables users to perform actions (create issues, update status, add comments) in Jira directly from the Q Business chat interface. The Jira data source connector is for indexing/reading Jira content, not performing actions. Plugins enable write operations while connectors enable read/index operations.

Question 5: An organization is evaluating whether to use Amazon Q Business or Amazon Bedrock Knowledge Bases for their enterprise AI assistant. They need a turnkey solution with built-in chat UI, 40+ data connectors, no-code setup, and citizen developer app capabilities. Which service best fits their requirements?

  A. Amazon Bedrock Knowledge Bases with custom UI
  B. Amazon Q Business
  C. Amazon Kendra with custom LLM integration
  D. Amazon Bedrock Agents with Confluence connector

Answer: B

Explanation: Amazon Q Business provides all requested capabilities: built-in web experience chat UI, 40+ managed data connectors, no-code admin setup, and Q Apps for citizen developers. Bedrock Knowledge Bases requires custom UI development and has fewer native connectors. Kendra provides search but not a conversational AI assistant. Q Business is the fully managed turnkey enterprise AI assistant solution.

Frequently Asked Questions

What is Amazon Q Business?

Amazon Q Business is a fully managed generative AI assistant for enterprises. It connects to 40+ data sources (SharePoint, Confluence, Salesforce, etc.), understands your company’s information, and provides accurate answers with citations while respecting existing access controls.

How much does Amazon Q Business cost?

Q Business Lite costs $3/user/month (Q&A and search only). Q Business Pro costs $20/user/month (includes plugins, actions, Q Apps, and advanced features). Index capacity is billed separately, per index unit per hour ($0.14 for Starter, $0.264 for Enterprise).

What is the difference between Q Business and Bedrock Knowledge Bases?

Q Business is a ready-to-use enterprise assistant with built-in web UI, 40+ connectors, plugins, and admin controls. Bedrock Knowledge Bases is a developer building block for custom RAG applications that you integrate into your own apps via API.

Amazon Bedrock Agents, Knowledge Bases & Guardrails – Complete Guide

Amazon Bedrock provides a comprehensive platform for building, deploying, and managing generative AI applications. This deep-dive guide covers the advanced capabilities of Bedrock’s key components: Knowledge Bases for RAG, Agents for autonomous task execution, AgentCore for production deployment, Guardrails for safety, Model Evaluation, Fine-tuning, and Prompt Management.

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) capability that connects foundation models to proprietary data sources. It handles the entire workflow, from data ingestion, chunking, embedding, and storage through to retrieval and prompt augmentation.

Knowledge Base Types

  • Custom Knowledge Base – You choose the vector store, embedding model, chunking strategy, and data sources. Provides full control over the RAG pipeline.
  • Managed Knowledge Base (GA June 2026) – Amazon Bedrock manages the underlying infrastructure including vector storage, embeddings, re-ranking, and retrieval optimization. Supports auto-scaling, agentic retrieval for multi-hop reasoning, and multimodal data ingestion.

Data Sources

  • Amazon S3 – Primary data source supporting documents in PDF, TXT, MD, HTML, CSV, DOC/DOCX, XLS/XLSX, and JSON formats.
  • Confluence – Connects to Atlassian Confluence workspaces for ingesting wiki pages and documentation.
  • Microsoft SharePoint – Ingests documents from SharePoint Online sites and libraries.
  • Salesforce – Connects to Salesforce objects like Knowledge Articles and custom objects.
  • Web Crawler – Crawls and ingests web pages from specified URLs with configurable depth and scope.
  • Google Drive – Connects to Google Drive for document ingestion (Managed KB).
  • OneDrive – Connects to Microsoft OneDrive (Managed KB).

Chunking Strategies

  • Default Chunking – Splits content into chunks of approximately 300 tokens, honoring sentence boundaries.
  • Fixed-size Chunking – Splits content into chunks of a user-defined token size (1–8192 tokens) with configurable overlap percentage for context continuity.
  • Semantic Chunking – Groups text by meaning using embedding similarity. Breakpoints are created when semantic similarity between consecutive sentences drops below a threshold. Produces more coherent chunks but is computationally more expensive.
  • Hierarchical Chunking – Creates parent-child chunk relationships. Parent chunks provide broader context while child chunks contain specific details. During retrieval, child chunks are returned with parent context for better comprehension.
  • No Chunking – Treats each document as a single chunk. Best for short documents or pre-chunked data.
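
Fixed-size chunking with overlap is the easiest of these strategies to show in code. The sketch below is a simplified illustration, not Bedrock's implementation: it approximates tokens with whitespace-separated words (a real tokenizer would count differently), and the function name and defaults are hypothetical.

```python
def fixed_size_chunks(text, max_tokens=300, overlap_pct=20):
    """Split text into chunks of at most max_tokens words, overlapping by overlap_pct."""
    if not 1 <= max_tokens <= 8192:
        raise ValueError("max_tokens must be between 1 and 8192")
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")

    words = text.split()                       # crude stand-in for tokenization
    overlap = max_tokens * overlap_pct // 100  # words shared by neighboring chunks
    step = max_tokens - overlap                # always >= 1 because overlap_pct < 100

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):   # last chunk reached the end of the text
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
for chunk in fixed_size_chunks(sample, max_tokens=4, overlap_pct=50):
    print(chunk)
```

The overlap repeats the tail of each chunk at the head of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.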

Embedding Models

  • Amazon Titan Text Embeddings V2 – AWS native model supporting configurable output dimensions (256, 512, or 1024). Supports text normalization and multiple languages. Optimized for RAG workloads with high accuracy-to-cost ratio.
  • Cohere Embed – Multilingual embedding model available in English and multilingual variants. Supports input types (search_document, search_query) for optimized retrieval.
  • Amazon Titan Multimodal Embeddings – Supports both text and image embeddings in a unified vector space.

Vector Stores

  • Amazon OpenSearch Serverless – Default option with serverless scaling. Supports hybrid search (semantic + keyword), metadata filtering, and automatic index management.
  • Amazon OpenSearch Service (Managed Cluster) – Added March 2025. Provides more control over cluster configuration, instance types, and scaling policies.
  • Amazon Aurora PostgreSQL – Uses pgvector extension. Supports hybrid search (added April 2025) and integrates with existing Aurora databases.
  • Pinecone – Third-party managed vector database with serverless and pod-based options.
  • Redis Enterprise Cloud – In-memory vector store for low-latency retrieval.
  • MongoDB Atlas – Document database with vector search capabilities. Supports hybrid search (added April 2025).
  • Amazon Neptune Analytics – Graph + vector search for knowledge graph use cases.
  • Amazon S3 – Added July 2025 for cost-effective vector storage with S3-native retrieval.

Hybrid Search

  • Combines semantic (vector) search with keyword (lexical) search for improved retrieval accuracy.
  • Semantic search captures meaning and handles paraphrasing; keyword search handles exact matches, names, and codes.
  • Supported on OpenSearch Serverless, OpenSearch Managed Clusters, Aurora PostgreSQL (April 2025), and MongoDB Atlas (April 2025).
  • Results are combined using Reciprocal Rank Fusion (RRF) to produce a unified ranking.
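
Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every result list in which it appears, so documents ranked well by both searches rise to the top. The sketch below uses k = 60, the constant commonly used in the literature; the value Bedrock uses internally is not stated here, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a conventional default, assumed here).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["doc-a", "doc-b", "doc-c"]   # from vector search
keyword_hits = ["doc-c", "doc-a", "doc-d"]    # from lexical search
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Because only ranks are used, the method does not need the vector and keyword scores to be on comparable scales.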

Metadata Filtering

  • Each document can have a metadata JSON file (up to 10 KB) with custom attributes.
  • Filters are applied as pre-filtering before vector search, reducing the search space.
  • Supports operators: equals, notEquals, greaterThan, lessThan, in, notIn, startsWith, stringContains.
  • Enables multi-tenant RAG by filtering documents based on tenant ID, access controls, or document categories.
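
The effect of pre-filtering can be sketched with a small evaluator for the operators listed above. The filter shape used here, `{"operator": {"key": ..., "value": ...}}` with `andAll` / `orAll` for combining conditions, mirrors my understanding of the Bedrock retrieval filter format and should be verified against the API reference; the metadata keys and documents are hypothetical, and treating a missing key as "no match" is a choice made for this sketch.

```python
import operator

OPS = {
    "equals": operator.eq,
    "notEquals": operator.ne,
    "greaterThan": operator.gt,
    "lessThan": operator.lt,
    "in": lambda value, options: value in options,
    "notIn": lambda value, options: value not in options,
    "startsWith": lambda value, prefix: str(value).startswith(prefix),
    "stringContains": lambda value, part: part in str(value),
}


def matches(metadata, flt):
    """Evaluate one filter (possibly nested via andAll/orAll) against a metadata dict."""
    (op, cond), = flt.items()          # each filter dict holds exactly one operator
    if op == "andAll":
        return all(matches(metadata, f) for f in cond)
    if op == "orAll":
        return any(matches(metadata, f) for f in cond)
    if cond["key"] not in metadata:    # sketch choice: missing attribute never matches
        return False
    return OPS[op](metadata[cond["key"]], cond["value"])


def prefilter(docs, flt):
    """Narrow the candidate set before any vector search runs."""
    return [d for d in docs if matches(d["metadata"], flt)]


docs = [
    {"id": "d1", "metadata": {"tenant_id": "acme", "year": 2024, "category": "hr-policy"}},
    {"id": "d2", "metadata": {"tenant_id": "acme", "year": 2021, "category": "it-runbook"}},
    {"id": "d3", "metadata": {"tenant_id": "globex", "year": 2024, "category": "hr-policy"}},
]
tenant_filter = {"andAll": [
    {"equals": {"key": "tenant_id", "value": "acme"}},
    {"greaterThan": {"key": "year", "value": 2022}},
]}
print([d["id"] for d in prefilter(docs, tenant_filter)])
```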

Advanced Parsing

  • Foundation Model Parsing – Uses an FM (e.g., Claude) to extract and interpret content from complex documents including PDFs with tables, charts, and images. Provides customizable extraction prompts.
  • Amazon Textract Parsing – OCR-based parsing for scanned documents and images.
  • Standard Parsing – Default text extraction for supported document formats.
  • FM parsing is ideal for documents with complex layouts, embedded images, or non-standard formatting that standard parsers cannot handle accurately.

Amazon Bedrock Agents

Amazon Bedrock Agents enables developers to build autonomous AI agents that can plan multi-step tasks, invoke APIs, and interact with knowledge bases to accomplish complex goals. Agents use foundation models for reasoning and orchestration.

Agent Architecture

  • Foundation Model – The reasoning engine that interprets user requests, plans actions, and generates responses.
  • Instructions – System-level prompts that define the agent’s persona, capabilities, and behavioral guidelines.
  • Action Groups – Collections of tools/APIs the agent can invoke, defined via OpenAPI schemas or function definitions.
  • Knowledge Bases – Connected data sources for RAG-based retrieval to ground responses in proprietary data.
  • Guardrails – Safety filters applied to agent inputs and outputs.

Orchestration

  • Agents use a ReAct (Reasoning + Acting) orchestration loop by default: the FM reasons about the task, decides on an action, executes it, observes results, and iterates.
  • Custom Orchestration – Use a Lambda function to define custom orchestration logic, overriding the default ReAct loop for specialized workflows.
  • The orchestration loop continues until the agent determines it has sufficient information to generate a final response or reaches the maximum iteration limit.
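
A minimal sketch of the reason, act, observe cycle described above. The `decide` callable stands in for the foundation model, and the tool name, step dictionary shape, and iteration limit are hypothetical; this is not how Bedrock Agents is implemented internally.

```python
def run_agent(user_input, decide, tools, max_iterations=5):
    """Run a reason -> act -> observe loop until a final answer or the limit."""
    observations = []
    for _ in range(max_iterations):
        step = decide(user_input, observations)        # reason: pick the next step
        if step["type"] == "final":
            return step["answer"]
        result = tools[step["tool"]](**step["args"])   # act: invoke the chosen tool
        observations.append({"tool": step["tool"], "result": result})  # observe
    return "Stopped: maximum iteration limit reached."


def scripted_decide(user_input, observations):
    """Stand-in for the model: call one tool, then answer from its result."""
    if not observations:
        return {"type": "action", "tool": "get_order_status", "args": {"order_id": "42"}}
    return {"type": "final", "answer": f"Order 42 is {observations[-1]['result']}."}


tools = {"get_order_status": lambda order_id: "shipped"}  # hypothetical tool
answer = run_agent("Where is order 42?", scripted_decide, tools)
print(answer)  # Order 42 is shipped.
```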

Action Groups & Tool Use

  • Action groups define the tools available to the agent using either OpenAPI schemas or simplified function definitions.
  • Lambda Functions – Backend logic executed when the agent invokes an action. Receives the API operation, parameters, and session context.
  • Return of Control (ROC) – Instead of executing a Lambda, the agent returns control to the calling application with the action details. The application executes the action and returns results to continue the conversation.
  • Code Interpreter – Built-in action group that allows the agent to generate and execute Python code in a secure sandbox for data analysis, calculations, and chart generation.
  • User Confirmation – Configurable step where the agent asks for user approval before executing sensitive actions.
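
The application side of Return of Control combined with a confirmation step might look roughly like the following. The turn and action dictionaries are invented for illustration and do not mirror the actual Bedrock response payloads.

```python
def handle_turn(agent_turn, execute_action, confirm):
    """Process one agent turn on the calling application's side.

    If the agent returned control, optionally ask the user to approve the
    proposed action, run it locally, and build the result to send back.
    """
    if agent_turn["type"] != "return_control":
        return {"type": "final", "text": agent_turn["text"]}
    action = agent_turn["action"]
    if action.get("requires_confirmation") and not confirm(action):
        return {"type": "action_result", "name": action["name"],
                "status": "DENIED", "output": None}
    output = execute_action(action["name"], action["parameters"])
    return {"type": "action_result", "name": action["name"],
            "status": "SUCCESS", "output": output}


executed = []


def execute_action(name, parameters):
    """Hypothetical local executor that records what it ran."""
    executed.append(name)
    return f"{name} ok"


proposed = {
    "type": "return_control",
    "action": {"name": "execute_trade",
               "parameters": {"symbol": "XYZ", "quantity": 10},
               "requires_confirmation": True},
}
denied = handle_turn(proposed, execute_action, confirm=lambda action: False)
approved = handle_turn(proposed, execute_action, confirm=lambda action: True)
print(denied["status"], approved["status"])  # DENIED SUCCESS
```

The action runs only on the approved path, so a rejected confirmation never reaches the executor.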

Multi-Step Reasoning

  • Agents decompose complex requests into sequential sub-tasks, executing each step and using results to inform the next.
  • Supports query decomposition for knowledge base retrieval – breaking a complex question into simpler sub-queries.
  • Chain-of-thought traces are available for debugging and observability.

Inline Agents

  • Dynamically configure agent capabilities at runtime without pre-creating agent resources.
  • Specify instructions, action groups, knowledge bases, and guardrails in the API call itself.
  • Enables dynamic workflow adaptation where agent roles and tools change based on context.
  • Launched with multi-agent collaboration GA (March 2025).

Multi-Agent Collaboration (Supervisor/Child)

  • Supervisor Agent – Orchestrates the workflow by breaking requests into sub-tasks and delegating to specialized child agents.
  • Child Agents (Collaborator Agents) – Specialized agents focused on specific domains (e.g., checking maintenance, analyzing alarms, evaluating KPIs).
  • Supervisor routes tasks, consolidates outputs, and generates unified final responses.
  • Supports both SUPERVISOR mode (supervisor decides routing) and SUPERVISOR_ROUTER mode (classifier-based routing).
  • GA since March 2025 with support for up to 5 collaborator agents per supervisor.
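
The two collaboration modes can be contrasted with a small sketch. Here `plan` stands in for the supervisor's task breakdown and `router` for the classifier; treating the router as a shortcut that falls back to full supervision is an assumption of this example, as are all names and return values.

```python
def handle_request(request, collaborators, plan, router=None):
    """Delegate a request to collaborator agents and consolidate the outputs.

    With a router (SUPERVISOR_ROUTER-style), a request the classifier can map
    to one collaborator is sent there directly. Otherwise the supervisor
    plans sub-tasks and merges every collaborator's answer.
    """
    if router is not None:
        target = router(request)
        if target is not None:
            return collaborators[target](request)
    results = [f"[{name}] {collaborators[name](sub_task)}"
               for name, sub_task in plan(request)]
    return "\n".join(results)


collaborators = {  # hypothetical child agents
    "maintenance": lambda task: "no overdue work orders",
    "alarms": lambda task: "2 active alarms",
}


def scripted_plan(request):
    """Stand-in for the supervisor model's task decomposition."""
    return [("maintenance", "check maintenance"), ("alarms", "analyze alarms")]


def keyword_router(request):
    """Stand-in classifier: route alarm-only questions straight to one agent."""
    return "alarms" if "alarm" in request.lower() else None


report = handle_request("How is line 3 doing?", collaborators, scripted_plan)
direct = handle_request("Any alarms right now?", collaborators, scripted_plan,
                        router=keyword_router)
```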

Agent Memory

  • Session Memory (Short-term) – Maintains conversation context within a session. Automatically managed within the session window (configurable idle timeout).
  • Long-term Memory – Persists information across sessions. Extracts key facts, preferences, and context from conversations and stores them for future sessions.
  • Memory enables personalized experiences where agents remember user preferences, past interactions, and ongoing tasks.
  • Supports metadata on memory records for organizing, filtering, and routing retrieval.

Prompt Engineering for Agents

  • System Instructions – Define the agent’s role, personality, constraints, and response format.
  • Advanced Prompts – Customize prompts at each orchestration step: pre-processing, orchestration, knowledge base response generation, and post-processing.
  • Prompt Templates – Use variables (e.g., $tool_results$, $knowledge_base_results$) to structure how the agent processes information.
  • Best practices: Be specific about capabilities, define clear boundaries, provide examples of expected behavior, and specify output formats.

Amazon Bedrock AgentCore

Amazon Bedrock AgentCore (GA June 2026) is a code-first platform to build, deploy, connect, and optimize AI agents at scale. It provides production-grade infrastructure including runtime, identity, tools, memory, observability, and evaluation — regardless of the framework or model used.

Managed Deployment (AgentCore Runtime & Harness)

  • AgentCore Harness – The managed orchestration layer (“body”) for agents. Handles the orchestration loop, tool execution, context window management, state persistence, failure recovery, and session isolation.
  • Define agents via configuration: model, tools, skills, instructions. AgentCore assembles and runs the agent loop.
  • Each agent runs in its own isolated environment with filesystem, shell, memory, and web browsing capabilities.
  • Supports any open-source framework (LangGraph, CrewAI, Strands) and any model.
  • Provides MicroVM-based isolation for secure execution of tools and code.

AgentCore Identity & Access

  • AgentCore Identity – Provides robust identity and access management for agents at scale.
  • Agents can access resources/tools on behalf of users or themselves with pre-authorized user consent.
  • Compatible with existing identity providers (Okta, Auth0, Entra ID) — no user migration required.
  • Workload Identities – Unique identities assigned to agents for authentication and authorization.
  • Centralized identity management regardless of deployment environment (AgentCore Runtime, self-hosted, hybrid).
  • Eliminates the need for custom access controls and identity infrastructure.

Tool Management (AgentCore Gateway)

  • AgentCore Gateway – Unified MCP (Model Context Protocol) gateway for tool discovery and invocation.
  • Serves as a single endpoint for accessing tools from different teams, organizations, and applications.
  • Fine-grained access control with gateway interceptors for per-principal permissions.
  • Supports the AWS-curated skills catalog accessible with a single toggle.
  • Web Search tool enables agents to ground responses in current web knowledge.

Memory Management

  • Memory provisions automatically when a harness is created.
  • Extracts useful information from short-term memory and stores as long-term memory records.
  • Supports strictly consistent metadata on memory records for organized retrieval.
  • Agents recognize returning users without additional setup.

Quality Evaluations

  • Batch Evaluation – Define what “good” looks like and measure candidate changes against quality bars at scale.
  • Customers specify evaluation criteria and AgentCore runs assessments across multiple test cases.
  • Supports comparison of agent versions before deployment.

A/B Testing

  • Controlled comparison between agent versions by splitting live production traffic.
  • Measures outcomes side-by-side to confirm improvements hold under real conditions.
  • Enables data-driven decisions about agent updates and configuration changes.
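
One common way to split live traffic between two agent versions is deterministic hash bucketing, sketched below. This illustrates the general technique only; the guide does not describe how AgentCore assigns traffic, and the variant names and 10% share are assumptions.

```python
import hashlib


def assign_variant(session_id, treatment_share=0.1):
    """Deterministically map a session to 'control' or 'treatment'.

    The session ID is hashed to a number in [0, 1); sessions below the
    treatment share see the candidate agent version.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


assignments = [assign_variant(f"session-{i}") for i in range(10_000)]
treatment_fraction = assignments.count("treatment") / len(assignments)
```

Because the assignment depends only on the session ID, a returning session always sees the same version, which keeps side-by-side outcome measurements clean.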

Policy Controls

  • AgentCore Policy – Authorization capability that controls which actions agents are authorized to take.
  • Integrates with Amazon Bedrock Guardrails for content safety and prompt injection protection.
  • Provides enterprise defenses against security and safety risks in agent workloads.
  • Supports sensitive data exposure prevention and prompt injection attack detection.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards for generative AI applications. It helps detect and filter harmful content, block undesirable topics, redact sensitive information, and reduce hallucinations — applied to both user inputs and model responses.

Content Filters

  • Detect and filter harmful content across six categories with configurable strength levels (None, Low, Medium, High):
    • Hate – Content that discriminates, criticizes, insults, or dehumanizes based on identity attributes.
    • Insults – Content that demeans, bullies, or includes negative/derogatory language.
    • Sexual – Content that indicates sexual interest, activity, or arousal.
    • Violence – Content that glorifies or threatens physical harm to individuals or groups.
    • Misconduct – Content related to criminal activity, including fraud, theft, and illegal substance use.
    • Prompt Attacks – Detects prompt injection and jailbreak attempts designed to bypass safety controls.
  • Supports tiered filtering (announced June 2025) for cost-optimized content moderation at scale.

Image Content Filters (GA March 2025)

  • Extends content filtering to image modality — moderates both image and text content.
  • Applies to all categories: hate, insults, sexual, violence, misconduct, and prompt attacks.
  • Blocks up to 88% of harmful multimodal content.
  • Intended for applications that handle user-uploaded or model-generated images.

Denied Topics

  • Define custom topics that the AI should refuse to engage with.
  • Provide a natural language definition and optional sample phrases for each denied topic.
  • Example: A bank’s AI assistant can deny conversations about investment advice or cryptocurrencies.
  • Applied to both user inputs (block the question) and model outputs (block the response).

Word Filters

  • Block specific words or phrases from appearing in inputs or outputs.
  • Supports exact match and managed word lists (e.g., profanity lists).
  • Useful for blocking competitor names, internal project codes, or inappropriate terminology.

Sensitive Information Filters

  • PII Detection – Identifies personally identifiable information including names, email addresses, phone numbers, SSNs, credit card numbers, and more.
  • Regex Patterns – Define custom patterns for domain-specific sensitive data (e.g., account numbers, internal IDs).
  • Actions: Block (reject the entire message) or Anonymize/Redact (mask the PII and allow the message through).
  • Supports over 30 built-in PII entity types.
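
The difference between the Block and Anonymize actions can be illustrated with a few regular expressions. The patterns, entity names, placeholder format, and the ACCT- account format are all hypothetical; the managed PII detectors are more capable than these regexes.

```python
import re

PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT_ID": re.compile(r"\bACCT-\d{8}\b"),  # hypothetical custom regex pattern
}


def apply_sensitive_info_filter(text, actions):
    """Apply per-entity actions; `actions` maps entity name to BLOCK or ANONYMIZE."""
    # Block: any match rejects the entire message.
    for name, pattern in PATTERNS.items():
        if actions.get(name) == "BLOCK" and pattern.search(text):
            return {"action": "BLOCKED", "text": None}
    # Anonymize: mask each match and let the message through.
    for name, pattern in PATTERNS.items():
        if actions.get(name) == "ANONYMIZE":
            text = pattern.sub("{" + name + "}", text)
    return {"action": "ALLOWED", "text": text}


policy = {"EMAIL": "ANONYMIZE", "ACCOUNT_ID": "ANONYMIZE", "US_SSN": "BLOCK"}
masked = apply_sensitive_info_filter("Contact jane@example.com about ACCT-12345678", policy)
blocked = apply_sensitive_info_filter("My SSN is 123-45-6789", policy)
print(masked["text"])     # Contact {EMAIL} about {ACCOUNT_ID}
print(blocked["action"])  # BLOCKED
```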

Contextual Grounding Check

  • Detects hallucinations in RAG and summarization use cases.
  • Grounding – Validates that model responses are factually consistent with the provided reference source/context.
  • Relevance – Checks that the response is relevant to the user’s query.
  • Configurable thresholds for grounding and relevance scores.
  • Filters over 75% of hallucinated responses in RAG applications.
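
How the two configurable thresholds interact can be sketched as follows, assuming the check yields a grounding score and a relevance score between 0 and 1. The function name, result shape, and default thresholds are illustrative, not service defaults.

```python
def grounding_decision(grounding_score, relevance_score,
                       grounding_threshold=0.75, relevance_threshold=0.5):
    """Block a response when either score falls below its threshold."""
    failed = []
    if grounding_score < grounding_threshold:
        failed.append("GROUNDING")   # not supported by the reference source
    if relevance_score < relevance_threshold:
        failed.append("RELEVANCE")   # does not address the user's query
    return {"blocked": bool(failed), "failed_checks": failed}


print(grounding_decision(0.9, 0.8))  # {'blocked': False, 'failed_checks': []}
print(grounding_decision(0.4, 0.8))  # {'blocked': True, 'failed_checks': ['GROUNDING']}
```

Raising a threshold blocks more borderline responses at the cost of rejecting some acceptable ones.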

Automated Reasoning Checks

  • Uses formal verification methods grounded in mathematical logic to validate AI-generated outputs.
  • Detects hallucinations, suggests corrections, and highlights unstated assumptions.
  • Provides provably correct, auditable assessments with deterministic formal logic.
  • AWS positions it as the first and only generative AI safeguard that uses Automated Reasoning to prevent factual errors.
  • Policy refinement workflows added June 2026 for iterative improvement.

ApplyGuardrail API

  • Standalone API to apply guardrails independently of model invocation.
  • Enables guardrail evaluation on any text content — even from non-Bedrock models or external systems.
  • Use cases: validate content from third-party LLMs, pre-screen user inputs, post-process outputs from any source.
  • InvokeGuardrailChecks API – Enhanced API for agentic AI applications requiring step-level guardrail checks.

Code Domain Support (Jan 2025)

  • Protects against undesirable content within code elements.
  • Inspects user prompts, comments, variables, function names, and string literals.
  • Prevents injection of harmful content via code constructs.

Amazon Bedrock Model Evaluation

Amazon Bedrock Evaluations helps you compare, evaluate, and select foundation models for your specific use cases. It supports automatic evaluation, human evaluation, and LLM-as-a-judge workflows.

Automatic Evaluation

  • Evaluate models using built-in metrics without human involvement.
  • Accuracy – Measures correctness of model responses using metrics like BERTScore, ROUGE, and exact match.
  • Robustness – Tests model consistency across paraphrased inputs and adversarial perturbations.
  • Toxicity – Measures harmful or inappropriate content in model outputs.
  • Supports custom datasets in JSONL format with prompt-response-reference triples.
  • Can evaluate models running on Bedrock, other cloud providers, or on-premises (GA April 2025).

LLM-as-a-Judge (Preview Dec 2024)

  • Uses a foundation model to evaluate other models with human-like quality assessment.
  • Runs at a fraction of the cost and time of human evaluation.
  • Supports custom evaluation criteria and scoring rubrics.

RAG Evaluation

  • Evaluate end-to-end RAG systems including retrieval quality and generation accuracy.
  • Metrics: context relevance, answer faithfulness, answer relevance.
  • Can evaluate fully built applications, not just individual model responses.

Human Evaluation Workflows

  • Set up human evaluation jobs with custom work teams.
  • Evaluators rate model responses on custom criteria (helpfulness, harmlessness, coherence).
  • Supports comparison of multiple models side-by-side.
  • Integrates with Amazon SageMaker Ground Truth for workforce management.

Model Comparison

  • Compare multiple foundation models on the same evaluation dataset.
  • Side-by-side results with statistical significance testing.
  • Helps select the model that best balances quality, latency, and cost for a specific use case.

Amazon Bedrock Fine-Tuning & Customization

Amazon Bedrock provides multiple model customization techniques to adapt foundation models to specific tasks and domains.

Continued Pre-Training

  • Extend a model’s knowledge by training on unlabeled, domain-specific data.
  • Adapts the model’s language understanding to specialized vocabularies and concepts.
  • Training data format: Plain text documents in S3 (no prompt-completion pairs needed).
  • Best for: Domain adaptation (medical, legal, financial terminology).

Instruction Fine-Tuning

  • Train models on labeled prompt-completion pairs to improve task-specific performance.
  • Training data format: JSONL with {"prompt": "...", "completion": "..."} or chat-format messages.
  • Supports validation datasets for monitoring overfitting.
  • Configurable hyperparameters: epochs, batch size, learning rate, warmup steps.
  • Best for: Improving performance on specific tasks like classification, extraction, or formatting.
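
A small helper for producing the prompt/completion JSONL format mentioned above might look like this. The sample records and the validation rule are assumptions; check the documentation of the chosen base model for its exact schema and limits.

```python
import json


def to_jsonl(examples):
    """Serialize prompt/completion pairs as one JSON object per line."""
    lines = []
    for index, example in enumerate(examples):
        if not example.get("prompt") or not example.get("completion"):
            raise ValueError(f"example {index} needs a non-empty prompt and completion")
        record = {"prompt": example["prompt"], "completion": example["completion"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


examples = [  # hypothetical classification data
    {"prompt": "Classify the ticket: 'My invoice is wrong.'", "completion": "billing"},
    {"prompt": "Classify the ticket: 'The app crashes on login.'", "completion": "technical"},
]
training_data = to_jsonl(examples)
print(training_data, end="")
```

The resulting text can be written to a file and uploaded to S3 as the training dataset.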

Reinforcement Fine-Tuning (RFT) – GA December 2025

  • Advanced customization using reward-based learning without requiring large labeled datasets.
  • Bring your own prompts or use existing Bedrock API invocation logs as training data.
  • Delivers 66% accuracy gains on average over base models.
  • Supported models: Amazon Nova, OpenAI GPT OSS 20B, Qwen 3 32B (Feb 2026).
  • Automates the reinforcement workflow — accessible to developers without deep ML expertise.
  • Built-in evaluation tools to compare RFT model against the base model.
  • Supports iterative fine-tuning: build upon previously customized models for continuous improvement.
  • Training data: JSONL with prompts; rewards are computed by a verifier/judge function you define.
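
To make the idea of a verifier concrete, the sketch below scores a model response against a known label. The function signature, the `reference` field, and the 0/1 reward scale are hypothetical; the actual interface Bedrock expects for reward functions is not reproduced here.

```python
import json


def label_reward(response, reference):
    """Return 1.0 when the response matches the reference label, else 0.0."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


# One prompt record as it might appear in the JSONL file (field names assumed).
record = {"prompt": "Classify the ticket: 'My invoice is wrong.'", "reference": "billing"}
jsonl_line = json.dumps(record)

# During training, candidate responses to the same prompt receive different rewards.
rewards = [label_reward(candidate, record["reference"])
           for candidate in ["Billing", "shipping"]]
print(rewards)  # [1.0, 0.0]
```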

Model Distillation

  • Transfer knowledge from a larger “teacher” model to a smaller “student” model.
  • Provide input prompts in JSONL; Bedrock generates responses from the teacher model and uses them to fine-tune the student.
  • Aims to approach teacher-model quality on the targeted use case at student-model cost and latency.
  • Best for: Reducing inference costs while maintaining quality for specific use cases.

Training Data Format Summary

| Method | Data Format | Data Requirements |
|---|---|---|
| Continued Pre-Training | Plain text files | Unlabeled domain corpus |
| Instruction Fine-Tuning | JSONL (prompt/completion) | Min ~100 examples, recommended 1000+ |
| Reinforcement Fine-Tuning | JSONL (prompts) + verifier | Prompts + reward/judge function |
| Distillation | JSONL (input prompts) | Prompts only; teacher generates completions |

Amazon Bedrock Prompt Flows & Management

Amazon Bedrock provides tools for creating, managing, and orchestrating prompts and generative AI workflows.

Prompt Management (GA November 2024)

  • Streamlined interface to create, evaluate, version, and share prompts.
  • Prompt Versioning – Each version is linked to its evaluation results. Supports rollback, audit trails, and A/B testing.
  • Prompt Variables – Template variables (e.g., {{context}}, {{question}}) for dynamic prompt construction.
  • Model Selection – Test the same prompt across different foundation models to compare performance.
  • Sharing – Share prompts across teams and projects for collaboration and reuse.
  • Treats prompts with the same rigor as code: version-controlled and reproducible.
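
Template variables in the {{name}} style can be filled with a few lines of Python, which is handy for testing a prompt locally before saving a version. This is a local sketch of the substitution idea, not the Prompt Management API; the function name and sample values are assumptions.

```python
import re

VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template, values):
    """Replace every {{name}} placeholder, failing loudly on missing values."""
    missing = set(VARIABLE.findall(template)) - set(values)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return VARIABLE.sub(lambda match: str(values[match.group(1)]), template)


template = "Use the context to answer.\nContext: {{context}}\nQuestion: {{question}}"
prompt = render_prompt(template, {
    "context": "Returns are accepted within 30 days.",
    "question": "Can I return an item after two weeks?",
})
print(prompt)
```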

Bedrock Flows (Visual Flow Builder)

  • Intuitive visual builder to create, test, and deploy generative AI workflows.
  • Drag-and-drop interface to link Prompts, Agents, Knowledge Bases, Guardrails, and AWS services.
  • Node Types:
    • Prompt Node – Invokes a foundation model with a configured prompt.
    • Agent Node – Invokes a Bedrock Agent for autonomous task execution.
    • Knowledge Base Node – Retrieves relevant information from a Knowledge Base.
    • Condition Node – Routes flow based on conditional logic.
    • Lambda Node – Executes custom business logic.
    • Lex Node – Integrates with Amazon Lex for conversational interfaces.
    • Iterator Node – Loops over collections of items.
    • Collector Node – Aggregates results from parallel or iterated executions.
  • Serverless execution — pricing based on resources consumed (model invocations, Lambda, etc.).
  • Supports versioning and aliases for deployment management.

A/B Testing for Prompts

  • Version-control prompts and compare performance across versions.
  • Use Bedrock Evaluations to measure quality differences between prompt versions.
  • Deploy prompt versions with aliases and switch traffic between versions.
  • Combine with AgentCore A/B testing for full agent-level experimentation.

Comparison: Bedrock Knowledge Bases vs Amazon Kendra vs OpenSearch

| Feature | Bedrock Knowledge Bases | Amazon Kendra | Amazon OpenSearch Service |
|---|---|---|---|
| Primary Purpose | RAG for generative AI | Intelligent enterprise search | Full-text search, analytics, vector search |
| Search Type | Semantic + hybrid (keyword) | Semantic + keyword (NLU-based) | Full-text, keyword, vector (k-NN), hybrid |
| RAG Integration | Native (fully managed) | Via Retrieve API + custom orchestration | Custom implementation required |
| Management | Fully managed | Fully managed | Managed clusters or serverless |
| Data Sources | S3, Confluence, SharePoint, Salesforce, Web Crawler, Google Drive, OneDrive | 40+ connectors (S3, SharePoint, Salesforce, databases, ServiceNow, etc.) | Custom ingestion pipelines |
| Chunking | Fixed, semantic, hierarchical, no chunking | Automatic (document passages) | Custom (application-managed) |
| Vector Store | Managed or BYO (OpenSearch, Aurora, Pinecone, Redis, MongoDB) | Built-in (not configurable) | Native k-NN plugin |
| Metadata Filtering | Yes (custom JSON metadata) | Yes (document attributes) | Yes (field-level filtering) |
| Access Control | Via metadata filtering | Native ACL integration (SharePoint, etc.) | Fine-grained access control |
| Multimodal | Yes (FM parsing for images/tables) | Limited (document text extraction) | Yes (with custom embeddings) |
| Re-ranking | Yes (Managed KB) | Built-in semantic re-ranking | Custom (Learning to Rank plugin) |
| Best For | GenAI applications, RAG pipelines, AI agents | Enterprise search portals, FAQ systems, document discovery | Custom search, log analytics, observability, full control over retrieval |
| Pricing Model | Pay per query + storage (vector store) | Index-based (provisioned capacity) | Instance/serverless OCU hours |

AWS Certification Exam Practice Questions

Question 1:

A company is building a RAG application using Amazon Bedrock Knowledge Bases. Their documents contain complex tables, charts, and embedded images in PDF format. Standard text extraction is losing critical information. Which parsing approach should they use to improve data quality?

  A. Fixed-size chunking with 512 tokens
  B. Foundation model parsing with a customized extraction prompt
  C. Semantic chunking with sentence boundary detection
  D. Amazon Textract with default settings

Answer: B – Foundation model parsing uses an FM (e.g., Claude) to interpret complex document layouts including tables, charts, and images. It allows customizable extraction prompts to capture the specific information needed. While Textract handles OCR, FM parsing provides superior understanding of document structure and semantics.

Question 2:

A financial services company wants their Bedrock Agent to execute a trade only after receiving explicit user approval. Which feature should they implement?

  A. Guardrails with denied topics
  B. Return of Control with user confirmation
  C. Custom orchestration with Lambda
  D. Multi-agent collaboration with a supervisor

Answer: B – Return of Control (ROC) allows the agent to return the proposed action to the calling application instead of executing it directly. Combined with user confirmation configuration, this ensures sensitive actions like trade execution require explicit user approval before proceeding.

Question 3:

An organization is deploying multiple AI agents that need to access different enterprise tools and data sources on behalf of users. Each agent requires its own identity with scoped permissions and integration with their existing Okta identity provider. Which service should they use?

  A. Amazon Bedrock Agents with IAM roles
  B. Amazon Bedrock AgentCore Identity
  C. AWS IAM Identity Center with SAML federation
  D. Amazon Cognito User Pools

Answer: B – Amazon Bedrock AgentCore Identity provides robust identity and access management for agents at scale. It’s compatible with existing identity providers (including Okta) without requiring user migration, assigns unique workload identities to agents, and provides centralized identity management regardless of deployment environment.

Question 4:

A healthcare company uses Amazon Bedrock to generate patient-facing content. They need to ensure responses don’t contain hallucinated medical information and are always grounded in the reference documents provided. Which Guardrails feature provides the MOST reliable hallucination detection?

  A. Content filters set to High
  B. Contextual grounding check
  C. Denied topics for medical advice
  D. Automated Reasoning checks

Answer: D – Automated Reasoning checks use formal verification methods grounded in mathematical logic to validate AI-generated outputs. They provide provably correct, auditable assessments and can detect hallucinations, suggest corrections, and highlight unstated assumptions — making them the most reliable option for critical healthcare content. Contextual grounding is useful but probabilistic, while Automated Reasoning is deterministic.

Question 5:

A company wants to improve their foundation model’s performance on a specific classification task but has limited labeled data (only 50 examples). They do have access to a high-quality larger model and 5,000 unlabeled prompts representative of their use case. Which customization approach is MOST appropriate?

  A. Instruction fine-tuning with the 50 labeled examples
  B. Continued pre-training with domain documents
  C. Model distillation using the larger model as teacher
  D. Reinforcement fine-tuning with a reward function

Answer: C – Model distillation transfers knowledge from a larger “teacher” model to a smaller “student” model. The company provides their 5,000 unlabeled prompts, Bedrock generates high-quality responses from the teacher model, and uses those to fine-tune the student. This achieves teacher-model quality at lower cost without requiring labeled data. With only 50 labeled examples, instruction fine-tuning would likely underperform.

Frequently Asked Questions

What is a Bedrock Knowledge Base?

A Bedrock Knowledge Base connects your data sources (S3, web pages, Confluence, etc.) to foundation models via RAG. It automatically chunks documents, generates embeddings, stores them in a vector database, and retrieves relevant context to ground model responses in your data.

What are Bedrock Guardrails?

Guardrails are configurable safety controls that filter harmful content, block denied topics, mask PII, and verify response grounding. They can be applied to any Bedrock model call, agent, or knowledge base to ensure responsible AI usage within your organization’s policies.

How do Bedrock Agents work?

Bedrock Agents use a foundation model to break down user requests into steps, determine which tools/APIs to call (action groups), execute them, and synthesize results. They support multi-step reasoning, code execution, memory across sessions, and can collaborate with other agents.

References