Amazon Bedrock Agents, Knowledge Bases & Guardrails – Complete Guide

Amazon Bedrock provides a comprehensive platform for building, deploying, and managing generative AI applications. This deep-dive guide covers the advanced capabilities of Bedrock’s key components: Knowledge Bases for RAG, Agents for autonomous task execution, AgentCore for production deployment, Guardrails for safety, Model Evaluation, Fine-tuning, and Prompt Management.

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) capability that connects foundation models to proprietary data sources. It handles the entire workflow, from data ingestion, chunking, embedding, and storage through to retrieval and prompt augmentation.

Knowledge Base Types

  • Custom Knowledge Base – You choose the vector store, embedding model, chunking strategy, and data sources. Provides full control over the RAG pipeline.
  • Managed Knowledge Base (GA June 2026) – Amazon Bedrock manages the underlying infrastructure including vector storage, embeddings, re-ranking, and retrieval optimization. Supports auto-scaling, agentic retrieval for multi-hop reasoning, and multimodal data ingestion.

Data Sources

  • Amazon S3 – Primary data source supporting documents in PDF, TXT, MD, HTML, CSV, DOC/DOCX, XLS/XLSX, and JSON formats.
  • Confluence – Connects to Atlassian Confluence workspaces for ingesting wiki pages and documentation.
  • Microsoft SharePoint – Ingests documents from SharePoint Online sites and libraries.
  • Salesforce – Connects to Salesforce objects like Knowledge Articles and custom objects.
  • Web Crawler – Crawls and ingests web pages from specified URLs with configurable depth and scope.
  • Google Drive – Connects to Google Drive for document ingestion (Managed KB).
  • OneDrive – Connects to Microsoft OneDrive (Managed KB).

Chunking Strategies

  • Default Chunking – Splits content into chunks of approximately 300 tokens, honoring sentence boundaries.
  • Fixed-size Chunking – Splits content into chunks of a user-defined token size (1–8192 tokens) with configurable overlap percentage for context continuity.
  • Semantic Chunking – Groups text by meaning using embedding similarity. Breakpoints are created when semantic similarity between consecutive sentences drops below a threshold. Produces more coherent chunks but is computationally more expensive.
  • Hierarchical Chunking – Creates parent-child chunk relationships. Parent chunks provide broader context while child chunks contain specific details. During retrieval, child chunks are returned with parent context for better comprehension.
  • No Chunking – Treats each document as a single chunk. Best for short documents or pre-chunked data.
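
The fixed-size strategy with overlap can be sketched in a few lines. This is only an illustration, not Bedrock's implementation: whitespace-separated words stand in for model tokens, and the function name `fixed_size_chunks` is made up for this example.

```python
def fixed_size_chunks(text: str, max_tokens: int, overlap_pct: int) -> list[str]:
    """Split text into chunks of at most max_tokens, overlapping by overlap_pct percent."""
    if max_tokens < 1 or not 0 <= overlap_pct < 100:
        raise ValueError("max_tokens must be >= 1 and overlap_pct in [0, 100)")
    # Assumption: whitespace words approximate tokens; a real pipeline
    # would count tokens with the embedding model's tokenizer.
    tokens = text.split()
    overlap = max_tokens * overlap_pct // 100
    step = max_tokens - overlap  # always >= 1 because overlap_pct < 100
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in fixed_size_chunks("a b c d e f g h i j", max_tokens=4, overlap_pct=50):
        print(chunk)
```

A larger overlap repeats more context across neighboring chunks, which helps continuity but increases the number of vectors stored.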

Embedding Models

  • Amazon Titan Text Embeddings V2 – AWS native model supporting configurable output dimensions (256, 512, or 1024). Supports text normalization and multiple languages. Optimized for RAG workloads with high accuracy-to-cost ratio.
  • Cohere Embed – Multilingual embedding model available in English and multilingual variants. Supports input types (search_document, search_query) for optimized retrieval.
  • Amazon Titan Multimodal Embeddings – Supports both text and image embeddings in a unified vector space.

Vector Stores

  • Amazon OpenSearch Serverless – Default option with serverless scaling. Supports hybrid search (semantic + keyword), metadata filtering, and automatic index management.
  • Amazon OpenSearch Service (Managed Cluster) – Added March 2025. Provides more control over cluster configuration, instance types, and scaling policies.
  • Amazon Aurora PostgreSQL – Uses pgvector extension. Supports hybrid search (added April 2025) and integrates with existing Aurora databases.
  • Pinecone – Third-party managed vector database with serverless and pod-based options.
  • Redis Enterprise Cloud – In-memory vector store for low-latency retrieval.
  • MongoDB Atlas – Document database with vector search capabilities. Supports hybrid search (added April 2025).
  • Amazon Neptune Analytics – Graph + vector search for knowledge graph use cases.
  • Amazon S3 – Added July 2025 for cost-effective vector storage with S3-native retrieval.

Hybrid Search

  • Combines semantic (vector) search with keyword (lexical) search for improved retrieval accuracy.
  • Semantic search captures meaning and handles paraphrasing; keyword search handles exact matches, names, and codes.
  • Supported on OpenSearch Serverless, OpenSearch Managed Clusters, Aurora PostgreSQL (April 2025), and MongoDB Atlas (April 2025).
  • Results are combined using Reciprocal Rank Fusion (RRF) to produce a unified ranking.
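
Reciprocal Rank Fusion itself is simple enough to show directly. The sketch below is a generic version of the technique, not Bedrock's internal code: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic = ["d1", "d2", "d3"]  # hypothetical vector-search results
    keyword = ["d3", "d1", "d4"]   # hypothetical lexical-search results
    print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that rank well in both lists (d1 and d3 in this example) rise above documents that appear in only one.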

Metadata Filtering

  • Each document can have a metadata JSON file (up to 10 KB) with custom attributes.
  • Filters are applied as pre-filtering before vector search, reducing the search space.
  • Supports operators: equals, notEquals, greaterThan, lessThan, in, notIn, startsWith, stringContains.
  • Enables multi-tenant RAG by filtering documents based on tenant ID, access controls, or document categories.
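
For multi-tenant filtering, the metadata file and the query-time filter could look like the sketch below. The shapes follow the Knowledge Bases metadata and Retrieve API formats as I understand them (a `metadataAttributes` object in the sidecar file, and `equals`/`in`/`andAll` filter operators); treat them as assumptions and check the current API reference. The attribute names `tenant_id` and `category` are hypothetical.

```python
import json


def metadata_sidecar(attributes: dict) -> str:
    """Contents of a metadata JSON file stored next to a source document."""
    return json.dumps({"metadataAttributes": attributes}, indent=2)


def tenant_filter(tenant_id: str, categories: list[str]) -> dict:
    """Filter that keeps only one tenant's documents in the given categories."""
    return {
        "andAll": [
            {"equals": {"key": "tenant_id", "value": tenant_id}},
            {"in": {"key": "category", "value": categories}},
        ]
    }


def retrieval_configuration(tenant_id: str, categories: list[str], top_k: int = 5) -> dict:
    """Retrieval settings that could be passed along with a Retrieve request."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": tenant_filter(tenant_id, categories),
        }
    }


if __name__ == "__main__":
    print(metadata_sidecar({"tenant_id": "acme", "category": "policy"}))
    print(json.dumps(retrieval_configuration("acme", ["policy", "faq"]), indent=2))
```

Because the filter is applied before the vector search, a query from one tenant never scores another tenant's chunks.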

Advanced Parsing

  • Foundation Model Parsing – Uses an FM (e.g., Claude) to extract and interpret content from complex documents including PDFs with tables, charts, and images. Provides customizable extraction prompts.
  • Amazon Textract Parsing – OCR-based parsing for scanned documents and images.
  • Standard Parsing – Default text extraction for supported document formats.
  • FM parsing is ideal for documents with complex layouts, embedded images, or non-standard formatting that standard parsers cannot handle accurately.

Amazon Bedrock Agents

Amazon Bedrock Agents enables developers to build autonomous AI agents that can plan multi-step tasks, invoke APIs, and interact with knowledge bases to accomplish complex goals. Agents use foundation models for reasoning and orchestration.

Agent Architecture

  • Foundation Model – The reasoning engine that interprets user requests, plans actions, and generates responses.
  • Instructions – System-level prompts that define the agent’s persona, capabilities, and behavioral guidelines.
  • Action Groups – Collections of tools/APIs the agent can invoke, defined via OpenAPI schemas or function definitions.
  • Knowledge Bases – Connected data sources for RAG-based retrieval to ground responses in proprietary data.
  • Guardrails – Safety filters applied to agent inputs and outputs.

Orchestration

  • Agents use a ReAct (Reasoning + Acting) orchestration loop by default: the FM reasons about the task, decides on an action, executes it, observes results, and iterates.
  • Custom Orchestration – Use a Lambda function to define custom orchestration logic, overriding the default ReAct loop for specialized workflows.
  • The orchestration loop continues until the agent determines it has sufficient information to generate a final response or reaches the maximum iteration limit.
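
The reason, act, observe cycle and its iteration limit can be sketched as a plain loop. This is a toy model of the pattern, not the Bedrock orchestrator: `scripted_model` stands in for the foundation model, and the step format (`type`, `tool`, `args`, `answer`) and the tool name are invented for the example.

```python
from typing import Callable


def react_loop(question: str, decide: Callable, tools: dict, max_iterations: int = 5) -> str:
    """Run reason -> act -> observe until a final answer or the iteration limit."""
    observations: list[str] = []
    for _ in range(max_iterations):
        step = decide(question, observations)              # reason
        if step["type"] == "final":
            return step["answer"]
        result = tools[step["tool"]](**step["args"])       # act
        observations.append(f"{step['tool']} -> {result}")  # observe
    return "Stopped: iteration limit reached."


def scripted_model(question: str, observations: list[str]) -> dict:
    """Stand-in for the FM: call one tool, then answer from its result."""
    if not observations:
        return {"type": "action", "tool": "get_order_status", "args": {"order_id": "42"}}
    return {"type": "final", "answer": f"Based on {observations[-1]}"}


if __name__ == "__main__":
    tools = {"get_order_status": lambda order_id: "shipped"}
    print(react_loop("Where is order 42?", scripted_model, tools))
```

Custom orchestration replaces the body of this loop with your own logic while keeping the same inputs: the request, the available tools, and the accumulated observations.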

Action Groups & Tool Use

  • Action groups define the tools available to the agent using either OpenAPI schemas or simplified function definitions.
  • Lambda Functions – Backend logic executed when the agent invokes an action. Receives the API operation, parameters, and session context.
  • Return of Control (ROC) – Instead of executing a Lambda, the agent returns control to the calling application with the action details. The application executes the action and returns results to continue the conversation.
  • Code Interpreter – Built-in action group that allows the agent to generate and execute Python code in a secure sandbox for data analysis, calculations, and chart generation.
  • User Confirmation – Configurable step where the agent asks for user approval before executing sensitive actions.
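
A Lambda function behind an action group might look like the following sketch. It assumes the function-definition style of action group and the event and response shapes documented for it (`actionGroup`, `function`, a `parameters` list, and a `functionResponse` with a `TEXT` body); verify these against the current Lambda integration docs. The function name `get_order_status` and its hard-coded reply are hypothetical.

```python
def lambda_handler(event, context):
    """Handle one action invocation from a Bedrock agent."""
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event["function"]

    if function == "get_order_status":
        # Hypothetical business logic; a real handler would query a backend.
        body = f"Order {params.get('order_id', 'unknown')} is shipped."
    else:
        body = f"Unknown function: {function}"

    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event["actionGroup"],
            "function": function,
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
        # Echo session state back so the agent keeps it for later turns.
        "sessionAttributes": event.get("sessionAttributes", {}),
        "promptSessionAttributes": event.get("promptSessionAttributes", {}),
    }


if __name__ == "__main__":
    sample = {
        "messageVersion": "1.0",
        "actionGroup": "orders",
        "function": "get_order_status",
        "parameters": [{"name": "order_id", "type": "string", "value": "42"}],
    }
    print(lambda_handler(sample, None))
```

With Return of Control, the same action name and parameters are handed to the calling application instead, and the application sends the result back in its next request.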

Multi-Step Reasoning

  • Agents decompose complex requests into sequential sub-tasks, executing each step and using results to inform the next.
  • Supports query decomposition for knowledge base retrieval – breaking a complex question into simpler sub-queries.
  • Chain-of-thought traces are available for debugging and observability.

Inline Agents

  • Dynamically configure agent capabilities at runtime without pre-creating agent resources.
  • Specify instructions, action groups, knowledge bases, and guardrails in the API call itself.
  • Enables dynamic workflow adaptation where agent roles and tools change based on context.
  • Launched with multi-agent collaboration GA (March 2025).

Multi-Agent Collaboration (Supervisor/Child)

  • Supervisor Agent – Orchestrates the workflow by breaking requests into sub-tasks and delegating to specialized child agents.
  • Child Agents (Collaborator Agents) – Specialized agents focused on specific domains (e.g., checking maintenance, analyzing alarms, evaluating KPIs).
  • Supervisor routes tasks, consolidates outputs, and generates unified final responses.
  • Supports both SUPERVISOR mode (supervisor decides routing) and SUPERVISOR_ROUTER mode (classifier-based routing).
  • GA since March 2025 with support for up to 5 collaborator agents per supervisor.

Agent Memory

  • Session Memory (Short-term) – Maintains conversation context within a session. Automatically managed within the session window (configurable idle timeout).
  • Long-term Memory – Persists information across sessions. Extracts key facts, preferences, and context from conversations and stores them for future sessions.
  • Memory enables personalized experiences where agents remember user preferences, past interactions, and ongoing tasks.
  • Supports metadata on memory records for organizing, filtering, and routing retrieval.

Prompt Engineering for Agents

  • System Instructions – Define the agent’s role, personality, constraints, and response format.
  • Advanced Prompts – Customize prompts at each orchestration step: pre-processing, orchestration, knowledge base response generation, and post-processing.
  • Prompt Templates – Use variables (e.g., $tool_results$, $knowledge_base_results$) to structure how the agent processes information.
  • Best practices: Be specific about capabilities, define clear boundaries, provide examples of expected behavior, and specify output formats.

Amazon Bedrock AgentCore

Amazon Bedrock AgentCore (GA June 2026) is a code-first platform to build, deploy, connect, and optimize AI agents at scale. It provides production-grade infrastructure including runtime, identity, tools, memory, observability, and evaluation — regardless of the framework or model used.

Managed Deployment (AgentCore Runtime & Harness)

  • AgentCore Harness – The managed orchestration layer (“body”) for agents. Handles the orchestration loop, tool execution, context window management, state persistence, failure recovery, and session isolation.
  • Define agents via configuration: model, tools, skills, instructions. AgentCore assembles and runs the agent loop.
  • Each agent runs in its own isolated environment with filesystem, shell, memory, and web browsing capabilities.
  • Supports any open-source framework (LangGraph, CrewAI, Strands) and any model.
  • Provides MicroVM-based isolation for secure execution of tools and code.

AgentCore Identity & Access

  • AgentCore Identity – Provides robust identity and access management for agents at scale.
  • Agents can access resources/tools on behalf of users or themselves with pre-authorized user consent.
  • Compatible with existing identity providers (Okta, Auth0, Entra ID) — no user migration required.
  • Workload Identities – Unique identities assigned to agents for authentication and authorization.
  • Centralized identity management regardless of deployment environment (AgentCore Runtime, self-hosted, hybrid).
  • Eliminates need for custom access controls and identity infrastructure.

Tool Management (AgentCore Gateway)

  • AgentCore Gateway – Unified MCP (Model Context Protocol) gateway for tool discovery and invocation.
  • Serves as a single endpoint for accessing tools from different teams, organizations, and applications.
  • Fine-grained access control with gateway interceptors for per-principal permissions.
  • Supports the AWS-curated skills catalog accessible with a single toggle.
  • Web Search tool enables agents to ground responses in current web knowledge.

Memory Management

  • Memory provisions automatically when a harness is created.
  • Extracts useful information from short-term memory and stores as long-term memory records.
  • Supports strictly consistent metadata on memory records for organized retrieval.
  • Agents recognize returning users without additional setup.

Quality Evaluations

  • Batch Evaluation – Define what “good” looks like and measure candidate changes against quality bars at scale.
  • Customers specify evaluation criteria and AgentCore runs assessments across multiple test cases.
  • Supports comparison of agent versions before deployment.

A/B Testing

  • Controlled comparison between agent versions by splitting live production traffic.
  • Measures outcomes side-by-side to confirm improvements hold under real conditions.
  • Enables data-driven decisions about agent updates and configuration changes.
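
One common way to split live traffic between two versions is deterministic hashing of a session identifier. The sketch below shows that general technique only; how AgentCore assigns traffic internally is not described here, and the function name, the variant labels, and the 10% default are assumptions.

```python
import hashlib


def assign_variant(session_id: str, treatment_pct: int = 10) -> str:
    """Map a session to variant "A" (current) or "B" (candidate), stably."""
    if not 0 <= treatment_pct <= 100:
        raise ValueError("treatment_pct must be between 0 and 100")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "B" if bucket < treatment_pct else "A"


if __name__ == "__main__":
    for sid in ("session-1", "session-2", "session-3"):
        print(sid, assign_variant(sid, treatment_pct=50))
```

Hashing keeps each session on the same version for its whole lifetime, so outcome metrics for the two groups stay comparable.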

Policy Controls

  • AgentCore Policy – Authorization capability that controls which actions agents are authorized to take.
  • Integrates with Amazon Bedrock Guardrails for content safety and prompt injection protection.
  • Provides enterprise defenses against security and safety risks in agent workloads.
  • Supports sensitive data exposure prevention and prompt injection attack detection.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards for generative AI applications. It helps detect and filter harmful content, block undesirable topics, redact sensitive information, and reduce hallucinations — applied to both user inputs and model responses.

Content Filters

  • Detect and filter harmful content across six categories with configurable strength levels (None, Low, Medium, High):
    • Hate – Content that discriminates, criticizes, insults, or dehumanizes based on identity attributes.
    • Insults – Content that demeans, bullies, or includes negative/derogatory language.
    • Sexual – Content that indicates sexual interest, activity, or arousal.
    • Violence – Content that glorifies or threatens physical harm to individuals or groups.
    • Misconduct – Content related to criminal activity, including fraud, theft, and illegal substance use.
    • Prompt Attacks – Detects prompt injection and jailbreak attempts designed to bypass safety controls.
  • Supports tiered filtering (announced June 2025) for cost-optimized content moderation at scale.
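
The six categories and their strengths map naturally onto a configuration object. The sketch below builds one in the shape I believe the `CreateGuardrail` API expects for `contentPolicyConfig` (a `filtersConfig` list with `type`, `inputStrength`, and `outputStrength`); treat the field names as assumptions to confirm against the API reference. Prompt-attack filtering applies to inputs, so its output strength is set to NONE.

```python
CATEGORIES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT", "PROMPT_ATTACK"]
STRENGTHS = ("NONE", "LOW", "MEDIUM", "HIGH")


def content_policy(strength: str = "HIGH") -> dict:
    """Build a content-filter policy with the same strength for every category."""
    if strength not in STRENGTHS:
        raise ValueError(f"strength must be one of {STRENGTHS}")
    filters = []
    for category in CATEGORIES:
        filters.append({
            "type": category,
            "inputStrength": strength,
            # Prompt attacks are only checked on user input.
            "outputStrength": "NONE" if category == "PROMPT_ATTACK" else strength,
        })
    return {"filtersConfig": filters}


if __name__ == "__main__":
    for entry in content_policy("MEDIUM")["filtersConfig"]:
        print(entry)
```

In practice the strengths are usually tuned per category, for example a stricter setting for hate and a looser one for insults in a customer-support context.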

Image Content Filters (GA March 2025)

  • Extends content filtering to image modality — moderates both image and text content.
  • Applies to all categories: hate, insults, sexual, violence, misconduct, and prompt attacks.
  • Blocks up to 88% of harmful multimodal content.
  • Industry-leading safeguards for applications handling user-uploaded images or model-generated images.

Denied Topics

  • Define custom topics that the AI should refuse to engage with.
  • Provide a natural language definition and optional sample phrases for each denied topic.
  • Example: A bank’s AI assistant can deny conversations about investment advice or cryptocurrencies.
  • Applied to both user inputs (block the question) and model outputs (block the response).

Word Filters

  • Block specific words or phrases from appearing in inputs or outputs.
  • Supports exact match and managed word lists (e.g., profanity lists).
  • Useful for blocking competitor names, internal project codes, or inappropriate terminology.

Sensitive Information Filters

  • PII Detection – Identifies personally identifiable information including names, email addresses, phone numbers, SSNs, credit card numbers, and more.
  • Regex Patterns – Define custom patterns for domain-specific sensitive data (e.g., account numbers, internal IDs).
  • Actions: Block (reject the entire message) or Anonymize/Redact (mask the PII and allow the message through).
  • Supports over 30 built-in PII entity types.
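
The difference between the Block and Anonymize actions can be illustrated locally with regular expressions. This is a toy stand-in for the managed filter, not how Guardrails detects PII: the two patterns, the `ACC-` account format, and the `{NAME}` placeholder style are all assumptions made for the example.

```python
import re
from typing import Optional

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ACCOUNT_ID": re.compile(r"\bACC-\d{6}\b"),  # hypothetical internal format
}


def apply_sensitive_filter(text: str, action: str) -> Optional[str]:
    """Return the text unchanged, masked (ANONYMIZE), or None when blocked (BLOCK)."""
    if action not in ("BLOCK", "ANONYMIZE"):
        raise ValueError("action must be BLOCK or ANONYMIZE")
    if not any(pattern.search(text) for pattern in PATTERNS.values()):
        return text  # nothing sensitive found
    if action == "BLOCK":
        return None  # reject the entire message
    for name, pattern in PATTERNS.items():
        text = pattern.sub("{" + name + "}", text)  # mask and let it through
    return text


if __name__ == "__main__":
    message = "Mail jane@example.com about ACC-123456"
    print(apply_sensitive_filter(message, "ANONYMIZE"))
    print(apply_sensitive_filter(message, "BLOCK"))
```

Anonymize suits logging and analytics paths where the surrounding text is still useful; Block suits inputs that should never reach the model at all.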

Contextual Grounding Check

  • Detects hallucinations in RAG and summarization use cases.
  • Grounding – Validates that model responses are factually consistent with the provided reference source/context.
  • Relevance – Checks that the response is relevant to the user’s query.
  • Configurable thresholds for grounding and relevance scores.
  • Filters over 75% of hallucinated responses in RAG applications.
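
The threshold logic is easy to picture: each check yields a score between 0 and 1, and a response is blocked when a score falls below its configured threshold. The sketch below shows only that decision step; the scores would come from the guardrail, and the default thresholds and result keys are illustrative assumptions.

```python
def grounding_decision(
    grounding_score: float,
    relevance_score: float,
    grounding_threshold: float = 0.75,
    relevance_threshold: float = 0.5,
) -> dict:
    """Decide whether a response passes the grounding and relevance checks."""
    failed = []
    if grounding_score < grounding_threshold:
        failed.append("GROUNDING")  # not supported by the reference context
    if relevance_score < relevance_threshold:
        failed.append("RELEVANCE")  # does not address the user's query
    return {"action": "BLOCKED" if failed else "NONE", "failed": failed}


if __name__ == "__main__":
    print(grounding_decision(0.9, 0.8))
    print(grounding_decision(0.4, 0.8))
```

Raising a threshold blocks more borderline responses, which reduces hallucinations at the cost of more refusals.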

Automated Reasoning Checks

  • Uses formal verification methods grounded in mathematical logic to validate AI-generated outputs.
  • Detects hallucinations, suggests corrections, and highlights unstated assumptions.
  • Provides provably correct, auditable assessments with deterministic formal logic.
  • Described by AWS as the first and only generative AI safeguard that uses Automated Reasoning to prevent factual errors.
  • Policy refinement workflows added June 2026 for iterative improvement.

ApplyGuardrail API

  • Standalone API to apply guardrails independently of model invocation.
  • Enables guardrail evaluation on any text content — even from non-Bedrock models or external systems.
  • Use cases: validate content from third-party LLMs, pre-screen user inputs, post-process outputs from any source.
  • InvokeGuardrailChecks API – Enhanced API for agentic AI applications requiring step-level guardrail checks.
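
A request to the ApplyGuardrail API could be assembled as in the sketch below. The parameter names follow the boto3 `bedrock-runtime` `apply_guardrail` call as I understand it; the guardrail ID and version are placeholders. The code only builds and inspects dictionaries, so it runs without AWS credentials.

```python
def apply_guardrail_request(guardrail_id: str, version: str, text: str, source: str = "INPUT") -> dict:
    """Build keyword arguments for an ApplyGuardrail call on arbitrary text."""
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be INPUT (user text) or OUTPUT (model text)")
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": version,
        "source": source,
        "content": [{"text": {"text": text}}],
    }


def was_blocked(response: dict) -> bool:
    """True when the guardrail intervened on the content."""
    return response.get("action") == "GUARDRAIL_INTERVENED"


if __name__ == "__main__":
    request = apply_guardrail_request("gr-EXAMPLE", "1", "Text from a third-party LLM", source="OUTPUT")
    print(request)
    # With boto3 installed and credentials configured, the call would be roughly:
    #   import boto3
    #   response = boto3.client("bedrock-runtime").apply_guardrail(**request)
    #   print(was_blocked(response))
```

Because no model is invoked, the same guardrail can screen output from self-hosted or third-party models before it reaches users.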

Code Domain Support (Jan 2025)

  • Protects against undesirable content within code elements.
  • Inspects user prompts, comments, variables, function names, and string literals.
  • Prevents injection of harmful content via code constructs.

Amazon Bedrock Model Evaluation

Amazon Bedrock Evaluations helps you compare, evaluate, and select foundation models for your specific use cases. It supports automatic evaluation, human evaluation, and LLM-as-a-judge workflows.

Automatic Evaluation

  • Evaluate models using built-in metrics without human involvement.
  • Accuracy – Measures correctness of model responses using metrics like BERTScore, ROUGE, and exact match.
  • Robustness – Tests model consistency across paraphrased inputs and adversarial perturbations.
  • Toxicity – Measures harmful or inappropriate content in model outputs.
  • Supports custom datasets in JSONL format with prompt-response-reference triples.
  • Can evaluate models running on Bedrock, other cloud providers, or on-premises (GA April 2025).
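
To make the exact-match metric concrete, the sketch below scores a stub model against a small JSONL dataset. It is a local illustration rather than the Bedrock evaluation job: the keys `prompt` and `referenceResponse` are my assumption about the dataset format, and `predict` is any callable that returns a model response.

```python
import json
from typing import Callable


def exact_match_accuracy(jsonl_text: str, predict: Callable[[str], str]) -> float:
    """Share of prompts whose response equals the reference after normalization."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not records:
        raise ValueError("dataset is empty")

    def normalize(value: str) -> str:
        # Ignore case and extra whitespace when comparing answers.
        return " ".join(value.lower().split())

    hits = sum(
        normalize(predict(record["prompt"])) == normalize(record["referenceResponse"])
        for record in records
    )
    return hits / len(records)


if __name__ == "__main__":
    dataset = (
        '{"prompt": "Capital of France?", "referenceResponse": "Paris"}\n'
        '{"prompt": "2+2?", "referenceResponse": "4"}\n'
    )
    stub_model = lambda prompt: "paris " if "France" in prompt else "5"
    print(exact_match_accuracy(dataset, stub_model))
```

Exact match is strict by design; metrics such as ROUGE and BERTScore give partial credit for responses that are close to the reference without matching it word for word.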

LLM-as-a-Judge (Preview Dec 2024)

  • Uses a foundation model to evaluate other models with human-like quality assessment.
  • Costs a fraction of the time and money of human evaluation.
  • Supports custom evaluation criteria and scoring rubrics.

RAG Evaluation

  • Evaluate end-to-end RAG systems including retrieval quality and generation accuracy.
  • Metrics: context relevance, answer faithfulness, answer relevance.
  • Can evaluate fully built applications, not just individual model responses.

Human Evaluation Workflows

  • Set up human evaluation jobs with custom work teams.
  • Evaluators rate model responses on custom criteria (helpfulness, harmlessness, coherence).
  • Supports comparison of multiple models side-by-side.
  • Integrates with Amazon SageMaker Ground Truth for workforce management.

Model Comparison

  • Compare multiple foundation models on the same evaluation dataset.
  • Side-by-side results with statistical significance testing.
  • Helps select optimal model balancing quality, latency, and cost for specific use cases.

Amazon Bedrock Fine-Tuning & Customization

Amazon Bedrock provides multiple model customization techniques to adapt foundation models to specific tasks and domains.

Continued Pre-Training

  • Extend a model’s knowledge by training on unlabeled, domain-specific data.
  • Adapts the model’s language understanding to specialized vocabularies and concepts.
  • Training data format: Plain text documents in S3 (no prompt-completion pairs needed).
  • Best for: Domain adaptation (medical, legal, financial terminology).

Instruction Fine-Tuning

  • Train models on labeled prompt-completion pairs to improve task-specific performance.
  • Training data format: JSONL with {"prompt": "...", "completion": "..."} or chat-format messages.
  • Supports validation datasets for monitoring overfitting.
  • Configurable hyperparameters: epochs, batch size, learning rate, warmup steps.
  • Best for: Improving performance on specific tasks like classification, extraction, or formatting.
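
Preparing the prompt/completion file is mostly a matter of writing one JSON object per line. The helper below is a small sketch of that step using the `prompt` and `completion` keys shown above; the function name and the sample records are invented for illustration.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize prompt/completion pairs as JSONL, one object per line."""
    lines = []
    for index, record in enumerate(records, start=1):
        if not record.get("prompt") or not record.get("completion"):
            raise ValueError(f"record {index} needs a non-empty prompt and completion")
        lines.append(json.dumps(
            {"prompt": record["prompt"], "completion": record["completion"]},
            ensure_ascii=False,
        ))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    examples = [
        {"prompt": "Classify: 'Card was charged twice'", "completion": "billing"},
        {"prompt": "Classify: 'App crashes on login'", "completion": "technical"},
    ]
    print(to_jsonl(examples), end="")
```

Validating records before upload catches empty fields early, which is cheaper than discovering them after a training job fails.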

Reinforcement Fine-Tuning (RFT) – GA December 2025

  • Advanced customization using reward-based learning without requiring large labeled datasets.
  • Bring your own prompts or use existing Bedrock API invocation logs as training data.
  • Delivers 66% accuracy gains on average over base models.
  • Supported models: Amazon Nova, OpenAI GPT OSS 20B, Qwen 3 32B (Feb 2026).
  • Automates the reinforcement workflow — accessible to developers without deep ML expertise.
  • Built-in evaluation tools to compare RFT model against the base model.
  • Supports iterative fine-tuning: build upon previously customized models for continuous improvement.
  • Training data: JSONL with prompts; rewards are computed by a verifier/judge function you define.

Model Distillation

  • Transfer knowledge from a larger “teacher” model to a smaller “student” model.
  • Provide input prompts in JSONL; Bedrock generates responses from the teacher model and uses them to fine-tune the student.
  • Aims to approach teacher-model quality on the targeted use case at student-model cost and latency.
  • Best for: Reducing inference costs while maintaining quality for specific use cases.

Training Data Format Summary

Method | Data Format | Data Requirements
Continued Pre-Training | Plain text files | Unlabeled domain corpus
Instruction Fine-Tuning | JSONL (prompt/completion) | Min ~100 examples, recommended 1000+
Reinforcement Fine-Tuning | JSONL (prompts) + verifier | Prompts + reward/judge function
Distillation | JSONL (input prompts) | Prompts only; teacher generates completions

Amazon Bedrock Prompt Flows & Management

Amazon Bedrock provides tools for creating, managing, and orchestrating prompts and generative AI workflows.

Prompt Management (GA November 2024)

  • Streamlined interface to create, evaluate, version, and share prompts.
  • Prompt Versioning – Each version is linked to its evaluation results. Supports rollback, audit trails, and A/B testing.
  • Prompt Variables – Template variables (e.g., {{context}}, {{question}}) for dynamic prompt construction.
  • Model Selection – Test the same prompt across different foundation models to compare performance.
  • Sharing – Share prompts across teams and projects for collaboration and reuse.
  • Treats prompts with the same rigor as code: version-controlled and reproducible.
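
Template variables such as {{context}} and {{question}} are filled in at invocation time. The sketch below mimics that substitution locally to show the idea; it is not the Prompt Management service, and the `render` helper and its strict handling of missing variables are assumptions.

```python
import re

VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{name}} in the template with its value."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for variable '{name}'")
        return str(values[name])

    return VARIABLE.sub(replace, template)


if __name__ == "__main__":
    template = "Use {{context}} to answer: {{question}}"
    print(render(template, {"context": "the docs", "question": "What is RAG?"}))
```

Failing loudly on a missing variable is a deliberate choice here: a prompt sent with an unfilled placeholder tends to produce confusing model output that is hard to trace back.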

Bedrock Flows (Visual Flow Builder)

  • Intuitive visual builder to create, test, and deploy generative AI workflows.
  • Drag-and-drop interface to link Prompts, Agents, Knowledge Bases, Guardrails, and AWS services.
  • Node Types:
    • Prompt Node – Invokes a foundation model with a configured prompt.
    • Agent Node – Invokes a Bedrock Agent for autonomous task execution.
    • Knowledge Base Node – Retrieves relevant information from a Knowledge Base.
    • Condition Node – Routes flow based on conditional logic.
    • Lambda Node – Executes custom business logic.
    • Lex Node – Integrates with Amazon Lex for conversational interfaces.
    • Iterator Node – Loops over collections of items.
    • Collector Node – Aggregates results from parallel or iterated executions.
  • Serverless execution — pricing based on resources consumed (model invocations, Lambda, etc.).
  • Supports versioning and aliases for deployment management.

A/B Testing for Prompts

  • Version-control prompts and compare performance across versions.
  • Use Bedrock Evaluations to measure quality differences between prompt versions.
  • Deploy prompt versions with aliases and switch traffic between versions.
  • Combine with AgentCore A/B testing for full agent-level experimentation.

Comparison: Bedrock Knowledge Bases vs Amazon Kendra vs OpenSearch

Feature | Bedrock Knowledge Bases | Amazon Kendra | Amazon OpenSearch Service
Primary Purpose | RAG for generative AI | Intelligent enterprise search | Full-text search, analytics, vector search
Search Type | Semantic + hybrid (keyword) | Semantic + keyword (NLU-based) | Full-text, keyword, vector (k-NN), hybrid
RAG Integration | Native (fully managed) | Via Retrieve API + custom orchestration | Custom implementation required
Management | Fully managed | Fully managed | Managed clusters or serverless
Data Sources | S3, Confluence, SharePoint, Salesforce, Web Crawler, Google Drive, OneDrive | 40+ connectors (S3, SharePoint, Salesforce, databases, ServiceNow, etc.) | Custom ingestion pipelines
Chunking | Fixed, semantic, hierarchical, no chunking | Automatic (document passages) | Custom (application-managed)
Vector Store | Managed or BYO (OpenSearch, Aurora, Pinecone, Redis, MongoDB) | Built-in (not configurable) | Native k-NN plugin
Metadata Filtering | Yes (custom JSON metadata) | Yes (document attributes) | Yes (field-level filtering)
Access Control | Via metadata filtering | Native ACL integration (SharePoint, etc.) | Fine-grained access control
Multimodal | Yes (FM parsing for images/tables) | Limited (document text extraction) | Yes (with custom embeddings)
Re-ranking | Yes (Managed KB) | Built-in semantic re-ranking | Custom (Learning to Rank plugin)
Best For | GenAI applications, RAG pipelines, AI agents | Enterprise search portals, FAQ systems, document discovery | Custom search, log analytics, observability, full control over retrieval
Pricing Model | Pay per query + storage (vector store) | Index-based (provisioned capacity) | Instance/serverless OCU hours

AWS Certification Exam Practice Questions

Question 1:

A company is building a RAG application using Amazon Bedrock Knowledge Bases. Their documents contain complex tables, charts, and embedded images in PDF format. Standard text extraction is losing critical information. Which parsing approach should they use to improve data quality?

  A. Fixed-size chunking with 512 tokens
  B. Foundation model parsing with a customized extraction prompt
  C. Semantic chunking with sentence boundary detection
  D. Amazon Textract with default settings

Answer: B – Foundation model parsing uses an FM (e.g., Claude) to interpret complex document layouts including tables, charts, and images. It allows customizable extraction prompts to capture the specific information needed. While Textract handles OCR, FM parsing provides superior understanding of document structure and semantics.

Question 2:

A financial services company wants their Bedrock Agent to execute a trade only after receiving explicit user approval. Which feature should they implement?

  A. Guardrails with denied topics
  B. Return of Control with user confirmation
  C. Custom orchestration with Lambda
  D. Multi-agent collaboration with a supervisor

Answer: B – Return of Control (ROC) allows the agent to return the proposed action to the calling application instead of executing it directly. Combined with user confirmation configuration, this ensures sensitive actions like trade execution require explicit user approval before proceeding.

Question 3:

An organization is deploying multiple AI agents that need to access different enterprise tools and data sources on behalf of users. Each agent requires its own identity with scoped permissions and integration with their existing Okta identity provider. Which service should they use?

  A. Amazon Bedrock Agents with IAM roles
  B. Amazon Bedrock AgentCore Identity
  C. AWS IAM Identity Center with SAML federation
  D. Amazon Cognito User Pools

Answer: B – Amazon Bedrock AgentCore Identity provides robust identity and access management for agents at scale. It’s compatible with existing identity providers (including Okta) without requiring user migration, assigns unique workload identities to agents, and provides centralized identity management regardless of deployment environment.

Question 4:

A healthcare company uses Amazon Bedrock to generate patient-facing content. They need to ensure responses don’t contain hallucinated medical information and are always grounded in the reference documents provided. Which Guardrails feature provides the MOST reliable hallucination detection?

  A. Content filters set to High
  B. Contextual grounding check
  C. Denied topics for medical advice
  D. Automated Reasoning checks

Answer: D – Automated Reasoning checks use formal verification methods grounded in mathematical logic to validate AI-generated outputs. They provide provably correct, auditable assessments and can detect hallucinations, suggest corrections, and highlight unstated assumptions — making them the most reliable option for critical healthcare content. Contextual grounding is useful but probabilistic, while Automated Reasoning is deterministic.

Question 5:

A company wants to improve their foundation model’s performance on a specific classification task but has limited labeled data (only 50 examples). They do have access to a high-quality larger model and 5,000 unlabeled prompts representative of their use case. Which customization approach is MOST appropriate?

  A. Instruction fine-tuning with the 50 labeled examples
  B. Continued pre-training with domain documents
  C. Model distillation using the larger model as teacher
  D. Reinforcement fine-tuning with a reward function

Answer: C – Model distillation transfers knowledge from a larger “teacher” model to a smaller “student” model. The company provides their 5,000 unlabeled prompts, Bedrock generates high-quality responses from the teacher model, and uses those to fine-tune the student. This achieves teacher-model quality at lower cost without requiring labeled data. With only 50 labeled examples, instruction fine-tuning would likely underperform.

Frequently Asked Questions

What is a Bedrock Knowledge Base?

A Bedrock Knowledge Base connects your data sources (S3, web pages, Confluence, etc.) to foundation models via RAG. It automatically chunks documents, generates embeddings, stores them in a vector database, and retrieves relevant context to ground model responses in your data.

What are Bedrock Guardrails?

Guardrails are configurable safety controls that filter harmful content, block denied topics, mask PII, and verify response grounding. They can be applied to any Bedrock model call, agent, or knowledge base to ensure responsible AI usage within your organization’s policies.

How do Bedrock Agents work?

Bedrock Agents use a foundation model to break down user requests into steps, determine which tools/APIs to call (action groups), execute them, and synthesize results. They support multi-step reasoning, code execution, memory across sessions, and can collaborate with other agents.
