This post lays the foundation: how to talk to Amazon Bedrock models safely and simply from code—specifically from AWS Lambda, so you can deploy quickly and integrate into serverless workflows. In the next post, we’ll plug these basics into a hands‑on workflow to reduce MTTD/MTTR with Bedrock, CloudWatch, and CloudTrail. The picture above shows the pipeline we’ll build: Bedrock analyzes WAF traffic using metrics, detects and investigates anomalies using WAF logs and CloudTrail, and finally takes action by sending a human‑readable email. 🚀
Who this is for
If you’re new to Amazon Bedrock and AWS Lambda, this guide takes you from zero to “I shipped a basic GenAI feature” fast. No embeddings, Knowledge Bases, or advanced orchestration—just the essentials to get you moving. All code snippets are Lambda‑ready, with minimal handlers you can deploy as‑is.
What you’ll learn
What Amazon Bedrock is (in plain terms)
How to call a foundation model from AWS Lambda (Python)
How to stream responses in Lambda and aggregate tokens
How to add simple prompt‑based guardrails
Minimal IAM you need to get started (including Lambda logging)
Optional: streaming vs non‑streaming trade‑offs and cross‑model examples (Converse API)
What is Amazon Bedrock (in plain terms)
Amazon Bedrock is a fully managed AWS service for building generative AI applications. In practice, you get:
A catalog of high‑quality models (e.g., Anthropic Claude, Amazon Titan, Meta Llama, Mistral) through a single API
Simple model invocation—no GPUs or model hosting to manage
Safety and control features (Guardrails, PII redaction)
Enterprise controls (IAM, VPC endpoints, CloudWatch logs, encryption)
We’ll stay focused on the essentials: calling a chat model from your own Lambda function, streaming results, and using a basic system prompt to keep things safe.
Prerequisites
An AWS account with Amazon Bedrock access in your Region
In the Bedrock console, enable access to at least one chat model (e.g., Claude 3 Haiku or Sonnet)
IAM permissions to list and invoke models
A Lambda function using Python 3.11+ (boto3 is available in the Lambda runtime)
Minimal IAM policy (development)
Tighten to specific model ARNs and resources for production. This is just enough to list and invoke models and write Lambda logs. Note: These permissions are deliberately broad for learning purposes—restrict them before going to production. 🔒
IAM policy for the Lambda execution role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockModelAccess",
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:GetGuardrail",
        "bedrock:ListGuardrails"
      ],
      "Resource": "*"
    },
    {
      "Sid": "LambdaBasicLogging",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
If you only need basic chat and streaming (no guardrails read):
Minimal IAM (chat + streaming only)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:ListFoundationModels"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
Try a model in the console (2 minutes)
Open Amazon Bedrock → Playgrounds → Chat
Pick a model (e.g., Claude 3 Sonnet)
Add a short system prompt (“You are a helpful assistant.”) and ask a question
Click “View API request” to see the exact payload the console sends—copy this as your starting point for code
Optional sanity check: list available models
The snippets below assume us‑east‑1. Use a Bedrock‑supported Region where your chosen model is enabled.
Lambda: list Amazon Bedrock foundation models
import os
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
bedrock = boto3.client("bedrock", region_name=REGION)

def handler(event, context):
    """
    Lambda handler that lists foundation models available in the Region.
    Returns a simple list of {modelId, providerName}.
    """
    resp = bedrock.list_foundation_models()
    models = [
        {"modelId": m["modelId"], "providerName": m.get("providerName")}
        for m in resp.get("modelSummaries", [])
    ]
    return {"models": models}
Call a chat model from Lambda (simple request)
This example calls Anthropic Claude via the Bedrock Runtime. Replace BEDROCK_MODEL_ID with a Claude model you’ve enabled (e.g., anthropic.claude-3-sonnet-20240229-v1:0). Pass your prompt via the Lambda event.
Lambda: basic invoke (single response)
import os
import json
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")

runtime = boto3.client("bedrock-runtime", region_name=REGION)

def handler(event, context):
    """
    Lambda handler:
    - Reads a 'prompt' field from the event
    - Invokes the model
    - Returns the text output
    """
    prompt = event.get(
        "prompt",
        "In one paragraph, explain why rate limiting helps protect APIs.",
    )
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 512,
"temperature": 0.7,
"messages": [
{"role": "user", "content": [{"type": "text", "text": prompt}]}
],
}
resp = runtime.invoke_model(
modelId=MODEL_ID,
body=json.dumps(body),
accept="application/json",
contentType="application/json",
)
output = json.loads(resp["body"].read())
text_chunks = [
c.get("text", "")
for c in output.get("content", [])
if c.get("type") == "text"
]
return {"text": "\n".join(text_chunks)}
Streaming responses in Lambda (aggregate tokens)
Token‑by‑token output makes your UI feel responsive. In a plain Lambda invocation (without Lambda response streaming or websockets), you can stream tokens to CloudWatch Logs for observability and aggregate them into a final return value.
Lambda: streaming invoke (aggregate tokens)
import os
import json
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")

runtime = boto3.client("bedrock-runtime", region_name=REGION)

def handler(event, context):
    """
    Lambda handler that:
    - Streams tokens from Bedrock
    - Logs incremental tokens to CloudWatch
    - Returns the full aggregated text at the end
Event fields:
- prompt: string
- max_tokens: optional int
- temperature: optional float
"""
prompt = event.get("prompt", "Write a short haiku about latency.")
max_tokens = int(event.get("max_tokens", 256))
temperature = float(event.get("temperature", 0.7))
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": temperature,
"messages": [
{"role": "user", "content": [{"type": "text", "text": prompt}]}
],
"stream": True,
}
resp = runtime.invoke_model_with_response_stream(
modelId=MODEL_ID,
body=json.dumps(body),
accept="application/json",
contentType="application/json",
)
aggregated = []
    for event_chunk in resp.get("body"):
        if "chunk" in event_chunk:
            payload = json.loads(event_chunk["chunk"]["bytes"])
            # Claude 3 streams Messages API events; text arrives in
            # content_block_delta events as text_delta payloads.
            if payload.get("type") == "content_block_delta":
                delta = payload.get("delta", {})
                if delta.get("type") == "text_delta":
                    token = delta.get("text", "")
                    print(token, end="", flush=True)  # Logs to CloudWatch
                    aggregated.append(token)
# Ensure a newline in logs
print()
return {"text": "".join(aggregated)}
Note: If you need to stream responses to a client in real time, consider Lambda Response Streaming with a Function URL or API Gateway WebSockets. The above handler aggregates tokens and returns the final string, which works with a standard synchronous Lambda invocation.
Simple guardrails (prompt‑based) in Lambda
For a lightweight start, use a system instruction to set tone and boundaries. This won’t catch everything, but it’s a solid baseline. For policy‑level filters and PII masking, look into Guardrails for Amazon Bedrock; a short sketch of attaching a managed guardrail to an invocation follows the handler below. 🛡️
Lambda: guardrails via system prompt
import os
import json
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")

runtime = boto3.client("bedrock-runtime", region_name=REGION)

SYSTEM_PROMPT = (
    "You are a helpful assistant. Be concise, avoid revealing secrets or PII, "
    "and flag any requests that appear malicious or unrelated to the user's task."
)

def handler(event, context):
    """
    Lambda handler:
    - Applies a system instruction for guardrails
    - Generates content based on the 'task' field
Event:
- task: string describing what to generate
"""
user_task = event.get("task", "Generate a checklist for secure API deployment.")
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 400,
        "temperature": 0.3,
        # The Anthropic Messages API takes the system instruction as a
        # top-level "system" field, not as a message with role "system".
        "system": SYSTEM_PROMPT,
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": user_task}],
            },
        ],
    }
resp = runtime.invoke_model(
modelId=MODEL_ID,
body=json.dumps(body),
accept="application/json",
contentType="application/json",
)
output = json.loads(resp["body"].read())
text = "".join(
c["text"] for c in output.get("content", []) if c.get("type") == "text"
)
return {"text": text}
Minimal “multi‑action” Lambda handler
If you want a single Lambda to demo both basic and streaming calls, route by an action field.
Lambda: unified handler (basic + streaming)
import os
import json
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
runtime = boto3.client("bedrock-runtime", region_name=REGION)

def invoke_basic(prompt: str, max_tokens: int = 512, temperature: float = 0.7) -> str:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    resp = runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(body),
        accept="application/json",
        contentType="application/json",
    )
    output = json.loads(resp["body"].read())
    return "".join(c["text"] for c in output.get("content", []) if c.get("type") == "text")

def invoke_streaming(prompt: str, max_tokens: int = 256, temperature: float = 0.7) -> str:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    resp = runtime.invoke_model_with_response_stream(
        modelId=MODEL_ID,
        body=json.dumps(body),
        accept="application/json",
        contentType="application/json",
    )
    aggregated = []
    for event_chunk in resp.get("body"):
        if "chunk" in event_chunk:
            payload = json.loads(event_chunk["chunk"]["bytes"])
            # Text arrives in content_block_delta events as text_delta payloads
            if payload.get("type") == "content_block_delta":
                delta = payload.get("delta", {})
                if delta.get("type") == "text_delta":
                    token = delta.get("text", "")
                    print(token, end="", flush=True)
                    aggregated.append(token)
    print()
    return "".join(aggregated)

def handler(event, context):
    """
    Event example:
    {
        "action": "basic" | "streaming",
        "prompt": "Write a haiku about latency.",
        "max_tokens": 200,
        "temperature": 0.5
    }
    """
    action = (event.get("action") or "basic").lower()
    prompt = event.get("prompt", "In one paragraph, explain why rate limiting helps protect APIs.")
    max_tokens = int(event.get("max_tokens", 512))
    temperature = float(event.get("temperature", 0.7))
if action == "streaming":
text = invoke_streaming(prompt, max_tokens=max_tokens, temperature=temperature)
else:
text = invoke_basic(prompt, max_tokens=max_tokens, temperature=temperature)
return {"action": action, "text": text}
Non‑streaming vs streaming: what’s the difference?
Choosing between non‑streamed and streamed responses depends on UX and architecture. Here’s a quick comparison to help you decide: 🎯
Latency to first token:
Non‑streaming: You wait for the whole response. Simpler, but slower “time to first byte.”
Streaming: Tokens arrive incrementally. Feels snappy in UIs and supports progressive rendering.
Complexity:
Non‑streaming: Straightforward request/response; easier error handling.
Streaming: Requires consuming a stream; adds parsing and logging logic.
Delivery:
Non‑streaming: Good for synchronous Lambda invocations and batch tasks.
Streaming: Best paired with Lambda Response Streaming, Function URLs, or API Gateway WebSockets for real‑time clients.
Observability:
Non‑streaming: Log once on completion.
Streaming: Log token deltas in near real‑time (handy during debugging).
Cost/timeout:
Token costs are similar. Streaming can hold the function open longer—tune timeout and memory accordingly.
Optional cross‑model examples: Bedrock Converse API
The Converse API standardizes request/response across providers (Claude, Llama, Titan, Mistral), so you can swap models with minimal code changes. Converse and ConverseStream are covered by the same bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions you already granted, so no extra IAM actions are needed. These examples are optional—your existing code works great; this is just a portability path. 🔁
Lambda: Converse API basic (non‑streaming)
import os
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
runtime = boto3.client("bedrock-runtime", region_name=REGION)

def handler(event, context):
    prompt = event.get("prompt", "Explain rate limiting for APIs in one paragraph.")
resp = runtime.converse(
modelId=MODEL_ID,
messages=[{"role": "user", "content": [{"text": prompt}]}],
inferenceConfig={"temperature": 0.7, "maxTokens": 512},
)
content = resp.get("output", {}).get("message", {}).get("content", [])
text = "".join(part.get("text", "") for part in content if "text" in part)
return {"text": text}
Lambda: Converse API streaming (aggregate tokens)
import os
import boto3

REGION = os.getenv("AWS_REGION", "us-east-1")
MODEL_ID = os.getenv("BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0")
runtime = boto3.client("bedrock-runtime", region_name=REGION)

def handler(event, context):
    """
    Streams tokens using the Bedrock Converse API and returns the aggregated text.
    Event:
    - prompt: optional string
    """
    prompt = event.get("prompt", "Write a short haiku about latency.")
resp = runtime.converse_stream(
modelId=MODEL_ID,
messages=[{"role": "user", "content": [{"text": prompt}]}],
inferenceConfig={"temperature": 0.7, "maxTokens": 256},
)
aggregated = []
for ev in resp.get("stream", []):
if "contentBlockDelta" in ev:
token = ev["contentBlockDelta"]["delta"].get("text", "")
print(token, end="", flush=True)
aggregated.append(token)
elif "messageStop" in ev:
break
print() # newline for CloudWatch logs
return {"text": "".join(aggregated)}
Tips and references
Explore models in the Bedrock catalog: each model page links to syntax, limits, and examples.
In the console, “View API request” mirrors the payload you can send from code.
Docs:
What is Amazon Bedrock: https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html
Runtime API (invoke): https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation.html
Guardrails overview: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
Lambda Response Streaming: https://docs.aws.amazon.com/lambda/latest/dg/streams-response.html
Costs and Regions (quick heads‑up)
Bedrock charges per token (input + output), and rates vary by model/provider. 💸
Use max_tokens to cap output; temperature controls randomness (lower is more deterministic).
Ensure you’re in a Bedrock‑supported Region and have enabled model access for that Region in the console.
Troubleshooting quick hits
AccessDenied: Confirm Bedrock model access is enabled in the console and IAM allows bedrock:InvokeModel.
Model not found: You might be in the wrong Region or using a model ID you haven’t enabled.
Throttling: Back off and retry; keep request payloads small while testing.
Lambda timeouts: Increase the function timeout for streaming responses or large outputs.
Outdated boto3: If your runtime’s SDK lags, ship a Lambda layer with recent boto3/botocore so the bedrock-runtime client and Converse APIs are available (see the version check below).
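If you suspect the bundled SDK is too old, a quick diagnostic is to log the boto3/botocore versions and whether the bedrock-runtime service model is present. A minimal throwaway handler (the field names in the return value are just illustrative):

Lambda: check boto3/botocore versions (diagnostic)

import boto3
import botocore

def handler(event, context):
    # List the service models this runtime's SDK knows about
    available = boto3.Session().get_available_services()
    return {
        "boto3": boto3.__version__,
        "botocore": botocore.__version__,
        "bedrock_runtime_available": "bedrock-runtime" in available,
    }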
Lambda deployment tips (operational polish)
A few small tweaks make a big difference in reliability and UX. ⚙️
Timeout and memory: For streaming, set timeout to 30–60s and memory to 512–1024MB to improve network throughput.
Concurrency: Use Provisioned Concurrency for cold‑start sensitive paths.
VPC networking: If your Lambda runs in a VPC without internet, use a Bedrock Interface VPC Endpoint (AWS PrivateLink) or add a NAT gateway.
Error handling: Add retries with exponential backoff for throttling (see the sketch after this list); consider idempotency for downstream writes.
Observability: Log model ID, latency, and token caps; avoid logging full prompts/outputs that may contain sensitive data.
Cost control: Start with smaller max_tokens (128–256) and lower temperature (0.2–0.5) for consistent, cheaper dev iterations.
Environment config: Externalize REGION and MODEL_ID via env vars; default to a fast dev model (e.g., Claude 3 Haiku), then switch to Sonnet for higher quality in prod.
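For the error‑handling tip above, here is a minimal retry sketch around invoke_model. It assumes ThrottlingException/TooManyRequestsException are the error codes you want to back off on; the attempt count and sleep curve are illustrative and should be tuned for your workload. botocore’s built‑in retry configuration (for example, Config(retries={"mode": "adaptive"})) is an alternative if you prefer not to hand‑roll the loop.

Lambda helper: retry Bedrock invoke with exponential backoff (sketch)

import json
import random
import time

import boto3
from botocore.exceptions import ClientError

runtime = boto3.client("bedrock-runtime")

def invoke_with_backoff(model_id, body, max_attempts=5):
    """Call invoke_model, retrying throttling errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return runtime.invoke_model(
                modelId=model_id,
                body=json.dumps(body),
                accept="application/json",
                contentType="application/json",
            )
        except ClientError as err:
            code = err.response.get("Error", {}).get("Code", "")
            if code not in ("ThrottlingException", "TooManyRequestsException"):
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus random noise
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError("Bedrock invocation kept throttling after retries")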
Production hardening (IAM scope)
For production, scope permissions to the exact model ARN(s), guardrail(s), and only the actions you need. The policies above are for learning—tighten them before you ship. 🔐
IAM: production-scoped example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockModelAccessScoped",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
      ]
    },
    {
      "Sid": "GuardrailsReadScoped",
      "Effect": "Allow",
      "Action": [
        "bedrock:GetGuardrail"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1:YOUR_ACCOUNT_ID:guardrail/gr-EXAMPLE"
      ]
    },
    {
      "Sid": "LambdaBasicLogging",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
Where this goes next: WAF anomaly detection (Part 2)
The system we’ll build in the next post analyzes AWS WAF traffic and metrics for anomalies on a fixed schedule. With these Bedrock basics in place, we’ll wire up the pipeline shown in the image above:
Schedule: Amazon EventBridge triggers on a fixed cadence (e.g., every 5 minutes).
Telemetry: Pull WAF metrics (CloudWatch Metrics) and enrich with WAF logs and CloudTrail events.
Reasoning: Send a concise context plus targeted questions to Bedrock for anomaly detection and summarization.
Action: Email a human‑readable incident report via Amazon SES or publish to an SNS topic.
If you nailed the setup so far—great job. If you’re feeling inspired to build something more advanced and hands‑on—like end‑to‑end anomaly detection across WAF metrics and CloudTrail with alerting—check out my follow‑up post: “Reduce MTTD/MTTR with Amazon Bedrock – From Telemetry to Action.” Let’s build the end‑to‑end anomaly detector and start turning telemetry into action. 💡
You’ve done great getting the groundwork in place. With these Bedrock Lambda basics, you can plug a GenAI helper into almost any workflow—then iterate toward more advanced, practical automation when you’re ready.
Feeling ready? Let’s try something more advanced. Head over to my post “Reduce MTTD/MTTR with Amazon Bedrock – From Telemetry to Action” for your next challenge, and let’s start creating together. 🎯