AWS 中间件集成 - LangChain中文版文档

专为托管在 AWS Bedrock 上的模型设计的中间件。了解更多关于中间件的信息。

中间件	描述
提示词缓存	通过缓存重复的提示词前缀来降低成本

提示词缓存

通过在 Amazon Bedrock 上缓存频繁重用的提示词前缀，降低推理延迟和输入令牌成本。此中间件会在系统提示词、工具定义和最近的消息之后自动放置缓存检查点，以便模型在后续请求中跳过对先前所见内容的重新计算。提示词缓存适用于以下场景：

具有长且一致的系统提示词的多轮对话
拥有大量跨调用保持不变的工具体系定义的代理
基于文档的问答，用户针对同一上传上下文提出多个问题
具有重复静态内容的批处理工作负载

支持的模型：

Anthropic Claude
Amazon Nova

了解更多关于 AWS Bedrock 提示词缓存策略和限制的信息。缓存内容必须超过 1,024 个令牌，缓存检查点才会生效，具体取决于模型有时可能更多。请参阅支持的模型、区域和限制。

API 参考： BedrockPromptCachingMiddleware

ChatBedrockConverse

from langchain_aws import ChatBedrockConverse
from langchain_aws.middleware.prompt_caching import BedrockPromptCachingMiddleware
from langchain.agents import create_agent

agent = create_agent(
    model=ChatBedrockConverse(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0"),
    system_prompt="<Your long system prompt here>",
    middleware=[BedrockPromptCachingMiddleware(ttl="1h")],
)

ChatBedrock

from langchain_aws import ChatBedrock
from langchain_aws.middleware.prompt_caching import BedrockPromptCachingMiddleware
from langchain.agents import create_agent

agent = create_agent(
    model=ChatBedrock(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0"),
    system_prompt="<Your long system prompt here>",
    middleware=[BedrockPromptCachingMiddleware(ttl="5m")],
)

配置选项

type

string

default:"ephemeral"

缓存类型。对于 ChatBedrock，目前仅支持 'ephemeral'。对于 ChatBedrockConverse，此值被忽略，因为 Converse API 始终使用 "default" 缓存类型。

ttl

string

default:"5m"

缓存内容的生存时间。有效值：'5m' 或 '1h'。请注意，Amazon Nova 模型仅支持 '5m'。

min_messages_to_cache

number

default:"0"

开始缓存前的最小消息数量。

unsupported_model_behavior

string

default:"warn"

使用不支持的模型时的行为。选项：'ignore', 'warn', 或 'raise'。

完整示例

该中间件会缓存每个请求中直到并包括最新消息的内容。在 TTL 窗口内（5 分钟或 1 小时）的后续请求中，先前看到的内容将从缓存中检索而不是重新处理，从而降低成本和延迟。工作原理：

首次请求：系统提示词、工具和用户消息被发送到 API 并缓存
第二次请求：从缓存中检索缓存的内容。只需处理新消息
此模式在每个回合继续，每个请求重用缓存的对话历史

提示词缓存通过缓存令牌来降低 API 成本，但不提供对话记忆。要在跨调用中持久化对话历史，请使用类似 MemorySaver 的检查点器。

from langchain_aws import ChatBedrockConverse
from langchain_aws.middleware.prompt_caching import BedrockPromptCachingMiddleware
from langchain.agents import create_agent
from langchain_core.runnables import RunnableConfig
from langchain.messages import HumanMessage
from langchain.tools import tool
from langgraph.checkpoint.memory import MemorySaver


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny and 72F."


# System prompt must exceed 1,024 tokens for caching to take effect
LONG_PROMPT = (
    "You are a helpful weather assistant with deep expertise in meteorology, "
    "climate science, and atmospheric phenomena. When answering questions about "
    "weather, provide accurate and up-to-date information. "
    + "You should always strive to give the most helpful response possible. " * 85
)

agent = create_agent(
    model=ChatBedrockConverse(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0"),
    system_prompt=LONG_PROMPT,
    tools=[get_weather],
    middleware=[BedrockPromptCachingMiddleware(ttl="5m")],
    checkpointer=MemorySaver(),  # Persists conversation history
)

# Use a thread_id to maintain conversation state
config: RunnableConfig = {"configurable": {"thread_id": "user-123"}}

# First invocation: Creates cache with system prompt, tools, and user message
response = agent.invoke(
    {"messages": [HumanMessage("What is the weather in Miami?")]}, config=config
)

last_msg = response["messages"][-1]
print(last_msg.content)

# Check cache token usage
um = last_msg.usage_metadata
if um:
    details = um.get("input_token_details", {})
    cache_read = details.get("cache_read", 0) or 0
    cache_write = details.get("cache_creation", 0) or 0
    print(f"Cache read: {cache_read}, Cache write: {cache_write}")

# Second invocation: Reuses cached system prompt, tools, and previous messages
response = agent.invoke(
    {"messages": [HumanMessage("How about Seattle?")]}, config=config
)
print(response["messages"][-1].content)

特定于模型的行为

该中间件会自动处理 API 和模型系列之间的差异：

功能	ChatBedrockConverse (Anthropic)	ChatBedrockConverse (Nova)	ChatBedrock (Anthropic)
系统提示词缓存	✅	✅	✅
工具定义缓存	✅	❌	✅
消息缓存	✅	✅ (排除工具结果消息)	✅
扩展 TTL (`1h`)	✅	❌	✅

Edit this page on GitHub or file an issue.

Connect these docs to Claude, VSCode, and more via MCP for real-time answers.

Documentation Index

​提示词缓存

​特定于模型的行为

提示词缓存

特定于模型的行为