LangChain implements a streaming system to surface real-time updates.
Streaming is crucial for responsive applications built on large language models (LLMs). By revealing output progressively, even before a complete response is ready, streaming dramatically improves the user experience (UX), especially given LLM latency.
LangChain's streaming system lets you surface live feedback from agent runs in your application.
With LangChain streaming, you can stream agent progress, individual LLM tokens, and custom updates from inside your tools.
See the Common patterns section below for more end-to-end examples.
Supported stream modes
Pass one or more of the following stream modes as a list to the stream method:
| Mode | Description |
|---|---|
| `updates` | Streams state updates after each step of the agent. If multiple updates occur in the same step (e.g., multiple nodes run), they are streamed separately. |
| `messages` | Streams (token, metadata) tuples from any graph node where an LLM is invoked. |
| `custom` | Streams custom data from inside your graph nodes using the stream writer. |
Agent progress
To stream agent progress, use the stream method with streamMode: "updates". This emits an event after each agent step.
For example, if you have an agent that calls a tool once, you should see the following updates:
import z from "zod";
import { createAgent, tool } from "langchain";
const getWeather = tool(
async ({ city }) => {
return `The weather in ${city} is always sunny!`;
},
{
name: "get_weather",
description: "Get weather for a given city.",
schema: z.object({
city: z.string(),
}),
}
);
const agent = createAgent({
model: "gpt-5-nano",
tools: [getWeather],
});
for await (const chunk of await agent.stream(
{ messages: [{ role: "user", content: "what is the weather in sf" }] },
{ streamMode: "updates" }
)) {
const [step, content] = Object.entries(chunk)[0];
console.log(`step: ${step}`);
console.log(`content: ${JSON.stringify(content, null, 2)}`);
}
/**
* step: model
* content: {
* "messages": [
* {
* "kwargs": {
* // ...
* "tool_calls": [
* {
* "name": "get_weather",
* "args": {
* "city": "San Francisco"
* },
* "type": "tool_call",
* "id": "call_0qLS2Jp3MCmaKJ5MAYtr4jJd"
* }
* ],
* // ...
* }
* }
* ]
* }
* step: tools
* content: {
* "messages": [
* {
* "kwargs": {
* "content": "The weather in San Francisco is always sunny!",
* "name": "get_weather",
* // ...
* }
* }
* ]
* }
* step: model
* content: {
* "messages": [
* {
* "kwargs": {
* "content": "The latest update says: The weather in San Francisco is always sunny!\n\nIf you'd like real-time details (current temperature, humidity, wind, and today's forecast), I can pull the latest data for you. Want me to fetch that?",
* // ...
* }
* }
* ]
* }
*/
LLM tokens
To stream tokens as they are produced by the LLM, use streamMode: "messages":
import z from "zod";
import { createAgent, tool } from "langchain";
const getWeather = tool(
async ({ city }) => {
return `The weather in ${city} is always sunny!`;
},
{
name: "get_weather",
description: "Get weather for a given city.",
schema: z.object({
city: z.string(),
}),
}
);
const agent = createAgent({
model: "gpt-4.1-mini",
tools: [getWeather],
});
for await (const [token, metadata] of await agent.stream(
{ messages: [{ role: "user", content: "what is the weather in sf" }] },
{ streamMode: "messages" }
)) {
console.log(`node: ${metadata.langgraph_node}`);
console.log(`content: ${JSON.stringify(token.contentBlocks, null, 2)}`);
}
Custom updates
To stream updates from tools as they execute, you can use the writer function available on the config.
import z from "zod";
import { tool, createAgent } from "langchain";
import { LangGraphRunnableConfig } from "@langchain/langgraph";
const getWeather = tool(
async (input, config: LangGraphRunnableConfig) => {
// Stream any arbitrary data
config.writer?.(`Looking up data for city: ${input.city}`);
// ... fetch city data
config.writer?.(`Acquired data for city: ${input.city}`);
return `It's always sunny in ${input.city}!`;
},
{
name: "get_weather",
description: "Get weather for a given city.",
schema: z.object({
city: z.string().describe("The city to get weather for."),
}),
}
);
const agent = createAgent({
model: "gpt-4.1-mini",
tools: [getWeather],
});
for await (const chunk of await agent.stream(
{ messages: [{ role: "user", content: "what is the weather in sf" }] },
{ streamMode: "custom" }
)) {
console.log(chunk);
}
Looking up data for city: San Francisco
Acquired data for city: San Francisco
If you add the writer parameter to your tool, you won't be able to invoke the tool outside of a LangGraph execution context without providing a writer function.
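As a rough sketch of what this means in practice (using a plain function as a stand-in for the getWeather tool above, since this is about the calling convention rather than the LangChain tool API), you can supply your own writer when calling the function directly:

```typescript
// A stand-in for a writer-dependent tool. Outside a LangGraph run there is
// no writer in the config, so you must provide one yourself; here we
// collect the emitted updates in an array.
type ToolConfig = { writer?: (data: unknown) => void };

async function getWeatherFn(
  input: { city: string },
  config: ToolConfig = {}
): Promise<string> {
  config.writer?.(`Looking up data for city: ${input.city}`);
  return `It's always sunny in ${input.city}!`;
}

const updates: unknown[] = [];
const result = await getWeatherFn(
  { city: "San Francisco" },
  { writer: (data) => updates.push(data) }
);
// updates now contains: ["Looking up data for city: San Francisco"]
```

Without the `{ writer: ... }` argument the optional-chained `config.writer?.(...)` calls are silently skipped, so the custom updates are lost.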
Stream multiple modes
You can specify multiple streaming modes by passing them as an array: streamMode: ["updates", "messages", "custom"].
The streamed outputs will be [mode, chunk] tuples, where mode is the name of the stream mode and chunk is the data streamed by that mode.
import z from "zod";
import { tool, createAgent } from "langchain";
import { LangGraphRunnableConfig } from "@langchain/langgraph";
const getWeather = tool(
async (input, config: LangGraphRunnableConfig) => {
// Stream any arbitrary data
config.writer?.(`Looking up data for city: ${input.city}`);
// ... fetch city data
config.writer?.(`Acquired data for city: ${input.city}`);
return `It's always sunny in ${input.city}!`;
},
{
name: "get_weather",
description: "Get weather for a given city.",
schema: z.object({
city: z.string().describe("The city to get weather for."),
}),
}
);
const agent = createAgent({
model: "gpt-4.1-mini",
tools: [getWeather],
});
for await (const [streamMode, chunk] of await agent.stream(
{ messages: [{ role: "user", content: "what is the weather in sf" }] },
{ streamMode: ["updates", "messages", "custom"] }
)) {
console.log(`${streamMode}: ${JSON.stringify(chunk, null, 2)}`);
}
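When consuming multiple modes, it is common to branch on the mode name rather than JSON-printing every chunk. A minimal sketch of such a dispatcher (the function and its messages are illustrative, not part of the LangChain API; the tuple shape matches the [mode, chunk] output described above):

```typescript
// Dispatch [mode, chunk] tuples from a multi-mode stream to per-mode handling.
type StreamMode = "updates" | "messages" | "custom";

function describeChunk([mode, chunk]: [StreamMode, unknown]): string {
  switch (mode) {
    case "updates":
      // "updates" chunks are keyed by the node that produced the update
      return `step finished: ${Object.keys(chunk as object).join(", ")}`;
    case "messages":
      // "messages" chunks are [token, metadata] tuples
      return "token received";
    default:
      // "custom" chunks are whatever the tool's writer emitted
      return `tool update: ${String(chunk)}`;
  }
}
```

In the loop above, `console.log(describeChunk([streamMode, chunk]))` would then route each chunk type to its own rendering logic.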
Common patterns
Below are examples that demonstrate common streaming use cases.
Stream thinking/reasoning tokens
Some models perform internal reasoning before producing a final answer. You can stream these thinking/reasoning tokens as they are generated by filtering for standard content blocks with type "reasoning".
To stream thinking tokens from an agent, use streamMode: "messages" and filter for reasoning content blocks. Use a model instance with extended thinking enabled where the model supports it (e.g., ChatAnthropic):
import z from "zod";
import { createAgent, tool } from "langchain";
import { ChatAnthropic } from "@langchain/anthropic";
const getWeather = tool(
async ({ city }) => {
return `It's always sunny in ${city}!`;
},
{
name: "get_weather",
description: "Get weather for a given city.",
schema: z.object({ city: z.string() }),
},
);
const agent = createAgent({
model: new ChatAnthropic({
model: "claude-sonnet-4-6",
thinking: { type: "enabled", budget_tokens: 5000 },
}),
tools: [getWeather],
});
for await (const [token, metadata] of await agent.stream(
{ messages: [{ role: "user", content: "What is the weather in SF?" }] },
{ streamMode: "messages" },
)) {
if (!token.contentBlocks) continue;
const reasoning = token.contentBlocks.filter((b) => b.type === "reasoning");
const text = token.contentBlocks.filter((b) => b.type === "text");
if (reasoning.length) {
process.stdout.write(`[thinking] ${reasoning[0].reasoning}`);
}
if (text.length) {
process.stdout.write(text[0].text);
}
}
[thinking] The user is asking about the weather in San Francisco. I have a tool
[thinking] available to get this information. Let me call the get_weather tool
[thinking] with "San Francisco" as the city parameter.
The weather in San Francisco is: It's always sunny in San Francisco!
This works the same way regardless of the model provider: LangChain normalizes provider-specific formats (Anthropic thinking blocks, OpenAI reasoning summaries, etc.) into the standard "reasoning" content block type via the contentBlocks property.
To stream reasoning tokens directly from a chat model (without an agent), see Streaming with chat models.
Disable streaming
In some applications you may need to disable streaming of individual tokens for a given model. This is useful when:
- working with multi-agent systems to control which agents stream their output
- mixing models that support streaming with models that do not
- deploying to LangSmith and wanting to prevent certain model outputs from being streamed to the client
Set streaming: false when initializing the model.
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
model: "gpt-4.1",
streaming: false,
});
When deploying to LangSmith, set streaming: false on any model whose output you do not want streamed to the client. This is configured in your graph code before deployment.
Not all chat model integrations support the streaming parameter. If your model does not support it, use disableStreaming: true instead; this parameter is available on all chat models via the base class.
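For example (a sketch assuming an Anthropic chat model; the model name is illustrative, and disableStreaming is inherited from the base chat model class):

```typescript
import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-sonnet-4-5",
  // Force non-streaming generation even when .stream() is used upstream
  disableStreaming: true,
});
```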
See the LangGraph streaming guide for more details.
Related resources