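PGVector Integration - LangChain Chinese Documentation

An implementation of the LangChain vector store abstraction that uses postgres as the backend and relies on the pgvector extension.

The code lives in an integration package called langchain-postgres.

Status

This code has been ported from langchain-community into a dedicated package called langchain-postgres. The following changes have been made:

langchain-postgres works only with psycopg3. Update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it will use psycopg3).
The schema of the embedding store and collection has been changed so that add_documents works correctly with user-specified IDs.
An explicit connection object must now be passed.

Currently, there is no mechanism that supports easy data migration when the schema changes. Any schema change in the vector store requires you to recreate the tables and re-add the documents. If this is a concern, use a different vector store. If not, this implementation should be fine for your use case.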
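
The connection string change listed above can be applied mechanically. The following is a minimal sketch, not part of langchain-postgres: the helper name to_psycopg3_url is hypothetical, and it only rewrites the driver prefix of a URL, leaving everything else untouched.

```python
def to_psycopg3_url(url: str) -> str:
    """Rewrite a URL that names the psycopg2 driver so it names psycopg (v3)."""
    old_prefix = "postgresql+psycopg2://"
    new_prefix = "postgresql+psycopg://"
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    # Any other URL (including one already using psycopg) is returned unchanged.
    return url


old_url = "postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"
print(to_psycopg3_url(old_url))
```

Calling the helper on a URL that already uses the psycopg driver returns it unchanged, so it is safe to apply more than once.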

Setup

First, install the partner package:

pip install -qU langchain-postgres

You can run the following command to start a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16

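Credentials

No credentials are needed to run this notebook; just make sure you have installed the langchain-postgres package and correctly started the postgres container. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"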

Instantiation

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_postgres import PGVector

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)
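
If the credentials contain characters such as @ or /, writing the connection string by hand is error-prone. The sketch below assembles the same URL from its parts and percent-encodes the user name and password. The helper name build_connection_url is hypothetical, and the values are the ones from the docker command above.

```python
from urllib.parse import quote


def build_connection_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql+psycopg URL, percent-encoding the credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


connection = build_connection_url("langchain", "langchain", "localhost", 6024, "langchain")
print(connection)
```

The resulting string can be passed as the connection argument in the same way as the hand-written one.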

Manage vector store

Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Delete items from vector store

vector_store.delete(ids=["3"])
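
The add example passes integer IDs taken from the metadata, while the delete call uses the string "3". Assuming the store keeps IDs as strings (an assumption of this sketch, not something this page states), converting the IDs in one place keeps both calls consistent. The helper string_ids is hypothetical, and plain dicts stand in for document metadata so the snippet runs without a database.

```python
def string_ids(metadatas):
    """Return the "id" field of each metadata dict as a string."""
    return [str(metadata["id"]) for metadata in metadatas]


# Plain dicts standing in for the metadata of the documents defined earlier.
metadatas = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 2, "location": "pond", "topic": "animals"},
    {"id": 3, "location": "market", "topic": "food"},
]

ids = string_ids(metadatas)
print(ids)  # prints ['1', '2', '3']

# With a live store (not executed here), the same list could be used for both calls:
#   vector_store.add_documents(docs, ids=ids)
#   vector_store.delete(ids=["3"])
```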

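Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Filter support

The vector store supports a set of filters that can be applied to the metadata fields of the documents.

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special-cased (in)
$nin	Special-cased (not in)
$between	Special-cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)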
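
To make the meaning of the operators concrete, here is a small in-memory evaluator that applies a filter to a single metadata dict. It is an illustrative sketch only: the real store applies filters in the database. The names matches, COMPARATORS, and _like are hypothetical, and the sketch assumes that $between takes an inclusive [low, high] pair and that a bare value means equality.

```python
import re


def _like(value, pattern, ignore_case=False):
    """Emulate SQL LIKE: % matches any run of characters, _ matches one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, str(value), flags) is not None


COMPARATORS = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    "$between": lambda value, arg: arg[0] <= value <= arg[1],  # assumed inclusive
    "$like": lambda value, arg: _like(value, arg),
    "$ilike": lambda value, arg: _like(value, arg, ignore_case=True),
}


def matches(metadata, flt):
    """Return True if the metadata dict satisfies every top-level entry of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key not in metadata:
            ok = False  # this sketch treats a missing field as a non-match
        elif isinstance(cond, dict):
            ok = all(COMPARATORS[op](metadata[key], arg) for op, arg in cond.items())
        else:
            ok = metadata[key] == cond  # bare value treated as equality (assumption)
        if not ok:
            return False
    return True


meta = {"id": 1, "location": "pond", "topic": "animals"}
print(matches(meta, {"id": {"$in": [1, 5, 2, 9]}}))
print(matches(meta, {"$or": [{"id": {"$gt": 5}}, {"topic": {"$ilike": "ANIM%"}}]}))
```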

Query directly

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

If you provide a dict with multiple fields but no operators, the top level is interpreted as a logical AND filter:

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
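
The two searches above return the same documents because the multi-field dict is shorthand for the explicit $and form. The rewrite can be sketched as a small function; the name to_explicit_and is hypothetical and this is not the library's own normalization code.

```python
def to_explicit_and(flt):
    """Rewrite a multi-field filter dict as an explicit $and of single-field filters."""
    # Filters with one field, or with a top-level operator such as $and/$or, are left as-is.
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt
    return {"$and": [{field: cond} for field, cond in flt.items()]}


implicit = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
print(to_explicit_and(implicit))
```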

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
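
How to read the score depends on the distance strategy the store is configured with, so check the API reference before comparing values. As an illustration only, assuming a cosine distance strategy (an assumption this page does not confirm), the score would be a distance where smaller values mean closer vectors. The function below is a plain-Python sketch of that measure, not the store's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction: distance 0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal: distance 1
```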

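For a full list of the different searches you can run on a PGVector vector store, refer to the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier use in your chains.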

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
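
One common way to use retrieved documents in a chain is to join their text into a context block for a prompt. The sketch below shows that step in isolation. Doc is a hypothetical stand-in with the same page_content and metadata attributes as the documents returned above, so the snippet runs without a database; format_context and the prompt wording are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs):
    """Number each document's text and join the results, one per line."""
    return "\n".join(f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1))


# In a real chain this list would come from retriever.invoke("kitty").
retrieved = [Doc("there are cats in the pond", {"id": 1})]
prompt = (
    "Answer using only this context:\n"
    f"{format_context(retrieved)}\n\n"
    "Question: where are the cats?"
)
print(prompt)
```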

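Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG guides in the LangChain documentation.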

API reference

For detailed documentation of all PGVector VectorStore features and configurations, see the API reference.
