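Pinecone Integration - LangChain Chinese Documentation

Pinecone is a vector database with broad functionality.

This notebook shows how to use functionality related to the Pinecone vector database.

Setup

To use PineconeVectorStore, you first need to install the partner package, along with the other packages used in this notebook.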

pip install -qU langchain langchain-pinecone langchain-openai

Migration note: if you are migrating from the Pinecone implementation in langchain_community.vectorstores, you may need to remove your pinecone-client v2 dependency before installing langchain-pinecone, which depends on pinecone-client v6.
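Before installing, it can help to see which of these distributions are already present in the environment. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and it only reports versions; it does not uninstall anything.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names taken from the migration note and the pip command above.
for name in ("pinecone-client", "langchain-pinecone"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the reported pinecone-client version starts with 2, that is the dependency the migration note suggests removing first.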

Credentials

Create a new Pinecone account, or sign in to an existing one, and create an API key to use in this notebook.

import getpass
import os

from pinecone import Pinecone

if not os.getenv("PINECONE_API_KEY"):
    os.environ["PINECONE_API_KEY"] = getpass.getpass("Enter your Pinecone API key: ")

pinecone_api_key = os.environ.get("PINECONE_API_KEY")

pc = Pinecone(api_key=pinecone_api_key)

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Initialization

Before initializing our vector store, let's connect to a Pinecone index. If an index named index_name does not exist, it will be created.

from pinecone import ServerlessSpec

index_name = "langchain-test-index"  # change if desired

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(index=index, embedding=embeddings)
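The index above is created with dimension=1536, so the embedding model has to produce vectors of that length. A quick check before writing any data can catch a mismatch early. This is a minimal sketch: `check_embedding_dimension` and `StubEmbeddings` are hypothetical names introduced for illustration, and the helper assumes only that the object passed in has an `embed_query` method, as LangChain embedding classes do.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    def embed_query(self, text: str) -> List[float]: ...


def check_embedding_dimension(embeddings: SupportsEmbedQuery, expected_dim: int) -> int:
    """Embed a probe string and verify the vector length matches the index dimension."""
    actual_dim = len(embeddings.embed_query("dimension probe"))
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding dimension {actual_dim} does not match index dimension {expected_dim}"
        )
    return actual_dim


class StubEmbeddings:
    """Stand-in for a real embedding model; returns a fixed-length zero vector."""

    def __init__(self, size: int) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * self.size


# The stub avoids a network call; it is not a real embedding model.
print(check_embedding_dimension(StubEmbeddings(1536), expected_dim=1536))
```

With the objects from this notebook, the call would be check_embedding_dimension(embeddings, 1536), which issues one real embedding request.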

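Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting items.

Add items to vector store

We can add items to our vector store by using the add_documents function.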

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)
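The example above assigns random UUIDs, so running it twice stores every document twice under different IDs. If re-runs should overwrite existing entries instead, one option is to derive each ID from the document's content. This is a minimal sketch: `stable_id` and the choice of namespace are hypothetical and not part of langchain-pinecone.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(page_content: str, source: str) -> str:
    """Derive a deterministic UUID string from a document's text and source."""
    return str(uuid5(NAMESPACE_URL, f"{source}\n{page_content}"))


first = stable_id("The top 10 soccer players in the world right now.", "website")
second = stable_id("The top 10 soccer players in the world right now.", "website")
print(first == second)  # the same input always yields the same ID
```

With the documents list above, the IDs could be built as [stable_id(d.page_content, d.metadata["source"]) for d in documents] and passed through the ids argument.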

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])

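Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: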

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
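The filter argument in the search above matches on an exact metadata value. Pinecone's metadata filter language also has explicit operators such as $eq and $in, which allow matching one of several values. The helpers below are a hypothetical sketch that only builds the filter dictionaries; it is worth checking the operator list against the Pinecone metadata filtering documentation before relying on them.

```python
from typing import Any, Dict, Iterable


def source_equals(source: str) -> Dict[str, Any]:
    """Build a filter matching documents whose 'source' metadata equals one value."""
    return {"source": {"$eq": source}}


def source_in(sources: Iterable[str]) -> Dict[str, Any]:
    """Build a filter matching documents whose 'source' is any of the given values."""
    return {"source": {"$in": list(sources)}}


print(source_equals("tweet"))
print(source_in(["tweet", "news"]))
```

A dictionary built this way would be passed in place of {"source": "tweet"}, for example filter=source_in(["tweet", "news"]).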

Similarity search with score

You can also search with scores:

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
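Because the index in this notebook was created with metric="cosine", a higher score indicates a closer match between the query and the document. The following pure Python sketch shows the cosine similarity computation on toy vectors. It is illustrative only and makes no claim about how Pinecone computes scores internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```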

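Other search methods

There are more search methods (such as MMR) that are not covered in this notebook. To find all of them, be sure to read the API reference.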
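To give a feel for what MMR (maximal marginal relevance) does, here is a small pure Python sketch of the selection rule: each step picks the candidate that is relevant to the query but not too similar to what has already been chosen. The function name `mmr_select`, the `lambda_mult` parameter name, and the toy vectors are assumptions for illustration; this is not the implementation used by langchain-pinecone.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = _cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (_cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]  # the first two are near-duplicates
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
```

With lambda_mult=1.0 the rule reduces to plain relevance ranking, while lower values trade relevance for diversity among the selected results.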

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 1, "score_threshold": 0.4},
)
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
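Conceptually, the similarity_score_threshold search type with these search_kwargs keeps at most k results and drops any whose score falls below score_threshold. The sketch below imitates that behavior on hand-written (text, score) pairs. `apply_score_threshold` and the sample scores are hypothetical, and the real retriever may normalize scores differently.

```python
from typing import List, Tuple


def apply_score_threshold(
    scored_docs: List[Tuple[str, float]], score_threshold: float, k: int
) -> List[str]:
    """Keep the k highest-scoring documents whose score meets the threshold."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]


# Made-up scores for illustration; they are not results from a real query.
sample = [
    ("Robbers broke into the city bank and stole $1 million in cash.", 0.55),
    ("The stock market is down 500 points today due to fears of a recession.", 0.31),
]
print(apply_score_threshold(sample, score_threshold=0.4, k=1))
```

If no document reaches the threshold, the result is an empty list, which is worth handling in any chain that consumes the retriever's output.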

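Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG guides in the LangChain documentation.

API reference

For detailed documentation of all features and configurations, head to the API reference.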
