langchain_community.vectorstores.awadb.AwaDB¶
- class langchain_community.vectorstores.awadb.AwaDB(table_name: str = 'langchain_awadb', embedding: Optional[Embeddings] = None, log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any)[source]¶
AwaDB vector store.
- Initialize with AwaDB client.
If table_name is not specified, a random table name consisting of _DEFAULT_TABLE_NAME plus the last segment of a UUID is created automatically.
- Parameters
table_name (str) – Name of the table to create. Defaults to _DEFAULT_TABLE_NAME.
embedding (Optional[Embeddings]) – Optional Embeddings object to set initially.
log_and_data_dir (Optional[str]) – Optional root directory for logs and data.
client (Optional[awadb.Client]) – Optional AwaDB client.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
None.
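The default-name behaviour described above can be sketched as follows. This is a minimal illustration, not the library's code: it assumes the suffix is the last hyphen-separated group of a `uuid4` string joined to the default name with an underscore, and the helper name `resolve_table_name` is hypothetical.

```python
import uuid

_DEFAULT_TABLE_NAME = "langchain_awadb"  # default value shown in the signature


def resolve_table_name(table_name: str = _DEFAULT_TABLE_NAME) -> str:
    """Hypothetical helper: add a random suffix only to the default name."""
    if table_name == _DEFAULT_TABLE_NAME:
        # str(uuid4()) has five hyphen-separated hex groups; keep the last one.
        suffix = str(uuid.uuid4()).split("-")[-1]
        return f"{table_name}_{suffix}"
    # An explicitly chosen name is used unchanged (assumption).
    return table_name
```

Under this assumption, two stores created with the default name would not share a table, while an explicit name lets callers reuse one.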
Attributes
embeddings
Access the query embedding object if available.
Methods
__init__
([table_name, embedding, ...])Initialize with AwaDB client.
aadd_documents
(documents, **kwargs)Async run more documents through the embeddings and add to the vectorstore.
aadd_texts
(texts[, metadatas])Async run more texts through the embeddings and add to the vectorstore.
add_documents
(documents, **kwargs)Add or update documents in the vectorstore.
add_texts
(texts[, metadatas, is_duplicate_texts])Run more texts through the embeddings and add to the vectorstore.
adelete
([ids])Async delete by vector ID or other criteria.
afrom_documents
(documents, embedding, **kwargs)Async return VectorStore initialized from documents and embeddings.
afrom_texts
(texts, embedding[, metadatas])Async return VectorStore initialized from texts and embeddings.
aget_by_ids
(ids, /)Async get documents by their IDs.
amax_marginal_relevance_search
(query[, k, ...])Async return docs selected using the maximal marginal relevance.
amax_marginal_relevance_search_by_vector
(embedding[, k, ...])Async return docs selected using the maximal marginal relevance.
as_retriever
(**kwargs)Return VectorStoreRetriever initialized from this VectorStore.
asearch
(query, search_type, **kwargs)Async return docs most similar to query using a specified search type.
asimilarity_search
(query[, k])Async return docs most similar to query.
asimilarity_search_by_vector
(embedding[, k])Async return docs most similar to embedding vector.
asimilarity_search_with_relevance_scores
(query[, k])Async return docs and relevance scores in the range [0, 1].
asimilarity_search_with_score
(*args, **kwargs)Async run similarity search with distance.
astreaming_upsert
(items, /, batch_size, **kwargs)
aupsert
(items, /, **kwargs)
create_table
(table_name, **kwargs)Create a new table.
delete
([ids])Delete the documents which have the specified ids.
from_documents
(documents[, embedding, ...])Create an AwaDB vectorstore from a list of documents.
from_texts
(texts[, embedding, metadatas, ...])Create an AwaDB vectorstore from raw texts.
get
([ids, text_in_page_content, ...])Return docs according to ids.
get_by_ids
(ids, /)Get documents by their IDs.
get_current_table
(**kwargs)Get the current table.
list_tables
(**kwargs)List all the tables created by the client.
load_local
(table_name, **kwargs)Load the local specified table.
max_marginal_relevance_search
(query[, k, ...])Return docs selected using the maximal marginal relevance.
max_marginal_relevance_search_by_vector
(embedding[, k, ...])Return docs selected using the maximal marginal relevance.
search
(query, search_type, **kwargs)Return docs most similar to query using a specified search type.
similarity_search
(query[, k, ...])Return docs most similar to query.
similarity_search_by_vector
([embedding, k, ...])Return docs most similar to embedding vector.
similarity_search_with_relevance_scores
(query[, k])Return docs and relevance scores in the range [0, 1].
similarity_search_with_score
(query[, k, ...])Return the k most similar documents and their scores for the specified query.
streaming_upsert
(items, /, batch_size, **kwargs)
update
(ids, texts[, metadatas])Update the documents which have the specified ids.
upsert
(items, /, **kwargs)
use
(table_name, **kwargs)Use the specified table.
- __init__(table_name: str = 'langchain_awadb', embedding: Optional[Embeddings] = None, log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) None [source]¶
- Initialize with AwaDB client.
If table_name is not specified, a random table name consisting of _DEFAULT_TABLE_NAME plus the last segment of a UUID is created automatically.
- Parameters
table_name (str) – Name of the table to create. Defaults to _DEFAULT_TABLE_NAME.
embedding (Optional[Embeddings]) – Optional Embeddings object to set initially.
log_and_data_dir (Optional[str]) – Optional root directory for logs and data.
client (Optional[awadb.Client]) – Optional AwaDB client.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
None.
- Return type
None
- async aadd_documents(documents: List[Document], **kwargs: Any) List[str] ¶
Async run more documents through the embeddings and add to the vectorstore.
- Parameters
documents (List[Document]) – Documents to add to the vectorstore.
kwargs (Any) – Additional keyword arguments.
- Returns
List of IDs of the added texts.
- Raises
ValueError – If the number of IDs does not match the number of documents.
- Return type
List[str]
- async aadd_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] ¶
Async run more texts through the embeddings and add to the vectorstore.
- Parameters
texts (Iterable[str]) – Iterable of strings to add to the vectorstore.
metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts. Default is None.
**kwargs (Any) – vectorstore specific parameters.
- Returns
List of ids from adding the texts into the vectorstore.
- Raises
ValueError – If the number of metadatas does not match the number of texts.
ValueError – If the number of ids does not match the number of texts.
- Return type
List[str]
- add_documents(documents: List[Document], **kwargs: Any) List[str] ¶
Add or update documents in the vectorstore.
- Parameters
documents (List[Document]) – Documents to add to the vectorstore.
kwargs (Any) – Additional keyword arguments. If kwargs contains ids and the documents also contain ids, the ids in kwargs take precedence.
- Returns
List of IDs of the added texts.
- Raises
ValueError – If the number of ids does not match the number of documents.
- Return type
List[str]
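The precedence rule and the length check described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `resolve_ids` is a hypothetical helper, and falling back to `None` when no document carries an id is an assumption.

```python
from typing import List, Optional, Sequence


def resolve_ids(
    document_ids: Sequence[Optional[str]],
    kwargs_ids: Optional[Sequence[str]] = None,
) -> Optional[List[Optional[str]]]:
    """Hypothetical helper: decide which ids an add_documents call would use."""
    if kwargs_ids is not None:
        # Ids passed as a keyword argument win over ids set on the documents.
        if len(kwargs_ids) != len(document_ids):
            raise ValueError("Number of ids does not match number of documents.")
        return list(kwargs_ids)
    # Otherwise use the ids carried by the documents, if any were set.
    if any(doc_id is not None for doc_id in document_ids):
        return list(document_ids)
    # No ids anywhere: leave id generation to the store (assumption).
    return None
```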
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, is_duplicate_texts: Optional[bool] = None, **kwargs: Any) List[str] [source]¶
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts (Iterable[str]) – Iterable of strings to add to the vectorstore.
metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts.
is_duplicate_texts (Optional[bool]) – Optional; whether to duplicate texts. Defaults to True.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
List of ids from adding the texts into the vectorstore.
- Return type
List[str]
- async adelete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool] ¶
Async delete by vector ID or other criteria.
- Parameters
ids (Optional[List[str]]) – List of ids to delete. If None, delete all. Default is None.
**kwargs (Any) – Other keyword arguments that subclasses might use.
- Returns
True if deletion is successful, False otherwise, None if not implemented.
- Return type
Optional[bool]
- async classmethod afrom_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST ¶
Async return VectorStore initialized from documents and embeddings.
- Parameters
documents (List[Document]) – List of Documents to add to the vectorstore.
embedding (Embeddings) – Embedding function to use.
kwargs (Any) – Additional keyword arguments.
- Returns
VectorStore initialized from documents and embeddings.
- Return type
VST
- async classmethod afrom_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) VST ¶
Async return VectorStore initialized from texts and embeddings.
- Parameters
texts (List[str]) – Texts to add to the vectorstore.
embedding (Embeddings) – Embedding function to use.
metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts. Default is None.
kwargs (Any) – Additional keyword arguments.
- Returns
VectorStore initialized from texts and embeddings.
- Return type
VST
- async aget_by_ids(ids: Sequence[str], /) List[Document] ¶
Async get documents by their IDs.
The returned documents are expected to have the ID field set to the ID of the document in the vector store.
Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.
Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.
This method should NOT raise exceptions if no documents are found for some IDs.
- Parameters
ids (Sequence[str]) – List of ids to retrieve.
- Returns
List of Documents.
- Return type
List[Document]
New in version 0.2.11.
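The contract above (missing IDs are skipped without an exception, duplicates collapse, order is not guaranteed) can be illustrated with a small in-memory stand-in. `Doc` and `InMemoryStore` are hypothetical names used only for this sketch; they are not AwaDB or LangChain classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a Document that carries an id field."""

    id: str
    page_content: str


class InMemoryStore:
    """Hypothetical store backed by a plain dict keyed by document id."""

    def __init__(self, docs: Sequence[Doc]) -> None:
        self._docs: Dict[str, Doc] = {d.id: d for d in docs}

    async def aget_by_ids(self, ids: Sequence[str], /) -> List[Doc]:
        # Unknown ids are skipped silently; a duplicated id yields one document.
        found = {i: self._docs[i] for i in ids if i in self._docs}
        return list(found.values())


store = InMemoryStore([Doc("a", "alpha"), Doc("b", "beta")])
docs = asyncio.run(store.aget_by_ids(["b", "missing", "a", "b"]))
# Key the results by id instead of relying on the order of the returned list.
by_id = {d.id: d for d in docs}
```

Callers that need a result per requested ID can look each one up in `by_id` and handle the absent ones explicitly.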
- async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document] ¶
Async return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query (str) – Text to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.
lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
kwargs (Any) –
- Returns
List of Documents selected by maximal marginal relevance.
- Return type
List[Document]
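To make the role of `lambda_mult` concrete, here is a minimal pure-Python sketch of greedy maximal marginal relevance selection. It assumes cosine similarity and the common scoring rule `lambda_mult * relevance - (1 - lambda_mult) * redundancy`; the function names are hypothetical and this is not necessarily how AwaDB or LangChain computes it. In the API above, the candidates would be the `fetch_k` nearest documents and `k` of them are kept.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen greedily by MMR score."""
    query_sims = [cosine(query_vec, vec) for vec in candidate_vecs]
    selected: List[int] = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * query_sims[i] - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking; lower values penalise near-duplicates of documents already chosen.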
- async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document] ¶
Async return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding (List[float]) – Embedding to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.
lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Documents selected by maximal marginal relevance.
- Return type
List[Document]
- as_retriever(**kwargs: Any) VectorStoreRetriever ¶
Return VectorStoreRetriever initialized from this VectorStore.
- Parameters
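**kwargs (Any) –
Keyword arguments to pass to the search function. Can include:
- search_type (Optional[str]): Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".
- search_kwargs (Optional[Dict]): Keyword arguments to pass to the search function. Can include things like:
k: Number of documents to return (default: 4)
score_threshold: Minimum relevance threshold for similarity_score_threshold
fetch_k: Number of documents to pass to the MMR algorithm (default: 20)
lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5)
filter: Filter by document metadata
- Returns
Retriever class for VectorStore.
- Return type
VectorStoreRetriever
Examples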
# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 6, 'lambda_mult': 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 5, 'fetch_k': 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8}
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title': 'GPT-4 Technical Report'}}
)
- async asearch(query: str, search_type: str, **kwargs: Any) List[Document] ¶
Async return docs most similar to query using a specified search type.
- Parameters
query (str) – Input text.
search_type (str) – Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Documents most similar to the query.
- Raises
ValueError – If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".
- Return type
List[Document]
- async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[Document] ¶
Async return docs most similar to query.
- Parameters
query (str) – Input text.
k (int) – Number of Documents to return. Defaults to 4.
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Documents most similar to the query.
- Return type
List[Document]
- async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document] ¶
Async return docs most similar to embedding vector.
- Parameters
embedding (List[float]) – Embedding to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Documents most similar to the query vector.
- Return type
List[Document]
- async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]] ¶
Async return docs and relevance scores in the range [0, 1].
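0 is dissimilar, 1 is most similar.
- Parameters
query (str) – Input text.
k (int) – Number of Documents to return. Defaults to 4.
**kwargs (Any) –
kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.
- Returns
List of Tuples of (doc, similarity_score).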
- Return type
List[Tuple[Document, float]]
- async asimilarity_search_with_score(*args: Any, **kwargs: Any) List[Tuple[Document, float]] ¶
Async run similarity search with distance.
- Parameters
*args (Any) – Arguments to pass to the search method.
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Tuples of (doc, similarity_score).
- Return type
List[Tuple[Document, float]]
- astreaming_upsert(items: AsyncIterable[Document], /, batch_size: int, **kwargs: Any) AsyncIterator[UpsertResponse] ¶
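Beta
Added in 0.2.11. The API is subject to change.
Upsert documents in a streaming fashion. Async version of streaming_upsert.
- Parameters
items (AsyncIterable[Document]) – Iterable of Documents to add to the vectorstore.
batch_size (int) – The size of each batch to upsert.
kwargs (Any) – Additional keyword arguments. kwargs should only include parameters that are common to all documents (e.g., timeout for indexing, retry policy, etc.). kwargs should not include ids, to avoid ambiguous semantics. Instead, the ID should be provided as part of the Document object.
- Yields
UpsertResponse – A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.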
- Return type
AsyncIterator[UpsertResponse]
New in version 0.2.11.
- async aupsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse ¶
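Beta
Added in 0.2.11. The API is subject to change.
Add or update documents in the vectorstore. Async version of upsert.
The upsert functionality should use the ID field of the Document object if it is provided. If the ID is not provided, the upsert method is free to generate an ID for the document.
When an ID is specified and the document already exists in the vectorstore, the upsert method should update the document with the new data. If the document does not exist, the upsert method should add the document to the vectorstore.
- Parameters
items (Sequence[Document]) – Sequence of Documents to add to the vectorstore.
kwargs (Any) – Additional keyword arguments.
- Returns
A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.
- Return type
UpsertResponse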
New in version 0.2.11.
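The add-or-update semantics described above can be sketched against a plain dictionary. This is a synchronous illustration only: `Doc`, `UpsertResult`, and `upsert_into` are hypothetical stand-ins rather than the real `Document`, `UpsertResponse`, or AwaDB calls, and generating missing IDs with `uuid4` is an assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a Document with an optional id."""

    page_content: str
    id: Optional[str] = None


@dataclass
class UpsertResult:
    """Hypothetical stand-in for the response: succeeded and failed ids."""

    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def upsert_into(store: Dict[str, Doc], items: Sequence[Doc]) -> UpsertResult:
    result = UpsertResult()
    for item in items:
        # Reuse the caller's id when present; otherwise generate one.
        doc_id = item.id if item.id is not None else str(uuid.uuid4())
        # Assigning by key adds a new entry or overwrites an existing one.
        store[doc_id] = Doc(page_content=item.page_content, id=doc_id)
        result.succeeded.append(doc_id)
    return result
```

Calling `upsert_into` twice with the same ID leaves one entry holding the newer content, which is the update half of the contract.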
- create_table(table_name: str, **kwargs: Any) bool [source]¶
Create a new table.
- Parameters
table_name (str) –
kwargs (Any) –
- Return type
bool
- delete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool] [source]¶
Delete the documents which have the specified ids.
- Parameters
ids (Optional[List[str]]) – IDs of the embedding vectors.
**kwargs (Any) – Other keyword arguments that subclasses might use.
- Returns
True if deletion is successful, False otherwise, None if not implemented.
- Return type
Optional[bool]
- classmethod from_documents(documents: List[Document], embedding: Optional[Embeddings] = None, table_name: str = 'langchain_awadb', log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) AwaDB [source]¶
Create an AwaDB vectorstore from a list of documents.
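If log_and_data_dir is specified, the table is persisted to that directory.
- Parameters
documents (List[Document]) – List of documents to add to the vectorstore.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
table_name (str) – Name of the table to create.
log_and_data_dir (Optional[str]) – Directory in which to persist the table.
client (Optional[awadb.Client]) – AwaDB client.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
AwaDB vectorstore.
- Return type
AwaDB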
- classmethod from_texts(texts: List[str], embedding: Optional[Embeddings] = None, metadatas: Optional[List[dict]] = None, table_name: str = 'langchain_awadb', log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) AwaDB [source]¶
Create an AwaDB vectorstore from raw texts.
- Parameters
texts (List[str]) – List of texts to add to the table.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.
table_name (str) – Name of the table to create.
log_and_data_dir (Optional[str]) – Directory for logging and persistence.
client (Optional[awadb.Client]) – AwaDB client.
kwargs (Any) –
- Returns
AwaDB vectorstore.
- Return type
AwaDB
- get(ids: Optional[List[str]] = None, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, not_include_fields: Optional[Set[str]] = None, limit: Optional[int] = None, **kwargs: Any) Dict[str, Document] [source]¶
Return docs according to ids.
- Parameters
ids (Optional[List[str]]) – IDs of the embedding vectors.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by any metadata of the document.
not_include_fields (Optional[Set[str]]) – Fields to leave out of each returned document.
limit (Optional[int]) – Number of documents to return. Optional; defaults to 5.
kwargs (Any) –
- Returns
Documents that satisfy the input conditions.
- Return type
Dict[str, Document]
- get_by_ids(ids: Sequence[str], /) List[Document] ¶
Get documents by their IDs.
The returned documents are expected to have the ID field set to the ID of the document in the vector store.
Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.
Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.
This method should NOT raise exceptions if no documents are found for some IDs.
- Parameters
ids (Sequence[str]) – List of ids to retrieve.
- Returns
List of Documents.
- Return type
List[Document]
New in version 0.2.11.
- get_current_table(**kwargs: Any) str [source]¶
Get the current table.
- Parameters
kwargs (Any) –
- Return type
str
- list_tables(**kwargs: Any) List[str] [source]¶
List all the tables created by the client.
- Parameters
kwargs (Any) –
- Return type
List[str]
- load_local(table_name: str, **kwargs: Any) bool [source]¶
Load the local specified table.
- Parameters
table_name (str) – Table name.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
Whether the specified local table was loaded successfully.
- Return type
bool
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document] [source]¶
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query (str) – Text to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.
kwargs (Any) –
- Returns
List of Documents selected by maximal marginal relevance.
- Return type
List[Document]
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document] [source]¶
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding (List[float]) – Embedding to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.
kwargs (Any) –
- Returns
List of Documents selected by maximal marginal relevance.
- Return type
List[Document]
- search(query: str, search_type: str, **kwargs: Any) List[Document] ¶
Return docs most similar to query using a specified search type.
- Parameters
query (str) – Input text
search_type (str) – Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".
**kwargs (Any) – Arguments to pass to the search method.
- Returns
List of Documents most similar to the query.
- Raises
ValueError – If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".
- Return type
List[Document]
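The dispatch on `search_type`, including the `ValueError` for unknown values, can be sketched as below. `SketchStore` is a hypothetical class whose search methods only label the query so that the routing is visible; it does not reproduce the real method bodies.

```python
from typing import Any, List, Tuple


class SketchStore:
    """Hypothetical store; each search method just tags the query string."""

    def similarity_search(self, query: str, **kwargs: Any) -> List[str]:
        return [f"similarity:{query}"]

    def max_marginal_relevance_search(self, query: str, **kwargs: Any) -> List[str]:
        return [f"mmr:{query}"]

    def similarity_search_with_relevance_scores(
        self, query: str, **kwargs: Any
    ) -> List[Tuple[str, float]]:
        return [(f"threshold:{query}", 1.0)]

    def search(self, query: str, search_type: str, **kwargs: Any) -> List[str]:
        if search_type == "similarity":
            return self.similarity_search(query, **kwargs)
        if search_type == "similarity_score_threshold":
            scored = self.similarity_search_with_relevance_scores(query, **kwargs)
            # Drop the scores so every branch returns a list of documents.
            return [doc for doc, _ in scored]
        if search_type == "mmr":
            return self.max_marginal_relevance_search(query, **kwargs)
        raise ValueError(
            f"search_type of {search_type!r} not allowed. Expected one of "
            "'similarity', 'similarity_score_threshold' or 'mmr'."
        )
```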
- similarity_search(query: str, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document] [source]¶
Return docs most similar to query.
- Parameters
query (str) – Text query.
k (int) – The maximum number of documents to return.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by metadata. Optional; defaults to None. E.g. `{"color": "red", "price": 4.20}`. E.g. `{"max_price": 15.66, "min_price": 4.20}`, where price is the metadata field, means a range filter. E.g. `{"maxe_price": 15.66, "mine_price": 4.20}`, where price is the metadata field, also means a range filter.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
Returns the k most similar documents to the specified text query.
- Return type
List[Document]
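One way to read the `meta_filter` examples above is sketched here as a client-side predicate. Everything in it is an assumption made for illustration: that plain keys mean equality, that `max_`/`min_` prefixes are exclusive bounds, and that `maxe_`/`mine_` prefixes are inclusive bounds. The actual filtering happens inside AwaDB and may differ; `matches_meta_filter` is a hypothetical name.

```python
import operator
from typing import Any, Dict

# Assumed meaning of each key prefix: comparison applied as compare(value, bound).
_RANGE_PREFIXES = {
    "max_": operator.lt,   # value < bound (assumed exclusive)
    "min_": operator.gt,   # value > bound (assumed exclusive)
    "maxe_": operator.le,  # value <= bound (assumed inclusive)
    "mine_": operator.ge,  # value >= bound (assumed inclusive)
}


def matches_meta_filter(metadata: Dict[str, Any], meta_filter: Dict[str, Any]) -> bool:
    """Hypothetical predicate: does this metadata satisfy every filter entry?"""
    for key, expected in meta_filter.items():
        for prefix, compare in _RANGE_PREFIXES.items():
            if key.startswith(prefix):
                # Strip the prefix to find the metadata field being bounded.
                value = metadata.get(key[len(prefix):])
                if value is None or not compare(value, expected):
                    return False
                break
        else:
            # No range prefix matched: treat the entry as an equality test.
            if metadata.get(key) != expected:
                return False
    return True
```

A side effect of this reading is that a metadata field whose own name starts with one of the prefixes could not be matched by equality.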
- similarity_search_by_vector(embedding: Optional[List[float]] = None, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, not_include_fields_in_metadata: Optional[Set[str]] = None, **kwargs: Any) List[Document] [source]¶
Return docs most similar to embedding vector.
- Parameters
embedding (Optional[List[float]]) – Embedding to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.
not_include_fields_in_metadata (Optional[Set[str]]) – Metadata fields to leave out of each returned document.
kwargs (Any) –
- Returns
List of Documents which are the most similar to the query vector.
- Return type
List[Document]
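Conceptually, a search by vector ranks stored embeddings by similarity to the query embedding and keeps the top `k`. The sketch below does this in pure Python with cosine similarity; the metric, the dict-based storage, and the name `top_k_by_vector` are assumptions for illustration, since AwaDB performs the search internally.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_vector(
    embedding: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 4,
) -> List[str]:
    """Return ids of the k stored vectors most similar to the query embedding."""
    scored = [(cosine(embedding, vec), doc_id) for doc_id, vec in stored.items()]
    # nlargest returns at most k items, highest score first.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

If the store holds fewer than `k` vectors, the result is simply shorter than `k`.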
- similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]] ¶
Return docs and relevance scores in the range [0, 1].
0 is dissimilar, 1 is most similar.
- Parameters
query (str) – Input text.
k (int) – Number of Documents to return. Defaults to 4.
**kwargs (Any) –
kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.
- Returns
List of Tuples of (doc, similarity_score).
- Return type
List[Tuple[Document, float]]
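The effect of `score_threshold` on the returned `(doc, score)` tuples can be sketched as a simple filter. The helper name is hypothetical, and treating the threshold as inclusive is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def filter_by_score_threshold(
    docs_and_scores: Sequence[Tuple[str, float]],
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep results whose relevance meets the threshold."""
    if score_threshold is None:
        # No threshold given: return every result unchanged.
        return list(docs_and_scores)
    return [
        (doc, score)
        for doc, score in docs_and_scores
        if score >= score_threshold  # inclusive comparison (assumption)
    ]
```

Because the filter runs after retrieval, a high threshold can return fewer than `k` results, or none.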
- similarity_search_with_score(query: str, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Tuple[Document, float]] [source]¶
Return the k most similar documents and their scores for the specified query.
- Parameters
query (str) – Text query.
k (int) – Number of most similar documents to return.
text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.
meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.
kwargs (Any) – Reserved for parameters that may be added in the future.
- Returns
The k most similar documents to the specified text query. 0 is dissimilar, 1 is the most similar.
- Return type
List[Tuple[Document, float]]
- streaming_upsert(items: Iterable[Document], /, batch_size: int, **kwargs: Any) Iterator[UpsertResponse] ¶
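Beta
Added in 0.2.11. The API is subject to change.
Upsert documents in a streaming fashion.
- Parameters
items (Iterable[Document]) – Iterable of Documents to add to the vectorstore.
batch_size (int) – The size of each batch to upsert.
kwargs (Any) – Additional keyword arguments. kwargs should only include parameters that are common to all documents (e.g., timeout for indexing, retry policy, etc.). kwargs should not include ids, to avoid ambiguous semantics. Instead, the ID should be provided as part of the Document object.
- Yields
UpsertResponse – A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.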
- Return type
Iterator[UpsertResponse]
New in version 0.2.11.
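The streaming behaviour amounts to cutting the input iterable into batches of `batch_size` and yielding one response per batch. The sketch below shows that shape with the standard library; `batched`, `streaming_upsert_sketch`, and the `upsert_batch` callback are hypothetical names, and the dict-shaped response is an assumption.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def streaming_upsert_sketch(
    items: Iterable[T],
    batch_size: int,
    upsert_batch: Callable[[List[T]], Dict[str, list]],
) -> Iterator[Dict[str, list]]:
    """Lazily upsert each batch and yield its response as soon as it is done."""
    for batch in batched(items, batch_size):
        yield upsert_batch(batch)
```

Because both functions are generators, nothing is consumed from `items` until the caller starts iterating over the responses.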
- update(ids: List[str], texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]¶
Update the documents which have the specified ids.
- Parameters
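ids (List[str]) – List of ids of the embedding vectors to update.
texts (Iterable[str]) – Texts of the updated documents.
metadatas (Optional[List[dict]]) – Metadatas of the updated documents.
kwargs (Any) –
- Returns
The ids of the updated documents.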
- Return type
List[str]
- upsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse ¶
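Beta
Added in 0.2.11. The API is subject to change.
Add or update documents in the vectorstore.
The upsert functionality should use the ID field of the Document object if it is provided. If the ID is not provided, the upsert method is free to generate an ID for the document.
When an ID is specified and the document already exists in the vectorstore, the upsert method should update the document with the new data. If the document does not exist, the upsert method should add the document to the vectorstore.
- Parameters
items (Sequence[Document]) – Sequence of Documents to add to the vectorstore.
kwargs (Any) – Additional keyword arguments.
- Returns
A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.
- Return type
UpsertResponse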
New in version 0.2.11.