langchain_community.vectorstores.awadb.AwaDB

class langchain_community.vectorstores.awadb.AwaDB(table_name: str = 'langchain_awadb', embedding: Optional[Embeddings] = None, log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any)[source]

AwaDB vector store.

Initialize with AwaDB client.

If table_name is not specified, a random table name consisting of _DEFAULT_TABLE_NAME plus the last segment of a UUID is created automatically.

Parameters
  • table_name (str) – Name of the table to create. Defaults to _DEFAULT_TABLE_NAME.

  • embedding (Optional[Embeddings]) – Optional Embeddings object to set initially.

  • log_and_data_dir (Optional[str]) – Optional root directory for logs and data.

  • client (Optional[awadb.Client]) – Optional AwaDB client.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

None.

Attributes

embeddings

Access the query embedding object if available.

Methods

__init__([table_name, embedding, ...])

Initialize with AwaDB client.

aadd_documents(documents, **kwargs)

Async run more documents through the embeddings and add to the vectorstore.

aadd_texts(texts[, metadatas])

Async run more texts through the embeddings and add to the vectorstore.

add_documents(documents, **kwargs)

Add or update documents in the vectorstore.

add_texts(texts[, metadatas, is_duplicate_texts])

Run more texts through the embeddings and add to the vectorstore.

adelete([ids])

Async delete by vector ID or other criteria.

afrom_documents(documents, embedding, **kwargs)

Async return VectorStore initialized from documents and embeddings.

afrom_texts(texts, embedding[, metadatas])

Async return VectorStore initialized from texts and embeddings.

aget_by_ids(ids, /)

Async get documents by their IDs.

amax_marginal_relevance_search(query[, k, ...])

Async return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector(...)

Async return docs selected using the maximal marginal relevance.

as_retriever(**kwargs)

Return VectorStoreRetriever initialized from this VectorStore.

asearch(query, search_type, **kwargs)

Async return docs most similar to query using a specified search type.

asimilarity_search(query[, k])

Async return docs most similar to query.

asimilarity_search_by_vector(embedding[, k])

Async return docs most similar to embedding vector.

asimilarity_search_with_relevance_scores(query)

Async return docs and relevance scores in the range [0, 1].

asimilarity_search_with_score(*args, **kwargs)

Async run similarity search with distance.

astreaming_upsert(items, /, batch_size, **kwargs)

aupsert(items, /, **kwargs)

create_table(table_name, **kwargs)

Create a new table.

delete([ids])

Delete the documents which have the specified ids.

from_documents(documents[, embedding, ...])

Create an AwaDB vectorstore from a list of documents.

from_texts(texts[, embedding, metadatas, ...])

Create an AwaDB vectorstore from raw documents.

get([ids, text_in_page_content, ...])

Return docs according to ids.

get_by_ids(ids, /)

Get documents by their IDs.

get_current_table(**kwargs)

Get the current table.

list_tables(**kwargs)

List all the tables created by the client.

load_local(table_name, **kwargs)

Load the local specified table.

max_marginal_relevance_search(query[, k, ...])

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search_by_vector(...)

Return docs selected using the maximal marginal relevance.

search(query, search_type, **kwargs)

Return docs most similar to query using a specified search type.

similarity_search(query[, k, ...])

Return docs most similar to query.

similarity_search_by_vector([embedding, k, ...])

Return docs most similar to embedding vector.

similarity_search_with_relevance_scores(query)

Return docs and relevance scores in the range [0, 1].

similarity_search_with_score(query[, k, ...])

Return the k most similar documents and their scores for the specified query.

streaming_upsert(items, /, batch_size, **kwargs)

update(ids, texts[, metadatas])

Update the documents which have the specified ids.

upsert(items, /, **kwargs)

use(table_name, **kwargs)

Use the specified table.

__init__(table_name: str = 'langchain_awadb', embedding: Optional[Embeddings] = None, log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) None[source]
Initialize with AwaDB client.

If table_name is not specified, a random table name consisting of _DEFAULT_TABLE_NAME plus the last segment of a UUID is created automatically.

Parameters
  • table_name (str) – Name of the table to create. Defaults to _DEFAULT_TABLE_NAME.

  • embedding (Optional[Embeddings]) – Optional Embeddings object to set initially.

  • log_and_data_dir (Optional[str]) – Optional root directory for logs and data.

  • client (Optional[awadb.Client]) – Optional AwaDB client.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

None.

Return type

None
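
The automatic naming rule described for __init__ can be illustrated with a short sketch. This is not the library's code: the constant mirrors the default value shown in the signature, while the helper name `random_table_name` and the underscore separator are assumptions made for illustration.

```python
import uuid

# Mirrors the default value shown in the __init__ signature.
_DEFAULT_TABLE_NAME = "langchain_awadb"


def random_table_name(base: str = _DEFAULT_TABLE_NAME) -> str:
    """Append the last segment of a random UUID to the base name.

    Hypothetical helper; the "_" separator is an assumption.
    """
    # A UUID4 string has five hyphen-separated segments;
    # the last one is 12 hexadecimal digits long.
    last_segment = str(uuid.uuid4()).split("-")[-1]
    return f"{base}_{last_segment}"


name = random_table_name()
print(name)  # e.g. langchain_awadb_<12 hex digits>
```

Passing an explicit table_name avoids the random suffix, which is usually preferable when the table must be found again later with use or load_local.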

async aadd_documents(documents: List[Document], **kwargs: Any) List[str]

Async run more documents through the embeddings and add to the vectorstore.

Parameters
  • documents (List[Document]) – Documents to add to the vectorstore.

  • kwargs (Any) – Additional keyword arguments.

Returns

List of IDs of the added texts.

Raises

ValueError – If the number of IDs does not match the number of documents.

Return type

List[str]

async aadd_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str]

Async run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts (Iterable[str]) – Iterable of strings to add to the vectorstore.

  • metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts. Default is None.

  • **kwargs (Any) – vectorstore specific parameters.

Returns

List of ids from adding the texts into the vectorstore.

Raises
  • ValueError – If the number of metadatas does not match the number of texts.

  • ValueError – If the number of ids does not match the number of texts.

Return type

List[str]

add_documents(documents: List[Document], **kwargs: Any) List[str]

Add or update documents in the vectorstore.

Parameters
  • documents (List[Document]) – Documents to add to the vectorstore.

  • kwargs (Any) – Additional keyword arguments. If kwargs contains ids and the documents also contain ids, the ids in kwargs take precedence.

Returns

List of IDs of the added texts.

Raises

ValueError – If the number of ids does not match the number of documents.

Return type

List[str]
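
The precedence rule for ids and the ValueError condition can be sketched in plain Python. The `Doc` class and the `resolve_ids` helper are hypothetical stand-ins for illustration; they are not part of LangChain or AwaDB.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a Document (hypothetical)."""
    page_content: str
    id: Optional[str] = None


def resolve_ids(documents: List[Doc], **kwargs) -> List[Optional[str]]:
    """Pick the IDs to store: kwargs["ids"] wins over per-document ids."""
    if "ids" in kwargs:
        ids = list(kwargs["ids"])
        if len(ids) != len(documents):
            raise ValueError("number of ids does not match number of documents")
        return ids
    # No explicit ids given: fall back to whatever each document carries.
    return [doc.id for doc in documents]


docs = [Doc("alpha", id="doc-1"), Doc("beta", id="doc-2")]
from_docs = resolve_ids(docs)                      # ids taken from the documents
from_kwargs = resolve_ids(docs, ids=["a", "b"])    # ids in kwargs take precedence
```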

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, is_duplicate_texts: Optional[bool] = None, **kwargs: Any) List[str][source]

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts (Iterable[str]) – Iterable of strings to add to the vectorstore.

  • metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts.

  • is_duplicate_texts (Optional[bool]) – Optional flag controlling whether to duplicate texts. Defaults to True.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

List of ids from adding the texts into the vectorstore.

Return type

List[str]

async adelete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool]

Async delete by vector ID or other criteria.

Parameters
  • ids (Optional[List[str]]) – List of ids to delete. If None, delete all. Default is None.

  • **kwargs (Any) – Other keyword arguments that subclasses might use.

Returns

True if deletion is successful, False otherwise, None if not implemented.

Return type

Optional[bool]

async classmethod afrom_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST

Async return VectorStore initialized from documents and embeddings.

Parameters
  • documents (List[Document]) – List of Documents to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • kwargs (Any) – Additional keyword arguments.

Returns

VectorStore initialized from documents and embeddings.

Return type

VectorStore

async classmethod afrom_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) VST

Async return VectorStore initialized from texts and embeddings.

Parameters
  • texts (List[str]) – Texts to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • metadatas (Optional[List[dict]]) – Optional list of metadatas associated with the texts. Default is None.

  • kwargs (Any) – Additional keyword arguments.

Returns

VectorStore initialized from texts and embeddings.

Return type

VectorStore

async aget_by_ids(ids: Sequence[str], /) List[Document]

Async get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

Parameters

ids (Sequence[str]) – List of ids to retrieve.

Returns

List of Documents.

Return type

List[Document]

New in version 0.2.11.
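
The contract above (missing IDs are skipped silently, duplicates collapse, order is not guaranteed) suggests keying results by ID on the caller's side. A minimal sketch, using a hypothetical in-memory store and `Doc` class rather than a real vector store:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Doc:
    """Minimal stand-in for a Document (hypothetical)."""
    id: str
    page_content: str


# Hypothetical in-memory store standing in for a vector store.
_STORE: Dict[str, Doc] = {
    "a": Doc("a", "first"),
    "b": Doc("b", "second"),
}


def get_by_ids(ids: Sequence[str]) -> List[Doc]:
    """Return the documents that exist; unknown and duplicated IDs are skipped."""
    seen = set()
    found = []
    for doc_id in ids:
        if doc_id in _STORE and doc_id not in seen:
            seen.add(doc_id)
            found.append(_STORE[doc_id])
    return found


requested = ["b", "missing", "a", "b"]
docs = get_by_ids(requested)

# Key results by ID instead of relying on positional order.
by_id = {doc.id: doc for doc in docs}
missing = [doc_id for doc_id in requested if doc_id not in by_id]
```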

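async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Async return docs selected using the maximal marginal relevance.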

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query (str) – Text to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • kwargs (Any) –

Returns

List of Documents selected by maximal marginal relevance.

Return type

List[Document]
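
To make the roles of k and lambda_mult concrete, here is a minimal pure-Python sketch of greedy MMR selection over cosine similarity. It is an illustration of the general technique, not the code LangChain or AwaDB runs; the function names `cosine` and `mmr_select` are assumptions, and `candidates` plays the role of the fetch_k vectors retrieved before re-ranking.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=4, lambda_mult=0.5) -> List[int]:
    """Greedy MMR: return the indices of up to k candidates."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

# lambda_mult=1.0 ranks purely by relevance (minimum diversity).
only_relevance = mmr_select(query, candidates, k=2, lambda_mult=1.0)
# A lower lambda_mult penalizes near-duplicates of what was already picked.
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.25)
```

With lambda_mult=1.0 the two vectors nearest the query are chosen; with lambda_mult=0.25 the second pick skips the near-duplicate in favor of the orthogonal vector.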

async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of Documents selected by maximal marginal relevance.

Return type

List[Document]

as_retriever(**kwargs: Any) VectorStoreRetriever

Return VectorStoreRetriever initialized from this VectorStore.

Parameters

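**kwargs (Any) –

Keyword arguments to pass to the search function. Can include:

search_type (Optional[str]): Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".

search_kwargs (Optional[Dict]): Keyword arguments to pass to the search function. Can include things like:

    k: Number of documents to return (default: 4).

    score_threshold: Minimum relevance threshold for similarity_score_threshold.

    fetch_k: Number of documents to pass to the MMR algorithm (default: 20).

    lambda_mult: Diversity of the results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5).

    filter: Filter by document metadata.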

Returns

Retriever class for VectorStore.

Return type

VectorStoreRetriever

Examples

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 6, 'lambda_mult': 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 5, 'fetch_k': 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8}
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}}
)

async asearch(query: str, search_type: str, **kwargs: Any) List[Document]

Async return docs most similar to query using a specified search type.

Parameters
  • query (str) – Input text.

  • search_type (str) – Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of Documents most similar to the query.

Raises

ValueError – If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

Return type

List[Document]

async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[Document]

Async return docs most similar to query.

Parameters
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of Documents most similar to the query.

Return type

List[Document]

async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document]

Async return docs most similar to embedding vector.

Parameters
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of Documents most similar to the query vector.

Return type

List[Document]

async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]

Async return docs and relevance scores in the range [0, 1].

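0 is dissimilar, 1 is most similar.

Parameters
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) –

    kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of tuples of (doc, similarity_score).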

Return type

List[Tuple[Document, float]]

async asimilarity_search_with_score(*args: Any, **kwargs: Any) List[Tuple[Document, float]]

Async run similarity search with distance.

Parameters
  • *args (Any) – Arguments to pass to the search method.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of tuples of (doc, similarity_score).

Return type

List[Tuple[Document, float]]

astreaming_upsert(items: AsyncIterable[Document], /, batch_size: int, **kwargs: Any) AsyncIterator[UpsertResponse]

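Beta

Added in 0.2.11. The API is subject to change.

Upsert documents in a streaming fashion. Async version of streaming_upsert.

Parameters
  • items (AsyncIterable[Document]) – Iterable of Documents to add to the vectorstore.

  • batch_size (int) – The size of each batch to upsert.

  • kwargs (Any) – Additional keyword arguments. kwargs should only include parameters that are common to all documents (e.g., timeout for indexing, retry policy, etc.). kwargs should not include ids, to avoid ambiguous semantics. Instead, the ID should be provided as part of the Document object.

Yields

UpsertResponse – A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.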

Return type

AsyncIterator[UpsertResponse]

New in version 0.2.11.

async aupsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse

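Beta

Added in 0.2.11. The API is subject to change.

Add or update documents in the vectorstore. Async version of upsert.

The upsert functionality should utilize the ID field of the Document object if it is provided. If the ID is not provided, the upsert method is free to generate an ID for the document.

When an ID is specified and the document already exists in the vectorstore, the upsert method should update the document with the new data. If the document does not exist, the upsert method should add the document to the vectorstore.

Parameters
  • items (Sequence[Document]) – Sequence of Documents to add to the vectorstore.

  • kwargs (Any) – Additional keyword arguments.

Returns

A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.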

Return type

UpsertResponse

New in version 0.2.11.

create_table(table_name: str, **kwargs: Any) bool[source]

Create a new table.

Parameters
  • table_name (str) –

  • kwargs (Any) –

Return type

bool

delete(ids: Optional[List[str]] = None, **kwargs: Any) Optional[bool][source]

Delete the documents which have the specified ids.

Parameters
  • ids (Optional[List[str]]) – The ids of the embedding vectors.

  • **kwargs (Any) – Other keyword arguments that subclasses might use.

Returns

True if deletion is successful, False otherwise, None if not implemented.

Return type

Optional[bool]

classmethod from_documents(documents: List[Document], embedding: Optional[Embeddings] = None, table_name: str = 'langchain_awadb', log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) AwaDB[source]

Create an AwaDB vectorstore from a list of documents.

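If log_and_data_dir is specified, the table is persisted to that directory.

Parameters
  • documents (List[Document]) – List of documents to add to the vectorstore.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • table_name (str) – Name of the table to create.

  • log_and_data_dir (Optional[str]) – Directory in which to persist the table.

  • client (Optional[awadb.Client]) – AwaDB client.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

AwaDB vectorstore.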

Return type

AwaDB

classmethod from_texts(texts: List[str], embedding: Optional[Embeddings] = None, metadatas: Optional[List[dict]] = None, table_name: str = 'langchain_awadb', log_and_data_dir: Optional[str] = None, client: Optional[awadb.Client] = None, **kwargs: Any) AwaDB[source]

Create an AwaDB vectorstore from raw documents.

Parameters
  • texts (List[str]) – List of texts to add to the table.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.

  • table_name (str) – Name of the table to create.

  • log_and_data_dir (Optional[str]) – Directory for logs and persisted data.

  • client (Optional[awadb.Client]) – AwaDB client.

  • kwargs (Any) –

Returns

AwaDB vectorstore.

Return type

AwaDB

get(ids: Optional[List[str]] = None, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, not_include_fields: Optional[Set[str]] = None, limit: Optional[int] = None, **kwargs: Any) Dict[str, Document][source]

Return docs according to ids.

Parameters
  • ids (Optional[List[str]]) – The ids of the embedding vectors.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by any metadata of the document.

  • not_include_fields (Optional[Set[str]]) – Fields to exclude from each returned document.

  • limit (Optional[int]) – Optional number of documents to return. Defaults to 5.

  • kwargs (Any) –

Returns

Documents which satisfy the input conditions.

Return type

Dict[str, Document]

get_by_ids(ids: Sequence[str], /) List[Document]

Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should NOT raise exceptions if no documents are found for some IDs.

Parameters

ids (Sequence[str]) – List of ids to retrieve.

Returns

List of Documents.

Return type

List[Document]

New in version 0.2.11.

get_current_table(**kwargs: Any) str[source]

Get the current table.

Parameters

kwargs (Any) –

Return type

str

list_tables(**kwargs: Any) List[str][source]

List all the tables created by the client.

Parameters

kwargs (Any) –

Return type

List[str]

load_local(table_name: str, **kwargs: Any) bool[source]

Load the local specified table.

Parameters
  • table_name (str) – Name of the table.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

Whether the specified local table was loaded successfully.

Return type

bool

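max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document][source]

Return docs selected using the maximal marginal relevance.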

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query (str) – Text to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.

  • kwargs (Any) –

Returns

List of Documents selected by maximal marginal relevance.

Return type

List[Document]

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document][source]

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.

  • kwargs (Any) –

Returns

List of Documents selected by maximal marginal relevance.

Return type

List[Document]

search(query: str, search_type: str, **kwargs: Any) List[Document]

Return docs most similar to query using a specified search type.

Parameters
  • query (str) – Input text.

  • search_type (str) – Type of search to perform. Can be "similarity", "mmr", or "similarity_score_threshold".

  • **kwargs (Any) – Arguments to pass to the search method.

Returns

List of Documents most similar to the query.

Raises

ValueError – If search_type is not one of "similarity", "mmr", or "similarity_score_threshold".

Return type

List[Document]
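
The dispatch behavior described for search, including the ValueError for unknown search types, can be sketched as a lookup table. The handlers below are placeholders that return labels instead of documents; every name in the block is hypothetical and only illustrates the routing pattern, not the library's implementation.

```python
from typing import Callable, Dict, List


# Placeholder handlers: each returns a label instead of real documents.
def _similarity(query: str, **kwargs) -> List[str]:
    return [f"similarity:{query}"]


def _mmr(query: str, **kwargs) -> List[str]:
    return [f"mmr:{query}"]


def _threshold(query: str, **kwargs) -> List[str]:
    return [f"similarity_score_threshold:{query}"]


_HANDLERS: Dict[str, Callable[..., List[str]]] = {
    "similarity": _similarity,
    "mmr": _mmr,
    "similarity_score_threshold": _threshold,
}


def search(query: str, search_type: str, **kwargs) -> List[str]:
    """Route the query to the handler named by search_type."""
    if search_type not in _HANDLERS:
        raise ValueError(
            f"search_type of {search_type!r} not allowed. "
            f"Expected one of: {sorted(_HANDLERS)}"
        )
    # Extra keyword arguments are forwarded unchanged to the handler.
    return _HANDLERS[search_type](query, **kwargs)


result = search("hello", "mmr", k=2)
```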

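similarity_search(query: str, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Document][source]

Return docs most similar to query.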

Parameters
  • query (str) – Text query.

  • k (int) – The maximum number of documents to return.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by metadata. Optional; defaults to None. E.g. `{"color": "red", "price": 4.20}`. E.g. `{"max_price": 15.66, "min_price": 4.20}`, where price is the metadata field and the entry means a range filter. E.g. `{"maxe_price": 15.66, "mine_price": 4.20}`, where price is again the metadata field and the entry likewise means a range filter.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

Returns the k most similar documents to the specified text query.

Return type

List[Document]
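
The meta_filter examples above mix exact-match keys with prefixed range keys. The sketch below shows one plausible reading of that convention in plain Python. It rests on assumptions that this page does not confirm: that `max_`/`min_` denote exclusive bounds and `maxe_`/`mine_` inclusive ones, and that unprefixed keys mean equality. The `matches` helper is hypothetical and is not AwaDB's filtering code.

```python
from typing import Any, Dict


def matches(metadata: Dict[str, Any], meta_filter: Dict[str, Any]) -> bool:
    """Check one metadata dict against a filter (assumed semantics)."""
    for key, value in meta_filter.items():
        if key.startswith("maxe_"):          # assumed: field <= value
            field = key[len("maxe_"):]
            ok = field in metadata and metadata[field] <= value
        elif key.startswith("mine_"):        # assumed: field >= value
            field = key[len("mine_"):]
            ok = field in metadata and metadata[field] >= value
        elif key.startswith("max_"):         # assumed: field < value
            field = key[len("max_"):]
            ok = field in metadata and metadata[field] < value
        elif key.startswith("min_"):         # assumed: field > value
            field = key[len("min_"):]
            ok = field in metadata and metadata[field] > value
        else:                                # plain key: exact match
            ok = metadata.get(key) == value
        if not ok:
            return False
    return True


items = [
    {"color": "red", "price": 4.20},
    {"color": "blue", "price": 10.0},
    {"color": "red", "price": 15.66},
]

exact = [m for m in items if matches(m, {"color": "red", "price": 4.20})]
open_range = [m for m in items if matches(m, {"max_price": 15.66, "min_price": 4.20})]
closed_range = [m for m in items if matches(m, {"maxe_price": 15.66, "mine_price": 4.20})]
```

Under these assumed semantics the open range keeps only the middle item, while the closed range keeps all three. Check the AwaDB documentation before relying on the boundary behavior.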

similarity_search_by_vector(embedding: Optional[List[float]] = None, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, not_include_fields_in_metadata: Optional[Set[str]] = None, **kwargs: Any) List[Document][source]

Return docs most similar to embedding vector.

Parameters
  • embedding (Optional[List[float]]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.

  • not_include_fields_in_metadata (Optional[Set[str]]) – Metadata fields to exclude from each document.

  • kwargs (Any) –

Returns

List of Documents which are the most similar to the query vector.

Return type

List[Document]

similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) –

    kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of tuples of (doc, similarity_score).

Return type

List[Tuple[Document, float]]
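
The effect of score_threshold on a list of (doc, similarity_score) tuples can be sketched as a simple filter. The helper name `filter_by_threshold` is hypothetical, plain strings stand in for Document objects, and the inclusive `>=` comparison is an assumption.

```python
from typing import List, Optional, Tuple


def filter_by_threshold(
    scored: List[Tuple[str, float]],
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Keep only results whose relevance score reaches the threshold."""
    if score_threshold is None:
        # No threshold given: return every result unchanged.
        return list(scored)
    return [(doc, score) for doc, score in scored if score >= score_threshold]


# Relevance scores lie in [0, 1]: 0 is dissimilar, 1 is most similar.
scored = [("doc-a", 0.92), ("doc-b", 0.80), ("doc-c", 0.41)]

unfiltered = filter_by_threshold(scored)
relevant = filter_by_threshold(scored, score_threshold=0.8)
```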

similarity_search_with_score(query: str, k: int = 4, text_in_page_content: Optional[str] = None, meta_filter: Optional[dict] = None, **kwargs: Any) List[Tuple[Document, float]][source]

Return the k most similar documents and their scores for the specified query.

Parameters
  • query (str) – Text query.

  • k (int) – Number of most similar documents to return.

  • text_in_page_content (Optional[str]) – Filter by the text in page_content of Document.

  • meta_filter (Optional[dict]) – Filter by metadata. Defaults to None.

  • kwargs (Any) – Reserved for possible future parameters.

Returns

The k most similar documents to the specified text query, each paired with a score where 0 is dissimilar and 1 is most similar.

Return type

List[Tuple[Document, float]]

streaming_upsert(items: Iterable[Document], /, batch_size: int, **kwargs: Any) Iterator[UpsertResponse]

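Beta

Added in 0.2.11. The API is subject to change.

Upsert documents in a streaming fashion.

Parameters
  • items (Iterable[Document]) – Iterable of Documents to add to the vectorstore.

  • batch_size (int) – The size of each batch to upsert.

  • kwargs (Any) – Additional keyword arguments. kwargs should only include parameters that are common to all documents (e.g., timeout for indexing, retry policy, etc.). kwargs should not include ids, to avoid ambiguous semantics. Instead, the ID should be provided as part of the Document object.

Yields

UpsertResponse – A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.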

Return type

Iterator[UpsertResponse]

New in version 0.2.11.
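
The batch_size parameter implies that the input iterable is consumed in fixed-size chunks, one upsert per chunk. A minimal sketch of that chunking step, using only the standard library. The `batched` helper is a hypothetical name and strings stand in for Document objects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls lazily, so the full input is never held in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator is used as input to show that streaming sources work too.
batches = list(batched((f"doc-{i}" for i in range(5)), batch_size=2))
```

Each yielded batch would then be passed to a single upsert call, producing one response per batch.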

update(ids: List[str], texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]

Update the documents which have the specified ids.

Parameters
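  • ids (List[str]) – List of ids of the embedding vectors to update.

  • texts (Iterable[str]) – Texts of the updated documents.

  • metadatas (Optional[List[dict]]) – Metadatas of the updated documents.

  • kwargs (Any) –

Returns

The ids of the updated documents.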

Return type

List[str]

upsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse

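Beta

Added in 0.2.11. The API is subject to change.

Add or update documents in the vectorstore.

The upsert functionality should utilize the ID field of the Document object if it is provided. If the ID is not provided, the upsert method is free to generate an ID for the document.

When an ID is specified and the document already exists in the vectorstore, the upsert method should update the document with the new data. If the document does not exist, the upsert method should add the document to the vectorstore.

Parameters
  • items (Sequence[Document]) – Sequence of Documents to add to the vectorstore.

  • kwargs (Any) – Additional keyword arguments.

Returns

A response object that contains the list of IDs that were successfully added or updated in the vectorstore and the list of IDs that failed to be added or updated.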

Return type

UpsertResponse

New in version 0.2.11.
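
The upsert semantics described above (update when the ID exists, add when it does not, generate an ID when none is given) can be sketched against a plain dictionary. The `Doc` class, the dictionary store, and the `succeeded`/`failed` key names of the response are assumptions chosen to mirror the description; this is not the vector store's implementation.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Doc:
    """Minimal stand-in for a Document (hypothetical)."""
    page_content: str
    id: Optional[str] = None


def upsert(store: Dict[str, str], items: Sequence[Doc]) -> Dict[str, List[str]]:
    """Add or update each document and report the IDs that were written."""
    succeeded: List[str] = []
    for doc in items:
        # Use the document's own ID if present, otherwise generate one.
        doc_id = doc.id if doc.id is not None else str(uuid.uuid4())
        # Dict assignment covers both cases: overwrite existing or insert new.
        store[doc_id] = doc.page_content
        succeeded.append(doc_id)
    # Nothing can fail in this in-memory sketch, so "failed" stays empty.
    return {"succeeded": succeeded, "failed": []}


store = {"a": "old text"}
response = upsert(
    store,
    [Doc("new text", id="a"), Doc("fresh", id="b"), Doc("no id given")],
)
```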

use(table_name: str, **kwargs: Any) bool[source]

Use the specified table. If you do not know which tables exist, call list_tables.

Parameters
  • table_name (str) –

  • kwargs (Any) –

Return type

bool
