`langchain_community.document_loaders.doc_intelligence`.AzureAIDocumentIntelligenceLoader

class langchain_community.document_loaders.doc_intelligence.AzureAIDocumentIntelligenceLoader(api_endpoint: str, api_key: str, file_path: Optional[str] = None, url_path: Optional[str] = None, api_version: Optional[str] = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown', *, analysis_features: Optional[List[str]] = None)

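Load a PDF with Azure Document Intelligence.

Initialize the file-processing object with Azure Document Intelligence (formerly Form Recognizer).

This constructor initializes an AzureAIDocumentIntelligenceParser object that parses files through the Azure Document Intelligence API. The load method generates Document objects whose content representation is determined by the mode parameter.

Parameters:

api_endpoint: str: The API endpoint used to construct the DocumentIntelligenceClient.
api_key: str: The API key used to construct the DocumentIntelligenceClient.
file_path: Optional[str]: The path of the file to load. Either file_path or url_path must be specified.
url_path: Optional[str]: The URL of the file to load. Either file_path or url_path must be specified.
api_version: Optional[str]: The API version for the DocumentIntelligenceClient. Set to None to use the default value from the azure-ai-documentintelligence package.
api_model: str: The unique document model name. The default value is "prebuilt-layout". Note that overriding this default may result in unsupported behavior.
mode: str: The type of content representation of the generated Documents. Use "single", "page", or "markdown". The default value is "markdown".
analysis_features: Optional[List[str]]: A list of optional analysis features. Each feature should be passed as a str that conforms to the DocumentAnalysisFeature enum in the azure-ai-documentintelligence package. The default value is None.

Examples: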

>>> obj = AzureAIDocumentIntelligenceLoader(
...     file_path="path/to/file",
...     api_endpoint="https://endpoint.azure.com",
...     api_key="APIKEY",
...     api_version="2023-10-31-preview",
...     api_model="prebuilt-layout",
...     mode="markdown"
... )
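
The example above passes the key as a literal. A minimal sketch of assembling the same arguments from the environment follows. The helper `build_loader_kwargs` and the variable names `AZURE_DOCINTEL_ENDPOINT` and `AZURE_DOCINTEL_KEY` are hypothetical and not part of the library; the checks only mirror the documented requirements (a file_path or url_path, and one of the three listed modes).

```python
import os
from typing import Any, Dict, Mapping, Optional

VALID_MODES = {"single", "page", "markdown"}  # modes listed in the parameter docs


def build_loader_kwargs(
    env: Mapping[str, str] = os.environ,
    file_path: Optional[str] = None,
    url_path: Optional[str] = None,
    mode: str = "markdown",
) -> Dict[str, Any]:
    """Assemble keyword arguments for AzureAIDocumentIntelligenceLoader.

    Hypothetical helper: the environment variable names are assumptions.
    """
    if file_path is None and url_path is None:
        raise ValueError("file_path or url_path must be specified")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    kwargs: Dict[str, Any] = {
        "api_endpoint": env["AZURE_DOCINTEL_ENDPOINT"],
        "api_key": env["AZURE_DOCINTEL_KEY"],
        "api_model": "prebuilt-layout",
        "mode": mode,
    }
    # Pass only the source that was actually provided.
    if file_path is not None:
        kwargs["file_path"] = file_path
    if url_path is not None:
        kwargs["url_path"] = url_path
    return kwargs
```

With the package installed, the result can be unpacked into the constructor, for example `AzureAIDocumentIntelligenceLoader(**build_loader_kwargs(file_path="path/to/file"))`.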

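Methods

`__init__`(api_endpoint, api_key, file_path, ...)	Initialize the file-processing object with Azure Document Intelligence (formerly Form Recognizer).
`alazy_load`()	A lazy loader for Documents.
`aload`()	Load data into Document objects.
`lazy_load`()	Lazily load the data as pages.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split them into chunks.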

__init__(api_endpoint: str, api_key: str, file_path: Optional[str] = None, url_path: Optional[str] = None, api_version: Optional[str] = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown', *, analysis_features: Optional[List[str]] = None) → None

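Initialize the file-processing object with Azure Document Intelligence (formerly Form Recognizer).

This constructor initializes an AzureAIDocumentIntelligenceParser object that parses files through the Azure Document Intelligence API. The load method generates Document objects whose content representation is determined by the mode parameter.

Parameters:

api_endpoint: str: The API endpoint used to construct the DocumentIntelligenceClient.
api_key: str: The API key used to construct the DocumentIntelligenceClient.
file_path: Optional[str]: The path of the file to load. Either file_path or url_path must be specified.
url_path: Optional[str]: The URL of the file to load. Either file_path or url_path must be specified.
api_version: Optional[str]: The API version for the DocumentIntelligenceClient. Set to None to use the default value from the azure-ai-documentintelligence package.
api_model: str: The unique document model name. The default value is "prebuilt-layout". Note that overriding this default may result in unsupported behavior.
mode: str: The type of content representation of the generated Documents. Use "single", "page", or "markdown". The default value is "markdown".
analysis_features: Optional[List[str]]: A list of optional analysis features. Each feature should be passed as a str that conforms to the DocumentAnalysisFeature enum in the azure-ai-documentintelligence package. The default value is None.

Examples: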

>>> obj = AzureAIDocumentIntelligenceLoader(
...     file_path="path/to/file",
...     api_endpoint="https://endpoint.azure.com",
...     api_key="APIKEY",
...     api_version="2023-10-31-preview",
...     api_model="prebuilt-layout",
...     mode="markdown"
... )

Parameters

api_endpoint (str) –
api_key (str) –
file_path (Optional[str]) –
url_path (Optional[str]) –
api_version (Optional[str]) –
api_model (str) –
mode (str) –
analysis_features (Optional[List[str]]) –

Return type

None

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type: AsyncIterator[Document]
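
Because alazy_load returns an AsyncIterator, it is consumed with `async for`. The sketch below collects at most a fixed number of Documents and stops early. `collect_async` is a hypothetical helper, and `loader` is assumed to be any object exposing `alazy_load()` as documented here.

```python
import asyncio
from typing import Any, List


async def collect_async(loader: Any, limit: int) -> List[Any]:
    """Collect up to `limit` Documents from loader.alazy_load()."""
    docs: List[Any] = []
    if limit <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= limit:
            break  # stop early; later Documents are never requested
    return docs
```

From synchronous code, a call such as `asyncio.run(collect_async(loader, 3))` would drive the coroutine to completion.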

async aload() → List[Document]

Load data into Document objects.

Return type: List[Document]

lazy_load() → Iterator[Document]

Lazily load the data as pages.

Return type: Iterator[Document]
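
Since lazy_load returns an iterator, Documents can be filtered one at a time without first building a list. A minimal sketch, assuming each Document exposes its text as `page_content` (the standard LangChain Document attribute); `iter_nonempty` is a hypothetical helper, not part of the loader.

```python
from typing import Any, Iterator


def iter_nonempty(loader: Any) -> Iterator[Any]:
    """Yield Documents from loader.lazy_load() that contain visible text."""
    for doc in loader.lazy_load():
        # Skip Documents whose content is empty or whitespace only.
        if doc.page_content.strip():
            yield doc
```

The generator stays lazy itself, so a caller can stop iterating at any point.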

load() → List[Document]

Load data into Document objects.

Return type: List[Document]

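load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split them into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated!

Parameters: text_splitter (Optional[TextSplitter]) – The TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: A list of Documents.
Return type: List[Document]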
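
The sketch below shows both ways of calling load_and_split: with the default splitter, and with an explicit RecursiveCharacterTextSplitter. `load_chunks` is a hypothetical wrapper, the chunk size is an arbitrary illustration, and the explicit branch assumes the `langchain_text_splitters` package is installed.

```python
from typing import Any, List, Optional


def load_chunks(loader: Any, chunk_size: Optional[int] = None) -> List[Any]:
    """Load and split Documents, optionally with an explicit chunk size."""
    if chunk_size is None:
        # Documented default: a RecursiveCharacterTextSplitter is used.
        return loader.load_and_split()
    # Imported here so this module can be imported without the package.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=0
    )
    return loader.load_and_split(text_splitter=splitter)
```

A call such as `load_chunks(loader, chunk_size=1000)` would then return chunks of at most roughly 1000 characters each.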