langchain_community.document_loaders.arxiv.ArxivLoader
- class langchain_community.document_loaders.arxiv.ArxivLoader(query: str, doc_content_chars_max: Optional[int] = None, **kwargs: Any)
Load query results from Arxiv. The loader converts the original PDF format into text.
- Setup
Install the arxiv and PyMuPDF packages. PyMuPDF converts the PDF files downloaded from arxiv.org into text.
    pip install -U arxiv pymupdf
- Instantiate
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning",
        # load_max_docs=2,
        # load_all_available_meta=False
    )
- Load
    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)
- Lazy load
    docs = []
    docs_lazy = loader.lazy_load()

    # async variant:
    # docs_lazy = await loader.alazy_load()

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
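The benefit of lazy loading is that the consumer can stop early without producing every document. The sketch below illustrates this with a hypothetical stand-in generator, `fake_lazy_load`, which is not part of LangChain; it only mimics an iterator that yields one record at a time, the way `lazy_load()` is consumed in the example above.

```python
from itertools import islice
from typing import Dict, Iterator


def fake_lazy_load(total: int) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for loader.lazy_load(): yields one record at a time."""
    for i in range(total):
        # A real loader would fetch and parse one paper at this point.
        yield {"page_content": f"paper {i}", "title": f"Title {i}"}


# Take only the first two records; the remaining ones are never produced.
first_two = list(islice(fake_lazy_load(100), 2))
print([record["title"] for record in first_two])
```

The same `islice` pattern should apply to any iterator of documents, assuming it is consumed only once.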
- Async load
    docs = await loader.aload()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
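A bare `await` at top level, as in the async example, generally only works in environments that already run an event loop, such as a notebook. In a plain script the call can be wrapped in a coroutine and started with `asyncio.run`. The sketch below uses a hypothetical `StubLoader` in place of a real loader so that it runs without network access; only the method name `aload` is taken from this page.

```python
import asyncio
from typing import List


class StubLoader:
    """Hypothetical stand-in exposing the same async method name as the loader."""

    async def aload(self) -> List[str]:
        await asyncio.sleep(0)  # simulate handing control back to the event loop
        return ["doc-1", "doc-2"]


async def main() -> List[str]:
    loader = StubLoader()
    return await loader.aload()


# A plain script has no running event loop, so start one explicitly.
docs = asyncio.run(main())
print(len(docs))
```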
- Use summaries of articles as docs
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning"
    )

    docs = loader.get_summaries_as_docs()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

    Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning
    {
        'Entry ID': 'http://arxiv.org/abs/2402.03268v2',
        'Published': datetime.date(2024, 2, 29),
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang'
    }
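In the sample output, the summary metadata carries 'Published' as a `datetime.date` object, whereas the `load()` output shows it as a string. If the metadata needs to be serialized, for instance to JSON, the date has to be converted first. The following is a minimal sketch; `to_json_safe` is a hypothetical helper rather than a LangChain function, and the dictionary is copied from the sample output with the authors omitted.

```python
import datetime
import json
from typing import Any, Dict


def to_json_safe(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which date values are rendered as ISO 8601 strings."""
    return {
        key: value.isoformat() if isinstance(value, datetime.date) else value
        for key, value in metadata.items()
    }


# Metadata shaped like the sample output of get_summaries_as_docs().
summary_meta = {
    "Entry ID": "http://arxiv.org/abs/2402.03268v2",
    "Published": datetime.date(2024, 2, 29),
    "Title": "Understanding the Reasoning Ability of Language Models "
    "From the Perspective of Reasoning Paths Aggregation",
}

safe_meta = to_json_safe(summary_meta)
encoded = json.dumps(safe_meta)  # json.dumps would raise TypeError on the raw date
print(safe_meta["Published"])
```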
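Initialize with a search query to find documents in Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters
query (str) – free text used to find documents in Arxiv
doc_content_chars_max (Optional[int]) – cut limit for the length of a document's content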
kwargs (Any) –
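To make the effect of doc_content_chars_max concrete, the sketch below assumes the limit is applied as a plain character slice on the extracted text, with None meaning no cut. `truncate_content` is a hypothetical helper written for illustration and is not the loader's actual code.

```python
from typing import Optional


def truncate_content(text: str, doc_content_chars_max: Optional[int] = None) -> str:
    """Hypothetical helper: keep at most doc_content_chars_max characters."""
    # Slicing with None as the upper bound returns the whole string.
    return text[:doc_content_chars_max]


full_text = "Pre-trained language models (LMs) are able to perform complex reasoning"
short_text = truncate_content(full_text, 11)
print(short_text)
print(truncate_content(full_text) == full_text)
```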
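Methods
__init__(query[, doc_content_chars_max]) – Initialize with a search query to find documents in Arxiv.
alazy_load() – A lazy loader for Documents.
aload() – Load data into Document objects.
get_summaries_as_docs() – Use paper summaries as documents rather than the original Arxiv papers.
lazy_load() – Lazily load Arxiv documents.
load() – Load data into Document objects.
load_and_split([text_splitter]) – Load Documents and split them into chunks.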
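- __init__(query: str, doc_content_chars_max: Optional[int] = None, **kwargs: Any)
Initialize with a search query to find documents in Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters
query (str) – free text used to find documents in Arxiv
doc_content_chars_max (Optional[int]) – cut limit for the length of a document's content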
kwargs (Any) –
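- load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]
Load Documents and split them into chunks. The chunks are returned as Documents.
Do not override this method. It should be considered deprecated!
- Parameters
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns
List of Documents.
- Return type
List[Document]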
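Since load_and_split should be treated as deprecated, one alternative is to load first and split as a separate step. The sketch below shows the general idea of splitting into overlapping chunks with a deliberately simple fixed-size character window. It is not RecursiveCharacterTextSplitter, which splits on separators, and `split_text` with its parameters is a hypothetical helper.

```python
from typing import List


def split_text(text: str, chunk_size: int = 10, chunk_overlap: int = 3) -> List[str]:
    """Hypothetical fixed-size splitter with overlap between neighbouring chunks."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


chunks = split_text("abcdefghijklmnopqrstuvwxyz")
print(chunks)
```

Each chunk repeats the last chunk_overlap characters of the previous one, so text cut at a boundary still appears whole in at least one chunk whenever it is no longer than the overlap.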