langchain_text_splitters.base.Tokenizer
- class langchain_text_splitters.base.Tokenizer(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]])
Tokenizer data class.
Attributes
chunk_overlap
Overlap in tokens between chunks
tokens_per_chunk
Maximum number of tokens per chunk
decode
Function to decode a list of token ids to a string
encode
Function to encode a string into a list of token ids
Methods
__init__
(chunk_overlap, tokens_per_chunk, ...)
- __init__(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]]) -> None
- Parameters
chunk_overlap (int) –
tokens_per_chunk (int) –
decode (Callable[[List[int]], str]) –
encode (Callable[[str], List[int]]) –
- Return type
None
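To illustrate how the four fields work together, here is a minimal, self-contained sketch. It defines a local stand-in dataclass mirroring the documented `Tokenizer` signature (so the example runs without langchain installed) and a sliding-window splitter of the kind this dataclass parameterizes; the `split_on_tokens` helper and the character-level encode/decode are illustrative assumptions, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tokenizer:
    # Local stand-in mirroring the documented fields.
    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[List[int]], str]
    encode: Callable[[str], List[int]]


def split_on_tokens(text: str, tokenizer: Tokenizer) -> List[str]:
    """Sketch of the sliding-window chunking these fields drive."""
    input_ids = tokenizer.encode(text)
    chunks: List[str] = []
    start = 0
    while start < len(input_ids):
        end = min(start + tokenizer.tokens_per_chunk, len(input_ids))
        chunks.append(tokenizer.decode(input_ids[start:end]))
        if end == len(input_ids):
            break
        # Step back by chunk_overlap so consecutive chunks share tokens.
        start = end - tokenizer.chunk_overlap
    return chunks


# Toy character-level tokenizer: each character's code point is one "token".
tok = Tokenizer(
    chunk_overlap=2,
    tokens_per_chunk=5,
    decode=lambda ids: "".join(chr(i) for i in ids),
    encode=lambda s: [ord(c) for c in s],
)
print(split_on_tokens("abcdefghij", tok))  # → ['abcde', 'defgh', 'ghij']
```

With `tokens_per_chunk=5` and `chunk_overlap=2`, each chunk holds at most five tokens and shares its last two with the start of the next chunk, which is the typical reason to use a nonzero overlap: it preserves context across chunk boundaries.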