langchain.smith.evaluation.runner_utils.run_on_dataset¶
- langchain.smith.evaluation.runner_utils.run_on_dataset(client: Optional[Client], dataset_name: str, llm_or_chain_factory: Union[Callable[[], Union[Chain, Runnable]], BaseLanguageModel, Callable[[dict], Any], Runnable, Chain], *, evaluation: Optional[RunEvalConfig] = None, dataset_version: Optional[Union[datetime, str]] = None, concurrency_level: int = 5, project_name: Optional[str] = None, project_metadata: Optional[Dict[str, Any]] = None, verbose: bool = False, revision_id: Optional[str] = None, **kwargs: Any) → Dict[str, Any][source]¶
Run the Chain or language model on a dataset and store traces to the specified project name.
- Parameters
dataset_name (str) – Name of the dataset to run the chain on.
llm_or_chain_factory (Union[Callable[[], Union[Chain, Runnable]], BaseLanguageModel, Callable[[dict], Any], Runnable, Chain]) – Language model or Chain constructor to run over the dataset. The Chain constructor is used to permit independent calls on each example without carrying over state.
evaluation (Optional[RunEvalConfig]) – Configuration for evaluators to run on the results of the chain.
concurrency_level (int) – The number of async tasks to run concurrently.
project_name (Optional[str]) – Name of the project to store the traces in. Defaults to {dataset_name}-{chain class name}-{datetime}.
project_metadata (Optional[Dict[str, Any]]) – Optional metadata to store with the project. Useful for storing information about the test variant (prompt version, model version, etc.); see the sketch after the return type below.
client (Optional[Client]) – LangSmith client to use to access the dataset and to log feedback and run traces.
verbose (bool) – Whether to print progress.
tags – Tags to add to each run in the project.
revision_id (Optional[str]) – Optional revision identifier to assign this test run, for tracking the performance of different versions of your system.
dataset_version (Optional[Union[datetime, str]]) – Optional version (timestamp or version string) of the dataset to run the evaluation on.
kwargs (Any) –
- Returns
A dictionary containing the run's project name and the resulting model outputs.
- Return type
Dict[str, Any]
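For instance, the tracking-related parameters above can be combined in a single call. This is a minimal sketch: the project name, metadata values, and revision string are hypothetical, and construct_chain is a chain factory like the one defined in the Examples below.

from langsmith import Client
from langchain.smith import run_on_dataset

client = Client()
run_on_dataset(
    client,
    dataset_name="<my_dataset_name>",
    llm_or_chain_factory=construct_chain,  # chain factory as in the Examples below
    project_name="my-chain-eval-v2",  # hypothetical; defaults to {dataset_name}-{chain class name}-{datetime}
    project_metadata={"prompt_version": "v2", "model": "gpt-4o-mini"},  # hypothetical variant info
    revision_id="abc123",  # hypothetical revision of the system under test
    concurrency_level=5,
    verbose=True,
)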
For the (usually faster) async version of this function, see arun_on_dataset().

Examples
from langsmith import Client
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.smith import run_on_dataset
from langchain.smith import evaluation as smith_eval

# Chains may have memory. Passing in a constructor function lets the
# evaluation framework avoid cross-contamination between runs.
def construct_chain():
    llm = ChatOpenAI(temperature=0)
    chain = LLMChain.from_string(
        llm,
        "What's the answer to {your_input_key}"
    )
    return chain

# Load off-the-shelf evaluators via config or the EvaluatorType (string or enum)
evaluation_config = smith_eval.RunEvalConfig(
    evaluators=[
        "qa",  # "Correctness" against a reference answer
        "embedding_distance",
        smith_eval.RunEvalConfig.Criteria("helpfulness"),
        smith_eval.RunEvalConfig.Criteria({
            "fifth-grader-score": "Do you have to be smarter than a fifth grader to answer this question?"
        }),
    ]
)

client = Client()
run_on_dataset(
    client,
    dataset_name="<my_dataset_name>",
    llm_or_chain_factory=construct_chain,
    evaluation=evaluation_config,
)
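The returned dictionary can then be inspected to locate the tracing project that was created. A minimal sketch; the documented contents are the project name and the model outputs, and the exact key holding the per-example outputs ("results" here) is an assumption:

results = run_on_dataset(
    client,
    dataset_name="<my_dataset_name>",
    llm_or_chain_factory=construct_chain,
    evaluation=evaluation_config,
)
print(results["project_name"])  # name of the tracing project the runs were stored in
print(results["results"])  # per-example model outputs (assumed key)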
You can also create custom evaluators by subclassing the StringEvaluator or LangSmith's RunEvaluator classes.

from typing import Optional

from langchain.evaluation import StringEvaluator


class MyStringEvaluator(StringEvaluator):

    @property
    def requires_input(self) -> bool:
        return False

    @property
    def requires_reference(self) -> bool:
        return True

    @property
    def evaluation_name(self) -> str:
        return "exact_match"

    def _evaluate_strings(self, prediction: str, reference: Optional[str] = None, input: Optional[str] = None, **kwargs) -> dict:
        # Score 1 (True) only when the prediction exactly matches the reference answer
        return {"score": prediction == reference}


evaluation_config = smith_eval.RunEvalConfig(
    custom_evaluators=[MyStringEvaluator()],
)

run_on_dataset(
    client,
    dataset_name="<my_dataset_name>",
    llm_or_chain_factory=construct_chain,
    evaluation=evaluation_config,
)
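Since arun_on_dataset() accepts the same arguments, the same evaluation can be run asynchronously. A minimal sketch, reusing the client, construct_chain, and evaluation_config defined above:

import asyncio

from langchain.smith import arun_on_dataset


async def main():
    results = await arun_on_dataset(
        client,
        dataset_name="<my_dataset_name>",
        llm_or_chain_factory=construct_chain,
        evaluation=evaluation_config,
    )
    print(results["project_name"])


asyncio.run(main())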