
Configuration System

The project's configuration is managed by a system built on pydantic for data validation and hydra for loading YAML files. Together, these let agents and experiments be defined declaratively in YAML and validated at load time.

All configurations are stored as .yaml files inside the /configs directory.
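
To illustrate the underlying pattern, here is a minimal sketch of how a YAML file can be validated into a pydantic model. It is illustrative only: the ExampleConfig model and the file path are hypothetical, and it uses plain yaml where the project layers hydra on top for composition.

import yaml
from pydantic import BaseModel

# Hypothetical model for illustration -- not one of the project's real configs.
class ExampleConfig(BaseModel):
    name: str
    temperature: float = 0.0

# Parse the YAML into a plain dict, then validate it against the model.
with open("configs/example.yaml") as f:
    raw = yaml.safe_load(f)

config = ExampleConfig(**raw)  # raises a ValidationError if fields are missing or invalid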

ConfigLoader

The ConfigLoader is the main entry point for loading configurations from YAML files into pydantic models. It abstracts away the file paths and loading logic.

Usage:

from utu.config import ConfigLoader

# Load an agent configuration from /configs/agents/my_agent.yaml
agent_config = ConfigLoader.load_agent_config("my_agent")

# Load an evaluation configuration from /configs/eval/my_eval.yaml
eval_config = ConfigLoader.load_eval_config("my_eval")
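
Both methods return validated pydantic models (the AgentConfig and EvalConfig described below), so malformed or missing fields fail fast at load time rather than midway through a run.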


AgentConfig

AgentConfig is the central data structure for defining an agent. It specifies everything the agent needs to operate, including its model, tools, and personality.

Key Components:

  • type: The agent's architecture. Can be:
    • simple: A single agent that performs a task.
    • orchestra: A more complex, multi-agent system with a planner and workers.
  • model: (ModelConfigs) Defines the primary LLM the agent will use, including the API provider, model name, and settings like temperature.
  • agent: (ProfileConfig) Defines the agent's profile, such as its name and system-level instructions (e.g., "You are a helpful assistant.").
  • env: (EnvConfig) Specifies the environment the agent operates in (e.g., shell_local or browser_docker). See Agent Environments for more details.
  • toolkits: (dict[str, ToolkitConfig]) A dictionary defining the tools available to the agent. Each toolkit can be loaded in builtin mode (running in the main process) or mcp mode (running as a separate process).
  • max_turns: The maximum number of conversational turns the agent can take before stopping.

For the orchestra type, AgentConfig also includes fields for defining the planner, workers, and reporter agents.
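
As an illustration, the sketch below loads an agent configuration and inspects a few of the fields described above. Attribute names are assumed to mirror the component names in this list; check the actual model definitions if they differ in your version.

from utu.config import ConfigLoader

# Loads and validates /configs/agents/my_agent.yaml into an AgentConfig.
agent_config = ConfigLoader.load_agent_config("my_agent")

# Attribute names below are assumed to match the components listed above.
print(agent_config.type)            # "simple" or "orchestra"
print(agent_config.max_turns)       # turn budget before the agent stops
print(list(agent_config.toolkits))  # names of the toolkits available to the agent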


EvalConfig

EvalConfig defines a complete evaluation experiment. It specifies the dataset to use, the agent to test, and how to judge the results.

Key Components:

  • data: (DataConfig) Defines the dataset to be used for the evaluation, including its name/path and the relevant fields (question_field, gt_field).
  • rollout: This section defines the execution phase of the evaluation.
    • agent: (AgentConfig) A full AgentConfig for the agent being tested.
    • concurrency: The number of parallel processes to use when running the agent on the dataset.
  • judgement: This section defines the judgment phase.
    • judge_model: (ModelConfigs) The configuration for the LLM that will act as the judge.
    • judge_concurrency: The number of parallel processes to use for judging the results.
    • eval_method: The method used for evaluation (e.g., comparing against a ground truth answer).
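
Putting the pieces together, the sketch below loads an evaluation configuration and reads a few of its fields. The attribute nesting is assumed to mirror the rollout/judgement sections described above; verify against the actual EvalConfig model.

from utu.config import ConfigLoader

# Loads and validates /configs/eval/my_eval.yaml into an EvalConfig.
eval_config = ConfigLoader.load_eval_config("my_eval")

# Nesting below is an assumption based on the sections described above.
print(eval_config.data.question_field)          # dataset field that holds the question
print(eval_config.rollout.agent.type)           # the agent under test
print(eval_config.judgement.judge_concurrency)  # parallelism for the judging phase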