Quickstart
This guide will walk you through setting up the project, running your first agent, and executing evaluations.
1. Installation & Setup
First, clone the repository and set up the Python environment.
```bash
# Clone the project repository
git clone https://github.com/Tencent/uTu-agent.git
cd uTu-agent

# We use `uv` to manage the virtual environment and dependencies
# Create the virtual environment
uv venv

# Activate the environment
source .venv/bin/activate

# Install all dependencies, including development tools
uv sync --group dev

# Create your environment configuration file from the example
cp .env.example .env
```
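If `uv` is not already installed on your machine, one common (project-agnostic) way to get it is via pip:

```bash
# Install the uv tool itself if it is not yet available
pip install uv
```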
After creating the `.env` file, edit it to add the API keys you need (e.g., `OPENAI_API_KEY`, `SERPER_API_KEY`, etc.).
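As a rough sketch, the relevant lines in `.env` are standard KEY=value pairs; the values below are placeholders, and the exact set of keys depends on which models and tools you enable:

```bash
# .env — placeholder values, replace with your own credentials
OPENAI_API_KEY=sk-...
SERPER_API_KEY=your-serper-api-key
```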
2. Running an Agent
You can interact with agents directly from the command line using the `cli_chat.py` script.
Simple Agent
Run a simple agent defined by a configuration file. For example, to run an agent with search capabilities:
```bash
# python scripts/cli_chat.py --help
python scripts/cli_chat.py --config_name simple_agents/search_agent.yaml --stream
```
Orchestra Agent
Run a multi-agent (Plan-and-Execute) orchestra agent by specifying its configuration file:
```bash
# TODO: add a web UI for orchestra agent
```
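As a sketch, assuming the same `cli_chat.py` entry point accepts orchestra configurations (the config name below is illustrative, not a file guaranteed to ship with the repo), the invocation mirrors the simple-agent command:

```bash
# Illustrative config name — point this at an orchestra config in your configs directory
python scripts/cli_chat.py --config_name orchestra/plan_and_execute.yaml --stream
```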
3. Running Evaluations
The framework includes a powerful evaluation harness to benchmark agent performance.
Run a Full Experiment
This command runs a complete evaluation, from agent rollout to judging.
```bash
python scripts/run_eval.py --config_name <your_eval_config> --exp_id <your_exp_id> --dataset WebWalkerQA --concurrency 5
```
Re-judge Existing Results
If you have already run the rollout and only want to re-run the judgment phase, use this script:
```bash
python scripts/run_eval_judge.py --config_name <your_eval_config> --exp_id <your_exp_id> --dataset WebWalkerQA
```
Dump Experiment Data
You can also dump the trajectories and results from the database for a specific experiment:
```bash
python scripts/db/dump_db.py --exp_id "<your_exp_id>"
```
4. Advanced Setup
Database Configuration
The evaluation framework uses a SQL database (defaulting to SQLite) to store datasets and experiment results. To use a different database (e.g., PostgreSQL), set the `DB_URL` environment variable:
```bash
export DB_URL="postgresql://user:password@host:port/database"
```
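For the default SQLite backend, the same URL style applies (assuming SQLAlchemy-style connection URLs, as the PostgreSQL example suggests); the file path below is just an example:

```bash
# Example SQLite URL pointing at a local database file
export DB_URL="sqlite:///./utu_eval.db"
```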
Tracing
We use Phoenix as our default tracing service for observing agent behavior. To enable it, set the following environment variables:
- `PHOENIX_ENDPOINT`
- `PHOENIX_BASE_URL`
- `PHOENIX_PROJECT_NAME`
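For example, for a locally running Phoenix instance (the endpoint, base URL, and project name below are placeholders; adjust them to your own deployment):

```bash
# Placeholder values for a local Phoenix deployment
export PHOENIX_ENDPOINT="http://localhost:6006/v1/traces"
export PHOENIX_BASE_URL="http://localhost:6006"
export PHOENIX_PROJECT_NAME="utu-agent"
```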
The framework also supports any tracing service compatible with the `openai-agents` library. See the official list of tracing processors for more options.
5. Next Steps
- Explore Examples: Check the `/examples` directory for more detailed use cases and advanced scripts.
- Dive into Evaluations: Learn more about how the evaluation framework works by reading the Evaluation Framework documentation.