Toolkits

Toolkits are collections of related tools that an agent can use to perform actions. They are the primary way to extend an agent's capabilities.

`AsyncBaseToolkit`

All toolkits inherit from the AsyncBaseToolkit abstract base class. This class provides a standardized interface for creating and managing tools. The core requirement for any toolkit is to implement the get_tools_map() method, which returns a dictionary mapping tool names to their corresponding Python functions.

The base class automatically handles the conversion of these functions into FunctionTool objects that the agent runner can understand and execute.

All available toolkits are registered in the TOOLKIT_MAP dictionary within utu/tools/__init__.py.

Summary of Core Toolkits

Here is a summary of some key toolkits available in the framework:

Toolkit Class	Provided Tools (Functions)	Core Functionality & Mechanism
SearchToolkit	`search_google_api`, `web_qa`	Performs web searches using the Serper API and reads webpage content using the Jina API. It can use an LLM to answer questions based on page content.
DocumentToolkit	`document_qa`	Processes local or remote documents (PDF, DOCX, etc.). It uses the `chunkr.ai` service to parse the document and an LLM to answer questions or provide a summary.
PythonExecutorToolkit	`execute_python_code`	Executes Python code snippets in an isolated environment using `IPython.core.interactiveshell`. It runs in a separate thread to prevent blocking and can capture outputs, errors, and even `matplotlib` plots.
BashToolkit	`run_bash`	Provides a persistent local shell session using the `pexpect` library. This allows the agent to run a series of commands that maintain state (e.g., current directory).
ImageToolkit	`image_qa`	Answers questions about an image or provides a detailed description. It uses a vision-capable LLM to analyze the image content.
AudioToolkit	`audio_qa`	Transcribes audio files using an audio model and then uses an LLM to answer questions based on the transcription.
CodesnipToolkit	`run_code`	Executes code in various languages (Python, C++, JS, etc.) by sending it to a remote sandbox service (like SandboxFusion) and returning the result.