Agent
Reference for the current version of the Agent library.
The Agent library provides the ComputerAgent class and tools for building AI agents that automate workflows on Cua Computers.
Reference
Basic Usage
from agent import ComputerAgent
from computer import Computer
computer = Computer() # Connect to a cua container
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer]
)
prompt = "open github, navigate to trycua/cua"
async for result in agent.run(prompt):
print("Agent:", result["output"][-1]["content"][0]["text"])
ComputerAgent Constructor Options
The ComputerAgent
constructor provides a wide range of options for customizing agent behavior, tool integration, callbacks, resource management, and more.
Parameter | Type | Default | Description |
---|---|---|---|
model | str | required | Model name (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro") |
tools | List[Any] | None | List of tools (e.g., computer objects, decorated functions) |
custom_loop | Callable | None | Custom agent loop function (overrides auto-selection) |
only_n_most_recent_images | int | None | If set, only keep the N most recent images in message history (adds ImageRetentionCallback) |
callbacks | List[Any] | None | List of AsyncCallbackHandler instances for preprocessing/postprocessing |
verbosity | int | None | Logging level (logging.DEBUG , logging.INFO , etc.; adds LoggingCallback) |
trajectory_dir | str | None | Directory to save trajectory data (adds TrajectorySaverCallback) |
max_retries | int | 3 | Maximum number of retries for failed API calls |
screenshot_delay | float | int | 0.5 | Delay before screenshots (seconds) |
use_prompt_caching | bool | False | Use prompt caching to avoid reprocessing the same prompt (mainly for Anthropic) |
max_trajectory_budget | float | dict | None | If set, adds BudgetManagerCallback to track usage costs and stop when budget is exceeded |
**kwargs | any | Additional arguments passed to the agent loop |
Parameter Details
- model: The LLM or agent model to use. Determines which agent loop is selected unless
custom_loop
is provided. - tools: List of tools the agent can use (e.g.,
Computer
, sandboxed Python functions, etc.). - custom_loop: Optional custom agent loop function. If provided, overrides automatic loop selection.
- only_n_most_recent_images: If set, only the N most recent images are kept in the message history. Useful for limiting memory usage. Automatically adds
ImageRetentionCallback
. - callbacks: List of callback instances for advanced preprocessing, postprocessing, logging, or custom hooks. See Callbacks & Extensibility.
- verbosity: Logging level (e.g.,
logging.INFO
). If set, adds a logging callback. - trajectory_dir: Directory path to save full trajectory data, including screenshots and responses. Adds
TrajectorySaverCallback
. - max_retries: Maximum number of retries for failed API calls (default: 3).
- screenshot_delay: Delay (in seconds) before taking screenshots (default: 0.5).
- use_prompt_caching: Enables prompt caching for repeated prompts (mainly for Anthropic models).
- max_trajectory_budget: If set (float or dict), adds a budget manager callback that tracks usage costs and stops execution if the budget is exceeded. Dict allows advanced options (e.g.,
{ "max_budget": 5.0, "raise_error": True }
). - **kwargs: Any additional keyword arguments are passed through to the agent loop or model provider.
Example with advanced options:
from agent import ComputerAgent
from computer import Computer
from agent.callbacks import ImageRetentionCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[Computer(...)],
only_n_most_recent_images=3,
callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)],
verbosity=logging.INFO,
trajectory_dir="trajectories",
max_retries=5,
screenshot_delay=1.0,
use_prompt_caching=True,
max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
Message Array (Multi-turn)
messages = [
{"role": "user", "content": "go to trycua on gh"},
# ... (reasoning, computer_call, computer_call_output, etc)
]
async for result in agent.run(messages):
# Handle output, tool invocations, screenshots, etc.
print("Agent:", result["output"][-1]["content"][0]["text"])
messages += result["output"] # Add agent output to message array
...
Supported Agent Loops
- Anthropic: Claude 4, 3.7, 3.5 models
- OpenAI: computer-use-preview
- UITARS: UI-TARS 1.5 models (Hugging Face, TGI)
- Omni: Omniparser + any LLM
See Agent Loops for supported models and details.
Callbacks & Extensibility
You can add preprocessing and postprocessing hooks using callbacks, or write your own by subclassing AsyncCallbackHandler
:
from agent.callbacks import ImageRetentionCallback, PIIAnonymizationCallback
agent = ComputerAgent(
model="anthropic/claude-3-5-sonnet-20241022",
tools=[computer],
callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)]
)