Cua Documentation

Agent

Reference for the current version of the Agent library.

The Agent library provides the ComputerAgent class and tools for building AI agents that automate workflows on Cua Computers.

Reference

Basic Usage

import asyncio

from agent import ComputerAgent
from computer import Computer


async def main():
    computer = Computer()  # Connect to a cua container
    agent = ComputerAgent(
        model="anthropic/claude-3-5-sonnet-20241022",
        tools=[computer],
    )

    prompt = "open github, navigate to trycua/cua"

    # agent.run() is consumed with "async for", so it has to run inside a coroutine
    async for result in agent.run(prompt):
        print("Agent:", result["output"][-1]["content"][0]["text"])


asyncio.run(main())

ComputerAgent Constructor Options

The ComputerAgent constructor provides a wide range of options for customizing agent behavior, tool integration, callbacks, resource management, and more.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | required | Model name (e.g., "claude-3-5-sonnet-20241022", "computer-use-preview", "omni+vertex_ai/gemini-pro") |
| tools | List[Any] | None | List of tools (e.g., computer objects, decorated functions) |
| custom_loop | Callable | None | Custom agent loop function (overrides auto-selection) |
| only_n_most_recent_images | int | None | If set, only keep the N most recent images in message history (adds ImageRetentionCallback) |
| callbacks | List[Any] | None | List of AsyncCallbackHandler instances for preprocessing/postprocessing |
| verbosity | int | None | Logging level (logging.DEBUG, logging.INFO, etc.; adds LoggingCallback) |
| trajectory_dir | str | None | Directory to save trajectory data (adds TrajectorySaverCallback) |
| max_retries | int | 3 | Maximum number of retries for failed API calls |
| screenshot_delay | float or int | 0.5 | Delay before screenshots (seconds) |
| use_prompt_caching | bool | False | Use prompt caching to avoid reprocessing the same prompt (mainly for Anthropic) |
| max_trajectory_budget | float or dict | None | If set, adds BudgetManagerCallback to track usage costs and stop when budget is exceeded |
| **kwargs | any | | Additional arguments passed to the agent loop |

Parameter Details

  • model: The LLM or agent model to use. Determines which agent loop is selected unless custom_loop is provided.
  • tools: List of tools the agent can use (e.g., Computer objects or sandboxed Python functions).
  • custom_loop: Optional custom agent loop function. If provided, overrides automatic loop selection.
  • only_n_most_recent_images: If set, only the N most recent images are kept in the message history. Useful for limiting memory usage. Automatically adds ImageRetentionCallback.
  • callbacks: List of callback instances for advanced preprocessing, postprocessing, logging, or custom hooks. See Callbacks & Extensibility.
  • verbosity: Logging level (e.g., logging.INFO). If set, adds a logging callback.
  • trajectory_dir: Directory path to save full trajectory data, including screenshots and responses. Adds TrajectorySaverCallback.
  • max_retries: Maximum number of retries for failed API calls (default: 3).
  • screenshot_delay: Delay (in seconds) before taking screenshots (default: 0.5).
  • use_prompt_caching: Enables prompt caching for repeated prompts (mainly for Anthropic models).
  • max_trajectory_budget: If set (float or dict), adds a budget manager callback that tracks usage costs and stops execution if the budget is exceeded. Dict allows advanced options (e.g., { "max_budget": 5.0, "raise_error": True }).
  • **kwargs: Any additional keyword arguments are passed through to the agent loop or model provider.
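The effect of only_n_most_recent_images can be pictured as a filter over the message history. The sketch below is not the library's ImageRetentionCallback. It is a standalone illustration that assumes screenshots appear as computer_call_output items whose output field is a dict with type "input_image"; that item shape and the helper names (is_image_item, keep_recent_images) are assumptions made for this example.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def is_image_item(item: Message) -> bool:
    """Assumed shape: a screenshot is a computer_call_output whose output is an image."""
    output = item.get("output")
    return (
        item.get("type") == "computer_call_output"
        and isinstance(output, dict)
        and output.get("type") == "input_image"
    )


def keep_recent_images(messages: List[Message], n: int) -> List[Message]:
    """Return a new list with all but the n most recent image items dropped."""
    image_positions = [i for i, m in enumerate(messages) if is_image_item(m)]
    # Guard n <= 0 explicitly: image_positions[-0:] would keep every image.
    keep = set(image_positions[-n:]) if n > 0 else set()
    return [m for i, m in enumerate(messages) if not is_image_item(m) or i in keep]
```

Non-image items are never dropped and the input list is left unmodified. A real implementation may also need to remove the tool call that produced each dropped screenshot so that calls and outputs stay paired; this sketch does not attempt that.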

Example with advanced options:

import logging  # needed for logging.INFO below

from agent import ComputerAgent
from computer import Computer
from agent.callbacks import ImageRetentionCallback

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[Computer(...)],
    only_n_most_recent_images=3,
    # Shown to illustrate the callbacks option; only_n_most_recent_images
    # already adds an ImageRetentionCallback on its own.
    callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)],
    verbosity=logging.INFO,
    trajectory_dir="trajectories",
    max_retries=5,
    screenshot_delay=1.0,
    use_prompt_caching=True,
    max_trajectory_budget={"max_budget": 5.0, "raise_error": True}
)
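Because max_trajectory_budget accepts either a float or a dict, it may help to see how the two forms could be normalized and checked. The following is a minimal sketch, not the library's budget manager callback: BudgetTracker, BudgetExceededError, from_option, and add_cost are hypothetical names, and the behavior when raise_error is false (reporting that the run should stop rather than raising) is an assumption.

```python
from dataclasses import dataclass
from typing import Union


class BudgetExceededError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


@dataclass
class BudgetTracker:
    max_budget: float
    raise_error: bool = False
    spent: float = 0.0

    @classmethod
    def from_option(cls, option: Union[float, int, dict]) -> "BudgetTracker":
        """Accept either a plain number or a dict such as {"max_budget": 5.0, "raise_error": True}."""
        if isinstance(option, dict):
            return cls(
                max_budget=float(option["max_budget"]),
                raise_error=bool(option.get("raise_error", False)),
            )
        return cls(max_budget=float(option))

    def add_cost(self, cost: float) -> bool:
        """Record a cost and return True while the run may continue."""
        self.spent += cost
        if self.spent <= self.max_budget:
            return True
        if self.raise_error:
            raise BudgetExceededError(
                f"spent {self.spent:.2f}, budget {self.max_budget:.2f}"
            )
        return False
```

In this sketch a caller would stop iterating once add_cost returns False, or let BudgetExceededError propagate when raise_error is set.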

Message Array (Multi-turn)

messages = [
    {"role": "user", "content": "go to trycua on gh"},
    # ... (reasoning, computer_call, computer_call_output, etc)
]
# Run inside an async function; agent is a ComputerAgent as in Basic Usage
async for result in agent.run(messages):
    # Handle output, tool invocations, screenshots, etc.
    print("Agent:", result["output"][-1]["content"][0]["text"])
    messages += result["output"] # Add agent output to message array
    ...
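The expression result["output"][-1]["content"][0]["text"] assumes the last output item is a text message. When the last item is something else, such as a computer_call, that lookup can fail. The helper below is a defensive sketch: last_text is a hypothetical name, and it only assumes that output items are dicts and that text-bearing items have a content list whose parts carry a "text" key, as in the snippet above.

```python
from typing import Any, Dict, Optional


def last_text(result: Dict[str, Any]) -> Optional[str]:
    """Return the text of the most recent output item that carries text, or None."""
    for item in reversed(result.get("output", [])):
        content = item.get("content")
        if not isinstance(content, list):
            continue  # e.g. a tool call item with no content list
        for part in content:
            if isinstance(part, dict) and isinstance(part.get("text"), str):
                return part["text"]
    return None
```

Inside the loop this could replace the direct index lookup, for example by printing only when last_text(result) is not None.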

Supported Agent Loops

  • Anthropic: Claude 4, 3.7, 3.5 models
  • OpenAI: computer-use-preview
  • UITARS: UI-TARS 1.5 models (Hugging Face, TGI)
  • Omni: Omniparser + any LLM

See Agent Loops for supported models and details.

Callbacks & Extensibility

You can add preprocessing and postprocessing hooks using callbacks, or write your own by subclassing AsyncCallbackHandler:

from agent import ComputerAgent
from agent.callbacks import ImageRetentionCallback, PIIAnonymizationCallback

# computer is a Computer instance, created as in Basic Usage
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)]
)
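Writing a custom callback might look roughly like the sketch below, which redacts email addresses from user text before the messages reach the model. To keep the example runnable without the library, it defines a stand-in AsyncCallbackHandler; the hook name on_llm_start and its signature (receives the message list, returns the list to send) are assumptions, so check the library's AsyncCallbackHandler for the actual hooks. RedactEmailsCallback and EMAIL are hypothetical names.

```python
import asyncio
import re
from typing import Any, Dict, List


class AsyncCallbackHandler:
    """Stand-in for agent.callbacks.AsyncCallbackHandler (assumed hook shown below)."""

    async def on_llm_start(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return messages


# Deliberately simple pattern; not a complete email matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class RedactEmailsCallback(AsyncCallbackHandler):
    """Preprocessing hook: replace email addresses in string content with a placeholder."""

    async def on_llm_start(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        redacted = []
        for message in messages:
            content = message.get("content")
            if isinstance(content, str):
                # Copy the message so the caller's history is left untouched.
                message = {**message, "content": EMAIL.sub("[email]", content)}
            redacted.append(message)
        return redacted


async def demo() -> List[Dict[str, Any]]:
    callback = RedactEmailsCallback()
    return await callback.on_llm_start(
        [{"role": "user", "content": "mail bob@example.com the report"}]
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the real base class, an instance would be passed through the callbacks option in the same way as ImageRetentionCallback above.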