# Running Models Locally
You can run open-source LLMs and vision models on your own machine with cua, without relying on cloud APIs. This is ideal for development, for privacy-sensitive work, or for air-gapped systems.
## Hugging Face (transformers)
Use the `huggingface-local/` prefix to run any Hugging Face model locally via the `transformers` library. This supports most text and vision models from the Hugging Face Hub.
Example:

```python
model = "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
```
## MLX (Apple Silicon)
Use the `mlx/` prefix to run models using the `mlx-vlm` library, optimized for Apple Silicon (M1/M2/M3). This allows fast, local inference for many open-source models.
Example:

```python
model = "mlx/mlx-community/UI-TARS-1.5-7B-6bit"
```
## Ollama
Use the `ollama_chat/` prefix to run models using the `ollama` library. This allows fast, local inference for many open-source models.
Example:

```python
model = "omniparser+ollama_chat/llama3.2:latest"
```
For details on all supported providers, see Supported Agents.