# Running Models Locally
You can run open-source LLMs and vision models on your own machine with cua, without relying on cloud APIs. This is ideal for development, for privacy-sensitive work, or for air-gapped systems.
## Hugging Face (transformers)
Use the `huggingface-local/` prefix to run any Hugging Face model locally via the `transformers` library. This supports most text and vision models from the Hugging Face Hub.
Example:

```python
model = "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
```
## MLX (Apple Silicon)
Use the `mlx/` prefix to run models using the `mlx-vlm` library, optimized for Apple Silicon (M1/M2/M3). This allows fast, local inference for many open-source models.
Example:

```python
model = "mlx/mlx-community/UI-TARS-1.5-7B-6bit"
```
## Ollama
Use the `ollama_chat/` prefix to run models using the `ollama` library. This allows fast, local inference for many open-source models.
Example:

```python
model = "omniparser+ollama_chat/llama3.2:latest"
```
For details on all supported providers, see Supported Agents.