Bringing Computer-Use to the Web
Published on August 5, 2025 by Morgan Dean
In one of our original posts, we explored building Computer-Use Operators on macOS - first with a manual implementation using OpenAI's computer-use-preview model, then with our cua-agent framework for Python developers. While these tutorials have been incredibly popular, we've received consistent feedback from our community: "Can we use C/ua with JavaScript and TypeScript?"
Today, we're excited to announce the release of the `@trycua/computer` Web SDK - a new library that allows you to control your C/ua cloud containers from any JavaScript or TypeScript project. With this library, you can click, type, and grab screenshots from your cloud containers - no extra servers required.
With this new SDK, you can easily develop CUA experiences like the one below, which we will release soon as open source.
Let’s see how it works.
What You'll Learn
By the end of this tutorial, you'll be able to:
- Set up the `@trycua/computer` npm library in any JavaScript/TypeScript project
- Connect OpenAI's computer-use model to C/ua cloud containers from web applications
- Build computer-use agents that work in Node.js, React, Vue, or any web framework
- Handle different types of computer actions (clicking, typing, scrolling) from web code
- Implement the complete computer-use loop in JavaScript/TypeScript
- Integrate AI automation into existing web applications and workflows
Prerequisites:
- Node.js 16+ and npm/yarn/pnpm
- Basic JavaScript or TypeScript knowledge
- OpenAI API access (Tier 3+ for computer-use-preview)
- C/ua cloud container credits (get started here)
Estimated Time: 45-60 minutes
Access Requirements
OpenAI Model Availability
At the time of writing, the computer-use-preview model has limited availability:
- Only accessible to OpenAI tier 3+ users
- Additional application process may be required even for eligible users
- Cannot be used in the OpenAI Playground
- Outside of ChatGPT Operator, usage is restricted to the new Responses API
Luckily, the `@trycua/computer` library can be used in conjunction with other models, like Anthropic's Computer Use or UI-TARS. You'll just have to write your own handler that parses the model output into the corresponding interface calls on the container, as sketched below.
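For example, here's a rough sketch of what such a handler might look like for Anthropic's computer-use tool. Treat this as an illustration rather than a complete implementation: the input shape (`action`, `coordinate`, `text`) follows Anthropic's computer use beta, and you should verify the exact action names against Anthropic's documentation.

```typescript
import { Computer } from '@trycua/computer';

// Shape of a tool_use input from Anthropic's computer-use beta (assumed;
// verify field names against Anthropic's docs before relying on this).
interface AnthropicComputerAction {
  action: string;
  coordinate?: [number, number];
  text?: string;
}

export async function executeAnthropicAction(
  computer: Computer,
  input: AnthropicComputerAction
) {
  switch (input.action) {
    case 'left_click':
      // Some versions of the tool pass a coordinate; others click in place
      if (input.coordinate) {
        const [x, y] = input.coordinate;
        await computer.interface.moveCursor(x, y);
      }
      await computer.interface.leftClick();
      break;
    case 'type':
      await computer.interface.typeText(input.text ?? '');
      break;
    case 'key':
      await computer.interface.pressKey(input.text ?? '');
      break;
    case 'screenshot':
      return await computer.interface.screenshot();
    default:
      console.log(`Unhandled action: ${input.action}`);
  }
}
```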
C/ua Cloud Containers
To follow this guide, you’ll need access to a C/ua cloud container.
Getting access is simple: purchase credits from our pricing page, then create and provision a new container instance from the dashboard. With your container running, you'll be ready to leverage the web SDK and bring automation to your JavaScript or TypeScript applications.
Understanding the Flow
OpenAI API Overview
Let's start with the basics. In our case, we'll use OpenAI's API to communicate with their computer-use model.
Think of it like this:
- We send the model a screenshot of our container and tell it what we want it to do
- The model looks at the screenshot and decides what actions to take
- It sends back instructions (like "click here" or "type this")
- We execute those instructions in our container.
Model Setup
Here's how we set up the computer-use model for web development:
```javascript
const res = await openai.responses.create({
  model: 'computer-use-preview',
  tools: [
    {
      type: 'computer_use_preview',
      display_width: 1024,
      display_height: 768,
      environment: 'linux', // we're using a linux container
    },
  ],
  input: [
    {
      role: 'user',
      content: [
        // what we want the ai to do
        { type: 'input_text', text: 'Open firefox and go to trycua.com' },
        // first screenshot of the vm
        {
          type: 'input_image',
          image_url: `data:image/png;base64,${screenshotBase64}`,
          detail: 'auto',
        },
      ],
    },
  ],
  truncation: 'auto',
});
```
Understanding the Response
When we send a request, the API sends back a response that looks like this:
json"output": [ { "type": "reasoning", // The AI explains what it's thinking "id": "rs_67cc...", "summary": [ { "type": "summary_text", "text": "Clicking on the browser address bar." } ] }, { "type": "computer_call", // The actual action to perform "id": "cu_67cc...", "call_id": "call_zw3...", // Used to track previous calls "action": { "type": "click", // What kind of action (click, type, etc.) "button": "left", // Which mouse button to use "x": 156, // Where to click (coordinates) "y": 50 }, "pending_safety_checks": [], // Any safety warnings to consider "status": "completed" // Whether the action was successful } ]
Each response contains:
- Reasoning: The AI's explanation of what it's doing
- Action: The specific computer action to perform
- Safety Checks: Any potential risks to review
- Status: Whether everything worked as planned
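To make that concrete, here's a small sketch (using the types from the official openai npm package) of how you might walk a response's output array and pull out each of those pieces:

```typescript
import OpenAI from 'openai';

// Walk the output array of a Responses API result and log the
// reasoning, action, safety-check, and status fields.
function inspectOutput(res: OpenAI.Responses.Response) {
  for (const item of res.output) {
    if (item.type === 'reasoning') {
      // Reasoning: the AI's explanation of what it's doing
      console.log('Reasoning:', item.summary.map((s) => s.text).join(' '));
    } else if (item.type === 'computer_call') {
      // Action: the specific computer action to perform
      console.log('Action:', item.action);
      // Safety checks: potential risks to review before acting
      if (item.pending_safety_checks.length > 0) {
        console.log('Safety checks:', item.pending_safety_checks);
      }
      // Status: whether everything worked as planned
      console.log('Status:', item.status);
    }
  }
}
```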
Implementation Guide
Provision a C/ua Cloud Container
- Visit trycua.com, sign up, purchase credits, and create a new container instance from the dashboard.
- Create an API key from the dashboard — be sure to save it in a secure location before continuing.
- Start the cloud container from the dashboard.
Environment Setup
- Install the required packages with your preferred package manager:

```bash
npm install --save @trycua/computer # or yarn, pnpm, bun
npm install --save openai # or yarn, pnpm, bun
```

  This works with any JavaScript/TypeScript project setup - whether you're using Create React App, Next.js, Vue, Angular, or plain JavaScript.

- Save your OpenAI API key, C/ua API key, and container name to a `.env` file:

```bash
OPENAI_KEY=openai-api-key
CUA_KEY=cua-api-key
CUA_CONTAINER_NAME=cua-cloud-container-name
```

  These environment variables work the same whether you're using vanilla JavaScript, TypeScript, or any web framework.
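If you're running in plain Node.js, one common way to load these values is the dotenv package (an extra dependency, not installed above); for example:

```typescript
import 'dotenv/config'; // requires `npm install dotenv`; populates process.env from .env

// Names must match the keys in your .env file
const { OPENAI_KEY, CUA_KEY, CUA_CONTAINER_NAME } = process.env;

if (!OPENAI_KEY || !CUA_KEY || !CUA_CONTAINER_NAME) {
  throw new Error('Missing OPENAI_KEY, CUA_KEY, or CUA_CONTAINER_NAME in .env');
}
```

Frameworks like Next.js load `.env` files automatically, so you may not need dotenv there.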
Building the Agent
Mapping API Actions to @trycua/computer Interface Methods
This helper function handles a `computer_call` action from the OpenAI API, converting it into the equivalent action on the `@trycua/computer` interface. These actions execute on the initialized `Computer` instance. For example, `await computer.interface.leftClick()` sends a left mouse click at the current cursor position.
Whether you're using JavaScript or TypeScript, the interface remains the same:
```typescript
export async function executeAction(
  computer: Computer,
  action: OpenAI.Responses.ResponseComputerToolCall['action']
) {
  switch (action.type) {
    case 'click': {
      const { x, y, button } = action;
      console.log(`Executing click at (${x}, ${y}) with button '${button}'.`);
      await computer.interface.moveCursor(x, y);
      if (button === 'right') await computer.interface.rightClick();
      else await computer.interface.leftClick();
      break;
    }
    case 'type': {
      const { text } = action;
      console.log(`Typing text: ${text}`);
      await computer.interface.typeText(text);
      break;
    }
    case 'scroll': {
      const { x: locX, y: locY, scroll_x, scroll_y } = action;
      console.log(
        `Scrolling at (${locX}, ${locY}) with offsets (scroll_x=${scroll_x}, scroll_y=${scroll_y}).`
      );
      await computer.interface.moveCursor(locX, locY);
      await computer.interface.scroll(scroll_x, scroll_y);
      break;
    }
    case 'keypress': {
      const { keys } = action;
      for (const key of keys) {
        console.log(`Pressing key: ${key}.`);
        // Map common key names to CUA equivalents
        if (key.toLowerCase() === 'enter') {
          await computer.interface.pressKey('return');
        } else if (key.toLowerCase() === 'space') {
          await computer.interface.pressKey('space');
        } else {
          await computer.interface.pressKey(key);
        }
      }
      break;
    }
    case 'wait':
      console.log(`Waiting for 3 seconds.`);
      await new Promise((resolve) => setTimeout(resolve, 3 * 1000));
      break;
    case 'screenshot': {
      console.log('Taking screenshot.');
      // This is handled automatically in the main loop, but we can take an extra one if requested
      const screenshot = await computer.interface.screenshot();
      return screenshot;
    }
    default:
      console.log(`Unrecognized action: ${action.type}`);
      break;
  }
}
```
Implementing the Computer-Use Loop
This section defines a loop that:
- Initializes the `Computer` instance (connecting to a Linux cloud container).
- Captures a screenshot of the current state.
- Sends the screenshot (with a user prompt) to the OpenAI Responses API using the `computer-use-preview` model.
- Processes the returned `computer_call` action and executes it using our helper function.
- Captures an updated screenshot after the action.
- Sends the updated screenshot back and loops until no more actions are returned.
```typescript
import OpenAI from 'openai';
import { Computer, OSType } from '@trycua/computer';

const openai = new OpenAI({ apiKey: process.env.OPENAI_KEY });

// Initialize the Computer Connection
const computer = new Computer({
  apiKey: process.env.CUA_KEY!,
  name: process.env.CUA_CONTAINER_NAME!,
  osType: OSType.LINUX,
});
await computer.run();

// Take the initial screenshot
const screenshot = await computer.interface.screenshot();
const screenshotBase64 = screenshot.toString('base64');

// Setup openai config for computer use
const computerUseConfig: OpenAI.Responses.ResponseCreateParamsNonStreaming = {
  model: 'computer-use-preview',
  tools: [
    {
      type: 'computer_use_preview',
      display_width: 1024,
      display_height: 768,
      environment: 'linux', // we're using a linux vm
    },
  ],
  truncation: 'auto',
};

// Send the initial screenshot to the openai computer-use model
let res = await openai.responses.create({
  ...computerUseConfig,
  input: [
    {
      role: 'user',
      content: [
        // what we want the ai to do
        { type: 'input_text', text: 'open firefox and go to trycua.com' },
        // current screenshot of the vm
        {
          type: 'input_image',
          image_url: `data:image/png;base64,${screenshotBase64}`,
          detail: 'auto',
        },
      ],
    },
  ],
});

// Loop until there are no more computer use actions.
while (true) {
  const computerCalls = res.output.filter((o) => o.type === 'computer_call');
  if (computerCalls.length < 1) {
    console.log('No more computer calls. Loop complete.');
    break;
  }

  // Get the first call
  const call = computerCalls[0];
  const action = call.action;
  console.log('Received action from OpenAI Responses API:', action);

  let ackChecks: OpenAI.Responses.ResponseComputerToolCall.PendingSafetyCheck[] = [];
  if (call.pending_safety_checks.length > 0) {
    console.log('Safety checks pending:', call.pending_safety_checks);
    // In a real implementation, you would want to get user confirmation here.
    ackChecks = call.pending_safety_checks;
  }

  // Execute the action in the container
  await executeAction(computer, action);

  // Wait for changes to process within the container (1sec)
  await new Promise((resolve) => setTimeout(resolve, 1000));

  // Capture a new screenshot
  const newScreenshot = await computer.interface.screenshot();
  const newScreenshotBase64 = newScreenshot.toString('base64');

  // Send the screenshot back as a computer_call_output
  res = await openai.responses.create({
    ...computerUseConfig,
    previous_response_id: res.id,
    input: [
      {
        type: 'computer_call_output',
        call_id: call.call_id,
        acknowledged_safety_checks: ackChecks,
        output: {
          type: 'computer_screenshot',
          image_url: `data:image/png;base64,${newScreenshotBase64}`,
        },
      },
    ],
  });
}
```
You can find the full example on GitHub.
What's Next?
The `@trycua/computer` Web SDK opens up some interesting possibilities. You could build browser-based testing tools, create interactive demos for your products, or automate repetitive workflows directly from your web apps.
We're working on more examples and better documentation - if you build something cool with this SDK, we'd love to see it. Drop by our Discord and share what you're working on.
Happy automating on the web!