A V2 of Anthropic's Computer Use Agent

A while back, I came across Anthropic’s Computer Use demo on GitHub.
Out of curiosity, I decided to build a version 2, mostly to see if I could extend it with some extra functionality. My goals for this version were:
- Upload a file from my local machine directly into the agent’s virtual environment.
- Ask the agent to open that file (it would need to search for it first).
- Query the agent with analytics questions on the file and get meaningful answers.
The original demo didn’t support file management, so I had to build that part myself. And out of pure frustration, I refused to use Streamlit for the frontend (which the original demo relied on). Instead, I built the frontend in React and the backend in FastAPI.
Frontend responsibilities
- Left panel — shows current and historical sessions.
- Right panel — a chat interface for talking with the agent, plus a file upload section.
- Middle panel — displays the agent’s virtual desktop so you can actually see what it’s doing.
Backend responsibilities
- Save session states and chat contexts separately.
- Maintain the agentic loop and execute tool calls.
- Expose APIs for chat, session management, and file handling.
It’s a minimalistic monorepo that includes both the frontend and backend.
Here’s what the resulting React overlay looks like:
I uploaded a simple cats.csv file, asked the agent to open it, and then queried it about the contents. It surprisingly gave correct answers.
Here’s the result:
Key Learnings
Agentic Loops
I spent a good amount of time rebuilding the agentic looping logic after separating it from Anthropic’s original implementation. Understanding how to parse and handle tool calls took some effort. You can find that logic in services/chat.py.
Setting Up a VNC Connection
Setting up the VNC connection in headless mode was actually pretty fun. It involved exposing the port from the VNC server and connecting to it from the React frontend, enabling live visualization of the agent’s desktop.