Google AI Introduces Gemini 2.5 'Computer Use' (Preview): A Browser-Control Model to Power AI Agents to Interact with User Interfaces

Which of your browser workflows would you delegate today if an agent could plan and execute predefined UI actions? Google AI has introduced Gemini 2.5 Computer Use, a specialized variant of Gemini 2.5 that plans and executes real UI actions in a live browser through a constrained action API. It is available in public preview through Google AI Studio and Vertex AI. The model targets web automation and UI testing, with Google-reported, human-judged results on standard web and mobile control benchmarks and a safety layer that can require human confirmation for risky steps.

What does the model actually ship?

Developers call a new computer_use tool that returns function calls such as click_at, type_text_at, or drag_and_drop. Client code executes each action (e.g., with Playwright or Browserbase), captures a fresh screenshot and URL, returns them to the model, and loops until the task ends or a safety rule blocks it. The supported action space consists of 13 predefined UI actions: open_web_browser, wait_5_seconds, go_back, go_forward, search, navigate, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, and drag_and_drop. It can be extended with custom functions (e.g., open_app, long_press_at, go_home) for non-browser surfaces.
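The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's SDK: the `Action` and `Observation` types, `run_agent_loop`, `scripted_model`, and `fake_execute` are all hypothetical names, and the "model" here is a scripted stub standing in for a real call to the computer_use tool. A real client would replace `fake_execute` with a Playwright or Browserbase executor.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Action:
    """One function call proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after each step (hypothetical shape)."""
    screenshot: bytes
    url: str


def run_agent_loop(
    goal: str,
    propose: Callable[[str, Observation], Optional[Action]],
    execute: Callable[[Action], Observation],
    first_obs: Observation,
    max_steps: int = 20,
) -> List[Action]:
    """Ask for an action, execute it, observe, and repeat until done."""
    history: List[Action] = []
    obs = first_obs
    for _ in range(max_steps):
        action = propose(goal, obs)   # assumption: None means "task finished"
        if action is None:
            break
        obs = execute(action)         # client-side executor returns fresh state
        history.append(action)
    return history


def scripted_model(steps: Iterable[Action]):
    """Stub standing in for the model: replays a fixed list of actions."""
    remaining = iter(steps)

    def propose(goal: str, obs: Observation) -> Optional[Action]:
        return next(remaining, None)

    return propose


def fake_execute(action: Action) -> Observation:
    """Stub executor: pretends to act and returns an empty screenshot."""
    return Observation(screenshot=b"", url=action.args.get("url", "about:blank"))


if __name__ == "__main__":
    plan = [
        Action("navigate", {"url": "https://example.com"}),
        Action("click_at", {"x": 120, "y": 340}),
    ]
    start = Observation(screenshot=b"", url="about:blank")
    done = run_agent_loop("open the example page", scripted_model(plan), fake_execute, start)
    print([a.name for a in done])
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving loop from running indefinitely, which matters when each step drives a real browser.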

https://blog.google/technology/google-deepmind/gemini-computer-use-model/

What are the scope and constraints?

The model is optimized for web browsers. Google states it is not yet optimized for desktop OS-level control; mobile scenarios work by swapping in custom actions under the same loop. A built-in safety monitor can block prohibited actions or require user confirmation before “high-stakes” operations (payments, sending messages, accessing sensitive records).
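On the client side, the confirmation requirement amounts to a gate in front of the executor. The sketch below shows one way to structure that gate; the `Decision` enum, its three values, and `should_execute` are hypothetical names chosen for illustration, and the actual shape of the safety signal in the API response may differ.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    """Hypothetical safety verdicts attached to a proposed action."""
    ALLOW = "allow"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCK = "block"


def should_execute(
    decision: Decision,
    description: str,
    ask_user: Callable[[str], bool],
) -> bool:
    """Return True only if the action may run.

    Blocked actions never run; high-stakes actions run only when the
    user explicitly confirms; everything else runs directly.
    """
    if decision is Decision.BLOCK:
        return False
    if decision is Decision.REQUIRE_CONFIRMATION:
        return bool(ask_user(description))
    return True


if __name__ == "__main__":
    # Assumption: a console prompt is an acceptable confirmation channel.
    def prompt(text: str) -> bool:
        return input(f"Allow '{text}'? [y/N] ").strip().lower() == "y"

    if should_execute(Decision.REQUIRE_CONFIRMATION, "submit payment form", prompt):
        print("executing")
    else:
        print("skipped")
```

Keeping the gate as a pure function with an injected `ask_user` callback makes it easy to test and to swap the console prompt for a UI dialog.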

Measured performance

Online-Mind2Web (official): 69.0% pass@1 (majority-vote human judgments), validated by benchmark organizers.

Browserbase matched harness: Leads competing computer-use APIs on both accuracy and latency across Online-Mind2Web and WebVoyager under identical time/step/environment constraints. Google’s model card lists 65.7% (OM2W) and 79.9% (WebVoyager) in Browserbase runs.

Latency/quality trade-off (Google figure): ~70%+ accuracy at ~225 s median latency on the Browserbase OM2W harness. Treat as Google-reported, with human evaluation.

AndroidWorld (mobile generalization): 69.7% measured by Google; achieved via the same API loop with custom mobile actions and excluded browser actions.
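The mobile setup mentioned above (custom actions added, some browser actions excluded) comes down to assembling a different action list for the same loop. The following sketch shows that bookkeeping only. The 13 action names and the three custom names come from the article; `build_action_space` is a hypothetical helper, and the choice of which browser actions to exclude for mobile is an assumption made for illustration, not Google's documented configuration.

```python
from typing import Iterable, List

# The 13 predefined UI actions listed in the article.
BROWSER_ACTIONS = [
    "open_web_browser", "wait_5_seconds", "go_back", "go_forward",
    "search", "navigate", "click_at", "hover_at", "type_text_at",
    "key_combination", "scroll_document", "scroll_at", "drag_and_drop",
]


def build_action_space(
    excluded: Iterable[str] = (),
    custom: Iterable[str] = (),
) -> List[str]:
    """Drop excluded predefined actions, then append custom ones."""
    excluded_set = set(excluded)
    unknown = excluded_set - set(BROWSER_ACTIONS)
    if unknown:
        # Catch typos early instead of silently excluding nothing.
        raise ValueError(f"not predefined actions: {sorted(unknown)}")
    kept = [name for name in BROWSER_ACTIONS if name not in excluded_set]
    extras = [name for name in custom if name not in kept]
    return kept + extras


if __name__ == "__main__":
    # Assumption: these browser-only actions make little sense on a phone.
    mobile = build_action_space(
        excluded=["open_web_browser", "search", "navigate", "hover_at"],
        custom=["open_app", "long_press_at", "go_home"],
    )
    print(mobile)
```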

Early production signals

Automated UI test repair: Google’s payments platform team reports that the model recovers more than 60% of previously failing automated UI test executions. This figure comes from public reporting rather than the core blog post and should be cited accordingly.

Operational speed: Poke.com, an early external tester, reports that workflows are often ~50% faster than with its next-best alternative.

Gemini 2.5 Computer Use is in public preview via Google AI Studio and Vertex AI; it exposes a constrained API with 13 documented UI actions and requires a client-side executor. Google’s materials and the model card report state-of-the-art results on web/mobile control benchmarks, and Browserbase’s matched harness shows ~65.7% pass@1 on Online-Mind2Web with leading latency under identical constraints. The scope is browser-first with per-step safety/confirmation. These data points justify measured evaluation in UI testing and web ops.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
