
H Company’s Holo2-235B Preview sets new UI localization benchmark scores

H Company released Holo2-235B-A22B Preview for UI element localization, reporting 78.5 percent on ScreenSpot-Pro and 79.0 percent on OSWorld-G.

Feb 4, 2026
By Marketing Team

Key Takeaways

  • H Company’s Holo2-235B-A22B Preview is a research release focused on UI element localization for GUI agents.
  • Reported performance includes 78.5 percent on ScreenSpot-Pro (agent mode, within three steps) and 79.0 percent on OSWorld-G.
  • Single-step ScreenSpot-Pro accuracy is reported at 70.6 percent, highlighting the value of iterative “agentic localization.”
  • The company claims 10 to 20 percent relative gains across Holo2 sizes when using multi-step refinement.

Getting LLM-powered agents to reliably click the right button on a crowded 4K screen is still a bottleneck for automation teams. H Company is now positioning a new model as a step forward for UI element localization, the computer-vision task of mapping a text instruction like “open settings” to the exact on-screen element.

New Holo2-235B model targets UI element localization

H Company released Holo2-235B-A22B Preview as a research model focused on grounding instructions to specific UI components. The company says the model sets a new top score on ScreenSpot-Pro, a UI localization benchmark with a public leaderboard.

For B2B marketers and e-commerce operators building browser automations, support bots, or internal agents, better localization can translate into fewer brittle scripts and less manual QA. In practical terms, a model that can find small icons, toggles, or menu items on dense layouts can reduce failure rates when pages change.

The model is available via Hugging Face and is framed as a preview release rather than a fully productized offering.

Agentic localization improves accuracy in a few steps

H Company emphasizes “agentic localization,” where the system iteratively refines its predicted UI coordinates over multiple steps. Think of it as a short loop of “guess, zoom/adjust, re-guess” to narrow down a tiny target, which is useful on high-resolution interfaces where a single-pass prediction can be slightly off.
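The “guess, zoom/adjust, re-guess” loop can be illustrated with a minimal Python sketch. This is not H Company’s implementation: the function names (refine_click, fake_predict), the fixed zoom factor of 0.5, and the stand-in predictor with an error proportional to the crop size are all assumptions made for illustration. A real system would call a vision model on a cropped screenshot where fake_predict is used here.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # left, top, width, height in screen pixels
Point = Tuple[float, float]


def refine_click(
    predict: Callable[[Box], Point],
    screen: Box,
    max_steps: int = 3,
    zoom: float = 0.5,
) -> Point:
    """Guess, zoom in around the guess, re-guess; return the final screen point."""
    box = screen
    # Fall back to the screen center if max_steps is 0.
    point = (screen[0] + screen[2] / 2, screen[1] + screen[3] / 2)
    for _ in range(max_steps):
        left, top, width, height = box
        # Hypothetical predictor: returns a normalized (0..1) position in the crop.
        nx, ny = predict(box)
        point = (left + nx * width, top + ny * height)
        # Shrink the crop and re-center it on the guess, clamped to the screen.
        new_w, new_h = width * zoom, height * zoom
        new_left = min(max(point[0] - new_w / 2, screen[0]), screen[0] + screen[2] - new_w)
        new_top = min(max(point[1] - new_h / 2, screen[1]), screen[1] + screen[3] - new_h)
        box = (new_left, new_top, new_w, new_h)
    return point


def fake_predict(box: Box, target: Point = (3000.0, 500.0), error: float = 0.02) -> Point:
    """Stand-in for a model: finds the target with an error proportional to crop size."""
    left, top, width, height = box
    nx = (target[0] - left) / width + error
    ny = (target[1] - top) / height + error
    return (min(max(nx, 0.0), 1.0), min(max(ny, 0.0), 1.0))


screen_4k: Box = (0.0, 0.0, 3840.0, 2160.0)
one_step = refine_click(fake_predict, screen_4k, max_steps=1)
three_steps = refine_click(fake_predict, screen_4k, max_steps=3)
```

Under these assumptions, each zoom halves the crop, so a predictor whose error scales with the crop size lands closer to the target on every pass. Whether a real model behaves this way depends on how it handles cropped inputs.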

On ScreenSpot-Pro, the company reports 70.6 percent accuracy in a single step, rising to 78.5 percent within three steps in agent mode. It also reports 79.0 percent on OSWorld-G. Across the Holo2 family, H Company claims the iterative approach delivers 10 to 20 percent relative gains over single-step prediction.
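To see how the reported ScreenSpot-Pro figures relate to the claimed range, the short calculation below separates the absolute gain (percentage points) from the relative gain (percent of the single-step baseline). The helper name relative_gain is ours; the two scores are the ones reported above.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return (improved - baseline) / baseline * 100


single_step = 70.6   # reported single-step ScreenSpot-Pro accuracy
three_steps = 78.5   # reported accuracy within three steps (agent mode)

absolute = three_steps - single_step
relative = relative_gain(single_step, three_steps)

print(f"absolute gain: {absolute:.1f} points, relative gain: {relative:.1f}%")
# prints "absolute gain: 7.9 points, relative gain: 11.2%"
```

For the 235B model, then, the 7.9-point improvement corresponds to a relative gain of about 11 percent, which sits at the lower end of the 10 to 20 percent range the company cites.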

For teams evaluating AI agents for customer operations or cross-app workflows, the takeaway is that step-wise localization may be a lever to trade small increases in latency for materially higher success rates on real UI tasks.

In the near term, this release is most relevant for builders benchmarking GUI agents and testing multi-step grounding strategies, especially on modern, high-DPI web apps.


Related Topics

H Company, Holo2, GUI agents, UI localization, ScreenSpot-Pro, OSWorld