Automation

H Company’s Holo2-235B model tops UI element localization on ScreenSpot-Pro

H Company reports new best-in-class results for UI grounding, with Holo2-235B-A22B reaching 78.5 percent on ScreenSpot-Pro in agent mode.

Feb 6, 2026
By Marketing Team

Key Takeaways

  • H Company reports 78.5 percent on ScreenSpot-Pro and 79.0 percent on OSWorld-G for Holo2-235B-A22B Preview.
  • One-shot accuracy is reported at 70.6 percent on ScreenSpot-Pro, highlighting the latency-versus-accuracy tradeoff.
  • Agentic localization (iterative refinement) reaches the best score within three steps and is positioned for 4K UI scenarios.

Pinpointing a tiny button on a 4K interface is one of the hardest problems in GUI automation, and it is where UI “grounding” models succeed or fail. H Company says its latest AI model, Holo2-235B-A22B Preview, now leads on key public benchmarks for UI element localization, a capability that underpins reliable browser agents, QA automation, and assistive workflows.

New benchmark results for UI localization and GUI grounding

H Company positions Holo2-235B-A22B Preview as its largest UI localization-focused research release to date. The company reports 78.5 percent accuracy on ScreenSpot-Pro and 79.0 percent on OSWorld-G. ScreenSpot-Pro is tracked publicly via the GUI Agent Grounding leaderboard at gui-agent.github.io/grounding-leaderboard, which makes it easier for teams to compare model behavior on the same evaluation set.

The key detail for practitioners: the model’s performance differs substantially depending on how it is run. In a single step (one-shot prediction), H Company reports 70.6 percent accuracy on ScreenSpot-Pro, 7.9 points below the 78.5 percent agent-mode score. That gap matters if you are optimizing for latency or cost in production, where multi-step inference may not be affordable.

Agentic localization: iterative refinement for 4K UIs

H Company attributes much of the gain to “agentic localization,” an iterative approach where the model refines its predicted UI element position across multiple steps. In practical terms, this is similar to letting an agent take several passes at narrowing a bounding box instead of committing immediately, which is useful when UI elements are small relative to the full screen.
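The article does not describe how H Company implements this loop, so the following is only a minimal sketch of one plausible zoom-and-refine scheme. Everything in it is an assumption made for illustration: the `Region` type, the `Predictor` interface (a callable returning a point as fractions of the region it was shown), the `zoom` and `localize` helpers, and the halving factor. The toy predictor stands in for a real model and can only resolve a coarse grid, which mimics a small target on a large screen.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Region:
    left: float
    top: float
    width: float
    height: float


# Hypothetical model interface (an assumption, not H Company's API):
# given a screen region, return the predicted click point as (x, y)
# fractions in [0, 1] relative to that region.
Predictor = Callable[[Region], Point]


def zoom(region: Region, point: Point, screen: Region, factor: float) -> Region:
    """Shrink the region by `factor`, centered on `point`, clamped to the screen."""
    width, height = region.width * factor, region.height * factor
    left = min(max(point[0] - width / 2, screen.left),
               screen.left + screen.width - width)
    top = min(max(point[1] - height / 2, screen.top),
              screen.top + screen.height - height)
    return Region(left, top, width, height)


def localize(predict: Predictor, screen: Region,
             steps: int = 3, factor: float = 0.5) -> Point:
    """Refine a predicted point by re-querying on progressively smaller regions."""
    region = screen
    point = (screen.left + screen.width / 2, screen.top + screen.height / 2)
    for _ in range(steps):
        rx, ry = predict(region)
        # Convert the region-relative prediction back to screen coordinates.
        point = (region.left + rx * region.width,
                 region.top + ry * region.height)
        region = zoom(region, point, screen, factor)
    return point


def make_coarse_predictor(target: Point, cells: int = 8) -> Predictor:
    """Toy stand-in for a model: snaps the target to the center of a grid cell."""
    def predict(region: Region) -> Point:
        rx = (target[0] - region.left) / region.width
        ry = (target[1] - region.top) / region.height
        col = min(max(int(rx * cells), 0), cells - 1)
        row = min(max(int(ry * cells), 0), cells - 1)
        return ((col + 0.5) / cells, (row + 0.5) / cells)
    return predict


if __name__ == "__main__":
    screen = Region(0, 0, 3840, 2160)  # a 4K screen
    predict = make_coarse_predictor(target=(3000.0, 1700.0))
    print("one step:   ", localize(predict, screen, steps=1))
    print("three steps:", localize(predict, screen, steps=3))
```

In this sketch each extra step costs one more model call, which is the latency side of the tradeoff, while the shrinking region lets the same coarse predictor land closer to the target.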

According to the company, running in agent mode reaches 78.5 percent accuracy within three steps, translating into 10 to 20 percent relative improvements across Holo2 model sizes. For B2B teams building automated onboarding flows, test bots, or customer support agents that click through SaaS dashboards, multi-step grounding can be the difference between brittle demos and dependable workflows.
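As a quick check on how the reported figures relate, the snippet below computes the absolute and relative gain of agent mode over one-shot prediction on ScreenSpot-Pro, using only the two scores quoted in this article. The function name `relative_gain` is illustrative.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


one_shot = 70.6    # ScreenSpot-Pro, single step (as reported)
agent_mode = 78.5  # ScreenSpot-Pro, agent mode within three steps (as reported)

print(f"absolute gain: {agent_mode - one_shot:.1f} points")
print(f"relative gain: {relative_gain(one_shot, agent_mode):.1%}")
```

For the 235B model this works out to roughly 11 percent relative, which sits inside the 10 to 20 percent range the company cites across model sizes.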

Holo2-235B-A22B Preview is available on Hugging Face at huggingface.co/Hcompany/Holo2-235B-A22B, and is framed as a research release rather than a fully productized agent stack.

Related Topics

H Company, Holo2, UI localization, GUI grounding, agentic workflows, OSWorld, ScreenSpot-Pro