H Company’s Holo2-235B model tops UI element localization on ScreenSpot-Pro

Pinpointing a tiny button on a 4K interface is one of the hardest problems in GUI automation, and it is where UI "grounding" models succeed or fail. H Company says its latest model, Holo2-235B-A22B Preview, now leads key public benchmarks for UI element localization, a capability that underpins reliable browser agents, QA automation, and assistive workflows.

New benchmark results for UI localization and GUI grounding

H Company positions Holo2-235B-A22B Preview as its largest UI localization-focused research release to date. The company reports 78.5 percent accuracy on ScreenSpot-Pro and 79.0 percent on OSWorld-G. ScreenSpot-Pro is tracked publicly via the GUI Agent Grounding leaderboard at gui-agent.github.io/grounding-leaderboard, which makes it easier for teams to compare model behavior on the same evaluation set.

The key detail for practitioners: the model's performance differs substantially depending on how it is run. In a single step (one-shot prediction), H Company reports 70.6 percent accuracy on ScreenSpot-Pro, about eight points below the 78.5 percent headline figure. That matters if you are optimizing for latency or cost in production, where multi-step inference may be too slow or too expensive.

Agentic localization: iterative refinement for 4K UIs

H Company attributes much of the gain to "agentic localization," an iterative approach in which the model refines its predicted UI element position across multiple steps. In practical terms, this is similar to letting an agent take several passes at narrowing a bounding box instead of committing immediately, which is useful when UI elements are small relative to the full screen.
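To make the idea of iterative refinement concrete, here is a minimal Python sketch. It is not H Company's implementation: the crop-and-zoom strategy is an assumption about how such refinement could work, `predict` is a hypothetical stand-in for a grounding model call, and `Box`, `refine_localization`, `toy_predict`, and the `zoom` factor are illustrative names and values.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """A rectangular view of the screen, in absolute pixels."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def refine_localization(
    predict: Callable[[Box], Point],  # hypothetical model call: view -> click point
    screen: Box,
    steps: int = 3,
    zoom: float = 0.5,  # assumed shrink factor per step, chosen for illustration
) -> Point:
    """Re-predict on progressively smaller views centered on the last guess."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    region = screen
    point = predict(region)
    for _ in range(steps - 1):
        x, y = point
        w, h = region.width * zoom, region.height * zoom
        # Center the smaller view on the prediction, clamped to stay on screen.
        left = min(max(x - w / 2, screen.left), screen.right - w)
        top = min(max(y - h / 2, screen.top), screen.bottom - h)
        region = Box(left, top, left + w, top + h)
        point = predict(region)
    return point


# Toy stand-in for a model: its error shrinks with the size of the view.
# A real model could also lose the target if a crop excluded it.
TARGET = (3000.0, 200.0)


def toy_predict(region: Box) -> Point:
    return (TARGET[0] + 0.05 * region.width, TARGET[1] + 0.05 * region.height)


screen = Box(0, 0, 3840, 2160)  # a 4K screenshot
one_step = refine_localization(toy_predict, screen, steps=1)
three_steps = refine_localization(toy_predict, screen, steps=3)
print("one step:", one_step)
print("three steps:", three_steps)
```

With this toy predictor, the three-step result lands closer to the target than the single-step one, at the cost of three model calls instead of one. That is the latency and cost trade-off the single-step figure highlights.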

According to the company, running in agent mode reaches 78.5 percent accuracy within three steps, translating into 10 to 20 percent relative improvements across Holo2 model sizes. For B2B teams building automated onboarding flows, test bots, or customer support agents that click through SaaS dashboards, multi-step grounding can be the difference between brittle demos and dependable workflows.
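The difference between absolute and relative gains is easy to blur, so here is a short worked check in Python. It assumes the 70.6 percent single-step score is the baseline for the 78.5 percent agent-mode score on ScreenSpot-Pro; the function name `relative_improvement` is illustrative.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100


single_step = 70.6  # reported single-step accuracy on ScreenSpot-Pro (percent)
agent_mode = 78.5   # reported agent-mode accuracy within three steps (percent)

absolute_gain = agent_mode - single_step
relative_gain = relative_improvement(single_step, agent_mode)

print(f"absolute gain: {absolute_gain:.1f} points")  # 7.9 points
print(f"relative gain: {relative_gain:.1f}%")        # 11.2%
```

Under that assumption, the 235B model's gain of roughly 11 percent sits at the lower end of the 10 to 20 percent range the company cites across Holo2 model sizes.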

Holo2-235B-A22B Preview is available on Hugging Face at huggingface.co/Hcompany/Holo2-235B-A22B, and is framed as a research release rather than a fully productized agent stack.
