Frontier Labs Benchmark
AI Models Face Off in Poker Arena: Claude Opus 4.5 Emerges as Early Leader in New Reasoning Benchmark
A new informal benchmark pits leading LLMs against one another in fast-paced poker tournaments to measure probabilistic reasoning, bluffing, and decision-making under uncertainty.
Tech Correspondent | Jan 22 | 7 min read