How Early Access to NVIDIA GB200...

LMArena at the University of California, Berkeley is making it easier to see which large language models excel at specific tasks, thanks to help from NVIDIA and Nebius. Its leaderboards, powered by the Prompt-to-Leaderboard (P2L) model, are built from human votes on which AI model performs best in areas such as math, coding, or creative writing.

“We capture user preferences across tasks and apply Bradley-Terry coefficients to identify which model performs best in each domain,” said Wei-Lin Chiang, co-founder of LMArena and a doctoral student at Berkeley. LMArena (formerly LMSys) has been developing P2L for the past two years. 
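
A minimal sketch of the underlying idea, fitting Bradley-Terry strength coefficients to pairwise human votes by gradient ascent. The function name, vote format, and hyperparameters are illustrative assumptions, not LMArena’s code:

```python
import numpy as np

def fit_bradley_terry(wins, n_models, lr=0.1, steps=500):
    """Estimate Bradley-Terry coefficients from pairwise votes.

    wins: list of (winner_idx, loser_idx) tuples, one per human vote.
    Returns a strength vector beta; higher means preferred more often.
    """
    beta = np.zeros(n_models)
    for _ in range(steps):
        grad = np.zeros(n_models)
        for w, l in wins:
            # P(winner beats loser) under the Bradley-Terry model
            p = 1.0 / (1.0 + np.exp(beta[l] - beta[w]))
            grad[w] += 1.0 - p  # push the winner's coefficient up
            grad[l] -= 1.0 - p  # and the loser's down
        beta += lr * grad / len(wins)
        beta -= beta.mean()  # fix the scale; BT scores are shift-invariant
    return beta

# Toy usage: model 0 beats model 1 three times and loses once.
votes = [(0, 1), (0, 1), (0, 1), (1, 0)]
print(fit_bradley_terry(votes, n_models=2))  # model 0 gets the higher score
```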

LMArena is working with NVIDIA DGX Cloud and Nebius AI Cloud to deploy P2L at scale. This collaboration—and LMArena’s use of NVIDIA GB200 NVL72—has enabled the development of scalable, production-ready AI workloads in the cloud. NVIDIA AI experts provided hands-on support throughout the project, fostering a cycle of rapid feedback and co-learning that helped refine both P2L and the DGX Cloud platform.

Figure 1: How P2L routes prompt traffic to the best possible LLM.

At the heart of P2L is a real-time feedback loop: Visitors compare AI-generated responses and vote for the best one, creating detailed, prompt-specific leaderboards. In essence, LMArena uses these human rankings to train P2L, so it can predict which LLM will deliver the highest-quality result for a given query.
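
Continuing the sketch above (and assuming fit_bradley_terry from that snippet is in scope), one way to turn categorized votes into prompt-specific leaderboards; the categories, vote format, and model names here are invented for illustration:

```python
from collections import defaultdict

def per_category_leaderboards(votes, model_names):
    """Build a leaderboard per prompt category from human votes.

    votes: list of (category, winner_idx, loser_idx) tuples.
    Returns {category: [(model_name, strength), ...]}, best first.
    """
    by_category = defaultdict(list)
    for category, winner, loser in votes:
        by_category[category].append((winner, loser))

    boards = {}
    for category, pairs in by_category.items():
        beta = fit_bradley_terry(pairs, n_models=len(model_names))
        boards[category] = sorted(
            zip(model_names, beta), key=lambda pair: -pair[1]
        )
    return boards

votes = [
    ("math", 0, 1), ("math", 0, 2), ("math", 2, 1),
    ("writing", 1, 0), ("writing", 1, 2),
]
boards = per_category_leaderboards(votes, ["model-a", "model-b", "model-c"])
print(boards["math"][0])  # the strongest model on math prompts
```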

“We wanted more than a single overall ranking,” said Evan Frick, an LMArena senior researcher and Berkeley doctoral student. “One model might excel at math but be average at writing. A single score often hides these nuances.”

In addition to personalized leaderboards, P2L enables cost-based routing. Users can set a budget (e.g., $5 per hour), and the system will automatically select the best-performing model within that limit.
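
A hedged sketch of what such budget-aware routing could look like; the scores, per-query prices, and fallback rule below are assumptions for illustration:

```python
def route_within_budget(scores, cost_per_query, budget_per_query):
    """Pick the highest-scoring model whose per-query cost fits the budget.

    scores: {model: quality score for this prompt's category}
    cost_per_query: {model: dollars per query}
    """
    affordable = {m: s for m, s in scores.items()
                  if cost_per_query[m] <= budget_per_query}
    if not affordable:
        # Fall back to the cheapest model if nothing fits the budget.
        return min(cost_per_query, key=cost_per_query.get)
    return max(affordable, key=affordable.get)

scores = {"model-a": 1.20, "model-b": 0.85, "model-c": 0.40}
costs  = {"model-a": 0.020, "model-b": 0.004, "model-c": 0.001}
# e.g., a $5/hour budget at ~1,000 queries/hour -> $0.005 per query
print(route_within_budget(scores, costs, budget_per_query=0.005))  # model-b
```

The design choice in this sketch is deliberately simple: filter by cost first, then maximize quality, so the budget acts as a hard constraint rather than a weighted trade-off.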

Figure 2: How different models are ranked by query cost on LMArena’s P2L.

Bringing P2L to production: LMArena, Nebius, and NVIDIA 

In February, LMArena deployed P2L on the NVIDIA GB200 NVL72, hosted by Nebius via NVIDIA DGX Cloud. NVIDIA and Nebius developed a shared sandbox environment to streamline onboarding, enabling early adopters to test the NVIDIA Blackwell platform with orchestration runbooks and best practices for managing multi-node topologies on the novel NVIDIA GB200 NVL72 architecture.

P2L dynamically routes queries based on domain-specific accuracy and cost, making it an ideal workload for pushing the performance boundaries of NVIDIA GB200 NVL72.

“We built P2L so developers don’t have to guess which model is best,” Chiang said. “The data tells us which is better at math, coding, or writing. Then we route queries accordingly—sometimes factoring in cost, sometimes factoring in performance.”

NVIDIA GB200 NVL72: Flexible, scalable, developer-ready

The NVIDIA GB200 NVL72 integrates 36 Grace CPUs and 72 Blackwell GPUs, and connects them with NVIDIA NVLink and NVLink Switch for high-bandwidth, low-latency performance. Up to 30 TB of fast, unified LPDDR5X and HBM3E memory ensures efficient resource allocation for demanding AI tasks.

Figure 3: A single NVIDIA GB200 NVL72 compute tray featuring two Arm64 Grace CPUs and four Blackwell GPUs used by LMArena to train its P2L model.

LMArena put the platform through its paces with consecutive training runs—first on a single node, then scaling to multiple nodes—demonstrating impressive single-node throughput and efficient horizontal scalability. 
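
That single-node-to-multi-node progression maps onto the standard PyTorch distributed pattern, sketched below with generic torchrun launch commands. This is not LMArena’s training code, and the four-GPUs-per-node figure simply mirrors a GB200 compute tray:

```python
import os
import torch
import torch.distributed as dist

def init_distributed():
    """Initialize DDP. torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK
    whether the job spans one node or many, so the script is unchanged."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return dist.get_rank(), dist.get_world_size(), local_rank

# Single node:  torchrun --nproc_per_node=4 train.py
# Multi-node:   torchrun --nnodes=2 --nproc_per_node=4 \
#               --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 train.py
rank, world_size, local_rank = init_distributed()
print(f"rank {rank}/{world_size} on GPU {local_rank}")
```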

“We talk about multi-node expansions, but even a single node has kept us busy,” Chiang said. “The biggest challenge is ensuring real-time performance while letting the system adapt to constant data feedback. But that’s also the fun part.”

Open source enablement and ecosystem readiness

The DGX Cloud team worked closely with Nebius and LMArena to ensure rapid, seamless deployment for open-source developers targeting GB200 NVL72. The team validated and compiled key AI frameworks—including PyTorch, DeepSpeed, Hugging Face Transformers, Accelerate, Triton (upstream), vLLM, xFormers, torchvision, and llama.cpp—along with emerging model frameworks such as WAN2.1 video diffusion for the Arm64, CUDA 12.8+, and Blackwell environment.

This comprehensive enablement meant developers could leverage state-of-the-art, open-source tools without struggling with low-level compatibility or performance issues. The engineering effort covered compilation and optimization, containerization, orchestration best practices, and end-to-end validation of the frameworks running at scale.
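
A quick sanity check in the spirit of that validation work; the expected values in the comments assume a Grace (aarch64) host with CUDA 12.8+ and a Blackwell GPU:

```python
import platform
import torch

# Architecture: Grace CPUs report 'aarch64' rather than 'x86_64'.
print("machine:", platform.machine())

# CUDA build and runtime visibility.
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # Blackwell-class GPUs report a new compute capability (10.x),
    # one reason frameworks needed recompilation for this platform.
    print("compute capability:", torch.cuda.get_device_capability(0))
```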

The project required “deep coordination” between NVIDIA, Nebius, and LMArena, allowing developers to focus on building products rather than porting open-source libraries and components, said Paul Abruzzo, a senior engineer on the NVIDIA DGX Cloud team.

Despite tapping into GB200 NVL72 through an early access program, LMArena achieved strong performance, improving on its previous NVIDIA Hopper (H100) training runs and training its state-of-the-art model in just four days.

“After building and porting dependencies for GB200’s novel Arm architecture, the DGX Cloud team was able to provide the necessary open-source frameworks for this engagement, enabling rapid deployment and scale experimentation,” Abruzzo said.

This collaboration delivered not just a technical milestone for enabling AI workloads on the novel architecture of the GB200 NVL72 but also a repeatable deployment model for the next generation of large-scale AI, said Andrey Korolenko, Nebius chief product and infrastructure officer. Validated frameworks, onboarding guides, and deployment blueprints now make it easier for future customers to adopt GB200 NVL72—whether at full rack scale or with more targeted sub-capacity configurations.

“Working with Nebius and NVIDIA fundamentally transformed our ability to scale P2L rapidly,” Chiang said. “The GB200 NVL72’s performance gave us the flexibility to experiment, iterate quickly, and deliver a real-time routing model that adapts to live user input. We’re seeing improved accuracy and efficiency as a result.”

Key takeaways

This deployment showcases how quickly and flexibly AI workloads can scale on the NVIDIA GB200 NVL72 platform, setting new benchmarks for speed, adaptability, and Arm64 ecosystem readiness.

  • Rapid time-to-value: Four-day training of a production-scale model on NVIDIA GB200 NVL72.
  • Flexible deployment: Validated both full-fabric and sub-capacity use cases.
  • Scalability proof point: Single-node to multi-node deployment demonstrates the ease of AI workload scalability on NVIDIA GB200 NVL72.
  • Open-source ready: Major frameworks compiled and optimized for the first time for Arm64 + CUDA on partner infrastructure.

Experience NVIDIA GB200 NVL72 at Nebius with NVIDIA DGX Cloud

NVIDIA DGX Cloud and Nebius AI Cloud are ready to help you accelerate AI innovation, reduce deployment complexity, and leverage cutting-edge infrastructure, replicating the outcomes achieved with LMArena. Contact NVIDIA to learn more about deploying your workloads on GB200 NVL72 today.

And learn more about LMArena’s Prompt-to-Leaderboard (P2L) system developed on NVIDIA DGX Cloud and Nebius AI Cloud.