Five tiny AIs agree at the very top, but could fill some serious radio time with their disagreements.
Mason Grimshaw
Jun 12, 2026 / Data & AI · LLMs · Basketball
What they all agree on
1. Michael Jordan (2678)
2. Kareem Abdul-Jabbar (2631)
3. LeBron James (2585)
4. Bill Russell (2494)
5. Kevin Durant (2406)
Numbers are Elo ratings, combined across all five judges.
What they very much don’t
#1 Nikola Jokić (llama3.1)
#1 Bill Russell (mistral)
#2 Joel Embiid (qwen2.5)
#3 Jerry West (mistral)
I took five tiny open-source LLMs you can run on a Mac, asked each of them to judge every possible matchup between 144 NBA all-time candidates (10,296 head-to-heads per model, 51,473 valid votes out of 51,480), and tallied who they think can ball. Fair warning: there are some TERRIBLE takes ahead.
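For anyone checking the math, here is a minimal sketch of where those counts come from. It is not the code behind the experiment: the roster below is a placeholder list of 144 made-up labels, and the variable names are only for illustration.

```python
from itertools import combinations
from math import comb

# Placeholder roster: stand-in labels, not the real list of 144 candidates.
players = [f"Player {i}" for i in range(1, 145)]

# Each unordered pair appears exactly once, so every matchup is asked in one direction.
matchups = list(combinations(players, 2))

n_models = 5
print(len(matchups))             # 10296, same as comb(144, 2)
print(len(matchups) * n_models)  # 51480 prompts across the five judges
```

The gap between 51,480 prompts and 51,473 valid votes is the handful of replies that did not count as a vote.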
How I asked.
The whole experiment is one prompt, run 51,480 times:
System: You are a sports analyst. When asked to compare players, you must pick one. No hedging.
Who is the better NBA player: [first player] or [second player]?
Consider career achievements, impact, skill, and legacy. You MUST pick exactly one winner. Reply with ONLY the winner's full name exactly as written above. No explanation, no punctuation, no other text — just the full name.
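A rough sketch of how a prompt like this could be assembled and how a reply could be checked. Two things here are assumptions rather than details of the experiment: the exact line layout of the question, and the rule for what counts as a valid vote (an exact name match after trimming whitespace, quotes, and a trailing period). `build_prompt` and `parse_vote` are hypothetical helper names.

```python
from typing import Optional

SYSTEM = (
    "You are a sports analyst. When asked to compare players, "
    "you must pick one. No hedging."
)

# \u2014 is the dash character used in the original prompt text.
INSTRUCTIONS = (
    "Consider career achievements, impact, skill, and legacy. "
    "You MUST pick exactly one winner. "
    "Reply with ONLY the winner's full name exactly as written above. "
    "No explanation, no punctuation, no other text \u2014 just the full name."
)


def build_prompt(first: str, second: str) -> str:
    """Fill the two player names into the question (layout is assumed)."""
    return f"Who is the better NBA player: {first} or {second}?\n{INSTRUCTIONS}"


def _norm(text: str) -> str:
    """Trim whitespace, surrounding quotes and periods, and ignore case."""
    return text.strip().strip(".\"'").casefold()


def parse_vote(reply: str, first: str, second: str) -> Optional[str]:
    """Return the chosen name, or None if the reply names neither player."""
    cleaned = _norm(reply)
    for name in (first, second):
        if cleaned == _norm(name):
            return name
    return None
```

Under this assumed rule, a reply like "Both are great" returns `None` and would be dropped from the tally.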
Meet the voters.
Five small models, 7 to 14 billion parameters each (that’s the B, and the bar on each card). The closed models from the big labs are estimated at 100 to 1,000 times that size; this whole experiment ran on one Mac without noticeably warming the ocean. Each voter has a basketball agenda.
01 · llama3.1 : 8B
The NBA Twitter guy.
Top 3: Jokić · Giannis · Jordan
Heaviest recency lean. Luka top-5, Donovan Mitchell top-20.
02 · mistral : 7B
Your dad at the cookout.
Top 3: Russell · Jordan · Jerry West
Mid-century Celtics lean. James Worthy #7. Disagrees with everyone, but might have a point?
03 · qwen2.5 : 7B
The Embiid stan.
Top 3: Jordan · Embiid · LeBron
Made Joel Embiid the second-best player in NBA history. I’m as surprised as you.
04 · phi4 : 14B
The analyst.
Top 3: Kareem · Jordan · LeBron
Centrist takes. Agrees with gemma3 80% of the time, the highest pair in the room.
05 · gemma3 : 12B
The median voice.
Top 3: Kareem · Larry Bird · Jordan
Old-school lean. Sneaks Mikan and Elgin Baylor into the top 20.
The consensus zone.
If you put every model’s ballot on the wall, a small group of names shows up on all of them, in roughly the same neighborhood. These are the players everyone loves.
The 11 players in all five judges' top-20, sorted by mean rank. The bar spans each player's best-to-worst rank across models; the dot is the mean. Jordan barely moves (1–3); names lower down rattle around.
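The filter behind a chart like this is small enough to sketch. The function below assumes each judge's ballot is a dict mapping player to rank (1 is best); that data shape and the name `consensus` are illustrative choices, and the toy ballots use made-up labels rather than real results.

```python
from statistics import mean


def consensus(ranks_by_model, cutoff=20):
    """Players inside every judge's top `cutoff`, sorted by mean rank.

    ranks_by_model[model][player] = rank, where 1 is best.
    Returns a list of (player, mean_rank, best_rank, worst_rank).
    """
    ballots = list(ranks_by_model.values())
    # Keep only players who make the cutoff on every single ballot.
    shared = set.intersection(
        *({p for p, r in ballot.items() if r <= cutoff} for ballot in ballots)
    )
    rows = []
    for player in shared:
        ranks = [ballot[player] for ballot in ballots]
        rows.append((player, mean(ranks), min(ranks), max(ranks)))
    return sorted(rows, key=lambda row: row[1])


# Toy ballots with made-up labels, using a top-2 cutoff to keep it short.
toy = {
    "judge_1": {"A": 1, "B": 2, "C": 3},
    "judge_2": {"A": 2, "B": 1, "C": 3},
    "judge_3": {"A": 1, "B": 3, "C": 2},
}
print(consensus(toy, cutoff=2))  # only "A" is in every judge's top 2
```

The best and worst ranks are the two ends of the bar, and the mean is the dot.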
The chaos zone.
Past the top tier, the ladder breaks. Era-wars open up. Models start disagreeing by hundreds of Elo points on the same player.
Average Elo of players grouped by NBA debut decade. gemma3 rates the 1950s–60s highest (old-school); llama3.1 and qwen2.5 climb toward the 2010s (recency). Dot size = players in that decade; the 2020s point is Anthony Edwards alone (n=1).
The six players with the widest Elo spread across the five judges. Lamar Odom runs from 729 to 1836: bench-warmer to borderline star, depending on who you ask. Each dot is one model (gemma3, llama3.1, mistral, phi4, qwen2.5).
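"Spread" here is just the gap between a player's highest and lowest rating across judges. A minimal sketch, assuming per-judge ratings are stored as dicts of player to Elo (the function name and the toy numbers are made up for illustration):

```python
def widest_spreads(elo_by_model, top_n=6):
    """Return [(player, low, high)] for the players with the biggest gaps.

    elo_by_model[model][player] = that judge's Elo rating for the player.
    """
    # Only compare players that every judge has rated.
    rated_by_all = set.intersection(*(set(r) for r in elo_by_model.values()))
    rows = []
    for player in rated_by_all:
        scores = [ratings[player] for ratings in elo_by_model.values()]
        rows.append((player, min(scores), max(scores)))
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows[:top_n]


# Toy ratings with made-up labels and numbers.
toy_elo = {
    "judge_1": {"A": 1500, "B": 1000},
    "judge_2": {"A": 1520, "B": 1800},
}
print(widest_spreads(toy_elo, top_n=1))

# The Lamar Odom range quoted above works out to this gap:
print(1836 - 729)  # 1107
```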
Model (dis)agreement.
Most of the models get along, but mistral is contrarian.
Each rope shows how often two judges picked the same winner across all 10,296 shared matchups. Thicker means more agreement, and each judge's dot is sized by its average agreement with the room.
phi4 ↔ gemma3 is the steadiest pair at 80%; every line touching mistral is thin (all below 71%).
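The agreement number is a plain match rate. The sketch below assumes each judge's votes are stored as a dict from matchup to winner, and that a matchup is skipped if either judge has no valid vote for it; how the experiment actually handled those few invalid replies is not something this sketch knows. All names are illustrative.

```python
from itertools import combinations


def agreement(votes_a, votes_b):
    """Share of shared matchups where both judges picked the same winner."""
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return float("nan")
    same = sum(votes_a[m] == votes_b[m] for m in shared)
    return same / len(shared)


def agreement_table(votes):
    """One rope per pair of judges: {(judge_a, judge_b): agreement rate}."""
    return {
        (a, b): agreement(votes[a], votes[b])
        for a, b in combinations(sorted(votes), 2)
    }


def room_average(table, judge):
    """A judge's mean agreement with everyone else (the dot size)."""
    rates = [rate for pair, rate in table.items() if judge in pair]
    return sum(rates) / len(rates)


# Toy votes with made-up labels: votes[judge][(first, second)] = winner.
toy_votes = {
    "x": {("A", "B"): "A", ("A", "C"): "A"},
    "y": {("A", "B"): "A", ("A", "C"): "C"},
    "z": {("A", "B"): "B", ("A", "C"): "C"},
}
table = agreement_table(toy_votes)
print(table)
print(room_average(table, "y"))
```

With five judges there are ten pairs, so ten ropes.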
Five ballots, side by side.
Each judge’s top 15, with the combined ladder (every vote pooled into one shared Elo) on the far right. Flat lines are consensus. Names that dive between columns, or show up on only one ballot, are the most contentious.
Each model's top 15, plus the combined ladder.
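For readers who have not met Elo before, here is a minimal sketch of how a pile of head-to-head votes can become a ladder. It uses the standard Elo expected-score formula with sequential updates. The K-factor, the starting rating, and the shuffled processing order are all assumptions picked for illustration, not the settings used for the numbers in this piece.

```python
import random


def expected(rating_a, rating_b):
    """Standard Elo win probability for A against B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_ladder(votes, k=16, start=1500, seed=0):
    """Turn (winner, loser) votes into a ladder sorted from best to worst.

    k, start, and the shuffle seed are illustrative defaults (assumptions).
    """
    ratings = {}
    order = list(votes)
    # Sequential Elo depends on order, so shuffle with a fixed seed.
    random.Random(seed).shuffle(order)
    for winner, loser in order:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        delta = k * (1 - expected(rw, rl))  # upsets move ratings more
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Toy votes with made-up labels: A beats everyone, C loses to everyone.
toy = [("A", "B")] * 3 + [("A", "C")] * 3 + [("B", "C")] * 3
for name, rating in elo_ladder(toy):
    print(name, round(rating))
```

A per-judge ladder would feed in one judge's votes; a pooled ladder would feed in every judge's votes as one list.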
The verdicts.
Time to grade the room. Would I listen to them on the radio or change the channel?
01 · llama3.1 : 8B
Does not know ball*
*Maybe forgivable if the model were 15 years old, but Jokić at #1 is indefensible, LeBron sits at #7, and Donovan Mitchell at #20 is Looney Tunes.
02 · mistral : 7B
Remembers ball
Russell first, Jerry West third, James Worthy seventh. An impeccable ballot, graded on a forty-year delay. I’d leave it on for the stories.
03 · qwen2.5 : 7B
Does not know ball
Jordan at #1 was a promising start. Joel Embiid at #2 ended the interview.
04 · phi4 : 14B
Knows ball
Kareem, Jordan, LeBron. The one ballot in the room you could read on the air without apologizing.
05 · gemma3 : 12B
Almost knows ball
A defensible podium, and the George Mikan homework is charming. But LeBron at #5, behind Wilt and Bird? Changing the channel.
Final count: one keeper, one history hour, three channel changes. So, does tiny AI know ball? phi4 does. Everyone else is just good(?) radio.
Why the takes are like this.
They’re small. 7 to 14 billion parameters is a fraction of a frontier model's size. These models know less about basketball, and about everything else.
Order might matter. I only asked each matchup in one direction, and LLMs can be sensitive to which name comes first. I didn’t measure that here, so treat the rankings as a strong signal of model personality rather than a precise measurement.
The reading list skews recent. It wouldn’t surprise me if much of what these models consumed was written in the last decade, which would explain the boosts for currently good players over historically great ones. It does not explain the general hate for LeBron (the best player ever).
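On the order question, one way it could be measured is to ask every matchup twice with the names swapped and count how often the winner changes. This was not done for this piece, so the sketch below is hypothetical: `ask` stands in for whatever function sends the prompt to a model and returns the winner's name, and the two fake judges exist only to show the extremes.

```python
def flip_rate(matchups, ask):
    """Fraction of matchups whose winner changes when the names swap places.

    ask(first, second) is a hypothetical callable returning the winner's name.
    """
    flips = 0
    total = 0
    for a, b in matchups:
        forward = ask(a, b)   # a is named first
        backward = ask(b, a)  # b is named first
        total += 1
        if forward != backward:
            flips += 1
    return flips / total if total else 0.0


# Two fake judges with made-up behavior, to show the extremes.
def always_first(first, second):
    return first  # pure position bias


def alphabetical(first, second):
    return min(first, second)  # ignores position entirely


pairs = [("A", "B"), ("A", "C"), ("B", "C")]
print(flip_rate(pairs, always_first))  # 1.0
print(flip_rate(pairs, alphabetical))  # 0.0
```

A real judge would land somewhere between those two, and doubling the matchups doubles the run time.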