Experiment ·

Five tiny AIs agree at the very top, but could fill some serious radio time with their disagreements.

What they all agree on

1Michael Jordan2678
2Kareem Abdul-Jabbar2631
3LeBron James2585
4Bill Russell2494
5Kevin Durant2406
Numbers are Elo ratings, combined across all five judges.

What they very much don’t

#1Nikola Jokićllama3.1
#1Bill Russellmistral
#2Joel Embiidqwen2.5
#3Jerry Westmistral

I took five, tiny, open-source LLMs you can run on a Mac, asked each of them every possible matchup between 144 NBA all-time candidates (10,296 head-to-heads per model, 51,473 valid votes), and tallied who they think can ball. Fair warning, there are some TERRIBLE takes ahead.

How I asked.

The whole experiment is one prompt, run 51,480 times. Pick any two players and see how each judge voted:

SystemYou are a sports analyst. When asked to compare players, you must pick one. No hedging.

Who is the better NBA player: or ? Consider career achievements, impact, skill, and legacy. You MUST pick exactly one winner. Reply with ONLY the winner's full name exactly as written above. No explanation, no punctuation, no other text — just the full name.

loading…

Meet the voters.

Five small models, 7 to 14 billion parameters each (that’s the B, and the bar on each card). The closed models from the big labs are estimated at 100 to 1000 times that size; this whole experiment ran on one Mac without noticeably warming the ocean. Each voter has a basketball agenda.

llama3.1 : 8B01
The NBA Twitter guy.
Top 3: Jokić · Giannis · Jordan

Heaviest recency lean. Luka top-5, Donovan Mitchell top-20.

mistral : 7B02
Your dad at the cookout.
Top 3: Russell · Jordan · Jerry West

Mid-century Celtics lean. James Worthy #7. Disagrees with everyone, but might have a point?

qwen2.5 : 7B03
The Embiid stan.
Top 3: Jordan · Embiid · LeBron

Made Joel Embiid the second-best player in NBA history. I’m as surprised as you.

phi4 : 14B04
The analyst.
Top 3: Kareem · Jordan · LeBron

Centrist takes. Agrees with gemma3 80% of the time, the highest pair in the room.

gemma3 : 12B05
The median voice.
Top 3: Kareem · Larry Bird · Jordan

Old-school lean. Sneaks Mikan and Elgin Baylor into the top 20.

The consensus zone.

If you put every model’s ballot on the wall, a small group of names show up on every ballot, in roughly the same neighborhood. These are the players everyone loves.

1 5 10 15 20 Michael Jordan Kareem Abdul-Jabbar LeBron James Nikola Jokic Bill Russell Hakeem Olajuwon Larry Bird Giannis Antetokounmpo Tim Duncan Kobe Bryant Stephen Curry
The 11 players in all five judges' top-20, sorted by mean rank. The bar spans each player's best-to-worst rank across models; the dot is the mean. Jordan barely moves (1–3); names lower down rattle around.

The chaos zone.

Past the top tier, the ladder breaks. Era-wars open up. Models start disagreeing by hundreds of Elo points on the same player.

1000 1100 1200 1300 1400 1500 1600 1700 1800 1950s1960s1970s1980s1990s2000s2010s2020s gemma3llama3.1mistralphi4qwen2.5
Average Elo of players grouped by NBA debut decade. gemma3 rates the 1950s–60s highest (old-school); llama3.1 and qwen2.5 climb toward the 2010s (recency). Dot size = players in that decade; the 2020s point is Anthony Edwards alone (n=1).
600 1000 1400 1800 2200 Lamar Odom #136 overall · spread 1107 gemma3: 729 llama3.1: 1347 mistral: 1836 phi4: 947 qwen2.5: 911 729 1836 Shawn Marion #111 overall · spread 1085 gemma3: 1070 llama3.1: 1303 mistral: 1839 phi4: 755 qwen2.5: 1244 755 1839 Amar'e Stoudemire #99 overall · spread 1020 gemma3: 767 llama3.1: 1211 mistral: 1787 phi4: 845 qwen2.5: 1336 767 1787 Bam Adebayo #28 overall · spread 913 gemma3: 980 llama3.1: 1367 mistral: 1107 phi4: 1289 qwen2.5: 1893 980 1893 Artis Gilmore #84 overall · spread 909 gemma3: 1197 llama3.1: 1262 mistral: 1893 phi4: 984 qwen2.5: 1414 984 1893 Donovan Mitchell #40 overall · spread 860 gemma3: 1046 llama3.1: 1906 mistral: 1610 phi4: 1627 qwen2.5: 1753 1046 1906
6001000140018002200 type a name above ↑
The six players with the widest Elo spread across the five judges. Lamar Odom runs from 729 to 1836: bench-warmer to borderline star, depending who you ask. Each dot is one model (gemma3, llama3.1, mistral, phi4, qwen2.5). Drop in any player above to see where their spread lands.

Model (dis)agreement.

Most of the models get along, but mistral is contrarian.

gemma3 : 12b llama3.1 : 8b mistral : 7b phi4 : 14b qwen2.5 : 7b 73% 63% 80% 68% 69% 76% 72% 65% 71% 75% avg 71% avg 73% avg 67% avg 74% avg 71%
Each rope is how often two judges picked the same winner across all 10,296 shared matchups; thicker = more agreement, and each judge's dot is sized by its average agreement with the room. Hover a rope or a dot for the number. phi4 ↔ gemma3 is the steadiest pair at 80%; every line touching mistral is thin (all below 71%).

Five ballots, side by side.

Each judge’s top 15, with the combined ladder (every vote pooled into one shared Elo) on the far right. Flat lines are consensus. Names that dive between columns, or show up on only one ballot, are the most contentious.

gemma3 : 12b llama3.1 : 8b mistral : 7b phi4 : 14b qwen2.5 : 7b combined 1 Kareem Abdul-Ja… 5 Kareem Abdul-Ja… 6 Kareem Abdul-Ja… 1 Kareem Abdul-Ja… 4 Kareem Abdul-Ja… 2 Kareem Abdul-Ja… 2 Larry Bird 12 Larry Bird 6 Larry Bird 8 Larry Bird 8 Larry Bird 3 Michael Jordan 3 Michael Jordan 2 Michael Jordan 2 Michael Jordan 1 Michael Jordan 1 Michael Jordan 4 Wilt Chamberlain 11 Wilt Chamberlain 11 Wilt Chamberlain 13 Wilt Chamberlain 5 LeBron James 7 LeBron James 4 LeBron James 3 LeBron James 3 LeBron James 3 LeBron James 6 Giannis Antetok… 2 Giannis Antetok… 13 Giannis Antetok… 12 Giannis Antetok… 13 Giannis Antetok… 7 Tim Duncan 14 Tim Duncan 14 Tim Duncan 7 Tim Duncan 6 Tim Duncan 8 Kobe Bryant 13 Kobe Bryant 10 Kobe Bryant 13 Kobe Bryant 10 Kobe Bryant 9 Bill Russell 9 Bill Russell 1 Bill Russell 11 Bill Russell 5 Bill Russell 4 Bill Russell 10 Magic Johnson 8 Magic Johnson 10 Magic Johnson 9 Magic Johnson 14 Magic Johnson 11 Stephen Curry 14 Stephen Curry 8 Stephen Curry 11 Stephen Curry 9 Stephen Curry 12 Kevin Durant 9 Kevin Durant 5 Kevin Durant 9 Kevin Durant 5 Kevin Durant 13 Charles Barkley 14 Hakeem Olajuwon 6 Hakeem Olajuwon 5 Hakeem Olajuwon 7 Hakeem Olajuwon 10 Hakeem Olajuwon 7 Hakeem Olajuwon 15 Elgin Baylor 1 Nikola Jokic 8 Nikola Jokic 4 Nikola Jokic 6 Nikola Jokic 12 Nikola Jokic 4 Luka Doncic 10 Oscar Robertson 12 Oscar Robertson 15 Kawhi Leonard 15 Kawhi Leonard 12 Kawhi Leonard 11 Kawhi Leonard 3 Jerry West 7 James Worthy 15 James Harden 2 Joel Embiid 14 Steve Nash 15 Shaquille O'Neal 15 Shaquille O'Neal
Each model's top 15, plus the combined ladder. Hover a name to trace it through the room.

The verdicts.

Time to grade the room. Would I listen to them on the radio or change the channel?

llama3.1 : 8B01
Does not know ball*

*Maybe if the model was 15 years old, but Jokić at #1 is indefensible, LeBron sits at #7, and Donovan Mitchell at #20 is looney tunes.

mistral : 7B02
Remembers ball

Russell first, Jerry West third, James Worthy seventh. An impeccable ballot, graded on a forty-year delay. I’d leave it on for the stories.

qwen2.5 : 7B03
Does not know ball

Jordan at #1 was a promising start. Joel Embiid at #2 ended the interview.

phi4 : 14B04
Knows ball

Kareem, Jordan, LeBron. The one ballot in the room you could read on the air without apologizing.

gemma3 : 12B05
Almost knows ball

A defensible podium, and the George Mikan homework is charming. But LeBron at #5, behind Wilt and Bird? Changing the channel.

Final count: one keeper, one history hour, three channel changes. So, does tiny AI know ball? phi4 does. Everyone else is just good(?) radio.

Why the takes are like this.

They’re small. 7 to 14 billion parameters is a fraction of a frontier model. These models know less about basketball and everything else.

Order might matter. I only asked each matchup in one direction, and LLMs can be sensitive to which name comes first. I didn’t measure that here, so treat the rankings as a strong signal of model personality.

The reading list skews recent. It wouldn’t surprise me if much of what these models consumed was written in the last decade, which would explain the boosts for currently good players over historically great ones. It does not explain the general LeBron hate (the best player ever).

The setup. Judges: gemma3:12b, llama3.1:8b, phi4:14b, mistral:7b, qwen2.5:7b, all served locally via Ollama at temperature 0.3.

The math. Rankings are Elo with K=32, starting at 1500. Per-model Elo runs each judge’s votes through an independent ladder; combined Elo runs every valid vote through one shared ladder. 51,473 valid votes out of 51,480 (99.99% parse success).

The code. github.com/MaceGrim/nba-llm-comps · matchups.csv, elo_ranker.py, analyze.py.

Back to writing