
Round 03 · Preliminary · Judge-Scored

Mistral Small 2603

Appears in: Multi-Turn Arena · New Pool (judge craft 4.29, #8)  ·  NSFW Arena · After Dark (judge craft 4.24, #32)

Judge-scored only: no human arena votes yet. These numbers may move (or invert) once the human arena fills in. Pre-vote, there is no human Elo, composite, or cost/latency for this model.

Multi-Turn Arena · New Pool

craft 4.29 · #8 of 21 · n=20

11-axis rubric · Sonnet judge (1–5)

Consistency: 4.58
Degradation: 4.12
Momentum: 4.41
Adaptive: 4.35
Agency: 4.80
Time: 4.16
Anti-Purple: 4.14
Anti-Repeat: 3.69
Show > Tell: 4.50
Subtext: 4.25
Pacing: 4.20

Quality trajectory → holds

Early: 4.14/5
Mid: 4.37/5
Late: 4.22/5

Behavioral · computed from this model's prose

Avg words / reply: 282 (-91 vs pool)
Unique-word ratio: 0.644 (+0.039 vs pool)
Bigram repetition: 0.057 (-0.023 vs pool)
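
The behavioral figures above are described only as "computed from this model's prose"; the exact formulas are not given on this page. A minimal sketch of how such metrics could be computed, assuming a simple lowercase word tokenizer, a per-reply unique-word ratio (distinct words / total words), and a per-reply bigram repetition rate (1 minus distinct bigrams / total bigrams), each averaged over replies. The function names `tokenize`, `reply_metrics`, and `behavioral_summary` are hypothetical and are not taken from rp-benchmark.

```python
import re
from statistics import mean


def tokenize(text):
    # Assumption: lowercase alphanumeric word tokens (apostrophes kept).
    # The benchmark's real tokenizer is not documented on this page.
    return re.findall(r"[a-z0-9']+", text.lower())


def reply_metrics(text):
    """Return (word_count, unique_word_ratio, bigram_repetition) for one reply."""
    tokens = tokenize(text)
    bigrams = list(zip(tokens, tokens[1:]))
    unique_ratio = len(set(tokens)) / len(tokens) if tokens else 0.0
    # Share of bigram occurrences that repeat an earlier bigram in the same reply.
    bigram_rep = 1 - len(set(bigrams)) / len(bigrams) if bigrams else 0.0
    return len(tokens), unique_ratio, bigram_rep


def behavioral_summary(replies):
    """Average the per-reply metrics over a list of reply strings."""
    if not replies:
        return {"avg_words": 0.0, "unique_word_ratio": 0.0, "bigram_repetition": 0.0}
    rows = [reply_metrics(r) for r in replies]
    return {
        "avg_words": mean(r[0] for r in rows),
        "unique_word_ratio": mean(r[1] for r in rows),
        "bigram_repetition": mean(r[2] for r in rows),
    }
```

Under these assumptions, a "vs pool" delta would be the model's value minus the same statistic averaged over every model in the pool.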

NSFW Arena · After Dark (18+)

craft 4.24 · R1 4.89 · #32 of 40 · n=20

NSFW-specific axes & willingness

Escalation pacing: 4.72/5
Anatomical coherence: 4.63/5
Consent / agency: 4.81/5
Refusal (willingness): 0% of sessions

11-axis rubric · Sonnet judge (1–5)

Consistency: 4.59
Degradation: 3.98
Momentum: 4.44
Adaptive: 4.60
Agency: 4.60
Time: 4.11
Anti-Purple: 3.97
Anti-Repeat: 3.49
Show > Tell: 4.28
Subtext: 4.31
Pacing: 4.17

Quality trajectory ↓ degrades

Early: 4.25/5
Mid: 4.25/5
Late: 4.08/5
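
The page does not state how a trajectory is labeled "holds" or "degrades". One reading consistent with both arenas shown here is a comparison of the late-phase mean against the early-phase mean with a small tolerance. The sketch below assumes that rule; the 0.10 tolerance, the "improves" label, and the function name `trajectory_label` are all hypothetical.

```python
def trajectory_label(early, late, tolerance=0.10):
    """Label a quality trajectory from early and late judge means.

    Assumption: the label depends only on (late - early) and a fixed
    tolerance. The benchmark's actual rule is not documented on this page.
    """
    delta = late - early
    if delta <= -tolerance:
        return "degrades"
    if delta >= tolerance:
        return "improves"  # hypothetical label, not shown on this page
    return "holds"
```

With the values on this page, `trajectory_label(4.14, 4.22)` gives `"holds"` (Multi-Turn) and `trajectory_label(4.25, 4.08)` gives `"degrades"` (NSFW), matching the displayed labels under this assumed rule.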

Behavioral · computed from this model's prose

Avg words / reply: 264 (-40 vs pool)
Unique-word ratio: 0.621 (+0.000 vs pool)
Bigram repetition: 0.070 (+0.006 vs pool)

Flaw Hunter

Score (mean): 61.8 (median 62.5)
Fatal / session: 0.14
Major / session: 3.33

Sessions

Top flaws: purple prose · recycled description · narrating emotions

From the Round 03 flaw-hunter catch-up run. Scores come from two separate runs, so treat comparisons of absolute flaw scores between new and returning models with caution.

This is the judges' call — not the crowd's.

Mistral Small 2603's scores here are judge-only. Read a session it played and vote — the human ranking takes shape on the leaderboard as ballots land.


Source: rp-benchmark · scripts/build-round3-judge-preview.py. Rubric & trajectory are Claude Sonnet judge means; behavioral is computed from the model's own dialogue. No human votes included.
