Sales Lab · For model providers
The conversation-quality eval for voice models.
MMLU, HumanEval, and GSM8K cover knowledge, code, and math, but there’s no industry-standard eval for “does this model converse like a real buyer?” We built one. Anthropic, OpenAI, Mistral, Meta: pay us to test your next voice model against the same rubric every Fortune 500 sales org grades its reps with.
The set
500 scenario-buyer pairs across 22 verticals and 8 inverse modes. Calibrated against Sales Lab’s production scoring set and reviewed by 3 working CROs.
The rubric
10 dimensions: opening, rapport, discovery, active listening, objection handling, value articulation, control of call, tone-pace-confidence, next-step close, staying on message. Plus realism micro-tells (filler density, interrupt frequency, micro-sighs).
The output
Single 0-1000 score per model + per-dimension breakdown + qualitative diff vs prior version. Standard report ships under NDA; public summary (if you want to publish) ships separately.
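For a rough sense of how a per-dimension breakdown could roll up into the single 0-1000 number, here is a minimal sketch. It is an illustration only, not Sales Lab's scoring method: the equal weighting, the 0.0-1.0 per-dimension scale, and the function names `overall_score` and `diff_vs_prior` are all assumptions. Only the ten dimension names come from the rubric above.

```python
from statistics import mean

# The ten rubric dimensions named on this page.
DIMENSIONS = [
    "opening", "rapport", "discovery", "active listening",
    "objection handling", "value articulation", "control of call",
    "tone-pace-confidence", "next-step close", "staying on message",
]


def overall_score(per_dimension: dict[str, float]) -> int:
    """Roll per-dimension scores up into one 0-1000 number.

    Assumptions (hypothetical): each dimension is scored on 0.0-1.0
    and all ten dimensions carry equal weight.
    """
    missing = [d for d in DIMENSIONS if d not in per_dimension]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0.0 <= per_dimension[d] <= 1.0:
            raise ValueError(f"{d} score out of range: {per_dimension[d]}")
    return round(1000 * mean(per_dimension[d] for d in DIMENSIONS))


def diff_vs_prior(current: dict[str, float], prior: dict[str, float]) -> dict[str, float]:
    """Per-dimension change between two model versions (current minus prior)."""
    return {d: round(current[d] - prior[d], 3) for d in DIMENSIONS}


if __name__ == "__main__":
    prior = {d: 0.5 for d in DIMENSIONS}
    current = {d: 0.75 for d in DIMENSIONS}
    print(overall_score(prior), overall_score(current))
    print(diff_vs_prior(current, prior)["discovery"])
```

A weighted mean would slot in the same way if some dimensions matter more than others; the equal-weight version is just the simplest starting point.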
The constraint
We score the model on being a BUYER, not a coach. The model has to BE the person on the other end of the phone, not score the person on the phone.
What it costs
$150k per full eval. Includes scenario set access, rubric judge model, comparative report vs your prior version, embargoed delivery window. Side-by-side public report +$40k.
Get on the schedule
Email eval@saleslab.cloud with the model and the target eval window. We hold two quarterly slots open for new entrants; renewing customers (Anthropic / OpenAI today) get priority.