Sales Lab · For model providers
The conversation-quality eval for voice models.
Knowledge, code, and math each have a standard eval: MMLU, HumanEval, GSM8K. There’s no industry-standard eval for “does this model converse like a real buyer?” We built one. Point your next voice model at the same rubric we grade production sales calls against, and get a single conversation-quality score back.
The set
500 scenario-buyer pairs across 22 verticals and 8 inverse modes. Calibrated against Sales Lab’s production scoring set and reviewed by 3 working CROs.
The rubric
10 dimensions: opening, rapport, discovery, active listening, objection handling, value articulation, control of call, tone-pace-confidence, next-step close, staying on message. Plus call-pressure micro-tells: filler density, interrupt frequency, and micro-sighs.
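As a rough illustration of one micro-tell, filler density could be measured as the share of words in a call transcript that are fillers. This is a minimal sketch, not Sales Lab's scorer: the filler list, the function name `filler_density`, and the idea of working from a text transcript rather than audio are all assumptions.

```python
import re

# Hypothetical filler list; the real rubric's definition is not published here.
FILLERS = {"um", "uh", "er", "hmm"}

def filler_density(transcript: str) -> float:
    """Return the fraction of words in the transcript that are fillers."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    filler_count = sum(1 for word in words if word in FILLERS)
    return filler_count / len(words)

if __name__ == "__main__":
    print(filler_density("Um, so we, uh, need pricing"))
```

A word-level ratio is the simplest choice; a per-minute rate would need timestamps, which a plain transcript does not carry.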
The output
A single 0-1000 score per model, a per-dimension breakdown, and a qualitative diff against your prior version. The standard report ships under NDA; a public summary, if you want to publish one, ships separately.
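To make the shape of the output concrete, here is a minimal sketch of how ten per-dimension scores might roll up into one 0-1000 number, with a per-dimension delta against a prior version. The 0-100 scale per dimension, the equal weighting, and the function names are assumptions for illustration; the actual scoring method is not described here.

```python
# Dimension names are taken from the rubric; everything else is hypothetical.
DIMENSIONS = [
    "opening", "rapport", "discovery", "active listening",
    "objection handling", "value articulation", "control of call",
    "tone-pace-confidence", "next-step close", "staying on message",
]

def overall_score(dimension_scores: dict) -> int:
    """Average ten 0-100 dimension scores and scale to 0-1000 (assumed equal weights)."""
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    average = sum(dimension_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return round(average * 10)

def dimension_diff(current: dict, prior: dict) -> dict:
    """Per-dimension change from the prior version to the current one."""
    return {d: current[d] - prior[d] for d in DIMENSIONS}

if __name__ == "__main__":
    prior = {d: 70 for d in DIMENSIONS}
    current = dict(prior, discovery=80)
    print(overall_score(prior), overall_score(current))
    print(dimension_diff(current, prior)["discovery"])
```

In this toy run only the discovery dimension changes, so the delta isolates where the newer version moved.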
The constraint
We score the model on being a BUYER, not a coach. The model has to BE the person on the other end of the phone, not score the person on the phone.
What it costs
Priced per eval, sized to scope. Includes scenario set access, the rubric judge model, a comparative report against your prior version, and an embargoed delivery window. A side-by-side public report is an optional add-on. We share the price on a call.
Get on the schedule
Email eval@saleslab.cloud with the model and your target eval window. We hold a limited number of evaluation slots open each quarter and schedule on a first-come basis.