Sales Lab · For model providers

The conversation-quality eval for voice models.

MMLU, HumanEval, and GSM8K cover knowledge, code, and math, but there’s no industry-standard eval for “does this model converse like a real buyer?” We built one. Point your next voice model at the same rubric we grade production sales calls against, and get a single conversation-quality score back.

The set

500 scenario-buyer pairs across 22 verticals and 8 inverse modes, calibrated against Sales Lab’s production scoring set and reviewed by 3 working CROs.

The rubric

10 dimensions: opening, rapport, discovery, active listening, objection handling, value articulation, control of call, tone-pace-confidence, next-step close, staying on message. Plus call-pressure micro-tells: filler density, interrupt frequency, and micro-sighs.
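As a rough illustration of one micro-tell, here is a minimal sketch of filler density, assuming it means the share of spoken words that are fillers. The filler list, the function name, and that definition are all assumptions made for this example; the rubric's actual measurement is not described on this page.

```python
import re

# Hypothetical filler list, chosen for illustration only.
FILLERS = {"um", "uh", "er", "hmm"}


def filler_density(transcript: str) -> float:
    """Return the fraction of word tokens that are fillers (assumed definition)."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed for contractions).
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    return sum(word in FILLERS for word in words) / len(words)


if __name__ == "__main__":
    print(filler_density("Um, so, uh, what does it cost?"))
```

A production version would more likely work from timestamped audio than from a text transcript, which is why this stays a sketch.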

The output

A single 0-1000 score per model, a per-dimension breakdown, and a qualitative diff against the prior version. The standard report ships under NDA; a public summary (if you want to publish) ships separately.
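For teams planning to ingest the results, the sketch below shows one way the report could be modeled, assuming a structured payload. The class name, field names, per-dimension scale, and the numeric delta helper are hypothetical; only the 0-1000 range and the ten dimension names come from this page, and the delivered diff is qualitative rather than numeric.

```python
from dataclasses import dataclass
from typing import Dict

# The ten rubric dimensions named on this page.
DIMENSIONS = (
    "opening", "rapport", "discovery", "active listening",
    "objection handling", "value articulation", "control of call",
    "tone-pace-confidence", "next-step close", "staying on message",
)


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical container for one model's eval result."""
    model: str
    score: int                      # overall score, 0-1000
    by_dimension: Dict[str, float]  # per-dimension scores; scale is assumed

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 1000:
            raise ValueError("score must be within 0-1000")
        missing = set(DIMENSIONS) - set(self.by_dimension)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")


def dimension_deltas(current: EvalReport, prior: EvalReport) -> Dict[str, float]:
    """Per-dimension change from the prior version (illustrative helper)."""
    return {d: current.by_dimension[d] - prior.by_dimension[d] for d in DIMENSIONS}
```

Validating the dimension set up front means a report with a missing dimension fails at load time instead of surfacing later as a gap in a version-over-version comparison.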

The constraint

We score the model on being a BUYER, not a coach. The model has to BE the person on the other end of the phone, not score the person on the phone.

What it costs

Priced per eval, sized to scope. Includes scenario set access, the rubric judge model, a comparative report against your prior version, and an embargoed delivery window. A side-by-side public report is an optional add-on. We share the price on a call.

Get on the schedule

Email eval@saleslab.cloud with the model and your target eval window. We hold a limited number of evaluation slots open each quarter and schedule on a first-come basis.
