AI Lead Scoring vs. the Sales Team's Gut
San Francisco · April 3, 2025
The VP of Sales — this is before we killed the sales team, back when we still had one — sits across from me in the cafeteria. He's eating a burrito and I'm trying to explain logistic regression, and neither of us is enjoying the conversation.
"So you want to replace my team's judgment with an algorithm," he says.
"I want to supplement it," I say.
"Same thing."
He's not wrong, and we both know it. The experiment I'm proposing is straightforward: build an AI-powered lead scoring model and run it alongside the sales team's manual lead qualification for sixty days. Compare which one better predicts which leads will close. Let the data decide.
Kevin — the VP of Sales — has been doing this for twelve years. He can look at a lead and tell you, within thirty seconds of scanning their LinkedIn and company website, whether they're going to buy. He says it's intuition. I think it's pattern matching — the same thing an AI does, just slower and with more biases.
"Sixty days," I say. "If the model doesn't outperform your team, I'll kill it and buy everyone dinner."
"And if it does?"
"Then we talk about how to use it."
He takes a bite of his burrito and considers this. "Fine. But my team doesn't change their process. They score leads the way they always have. Your robot does its thing separately. We compare at the end."
"Deal."
"You want to replace my team's judgment with an algorithm." "I want to supplement it." "Same thing." He's not wrong, and we both know it.
Priya builds the lead scoring model in two weeks. It's trained on eighteen months of historical lead data — 4,700 leads, of which 340 became paying customers. A 7.2% close rate, which is decent for B2B SaaS outbound.
The model ingests 47 features per lead. Some are basic firmographics: company size, industry, geography, funding stage. Some are behavioral: pages visited on our website, content downloaded, emails opened, webinar attendance. Some are enriched: the lead's seniority level, their company's tech stack (scraped from job postings and BuiltWith), and how many of our competitors' products they're using (inferred from review sites).
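If you want the shape of it in code, here's a minimal sketch of the pipeline. This is my reconstruction, not Priya's production code: I'm assuming scikit-learn, and the five numeric features below are illustrative stand-ins for the real 47.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five of the 47 features, all numeric for simplicity. The categorical
# enrichments (industry, funding stage, tech stack) would need encoding first.
FEATURES = [
    "employee_count",        # firmographic
    "pricing_page_visits",   # behavioral
    "docs_downloaded",       # behavioral
    "emails_opened",         # behavioral
    "started_trial",         # behavioral, 0/1
]

history = pd.read_csv("leads_18mo.csv")   # 4,700 leads, 340 of them closed
X, y = history[FEATURES], history["closed"]

# class_weight="balanced" matters when only 7.2% of labels are positive.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X, y)

# A lead's score is just its predicted closing probability on a 0-100 scale.
new = pd.read_csv("leads_new.csv")
new["score"] = (model.predict_proba(new[FEATURES])[:, 1] * 100).round(1)
```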
The model outputs a score from 0 to 100, where 100 is "this lead is almost certainly going to buy" and 0 is "don't waste your time." Based on historical data, the model sets three tiers: Hot (score 70+), Warm (40-69), and Cold (below 40).
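Continuing the sketch, the tiering itself is the most boring part of the system:

```python
def tier(score: float) -> str:
    """Map a 0-100 score to a tier. Cutoffs: Hot 70+, Warm 40-69, Cold below 40."""
    if score >= 70:
        return "Hot"
    if score >= 40:
        return "Warm"
    return "Cold"

new["ai_tier"] = new["score"].apply(tier)
```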
Meanwhile, Kevin's team continues scoring leads the way they always have. Each SDR reviews incoming leads and tags them as A, B, or C based on their judgment. A means "drop everything and call them now." B means "add to the sequence." C means "probably not worth it but log it anyway."
A is roughly equivalent to the model's Hot. B maps to Warm. C maps to Cold. The comparison isn't perfect, but it's close enough.
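In code, the mapping and the agreement check are two lines on top of the sketch above. I'm assuming a hypothetical sdr_grade column holding the team's A/B/C calls:

```python
# Translate the SDR grades into the model's vocabulary, then measure agreement.
SDR_TO_TIER = {"A": "Hot", "B": "Warm", "C": "Cold"}

new["sdr_tier"] = new["sdr_grade"].map(SDR_TO_TIER)
agreement = (new["ai_tier"] == new["sdr_tier"]).mean()   # came out to 0.64 for us
```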
Three weeks in, we have 312 new leads scored by both systems. Here's the preliminary data:
Agreement rate: 64%. The AI and the sales team agree on the tier classification for about two-thirds of leads. Where they agree, the close rate is predictable and unremarkable.
The interesting cases are the 36% where they disagree. The starkest of these disagreements fall into two categories:
Category 1: AI says Hot, sales says C (or B). 42 leads. These are typically small companies or individual contributors at larger companies — the kind of leads that SDRs deprioritize because the deal size looks small. The AI flags them as Hot because their behavioral signals are strong: they've visited the pricing page multiple times, downloaded the API documentation, started a free trial. Those are buying signals, but these leads don't look like traditional enterprise buyers.
Category 2: AI says Cold, sales says A. 29 leads. These are typically impressive-looking companies — big names, VP-level contacts, the kind of leads that make a salesperson's eyes light up. But their behavioral signals are weak: one website visit, no content engagement, came through a paid ad they probably clicked by accident. They look good on paper but haven't done anything that suggests genuine interest.
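Pulling those two buckets out of the comparison frame is a pair of filters, again on the assumed columns from the sketch:

```python
# Category 1: the model is hot on leads the team graded down.
ai_only_hot = new[(new["ai_tier"] == "Hot") & (new["sdr_grade"].isin(["B", "C"]))]

# Category 2: the team loves leads the model has written off.
sales_only_hot = new[(new["ai_tier"] == "Cold") & (new["sdr_grade"] == "A")]
```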
Sixty days are up. We have full outcome data on 847 leads scored by both systems. Here are the results.
AI model accuracy: leads scored Hot closed at 23%. Warm: 8%. Cold: 1.4%. The spread between tiers is clear and consistent.
Sales team accuracy: leads scored A closed at 14%. B: 7%. C: 2.8%. The spread is narrower. The sales team's top tier closes at a lower rate than the AI's top tier, and their bottom tier closes at a higher rate. In other words, the sales team is worse at separating strong leads from weak ones.
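For what it's worth, these per-tier numbers fall out of a groupby once outcomes land, assuming a boolean closed column has been added to the same comparison frame:

```python
# Close rate by tier for each scorer, over the leads with known outcomes.
for col in ("ai_tier", "sdr_grade"):
    print(new.groupby(col)["closed"].mean().mul(100).round(1))
```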
The most revealing data point: of the 42 leads where the AI said Hot and the sales team disagreed, 11 closed. That's a 26% close rate — our highest of any segment. These were the leads the sales team would have deprioritized. The AI caught them because it saw behavior, not titles.
Of the 29 leads where the sales team said A and the AI disagreed, 2 closed. That's a 6.9% close rate, slightly below our 7.2% baseline. These were the impressive-looking leads that never had real intent.
I present the results to Kevin and his team. I've been dreading this meeting because nobody likes being told that a machine does their job better. I've prepared the data carefully, focusing on the complementary angle: the AI and the team together would be better than either alone.
Kevin looks at the data for a long time. Then he asks the question I wasn't expecting: "What about the ones we both scored high?"
I pull up the data. Leads where both the AI and the sales team agreed on Hot/A status: 67 leads, 28 closed. A 41.8% close rate. That's extraordinary for outbound B2B. When the machine and the human agree, the signal is very strong.
"So use both," Kevin says.
This is smarter than what I had proposed. I was going to suggest replacing the sales team's scoring with the AI model. Kevin's suggestion is better: use both systems and focus the team's time on leads where both agree. For leads where only the AI scores high, automate the outreach (email sequences, retargeting) instead of having an SDR call. For leads where only the sales team scores high, investigate before investing time — the AI might be seeing something the team is missing.
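The rule we actually shipped is almost embarrassingly simple. Sketched below; the action names are mine, not the production system's:

```python
def route(ai_tier: str, sdr_grade: str) -> str:
    """Kevin's rule: spend rep time only where both systems agree."""
    ai_hot = ai_tier == "Hot"
    sdr_hot = sdr_grade == "A"
    if ai_hot and sdr_hot:
        return "sdr_call"            # the 41.8% close-rate segment
    if ai_hot:
        return "automated_sequence"  # email sequences + retargeting, no rep time
    if sdr_hot:
        return "investigate"         # figure out what the model is seeing first
    return "log_and_nurture"
```

The whole policy fits in a dozen lines, which is part of why we could implement it in two weeks.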
We implement this system over the next two weeks. The results, over the following month:
SDR call volume drops 34% because they're no longer calling AI-identified Cold leads that looked good on paper. Close rate on calls increases from 7.2% to 12.1% because the remaining calls are higher quality. Revenue per SDR increases 28%.
The automated outreach for AI-only Hot leads (no sales team agreement) converts at 4.3%, lower than the close rate on human-called leads but at essentially zero marginal cost. It becomes a passive revenue stream: leads that would have been ignored entirely are now generating some revenue.
The lead scoring experiment changed two things. The obvious one: we got better at prioritizing leads, which made the sales team more efficient and more effective. Revenue per rep went up. Wasted time went down. The math works.
The less obvious one: it changed how Kevin thinks about his team's role. Before the experiment, he saw his SDRs as hunters — finding and qualifying leads through instinct and experience. After the experiment, he sees them as closers — receiving pre-qualified leads and converting them through human connection and persuasion. The AI handles the qualifying. The humans handle the selling.
"I don't like it," Kevin tells me over beers after we review the thirty-day results. "But I can't argue with the numbers."
"What don't you like?"
"The feeling that the machine knows something my team doesn't. That it can look at a lead and see something we can't see."
"It's looking at 47 variables simultaneously. Your team looks at maybe five. It's not smarter — it's wider."
"Wider still feels like smarter when you're on the wrong side of it."
I buy him another beer. It's the least I can do.
Six months later, we'll kill the sales team entirely. Kevin will leave. The AI lead scoring model will keep running, feeding leads to an automated system and, eventually, to two CS managers who handle the human side. But that's a different story, one I've told separately.
The lead scoring experiment was the first crack. Not in the team's competence — they were good at what they did. In the economics that made that competence worth paying for. That's the thing about AI in sales. It doesn't make people worse. It makes them optional. And "optional" is a difficult thing to be in a company that's watching its burn rate.
