When the room splits, pay attention.
Every investor we talk to at IPO Genie has lived through that moment. You walk out of an investment committee where one partner is glowing about a startup and another is visibly annoyed you even brought it in. One calls it “category-defining.” Another calls it “a feature, not a company.” The associate quietly suggests “too controversial, let’s pass” and everyone feels relieved.
You just killed what might have been your best deal.
A new paper from MIT Sloan by Luca Gius puts hard numbers behind that uncomfortable intuition: when informed evaluators disagree about a startup, that startup is more likely to succeed. Not less. And the effect survives all the usual explanations that would make this a simple “high risk, high reward” story.
The industry norm of worshipping consensus is both lazy and structurally “anti-alpha.”
The IPO Genie team wrote this article for the people who actually sit in those rooms: partners, angels, accelerator managers, competition organizers. The ones who have a vote. The ones who love to talk about being “contrarian and right” while designing processes that systematically filter out everything that feels contrarian.
This article should also encourage founders in the middle of fundraising not to let naysayers deter them from their mission.
Let’s dissect what this research actually shows, why it makes deep strategic sense, and how to re-wire your evaluation machinery so you stop treating disagreement as a bug and start treating it as a data stream.
The study in plain language
Forget the mythology for a moment and focus on the plumbing.
Gius builds his analysis on data from 67 venture competitions and 118 funding rounds, covering 2,650 startups. These are not back-of-the-napkin impressions. Most competitions use detailed rubrics with up to twenty-plus criteria such as team, product, market, business model, scalability, and so on. Each startup is scored on a 1–7 scale by at least three judges, typically more, who are not random passersby but former founders, angels, VCs and industry experts.
Then the paper does something simple and almost embarrassingly rare in venture: it actually follows these companies for years. Competition records are matched to Crunchbase and PitchBook to see what happens afterwards: how much money they raise, whether they hit at least one million in annual revenue, whether they exit, how prominent they become, and whether they ultimately shut down.
Two numbers from the raw scoring patterns already tell a story.
First, most of the variation in scores is within startups, not between them. In other words, judges looking at the same company see very different things. Second, when you compute a standard inter-rater reliability metric like Cohen’s kappa, you get a value that in medical research would signal “we basically don’t agree at all.”
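For readers who want the mechanics, here is a minimal sketch of unweighted Cohen's kappa on two hypothetical judges' 1–7 scores. The data and the function are purely illustrative, not the paper's actual reliability computation; the point is only to show how a kappa near zero means "no agreement beyond chance":

```python
from collections import Counter

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement implied by each rater's marginal score distribution
    expected = sum(ca[c] / n * cb[c] / n for c in ca)
    return (observed - expected) / (1 - expected)

# Invented 1-7 scores from two judges on eight startups
judge_a = [4, 5, 3, 6, 4, 2, 5, 4]
judge_b = [4, 3, 5, 4, 6, 4, 3, 4]
print(cohens_kappa(judge_a, judge_b))  # close to zero: no agreement beyond chance
```

Perfect agreement yields kappa of 1.0; values near zero, which is roughly what startup judging panels produce, mean the raters agree about as often as coin flips with the same marginals would.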
This is the world you operate in. You are not running a neat, tight diagnostic instrument. You are running a noisy, political swarm of human priors.
Gius compresses that swarm into two variables for each startup in each competition:
- The average score across judges, which is your usual proxy for perceived quality.
- The dispersion of scores, which is a measure of how far those individual scores are from that average.
Imagine two startups. Both have an average score of 4.0. The first gets straight 4s from everyone. The second gets a 2, a 3, a 5, a 6, and a 4. Same average. Completely different shape.
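A few lines of standard-library Python make the distinction concrete (the scores are invented for illustration):

```python
from statistics import mean, pstdev

# Hypothetical judge scores for the two startups described above
startup_a = [4, 4, 4, 4, 4]   # straight 4s: polite consensus
startup_b = [2, 3, 5, 6, 4]   # same average of 4.0, but the room is split

for name, scores in (("A", startup_a), ("B", startup_b)):
    print(f"{name}: mean={mean(scores)}, dispersion={pstdev(scores):.2f}")
```

The mean is identical; only the dispersion, here the population standard deviation, separates the consensus company from the polarizing one.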
Industry practice treats them as equivalent, or sometimes treats the second as “too polarizing.” The paper shows that this is precisely where you are leaving money on the table.
Holding the average score constant, and controlling for competition effects, industry, founding year, even whether the startup already had a patent, higher disagreement predicts more future funding, higher odds of hitting meaningful revenue, better prominence metrics, and greater exit potential. The entrepreneurs whose ideas split the room end up, on average, doing better than the ones who generate polite agreement.
Crucially, this is not just volatility. More disagreement does not predict a higher probability of shut-down. You get more upside without a fatter left tail of total disaster. The paper explicitly tests the “maybe these are just riskier” explanation and finds that a simple risk story does not fit the data.
In other words, dispersion is not just noise. It contains signal. The uncomfortable, argumentative, “I don’t see it” slot on your IC agenda is where the power law hides.
Why polarizing ideas are the ones that matter
Once you get over the surprise, the logic is almost painfully straightforward.
Startups are not lottery tickets. They are compressed theories of how the world works, specific claims about technology, behavior, regulation and timing. Founders pursue opportunities because they believe these theories, often in direct contradiction to what “everyone knows.”
Now layer strategy on top of that. Competitive advantage only exists when you do something that your rivals either cannot do or do not believe is worth doing. If everyone understands and agrees with your theory of the world on day one, then by definition that theory cannot be a durable edge. It might lead to a business, but not to outlier returns. Common opinion is not a source of alpha.
Gius connects this to what strategy scholars call the “uniqueness paradox.” The more unique and powerful a strategy is, the harder it is to evaluate through existing mental models. It clashes with pattern recognition. It does not fit the spreadsheet. It triggers discomfort. In a group setting, discomfort shows up as disagreement.
To test whether this is really about uniqueness rather than some vague halo of "spiciness," the paper goes a step further. It builds a text-based distinctiveness score for every startup description. Using modern language embeddings, the same kind of representations that power large language models, it maps each pitch into a high-dimensional space. The more distant a description is from all the others, the more distinctive the proposition.
Two results drop out:
First, more distinctive startups attract more disagreement. When a value proposition genuinely differs from the rest of the pack, judges’ priors scatter.
Second, and more important in practice, the predictive power of disagreement almost disappears among the least distinctive half of the startups. For “normal” ideas, variance is mostly just noise. For truly distinct ideas, variance becomes a powerful predictor of upside.
So the mechanism is not “any argument is good.” It is “when the idea is fundamentally different, arguments are a necessary side-effect of value discovery.” You cannot have non-consensus alpha and consensus comfort at the same time.
You already know this at the anecdote level. Airbnb, Uber, Bitcoin and Stripe did not sail through evaluation committees on unanimous votes. The point of the paper is that this is not just folklore. It is a pattern that survives a serious statistical beating.
What evaluators actually disagree on
If you have ever sat in a pitch session, you can probably guess where the arguments cluster. The data lines up with that intuition in a very clean way.
Because many competitions use structured rubrics, Gius can decompose disagreement by dimension. Judges are surprisingly aligned on founding teams. They tend to converge on whether a team “looks strong”: background, coherence, storytelling, basic execution competence. There are skirmishes here and there, but not the deep schisms.
The real disagreements erupt around strategy. Business model. Scalability. Pricing logic. The power of incumbents. The shape of the downstream value chain. Those abstract, uncomfortable questions about how this thing actually makes money and whether that engine can keep running once competitors wake up.
This is exactly where you want diversity of mental models. Evaluating a novel strategy is not a mechanical exercise. Different judges lean on different priors: different experiences with platform plays, different scars from regulatory battles, different readings of where a technology curve is headed. Of course they diverge.
The paper adds another nice twist. After a startup is granted its first patent, disagreement about it tends to increase. Patents are a crude but practical proxy for technological novelty. Once something clearly “new” is in play, people’s stories about commercial potential fan out rather than converge.
Finally, the judges themselves are not interchangeable. By linking evaluators to their professional histories, Gius shows that former founders are significantly more likely to deviate from the consensus score on a given startup. Their grades sit further away from the panel average than those of non-founders. Traditional badges of sophistication like an MBA or a PhD do not explain the same pattern.
This is not surprising. People who once staked several years of their life on a non-consensus theory of the world are more comfortable diverging again. They are used to being the only one in the room who sees something. When you invite them into your evaluation process, you are not just getting “operator insight.” You are literally importing structured disagreement.
You can either treat that as a nuisance to be smoothed out, or as a sensor to be amplified.
How venture currently misuses disagreement
Walk through the typical flow of a contentious deal.
The partner who loves it is usually forced into a defensive posture. They are asked to justify why they are “pushing so hard” for something that “smart people clearly have doubts about.” The burden of proof shifts onto the champion. The rest of the room, consciously or not, is optimizing for not being wrong alone.
In that environment, disagreement is framed as a red flag. “If we can’t get comfortable as a partnership, we shouldn’t do it.” “If we are this split now, imagine the board dynamics later.” “If it were really that good, it would be easier to explain.”
Behind the language of prudence there is a very simple dynamic. Professional investing is a career. Careers are governed by social risk, not expected value. Being wrong with the group is tolerable. Being wrong against the group is career-threatening. So structures evolve that ritualize consensus and pathologize disagreement.
The Gius paper effectively quantifies the cost of that risk management. The deals that make you argue hardest, when the idea is genuinely distinct, are the ones that have a disproportionate share of the upside. You are actively pruning the right tail of your return distribution to make your Monday meetings feel less awkward.
The usual defense is: “maybe this is just risk.” High disagreement sounds like high uncertainty, and high uncertainty sounds like more crashes. But the data does not back that up. Polarizing startups are not more likely to shut down. They are simply more likely to do something interesting if they survive.
In other words, your aversion is not to risk. It is to narrative dissonance. You are allergic to being in a room where multiple mentally coherent stories about the same company coexist. So you flee to situations where everyone tells the same story, even when that story is mediocre.
From a portfolio perspective, that is insanity. From an individual career perspective, it is perfectly rational. Unless you deliberately hack your process.

Turning disagreement into an evaluation primitive
So what do you do if you are serious about harvesting this signal rather than just admiring it?
The answer is not to fetishize contrarianism, or to start green-lighting any company that provokes a fight. The answer is to treat disagreement and distinctiveness as first-class data in your funnel and to design governance that can actually act on that data.
At a minimum, this means three things:
- Stop averaging away dispersion. Track, for every decision context, not just how high a company scored but how much the scores varied.
- Condition on distinctiveness. For companies whose propositions look like everything else in your pipeline, treat disagreement as noise. For propositions that are textually and conceptually distinct, treat disagreement as a reason to lean in and investigate.
- Protect the right to champion. In highly distinctive, highly polarizing cases, build explicit rules that allow a small number of deeply informed champions to carry a deal forward against group discomfort, subject to portfolio-level guardrails rather than social approval.
None of this requires exotic tooling. Most firms already collect some form of structured feedback, even if it is just scorecards in Google Sheets and free-text IC memos. You can compute simple dispersion measures with the same effort it takes to compute averages. You can approximate distinctiveness by embedding pitch descriptions with off-the-shelf language models and measuring distance to your own historical corpus of deals.
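As a rough sketch of that distinctiveness measurement: compute each pitch's average cosine distance to every other pitch in the pool. The toy 3-d vectors below are stand-ins; in a real pipeline they would come from an off-the-shelf embedding model and the pool would be your own historical deal corpus (everything here is illustrative):

```python
from math import sqrt

def cosine_distance(u, v):
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm

def distinctiveness(embeddings, i):
    """Mean cosine distance from pitch i to every other pitch in the pool."""
    others = [cosine_distance(embeddings[i], e)
              for j, e in enumerate(embeddings) if j != i]
    return sum(others) / len(others)

# Toy 3-d vectors standing in for real sentence embeddings
pitches = [
    [0.9, 0.1, 0.0],   # generic proposition #1
    [0.8, 0.2, 0.1],   # generic proposition #2
    [0.1, 0.1, 0.9],   # the outlier proposition
]
scores = [distinctiveness(pitches, i) for i in range(len(pitches))]
print(scores)  # the outlier scores highest
```

Pair that number with the score dispersion from your panels and you have both axes the paper says matter, computed with no more effort than an average.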
The crucial shift is conceptual. You stop asking, “Can we all get comfortable with this?” and start asking, “Where are we seeing structured disagreement on genuinely novel theories of the world, and do we have a mechanism to take those seriously?”
Once you think about your pipeline this way, it rearranges itself.
You can see four rough buckets emerge. The bulk of applications are low-distinctiveness and low-disagreement: generic ideas everyone vaguely agrees are fine. A smaller slice is low-distinctiveness, high-disagreement: likely noise, personality clashes, or misunderstandings. Then you have high-distinctiveness, low-disagreement: the rare obvious rocket ships. And finally, high-distinctiveness, high-disagreement: the uncomfortable quadrant where a non-trivial portion of your future fund performance is hiding.
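One way to operationalize those four buckets is a two-threshold classifier. The cutoffs below are placeholders; in practice you would calibrate them against the medians of your own pipeline:

```python
def bucket(distinctiveness, disagreement, d_cut=0.5, s_cut=1.0):
    """Place a startup into one of the four quadrants described above.

    d_cut and s_cut are illustrative thresholds, e.g. pipeline medians
    for the distinctiveness score and score dispersion respectively.
    """
    distinct = distinctiveness >= d_cut
    split = disagreement >= s_cut
    if distinct and split:
        return "investigate hard"      # where the upside hides
    if distinct:
        return "obvious rocket ship"   # rare: novel and agreed-upon
    if split:
        return "likely noise"          # arguments without novelty
    return "generic"                   # the bulk of the pipeline

print(bucket(0.8, 1.6))  # -> investigate hard
```

The labels are deliberately blunt: the point is not the taxonomy itself but forcing the high-distinctiveness, high-disagreement quadrant onto a separate track instead of letting it die in committee.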
Right now, most processes structurally overfund the first bucket and structurally underfund the last. You do not need a PhD in econometrics to see the problem. You just needed someone to run the numbers and prove the intuition is not survivorship-bias storytelling. That is what this paper does and what we are building IPO Genie to do for you.
Re-designing the room, not just the spreadsheet
You can compute all the dispersion metrics you like and still end up killing the same deals if the social architecture of your decision-making stays intact. The room itself needs to change.
One immediate implication of the study is that you should stop treating ex-founders as decorative. If former entrepreneurs are more likely to diverge from consensus, then you can either sand them down or you can harness them. Put them on panels exactly where you expect unique strategic plays. Give their dissent explicit weight. Track, over time, the ex-ante disagreements versus ex-post outcomes and be honest about who is actually seeing around corners.
Similarly, you might want to be more deliberate about which dimensions you force convergence on. It is reasonable to require baseline alignment on team integrity, blatant fraud risks, or total addressable market that is not obviously tiny. These are hygiene factors. But on business model, on go-to-market, on ecosystem design, on the deeper theory of change, you want friction. You want multiple incompatible narratives to clash in the open, because that is where the insight lives.
This also has consequences for how you write memos. Most internal documents are crafted to make a case, not to represent uncertainty. They present a single narrative arc. A disagreement-aware process would explicitly log the existence and nature of dissenting views. Not in a performative “Devil’s Advocate” closing paragraph, but as a central part of the thesis:
- What is the champion’s theory of the world?
- What are the strongest alternative theories held inside the firm about the same data?
- On which concrete questions do smart people diverge, and what evidence could resolve that divergence over the next 12–24 months?
That last question matters because it ties disagreement to learning. If you document ex-ante where you expected the world to move, and who believed what, you can later run an honest audit. Not in the crude sense of “who picked the unicorn,” but in the more useful sense of “who consistently saw upside in distinct theories before the consensus moved.”
Over a decade, that is how you find your real edge. Not through slogans about being contrarian, but through tracked, repeated, measured divergence that correlates with outcomes. Gius’ work is essentially a large-scale, anonymized audit of that dynamic across many evaluators and many competitions.
Beyond competitions: the meta-lesson
The dataset in the paper comes from venture competitions and early-stage funding programs. That might tempt some to shrug and say, “Nice, but we run a serious fund, not a pitch contest.” That would be a mistake.
The underlying mechanism has nothing to do with prize money or demo-day theatrics. It has to do with what happens whenever a group of informed humans evaluates unconventional ideas under uncertainty. The same pattern is likely present in corporate innovation committees, grant panels, accelerator selection, even public R&D funding.
Wherever you have:
- a flood of proposals,
- a small number of actual hits,
- and a strong cultural bias toward consensus,
you can safely assume that disagreement is being treated as an error term instead of as a possible glimpse of non-obvious value.
This is the broader significance of the research. It is not “a hack for winning student competitions.” It is an empirical wedge into how institutions handle originality. It quantifies how much novelty gets lost because our processes are tuned for comfort, not discovery.


