The Same People Get Rejected Everywhere

A landmark study of 4 million job applications finds that when employers screen candidates with the same AI vendor, the same people get filtered out everywhere.

Jun 06, 2026

Diagram of three applicants applying to three jobs: every application routes through the same pymetrics screening model, which returns the same verdict for each person at every employer

There is a familiar kind of rejection that doesn’t feel like discrimination because it doesn’t feel like anything. There’s no trainwreck interview, no recruiter who stops replying—just silence, the kind of ghosting every job seeker learns to absorb as ordinary bad luck. That’s what makes it so hard to see: the harm arrives as a mundane absence, and absences don’t leave fingerprints.

If you’ve applied to a graduate scheme or an entry-level role at a big consulting firm, bank, or consumer-goods giant in the last few years, you may have brushed up against it. Somewhere after the résumé screen, you’re sent off to play a set of online games. You tap balloons, you remember sequences, you make snap decisions for money, and a model you never see decides, from how you played, whether you measure up on traits like “processing speed,” “risk tolerance,” and “altruism.”

Pymetrics, the vendor at the center of the study in question, was used to screen candidates at firms like JPMorgan, Unilever, BCG, Accenture, and Mastercard. (If you’ve never run into it, and I had not myself, that’s because it’s concentrated in high-volume hiring—campus recruiting and bulk entry-level pipelines—rather than the specialized or senior roles most mid-career people apply to.)

The games are incidental, a quirk of one vendor. The mechanism underneath them is not. Strip away the balloons and what you have is algorithmic scoring: a model, trained on data, assigning you a number that decides whether a human ever sees your application. That mechanism is generalizing fast, and it certainly won’t keep the shape of a game. It’s already showing up as résumé scoring, video-interview analysis, and assessment tools bolted onto the applicant-tracking software nearly every large employer now runs. Games are simply the version that happened to be studied first, so treat this paper less as a report on one screening fad and more as the first clear look at a pattern that’s coming for the rest of hiring in forms we haven’t named yet.

Now imagine you apply to three such jobs at three different companies. You play the games, or near-identical ones, three times. You hear nothing back, three times.

What you can’t see is that all three companies bought their screening from the same vendor, that some of them may be running the literally identical model on your data, and that your game scores (which the system stores and reuses for 330 days) are quietly producing the same verdict every time. You’re not getting three chances. You’re getting one verdict, delivered three times, by a machine that has decided you are a “do not recommend.”

A new paper presented this month at the ACM’s Fairness, Accountability, and Transparency conference—Algorithmic Monocultures in Hiring, by researchers at Stanford, Chapman, and Northeastern—is the first to actually watch this happen at scale. They obtained data from the hiring vendor pymetrics (acquired by the recruiting-software firm Harver in 2022) covering 4.2 million applications from 3.4 million applicants across 156 employers. It appears to be the first study ever to observe what a single applicant’s real algorithmic outcomes look like across multiple employers. And what it found should reframe how we think about both algorithmic bias and corporate transparency.

The measurement is the scandal

Over 90% of U.S. employers now use software to screen or rank applicants, and a handful of vendors supply most of it. When pymetrics audited its own system for bias—pooling together all the applications it had ever processed—it found nothing alarming. Across the whole dataset, Black applicants were selected at 52.5%, White applicants at 58.3%. The ratio between those rates clears the bar regulators use.

That bar is the “four-fifths rule” (sometimes the “4/5ths rule”). It comes from how the U.S. enforces employment discrimination law: if a protected group is selected at less than 80% of the rate of the most-selected group, and the gap is statistically significant, that’s a red flag for adverse impact, the legal term for a practice that’s neutral on its face but lands disproportionately on one group. On that test, pymetrics’ aggregate numbers passed.
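For readers who like to see the arithmetic, here is a minimal sketch of that ratio check in Python, using the two pooled selection rates quoted above. The function names and the 0.8 default are my own labels rather than anything from the paper or from pymetrics, and the sketch leaves out the statistical-significance half of the test.

```python
def impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Selection rate of one group divided by the rate of the most-selected group."""
    return group_rate / reference_rate


def passes_four_fifths(group_rate: float, reference_rate: float,
                       threshold: float = 0.8) -> bool:
    """True when the impact ratio is at or above the four-fifths threshold."""
    return impact_ratio(group_rate, reference_rate) >= threshold


# Pooled selection rates quoted in the text above.
black_rate = 0.525
white_rate = 0.583

ratio = impact_ratio(black_rate, white_rate)
print(f"impact ratio: {ratio:.2f}")
print("clears four-fifths bar:", passes_four_fifths(black_rate, white_rate))
```

The ratio comes out around 0.90, comfortably above 0.80, which is why the pooled audit looked clean.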

Here’s the problem the researchers identify: that’s the wrong way to do the math. U.S. discrimination law operationalizes adverse impact per job, not across an entire labor market. A vendor sitting between 156 employers and millions of applicants had simply averaged everything together. And averaging is exactly what hides the harm.

The analogy the authors use is sharp: imagine an employer that recommends only men for doctor roles and only women for nursing roles. The overall selection rates by gender could come out identical (perfectly fair in aggregate) while every individual position is rigidly sorted by sex. Aggregation launders the discrimination out of the data.

When the researchers redid the analysis the legally correct way—each of the 1,746 positions examined separately—the clean bill of health collapsed. 10.6% of positions show adverse impact against Black applicants. Roughly a quarter of all applications submitted by Black applicants went to positions where the model worked against them. The same pattern hit Asian applicants. None of this was visible in the pooled numbers. The discrimination didn’t appear because of a new finding; it appeared because someone finally divided by the right denominator.
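The doctor-and-nurse analogy is easy to reproduce with made-up numbers. The sketch below is a toy, not the researchers’ code: the applicant counts are invented for illustration, and `impact_ratios` is a hypothetical helper that divides each group’s selection rate by the highest group’s rate. It runs the same calculation twice, once pooled and once per position.

```python
from collections import defaultdict

# Invented counts for illustration: (position, group, applicants, selected)
records = [
    ("doctor", "men", 100, 50),
    ("doctor", "women", 100, 0),
    ("nurse", "men", 100, 0),
    ("nurse", "women", 100, 50),
]


def impact_ratios(rows):
    """Return each group's selection rate divided by the highest group rate."""
    applied = defaultdict(int)
    selected = defaultdict(int)
    for _, group, n_applied, n_selected in rows:
        applied[group] += n_applied
        selected[group] += n_selected
    rates = {g: selected[g] / applied[g] for g in applied}
    top = max(rates.values())
    # Guard against a position where nobody was selected at all.
    return {g: (rates[g] / top if top > 0 else float("nan")) for g in rates}


# One calculation over everything, the way a pooled audit would do it.
pooled = impact_ratios(records)

# The same calculation repeated separately for each position.
per_position = {
    pos: impact_ratios([r for r in records if r[0] == pos])
    for pos in sorted({r[0] for r in records})
}

print("pooled:", pooled)
for pos, ratios in per_position.items():
    print(pos, ratios)
```

Pooled, both groups are selected at 25% and the ratio is a perfect 1.0. Per position, one group’s ratio drops to 0.0 in each role.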

This matters beyond hiring, and it’s the part I’d underline for anyone who works on platform accountability: the unit of analysis is a political choice. Consolidate the data enough and any harm averages away. The fight over how finely you’re allowed to look is often the whole fight.

Opportunity has a single point of failure

Adverse impact is an old worry. The genuinely new finding is what the authors call systemic rejection, and it only exists in a world where everyone uses the same few tools.

Start with the word monoculture. In farming, it means planting a single crop across a whole region: efficient, until one blight arrives and there’s nothing to stop it, because every plant is identical and vulnerable in the same way. Algorithmic monoculture is the same idea applied to decisions. When most employers screen applicants through the same handful of vendors, they stop making independent judgments and start making one shared judgment, replicated everywhere. The diversity that used to protect you, the fact that a recruiter who passed on you at one firm had nothing to do with the recruiter at the next, quietly disappears. Hivemind HR.

Here’s why that matters for a single person. In an ordinary labor market, applying to ten jobs means ten separate rolls of the dice. Some land, some don’t; a rejection here says nothing about your chances there, because different people with different tastes are doing the deciding. That independence is the thing that makes “just keep applying” sound like reasonable advice. It’s what gives a job search the shape of a numbers game you can eventually win.

Monoculture breaks that. If the same model—or a set of closely related models trained the same way on the same kind of data—is scoring you each time, you’re not getting ten chances. You’re getting one verdict, returned ten times. And because pymetrics stores your game scores and reuses them for nearly a year, the verdict doesn’t even get the benefit of you having a better day. The researchers find exactly this in the data: among applicants who applied to ten positions, 4% were rejected by every single one. Far more than independent chance would produce. The decisions aren’t independent. They’re correlated, because they’re frequently the same decision wearing different company logos.
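To get a feel for how far 4% is from an independent market, here is a back-of-the-envelope sketch. The 45% per-application rejection rate is an assumption of mine, loosely based on the pooled selection rates quoted earlier; it is not the baseline the paper uses, and real rates vary by position. The two functions are hypothetical helpers covering the two extremes: fully independent decisions and one verdict reused everywhere.

```python
def shut_out_if_independent(reject_prob: float, n_jobs: int) -> float:
    """Chance of being rejected by every job when each decision is independent."""
    return reject_prob ** n_jobs


def shut_out_if_shared(reject_prob: float, n_jobs: int) -> float:
    """Chance of being rejected everywhere when one verdict is reused for all jobs."""
    # The number of applications is irrelevant: the first verdict decides them all.
    return reject_prob


ASSUMED_REJECT_PROB = 0.45  # illustrative assumption, not a figure from the paper
N_JOBS = 10

independent = shut_out_if_independent(ASSUMED_REJECT_PROB, N_JOBS)
shared = shut_out_if_shared(ASSUMED_REJECT_PROB, N_JOBS)

print(f"independent market: {independent:.4%}")
print(f"single shared verdict: {shared:.0%}")
```

Under that assumption, ten independent decisions would shut out roughly 3 applicants in 10,000. The observed 4% sits between the two extremes, which is consistent with decisions that are correlated without being fully identical.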

This is what systemic rejection means: not that some people get turned down a lot, but that the same people get turned down everywhere, for reasons baked into a profile they never see and can’t appeal. The authors have a blunter phrase for the people it happens to: they’ve been “algorithmically blackballed.” And the cruelty of it is that it’s invisible from the inside. Each individual rejection looks like ordinary bad luck. Only by tracking one applicant across many employers—which no one before this study could do—does the pattern resolve into something systematic.

The researchers test the obvious escape hatch. Could you just apply to more jobs? Because the algorithm is deterministic—run it twice on the same data and you get the same answer—they could simulate outcomes for jobs people never actually applied to. The better news: no one is universally unhireable; every simulated applicant was eventually recommended by some model. The damning news: to drive your odds of being shut out everywhere down to near zero, you’d need to apply to roughly 25 positions where 10 would have sufficed in an independent market. The machine doesn’t make you unemployable. It just makes opportunity scarcer, and quietly stacks that scarcity against the same people again and again.

That’s the structural picture, and it’s bigger than any one applicant. When a labor market routes its decisions through a single vendor, that vendor becomes a single point of failure for human opportunity. Get a “do not recommend” lodged in your stored profile, and you can be locked out of an entire sector without one human ever seeing your name.

Accountability by permission

Here is the part that should worry anyone who cares about holding these systems accountable: everything above exists only because pymetrics voluntarily handed over its data, under an agreement that (to the company’s real credit) gave it no veto over the findings. That is extraordinarily rare. Most hiring algorithms are black boxes. Researchers studying them are usually reduced to scraping public job boards or sending fake résumés, neither of which can see the algorithm actually deciding.

Now follow the incentive. A vendor shares data. Independent researchers find adverse impact and systemic rejection. The findings embarrass the vendor and its clients. The lesson every other vendor draws: never share data. The authors say so themselves—this very paper may discourage the kind of openness that made it possible.

That’s the trap, and it’s the throughline that connects this paper to every other fight over algorithmic accountability. Voluntary corporate transparency is self-extinguishing. The more useful the access, the stronger the reason to shut it off. You cannot build an accountability regime on the goodwill of the companies being held accountable. Which is why the only durable answer is to stop asking nicely—and the two jurisdictions that matter are moving in opposite directions.

The EU is building the scaffolding

Europe already has the two pieces this problem needs, even if neither is pointed directly at hiring yet.

The first is the precedent for forced access. The EU’s Digital Services Act requires very large online platforms to grant vetted researchers access to their data—a rule that exists precisely because a decade of voluntary cooperation in social media research had collapsed into nothing. The logic transfers cleanly: if we accept that researchers must be able to study the algorithms shaping public discourse, the algorithms deciding who gets to earn a living are not a lesser case.

The second is classification. The EU AI Act designates AI used in hiring as “high-risk,” which triggers a battery of obligations: risk management, documentation, human oversight, bias monitoring. The systems in this paper would almost certainly qualify. On paper, the high-risk obligations become enforceable on August 2, 2026. In practice, that date is already wobbling: the Commission’s late-2025 “Digital Omnibus” package has floated pushing the high-risk deadline back, and industry is lobbying hard for the delay. So even Europe’s scaffolding, likely the most advanced in the world, is being softened before it’s load-bearing. The framework exists. The will to switch it on is still contested.

The gap the paper exposes is that even a fully enforced AI Act doesn’t obviously require the right measurement. A vendor monitoring its own system for bias could do exactly what pymetrics did: pool everything, pass the aggregate test, and miss the per-position discrimination entirely. The lesson for European regulators is specific: high-risk bias audits have to be conducted at the level of the individual position, not the comfortable aggregate, or they’ll certify the same blindness the law was meant to cure.

The US is dismantling the concept

The American picture is starker, because the United States isn’t debating how to measure disparate impact. Rather, it’s debating whether disparate impact should exist as a legal idea at all.

The entire framework this paper relies on—the four-fifths rule, the very concept of a neutral practice that’s illegal because of its disproportionate effect—comes from Title VII of the Civil Rights Act of 1964 and the case law built on it. As of 2025, that framework is under direct attack. A Trump administration executive order, “Restoring Equality of Opportunity and Meritocracy,” explicitly targets disparate-impact liability, directing federal agencies to scale back its use. The theory that lets you challenge a hiring practice based on its results rather than its stated intent is exactly the theory now in the crosshairs.

Consider the timing: These researchers have built the sharpest instrument yet for detecting algorithmic discrimination in hiring, and they’ve released it at the precise moment the U.S. government is working to retire the legal concept that instrument measures. A better thermometer arrives just as the fever is being redefined as not a fever at all. In the U.S., whether this work changes anything may have less to do with the strength of the evidence than with whether the legal category it speaks to survives the decade.

There’s a narrower American failure worth naming too. New York City passed Local Law 144 in 2021, the first U.S. law to directly regulate algorithmic hiring, requiring employers to commission annual bias audits and post the results publicly. A 2024 study by Lucas Wright and colleagues found the law largely toothless in practice. Combing through 391 NYC employers, the researchers found only 18 had posted the required audit reports and 13 the required transparency notices, because the law lets employers decide for themselves whether they’re even in scope. They coined a term for the resulting fog: “null compliance,” a state where you can’t even tell whether a company is breaking the law. Tellingly, nearly every audit that was posted reported a passing score. Everyone cleared the four-fifths bar.

That last detail is the same blindness the pymetrics paper diagnoses, now written into law. New York’s own guidance, the study’s authors note, appears to instruct exactly the aggregation that hides the problem: merging data across positions and employers into a single, reassuring number. So the first serious American attempt to regulate algorithmic hiring may be quietly mandating the very averaging that launders the discrimination out of view.

Leandro Oliva
